[jira] [Commented] (HIVE-5093) Use a combiner for LIMIT with GROUP BY and ORDER BY operators

2013-08-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743881#comment-13743881
 ] 

Edward Capriolo commented on HIVE-5093:
---

I am thinking we should not do this. Hive uses map-side aggregation as an 
alternative to combiners. Also, I feel that designing and maintaining tons of 
code around LIMIT optimizations is a waste. Who will benefit from this, and how 
often? If you have a problem like this, it is better not to use Hive.

> Use a combiner for LIMIT with GROUP BY and ORDER BY operators
> -
>
> Key: HIVE-5093
> URL: https://issues.apache.org/jira/browse/HIVE-5093
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-5093-WIP-01.patch
>
>
> Operator trees of the following structure can have a memory friendly combiner 
> put in place after the sort-phase 
> "GBY-LIM" and "OBY-LIM"
> This will cut down on I/O when spilling to disk and particularly during the 
> merge phase of the reducer.
> There are two possible combiners - LimitNKeysCombiner and 
> LimitNValuesCombiner.
> The first one would be ideal for the GROUP-BY case, while the latter would 
> be more useful for the ORDER-BY case.
> The combiners are still relevant even if there are 1:1 forward operators on 
> the reducer side and for small data items, the MR base layer does not run the 
> combiners at all.
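
A minimal sketch of what the two combiners named in the quoted description might do, assuming each one sees a run of records that is already sorted by key. It is plain Java with no Hadoop types; the class and method names are hypothetical and are not taken from the HIVE-5093 patch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Java, no Hadoop types. The names below are
// illustrative and are not taken from the HIVE-5093 patch.
class LimitCombinerSketch {

    // GROUP BY ... LIMIT n: keep every record that belongs to the first n
    // distinct keys of an already sorted run, and drop the rest.
    static <K, V> List<Map.Entry<K, V>> limitNKeys(List<Map.Entry<K, V>> sorted, int n) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        K previous = null;
        int distinct = 0;
        for (Map.Entry<K, V> e : sorted) {
            if (distinct == 0 || !e.getKey().equals(previous)) {
                if (distinct == n) {
                    break; // n distinct keys have already been emitted
                }
                distinct++;
                previous = e.getKey();
            }
            out.add(e);
        }
        return out;
    }

    // ORDER BY ... LIMIT n: keep only the first n records of the sorted run.
    // Assumes n >= 0.
    static <K, V> List<Map.Entry<K, V>> limitNValues(List<Map.Entry<K, V>> sorted, int n) {
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> run = new ArrayList<>();
        run.add(new SimpleEntry<>("a", 1));
        run.add(new SimpleEntry<>("a", 2));
        run.add(new SimpleEntry<>("b", 3));
        run.add(new SimpleEntry<>("c", 4));
        System.out.println(limitNKeys(run, 2).size());   // 3 records: a, a, b
        System.out.println(limitNValues(run, 2).size()); // 2 records: a, a
    }
}
```

Either way the point is the same as in the description: records that cannot survive the LIMIT are dropped before they are spilled or merged.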

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4953) Regression: Hive does not build offline anymore

2013-08-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743877#comment-13743877
 ] 

Edward Capriolo commented on HIVE-4953:
---

It's almost not worth worrying about. I am moving pretty quickly with the 
maven-based build. I cannot suffer through ant's slow builds any longer.

> Regression: Hive does not build offline anymore
> ---
>
> Key: HIVE-4953
> URL: https://issues.apache.org/jira/browse/HIVE-4953
> Project: Hive
>  Issue Type: Bug
>Reporter: Edward Capriolo
>
> BUILD FAILED
> /home/edward/Documents/java/hive-trunk/build.xml:233: 
> java.net.UnknownHostException: repo2.maven.org
> Both ant -Doffline=true and eclipse no longer can build offline



[jira] [Commented] (HIVE-5071) Address thread safety issues with HiveHistoryUtil

2013-08-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743852#comment-13743852
 ] 

Edward Capriolo commented on HIVE-5071:
---

+1

> Address thread safety issues with HiveHistoryUtil
> -
>
> Key: HIVE-5071
> URL: https://issues.apache.org/jira/browse/HIVE-5071
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Thiruvel Thirumoolan
>Assignee: Teddy Choi
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-5071.1.patch.txt
>
>
> HiveHistoryUtil.parseLine() is not thread safe, yet it could be used by 
> multiple clients of HWA.
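
A minimal sketch of one way to make a line parser safe for concurrent callers. It assumes the hazard is a parse buffer shared between callers and that lines hold `key="value"` pairs; both are assumptions, and the class below is hypothetical rather than the HiveHistoryUtil code or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the HiveHistoryUtil code: it assumes the hazard is
// a parse buffer shared between callers and avoids it with a per-call map.
class HistoryLineParserSketch {

    // Pattern objects are immutable and safe to share across threads.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // No static mutable state: every call builds and returns its own map,
    // so concurrent callers cannot overwrite each other's results.
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new HashMap<>();
        Matcher m = KEY_VALUE.matcher(line); // a Matcher is created per call
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("QueryStart QUERY_ID=\"q1\" TIME=\"42\""));
    }
}
```

Returning a fresh map costs one allocation per line; a thread-local buffer would be another option if that allocation turned out to matter.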



Re: [Discuss] project chop up

2013-08-16 Thread Edward Capriolo
For those interested in pitching in.
https://github.com/edwardcapriolo/hive



On Fri, Aug 16, 2013 at 11:58 AM, Edward Capriolo wrote:

> Summary from hive-irc channel. Minor edits for spell check/grammar.
>
> The last 10 lines are a summary of the key points.
>
> [10:59:17]  noland: et all. Do you want to talk about hive in
> maven?
> [11:01:06] smonchi [~
> ro...@host34-189-dynamic.23-79-r.retail.telecomitalia.it] has quit IRC:
> Quit: ... 'cause there is no patch for human stupidity ...
> [11:10:04]  ecapriolo: yeah that sounds good to me!
> [11:10:22]  I saw you created the jira but haven't had time to look
> [11:10:32]  So I found a few things
> [11:10:49]  In common there are one or two tests that actually
> fork a process :)
> [11:10:56]  and use build.test.resources
> [11:11:12]  Some serde, uses some methods from ql in testing
> [11:11:27]  and shims really needs a separate hadoop test shim
> [11:11:32]  But that is all simple stuff
> [11:11:47]  The biggest problem is I do not know how to solve
> shims with maven
> [11:11:50]  do you have any ideas
> [11:11:52]  ?
> [11:13:00]  That one is going to be a challenge. It might be that
> in that section we have to drop down to ant
> [11:14:44]  Is it a requirement that we build both the .20 and .23
> shims for a "package" as we do today?
> [11:16:46]  I was thinking we can do it like a JDBC driver
> [11:16:59]  We separate out the interface of shims
> [11:17:22]  And then at runtime we drop in a driver implementing
> [11:17:34] Wertax [~wer...@wolfkamp.xs4all.nl] has quit IRC: Remote host
> closed the connection
> [11:17:36]  That or we could use maven's profile system
> [11:18:09]  It seems that everything else can actually link
> against hadoop-0.20.2 as a provided dependency
> [11:18:37]  Yeah either would work. The driver method would
> probably require us to use ant to build both the drivers?
> [11:18:44]  I am a fan of mvn profiles
> [11:19:05]  I was thinking we kinda separate the shim out into
> its own project, not a module
> [11:19:10]  to achieve that jdbc thing
> [11:19:27]  But I do not have a solution yet, I was looking to
> farm that out to someone smart...like you :)
> [11:19:33]  :)
> [11:19:47]  All I know is that we need a test shim because
> HadoopShim requires hadoop-test jars
> [11:20:10]  then the Mini stuff is only used in qtest anyway
> [11:20:48]  Is this something you want to help with? I was
> thinking of spinning up a github
> [11:20:50]  I think that the separate projects would work and
> perhaps nicely.
> [11:21:01]  Yeah I'd be interested in helping!
> [11:21:17]  But I am going on vacation starting next week for
> about 10 days
> [11:21:27]  Ah cool where are you going?
> [11:21:37]  Netherlands
> [11:21:42]  Biking around and such
> [11:23:52]  The one thing I was thinking about with regards to a
> branch is keeping history. We'll want to keep history for the files but
> AFAICT svn doesn't understand git mv.
> [11:24:16] Wertax [~wer...@wolfkamp.xs4all.nl] has joined #hive
> [11:31:19] jeromatron [~text...@host90-152-1-162.ipv4.regusnet.com] has
> quit IRC: Quit: My MacBook Pro has gone to sleep. ZZZzzz…
> [11:35:49]  noland: Right I do not plan to suggest that we will
> do this in git
> [11:36:11]  I just see that we are going to have to hack stuff
> up and it is not the type of work that lends itself well to branches.
> [11:36:17]  Ahh ok
> [11:36:56]  Once we come up with a solution for the shims, and
> we have something that can reasonably build and test hive we can figure out
> how to apply that to a branch/trunk
> [11:36:58]  yeah so just do a POC on github and then implement on
> svn
> [11:37:05]  cool
> [11:37:29]  Along the way we can probably find things that we
> can do like that common test I found and other minor things
> [11:37:41]  sounds good
> [11:37:50]  Those we can likely just commit into the current
> trunk and I will file issues for those now
> [11:37:58]  cool
> [11:38:41]  But yea man. I just can't take the project as it is
> now
> [11:38:51]  in eclipse every time I touch a file it rebuilds
> everything!
> [11:38:53]  It's like WTF
> [11:39:09]  Running one test takes like 3 minutes
> [11:39:12]  it's out of control
> [11:39:23]  LOL
> [11:39:29]  I agree 110%
> [11:39:32]  eclipse was not always like that I am not sure how
> the hell it happened
> [11:39:51]  The eclipse sep thing is so harmful
> [11:40:08]  dep thing that is
> [11:40:12]  I mean command line ant was always bad, but you
> used to be able to work in eclipse without having to rebuild everything
> every change/test
> [11:40:39]  Yeah the first thing I do these days is disable the
> ant builder
> [11:

Re: [Discuss] project chop up

2013-08-16 Thread Edward Capriolo
ke serde has all this thrift and avro stuff to
support custom formats
[11:42:30]  that is going into its own module
[11:42:43]  Going to rip out all the udfs except between and or.
[11:43:50]  yeah it'd be nice to have those items in their own
modules so you can just build/test them when you want
[11:44:12]  hbase zookeeper locking
[11:44:31] Wertax [~wer...@wolfkamp.xs4all.nl] has quit IRC: Remote host
closed the connection
[11:44:44]  yeah for sure
[11:45:04]  I think the default for testing should be the in
process locking
[11:45:10]  Absolutely.
[11:45:40]  The other issue I want to tackle is hive-exec.jar
[11:45:54]  I want to jar-jar all the dependencies.
[11:46:46]  I run into too many conflicts with log4j and guava,
and commons-utils. All those things need to be packaged into non-conflicting
packages
[11:46:58]  I haven't looked at how we build that yet but I agree
it'd be nice if we could jar-jar things like guava
[11:47:12]  so we can actually use them on server side
[11:47:16]  We don't really need guava. It's probably just used
for one tiny thing
[11:47:43]  People are forgetting/do not understand that
hive-exec needs to get sent via the distributed cache
[11:47:57]  When we implement range joins they have a RangeMap that
we'll need.
[11:47:57]  so making it hulkingly fat just slows everything down
[11:48:11]  Do we ship it every time?
[11:48:25]  Cause we only have to ship it once per version of the
jar.
[11:48:42]  Recently you need the jackson jars on the auxlib as
well
[11:48:46]  hive will not work without it
[11:49:11]  People are just focused
feature-feature-feature...bigger...bigger bigger
[11:49:24] rubensayshi [drakie@nat/hyves.nl/x-uxywnflkbberbzhq] has quit
IRC: Quit: Leaving
[11:49:27]  yeah maven modules will definitely help us understand
who depends on what.
[11:49:28]  Next up Kryo
[11:49:51]  I agree there is a lot of tech debt that needs paying
[11:50:30]  So those are all the high level things I want to
tackle
[11:50:59]  shims, general cleanup, break out non-essential
code, build a better non conflicting hive-exec jar
[11:51:10]  That sounds good. Once we hack on github for a while
it'd be nice to develop a brief high level plan on how to implement
[11:51:26]  Also get maven artifacts with correct dependency
scopes like provided etc
[11:51:40]  Right now pulling a hive jar from maven is like
pulling in the world
[11:52:08] bvanhoy [~Adium@64.124.34.34] has joined #hive


On Thu, Aug 15, 2013 at 11:14 PM, Edward Capriolo wrote:

> I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I
> am growing tired of how long hive's build takes.
>
> I have started playing with this by creating a simple multi-module project
> and copying stuff as I go. I have ported a minimal shims and common and I
> have all the tests in common almost running.
>
> Q. This is going to be ugly hacky work for a while, I was thinking it
> should be a branch but it is just going to be a mess of moves and copies
> etc. Not really something you can diff etc.
>
> Is anyone else interested in working on this as well? If so, I think we can
> just set up a github and I can arrange for anyone to have access to it.
>
> Thanks,
> Edward
>
>
> On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo wrote:
>
>> "Some of the hard part was that some of the test classes are in the wrong
>> module that references classes in a later module."
>>
>> I think the modules will have to be able to reference each other in many
>> cases. Serde and QL are tightly coupled. QL is really too large and we
>> should find a way to cut that up.
>>
>> Part of this problem is the q.tests
>>
>> I think one way to handle this is to only allow unit tests inside the
>> module. I imagine running all the q tests would be done in a final module
>> hive-qtest. Or possibly two final modules
>> hive-qtest
>> hive-qtest-extra (tangential things like UDFS and input formats not core
>> to hive)
>>
>>
>> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley  wrote:
>>
>>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>> > > I'd like to propose we move towards Maven.
>>> >
>>> > Big +1 on this. Most of the major apache projects(hadoop, hbase, avro
>>> etc.)
>>> > are maven based.
>>> >
>>>
>>> A big +1 from me too. I actually took a pass at it a couple of months
>>> ago.
>>> Some of the hard part was that some of the test classes are in the wrong
>>> module that references classes in a later module. Obviously that prevents
>>> any kind of modular build.
>>>
>>> As an additional plus to Maven is that Maven includes tools to correct
>>> the
>>> project and module dependencies.
>>>
>>> -- Owen
>>>
>>
>>
>


Re: Tez branch and tez based patches

2013-08-16 Thread Edward Capriolo
Commit-then-review and self-commit destroy the good things we get from
our normal system.

http://anna.gs/blog/2013/08/12/code-review-ftw/

I am most worried about silos of knowledge, lax testing policies, and
code quality, which I have now seen on several occasions when something is
happening in a branch (not calling out the tez branch in particular).



On Fri, Aug 16, 2013 at 9:13 AM, Edward Capriolo wrote:

> I still am not sure we are doing this the ideal way. I am not a believer
> in a commit-then-review branch.
>
> This issue is an example.
>
> https://issues.apache.org/jira/browse/HIVE-5108
>
> I ask myself these questions:
> Does this currently work? Are there tests? If so which ones are broken?
> How does the patch fix them without tests to validate?
>
> Having a commit-then-review branch just seems subversive to our normal
> process, and a quick short cut to not have to be bothered by writing tests
> or involving anyone else.
>
>
>
> On Mon, Aug 5, 2013 at 1:54 PM, Alan Gates  wrote:
>
>>
>> On Jul 29, 2013, at 9:53 PM, Edward Capriolo wrote:
>>
>> > Also watched http://www.ustream.tv/recorded/36323173
>> >
>> > I definitely see the win in being able to stream inter-stage output.
>> >
>> > I see some cases where small intermediate results can be kept "In
>> memory".
>> > But I was somewhat under the impression that the map reduce spill
>> settings
>> > kept stuff in memory, isn't that what spill settings are?
>>
>> No.  MapReduce always writes shuffle data to local disk.  And
>> intermediate results between MR jobs are always persisted to HDFS, as
>> there's no other option.  When we talk of being able to keep intermediate
>> results in memory we mean getting rid of both of these disk writes/reads
>> when appropriate (meaning not always, there's a trade off between speed and
>> error handling to be made here, see below for more details).
>>
>> >
>> > There are a few bullet points that came up repeatedly that I do not
>> > follow:
>> >
>> > Something was said to the effect of "Container reuse makes X faster".
>> > Hadoop has jvm reuse. Not following what the difference is here? Not
>> > everyone has a 10K node cluster.
>>
>> Sharing JVMs across users is inherently insecure (we can't guarantee what
>> code the first user left behind that may interfere with later users).  As I
>> understand container re-use in Tez it constrains the re-use to one user for
>> security reasons, but still avoids additional JVM start up costs.  But this
>> is a question that the Tez guys could answer better on the Tez lists (
>> d...@tez.incubator.apache.org)
>>
>> >
>> > "Joins in map reduce are hard" Really? I mean some of them are I guess,
>> but
>> > the typical join is very easy. Just shuffle by the join key. There was
>> not
>> > really enough low level details here saying why joins are better in tez.
>>
>> Join is not a natural operation in MapReduce.  MR gives you one input and
>> one output.  You end up having to bend the rules to have multiple
>> inputs.  The idea here is that Tez can provide operators that naturally
>> work with joins and other operations that don't fit the one input/one
>> output model (eg unions, etc.).
>>
>> >
>> > "Choosing the number of maps and reduces is hard" Really? I do not find
>> > it that hard. I think there are times when it's not perfect, but I do
>> > not find it hard. The talk did not really offer anything technical here
>> > on how tez makes this better, other than that it could make it better.
>>
>> Perhaps manual would be a better term here than hard.  In our experience
>> it takes quite a bit of engineer trial and error to determine the optimal
>> numbers.  This may be ok if you're going to invest the time once and then
>> run the same query every day for 6 months.  But obviously it doesn't work
>> for the ad hoc case.  Even in the batch case it's not optimal because every
>> once in a while an engineer has to go back and re-optimize the query to
>> deal with changing data sizes, data characteristics, etc.  We want the
>> optimizer to handle this without human intervention.
>>
>> >
>> > The presentations mentioned streaming data. How do two nodes stream data
>> > between tasks, and how is it reliable? If the sender or receiver dies,
>> > does the entire process have to start again?
>> If the sender or rec

Re: Tez branch and tez based patches

2013-08-16 Thread Edward Capriolo
I still am not sure we are doing this the ideal way. I am not a believer in
a commit-then-review branch.

This issue is an example.

https://issues.apache.org/jira/browse/HIVE-5108

I ask myself these questions:
Does this currently work? Are there tests? If so which ones are broken? How
does the patch fix them without tests to validate?

Having a commit-then-review branch just seems subversive to our normal
process, and a quick short cut to not have to be bothered by writing tests
or involving anyone else.



On Mon, Aug 5, 2013 at 1:54 PM, Alan Gates  wrote:

>
> On Jul 29, 2013, at 9:53 PM, Edward Capriolo wrote:
>
> > Also watched http://www.ustream.tv/recorded/36323173
> >
> > I definitely see the win in being able to stream inter-stage output.
> >
> > I see some cases where small intermediate results can be kept "In
> memory".
> > But I was somewhat under the impression that the map reduce spill
> settings
> > kept stuff in memory, isn't that what spill settings are?
>
> No.  MapReduce always writes shuffle data to local disk.  And intermediate
> results between MR jobs are always persisted to HDFS, as there's no other
> option.  When we talk of being able to keep intermediate results in memory
> we mean getting rid of both of these disk writes/reads when appropriate
> (meaning not always, there's a trade off between speed and error handling
> to be made here, see below for more details).
>
> >
> > There are a few bullet points that came up repeatedly that I do not
> > follow:
> >
> > Something was said to the effect of "Container reuse makes X faster".
> > Hadoop has jvm reuse. Not following what the difference is here? Not
> > everyone has a 10K node cluster.
>
> Sharing JVMs across users is inherently insecure (we can't guarantee what
> code the first user left behind that may interfere with later users).  As I
> understand container re-use in Tez it constrains the re-use to one user for
> security reasons, but still avoids additional JVM start up costs.  But this
> is a question that the Tez guys could answer better on the Tez lists (
> d...@tez.incubator.apache.org)
>
> >
> > "Joins in map reduce are hard" Really? I mean some of them are I guess,
> but
> > the typical join is very easy. Just shuffle by the join key. There was
> not
> > really enough low level details here saying why joins are better in tez.
>
> Join is not a natural operation in MapReduce.  MR gives you one input and
> one output.  You end up having to bend the rules to have multiple
> inputs.  The idea here is that Tez can provide operators that naturally
> work with joins and other operations that don't fit the one input/one
> output model (eg unions, etc.).
>
> >
> > "Choosing the number of maps and reduces is hard" Really? I do not find it
> > that hard. I think there are times when it's not perfect, but I do not
> > find it hard. The talk did not really offer anything technical here on how
> > tez makes this better, other than that it could make it better.
>
> Perhaps manual would be a better term here than hard.  In our experience
> it takes quite a bit of engineer trial and error to determine the optimal
> numbers.  This may be ok if you're going to invest the time once and then
> run the same query every day for 6 months.  But obviously it doesn't work
> for the ad hoc case.  Even in the batch case it's not optimal because every
> once in a while an engineer has to go back and re-optimize the query to
> deal with changing data sizes, data characteristics, etc.  We want the
> optimizer to handle this without human intervention.
>
> >
> > The presentations mentioned streaming data. How do two nodes stream data
> > between tasks, and how is it reliable? If the sender or receiver dies,
> > does the entire process have to start again?
>
> If the sender or receiver dies then the query has to be restarted from
> some previous point where data was persisted to disk.  The idea here is
> that speed vs error recovery trade offs should be made by the optimizer.
>  If the optimizer estimates that a query will complete in 5 seconds it can
> stream everything and if a node fails it just re-runs the whole query.  If
> it estimates that a particular phase of a query will run for an hour it can
> choose to persist the results to HDFS so that in the event of a failure
> downstream the long phase need not be re-run.  Again we want this to be
> done automatically by the system so the user doesn't need to control this
> level of detail.
>
> >
> > Again one of the talks implied there is a prototype out there that
> l

Re: [Discuss] project chop up

2013-08-15 Thread Edward Capriolo
I have opened https://issues.apache.org/jira/browse/HIVE-5107 because I am
growing tired of how long hive's build takes.

I have started playing with this by creating a simple multi-module project
and copying stuff as I go. I have ported a minimal shims and common and I
have all the tests in common almost running.

Q. This is going to be ugly hacky work for a while, I was thinking it
should be a branch but it is just going to be a mess of moves and copies
etc. Not really something you can diff etc.

Is anyone else interested in working on this as well? If so, I think we can
just set up a github and I can arrange for anyone to have access to it.

Thanks,
Edward


On Wed, Aug 7, 2013 at 5:04 PM, Edward Capriolo wrote:

> "Some of the hard part was that some of the test classes are in the wrong
> module that references classes in a later module."
>
> I think the modules will have to be able to reference each other in many
> cases. Serde and QL are tightly coupled. QL is really too large and we
> should find a way to cut that up.
>
> Part of this problem is the q.tests
>
> I think one way to handle this is to only allow unit tests inside the
> module. I imagine running all the q tests would be done in a final module
> hive-qtest. Or possibly two final modules
> hive-qtest
> hive-qtest-extra (tangential things like UDFS and input formats not core
> to hive)
>
>
> On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley  wrote:
>
>> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>> > > I'd like to propose we move towards Maven.
>> >
>> > Big +1 on this. Most of the major apache projects(hadoop, hbase, avro
>> etc.)
>> > are maven based.
>> >
>>
>> A big +1 from me too. I actually took a pass at it a couple of months ago.
>> Some of the hard part was that some of the test classes are in the wrong
>> module that references classes in a later module. Obviously that prevents
>> any kind of modular build.
>>
>> As an additional plus to Maven is that Maven includes tools to correct the
>> project and module dependencies.
>>
>> -- Owen
>>
>
>


[jira] [Created] (HIVE-5107) Change hive's build to maven

2013-08-15 Thread Edward Capriolo (JIRA)
Edward Capriolo created HIVE-5107:
-

 Summary: Change hive's build to maven
 Key: HIVE-5107
 URL: https://issues.apache.org/jira/browse/HIVE-5107
 Project: Hive
  Issue Type: Task
Reporter: Edward Capriolo
Assignee: Edward Capriolo


I cannot cope with hive's build infrastructure any more. I have started 
working on porting the project to maven. When I have made some solid progress 
I will put the entire thing on github for review. Then we can talk about 
switching the project somehow.



[jira] [Commented] (HIVE-4963) Support in memory PTF partitions

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741670#comment-13741670
 ] 

Edward Capriolo commented on HIVE-4963:
---

Why can't we mark the fields as transient? Do they need to be serialized in 
other contexts? If they need to be serialized sometimes and not others, maybe 
what we need is two different fields?
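
A minimal sketch of what marking a field transient does under standard java.io serialization. The class and field names are hypothetical and do not come from the HIVE-4963 patches, and other serializers have their own rules about transient fields, so this only illustrates the idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch using plain java.io serialization; the class and field
// names are illustrative and do not come from the HIVE-4963 patches.
class TransientFieldSketch {

    static class Partition implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;                 // written to the stream
        transient Object[] rowCache; // skipped; null after deserialization

        Partition(String name, Object[] rowCache) {
            this.name = name;
            this.rowCache = rowCache;
        }
    }

    // Serialize and deserialize in memory to show which fields survive.
    static Partition roundTrip(Partition p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Partition) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Partition copy = roundTrip(new Partition("p1", new Object[] {1, 2, 3}));
        System.out.println(copy.name);             // p1
        System.out.println(copy.rowCache == null); // true
    }
}
```

If the same data must be serialized in one context and skipped in another, a single transient field cannot express that, which is where the two-fields idea comes in.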

> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Harish Butani
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> PTFRowContainer.patch
>
>
> PTF partitions apply the defensive mode of assuming that partitions will not 
> fit in memory. Because of this there is a significant deserialization 
> overhead when accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on partition size and, in the case of windowing, the number of 
> UDAFs and the window ranges. For example, for the following (admittedly extreme) 
> case the PTFOperator exec times went from 39 secs to 8 secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}



[jira] [Commented] (HIVE-4545) HS2 should return describe table results without space padding

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741666#comment-13741666
 ] 

Edward Capriolo commented on HIVE-4545:
---

It's ok. It seems a bit clunky to have: 

   HIVE_HUMAN_FRIENDLY_FORMAT("hive.human.friendly.format", true),

Maybe we simply need -describe terse- or something.


> HS2 should return describe table results without space padding
> --
>
> Key: HIVE-4545
> URL: https://issues.apache.org/jira/browse/HIVE-4545
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-4545-1.patch, HIVE-4545.2.patch, HIVE-4545.3.patch
>
>
> HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE 
> FORMATTED table;'. HIVE-3140 introduced changes to not print header in 
> 'DESCRIBE table;'. But jdbc/odbc calls still get fields padded with space for 
> the 'DESCRIBE table;' query.
> As the jdbc/odbc results are not for direct human consumption the space 
> padding should not be done for hive server2.



[jira] [Commented] (HIVE-5077) Provide an option to run local task in process

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741664#comment-13741664
 ] 

Edward Capriolo commented on HIVE-5077:
---

Basically I just want to understand what 'in process' means in this context.

> Provide an option to run local task in process
> --
>
> Key: HIVE-5077
> URL: https://issues.apache.org/jira/browse/HIVE-5077
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5077.patch
>
>




[jira] [Commented] (HIVE-5077) Provide an option to run local task in process

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741662#comment-13741662
 ] 

Edward Capriolo commented on HIVE-5077:
---

I am -1 until I understand something.

It seems like this issue:
https://issues.apache.org/jira/browse/HIVE-5054

Is removing a feature similar to the one this ticket is adding. Can someone 
explain the difference between the thing we are removing and the thing we are 
adding? Also with no description I can not understand why we want this option.

> Provide an option to run local task in process
> --
>
> Key: HIVE-5077
> URL: https://issues.apache.org/jira/browse/HIVE-5077
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5077.patch
>
>




[jira] [Commented] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741167#comment-13741167
 ] 

Edward Capriolo commented on HIVE-2608:
---

[~navis] It seems like 
/hive/trunk/ql/src/test/results/clientnegative/udtf_not_supported2.q.out
is failing often.

> Do not require AS a,b,c part in LATERAL VIEW
> 
>
> Key: HIVE-2608
> URL: https://issues.apache.org/jira/browse/HIVE-2608
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, UDF
>Reporter: Igor Kabiljo
>Assignee: Navis
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, 
> HIVE-2608.D4317.6.patch, HIVE-2608.D4317.7.patch, HIVE-2608.D4317.8.patch
>
>
> Currently, it is required to state column names when LATERAL VIEW is used.
> That shouldn't be necessary, since UDTF returns struct which contains column 
> names - and they should be used by default.
> For example, it would be great if this was possible:
> SELECT t.*, t.key1 + t.key4
> FROM some_table
> LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;



[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741101#comment-13741101
 ] 

Edward Capriolo commented on HIVE-1511:
---

Here is the little funny corner case of this ticket. By adding Kryo to the 
distributed cache needed to launch jobs we ARE slowing down queries. The better 
serialization helps more for the very large queries, but for the standard case 
we may be adding time. 

> Hive plan serialization is slow
> ---
>
> Key: HIVE-1511
> URL: https://issues.apache.org/jira/browse/HIVE-1511
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Ning Zhang
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-1511.patch, HIVE-1511-wip2.patch, 
> HIVE-1511-wip3.patch, HIVE-1511-wip.patch
>
>
> As reported by Edward Capriolo:
> For reference I did this as a test case
> SELECT * FROM src where
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> ...(100 more of these)
> No OOM but I gave up after the test case did not go anywhere for about
> 2 minutes.
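
A test query of this shape can be generated rather than typed by hand. The
following is a minimal sketch, not code from the ticket: the class name
RepeatedOrQuery and its build method are hypothetical, and the repeat count is
whatever the caller chooses.

```java
import java.util.Collections;

/** Hypothetical helper that builds a WHERE clause of repeated OR'ed predicates. */
final class RepeatedOrQuery {
    private RepeatedOrQuery() {}

    /**
     * Returns "SELECT * FROM table WHERE p OR p OR ..." with the predicate
     * repeated count times.
     */
    static String build(String table, String predicate, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        // nCopies gives an immutable list holding the same predicate count times.
        String where = String.join(" OR ", Collections.nCopies(count, predicate));
        return "SELECT * FROM " + table + " WHERE " + where;
    }
}
```

Calling RepeatedOrQuery.build("src", "key=0", n) with a large n gives a query
of the kind described above, which makes it easy to see how planning time
grows with the number of predicates.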



[jira] [Commented] (HIVE-5055) SessionState temp file gets created in history file directory

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740989#comment-13740989
 ] 

Edward Capriolo commented on HIVE-5055:
---

Committed. Thanks Hari Sankar Sivarama Subramaniyan, or can I just call you 
Hari ? :)

> SessionState temp file gets created in history file directory
> -
>
> Key: HIVE-5055
> URL: https://issues.apache.org/jira/browse/HIVE-5055
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Thejas M Nair
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 0.12.0
>
> Attachments: HIVE-5055.1.patch.txt, HIVE-5055.2.patch.txt
>
>
> SessionState.start creates a temp file for temp results, but this file is 
> created in hive.querylog.location, which is supposed to be used only for hive 
> history log files.



[jira] [Updated] (HIVE-5055) SessionState temp file gets created in history file directory

2013-08-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5055:
--

   Resolution: Fixed
Fix Version/s: 0.12.0
 Release Note: Changes the scratch directory; check your configuration if the 
scratch location is important to you.
   Status: Resolved  (was: Patch Available)

> SessionState temp file gets created in history file directory
> -
>
> Key: HIVE-5055
> URL: https://issues.apache.org/jira/browse/HIVE-5055
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Thejas M Nair
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 0.12.0
>
> Attachments: HIVE-5055.1.patch.txt, HIVE-5055.2.patch.txt
>
>
> SessionState.start creates a temp file for temp results, but this file is 
> created in hive.querylog.location, which is supposed to be used only for hive 
> history log files.



[jira] [Commented] (HIVE-5055) SessionState temp file gets created in history file directory

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740984#comment-13740984
 ] 

Edward Capriolo commented on HIVE-5055:
---


org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported2 <-- maybe related to other change
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two <-- flaky

Will commit soon.


> SessionState temp file gets created in history file directory
> -
>
> Key: HIVE-5055
> URL: https://issues.apache.org/jira/browse/HIVE-5055
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Thejas M Nair
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5055.1.patch.txt, HIVE-5055.2.patch.txt
>
>
> SessionState.start creates a temp file for temp results, but this file is 
> created in hive.querylog.location, which is supposed to be used only for hive 
> history log files.



[jira] [Commented] (HIVE-4963) Support in memory PTF partitions

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740980#comment-13740980
 ] 

Edward Capriolo commented on HIVE-4963:
---

Can you please describe why these calls are needed?

{noformat}
  PTFUtils.makeTransient(PTFDesc.class, "llInfo");
  PTFUtils.makeTransient(PTFDesc.class, "cfg");
{noformat}

This looks like a code-smell. Is there any other way of handling this?


> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Harish Butani
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> PTFRowContainer.patch
>
>
> PTF partitions apply the defensive mode of assuming that partitions will not 
> fit in memory. Because of this there is a significant deserialization 
> overhead when accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on partition size and, in the case of windowing, on the number of 
> UDAFs and the window ranges. For example, for the following (admittedly extreme) 
> case, the PTFOperator exec time went from 39 secs to 8 secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}



[jira] [Resolved] (HIVE-4999) Shim class HiveHarFileSystem does not have a hadoop2 counterpart

2013-08-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-4999.
---

   Resolution: Fixed
Fix Version/s: 0.12.0

Committed... Thanks Brock.

> Shim class HiveHarFileSystem does not have a hadoop2 counterpart
> 
>
> Key: HIVE-4999
> URL: https://issues.apache.org/jira/browse/HIVE-4999
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-4999.patch
>
>
> HiveHarFileSystem only exists in the 0.20 shim.



[jira] [Commented] (HIVE-4999) Shim class HiveHarFileSystem does not have a hadoop2 counterpart

2013-08-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740971#comment-13740971
 ] 

Edward Capriolo commented on HIVE-4999:
---

OIC. +1 will commit.

> Shim class HiveHarFileSystem does not have a hadoop2 counterpart
> 
>
> Key: HIVE-4999
> URL: https://issues.apache.org/jira/browse/HIVE-4999
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-4999.patch
>
>
> HiveHarFileSystem only exists in the 0.20 shim.



[jira] [Updated] (HIVE-4999) Shim class HiveHarFileSystem does not have a hadoop2 counterpart

2013-08-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4999:
--

Status: Open  (was: Patch Available)

> Shim class HiveHarFileSystem does not have a hadoop2 counterpart
> 
>
> Key: HIVE-4999
> URL: https://issues.apache.org/jira/browse/HIVE-4999
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-4999.patch
>
>
> HiveHarFileSystem only exists in the 0.20 shim.



[jira] [Commented] (HIVE-4999) Shim class HiveHarFileSystem does not have a hadoop2 counterpart

2013-08-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740603#comment-13740603
 ] 

Edward Capriolo commented on HIVE-4999:
---

I think we should go around and change all the q tests that are using archiving 
but disabling other versions.

[edward@jackintosh hive-trunk]$ svn diff 
ql/src/test/queries/clientpositive/archive.q
Index: ql/src/test/queries/clientpositive/archive.q
===
--- ql/src/test/queries/clientpositive/archive.q(revision 1514126)
+++ ql/src/test/queries/clientpositive/archive.q(working copy)
@@ -1,8 +1,6 @@
 set hive.archive.enabled = true;
 set hive.enforce.bucketing = true;
 
--- INCLUDE_HADOOP_MAJOR_VERSIONS(0.20)
-
 drop table tstsrc;
 drop table tstsrcpart;


> Shim class HiveHarFileSystem does not have a hadoop2 counterpart
> 
>
> Key: HIVE-4999
> URL: https://issues.apache.org/jira/browse/HIVE-4999
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-4999.patch
>
>
> HiveHarFileSystem only exists in the 0.20 shim.



[jira] [Commented] (HIVE-5055) SessionState temp file gets created in history file directory

2013-08-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740563#comment-13740563
 ] 

Edward Capriolo commented on HIVE-5055:
---

+1 Pending tests.

> SessionState temp file gets created in history file directory
> -
>
> Key: HIVE-5055
> URL: https://issues.apache.org/jira/browse/HIVE-5055
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Thejas M Nair
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5055.1.patch.txt, HIVE-5055.2.patch.txt
>
>
> SessionState.start creates a temp file for temp results, but this file is 
> created in hive.querylog.location, which is supposed to be used only for hive 
> history log files.



[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-08-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-2608:
--

Fix Version/s: 0.12.0

> Do not require AS a,b,c part in LATERAL VIEW
> 
>
> Key: HIVE-2608
> URL: https://issues.apache.org/jira/browse/HIVE-2608
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, UDF
>Reporter: Igor Kabiljo
>Assignee: Navis
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, 
> HIVE-2608.D4317.6.patch, HIVE-2608.D4317.7.patch, HIVE-2608.D4317.8.patch
>
>
> Currently, it is required to state column names when LATERAL VIEW is used.
> That shouldn't be necessary, since UDTF returns struct which contains column 
> names - and they should be used by default.
> For example, it would be great if this was possible:
> SELECT t.*, t.key1 + t.key4
> FROM some_table
> LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;



[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-08-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-2608:
--

  Resolution: Fixed
Release Note: Committed. Thanks Navis.
  Status: Resolved  (was: Patch Available)

> Do not require AS a,b,c part in LATERAL VIEW
> 
>
> Key: HIVE-2608
> URL: https://issues.apache.org/jira/browse/HIVE-2608
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, UDF
>Reporter: Igor Kabiljo
>Assignee: Navis
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, 
> HIVE-2608.D4317.6.patch, HIVE-2608.D4317.7.patch, HIVE-2608.D4317.8.patch
>
>
> Currently, it is required to state column names when LATERAL VIEW is used.
> That shouldn't be necessary, since UDTF returns struct which contains column 
> names - and they should be used by default.
> For example, it would be great if this was possible:
> SELECT t.*, t.key1 + t.key4
> FROM some_table
> LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;



[jira] [Commented] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-08-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740095#comment-13740095
 ] 

Edward Capriolo commented on HIVE-2608:
---

+1

> Do not require AS a,b,c part in LATERAL VIEW
> 
>
> Key: HIVE-2608
> URL: https://issues.apache.org/jira/browse/HIVE-2608
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, UDF
>Reporter: Igor Kabiljo
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, 
> HIVE-2608.D4317.6.patch, HIVE-2608.D4317.7.patch, HIVE-2608.D4317.8.patch
>
>
> Currently, it is required to state column names when LATERAL VIEW is used.
> That shouldn't be necessary, since UDTF returns struct which contains column 
> names - and they should be used by default.
> For example, it would be great if this was possible:
> SELECT t.*, t.key1 + t.key4
> FROM some_table
> LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;



[jira] [Updated] (HIVE-5087) Rename npath UDF

2013-08-13 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5087:
--

Attachment: HIVE-5087.patch.txt

Not complete, stupid q tests.

> Rename npath UDF
> 
>
> Key: HIVE-5087
> URL: https://issues.apache.org/jira/browse/HIVE-5087
> Project: Hive
>  Issue Type: Bug
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: HIVE-5087.patch.txt
>
>




[jira] [Commented] (HIVE-4423) Improve RCFile::sync(long) 10x

2013-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739224#comment-13739224
 ] 

Edward Capriolo commented on HIVE-4423:
---

Maybe we can add a test when we fix.

> Improve RCFile::sync(long) 10x
> --
>
> Key: HIVE-4423
> URL: https://issues.apache.org/jira/browse/HIVE-4423
> Project: Hive
>  Issue Type: Improvement
> Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
>  Labels: optimization
> Fix For: 0.12.0
>
> Attachments: HIVE-4423.patch
>
>
> RCFile::sync(long) takes approximately 1 second every time it gets called because of 
> the inner loops in the function.
> From what was observed with HDFS-4710, single byte reads are an order of 
> magnitude slower than larger 512 byte buffer reads. 
> Even when disk I/O is buffered to this size, there is overhead due to the 
> synchronized read() methods in BlockReaderLocal & RemoteBlockReader classes.
> Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte) 
> call will speed up this function by more than 10x.
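
The idea in the description, scanning for a sync marker with block reads
instead of one read() call per byte, can be sketched as follows. This is not
the RCFile code or the attached patch: the class SyncMarkerScanner is
hypothetical, it uses InputStream.read(byte[]) rather than readFully so that a
short final block is handled, and the KMP-style matching is an addition made
here so that a marker split across two blocks is still found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical scanner that finds a byte marker using block reads. */
final class SyncMarkerScanner {
    private SyncMarkerScanner() {}

    /**
     * Returns the stream offset of the first occurrence of marker, or -1.
     * The stream is consumed in blocks of up to bufferSize bytes.
     */
    static long indexOf(InputStream in, byte[] marker, int bufferSize) {
        if (marker.length == 0 || bufferSize < 1) {
            throw new IllegalArgumentException("need a non-empty marker and bufferSize >= 1");
        }
        // KMP failure table: longest proper prefix of marker that is also a suffix.
        int[] fail = new int[marker.length];
        for (int i = 1, k = 0; i < marker.length; i++) {
            while (k > 0 && marker[i] != marker[k]) {
                k = fail[k - 1];
            }
            if (marker[i] == marker[k]) {
                k++;
            }
            fail[i] = k;
        }
        byte[] buf = new byte[bufferSize];
        int matched = 0; // marker bytes matched so far, carried across blocks
        long pos = 0;    // bytes consumed from the stream so far
        try {
            int n;
            while ((n = in.read(buf)) != -1) { // one call per block, not per byte
                for (int i = 0; i < n; i++) {
                    byte b = buf[i];
                    while (matched > 0 && b != marker[matched]) {
                        matched = fail[matched - 1];
                    }
                    if (b == marker[matched]) {
                        matched++;
                    }
                    pos++;
                    if (matched == marker.length) {
                        return pos - marker.length;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return -1;
    }
}
```

With a 512 byte buffer the number of calls into the underlying stream drops by
roughly that factor, which is where the description expects the savings to
come from when each call is synchronized.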



[jira] [Commented] (HIVE-4003) NullPointerException in ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

2013-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739212#comment-13739212
 ] 

Edward Capriolo commented on HIVE-4003:
---

+1 pending tests.

> NullPointerException in 
> ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> -
>
> Key: HIVE-4003
> URL: https://issues.apache.org/jira/browse/HIVE-4003
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Thomas Adam
>Assignee: Mark Grover
> Attachments: HIVE-4003.patch, HIVE-4003.patch
>
>
> Utilities.java seems to be throwing a NPE.
> Change contributed by Thomas Adam.
> Reference: 
> https://github.com/tecbot/hive/commit/1e29d88837e4101a76e870a716aadb729437355b#commitcomment-2588350



Re: Proposing a 0.11.1

2013-08-13 Thread Edward Capriolo
I am feeling more like we should release a 12.0 rather than backport things
into 11.X.




On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우  wrote:

> If this is only for addressing npath problem, we got three months for that.
>
> Would it be enough time for releasing 0.12.0?
>
> ps. IMHO, n-path seemed too generic name to be patented. I hate Teradata.
>
> 2013/8/14 Edward Capriolo :
> > Should we get the npath rename in? Do we have a jira for this? If not I
> > will take it.
> >
> >
> > On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wrote:
> >
> >> It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has been
> >> committed to trunk and it looks like 4789 is close.
> >>
> >> Thanks,
> >> Mark
> >>
> >> On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley 
> >> wrote:
> >>
> >> > All,
> >> >I'd like to create an 0.11.1 with some fixes in it. I plan to put
> >> > together a release candidate over the next week. I'm in the process of
> >> > putting together the list of bugs that I want to include, but I wanted to
> >> > solicit the jiras that others thought would be important for an 0.11.1.
> >> >
> >> > Thanks,
> >> >Owen
> >> >
> >>
>


[jira] [Created] (HIVE-5087) Rename npath UDF

2013-08-13 Thread Edward Capriolo (JIRA)
Edward Capriolo created HIVE-5087:
-

 Summary: Rename npath UDF
 Key: HIVE-5087
 URL: https://issues.apache.org/jira/browse/HIVE-5087
 Project: Hive
  Issue Type: Bug
Reporter: Edward Capriolo
Assignee: Edward Capriolo






Re: Proposing a 0.11.1

2013-08-13 Thread Edward Capriolo
Should we get the npath rename in? Do we have a jira for this? If not I
will take it.


On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wrote:

> It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has been
> committed to trunk and it looks like 4789 is close.
>
> Thanks,
> Mark
>
> On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley 
> wrote:
>
> > All,
> >I'd like to create an 0.11.1 with some fixes in it. I plan to put
> > together a release candidate over the next week. I'm in the process of
> > putting together the list of bugs that I want to include, but I wanted to
> > solicit the jiras that others thought would be important for an 0.11.1.
> >
> > Thanks,
> >Owen
> >
>


[jira] [Commented] (HIVE-4470) HS2 should disable local query execution

2013-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739181#comment-13739181
 ] 

Edward Capriolo commented on HIVE-4470:
---

Have you seen HIVE-5054?

It seems like there is a property that we are about to remove that could help 
you disable local execution. Let's try to determine if the tickets are 
conflicting.

> HS2 should disable local query execution
> 
>
> Key: HIVE-4470
> URL: https://issues.apache.org/jira/browse/HIVE-4470
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>
> Hive can run queries in local mode (instead of using a cluster), if the size 
> is small. This happens when "hive.exec.mode.local.auto" is set to true.
> This would affect the stability of the hive server2 node, if you have heavy 
> query processing happening on it. Bugs in udfs triggered by a bad record can 
> potentially add very heavy load making the server inaccessible. 
> By default, HS2 should set these parameters to disallow local execution or 
> send an error message if the user tries to set them.



[jira] [Commented] (HIVE-5054) Remove unused property submitviachild

2013-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739180#comment-13739180
 ] 

Edward Capriolo commented on HIVE-5054:
---

[~ashutoshc]

Have you seen HIVE-4470?

Seems like this could be the solution for that?

> Remove unused property submitviachild
> -
>
> Key: HIVE-5054
> URL: https://issues.apache.org/jira/browse/HIVE-5054
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5054.patch, HIVE-5054.patch
>
>
> This property only exists in HiveConf and is always set to false. Let's get rid 
> of dead code.



[jira] [Commented] (HIVE-2849) Use the MapReduce output committer

2013-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739179#comment-13739179
 ] 

Edward Capriolo commented on HIVE-2849:
---

It sounds like this patch is going to be very shim heavy :)

> Use the MapReduce output committer
> --
>
> Key: HIVE-2849
> URL: https://issues.apache.org/jira/browse/HIVE-2849
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> Currently Hive implements its own output committers based on the size of the 
> files using regexes to parse out the task ids. The MapReduce output 
> committers are very stable and use a two phase commit from the JobTracker.



[jira] [Commented] (HIVE-3173) implement getTypeInfo database metadata method

2013-08-13 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739177#comment-13739177
 ] 

Edward Capriolo commented on HIVE-3173:
---

+1. Let's let the tests run.

> implement getTypeInfo database metadata method 
> ---
>
> Key: HIVE-3173
> URL: https://issues.apache.org/jira/browse/HIVE-3173
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.8.1
>Reporter: N Campbell
> Attachments: Hive-3173.patch.txt
>
>
> The JDBC driver does not implement the database metadata method getTypeInfo. 
> Hence, an application cannot dynamically determine the available type 
> information and associated properties. 



[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737319#comment-13737319
 ] 

Edward Capriolo commented on HIVE-2482:
---

I am ok with it as well, but temember everything you change breaks someones 
workflow. 

> Convenience UDFs for binary data type
> -
>
> Key: HIVE-2482
> URL: https://issues.apache.org/jira/browse/HIVE-2482
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ashutosh Chauhan
>Assignee: Mark Wagner
> Fix For: 0.12.0
>
> Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, 
> HIVE-2482.4.patch
>
>
> HIVE-2380 introduced binary data type in Hive. It will be good to have 
> following udfs to make it more useful:
> * UDF's to convert to/from hex string
> * UDF's to convert to/from string using a specific encoding
> * UDF's to convert to/from base64 string
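
The three conversions listed above can be illustrated in plain Java. This is
only a sketch of the underlying byte/string round trips, not the Hive UDF API
and not the attached patches: the class BinaryConversions and its method names
are hypothetical, and java.util.Base64 assumes Java 8 or later.

```java
import java.nio.charset.Charset;
import java.util.Base64;

/** Hypothetical helpers showing hex, charset and base64 round trips. */
final class BinaryConversions {
    private BinaryConversions() {}

    /** Encodes bytes as upper-case hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes a hex string (either case) back into bytes. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Interprets bytes as text in the named encoding, for example "UTF-8". */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Converts text to bytes in the named encoding. */
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

Each pair is an exact inverse on valid input, which is the property a user
would rely on when moving binary columns through string-only tools.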



[jira] [Comment Edited] (HIVE-2482) Convenience UDFs for binary data type

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737319#comment-13737319
 ] 

Edward Capriolo edited comment on HIVE-2482 at 8/12/13 8:45 PM:


I am ok with it as well, but remember everything you change breaks someones 
workflow. 

  was (Author: appodictic):
I am ok with it as well, but temember everything you change breaks someones 
workflow. 
  
> Convenience UDFs for binary data type
> -
>
> Key: HIVE-2482
> URL: https://issues.apache.org/jira/browse/HIVE-2482
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ashutosh Chauhan
>Assignee: Mark Wagner
> Fix For: 0.12.0
>
> Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, 
> HIVE-2482.4.patch
>
>
> HIVE-2380 introduced binary data type in Hive. It will be good to have 
> following udfs to make it more useful:
> * UDF's to convert to/from hex string
> * UDF's to convert to/from string using a specific encoding
> * UDF's to convert to/from base64 string



[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737307#comment-13737307
 ] 

Edward Capriolo commented on HIVE-4885:
---

+1 move forward.

> Alternative object serialization for execution plan in hive testing 
> 
>
> Key: HIVE-4885
> URL: https://issues.apache.org/jira/browse/HIVE-4885
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.10.0, 0.11.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.12.0
>
> Attachments: HIVE-4885.patch
>
>
> Currently there are a lot of test cases that compare execution 
> plans, such as those in the TestParse suite. XmlEncoder is used to serialize the 
> plan generated by Hive and store it in a file for diff comparison. 
> However, XmlEncoder is tied to the JDK, whose implementation may 
> change from version to version. Thus, upgrading the JDK can generate a lot 
> of fake test failures. The following is an example of the diff generated when 
> running Hive with JDK7:
> {code}
> Begin query: case_sensitivity.q
> diff -a 
> /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out
>  
> /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
> diff -a -b 
> /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml
>  
> /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
> 3c3
> <  
> ---
> >   > class="org.apache.hadoop.hive.ql.exec.MapRedTask"> 
> 12c12
> <
> ---
> > 
> 14c14
> <   id="MoveTask0">
> ---
> >   > class="org.apache.hadoop.hive.ql.exec.MoveTask"> 
> 18c18
> <   id="MoveTask1">
> ---
> >   > class="org.apache.hadoop.hive.ql.exec.MoveTask"> 
> 22c22
> <   id="StatsTask0">
> ---
> >   > class="org.apache.hadoop.hive.ql.exec.StatsTask"> 
> 60c60
> <   id="MapRedTask1">
> ---
> >   > class="org.apache.hadoop.hive.ql.exec.MapRedTask"> 
> {code}
> As can be seen, the only difference is the order of the attributes in the 
> serialized XML doc, yet it brings 50+ test failures in Hive.
> We need a better plan comparison, or a better object serialization, to 
> improve the situation.
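
One way to make such a comparison insensitive to attribute order is to compare
parsed DOM trees instead of diffing the text. The following is only a minimal
sketch, not what the attached patch does: the class name PlanXmlCompare and
the sample element and attribute values are made up for illustration, and it
relies on the standard DOM Node.isEqualNode method, which compares attribute
sets without regard to their order.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hive: compares two serialized plans as DOM trees.
class PlanXmlCompare {

    // Parse an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // True when both documents have equal elements, attributes (in any order),
    // and text. Whitespace-only text nodes still count as differences.
    static boolean sameStructure(String left, String right) {
        return parse(left).getDocumentElement()
                .isEqualNode(parse(right).getDocumentElement());
    }

    public static void main(String[] args) {
        // Sample values are illustrative only.
        String a = "<object id=\"Task0\" class=\"example.Task\"/>";
        String b = "<object class=\"example.Task\" id=\"Task0\"/>";
        System.out.println(sameStructure(a, b));
    }
}
```

A text diff would flag the two sample strings as different, while the DOM
comparison treats them as equal. Differences in indentation would still need
separate handling.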

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737245#comment-13737245
 ] 

Edward Capriolo commented on HIVE-5003:
---

I understand what you are saying. I am OK with the package-private idea and 
dependency injection; I generally prefer that to a heavy solution like mocking. 

I would not call this a blocker, but I think we need to design more with 
testing in mind. Let's talk it over elsewhere. 

> Localize hive exec jar for tez
> --
>
> Key: HIVE-5003
> URL: https://issues.apache.org/jira/browse/HIVE-5003
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Fix For: tez-branch
>
> Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
> HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt
>
>
> Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
> added to vertices and the dag itself as needed. For hive we need to localize 
> the hive-exec.jar.
> NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737225#comment-13737225
 ] 

Edward Capriolo commented on HIVE-5009:
---

Our build process is slow. Technically you do not need clean 'every' time; 
mostly you only need it when changing the Hadoop version or updating one of 
the libs. However, the build is still 'slow' regardless of running clean first. 
It's just something we have to deal with for a bit until we refactor everything.

> Fix minor optimization issues
> -
>
> Key: HIVE-5009
> URL: https://issues.apache.org/jira/browse/HIVE-5009
> Project: Hive
>  Issue Type: Improvement
>Reporter: Benjamin Jakobus
>Assignee: Benjamin Jakobus
>Priority: Minor
> Fix For: 0.12.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have found some minor optimization issues in the codebase, which I would 
> like to rectify and contribute. Specifically, these are:
> The optimizations that could be applied to Hive's code base are as follows:
> 1. Use StringBuffer when appending strings - In 184 instances, the 
> concatenation operator (+=) was used when appending strings. This is 
> inherently inefficient - instead Java's StringBuffer or StringBuilder class 
> should be used. 12 instances of this optimization can be applied to the 
> GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
> uses the + operator inside a loop, so does the column projection utilities 
> class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
> Tests showed that using the StringBuilder when appending strings is 57% 
> faster than using the + operator (using the StringBuffer took 122 
> milliseconds whilst the + operator took 284 milliseconds). The reason as to 
> why using the StringBuffer class is preferred over using the + operator, is 
> because
> String third = first + second;
> gets compiled to:
> StringBuilder builder = new StringBuilder( first );
> builder.append( second );
> third = builder.toString();
> Therefore, building complex strings with +, for example inside loops, 
> requires many instantiations (and, as discussed below, creating new objects 
> inside loops is inefficient).
> 2. Use arrays instead of List - Java's java.util.Arrays class asList method 
> is more efficient at creating lists from arrays than using loops 
> to manually iterate over the elements (using asList is computationally very 
> cheap, O(1), as it merely creates a wrapper object around the array; looping 
> through the list however has a complexity of O(n) since a new list is created 
> and every element in the array is added to this new list). As confirmed by 
> the experiment detailed in Appendix D, the Java compiler does not 
> automatically optimize and replace tight-loop copying with asList: the 
> loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
> instant. 
> Four instances of this optimization can be applied to Hive's codebase (two of 
> these should be applied to the Map-Join container - MapJoinRowContainer) - 
> lines 92 to 98:
> for (obj = other.first(); obj != null; obj = other.next()) {
>   ArrayList ele = new ArrayList(obj.length);
>   for (int i = 0; i < obj.length; i++) {
>     ele.add(obj[i]);
>   }
>   list.add((Row) ele);
> }
> 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
> could be avoided by simply using the provided static conversion methods. As 
> noted in the PMD documentation, "using these avoids the cost of creating 
> objects that also need to be garbage-collected later."
> For example, line 587 of the SemanticAnalyzer class, could be replaced by the 
> more efficient parseDouble method call:
> // Inefficient:
> Double percent = Double.valueOf(value).doubleValue();
> // To be replaced by:
> Double percent = Double.parseDouble(value);
> Our test case in Appendix D confirms this: converting 10,000 strings into 
> integers using Integer.parseInt(gen.nextSessionId()) (i.e. creating an 
> unnecessary wrapper object) took 119 on average; using parseInt() took only 
> 38. Therefore creating even just one unnecessary wrapper object can make your 
> code up to 68% slower.
> 4. Converting literals to strings using + "" - Converting literals to strings 
> using + "" is quite inefficient (see Appendix D) and should be done by 
> calling the toString() method instead: converting 1,000,000 integers to 
> strings using +

[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737198#comment-13737198
 ] 

Edward Capriolo commented on HIVE-5003:
---

I think we are repeating a semi-disturbing trend of writing a lot of code we 
have little direct coverage for. For example take a method like:

{code}
 private static Path getDefaultDestDir(Configuration conf) throws 
LoginException, IOException {
{code}

or 
{code}
 private static String getExecJarPathLocal () {
{code}

I think we should have direct JUnit-style tests around these methods. The code 
is clean (for its development state) and well documented. But I think we have 
the chance to "do it better".

Right now, for our current code and this code, we are totally reliant on our 
end-to-end system to validate every minor change. If we have smaller unit tests 
on things like this, we can have more coverage and enhance our ability to make 
changes to the project without as many worries about side effects that will 
not manifest until the final end-to-end tests. 

I think we should draw a line in the sand here and attempt to write unit 
tests and design code in a testable way, not just write it and worry about unit 
tests later. What do you think?
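
To make the idea of directly testable helpers concrete, here is a rough
sketch. Everything in it is an assumption for illustration: JarLocalizerSketch,
the config key, and the default path layout are invented, and a plain Map
stands in for Hadoop's Configuration so the example stays self-contained. The
point is only that a package-private static method with its inputs passed in
can be checked without an end-to-end run.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; the names and path layout are illustrative, not Hive's.
class JarLocalizerSketch {

    // Hypothetical configuration key.
    static final String DEST_DIR_KEY = "example.jar.dest.dir";

    // Package-private instead of private, so a test in the same package can
    // call it directly. All inputs are parameters (simple dependency injection).
    static String defaultDestDir(Map<String, String> conf, String userName) {
        String configured = conf.get(DEST_DIR_KEY);
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        // Fall back to a per-user directory when nothing is configured.
        return "/user/" + userName + "/jars";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(defaultDestDir(conf, "alice"));
        conf.put(DEST_DIR_KEY, "/tmp/jars");
        System.out.println(defaultDestDir(conf, "alice"));
    }
}
```

A JUnit test for such a method would be a handful of assertions on the
returned string, with no cluster or mocking framework involved.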



> Localize hive exec jar for tez
> --
>
> Key: HIVE-5003
> URL: https://issues.apache.org/jira/browse/HIVE-5003
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Fix For: tez-branch
>
> Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
> HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt
>
>
> Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
> added to vertices and the dag itself as needed. For hive we need to localize 
> the hive-exec.jar.
> NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Commented] (HIVE-4579) Create a SARG interface for RecordReaders

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736113#comment-13736113
 ] 

Edward Capriolo commented on HIVE-4579:
---

One other q.

{quote}
+  public static enum Type {
+INTEGER, // all of the integer types
+FLOAT,   // float and double
+STRING
+  }

{quote}

Should we call these Integral and Real instead of INTEGER and FLOAT? Or should 
we call them LONG and DOUBLE, since naming them after the widest type might 
make more sense?

> Create a SARG interface for RecordReaders
> -
>
> Key: HIVE-4579
> URL: https://issues.apache.org/jira/browse/HIVE-4579
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: h-4579.patch, HIVE-4579.D11409.1.patch, 
> HIVE-4579.D11409.2.patch, pushdown.pdf
>
>
> I think we should create a SARG (http://en.wikipedia.org/wiki/Sargable) 
> interface for RecordReaders. For a first pass, I'll create an API that uses 
> the value stored in hive.io.filter.expr.serialized.
> The desire is to define a simpler interface than the direct AST expression 
> that is provided by hive.io.filter.expr.serialized so that the code to 
> evaluate expressions can be generalized instead of put inside a particular 
> RecordReader.



[jira] [Commented] (HIVE-4579) Create a SARG interface for RecordReaders

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736112#comment-13736112
 ] 

Edward Capriolo commented on HIVE-4579:
---

Other than the Deque and ArrayDeque I am +1.

> Create a SARG interface for RecordReaders
> -
>
> Key: HIVE-4579
> URL: https://issues.apache.org/jira/browse/HIVE-4579
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: h-4579.patch, HIVE-4579.D11409.1.patch, 
> HIVE-4579.D11409.2.patch, pushdown.pdf
>
>
> I think we should create a SARG (http://en.wikipedia.org/wiki/Sargable) 
> interface for RecordReaders. For a first pass, I'll create an API that uses 
> the value stored in hive.io.filter.expr.serialized.
> The desire is to define a simpler interface than the direct AST expression 
> that is provided by hive.io.filter.expr.serialized so that the code to 
> evaluate expressions can be generalized instead of put inside a particular 
> RecordReader.



[jira] [Commented] (HIVE-4579) Create a SARG interface for RecordReaders

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736110#comment-13736110
 ] 

Edward Capriolo commented on HIVE-4579:
---

{code}
+private final Stack currentTree =
+new Stack();
{code}
Can you use Deque and ArrayDeque here instead?
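
For reference, a minimal sketch of what that swap could look like. The class
TreeBuilderSketch and the String element type are assumptions, since the
element type of currentTree is not visible in the snippet above. Unlike
java.util.Stack, which extends the synchronized Vector, ArrayDeque is
unsynchronized and does not permit null elements.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class; only the Deque/ArrayDeque usage is the point here.
class TreeBuilderSketch {

    // Deque used as a LIFO stack: push/pop/peek all work on the head.
    private final Deque<String> currentTree = new ArrayDeque<>();

    void enter(String node) {
        currentTree.push(node);
    }

    // Throws NoSuchElementException when the stack is empty.
    String leave() {
        return currentTree.pop();
    }

    // Returns null when the stack is empty (Stack.peek would throw instead).
    String current() {
        return currentTree.peek();
    }

    int depth() {
        return currentTree.size();
    }

    public static void main(String[] args) {
        TreeBuilderSketch builder = new TreeBuilderSketch();
        builder.enter("and");
        builder.enter("leaf");
        System.out.println(builder.current() + " at depth " + builder.depth());
    }
}
```

One behavioral difference to watch for when converting existing code: peek()
on an empty ArrayDeque returns null rather than throwing.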

> Create a SARG interface for RecordReaders
> -
>
> Key: HIVE-4579
> URL: https://issues.apache.org/jira/browse/HIVE-4579
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: h-4579.patch, HIVE-4579.D11409.1.patch, 
> HIVE-4579.D11409.2.patch, pushdown.pdf
>
>
> I think we should create a SARG (http://en.wikipedia.org/wiki/Sargable) 
> interface for RecordReaders. For a first pass, I'll create an API that uses 
> the value stored in hive.io.filter.expr.serialized.
> The desire is to define a simpler interface than the direct AST expression 
> that is provided by hive.io.filter.expr.serialized so that the code to 
> evaluate expressions can be generalized instead of put inside a particular 
> RecordReader.



[jira] [Commented] (HIVE-4579) Create a SARG interface for RecordReaders

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736108#comment-13736108
 ] 

Edward Capriolo commented on HIVE-4579:
---

I think I roughly understand the interface: we are going to pass a SARG into 
the conf of the RecordReader, and then the record reader can apply it directly 
to the input rows?

> Create a SARG interface for RecordReaders
> -
>
> Key: HIVE-4579
> URL: https://issues.apache.org/jira/browse/HIVE-4579
> Project: Hive
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: h-4579.patch, HIVE-4579.D11409.1.patch, 
> HIVE-4579.D11409.2.patch, pushdown.pdf
>
>
> I think we should create a SARG (http://en.wikipedia.org/wiki/Sargable) 
> interface for RecordReaders. For a first pass, I'll create an API that uses 
> the value stored in hive.io.filter.expr.serialized.
> The desire is to define a simpler interface than the direct AST expression 
> that is provided by hive.io.filter.expr.serialized so that the code to 
> evaluate expressions can be generalized instead of put inside a particular 
> RecordReader.



[jira] [Commented] (HIVE-4999) Shim class HiveHarFileSystem does not have a hadoop2 counterpart

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736104#comment-13736104
 ] 

Edward Capriolo commented on HIVE-4999:
---

I am +1. Will run move in 24 hours unless someone stops me.

> Shim class HiveHarFileSystem does not have a hadoop2 counterpart
> 
>
> Key: HIVE-4999
> URL: https://issues.apache.org/jira/browse/HIVE-4999
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-4999.patch
>
>
> HiveHarFileSystem only exists in the 0.20 shim.



[jira] [Commented] (HIVE-3772) Fix a concurrency bug in LazyBinaryUtils due to a static field

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736099#comment-13736099
 ] 

Edward Capriolo commented on HIVE-3772:
---

Generally in Hive we do not backport; we just move forward. There are not many 
.1 or .2 releases. 

> Fix a concurrency bug in LazyBinaryUtils due to a static field
> --
>
> Key: HIVE-3772
> URL: https://issues.apache.org/jira/browse/HIVE-3772
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.12.0
>
> Attachments: D7155.1.patch, D7155.2.patch, HIVE-3772.1.patch.txt, 
> HIVE-3772-2012-12-04.patch
>
>
> Creating a JIRA for [~rxin]'s patch needed by the Shark project. 
> https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
> writeVLong should not use a static field due to concurrency concerns.



Discuss: End of static, thread local

2013-08-10 Thread Edward Capriolo
I just committed https://issues.apache.org/jira/browse/HIVE-3772.

For hive-server2, Carl and others did a lot of work to clean up non-thread-safe
things in Hive.

Hive was originally built as a fat client, so it is not surprising that many
such constructs exist. Now that we have retrofitted multi-threadedness
onto the project, we have a number of edge-case bugs.

My suggestion here would be that for the next release, 0.13, we make a push
to remove all possible non-thread-safe code and explicitly pass context
objects or serialized structures everywhere thread safety is needed.

I can see this starting with something like the Function Registry, which
would be a per-session object passed around rather than a global object
with static hashmap instances in it.
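
As a rough illustration of the per-session idea (not a proposal for the
actual API: SessionContext and SessionFunctionRegistry are invented names,
and functions are modeled as simple String-to-String lambdas), the registry
becomes ordinary instance state that is reached through a context object:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-session registry: instance state instead of static maps.
final class SessionFunctionRegistry {

    private final Map<String, Function<String, String>> functions = new HashMap<>();

    // Function names are treated as case-insensitive in this sketch.
    void register(String name, Function<String, String> fn) {
        functions.put(name.toLowerCase(Locale.ROOT), fn);
    }

    // Returns null when no function with that name is registered.
    Function<String, String> lookup(String name) {
        return functions.get(name.toLowerCase(Locale.ROOT));
    }
}

// Hypothetical context object that is passed around explicitly.
final class SessionContext {

    private final SessionFunctionRegistry registry = new SessionFunctionRegistry();

    SessionFunctionRegistry functionRegistry() {
        return registry;
    }

    public static void main(String[] args) {
        SessionContext first = new SessionContext();
        SessionContext second = new SessionContext();
        first.functionRegistry().register("shout", s -> s.toUpperCase(Locale.ROOT));
        // The second session never sees the first session's registration.
        System.out.println(second.functionRegistry().lookup("shout") == null);
    }
}
```

Because each session owns its registry, a temporary function registered in
one session cannot leak into another, and a registry used by a single session
thread needs no locking of its own.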

I know that this probably will not be as simple as removing all static
members from our codebase, but does anyone know of specific challenges that
will be intrinsically hard to solve?

Please comment.


Re: Key components of developer guide are blank!

2013-08-10 Thread Edward Capriolo
I mean to say in my firefox browser I see this:

Running Hive Without a Hadoop Cluster

From Thejas:

Then you can run 'build/dist/bin/hive' and it will work against your local
file system.
The section which lists the commands is empty.


On Sat, Aug 10, 2013 at 12:56 PM, Lefty Leverenz wrote:

> Those sections have been blank since 2011 (beginning of Page History):
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27820469.
>
>
>
> On Sat, Aug 10, 2013 at 11:43 AM, Edward Capriolo wrote:
>
> > If you edited this page recently, please take a look.
> > https://cwiki.apache.org/Hive/developerguide.html
> >
>
>
>
> -- Lefty
>


[jira] [Commented] (HIVE-5054) Remove unused property submitviachild

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735962#comment-13735962
 ] 

Edward Capriolo commented on HIVE-5054:
---

We have an open issue where we are trying to avoid hive-server crashes caused 
by "bad udfs in local mode"; maybe this remains an answer.

> Remove unused property submitviachild
> -
>
> Key: HIVE-5054
> URL: https://issues.apache.org/jira/browse/HIVE-5054
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5054.patch
>
>
> This property only exists in HiveConf and is always set to false. Let's get 
> rid of dead code.



[jira] [Updated] (HIVE-5019) Use StringBuffer instead of += (issue 1)

2013-08-10 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5019:
--

Status: Open  (was: Patch Available)

The patch has a compile error:

{quote}
[javac] Compiling 14 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/build/shims/classes
[javac] 
/data/hive-ptest/working/apache-svn-trunk-source/shims/src/common-secure/java/org/apache/hadoop/hive/thrift/ZooKeeperTokenStore.java:168:
 cannot find symbol
[javac] symbol  : method 
create(java.lang.StringBuffer,byte[],java.util.List,org.apache.zookeeper.CreateMode)
[javac] location: class org.apache.zookeeper.ZooKeeper
[javac] String node = zk.create(currentPath, new byte[0], acl,
{quote}
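
The error output suggests that the builder itself was passed to a method that
expects a String path. A minimal sketch of the usual pattern follows, with
made-up names: PathBuilderSketch is hypothetical and createNode is only a
stand-in for a String-based API such as the create call above. The idea is to
append with a StringBuilder inside the loop and call toString() at the point
where a String is required.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example; createNode stands in for any API that takes a String.
class PathBuilderSketch {

    // Stand-in for a String-based API; here it just echoes the path back.
    static String createNode(String path) {
        return path;
    }

    // Builds "/a", "/a/b", ... and "creates" each prefix in turn.
    // Returns the last path created, or null for an empty list.
    static String ensurePath(List<String> parts) {
        StringBuilder currentPath = new StringBuilder();
        String last = null;
        for (String part : parts) {
            currentPath.append('/').append(part);
            // Convert explicitly: the builder is not a String.
            last = createNode(currentPath.toString());
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(ensurePath(Arrays.asList("hive", "tokens")));
    }
}
```

StringBuilder is used here rather than StringBuffer because the builder is a
local variable; StringBuffer's synchronization only matters when the same
instance is shared between threads.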

> Use StringBuffer instead of += (issue 1)
> 
>
> Key: HIVE-5019
> URL: https://issues.apache.org/jira/browse/HIVE-5019
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Benjamin Jakobus
>Assignee: Benjamin Jakobus
> Fix For: 0.12.0
>
> Attachments: HIVE-5019.1.patch.txt, HIVE-5019.2.patch.txt
>
>
> Issue 1 (use of StringBuffer over +=)
> java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
> java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
> java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
> java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
> java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java
> java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java
> java/org/apache/hadoop/hive/ql/udf/UDFLike.java
> java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
> java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
> java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java



[jira] [Commented] (HIVE-4999) Shim class HiveHarFileSystem does not have a hadoop2 counterpart

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735951#comment-13735951
 ] 

Edward Capriolo commented on HIVE-4999:
---

Ashutosh, are you reviewing? I am +1 for this. It cannot break anything 
existing. 

[~brocknoland] There must be some tests for HAR functionality that are 
excluding 23; should we change those as well?

> Shim class HiveHarFileSystem does not have a hadoop2 counterpart
> 
>
> Key: HIVE-4999
> URL: https://issues.apache.org/jira/browse/HIVE-4999
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-4999.patch
>
>
> HiveHarFileSystem only exists in the 0.20 shim.



Key components of developer guide are blank!

2013-08-10 Thread Edward Capriolo
If you edited this page recently, please take a look.
https://cwiki.apache.org/Hive/developerguide.html


[jira] [Commented] (HIVE-5054) Remove unused property submitviachild

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735945#comment-13735945
 ] 

Edward Capriolo commented on HIVE-5054:
---


Could it be useful in hive server 2 type scenarios?
hive.exec.submitviachild: Determines whether the map/reduce jobs should be 
submitted through a separate jvm in the non local mode. false - By default 
jobs are submitted through the same jvm as the compiler

> Remove unused property submitviachild
> -
>
> Key: HIVE-5054
> URL: https://issues.apache.org/jira/browse/HIVE-5054
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5054.patch
>
>
> This property only exists in HiveConf and is always set to false. Let's get 
> rid of dead code.



[jira] [Commented] (HIVE-5054) Remove unused property submitviachild

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735944#comment-13735944
 ] 

Edward Capriolo commented on HIVE-5054:
---

+1. Though I have often wondered about this. What is the code designed for? 
Debugging?

> Remove unused property submitviachild
> -
>
> Key: HIVE-5054
> URL: https://issues.apache.org/jira/browse/HIVE-5054
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5054.patch
>
>
> This property only exists in HiveConf and is always set to false. Let's get 
> rid of dead code.



[jira] [Commented] (HIVE-494) Select columns by index instead of name

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735943#comment-13735943
 ] 

Edward Capriolo commented on HIVE-494:
--

I think we should also support negative numbers to query from the right end 
like awk's $NF
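
A small sketch of how negative indexes could be resolved, assuming the
zero-based indexing used in the issue description, so that -1 means the last
column, much like awk's $NF. The class and method names are invented, and
this is not taken from the attached patch.

```java
// Hypothetical helper for resolving a possibly negative column index.
class ColumnIndexSketch {

    // Maps index to a zero-based position; negative values count from the
    // right end, so -1 is the last column and -columnCount is the first.
    static int resolve(int index, int columnCount) {
        int resolved = index < 0 ? columnCount + index : index;
        if (resolved < 0 || resolved >= columnCount) {
            throw new IndexOutOfBoundsException(
                "Column index " + index + " is out of range for "
                    + columnCount + " columns");
        }
        return resolved;
    }

    public static void main(String[] args) {
        // With five columns, -1 refers to the fifth (position 4).
        System.out.println(resolve(-1, 5));
    }
}
```

Resolving the index once at compile time keeps the rest of the query plan
unaware that a negative index was ever used.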

> Select columns by index instead of name
> ---
>
> Key: HIVE-494
> URL: https://issues.apache.org/jira/browse/HIVE-494
> Project: Hive
>  Issue Type: Wish
>  Components: Clients, Query Processor
>Reporter: Adam Kramer
>Priority: Minor
>  Labels: SQL
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-494.D1641.1.patch
>
>
> SELECT mytable[0], mytable[2] FROM some_table_name mytable;
> ...should return the first and third columns, respectively, from mytable 
> regardless of their column names.
> The need for "names" specifically is kind of silly when they just get 
> translated into numbers anyway.



[jira] [Updated] (HIVE-3772) Fix a concurrency bug in LazyBinaryUtils due to a static field

2013-08-10 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-3772:
--

Fix Version/s: 0.12.0
 Assignee: Mikhail Bautin
  Summary: Fix a concurrency bug in LazyBinaryUtils due to a static 
field  (was: Fix a concurrency bug in LazyBinaryUtils due to a static field 
(patch by Reynold Xin))

> Fix a concurrency bug in LazyBinaryUtils due to a static field
> --
>
> Key: HIVE-3772
> URL: https://issues.apache.org/jira/browse/HIVE-3772
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.12.0
>
> Attachments: D7155.1.patch, D7155.2.patch, HIVE-3772.1.patch.txt, 
> HIVE-3772-2012-12-04.patch
>
>
> Creating a JIRA for [~rxin]'s patch needed by the Shark project. 
> https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
> writeVLong should not use a static field due to concurrency concerns.



[jira] [Updated] (HIVE-3772) Fix a concurrency bug in LazyBinaryUtils due to a static field

2013-08-10 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-3772:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Thanks all.

> Fix a concurrency bug in LazyBinaryUtils due to a static field
> --
>
> Key: HIVE-3772
> URL: https://issues.apache.org/jira/browse/HIVE-3772
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Fix For: 0.12.0
>
> Attachments: D7155.1.patch, D7155.2.patch, HIVE-3772.1.patch.txt, 
> HIVE-3772-2012-12-04.patch
>
>
> Creating a JIRA for [~rxin]'s patch needed by the Shark project. 
> https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
> writeVLong should not use a static field due to concurrency concerns.



[jira] [Commented] (HIVE-494) Select columns by index instead of name

2013-08-10 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735932#comment-13735932
 ] 

Edward Capriolo commented on HIVE-494:
--

I think any user will realize that '$1' can change. In the end I think Hive 
should be more dynamic, somewhat like Pig. Imagine something like this:

create table x stored by dynamichandler;

select $1 , $2 from x (inputformat=textinputformat, inpath=/x/y/z);

We are close to this now because Navis added the ability to specify per query 
table properties.

What is or is not in the SQL spec should not be our metric. We can already do 
amazing things that SQL can't, so I want to keep innovating. As long as 
something does not produce an ambiguity in the language, I see no harm in it. 

> Select columns by index instead of name
> ---
>
> Key: HIVE-494
> URL: https://issues.apache.org/jira/browse/HIVE-494
> Project: Hive
>  Issue Type: Wish
>  Components: Clients, Query Processor
>Reporter: Adam Kramer
>Priority: Minor
>  Labels: SQL
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-494.D1641.1.patch
>
>
> SELECT mytable[0], mytable[2] FROM some_table_name mytable;
> ...should return the first and third columns, respectively, from mytable 
> regardless of their column names.
> The need for "names" specifically is kind of silly when they just get 
> translated into numbers anyway.



[jira] [Commented] (HIVE-494) Select columns by index instead of name

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735746#comment-13735746
 ] 

Edward Capriolo commented on HIVE-494:
--

[~cwsteinbach] [~navis]

I think we should commit this. 
* it is impossible to name a column 1
* it is impossible to name a column alias 1

If order by supports this, I do not see why group by can't. Do we want to 
reconsider this? I kinda like the feature.


> Select columns by index instead of name
> ---
>
> Key: HIVE-494
> URL: https://issues.apache.org/jira/browse/HIVE-494
> Project: Hive
>  Issue Type: Wish
>  Components: Clients, Query Processor
>Reporter: Adam Kramer
>Priority: Minor
>  Labels: SQL
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-494.D1641.1.patch
>
>
> SELECT mytable[0], mytable[2] FROM some_table_name mytable;
> ...should return the first and third columns, respectively, from mytable 
> regardless of their column names.
> The need for "names" specifically is kind of silly when they just get 
> translated into numbers anyway.



[jira] [Updated] (HIVE-1662) Add file pruning into Hive.

2013-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1662:
--

Attachment: HIVE-1662.8.patch.txt

Fixed typo in enum

> Add file pruning into Hive.
> ---
>
> Key: HIVE-1662
> URL: https://issues.apache.org/jira/browse/HIVE-1662
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Navis
> Attachments: HIVE-1662.8.patch.txt, HIVE-1662.D8391.1.patch, 
> HIVE-1662.D8391.2.patch, HIVE-1662.D8391.3.patch, HIVE-1662.D8391.4.patch, 
> HIVE-1662.D8391.5.patch, HIVE-1662.D8391.6.patch, HIVE-1662.D8391.7.patch
>
>
> Hive now supports a filename virtual column. 
> If a file name filter is present in a query, Hive should be able to add only 
> the files that pass the filter to the input paths.



[jira] [Commented] (HIVE-3491) Expose column names to UDFs

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735736#comment-13735736
 ] 

Edward Capriolo commented on HIVE-3491:
---

So we are now initializing the UDF and passing some context. It should be 
possible to pass along and acquire this information.

> Expose column names to UDFs
> ---
>
> Key: HIVE-3491
> URL: https://issues.apache.org/jira/browse/HIVE-3491
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, UDF
>Reporter: Adam Kramer
>
> If I run
> SELECT MY_FUNC(a.foo, b.bar) FROM baz1 a JOIN baz2 b;
> ...the parsed query structure (i.e., that "foo" and "bar" are the name of the 
> columns) should be available to the UDF in some manner.



[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735732#comment-13735732
 ] 

Edward Capriolo commented on HIVE-3926:
---

I am not sure what is being asked. The virtual columns hold file names and 
offsets into files; I am not seeing how this ties into the metastore.

> PPD on virtual column of partitioned table is not working
> -
>
> Key: HIVE-3926
> URL: https://issues.apache.org/jira/browse/HIVE-3926
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, 
> HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, 
> HIVE-3926.D8121.5.patch
>
>
> {code}
> select * from src where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> is working, but
> {code}
> select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> throws SemanticException. Disabling PPD makes it work.



[jira] [Updated] (HIVE-3772) Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by Reynold Xin)

2013-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-3772:
--

Attachment: HIVE-3772.1.patch.txt

> Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by 
> Reynold Xin)
> -
>
> Key: HIVE-3772
> URL: https://issues.apache.org/jira/browse/HIVE-3772
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Mikhail Bautin
> Attachments: D7155.1.patch, D7155.2.patch, HIVE-3772.1.patch.txt, 
> HIVE-3772-2012-12-04.patch
>
>
> Creating a JIRA for [~rxin]'s patch needed by the Shark project. 
> https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
> writeVLong should not use a static field due to concurrency concerns.



[jira] [Updated] (HIVE-3772) Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by Reynold Xin)

2013-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-3772:
--

Status: Patch Available  (was: Open)

> Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by 
> Reynold Xin)
> -
>
> Key: HIVE-3772
> URL: https://issues.apache.org/jira/browse/HIVE-3772
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Mikhail Bautin
> Attachments: D7155.1.patch, D7155.2.patch, HIVE-3772.1.patch.txt, 
> HIVE-3772-2012-12-04.patch
>
>
> Creating a JIRA for [~rxin]'s patch needed by the Shark project. 
> https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
> writeVLong should not use a static field due to concurrency concerns.



[jira] [Commented] (HIVE-3772) Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by Reynold Xin)

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735717#comment-13735717
 ] 

Edward Capriolo commented on HIVE-3772:
---

I am +1. Thread local is not perfect but surely better than static.
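
To illustrate the thread-local point, here is a minimal sketch of a per-thread 
scratch buffer. It is not the patch itself: the class name VLongScratch and the 
method encode are hypothetical, and the encoding is a plain 7-bits-per-byte 
varint chosen for brevity, not the format writeVLong actually uses.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not Hive's LazyBinaryUtils.
final class VLongScratch {
    // One scratch buffer per thread, so concurrent callers never share state.
    // A single static byte[] here would be overwritten by racing threads.
    private static final ThreadLocal<byte[]> SCRATCH =
        ThreadLocal.withInitial(() -> new byte[10]);

    // Encodes value as an unsigned varint: 7 bits per byte, low bits first,
    // high bit set on every byte except the last. (Assumed format, see above.)
    static byte[] encode(long value) {
        byte[] buf = SCRATCH.get();
        int len = 0;
        do {
            byte b = (byte) (value & 0x7F);
            value >>>= 7;
            if (value != 0) {
                b |= (byte) 0x80; // more bytes follow
            }
            buf[len++] = b;
        } while (value != 0);
        // Copy out so the caller never holds a reference to the shared scratch.
        return Arrays.copyOf(buf, len);
    }
}
```

The trade-off is roughly what "not perfect" hints at: each thread that ever 
calls encode keeps its own buffer alive, but no locking is needed.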

> Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by 
> Reynold Xin)
> -
>
> Key: HIVE-3772
> URL: https://issues.apache.org/jira/browse/HIVE-3772
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Mikhail Bautin
> Attachments: D7155.1.patch, D7155.2.patch, HIVE-3772-2012-12-04.patch
>
>
> Creating a JIRA for [~rxin]'s patch needed by the Shark project. 
> https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
> writeVLong should not use a static field due to concurrency concerns.



[jira] [Commented] (HIVE-3772) Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by Reynold Xin)

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735709#comment-13735709
 ] 

Edward Capriolo commented on HIVE-3772:
---

Looking now.

> Fix a concurrency bug in LazyBinaryUtils due to a static field (patch by 
> Reynold Xin)
> -
>
> Key: HIVE-3772
> URL: https://issues.apache.org/jira/browse/HIVE-3772
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Mikhail Bautin
> Attachments: D7155.1.patch, D7155.2.patch, HIVE-3772-2012-12-04.patch
>
>
> Creating a JIRA for [~rxin]'s patch needed by the Shark project. 
> https://github.com/amplab/hive/commit/17e1c3dd2f6d8eca767115dc46d5a880aed8c765
> writeVLong should not use a static field due to concurrency concerns.



[jira] [Updated] (HIVE-4863) Fix parallel order by on hadoop2

2013-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4863:
--

Attachment: HIVE-4863.2.patch.txt

> Fix parallel order by on hadoop2
> 
>
> Key: HIVE-4863
> URL: https://issues.apache.org/jira/browse/HIVE-4863
> Project: Hive
>  Issue Type: Bug
>    Reporter: Edward Capriolo
>    Assignee: Edward Capriolo
> Attachments: HIVE-4863.1.patch.txt, HIVE-4863.2.patch.txt
>
>




[jira] [Resolved] (HIVE-5038) rank operator is case-sensitive and has odd semantics

2013-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-5038.
---

Resolution: Invalid

The problem of ranking functions not matching by name is fixed in trunk.

> rank operator is case-sensitive and has odd semantics
> -
>
> Key: HIVE-5038
> URL: https://issues.apache.org/jira/browse/HIVE-5038
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.11.0
>Reporter: Barrett Strausser
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: fake_customer_order_data.csv, 
> rank_semantics_test_data.hive.sql
>
>
> Issue 1 : The rank operator is sensitive to case.
> The following works :
> SELECT
> fco.cmscustid,fco.orderdate, rank() w
> FROM
> fake_customer_orders fco
> window
> w as (partition by fco.cmscustid  order by fco.orderdate)
> While this does not :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK() w 
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> The failing call returns :
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: 
> One or more arguments are expected.
> Issue 2: 
> The following works :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> This does not :
> SELECT 
> fco.cmscustid,fco.orderdate, rank(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> and returns - 
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Ranking Functions can take no arguments
> This has been reproduced by multiple users and probably pertains to other 
> PTF/windowing functions as well, although I haven't duplicated them.
> In no case is the returned output the expected output. I'll file another jira 
> for this.



[jira] [Commented] (HIVE-5038) rank operator is case-sensitive and has odd semantics

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735702#comment-13735702
 ] 

Edward Capriolo commented on HIVE-5038:
---

I confirmed in trunk that the parse issue is no longer an issue:
{code}
Time taken: 0.487 seconds
hive> SELECT
> fco.cmscustid,fco.orderdate, RANK(fco.orderdate) w
> FROM
> fake_customer_orders fco
> window
> w as (partition by fco.cmscustid order by fco.orderdate);
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. 
At least 1 group must only depend on input columns. Also check for circular 
dependencies.
Underlying error: Ranking Functions can take no arguments
hive> SELECT
> fco.cmscustid,fco.orderdate, rank(fco.orderdate) w
> FROM
> fake_customer_orders fco
> window
> w as (partition by fco.cmscustid order by fco.orderdate);
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. 
At least 1 group must only depend on input columns. Also check for circular 
dependencies.
Underlying error: Ranking Functions can take no arguments
{code}

I think this was fixed in HIVE-4879, or one of the other ranking cleanups I 
did. RANK() should not accept arguments; none of the unit tests take arguments. 
I am going to mark this as closed since it is fixed in trunk. If the actual 
results are wrong (I think you're suggesting that), please open another ticket.

> rank operator is case-sensitive and has odd semantics
> -
>
> Key: HIVE-5038
> URL: https://issues.apache.org/jira/browse/HIVE-5038
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.11.0
>Reporter: Barrett Strausser
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: fake_customer_order_data.csv, 
> rank_semantics_test_data.hive.sql
>
>
> Issue 1 : The rank operator is sensitive to case.
> The following works :
> SELECT
> fco.cmscustid,fco.orderdate, rank() w
> FROM
> fake_customer_orders fco
> window
> w as (partition by fco.cmscustid  order by fco.orderdate)
> While this does not :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK() w 
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> The failing call returns :
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: 
> One or more arguments are expected.
> Issue 2: 
> The following works :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> This does not :
> SELECT 
> fco.cmscustid,fco.orderdate, rank(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> and returns - 
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Ranking Functions can take no arguments
> This has been reproduced by multiple users and probably pertains to other 
> PTF/windowing functions as well, although I haven't duplicated them.
> In no case is the returned output the expected output. I'll file another jira 
> for this.



[jira] [Commented] (HIVE-5038) rank operator is case-sensitive and has odd semantics

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735695#comment-13735695
 ] 

Edward Capriolo commented on HIVE-5038:
---

Barrett, have you confirmed the issue in trunk?

> rank operator is case-sensitive and has odd semantics
> -
>
> Key: HIVE-5038
> URL: https://issues.apache.org/jira/browse/HIVE-5038
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.11.0
>Reporter: Barrett Strausser
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: fake_customer_order_data.csv, 
> rank_semantics_test_data.hive.sql
>
>
> Issue 1 : The rank operator is sensitive to case.
> The following works :
> SELECT
> fco.cmscustid,fco.orderdate, rank() w
> FROM
> fake_customer_orders fco
> window
> w as (partition by fco.cmscustid  order by fco.orderdate)
> While this does not :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK() w 
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> The failing call returns :
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: 
> One or more arguments are expected.
> Issue 2: 
> The following works :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> This does not :
> SELECT 
> fco.cmscustid,fco.orderdate, rank(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> and returns - 
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Ranking Functions can take no arguments
> This has been reproduced by multiple users and probably pertains to other 
> PTF/windowing functions as well, although I haven't duplicated them.
> In no case is the returned output the expected output. I'll file another jira 
> for this.



[jira] [Updated] (HIVE-5019) Use StringBuffer instead of += (issue 1)

2013-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5019:
--

Attachment: HIVE-5019.2.patch.txt

Random test failure. Let's test again.

> Use StringBuffer instead of += (issue 1)
> 
>
> Key: HIVE-5019
> URL: https://issues.apache.org/jira/browse/HIVE-5019
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Benjamin Jakobus
>Assignee: Benjamin Jakobus
> Fix For: 0.12.0
>
> Attachments: HIVE-5019.1.patch.txt, HIVE-5019.2.patch.txt
>
>
> Issue 1 (use of StringBuffer over +=)
> java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
> java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
> java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
> java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
> java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java
> java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java
> java/org/apache/hadoop/hive/ql/udf/UDFLike.java
> java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
> java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
> java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java



[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735220#comment-13735220
 ] 

Edward Capriolo commented on HIVE-2482:
---

I was going to say:
This is an incompatible change because the return type of unhex has been 
changed from string to binary.
I think we should not do this ^. Let's make another UDF, or overload the 
parameters of this one.
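
A rough sketch of the "make another UDF" option, in plain Java rather than the 
Hive UDF API: the existing-style function keeps returning a string, and a 
separate function returns the raw bytes. The names UnhexSketch, unhex and 
unhexBinary are hypothetical, and decoding the bytes as UTF-8 in the string 
variant is an assumption made only to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the code in the attached patches.
final class UnhexSketch {
    // New, separately named entry point: hex string to raw bytes.
    static byte[] unhexBinary(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Existing-style entry point keeps its string return type, so callers
    // that already depend on it are not broken. UTF-8 is assumed here.
    static String unhex(String hex) {
        return new String(unhexBinary(hex), StandardCharsets.UTF_8);
    }
}
```

With this shape, queries that rely on the string result keep working, and 
binary-aware callers opt in by name.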

> Convenience UDFs for binary data type
> -
>
> Key: HIVE-2482
> URL: https://issues.apache.org/jira/browse/HIVE-2482
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ashutosh Chauhan
>Assignee: Mark Wagner
> Fix For: 0.12.0
>
> Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, 
> HIVE-2482.4.patch
>
>
> HIVE-2380 introduced binary data type in Hive. It will be good to have 
> following udfs to make it more useful:
> * UDF's to convert to/from hex string
> * UDF's to convert to/from string using a specific encoding
> * UDF's to convert to/from base64 string



[jira] [Commented] (HIVE-5019) Use StringBuffer instead of += (issue 1)

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735188#comment-13735188
 ] 

Edward Capriolo commented on HIVE-5019:
---

The tests have to pass with +1.
Then I comment on the ticket with +1 (Which I have not done yet because I have 
not had time for a full review)
Then we commit the code
Then the ticket is closed and we thank you.


> Use StringBuffer instead of += (issue 1)
> 
>
> Key: HIVE-5019
> URL: https://issues.apache.org/jira/browse/HIVE-5019
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Benjamin Jakobus
>Assignee: Benjamin Jakobus
> Fix For: 0.12.0
>
> Attachments: HIVE-5019.1.patch.txt
>
>
> Issue 1 (use of StringBuffer over +=)
> java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
> java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
> java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
> java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
> java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java
> java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java
> java/org/apache/hadoop/hive/ql/udf/UDFLike.java
> java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
> java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
> java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java



[jira] [Commented] (HIVE-5019) Use StringBuffer instead of += (issue 1)

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735103#comment-13735103
 ] 

Edward Capriolo commented on HIVE-5019:
---


When you see code like this:
{quote}

   first = true;
   for (int k = 0; k < columnSize; k++) {
-String newColName = i + "_VALUE_" + k; // any name, it does not matter.
+newColName = i + "_VALUE_" + k; // any name, it does not matter.
 if (!first) {
-  valueColNames = valueColNames + ",";
-  valueColTypes = valueColTypes + ",";
+  valueColNames.append(",");
+  valueColTypes.append(",");
 }
-valueColNames = valueColNames + newColName;
-valueColTypes = valueColTypes + valueCols.get(k).getTypeString();
+valueColNames.append(newColName);
+valueColTypes.append(valueCols.get(k).getTypeString());
 first = false;
{quote}
Can you replace it with StringUtil.join()?

I have seen this in about 4 places in Hive. Maybe do that as a follow-on.
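
Roughly what I mean, as a sketch only. This uses the JDK's String.join (Java 8 
and later) as a stand-in for whichever join helper is on the classpath, and the 
class and method names are hypothetical, not taken from the patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of replacing the "first" flag and manual commas.
final class ColumnNameJoin {
    // Builds "<tag>_VALUE_0,<tag>_VALUE_1,..." for columnCount columns.
    static String valueColumnNames(int tag, int columnCount) {
        List<String> names = new ArrayList<>();
        for (int k = 0; k < columnCount; k++) {
            names.add(tag + "_VALUE_" + k); // any name, it does not matter
        }
        // The join call inserts the separators, so no "first" bookkeeping.
        return String.join(",", names);
    }
}
```

The loop only collects values, and the separator logic lives in one place.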

> Use StringBuffer instead of += (issue 1)
> 
>
> Key: HIVE-5019
> URL: https://issues.apache.org/jira/browse/HIVE-5019
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Benjamin Jakobus
>Assignee: Benjamin Jakobus
> Fix For: 0.12.0
>
> Attachments: HIVE-5019.1.patch.txt
>
>
> Issue 1 (use of StringBuffer over +=)
> java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
> java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
> java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
> java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
> java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java
> java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java
> java/org/apache/hadoop/hive/ql/udf/UDFLike.java
> java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
> java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
> java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java



[jira] [Commented] (HIVE-5019) Use StringBuffer instead of += (issue 1)

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735065#comment-13735065
 ] 

Edward Capriolo commented on HIVE-5019:
---

Please name your patch HIVE-5019.1.patch.txt and it will be automatically 
tested.

> Use StringBuffer instead of += (issue 1)
> 
>
> Key: HIVE-5019
> URL: https://issues.apache.org/jira/browse/HIVE-5019
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Benjamin Jakobus
>Assignee: Benjamin Jakobus
> Fix For: 0.12.0
>
> Attachments: stringbuffer.patch
>
>
> Issue 1 (use of StringBuffer over +=)
> java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
> java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
> java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
> java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java
> java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
> java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java
> java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java
> java/org/apache/hadoop/hive/ql/udf/UDFLike.java
> java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
> java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java
> java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java



[jira] [Assigned] (HIVE-5038) rank operator is case-sensitive and has odd semantics

2013-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-5038:
-

Assignee: Edward Capriolo

> rank operator is case-sensitive and has odd semantics
> -
>
> Key: HIVE-5038
> URL: https://issues.apache.org/jira/browse/HIVE-5038
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.11.0
>Reporter: Barrett Strausser
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: fake_customer_order_data.csv, 
> rank_semantics_test_data.hive.sql
>
>
> Issue 1 : The rank operator is sensitive to case.
> The following works :
> SELECT
> fco.cmscustid,fco.orderdate, rank() w
> FROM
> fake_customer_orders fco
> window
> w as (partition by fco.cmscustid  order by fco.orderdate)
> While this does not :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK() w 
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> The failing call returns :
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: 
> One or more arguments are expected.
> Issue 2: 
> The following works :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> This does not :
> SELECT 
> fco.cmscustid,fco.orderdate, rank(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> and returns - 
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Ranking Functions can take no arguments
> This has been reproduced by multiple users and probably pertains to other 
> PTF/windowing functions as well, although I haven't duplicated them.
> In no case is the returned output the expected output. I'll file another jira 
> for this.



[jira] [Commented] (HIVE-5038) rank operator is case-sensitive and has odd semantics

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735047#comment-13735047
 ] 

Edward Capriolo commented on HIVE-5038:
---

I will take a look later today

> rank operator is case-sensitive and has odd semantics
> -
>
> Key: HIVE-5038
> URL: https://issues.apache.org/jira/browse/HIVE-5038
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 0.11.0
>Reporter: Barrett Strausser
>Assignee: Edward Capriolo
>Priority: Minor
> Attachments: fake_customer_order_data.csv, 
> rank_semantics_test_data.hive.sql
>
>
> Issue 1 : The rank operator is sensitive to case.
> The following works :
> SELECT
> fco.cmscustid,fco.orderdate, rank() w
> FROM
> fake_customer_orders fco
> window
> w as (partition by fco.cmscustid  order by fco.orderdate)
> While this does not :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK() w 
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> The failing call returns :
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: 
> One or more arguments are expected.
> Issue 2: 
> The following works :
> SELECT 
> fco.cmscustid,fco.orderdate, RANK(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> This does not :
> SELECT 
> fco.cmscustid,fco.orderdate, rank(fco.orderdate) w
> FROM 
> fake_customer_orders fco   
> window   
> w as (partition by fco.cmscustid  order by fco.orderdate);
> and returns - 
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Ranking Functions can take no arguments
> This has been reproduced by multiple users and probably pertains to other 
> PTF/windowing functions as well although I haven't duplicated them
> In no case is the returned output the expected output. I'll file another jira 
> for this.



[jira] [Commented] (HIVE-1662) Add file pruning into Hive.

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734875#comment-13734875
 ] 

Edward Capriolo commented on HIVE-1662:
---

Can you change this typo here:
 HIVEPPDFILESREVOVEFILTER("hive.optimize.ppd.vc.filename.remove.filter", false),

Otherwise +1



> Add file pruning into Hive.
> ---
>
> Key: HIVE-1662
> URL: https://issues.apache.org/jira/browse/HIVE-1662
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Navis
> Attachments: HIVE-1662.D8391.1.patch, HIVE-1662.D8391.2.patch, 
> HIVE-1662.D8391.3.patch, HIVE-1662.D8391.4.patch, HIVE-1662.D8391.5.patch, 
> HIVE-1662.D8391.6.patch, HIVE-1662.D8391.7.patch
>
>
> Hive now supports a filename virtual column. 
> If a file name filter is present in a query, Hive should be able to add only 
> the files that pass the filter to the input paths.



[jira] [Commented] (HIVE-1662) Add file pruning into Hive.

2013-08-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734816#comment-13734816
 ] 

Edward Capriolo commented on HIVE-1662:
---

Will review and hopefully not break the build. 

> Add file pruning into Hive.
> ---
>
> Key: HIVE-1662
> URL: https://issues.apache.org/jira/browse/HIVE-1662
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Navis
> Attachments: HIVE-1662.D8391.1.patch, HIVE-1662.D8391.2.patch, 
> HIVE-1662.D8391.3.patch, HIVE-1662.D8391.4.patch, HIVE-1662.D8391.5.patch, 
> HIVE-1662.D8391.6.patch, HIVE-1662.D8391.7.patch
>
>
> Hive now supports a filename virtual column. 
> If a file name filter is present in a query, Hive should be able to add only 
> the files that pass the filter to the input paths.



[jira] [Commented] (HIVE-4513) disable hivehistory logs by default

2013-08-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734356#comment-13734356
 ] 

Edward Capriolo commented on HIVE-4513:
---

+1 I never found them of much use.

> disable hivehistory logs by default
> ---
>
> Key: HIVE-4513
> URL: https://issues.apache.org/jira/browse/HIVE-4513
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Logging
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-4513.1.patch, HIVE-4513.2.patch, HIVE-4513.3.patch, 
> HIVE-4513.4.patch, HIVE-4513.5.patch
>
>
> HiveHistory log files (hive_job_log_hive_*.txt files) store information about 
> a Hive query, such as the query string, plan, counters and MR job progress 
> information.
> There is no mechanism to delete these files, and as a result they 
> accumulate over time, using up a lot of disk space. 
> I don't think this is used by most people, so I think it would be better to turn 
> this off by default. Jobtracker logs already capture most of this 
> information, though it is not as structured as the history logs.



[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734350#comment-13734350
 ] 

Edward Capriolo commented on HIVE-5003:
---

What does this mean for adding jars to load UDFs or adding files to the 
distributed cache to run a job, are we going to create folders under this HDFS 
folder? 

> Localize hive exec jar for tez
> --
>
> Key: HIVE-5003
> URL: https://issues.apache.org/jira/browse/HIVE-5003
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Fix For: tez-branch
>
> Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
> HIVE-5003.3.patch.txt
>
>
> Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
> added to vertices and the dag itself as needed. For hive we need to localize 
> the hive-exec.jar.
> NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs

2013-08-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734306#comment-13734306
 ] 

Edward Capriolo commented on HIVE-1545:
---

The annotations and other things you are seeing are part of an internal testing 
framework at FB that was never open sourced. The hive plugin developer kit had 
similar annotations, but they were removed. So the UDFs will likely compile fine, 
but the test cases will not.
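
For anyone reading without the attachments, the FIND_IN_ARRAY semantics given in the description below can be sketched in plain Java. This is only an illustration of the described behaviour, not the attached UDFFindInArray; the class and method names are made up and no Hive UDF API is involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the UDFFindInArray attached to this issue.
class FindInArraySketch {
    // Returns the 1-based index of the first element equal to needle,
    // 0 if it is absent, and null if either argument is null.
    static Integer findInArray(Integer needle, List<Integer> haystack) {
        if (needle == null || haystack == null) {
            return null;
        }
        for (int i = 0; i < haystack.size(); i++) {
            if (needle.equals(haystack.get(i))) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(findInArray(5, Arrays.asList(1, 2, 5)));    // 3
        System.out.println(findInArray(5, Arrays.asList(1, 2, 3)));    // 0
        System.out.println(findInArray(null, Arrays.asList(1, 2, 3))); // null
    }
}
```

The first two calls mirror the examples in the description; the third covers the NULL-argument rule.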

> Add a bunch of UDFs and UDAFs
> -
>
> Key: HIVE-1545
> URL: https://issues.apache.org/jira/browse/HIVE-1545
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Jonathan Chang
>Assignee: Jonathan Chang
>Priority: Minor
> Attachments: core.tar.gz, ext.tar.gz, UDFEndsWith.java, 
> UDFFindInString.java, UDFLtrim.java, UDFRtrim.java, udfs.tar.gz, udfs.tar.gz, 
> UDFStartsWith.java, UDFTrim.java
>
>
> Here are some UD(A)Fs which can be incorporated into the Hive distribution:
> UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 
> 5, 3) returns 1.
> UDFBucket - Find the bucket in which the first argument belongs. e.g., 
> BUCKET(x, b_1, b_2, b_3, ...), will return the smallest i such that x > b_{i} 
> but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
> UDFFindInArray - Finds the 1-index of the first element in the array given as 
> the second argument. Returns 0 if not found. Returns NULL if either argument 
> is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, 
> array(1,2,3)) will return 0.
> UDFGreatCircleDist - Finds the great circle distance (in km) between two 
> lat/long coordinates (in degrees).
> UDFLDA - Performs LDA inference on a vector given fixed topics.
> UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 
> whenever any of its parameters changes.
> UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 
> 5.
> UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches 
> in an array.
> UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
> UDFWhich - Given a boolean array, return the indices which are TRUE.
> UDFJaccard
> UDAFCollect - Takes all the values associated with a row and converts it into 
> a list. Make sure to have: set hive.map.aggr = false;
> UDAFCollectMap - Like collect except that it takes tuples and generates a map.
> UDAFEntropy - Compute the entropy of a column.
> UDAFPearson (BROKEN!!!) - Computes the pearson correlation between two 
> columns.
> UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value 
> of VAL.
> UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated 
> with the N (passed as the third parameter) largest values of VAL.
> UDAFHistogram



[jira] [Commented] (HIVE-5026) HIVE-3926 is committed in the state of not rebased to trunk

2013-08-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733769#comment-13733769
 ] 

Edward Capriolo commented on HIVE-5026:
---

I do not have time to review this now. If no one else wants to +1 and commit 
this patch, I suggest rolling back and re-opening HIVE-3926. I will not be 
able to get to this until about 7:00 PM eastern tonight.

> HIVE-3926 is committed in the state of not rebased to trunk
> ---
>
> Key: HIVE-5026
> URL: https://issues.apache.org/jira/browse/HIVE-5026
> Project: Hive
>  Issue Type: Task
>  Components: Tests
>Reporter: Navis
>Assignee: Navis
> Attachments: HIVE-5026.D12099.1.patch
>
>
> Current trunk build fails.



[jira] [Commented] (HIVE-5026) HIVE-3926 is committed in the state of not rebased to trunk

2013-08-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733573#comment-13733573
 ] 

Edward Capriolo commented on HIVE-5026:
---

My bad. I thought I took the latest patch from jira. 

> HIVE-3926 is committed in the state of not rebased to trunk
> ---
>
> Key: HIVE-5026
> URL: https://issues.apache.org/jira/browse/HIVE-5026
> Project: Hive
>  Issue Type: Task
>  Components: Tests
>Reporter: Navis
>Assignee: Navis
> Attachments: HIVE-5026.D12099.1.patch
>
>
> Current trunk build fails.



[jira] [Commented] (HIVE-4863) Fix parallel order by on hadoop2

2013-08-08 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733560#comment-13733560
 ] 

Edward Capriolo commented on HIVE-4863:
---

So the case is this in 0.20

 TotalOrderPartitioner.setPartitionFile(JobConf, partitionFile);

in 0.23
  TotalOrderPartitioner.setPartitionFile(Configuration, partitionFile);

JobConf is a child of Configuration

{quote}
 Also, in the 23 version you're setting the file on HiveConf not JobConf which 
I don't think will work, will it?
{quote}
^ I think this will not matter, since as long as the conf can find hdfs we 
should be ready to add the file.


{quote}
The shim should be able to do the exact same call in both cases - the 
important thing is that we compile it separately against 20S and hadoop 23
{quote}
Good point. This shim stuff hurts my head :) I will rebase as you have 
suggested.
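
To spell out why identical source still needs a separate compile per Hadoop version: a compiled call records the exact parameter types of the method it was resolved against, so a caller built against the JobConf signature does not link against a class that only declares the Configuration one. The sketch below uses stand-in classes (FakeConfiguration, FakeJobConf, Partitioner20, Partitioner23 are all made up for illustration, not the real Hadoop types) and reflection, which also matches on exact parameter types, to show the difference.

```java
// Stand-in classes only; none of these are the real Hadoop types.
class ShimSignatureSketch {
    static class FakeConfiguration {}
    static class FakeJobConf extends FakeConfiguration {}

    // Mimics the 0.20-style signature, which takes the subclass.
    static class Partitioner20 {
        public static void setPartitionFile(FakeJobConf conf, String file) {}
    }

    // Mimics the 0.23-style signature, which takes the parent class.
    static class Partitioner23 {
        public static void setPartitionFile(FakeConfiguration conf, String file) {}
    }

    // True if owner declares setPartitionFile with exactly this first parameter type.
    static boolean hasExactSignature(Class<?> owner, Class<?> confType) {
        try {
            owner.getMethod("setPartitionFile", confType, String.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // At source level the same call compiles against either version.
        Partitioner20.setPartitionFile(new FakeJobConf(), "/tmp/partitions");
        Partitioner23.setPartitionFile(new FakeJobConf(), "/tmp/partitions");

        // The exact signatures differ, which is what a compiled caller depends on.
        System.out.println(hasExactSignature(Partitioner20.class, FakeJobConf.class));       // true
        System.out.println(hasExactSignature(Partitioner23.class, FakeJobConf.class));       // false
        System.out.println(hasExactSignature(Partitioner23.class, FakeConfiguration.class)); // true
    }
}
```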




> Fix parallel order by on hadoop2
> 
>
> Key: HIVE-4863
> URL: https://issues.apache.org/jira/browse/HIVE-4863
> Project: Hive
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: HIVE-4863.1.patch.txt
>
>




[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working

2013-08-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733133#comment-13733133
 ] 

Edward Capriolo commented on HIVE-3926:
---

Committed. Thanks Navis and Gunther.

> PPD on virtual column of partitioned table is not working
> -
>
> Key: HIVE-3926
> URL: https://issues.apache.org/jira/browse/HIVE-3926
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, 
> HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, 
> HIVE-3926.D8121.5.patch
>
>
> {code}
> select * from src where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> is working, but
> {code}
> select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> throws SemanticException. Disabling PPD makes it work.



[jira] [Issue Comment Deleted] (HIVE-3926) PPD on virtual column of partitioned table is not working

2013-08-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-3926:
--

Comment: was deleted

(was: Committed. Thanks Navis and Gunther.)

> PPD on virtual column of partitioned table is not working
> -
>
> Key: HIVE-3926
> URL: https://issues.apache.org/jira/browse/HIVE-3926
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, 
> HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, 
> HIVE-3926.D8121.5.patch
>
>
> {code}
> select * from src where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> is working, but
> {code}
> select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> throws SemanticException. Disabling PPD makes it work.



[jira] [Updated] (HIVE-3926) PPD on virtual column of partitioned table is not working

2013-08-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-3926:
--

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks Navis and Gunther.

> PPD on virtual column of partitioned table is not working
> -
>
> Key: HIVE-3926
> URL: https://issues.apache.org/jira/browse/HIVE-3926
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, 
> HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, 
> HIVE-3926.D8121.5.patch
>
>
> {code}
> select * from src where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> is working, but
> {code}
> select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> throws SemanticException. Disabling PPD makes it work.



[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working

2013-08-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733125#comment-13733125
 ] 

Edward Capriolo commented on HIVE-3926:
---

+1 will commit.

> PPD on virtual column of partitioned table is not working
> -
>
> Key: HIVE-3926
> URL: https://issues.apache.org/jira/browse/HIVE-3926
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, 
> HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, 
> HIVE-3926.D8121.5.patch
>
>
> {code}
> select * from src where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> is working, but
> {code}
> select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100;
> {code}
> throws SemanticException. Disabling PPD makes it work.



[jira] [Commented] (HIVE-1662) Add file pruning into Hive.

2013-08-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733120#comment-13733120
 ] 

Edward Capriolo commented on HIVE-1662:
---

+1 will commit in 24 hours.

> Add file pruning into Hive.
> ---
>
> Key: HIVE-1662
> URL: https://issues.apache.org/jira/browse/HIVE-1662
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: Navis
> Attachments: HIVE-1662.D8391.1.patch, HIVE-1662.D8391.2.patch, 
> HIVE-1662.D8391.3.patch, HIVE-1662.D8391.4.patch, HIVE-1662.D8391.5.patch, 
> HIVE-1662.D8391.6.patch
>
>
> Hive now supports a filename virtual column. 
> If a file name filter is present in a query, Hive should be able to add only 
> the files that pass the filter to the input paths.



[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced

2013-08-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732957#comment-13732957
 ] 

Edward Capriolo commented on HIVE-4964:
---

I think we are getting ahead of ourselves here. HIVE-4963 could be classified 
as a feature, but there are still big cleanups in the PTF work. There is lots 
of dead code and lots of redundant code; we should clean all those items up 
first. There are XML-encoded bits in there and other things that need to 
be moved. 

> Cleanup PTF code: remove code dealing with non standard sql behavior we had 
> original introduced
> ---
>
> Key: HIVE-4964
> URL: https://issues.apache.org/jira/browse/HIVE-4964
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
>Priority: Minor
> Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch
>
>
> There are still pieces of code that deal with:
> - supporting select expressions with Windowing
> - supporting a filter with windowing
> Need to do this before introducing  Perf. improvements. 



Re: [Discuss] project chop up

2013-08-07 Thread Edward Capriolo
"Some of the hard part was that some of the test classes are in the wrong
module that references classes in a later module."

I think the modules will have to be able to reference each other in many
cases. Serde and QL are tightly coupled. QL is really too large and we
should find a way to cut that up.

Part of this problem is the q.tests

I think one way to handle this is to only allow unit tests inside the
module. I imagine running all the q tests would be done in a final module
hive-qtest. Or possibly two final modules
hive-qtest
hive-qtest-extra (tangential things like UDFS and input formats not core to
hive)


On Wed, Aug 7, 2013 at 4:49 PM, Owen O'Malley  wrote:

> On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
> > > I'd like to propose we move towards Maven.
> >
> > Big +1 on this. Most of the major apache projects(hadoop, hbase, avro etc.)
> > are maven based.
> >
>
> A big +1 from me too. I actually took a pass at it a couple of months ago.
> Some of the hard part was that some of the test classes are in the wrong
> module that references classes in a later module. Obviously that prevents
> any kind of modular build.
>
> As an additional plus to Maven is that Maven includes tools to correct the
> project and module dependencies.
>
> -- Owen
>


[jira] [Commented] (HIVE-5020) HCat reading null-key map entries causes NPE

2013-08-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732719#comment-13732719
 ] 

Edward Capriolo commented on HIVE-5020:
---

If I had to hazard a guess I would say that the original implementation was 
about supporting thrift structures. Possibly if thrift does not support this 
case that design was not carried over.

Personally I think we SHOULD support NULL key and NULL value in maps. The map 
need not be sorted.
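
As a quick illustration of the map behaviour the description below relies on, here is a minimal, self-contained sketch using only java.util classes. The class and helper names are made up for the example; it is not HCatRecordSerDe code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustration only; the class and helper names are hypothetical.
class NullKeyMapSketch {
    // Returns true if the map accepts a null key, false if put() throws an NPE.
    static boolean acceptsNullKey(Map<String, String> map) {
        map.put("a", "1"); // start from a non-empty map
        try {
            map.put(null, "value for null key");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A TreeMap with natural ordering has to compare keys, so null fails.
        System.out.println("TreeMap:       " + acceptsNullKey(new TreeMap<>()));
        // HashMap and LinkedHashMap both permit a single null key.
        System.out.println("HashMap:       " + acceptsNullKey(new HashMap<>()));
        System.out.println("LinkedHashMap: " + acceptsNullKey(new LinkedHashMap<>()));
    }
}
```

LinkedHashMap additionally iterates in insertion order, which is why it is the candidate if element order is supposed to be retained.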

> HCat reading null-key map entries causes NPE
> 
>
> Key: HIVE-5020
> URL: https://issues.apache.org/jira/browse/HIVE-5020
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>
> Currently, if someone has a null key in a map, HCatInputFormat will terminate 
> with an NPE while trying to read it.
> {noformat}
> java.lang.NullPointerException
> at java.lang.String.compareTo(String.java:1167)
> at java.lang.String.compareTo(String.java:92)
> at java.util.TreeMap.put(TreeMap.java:545)
> at 
> org.apache.hcatalog.data.HCatRecordSerDe.serializeMap(HCatRecordSerDe.java:222)
> at 
> org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:198)
> at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
> {noformat}
> This is because we use a TreeMap to preserve order of elements in the map 
> when reading from the underlying storage/serde.
> This problem is easily fixed in a number of ways:
> a) Switch to HashMap, which allows null keys. That does not preserve order of 
> keys, which should not be important for map fields, but if we desire that, we 
> have a solution for that too - LinkedHashMap, which would both retain order 
> and allow us to insert null keys into the map.
> b) Ignore null keyed entries - check if the field we read is null, and if it 
> is, then ignore that item in the record altogether. This way, HCat is robust 
> in what it does - it does not terminate with an NPE, and it does not allow 
> null keys in maps that might be problematic to layers above us that are not 
> used to seeing nulls as keys in maps.
> Why do I bring up the second fix? I bring it up because of the way we 
> discovered this bug. When reading from an RCFile, we do not notice this bug. 
> If the same query that produced the RCFile instead produces an Orcfile, and 
> we try reading from it, we see this problem.
> RCFile seems to be quietly stripping any null key entries, whereas Orc 
> retains them. This is why we didn't notice this problem for a long while, and 
> suddenly, now, we are. Now, if we fix our code to allow nulls in map keys 
> through to layers above, we expose layers above to this change, which may 
> then cause them to break. (Technically, this is stretching the case because 
> we already break now if they care) More importantly, though, we have a case 
> now, where the same data will be exposed differently if it were stored as orc 
> or if it were stored as rcfile. And as a layer that is supposed to make 
> storage invisible to the end user, HCat should attempt to provide some 
> consistency in how data behaves to the end user.
> That said...
> There is another important concern at hand here: nulls in map keys might be 
> due to bad data(corruption or loading error), and by stripping them, we might 
> be silently hiding that from the user. This is an important point that does 
> steer me towards the former approach, of passing it on to layers above, and 
> standardize on an understanding that null keys in maps are acceptable data 
> that layers above us have to handle. After that, it could be taken on as a 
> further consistency fix, to fix RCFile so that it allows nulls in map keys.
> Having gone through this discussion of standardization, another important 
> question is whether or not there is actually a use-case for null keys in maps 
> in data. If there isn't, maybe we shouldn't allow writing that in the first 
> place, and both orc and rcfile must simply error out to the end user if they 
> try to write a  null map key? Well, it is true that it is possible that data 
> errors lead to null keys, but it's also possible that the user wants to store 
> a mapping for value transformations, and they might have a transformation for 
> null as well. In the case I encountered it, they were writing out an 
> intermediate table after having r

Re: [Discuss] project chop up

2013-08-07 Thread Edward Capriolo
I think that is a good idea. I have been thinking about it a lot. I
especially hate how the offline build is now broken.

However I think it is going to take some time. There are some tricks like
how we build hive-exec jar that are not very clean to do in maven. I am
very interested

The last initiative we spoke about on list was moving from forest, I would
like to finish/start that before we get onto the project chop up.


On Wed, Aug 7, 2013 at 3:06 PM, Brock Noland  wrote:

> Thus far there hasn't been any dissent to managing our modules with maven.
>  In addition there have been several comments positive on a move towards
> maven. I'd like to add Ivy seems to have issues managing multiple versions
> of libraries. For example in HIVE-3632 Ivy cache had to be cleared when
> testing patches that installed the new version of DataNucleus. I have had
> the same issue on HIVE-4388. Requiring the deletion of the ivy cache
> is extremely painful for developers that don't have access to high
> bandwidth connections or live in areas far from California where most of
> these jars are hosted.
>
> I'd like to propose we move towards Maven.
>
>
> On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam 
> wrote:
>
> >
> >
> > Yes hive build and test cases got convoluted as the project scope
> > gradually increased. This is the time to take action!
> >
> > Based on my other Apache experiences, I prefer the option #3 "Breakup the
> > projects within our own source tree". Make multiple modules or
> > sub-projects. By default, only key modules will be built.
> >
> > Maven could be a possible candidate.
> >
> > Regards,
> > Mohammad
> >
> >
> >
> > 
> >  From: Edward Capriolo 
> > To: "dev@hive.apache.org" 
> > Sent: Saturday, July 27, 2013 7:03 AM
> > Subject: Re: [Discuss] project chop up
> >
> >
> > Or feel free to suggest different approach. I am used to managing software
> > as multi-module maven projects.
> > From a development standpoint if I was working on beeline, it would be nice
> > to only require some of the sub-projects to be open in my IDE to do that.
> > Also managing everything globally is not ideal.
> >
> > Hive's project layout, build, and test infrastructure is just funky. It has
> > to do a few interesting things (shims, testing), but I do not think what we
> > are doing justifies the massive ant build system we have. Ant is so ten
> > years ago.
> >
> >
> > On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates 
> > wrote:
> >
> > > But I assume they'd still be a part of targets like package, tar, and
> > > binary?  Making them compile and test separately and explicitly load the
> > > core Hive jars from maven/ivy seems reasonable.
> > >
> > > Alan.
> > >
> > > On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
> > >
> > > > Hi,
> > > >
> > > > I think that's part of it but I'd like to decouple the downstream projects
> > > > even further so that the only connection is the dependency on the hive
> > > > jars.
> > > >
> > > > Brock
> > > > On Jul 26, 2013 10:10 PM, "Alan Gates" wrote:
> > > >
> > > >> I'm not sure how this is different from what hcat does today.  It needs
> > > >> Hive's jars to compile, so it's one of the last things in the compile
> > > >> step.
> > > >> Would moving the other modules you note to be in the same category be
> > > >> enough?  Did you want to also make it so that the default ant target
> > > >> doesn't compile those?
> > > >>
> > > >> Alan.
> > > >>
> > > >> On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
> > > >>
> > > >>> My mistake on saying hcat was a fork metastore. I had a brain fart for a
> > > >>> moment.
> > > >>>
> > > >>> One way we could do this is create a folder called downstream. In our
> > > >>> release step we can execute the downstream builds and then copy the files
> > > >>> we need back. So nothing downstream will be on the classpath of the main
> > > >>> project.
> > > >>>
> > > >>> This could help us breakup ql as well. Things like exotic file
> > forma

[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731155#comment-13731155
 ] 

Edward Capriolo commented on HIVE-5009:
---

You're not going to be able to fix any of the thrift / protobuf generated files. 
You will just have to ignore them.
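
Separately from the generated-code caveat, the first item in the description below (appending strings inside loops) is easy to picture with a small sketch. The class and method names here are made up and the code is not taken from Hive; it only contrasts the two styles and makes no timing claim.

```java
// Illustration only; names are hypothetical and nothing here is Hive code.
class AppendSketch {
    // += inside a loop: every iteration creates a new String that copies
    // everything appended so far.
    static String joinWithConcat(String[] parts) {
        String out = "";
        for (String part : parts) {
            out += part;
        }
        return out;
    }

    // One StringBuilder reused across iterations; a String is created once at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"col1", ",", "col2", ",", "col3"};
        System.out.println(joinWithConcat(parts));  // col1,col2,col3
        System.out.println(joinWithBuilder(parts)); // col1,col2,col3
    }
}
```

Both methods return the same string; the difference is only in how many intermediate objects are created along the way.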

> Fix minor optimization issues
> -
>
> Key: HIVE-5009
> URL: https://issues.apache.org/jira/browse/HIVE-5009
> Project: Hive
>  Issue Type: Improvement
>Reporter: Benjamin Jakobus
>Assignee: Benjamin Jakobus
>Priority: Minor
> Fix For: 0.12.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have found some minor optimization issues in the codebase, which I would 
> like to rectify and contribute. Specifically, these are:
> The optimizations that could be applied to Hive's code base are as follows:
> 1. Use StringBuffer when appending strings - In 184 instances, the 
> concatenation operator (+=) was used when appending strings. This is 
> inherently inefficient - instead Java's StringBuffer or StringBuilder class 
> should be used. 12 instances of this optimization can be applied to the 
> GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
> uses the + operator inside a loop, so does the column projection utilities 
> class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
> Tests showed that using the StringBuilder when appending strings is 57% 
> faster than using the + operator (using the StringBuffer took 122 
> milliseconds whilst the + operator took 284 milliseconds). The reason 
> using the StringBuffer class is preferred over using the + operator is 
> that
> String third = first + second;
> gets compiled to:
> StringBuilder builder = new StringBuilder( first );
> builder.append( second );
> third = builder.toString();
> Therefore, building complex strings that, for example, involve loops 
> requires many instantiations (and as discussed below, creating new objects 
> inside loops is inefficient).
> 2. Use arrays instead of List - Java's java.util.Arrays class asList method 
> is more efficient at creating lists from arrays than using loops 
> to manually iterate over the elements (using asList is computationally very 
> cheap, O(1), as it merely creates a wrapper object around the array; looping 
> through the array, however, has a complexity of O(n) since a new list is 
> created and every element in the array is added to this new list). As 
> confirmed by the experiment detailed in Appendix D, the Java compiler does not 
> automatically optimize and replace tight-loop copying with asList: the 
> loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
> instant. 
> Four instances of this optimization can be applied to Hive's codebase (two of 
> these should be applied to the Map-Join container - MapJoinRowContainer) - 
> lines 92 to 98:
> for (obj = other.first(); obj != null; obj = other.next()) {
>   ArrayList ele = new ArrayList(obj.length);
>   for (int i = 0; i < obj.length; i++) {
>     ele.add(obj[i]);
>   }
>   list.add((Row) ele);
> }
> 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
> could be avoided by simply using the provided static conversion methods. As 
> noted in the PMD documentation, "using these avoids the cost of creating 
> objects that also need to be garbage-collected later."
> For example, line 587 of the SemanticAnalyzer class could be replaced by the 
> more efficient parseDouble method call:
> // Inefficient:
> Double percent = Double.valueOf(value).doubleValue();
> // To be replaced by:
> Double percent = Double.parseDouble(value);
> Our test case in Appendix D confirms this: converting 10,000 strings into 
> integers using Integer.parseInt(gen.nextSessionId()) (i.e. creating an 
> unnecessary wrapper object) took 119 on average; using parseInt() took only 
> 38. Therefore, avoiding even just one unnecessary wrapper object cut the 
> conversion time by about 68% in this test.
> 4. Converting literals to strings using + "" - Converting literals to strings 
> using + "" is quite inefficient (see Appendix D) and should be done by 
> calling the toString() method instead: converting 1,000,000 integers to 
> strings using + "" took, on average, 1340 milliseconds whilst using the 
> toString() method only required 1183 milliseconds (hence using toString() 
> takes nearly 12% less time). 
> 89 instances of this 
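
The four clean-ups listed in the description above can be sketched in Java roughly as follows. This is an illustrative sketch only, not code from Hive or from any HIVE-5009 patch: the class name HiveCleanupIdioms and all of its method names are hypothetical, and only standard JDK calls (StringBuilder, Arrays.asList, Double.parseDouble, Integer.toString) are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names below are illustrative assumptions.
final class HiveCleanupIdioms {

    // Item 1: append inside a loop with a single StringBuilder rather than
    // `+=`, which creates a new builder and a new String on every iteration.
    static String join(String[] parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    // Item 2: Arrays.asList wraps the array in O(1). The result is a
    // fixed-size view backed by the array, so later writes to the array
    // are visible through the list.
    static List<Object> wrap(Object[] row) {
        return Arrays.asList(row);
    }

    // If an independent, resizable list is needed, an O(n) copy is still
    // required; the copy constructor avoids the hand-written loop.
    static List<Object> copy(Object[] row) {
        return new ArrayList<>(Arrays.asList(row));
    }

    // Item 3: parseDouble returns a primitive double directly, without the
    // intermediate Double object that Double.valueOf(value) produces.
    static double parsePercent(String value) {
        return Double.parseDouble(value);
    }

    // Item 4: convert with Integer.toString instead of `n + ""`.
    static String intToString(int n) {
        return Integer.toString(n);
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ","));
        System.out.println(wrap(new Object[] {"x", 1}));
        System.out.println(parsePercent("12.5"));
        System.out.println(intToString(42));
    }
}
```

Note that the asList view and the copy behave differently: the view cannot grow and shares storage with the array, so it is only a drop-in replacement for the copying loop when the caller does not need an independent list. Whether that holds at the call sites mentioned in the description would have to be checked case by case.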

Re: Newbie - proposed changes

2013-08-06 Thread Edward Capriolo
Please read:
https://cwiki.apache.org/confluence/display/Hive/HowToContribute


On Tue, Aug 6, 2013 at 11:45 AM, Benjamin Jakobus <
benjamin.jako...@gmail.com> wrote:

> Hi Edward,
>
> I created an issue with a more detailed description of what I propose to
> modify. Please see https://issues.apache.org/jira/browse/HIVE-5009
>
> Regards,
> Ben
>
>
> On Tue, Aug 6, 2013 at 3:57 PM, Edward Capriolo wrote:
>
> > You should go to http://issues.apache.org/jira and create a Hive ticket.
> > Try to keep your tickets small in scope, i.e. it is usually better to clean
> > up a few coupled classes at a time, in several waves.
> >
> >
> > On Tue, Aug 6, 2013 at 9:06 AM, Benjamin Jakobus <
> > benjamin.jako...@gmail.com
> > > wrote:
> >
> > > Hi all,
> > >
> > > I have begun analyzing the Hive codebase as part of my MSc in Computer
> > > Science and have found some minor optimization issues in the codebase,
> > > which I would like to rectify and contribute. The issues / clean-up
> > > > suggestions can be found here (sorry, the file is larger than the
> > > > allowed max for this mailing list):
> > > http://www.doc.ic.ac.uk/~bj112/hive-optimizations.txt
> > >
> > > Kind Regards,
> > > Benjamin
> > >
> >
>


Re: Newbie - proposed changes

2013-08-06 Thread Edward Capriolo
You should go to http://issues.apache.org/jira and create a Hive ticket.
Try to keep your tickets small in scope, i.e. it is usually better to clean
up a few coupled classes at a time, in several waves.


On Tue, Aug 6, 2013 at 9:06 AM, Benjamin Jakobus wrote:

> Hi all,
>
> I have begun analyzing the Hive codebase as part of my MSc in Computer
> Science and have found some minor optimization issues in the codebase,
> which I would like to rectify and contribute. The issues / clean-up
> suggestions can be found here (sorry, the file is larger than the allowed
> max for this mailing list):
> http://www.doc.ic.ac.uk/~bj112/hive-optimizations.txt
>
> Kind Regards,
> Benjamin
>

