[jira] [Commented] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.

2013-08-25 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749867#comment-13749867
 ] 

Jitendra Nath Pandey commented on HIVE-4959:


This jira requires all vector expressions to be serializable (HIVE-5126). 
It also requires HIVE-5146 for some of the tests to pass.
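For context, a minimal sketch of what "serializable" means here; the names are illustrative and the real base class (org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression) is more involved:

{noformat}
import java.io.Serializable;

// If every vector expression is Serializable, the vectorized operator tree
// can be built at compile time and shipped to the map task inside the
// serialized plan, instead of being rebuilt at run time.
public abstract class VectorExpression implements Serializable {
  private static final long serialVersionUID = 1L;

  protected int outputColumn;                  // column this expression writes
  protected VectorExpression[] childExpressions = new VectorExpression[0];

  public abstract void evaluate(VectorizedRowBatch batch);

  // Placeholder so the sketch is self-contained.
  public static class VectorizedRowBatch implements Serializable {}
}
{noformat}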

> Vectorized plan generation should be added as an optimization transform.
> 
>
> Key: HIVE-4959
> URL: https://issues.apache.org/jira/browse/HIVE-4959
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-4959.1.patch
>
>
> Currently the query plan is vectorized at query run time in the map task. 
> It will be much cleaner to add vectorization as an optimization step.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.

2013-08-25 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4959:
---

Status: Patch Available  (was: Open)

> Vectorized plan generation should be added as an optimization transform.
> 
>
> Key: HIVE-4959
> URL: https://issues.apache.org/jira/browse/HIVE-4959
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-4959.1.patch
>
>
> Currently the query plan is vectorized at query run time in the map task. 
> It will be much cleaner to add vectorization as an optimization step.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.

2013-08-25 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4959:
---

Attachment: HIVE-4959.1.patch

Patch uploaded.

> Vectorized plan generation should be added as an optimization transform.
> 
>
> Key: HIVE-4959
> URL: https://issues.apache.org/jira/browse/HIVE-4959
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-4959.1.patch
>
>
> Currently the query plan is vectorized at query run time in the map task. 
> It will be much cleaner to add vectorization as an optimization step.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5126) Make vector expressions serializable.

2013-08-25 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5126:
---

Status: Patch Available  (was: Open)

> Make vector expressions serializable.
> -
>
> Key: HIVE-5126
> URL: https://issues.apache.org/jira/browse/HIVE-5126
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5126.1.patch
>
>
> We should make all vectorized expressions serializable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5146) FilterExprOrExpr changes the order of the rows

2013-08-25 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5146:
---

Status: Patch Available  (was: Open)

> FilterExprOrExpr changes the order of the rows
> --
>
> Key: HIVE-5146
> URL: https://issues.apache.org/jira/browse/HIVE-5146
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch
>
>
> FilterExprOrExpr changes the order of the rows which might break some UDFs 
> that assume an order in data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5126) Make vector expressions serializable.

2013-08-25 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5126:
---

Attachment: HIVE-5126.1.patch

> Make vector expressions serializable.
> -
>
> Key: HIVE-5126
> URL: https://issues.apache.org/jira/browse/HIVE-5126
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5126.1.patch
>
>
> We should make all vectorized expressions serializable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-08-25 Thread Jakob Homan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/#review25537
---


One issue in the testing and a few formatting issues.  Otherwise looks good.


serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java


Weird spacing... 2x below as well.



serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java


These should never be null, not even in testing.  It's better to change the 
tests to correctly populate the data structure.



serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java


And this would indicate a bug.


- Jakob Homan


On Aug. 6, 2013, 7:13 p.m., Mohammad Islam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12480/
> ---
> 
> (Updated Aug. 6, 2013, 7:13 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-4732
> https://issues.apache.org/jira/browse/HIVE-4732
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> From our performance analysis, we found that AvroSerde's schema.equals() call 
> consumed a substantial amount (nearly 40%) of time. This patch intends to 
> minimize the number of schema.equals() calls by pushing the check as late as 
> possible and making it as rarely as possible.
> 
> First, we added a unique ID for each record reader, which is then included 
> in every AvroGenericRecordWritable. Then we introduced two new data 
> structures (a HashSet and a HashMap) to store intermediate data and avoid 
> duplicate checks. The HashSet contains the IDs of all record readers that 
> don't need any re-encoding. The HashMap contains the already-created 
> re-encoders; it works as a cache and allows re-encoder reuse. With this 
> change, our tests show a nearly 40% reduction in Avro record reading time.
>  
>
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java 
> ed2a9af 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
> e994411 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
>  66f0348 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 
> 3828940 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 
> 9af751b 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb 
> 
> Diff: https://reviews.apache.org/r/12480/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>
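For reference, a minimal sketch of the caching scheme described above (hypothetical names and signatures; the actual patch may differ):

{noformat}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

public class ReEncoderCache {
  // Record readers whose file schema already matches the reader schema;
  // records from these readers need no re-encoding at all.
  private final Set<UUID> noReEncodingNeeded = new HashSet<UUID>();
  // Re-encoders already built for a given reader, kept for reuse.
  private final Map<UUID, SchemaReEncoder> reEncoders =
      new HashMap<UUID, SchemaReEncoder>();

  public GenericRecord maybeReEncode(UUID readerId, GenericRecord record,
                                     Schema fileSchema, Schema readerSchema) {
    if (noReEncodingNeeded.contains(readerId)) {
      return record;                    // fast path: no equals() call at all
    }
    SchemaReEncoder reEncoder = reEncoders.get(readerId);
    if (reEncoder == null) {
      // The expensive schema comparison now happens once per reader,
      // not once per record.
      if (fileSchema.equals(readerSchema)) {
        noReEncodingNeeded.add(readerId);
        return record;
      }
      reEncoder = new SchemaReEncoder(fileSchema, readerSchema);
      reEncoders.put(readerId, reEncoder);
    }
    return reEncoder.reEncode(record);
  }

  // Placeholder for the existing re-encoding logic.
  static class SchemaReEncoder {
    SchemaReEncoder(Schema from, Schema to) { /* ... */ }
    GenericRecord reEncode(GenericRecord r) { return r; /* ... */ }
  }
}
{noformat}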



Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2013-08-25 Thread Jakob Homan


> On July 29, 2013, 10:41 a.m., Jakob Homan wrote:
> > There is still no test covering a map-reduce job on an already existing, 
> > non-Avro table into an Avro table - i.e., create a text table, populate it, 
> > run a CTAS to manipulate the data into an Avro table.
> 
> Mohammad Islam wrote:
> In general, Hive creates "internal" column names such as col0, col1, etc. 
> Due to this, I wasn't able to copy non-Avro data to Avro data and run a 
> SELECT query. The only option is to change the current behavior to reuse 
> the provided column names. A separate JIRA for this could be an option.
> 
>

Wouldn't select * or using the new column names (they're named 
deterministically) work?  This is a major test since otherwise we're missing 
the most important code path...
i.e., roughly (with this patch no Avro schema literal should be needed):

-- text file with columns c1, c2, c3
CREATE TABLE t1 (c1 STRING, c2 STRING, c3 STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH 'data.txt' INTO TABLE t1;
CREATE TABLE a1
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  AS SELECT c3, c2 FROM t1 WHERE c2 = 'foo' ORDER BY c3;
SELECT * FROM a1;
DESCRIBE EXTENDED a1;

And verify in the q file's result that the table is avro and that the correct 
rows and columns got converted. 


- Jakob


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review24149
---


On Aug. 7, 2013, 5:24 p.m., Mohammad Islam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11925/
> ---
> 
> (Updated Aug. 7, 2013, 5:24 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-3159
> https://issues.apache.org/jira/browse/HIVE-3159
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Problem:
> Hive doesn't support creating an Avro-based table using the HQL create table 
> command. It currently requires specifying an Avro schema literal or a schema 
> file name.
> In many cases this is very inconvenient for the user.
> Some of the unsupported use cases:
> 1. Create table ...  as SELECT ... from 
> 2. Create table ...  as SELECT from 
> 3. Create  table  without specifying Avro schema.
> 
> 
> Diffs
> -
> 
>   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_create_as_select2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 
> 13848b6 
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
> PRE-CREATION 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 
> 010f614 
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/11925/diff/
> 
> 
> Testing
> ---
> 
> Wrote a new Java test class for the new Java class, and added a new test 
> case to an existing Java test class. In addition, there are 4 .q files 
> testing multiple use cases.
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>



[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE

2013-08-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4375:
---

Affects Version/s: 0.11.0
   Status: Open  (was: Patch Available)

The following tests failed with the patch:
* TestHBaseCliDriver_single_sorced_multi_insert.q
* TestCliDriver_union28.q
* TestCliDriver_union30.q

> Single sourced multi insert consists of native and non-native table mixed 
> throws NPE
> 
>
> Key: HIVE-4375
> URL: https://issues.apache.org/jira/browse/HIVE-4375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch
>
>
> CREATE TABLE src_x1(key string, value string);
> CREATE TABLE src_x2(key string, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string");
> explain
> from src a
> insert overwrite table src_x1
> select key,value where a.key > 0 AND a.key < 50
> insert overwrite table src_x2
> select key,value where a.key > 50 AND a.key < 100;
> throws,
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde

2013-08-25 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749836#comment-13749836
 ] 

Jakob Homan commented on HIVE-4734:
---

Reviewed last patch on RB.  Everything looks good except for a change in the 
handling of [T1,Tn,NULL] types.

> Use custom ObjectInspectors for AvroSerde
> -
>
> Key: HIVE-4734
> URL: https://issues.apache.org/jira/browse/HIVE-4734
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Mark Wagner
>Assignee: Mark Wagner
> Fix For: 0.12.0
>
> Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch
>
>
> Currently, the AvroSerde recursively copies all fields of a record from the 
> GenericRecord to a List row object and provides the standard 
> ObjectInspectors. Performance can be improved by providing ObjectInspectors 
> to the Avro record itself.
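To illustrate the idea schematically (this is not the real Hive ObjectInspector API, whose interfaces are considerably larger; the names here are hypothetical):

{noformat}
import org.apache.avro.generic.GenericRecord;

// Today: every field is eagerly copied from the GenericRecord into a List row.
// Proposed: keep the GenericRecord as the row object and let a custom
// inspector read fields from it on demand.
interface FieldInspector {                 // stand-in for an ObjectInspector
  Object getField(Object row, String fieldName);
}

class AvroRecordInspector implements FieldInspector {
  @Override
  public Object getField(Object row, String fieldName) {
    // No up-front copy: the field is fetched from the Avro record lazily.
    return ((GenericRecord) row).get(fieldName);
  }
}
{noformat}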

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: custom Hive artifacts for Shark project

2013-08-25 Thread Konstantin Boudnik
Hi Edward,

Shark uses two jar files from Hive - hive-common and hive-cli. But the Shark
community puts a few patches on top of stock Hive to fix blocking issues
in the latter. The changes aren't proprietary and are either backports from
newer releases or fixes that haven't been committed yet (HIVE-3772 is a good
example of this).

Take, for example, Hive 0.9, which Shark 0.7 uses. Shark backports a few
bugfixes that were committed into Hive 0.10 or Hive 0.11 but never made it
into Hive 0.9. I believe this is a side effect of Hive always moving forward
and (almost) never making maintenance releases.

Changes, and especially massive rewrites, bring instability into software,
which needs to be gradually ironed out over subsequent releases. A good example
of such a project would be HBase, which does quite a number of minor releases
to provide its users with stable and robust server-side software. In the
absence of maintenance releases, downstream projects tend to find ways to work
around such an obstacle. Hence my earlier email.

As for 0.11.1: Shark currently doesn't support Hive 0.11 because of significant
changes in the latter's APIs. That support is coming in the next couple of
months, so publishing artifacts that improve on top of Hive 0.9 might be a more
pressing issue.

Hope it clarifies the situation,
  Cos

On Sun, Aug 25, 2013 at 11:54PM, Edward Capriolo wrote:
> I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive?
> Do you just include Hive components from Maven, or does the project somehow
> incorporate our build infrastructure?
> 
> 
> On Sun, Aug 25, 2013 at 7:42 PM, Konstantin Boudnik  wrote:
> 
> > Guys,
> >
> > Considering the absence of input, I take it that it really doesn't matter
> > which way the custom artifacts are published. Is that a correct
> > impression?
> >
> > My first choice would be
> > org.apache.hive.hive-common;0.9-shark0.7
> > org.apache.hive.hive-cli;0.9-shark0.7
> > artifacts.
> > If this meets with objections from the community here, then I'd like to
> > proceed with
> > org.shark-project.hive-common;0.9.0
> > org.shark-project.hive-cli;0.9.0
> >
> > Either way, the artifacts should be published to Maven Central to make
> > them readily available to the development community.
> >
> > Thoughts?
> > Regards,
> >   Cos
> >
> > On Sat, Aug 10, 2013 at 10:08PM, Konstantin Boudnik wrote:
> > > Guys,
> > >
> > > I am trying to help the Spark/Shark community (spark-project.org and now
> > > http://incubator.apache.org/projects/spark) with a predicament. Shark -
> > > also known as Hive on Spark - uses some parts of Hive, i.e. the HQL
> > > parser, query optimizer, serdes, and codecs.
> > >
> > > In order to improve some known issues with performance and/or concurrency,
> > > Shark developers need to apply a couple of patches on top of stock Hive:
> > >https://issues.apache.org/jira/browse/HIVE-2891
> > >https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk)
> > > (as per https://github.com/amplab/shark/wiki/Hive-Patches)
> > >
> > > The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11
> > > work is underway), and having developers apply the patches and build
> > > their own version of Hive is an extra step that can be avoided.
> > >
> > > One way to address this is to publish Shark-specific versions of Hive
> > > artifacts that have all the needed patches applied to the stock release.
> > > That way downstream projects can simply reference org.apache.hive with
> > > version 0.9.0-shark-0.7 instead of building Hive locally every time.
> > >
> > > Perhaps this approach is a little overkill; alternatively, would the Hive
> > > community be willing to consider maintenance releases of Hive 0.9.1 and
> > > perhaps 0.11.1 to include the fixes needed by the Shark project?
> > >
> > > I am willing to step up and produce Hive release bits if any of the
> > > committers here can help with publishing.
> > >
> > > --
> > > Thanks in advance,
> > >   Cos
> > >
> >
> >
> >




Re: custom Hive artifacts for Shark project

2013-08-25 Thread Edward Capriolo
I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive?
Do you just include Hive components from Maven, or does the project somehow
incorporate our build infrastructure?


On Sun, Aug 25, 2013 at 7:42 PM, Konstantin Boudnik  wrote:

> Guys,
>
> Considering the absence of input, I take it that it really doesn't matter
> which way the custom artifacts are published. Is that a correct
> impression?
>
> My first choice would be
> org.apache.hive.hive-common;0.9-shark0.7
> org.apache.hive.hive-cli;0.9-shark0.7
> artifacts.
> If this meets with objections from the community here, then I'd like to
> proceed with
> org.shark-project.hive-common;0.9.0
> org.shark-project.hive-cli;0.9.0
>
> Either way, the artifacts should be published to Maven Central to make
> them readily available to the development community.
>
> Thoughts?
> Regards,
>   Cos
>
> On Sat, Aug 10, 2013 at 10:08PM, Konstantin Boudnik wrote:
> > Guys,
> >
> > I am trying to help the Spark/Shark community (spark-project.org and now
> > http://incubator.apache.org/projects/spark) with a predicament. Shark -
> > also known as Hive on Spark - uses some parts of Hive, i.e. the HQL
> > parser, query optimizer, serdes, and codecs.
> >
> > In order to improve some known issues with performance and/or concurrency,
> > Shark developers need to apply a couple of patches on top of stock Hive:
> >https://issues.apache.org/jira/browse/HIVE-2891
> >https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk)
> > (as per https://github.com/amplab/shark/wiki/Hive-Patches)
> >
> > The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11
> > work is underway), and having developers apply the patches and build
> > their own version of Hive is an extra step that can be avoided.
> >
> > One way to address this is to publish Shark-specific versions of Hive
> > artifacts that have all the needed patches applied to the stock release.
> > That way downstream projects can simply reference org.apache.hive with
> > version 0.9.0-shark-0.7 instead of building Hive locally every time.
> >
> > Perhaps this approach is a little overkill; alternatively, would the Hive
> > community be willing to consider maintenance releases of Hive 0.9.1 and
> > perhaps 0.11.1 to include the fixes needed by the Shark project?
> >
> > I am willing to step up and produce Hive release bits if any of the
> > committers here can help with publishing.
> >
> > --
> > Thanks in advance,
> >   Cos
> >
>
>
>


[jira] [Updated] (HIVE-5146) FilterExprOrExpr changes the order of the rows

2013-08-25 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5146:
---

Attachment: HIVE-5146.2.patch

Updated patch with fixes in the tests. Some tests need to be fixed because of 
the change in the order of rows. Also, due to the change in order, double 
computations return slightly different results. 
With this patch, the expected results match exactly with the non-vector-mode 
computation.
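(As a side note, here is a tiny self-contained illustration of why row order changes double results - plain Java, not code from the patch:)

{noformat}
public class FpOrder {
  public static void main(String[] args) {
    double a = 1e20, b = -1e20, c = 1.0;
    // Floating-point addition is not associative, so summing the same
    // values in a different row order can give a different result.
    System.out.println((a + b) + c);  // 1.0
    System.out.println(a + (b + c));  // 0.0, because b + c rounds to -1e20
  }
}
{noformat}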

> FilterExprOrExpr changes the order of the rows
> --
>
> Key: HIVE-5146
> URL: https://issues.apache.org/jira/browse/HIVE-5146
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch
>
>
> FilterExprOrExpr changes the order of the rows which might break some UDFs 
> that assume an order in data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5087) Rename npath UDF

2013-08-25 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749807#comment-13749807
 ] 

Alan Gates commented on HIVE-5087:
--

From the last [Hive 
report|http://www.apache.org/foundation/records/minutes/2013/board_minutes_2013_06_19.txt] 
to the Apache board:

* In late May Teradata requested that the project remove a UDF
  ('npath') which was included in the 0.11.0 release. Teradata
  alleges that this UDF violates a US patent they hold as well
  as their common law trademark. The Hive PMC has referred this issue
  to the ASF Legal Board.


> Rename npath UDF
> 
>
> Key: HIVE-5087
> URL: https://issues.apache.org/jira/browse/HIVE-5087
> Project: Hive
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: HIVE-5087.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5087) Rename npath UDF

2013-08-25 Thread Alex Breshears (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749806#comment-13749806
 ] 

Alex Breshears commented on HIVE-5087:
--

I guess I could read the patch to get the new name :) regex_path it is.

> Rename npath UDF
> 
>
> Key: HIVE-5087
> URL: https://issues.apache.org/jira/browse/HIVE-5087
> Project: Hive
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: HIVE-5087.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5087) Rename npath UDF

2013-08-25 Thread Alex Breshears (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749796#comment-13749796
 ] 

Alex Breshears commented on HIVE-5087:
--

Couple quick questions: what's driving the rename, and what will the new 
function be named?

> Rename npath UDF
> 
>
> Key: HIVE-5087
> URL: https://issues.apache.org/jira/browse/HIVE-5087
> Project: Hive
>  Issue Type: Bug
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: HIVE-5087.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns

2013-08-25 Thread Yin Huai (JIRA)
Yin Huai created HIVE-5149:
--

 Summary: ReduceSinkDeDuplication can pick the wrong partitioning 
columns
 Key: HIVE-5149
 URL: https://issues.apache.org/jira/browse/HIVE-5149
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai


https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-25 Thread Yin Huai
Created a jira https://issues.apache.org/jira/browse/HIVE-5149


On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai  wrote:

> Seems ReduceSinkDeDuplication picked the wrong partitioning columns.
>
>
> On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP  wrote:
>
>> I think the problem lies within the group-by operation. For this
>> optimization to work, the group-by's partitioning should be on column
>> 1 only.
>>
>> It won't affect the correctness of the group by; it can make it slower, but
>> in this case it will speed up the overall query performance.
>>
>>
>> On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia <
>> mchett...@rocketfuelinc.com> wrote:
>>
>>> I have attached the hive 10 and 11 query plans, for the sample query
>>> below, for illustration.
>>>
>>>
>>> On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia <
>>> mchett...@rocketfuelinc.com> wrote:
>>>
 Hi,

 We are using DISTRIBUTE BY with custom reducer scripts in our query
 workload.

 After the upgrade to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY
 and custom reducer scripts produced incorrect results. In particular, rows
 with the same value of the DISTRIBUTE BY column end up in multiple reducers
 and thus produce multiple rows in the final result, when we expect only one.

 I investigated a little bit and discovered the following behavior for
 Hive 0.11:

 - Hive 0.11 produces a different plan for these queries with incorrect
 results. The extra stage for the DISTRIBUTE BY + Transform is missing and
 the Transform operator for the custom reducer script is pushed into the
 reduce operator tree containing GROUP BY itself.

 - However, *if the SORT BY in the query has a DESC order in it*, the
 right plan is produced, and the results look correct too.

 Hive 0.10 produces the expected plan with right results in all cases.


 To illustrate, here is a simplified repro setup:

 Table:

 CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3
 STRING, val4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES
 TERMINATED BY '\n' STORED AS TEXTFILE;

 Query:

 ADD FILE reducer.py;

 FROM (
   SELECT grp, val2
   FROM test_cluster
   GROUP BY grp, val2
   DISTRIBUTE BY grp
   SORT BY grp, val2  -- add DESC here to get correct results
 ) a
 REDUCE a.*
 USING 'reducer.py'
 AS grp, reducedValue


 If I understand correctly, this is a bug. Is this a known issue? Any
 other insights? We have reverted to Hive 0.10 to avoid the incorrect
 results while we investigate this.

 I have the repro sample, with test data and scripts, if anybody is
 interested.



 Thanks,
 pala

>>>
>>>
>>
>


Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-25 Thread Yin Huai
Seems ReduceSinkDeDuplication picked the wrong partitioning columns.


On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP  wrote:

> I think the problem lies within the group-by operation. For this
> optimization to work, the group-by's partitioning should be on column 1
> only.
>
> It won't affect the correctness of the group by; it can make it slower, but
> in this case it will speed up the overall query performance.
>
>
> On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia <
> mchett...@rocketfuelinc.com> wrote:
>
>> I have attached the hive 10 and 11 query plans, for the sample query
>> below, for illustration.
>>
>>
>> On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia <
>> mchett...@rocketfuelinc.com> wrote:
>>
>>> Hi,
>>>
>>> We are using DISTRIBUTE BY with custom reducer scripts in our query
>>> workload.
>>>
>>> After the upgrade to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY
>>> and custom reducer scripts produced incorrect results. In particular, rows
>>> with the same value of the DISTRIBUTE BY column end up in multiple reducers
>>> and thus produce multiple rows in the final result, when we expect only one.
>>>
>>> I investigated a little bit and discovered the following behavior for
>>> Hive 0.11:
>>>
>>> - Hive 0.11 produces a different plan for these queries with incorrect
>>> results. The extra stage for the DISTRIBUTE BY + Transform is missing and
>>> the Transform operator for the custom reducer script is pushed into the
>>> reduce operator tree containing GROUP BY itself.
>>>
>>> - However, *if the SORT BY in the query has a DESC order in it*, the
>>> right plan is produced, and the results look correct too.
>>>
>>> Hive 0.10 produces the expected plan with right results in all cases.
>>>
>>>
>>> To illustrate, here is a simplified repro setup:
>>>
>>> Table:
>>>
>>> CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3
>>> STRING, val4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES
>>> TERMINATED BY '\n' STORED AS TEXTFILE;
>>>
>>> Query:
>>>
>>> ADD FILE reducer.py;
>>>
>>> FROM (
>>>   SELECT grp, val2
>>>   FROM test_cluster
>>>   GROUP BY grp, val2
>>>   DISTRIBUTE BY grp
>>>   SORT BY grp, val2  -- add DESC here to get correct results
>>> ) a
>>> REDUCE a.*
>>> USING 'reducer.py'
>>> AS grp, reducedValue
>>>
>>>
>>> If I understand correctly, this is a bug. Is this a known issue? Any
>>> other insights? We have reverted to Hive 0.10 to avoid the incorrect
>>> results while we investigate this.
>>>
>>> I have the repro sample, with test data and scripts, if anybody is
>>> interested.
>>>
>>>
>>>
>>> Thanks,
>>> pala
>>>
>>
>>
>


[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-25 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749766#comment-13749766
 ] 

Yin Huai commented on HIVE-4002:


[~appodictic] Sorry for jumping in late. It seems the changes in DemuxOperator 
and MuxOperator will break plans optimized by the Correlation Optimizer. Let me 
take a look and leave my comments on Phabricator.

> Fetch task aggregation for simple group by query
> 
>
> Key: HIVE-4002
> URL: https://issues.apache.org/jira/browse/HIVE-4002
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
> HIVE-4002.D8739.3.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*) 
> from src) execute the final aggregation in a single reduce task. But that is 
> too small a job even for a single reducer, because most UDAFs generate just a 
> single row for map aggregation. If the final fetch task can aggregate the 
> outputs from the map tasks, the shuffling time can be removed.
> This optimization transforms an operator tree like
> TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
> into 
> TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
> With the patch, the time taken for the auto_join_filters.q test dropped to 6 
> min (from 10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: custom Hive artifacts for Shark project

2013-08-25 Thread Konstantin Boudnik
Guys,

Considering the absence of input, I take it that it really doesn't matter
which way the custom artifacts are published. Is that a correct impression?

My first choice would be
org.apache.hive.hive-common;0.9-shark0.7
org.apache.hive.hive-cli;0.9-shark0.7
artifacts.
If this meets with objections from the community here, then I'd like to proceed
with 
org.shark-project.hive-common;0.9.0
org.shark-project.hive-cli;0.9.0

Either way, the artifacts should be published to Maven Central to make
them readily available to the development community.

Thoughts?
Regards,
  Cos

On Sat, Aug 10, 2013 at 10:08PM, Konstantin Boudnik wrote:
> Guys,
> 
> I am trying to help the Spark/Shark community (spark-project.org and now
> http://incubator.apache.org/projects/spark) with a predicament. Shark -
> also known as Hive on Spark - uses some parts of Hive, i.e. the HQL parser,
> query optimizer, serdes, and codecs. 
> 
> In order to improve some known issues with performance and/or concurrency,
> Shark developers need to apply a couple of patches on top of stock Hive:
>https://issues.apache.org/jira/browse/HIVE-2891
>https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk)
> (as per https://github.com/amplab/shark/wiki/Hive-Patches)
> 
> The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11
> work is underway), and having developers apply the patches and build
> their own version of Hive is an extra step that can be avoided. 
> 
> One way to address this is to publish Shark-specific versions of Hive artifacts
> that have all the needed patches applied to the stock release.  That way
> downstream projects can simply reference org.apache.hive with
> version 0.9.0-shark-0.7 instead of building Hive locally every time.
> 
> Perhaps this approach is a little overkill; alternatively, would the Hive
> community be willing to consider maintenance releases of Hive 0.9.1 and
> perhaps 0.11.1 to include the fixes needed by the Shark project?
> 
> I am willing to step up and produce Hive release bits if any of the committers
> here can help with publishing.
> 
> -- 
> Thanks in advance,
>   Cos
> 






[jira] [Commented] (HIVE-4963) Support in memory PTF partitions

2013-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749764#comment-13749764
 ] 

Hudson commented on HIVE-4963:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #380 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/380/])
HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517236)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java
* /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q
* 
/hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q
* /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out
* 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out


> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: New Feature
>  Components: PTF-Windowing
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.12.0
>
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch
>
>
> PTF partitions apply the defensive mode of assuming that partitions will not 
> fit in memory. Because of this there is a significant deserialization 
> overhead when accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on partition size and, in the case of windowing, on the number 
> of UDAFs and the window ranges. For example, for the following (admittedly 
> extreme) case the PTFOperator exec times went from 39 secs to 8 secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez

2013-08-25 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5148:
-

Attachment: HIVE-5148.1.patch

> Jam sessions w/ Tez
> ---
>
> Key: HIVE-5148
> URL: https://issues.apache.org/jira/browse/HIVE-5148
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: tez-branch
>
> Attachments: HIVE-5148.1.patch
>
>
> Tez introduced a session API that lets you reuse certain resources during a 
> session (AM, localized files, etc.).
> Hive needs to tie these into Hive sessions (for both CLI and HS2).
> NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez

2013-08-25 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5148:
-

Status: Patch Available  (was: Open)

> Jam sessions w/ Tez
> ---
>
> Key: HIVE-5148
> URL: https://issues.apache.org/jira/browse/HIVE-5148
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: tez-branch
>
> Attachments: HIVE-5148.1.patch
>
>
> Tez introduced a session API that lets you reuse certain resources during a 
> session (AM, localized files, etc.).
> Hive needs to tie these into Hive sessions (for both CLI and HS2).
> NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5148) Jam sessions w/ Tez

2013-08-25 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-5148:


 Summary: Jam sessions w/ Tez
 Key: HIVE-5148
 URL: https://issues.apache.org/jira/browse/HIVE-5148
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch


Tez introduced a session API that lets you reuse certain resources during a 
session (AM, localized files, etc.).

Hive needs to tie these into Hive sessions (for both CLI and HS2).

NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Trying to drive through https://issues.apache.org/jira/browse/HIVE-4002

2013-08-25 Thread Edward Capriolo
Hey all,
HIVE-4002 is something I would really like to get into trunk. This group-by
optimization can help a great many use cases.

It has happened a couple of times now that, whenever I go to review and commit
it, something else ends up touching the same code. It has been Patch Available
since February; if possible, could you sideline any commits that you suspect
may affect it until I can run the tests and get it committed.

TX


[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749714#comment-13749714
 ] 

Edward Capriolo commented on HIVE-4002:
---

{quote}
[edward@jackintosh hive-trunk]$ patch -p0 < D8739\?download\=true 
patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
Hunk #3 succeeded at 119 (offset 9 lines).
Hunk #4 succeeded at 679 (offset 26 lines).
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Hunk #1 succeeded at 3503 (offset -19 lines).
Hunk #2 succeeded at 3609 (offset -19 lines).
Hunk #3 succeeded at 3622 (offset -19 lines).
Hunk #4 succeeded at 3634 (offset -19 lines).
Hunk #5 succeeded at 3684 (offset -19 lines).
Hunk #6 succeeded at 3713 (offset -19 lines).
Hunk #7 succeeded at 3820 (offset -19 lines).
Hunk #8 succeeded at 6964 (offset -18 lines).
Hunk #9 succeeded at 6990 (offset -18 lines).
patching file ql/src/test/queries/clientpositive/fetch_aggregation.q
patching file ql/src/test/results/clientpositive/fetch_aggregation.q.out
patching file ql/src/test/results/compiler/plan/groupby1.q.xml
Hunk #5 succeeded at 1312 (offset -10 lines).
Hunk #6 succeeded at 1326 (offset -10 lines).
Hunk #7 succeeded at 1345 (offset -10 lines).
Hunk #8 succeeded at 1426 (offset -10 lines).
Hunk #9 succeeded at 1478 (offset -10 lines).
patching file ql/src/test/results/compiler/plan/groupby2.q.xml
Hunk #10 succeeded at 1087 (offset -10 lines).
Hunk #11 succeeded at 1428 (offset -10 lines).
Hunk #12 succeeded at 1482 (offset -10 lines).
Hunk #13 succeeded at 1508 (offset -10 lines).
Hunk #14 succeeded at 1541 (offset -10 lines).
Hunk #15 succeeded at 1618 (offset -10 lines).
Hunk #16 succeeded at 1647 (offset -10 lines).
Hunk #17 succeeded at 1715 (offset -10 lines).
Hunk #18 succeeded at 1734 (offset -10 lines).
Hunk #19 succeeded at 1819 (offset -10 lines).
Hunk #20 succeeded at 1832 (offset -10 lines).
patching file ql/src/test/results/compiler/plan/groupby3.q.xml
Hunk #8 succeeded at 1299 (offset -7 lines).
Hunk #9 succeeded at 1627 (offset -7 lines).
Hunk #10 succeeded at 1640 (offset -7 lines).
Hunk #11 succeeded at 1653 (offset -7 lines).
Hunk #12 succeeded at 1695 (offset -7 lines).
Hunk #13 succeeded at 1709 (offset -7 lines).
Hunk #14 succeeded at 1723 (offset -7 lines).
Hunk #15 succeeded at 1770 (offset -7 lines).
Hunk #16 succeeded at 1846 (offset -7 lines).
Hunk #17 succeeded at 1859 (offset -7 lines).
Hunk #18 succeeded at 1872 (offset -7 lines).
Hunk #19 succeeded at 1938 (offset -7 lines).
Hunk #20 succeeded at 2144 (offset -7 lines).
Hunk #21 succeeded at 2157 (offset -7 lines).
Hunk #22 succeeded at 2170 (offset -7 lines).
patching file ql/src/test/results/compiler/plan/groupby5.q.xml
Hunk #5 succeeded at 1175 (offset -10 lines).
Hunk #6 succeeded at 1189 (offset -10 lines).
Hunk #7 succeeded at 1208 (offset -10 lines).
Hunk #8 succeeded at 1295 (offset -10 lines).
Hunk #9 succeeded at 1347 (offset -10 lines).
patching file serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java

{quote}

This did not apply perfectly cleanly. Running the tests manually now.

> Fetch task aggregation for simple group by query
> 
>
> Key: HIVE-4002
> URL: https://issues.apache.org/jira/browse/HIVE-4002
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
> HIVE-4002.D8739.3.patch
>
>
> Aggregation queries with no group-by clause (for example, select count(*) 
> from src) executes final aggregation in single reduce task. But it's too 
> small even for single reducer because t

[jira] [Commented] (HIVE-3969) Session state for hive server should be cleanup

2013-08-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749712#comment-13749712
 ] 

Ashutosh Chauhan commented on HIVE-3969:


Now that HS2 is committed, which I believe does clean up its state between 
different sessions, this should no longer be a problem. Or do you still see 
this leak even with HS2?

> Session state for hive server should be cleanup
> ---
>
> Key: HIVE-3969
> URL: https://issues.apache.org/jira/browse/HIVE-3969
> Project: Hive
>  Issue Type: Bug
>  Components: Server Infrastructure
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-3969.D8325.1.patch
>
>
> Currently "add jar" command by clients are adding child ClassLoader to worker 
> thread cumulatively, causing various problems.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE

2013-08-25 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749706#comment-13749706
 ] 

Phabricator commented on HIVE-4375:
---

ashutoshc has accepted the revision "HIVE-4375 [jira] Single sourced multi 
insert consists of native and non-native table mixed throws NPE".

  +1

REVISION DETAIL
  https://reviews.facebook.net/D10329

BRANCH
  HIVE-4375

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, navis
Cc: njain


> Single sourced multi insert consists of native and non-native table mixed 
> throws NPE
> 
>
> Key: HIVE-4375
> URL: https://issues.apache.org/jira/browse/HIVE-4375
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch
>
>
> CREATE TABLE src_x1(key string, value string);
> CREATE TABLE src_x2(key string, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string");
> explain
> from src a
> insert overwrite table src_x1
> select key,value where a.key > 0 AND a.key < 50
> insert overwrite table src_x2
> select key,value where a.key > 50 AND a.key < 100;
> throws,
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>   at 
> org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4963) Support in memory PTF partitions

2013-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749703#comment-13749703
 ] 

Hudson commented on HIVE-4963:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2288 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2288/])
HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517236)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java
* /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q
* 
/hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q
* /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out
* 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out


> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: New Feature
>  Components: PTF-Windowing
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.12.0
>
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch
>
>
> PTF partitions take the defensive approach of assuming that partitions will 
> not fit in memory. Because of this there is a significant deserialization 
> overhead when accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on the partition size and, in the case of windowing, on the 
> number of UDAFs and the window ranges. For example, in the following 
> (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}
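
As a usage sketch (assuming the option lands with the name given above and is 
settable per session; this is an illustration, not text from the patch):

{noformat}
-- hypothetical session usage; the option name is taken from the description
-- above. Leaving it unset keeps the defensive (spill-to-disk) behavior.
set hive.ptf.partition.fits.in.mem=true;

select t, s, i, b, f, d,
min(t) over(partition by 1 rows between unbounded preceding and current row)
from over10k;
{noformat}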

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced

2013-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749696#comment-13749696
 ] 

Edward Capriolo commented on HIVE-4964:
---

Also, when possible, avoid Stack:

{quote}
Stack fnDefs = new Stack();
{quote}
instead use
{quote}
Deque d = new ArrayDeque();
{quote}

Stack is synchronized and has overhead. (I know some things in Hive use Stack 
already, so this is sometimes unavoidable.)
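
A minimal runnable sketch of the suggested swap; the String element type is a 
placeholder, since the generic parameter was stripped from the quoted snippet:

{noformat}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Stack;

public class StackVsDeque {
  public static void main(String[] args) {
    // Legacy collection: every push/pop goes through a synchronized method.
    Stack<String> fnDefsOld = new Stack<String>();
    fnDefsOld.push("a");

    // Unsynchronized replacement with the same LIFO push/pop semantics.
    Deque<String> fnDefs = new ArrayDeque<String>();
    fnDefs.push("a");
    fnDefs.push("b");
    System.out.println(fnDefs.pop()); // prints "b"
  }
}
{noformat}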

> Cleanup PTF code: remove code dealing with non-standard SQL behavior we had 
> originally introduced
> ---
>
> Key: HIVE-4964
> URL: https://issues.apache.org/jira/browse/HIVE-4964
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
>Priority: Minor
> Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch
>
>
> There are still pieces of code that deal with:
> - supporting select expressions with Windowing
> - supporting a filter with windowing
> Need to do this before introducing perf improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HIVE-4964) Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced

2013-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749694#comment-13749694
 ] 

Edward Capriolo edited comment on HIVE-4964 at 8/25/13 5:10 PM:


One more cleanup: please remove 'while (true)' + 'break' constructs unless they 
are needed. They do not read well, and introducing break logic is generally 
discouraged.

{quote}
while (true) {
  if (iDef instanceof PartitionedTableFunctionDef) {
{quote}

Instead try:
{quote}
Item found = null;
while (found == null) {
}
{quote}
or even better
{quote}
for (Item item : list) {
  if (matchesCriteria(item)) {
    return item;
  }
}
{quote}
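
A self-contained sketch of the "return on first match" shape, using String and 
a prefix test as stand-ins for the actual item type and matchesCriteria:

{noformat}
import java.util.Arrays;
import java.util.List;

public class FindFirst {
  // Return on the first match instead of looping forever and breaking out.
  static String findFirstMatch(List<String> items, String prefix) {
    for (String item : items) {
      if (item.startsWith(prefix)) { // stand-in for matchesCriteria(item)
        return item;
      }
    }
    return null; // no match
  }

  public static void main(String[] args) {
    List<String> items = Arrays.asList("foo", "bar", "baz");
    System.out.println(findFirstMatch(items, "ba")); // prints "bar"
  }
}
{noformat}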

  was (Author: appodictic):
One more cleanup: please remove 'while (true)' + 'break' constructs unless 
they are needed. They do not read well, and introducing break logic is 
generally discouraged.

{quote}
while (true) {
  if (iDef instanceof PartitionedTableFunctionDef) {
{quote}

Instead try:
{quote}
Item found = null;
while (found != null) {
}
{quote}
or even better
{quote}
for (Item item : list) {
  if (matchesCriteria(item)) {
    return item;
  }
}
{quote}
  
> Cleanup PTF code: remove code dealing with non-standard SQL behavior we had 
> originally introduced
> ---
>
> Key: HIVE-4964
> URL: https://issues.apache.org/jira/browse/HIVE-4964
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
>Priority: Minor
> Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch
>
>
> There are still pieces of code that deal with:
> - supporting select expressions with Windowing
> - supporting a filter with windowing
> Need to do this before introducing perf improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced

2013-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749694#comment-13749694
 ] 

Edward Capriolo commented on HIVE-4964:
---

One more cleanup: please remove 'while (true)' + 'break' constructs unless they 
are needed. They do not read well, and introducing break logic is generally 
discouraged.

{quote}
while (true) {
  if (iDef instanceof PartitionedTableFunctionDef) {
{quote}

Instead try:
{quote}
Item found = null;
while (found == null) {
}
{quote}
or even better
{quote}
for (Item item : list) {
  if (matchesCriteria(item)) {
    return item;
  }
}
{quote}

> Cleanup PTF code: remove code dealing with non-standard SQL behavior we had 
> originally introduced
> ---
>
> Key: HIVE-4964
> URL: https://issues.apache.org/jira/browse/HIVE-4964
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
>Priority: Minor
> Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch
>
>
> There are still pieces of code that deal with:
> - supporting select expressions with Windowing
> - supporting a filter with windowing
> Need to do this before introducing perf improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4963) Support in memory PTF partitions

2013-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749572#comment-13749572
 ] 

Hudson commented on HIVE-4963:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #137 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/137/])
HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517236)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java
* /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q
* /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out


> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: New Feature
>  Components: PTF-Windowing
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.12.0
>
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch
>
>
> PTF partitions take the defensive approach of assuming that partitions will 
> not fit in memory. Because of this there is a significant deserialization 
> overhead when accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on the partition size and, in the case of windowing, on the 
> number of UDAFs and the window ranges. For example, in the following 
> (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4963) Support in memory PTF partitions

2013-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749567#comment-13749567
 ] 

Hudson commented on HIVE-4963:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #69 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/69/])
HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517236)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java
* /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q
* /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out


> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: New Feature
>  Components: PTF-Windowing
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.12.0
>
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch
>
>
> PTF partitions take the defensive approach of assuming that partitions will 
> not fit in memory. Because of this there is a significant deserialization 
> overhead when accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on the partition size and, in the case of windowing, on the 
> number of UDAFs and the window ranges. For example, in the following 
> (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk

2013-08-25 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-5147:
--

 Summary: Newly added test TestSessionHooks is failing on trunk
 Key: HIVE-5147
 URL: https://issues.apache.org/jira/browse/HIVE-5147
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan


This was recently added via HIVE-4588

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non-standard SQL behavior we had originally introduced

2013-08-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749565#comment-13749565
 ] 

Ashutosh Chauhan commented on HIVE-4964:


[~rhbutani] Patch is not applying cleanly. Can you rebase it on the trunk?

> Cleanup PTF code: remove code dealing with non-standard SQL behavior we had 
> originally introduced
> ---
>
> Key: HIVE-4964
> URL: https://issues.apache.org/jira/browse/HIVE-4964
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
>Priority: Minor
> Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch
>
>
> There are still pieces of code that deal with:
> - supporting select expressions with Windowing
> - supporting a filter with windowing
> Need to do this before introducing perf improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4963) Support in memory PTF partitions

2013-08-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4963.


   Resolution: Fixed
Fix Version/s: 0.12.0

Committed to trunk. Thanks, Harish!

> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: New Feature
>  Components: PTF-Windowing
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.12.0
>
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch
>
>
> PTF partitions take the defensive approach of assuming that partitions will 
> not fit in memory. Because of this there is a significant deserialization 
> overhead when accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on the partition size and, in the case of windowing, on the 
> number of UDAFs and the window ranges. For example, in the following 
> (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira