[jira] [Resolved] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4963. Resolution: Fixed Fix Version/s: 0.12.0 Committed to trunk. Thanks, Harish! Support in memory PTF partitions Key: HIVE-4963 URL: https://issues.apache.org/jira/browse/HIVE-4963 Project: Hive Issue Type: New Feature Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.12.0 Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch PTF partitions apply the defensive mode of assuming that partitions will not fit in memory. Because of this there is a significant deserialization overhead when accessing elements. Allow the user to specify that there is enough memory to hold partitions through a 'hive.ptf.partition.fits.in.mem' option. Savings depends on partition size and in case of windowing the number of UDAFs and the window ranges. For eg for the following (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs. {noformat} select t, s, i, b, f, d, min(t) over(partition by 1 rows between unbounded preceding and current row), min(s) over(partition by 1 rows between unbounded preceding and current row), min(i) over(partition by 1 rows between unbounded preceding and current row), min(b) over(partition by 1 rows between unbounded preceding and current row) from over10k {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
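The tradeoff the issue describes — paying deserialization cost on every element access versus holding live objects — can be sketched outside Hive. This is a hedged illustration with hypothetical names, not the actual PTFRowContainer/PTFPartition API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class RowAccessSketch {
    // Disk-safe mode: rows are held serialized, so every read pays a
    // full deserialization round-trip (the overhead HIVE-4963 avoids).
    static Object readSerialized(byte[] rowBytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(rowBytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory mode: rows kept as live objects; a read is a plain list lookup.
        List<String> inMemory = new ArrayList<>();
        inMemory.add("row-0");
        String fast = inMemory.get(0); // no deserialization on access

        // Serialized mode: the same row, but each access goes through bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject("row-0");
        }
        String slow = (String) readSerialized(buf.toByteArray());

        System.out.println(fast.equals(slow)); // same data, very different per-access cost
    }
}
```

With many window functions scanning the same partition repeatedly (as in the four `min(...) over(...)` query above), the per-access cost is multiplied, which is consistent with the reported 39s-to-8s improvement.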
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749565#comment-13749565 ] Ashutosh Chauhan commented on HIVE-4964: [~rhbutani] The patch is not applying cleanly. Can you rebase it on trunk? Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Priority: Minor Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements.
[jira] [Created] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk
Ashutosh Chauhan created HIVE-5147: -- Summary: Newly added test TestSessionHooks is failing on trunk Key: HIVE-5147 URL: https://issues.apache.org/jira/browse/HIVE-5147 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.12.0 Reporter: Ashutosh Chauhan This was recently added via HIVE-4588
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749567#comment-13749567 ] Hudson commented on HIVE-4963: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #69 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/69/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749572#comment-13749572 ] Hudson commented on HIVE-4963: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #137 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/137/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out
[jira] [Comment Edited] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749694#comment-13749694 ] Edward Capriolo edited comment on HIVE-4964 at 8/25/13 5:10 PM: One more cleanup: please remove 'while (true)' + 'break' constructs unless they are needed. They do not read well, and introducing break logic is generally discouraged. {quote} while (true) { if (iDef instanceof PartitionedTableFunctionDef) { {quote} Instead try: {quote} Item found = null; while (found == null) { } {quote} or even better {quote} for (Item item : list) { if (matchesCriteria(item)) { return item; } } {quote} was (Author: appodictic): One more cleanup: Please remove 'while (true)' + 'break' constructs unless they are needed. The do not read well and introducing break logic is generally not suggested. {quote} while (true) { if (iDef instanceof PartitionedTableFunctionDef) { {quote} Instead try: {quote} Item found == null; while (found!=null){ } {quote} or even better {quote} for (item: list){ if (matchesCriteria(item) ){ return item; } } {quote}
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749694#comment-13749694 ] Edward Capriolo commented on HIVE-4964: --- One more cleanup: please remove 'while (true)' + 'break' constructs unless they are needed. They do not read well, and introducing break logic is generally discouraged. {quote} while (true) { if (iDef instanceof PartitionedTableFunctionDef) { {quote} Instead try: {quote} Item found = null; while (found == null) { } {quote} or even better {quote} for (Item item : list) { if (matchesCriteria(item)) { return item; } } {quote}
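The loop-to-for refactor suggested in the comment can be made concrete. The sketch below uses hypothetical names (`findFirstMatch`, a `"ptf:"` prefix standing in for the `instanceof PartitionedTableFunctionDef` check); it is not the actual PTFTranslator code:

```java
import java.util.Arrays;
import java.util.List;

public class FindFirst {
    // Instead of while(true) + break, iterate and return on the first match.
    // The "ptf:" prefix is a hypothetical stand-in for the instanceof check.
    static String findFirstMatch(List<String> defs) {
        for (String def : defs) {
            if (def.startsWith("ptf:")) {
                return def;
            }
        }
        return null; // no match found
    }

    public static void main(String[] args) {
        List<String> defs = Arrays.asList("win:a", "ptf:b", "ptf:c");
        System.out.println(findFirstMatch(defs)); // prints "ptf:b"
    }
}
```

The early return makes the termination condition explicit, which is the readability point the comment is making about `break`.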
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749696#comment-13749696 ] Edward Capriolo commented on HIVE-4964: --- Also, when possible, avoid Stack: {quote} Stack<PartitionedTableFunctionDef> fnDefs = new Stack<PartitionedTableFunctionDef>(); {quote} instead use {quote} Deque<PartitionedTableFunctionDef> d = new ArrayDeque<PartitionedTableFunctionDef>(); {quote} Stack is synchronized and has overhead. (I know some things in Hive use Stack already, so this is sometimes unavoidable.)
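ArrayDeque is a drop-in replacement for Stack's LIFO usage without the per-call synchronization of the Vector-based java.util.Stack. A minimal sketch of the substitution:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeAsStack {
    public static void main(String[] args) {
        // Deque.push/pop/peek give the same LIFO behavior as Stack,
        // but ArrayDeque is unsynchronized (no monitor overhead per call).
        Deque<Integer> stack = new ArrayDeque<Integer>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack.pop());  // 3 (last in, first out)
        System.out.println(stack.peek()); // 2
    }
}
```

The Deque javadoc itself recommends ArrayDeque over Stack when a thread-safe structure is not needed, which is the overhead point made in the comment.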
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749703#comment-13749703 ] Hudson commented on HIVE-4963: -- FAILURE: Integrated in Hive-trunk-h0.21 #2288 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2288/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out
[jira] [Commented] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE
[ https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749706#comment-13749706 ] Phabricator commented on HIVE-4375: --- ashutoshc has accepted the revision HIVE-4375 [jira] Single sourced multi insert consists of native and non-native table mixed throws NPE. +1 REVISION DETAIL https://reviews.facebook.net/D10329 BRANCH HIVE-4375 ARCANIST PROJECT hive To: JIRA, ashutoshc, navis Cc: njain Single sourced multi insert consists of native and non-native table mixed throws NPE Key: HIVE-4375 URL: https://issues.apache.org/jira/browse/HIVE-4375 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch CREATE TABLE src_x1(key string, value string); CREATE TABLE src_x2(key string, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string"); explain from src a insert overwrite table src_x1 select key,value where a.key > 0 AND a.key < 50 insert overwrite table src_x2 select key,value where a.key > 50 AND a.key < 100; throws, {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {noformat}
[jira] [Commented] (HIVE-3969) Session state for hive server should be cleanup
[ https://issues.apache.org/jira/browse/HIVE-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749712#comment-13749712 ] Ashutosh Chauhan commented on HIVE-3969: Now that HS2 is committed, which I believe does clean up its state between different sessions, this should no longer be a problem. Or do you still see this leak even with HS2? Session state for hive server should be cleanup --- Key: HIVE-3969 URL: https://issues.apache.org/jira/browse/HIVE-3969 Project: Hive Issue Type: Bug Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3969.D8325.1.patch Currently, 'add jar' commands from clients cumulatively add child ClassLoaders to the worker thread, causing various problems.
[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749714#comment-13749714 ] Edward Capriolo commented on HIVE-4002: --- {quote} [edward@jackintosh hive-trunk]$ patch -p0 D8739\?download\=true patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java Hunk #3 succeeded at 119 (offset 9 lines). Hunk #4 succeeded at 679 (offset 26 lines). patching file ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java Hunk #1 succeeded at 3503 (offset -19 lines). Hunk #2 succeeded at 3609 (offset -19 lines). Hunk #3 succeeded at 3622 (offset -19 lines). Hunk #4 succeeded at 3634 (offset -19 lines). Hunk #5 succeeded at 3684 (offset -19 lines). Hunk #6 succeeded at 3713 (offset -19 lines). Hunk #7 succeeded at 3820 (offset -19 lines). Hunk #8 succeeded at 6964 (offset -18 lines). 
Hunk #9 succeeded at 6990 (offset -18 lines). patching file ql/src/test/queries/clientpositive/fetch_aggregation.q patching file ql/src/test/results/clientpositive/fetch_aggregation.q.out patching file ql/src/test/results/compiler/plan/groupby1.q.xml Hunk #5 succeeded at 1312 (offset -10 lines). Hunk #6 succeeded at 1326 (offset -10 lines). Hunk #7 succeeded at 1345 (offset -10 lines). Hunk #8 succeeded at 1426 (offset -10 lines). Hunk #9 succeeded at 1478 (offset -10 lines). patching file ql/src/test/results/compiler/plan/groupby2.q.xml Hunk #10 succeeded at 1087 (offset -10 lines). Hunk #11 succeeded at 1428 (offset -10 lines). Hunk #12 succeeded at 1482 (offset -10 lines). Hunk #13 succeeded at 1508 (offset -10 lines). Hunk #14 succeeded at 1541 (offset -10 lines). Hunk #15 succeeded at 1618 (offset -10 lines). Hunk #16 succeeded at 1647 (offset -10 lines). Hunk #17 succeeded at 1715 (offset -10 lines). Hunk #18 succeeded at 1734 (offset -10 lines). Hunk #19 succeeded at 1819 (offset -10 lines). Hunk #20 succeeded at 1832 (offset -10 lines). patching file ql/src/test/results/compiler/plan/groupby3.q.xml Hunk #8 succeeded at 1299 (offset -7 lines). Hunk #9 succeeded at 1627 (offset -7 lines). Hunk #10 succeeded at 1640 (offset -7 lines). Hunk #11 succeeded at 1653 (offset -7 lines). Hunk #12 succeeded at 1695 (offset -7 lines). Hunk #13 succeeded at 1709 (offset -7 lines). Hunk #14 succeeded at 1723 (offset -7 lines). Hunk #15 succeeded at 1770 (offset -7 lines). Hunk #16 succeeded at 1846 (offset -7 lines). Hunk #17 succeeded at 1859 (offset -7 lines). Hunk #18 succeeded at 1872 (offset -7 lines). Hunk #19 succeeded at 1938 (offset -7 lines). Hunk #20 succeeded at 2144 (offset -7 lines). Hunk #21 succeeded at 2157 (offset -7 lines). Hunk #22 succeeded at 2170 (offset -7 lines). patching file ql/src/test/results/compiler/plan/groupby5.q.xml Hunk #5 succeeded at 1175 (offset -10 lines). Hunk #6 succeeded at 1189 (offset -10 lines). 
Hunk #7 succeeded at 1208 (offset -10 lines). Hunk #8 succeeded at 1295 (offset -10 lines). Hunk #9 succeeded at 1347 (offset -10 lines). patching file serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java {quote} This did not patch perfectly cleanly. Running the tests manually now. Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, HIVE-4002.D8739.3.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute final aggregation in a single reduce task. But it's too small even for a single reducer because most of the UDAF
Trying to drive through https://issues.apache.org/jira/browse/HIVE-4002
Hey all, HIVE-4002 is something I would really like to get into trunk. This group-by optimization can help very many use cases. A couple of times now, every time I go to review and commit it, something else ends up touching the same code it touches. It has been patch-available since February; if possible, could you sideline any commits that you suspect may affect it until I can run the tests and get it committed. TX
[jira] [Created] (HIVE-5148) Jam sessions w/ Tez
Gunther Hagleitner created HIVE-5148: Summary: Jam sessions w/ Tez Key: HIVE-5148 URL: https://issues.apache.org/jira/browse/HIVE-5148 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Tez introduced a session API that lets you reuse certain resources during a session (AM, localized files, etc). Hive needs to tie these into Hive sessions (for both CLI and HS2) NO PRECOMMIT TESTS (this is wip for the tez branch)
[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez
[ https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5148: - Attachment: HIVE-5148.1.patch
[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez
[ https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5148: - Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749764#comment-13749764 ] Hudson commented on HIVE-4963: -- ABORTED: Integrated in Hive-trunk-hadoop2 #380 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/380/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out Support in memory PTF partitions Key: HIVE-4963 URL: https://issues.apache.org/jira/browse/HIVE-4963 Project: Hive Issue Type: New Feature Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.12.0 Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch PTF partitions apply the defensive mode of assuming that partitions will not fit in memory. Because of this there is a significant deserialization overhead when accessing elements. Allow the user to specify that there is enough memory to hold partitions through a 'hive.ptf.partition.fits.in.mem' option. Savings depend on partition size and, in the case of windowing, on the number of UDAFs and the window ranges. For example, for the following (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs. {noformat} select t, s, i, b, f, d, min(t) over(partition by 1 rows between unbounded preceding and current row), min(s) over(partition by 1 rows between unbounded preceding and current row), min(i) over(partition by 1 rows between unbounded preceding and current row), min(b) over(partition by 1 rows between unbounded preceding and current row) from over10k {noformat}
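To see where the defensive mode's overhead comes from, here is a small, self-contained sketch (Python used purely for illustration; `SpillingPartition` and `InMemoryPartition` are hypothetical stand-ins, not Hive's actual PTFPartition/PTFRowContainer classes). A windowing function over an unbounded preceding frame reads partition rows repeatedly, so paying a deserialization on every read dominates; an in-memory partition makes each read a plain list access.

```python
import pickle

class SpillingPartition:
    """Defensive mode: rows are kept serialized; every read pays a deserialize."""
    def __init__(self):
        self._rows = []
    def append(self, row):
        self._rows.append(pickle.dumps(row))
    def get(self, i):
        return pickle.loads(self._rows[i])  # deserialization on every access

class InMemoryPartition:
    """'fits in memory' mode: rows are stored as live objects."""
    def __init__(self):
        self._rows = []
    def append(self, row):
        self._rows.append(row)
    def get(self, i):
        return self._rows[i]  # plain list access

def running_min(part, n):
    # A windowing UDAF such as min() over "rows between unbounded preceding
    # and current row" touches the partition once per output row, so the
    # per-access cost is multiplied by the partition size.
    out, cur = [], None
    for i in range(n):
        v = part.get(i)
        cur = v if cur is None else min(cur, v)
        out.append(cur)
    return out
```

Both containers produce identical results; only the cost of `get()` differs, which is why the option is a pure performance knob.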
Re: custom Hive artifacts for Shark project
Guys, considering the absence of input, I take it that it really doesn't matter which way the custom artifact will be published. Is that a correct impression? My first choice would be org.apache.hive.hive-common;0.9-shark0.7 org.apache.hive.hive-cli;0.9-shark0.7 artifacts. If this meets objections from the community here, then I'd like to proceed with org.shark-project.hive-common;0.9.0 org.shark-project.hive-cli;0.9.0 Either way, the artifacts should be published to Maven Central to make them readily available to the development community. Thoughts? Regards, Cos On Sat, Aug 10, 2013 at 10:08PM, Konstantin Boudnik wrote: Guys, I am trying to help the Spark/Shark community (spark-project.org and now http://incubator.apache.org/projects/spark) with a predicament. Shark - that's also known as Hive on Spark - is using some parts of Hive, i.e. the HQL parser, query optimizer, serdes, and codecs. In order to improve some known issues with performance and/or concurrency, Shark developers need to apply a couple of patches on top of the stock Hive: https://issues.apache.org/jira/browse/HIVE-2891 https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk) (as per https://github.com/amplab/shark/wiki/Hive-Patches) The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11 work is underway), and having developers apply the patches and build their own version of Hive is an extra step that can be avoided. One way to address it is to publish Shark-specific versions of Hive artifacts that would have all needed patches applied to the stock release. This way downstream projects can simply reference org.apache.hive with version 0.9.0-shark-0.7 instead of building Hive locally every time. Perhaps this approach is a little overkill; alternatively, would the Hive community be willing to consider a maintenance release of Hive 0.9.1, and perhaps 0.11.1, to include the fixes needed by the Shark project? 
I am willing to step up and produce Hive release bits if any of the committers here can help with publishing. -- Thanks in advance, Cos
[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749766#comment-13749766 ] Yin Huai commented on HIVE-4002: [~appodictic] Sorry for jumping in late. It seems the changes in DemuxOperator and MuxOperator will break plans optimized by the Correlation Optimizer. Let me take a look and leave my comments on phabricator. Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, HIVE-4002.D8739.3.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But the job is too small even for a single reducer, because most UDAFs generate just a single row per map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, shuffling time can be removed. This optimization transforms an operator tree like TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS) With the patch, time taken for the auto_join_filters.q test reduced to 6 min (from 10 min before).
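The plan rewrite described in HIVE-4002 boils down to replacing the shuffle-to-one-reducer step with a final merge done by the client-side fetch task. A minimal sketch of that idea (illustrative Python, not Hive code; the function names are made up):

```python
# GBY1: map-side hash aggregation produces one partial result per map task.
def map_side_partial_count(rows):
    return sum(1 for _ in rows)

# GBY2 running in the fetch task: merge the per-map partials directly,
# instead of shuffling them to a single reduce task.
def fetch_task_final_count(partials):
    return sum(partials)

splits = [[1, 2, 3], [4, 5], [6]]                      # three map tasks' inputs
partials = [map_side_partial_count(s) for s in splits]  # [3, 2, 1]
total = fetch_task_final_count(partials)                # select count(*) => 6
```

The shuffle can be skipped here precisely because each map task emits only one tiny row, so the "reduce" work is cheap enough to run in the fetch task.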
Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases
Seems ReduceSinkDeDuplication picked the wrong partitioning columns. On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP s...@rocketfuel.com wrote: I think the problem lies within the group by operation. For this optimization to work, the GROUP BY's partitioning should be on column 1 only. It won't affect the correctness of the GROUP BY; it can make it slower, but in this case it will speed up the overall query. On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: I have attached the hive 10 and 11 query plans, for the sample query below, for illustration. On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: Hi, We are using DISTRIBUTE BY with custom reducer scripts in our query workload. After upgrading to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY and custom reducer scripts produced incorrect results. In particular, rows with the same value in the DISTRIBUTE BY column end up in multiple reducers and thus produce multiple rows in the final result, when we expect only one. I investigated a little bit and discovered the following behavior for Hive 0.11: - Hive 0.11 produces a different plan for these queries with incorrect results. The extra stage for the DISTRIBUTE BY + Transform is missing, and the Transform operator for the custom reducer script is pushed into the reduce operator tree containing the GROUP BY itself. - However, *if the SORT BY in the query has a DESC order in it*, the right plan is produced, and the results look correct too. Hive 0.10 produces the expected plan with the right results in all cases. 
To illustrate, here is a simplified repro setup: Table: CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3 STRING, val4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE; Query: ADD FILE reducer.py; FROM ( SELECT grp, val2 FROM test_cluster GROUP BY grp, val2 DISTRIBUTE BY grp SORT BY grp, val2 -- add DESC here to get correct results ) a REDUCE a.* USING 'reducer.py' AS grp, reducedValue If I understand correctly, this is a bug. Is this a known issue? Any other insights? We have reverted to Hive 0.10 to avoid the incorrect results while we investigate this. I have the repro sample, with test data and scripts, if anybody is interested. Thanks, pala
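As a rough illustration of the suspected behavior (a hypothetical simulation, not Hive's ReduceSinkDeDuplication code): if the merged reduce sink partitions on (grp, val2) instead of grp alone, rows that share a grp can be routed to different reducers, so a reducer script that emits one row per group can emit the same group more than once.

```python
def route(rows, key_cols, n_reducers=2):
    # Simulate shuffle routing: hash the chosen key columns to pick a reducer.
    buckets = [[] for _ in range(n_reducers)]
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        buckets[hash(key) % n_reducers].append(row)
    return buckets

rows = [{"grp": "a", "val2": 1}, {"grp": "a", "val2": 2}]

# Correct: DISTRIBUTE BY grp -- all rows of group 'a' reach one reducer,
# so the reducer script sees the whole group and emits one row for it.
correct = route(rows, ["grp"])

# Suspect: partitioning on (grp, val2) -- the two 'a' rows hash on different
# keys and may land on different reducers, each of which emits an 'a' row.
suspect = route(rows, ["grp", "val2"])
```

With the correct routing, exactly one bucket holds both 'a' rows; with the suspect routing the outcome depends on the hash values, which is how duplicate output rows can appear.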
Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases
Created a jira https://issues.apache.org/jira/browse/HIVE-5149 On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai huaiyin@gmail.com wrote: Seems ReduceSinkDeDuplication picked the wrong partitioning columns.
[jira] [Created] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
Yin Huai created HIVE-5149: -- Summary: ReduceSinkDeDuplication can pick the wrong partitioning columns Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Commented] (HIVE-5087) Rename npath UDF
[ https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749796#comment-13749796 ] Alex Breshears commented on HIVE-5087: -- A couple of quick questions: what's driving the rename, and what will the new function be named? Rename npath UDF Key: HIVE-5087 URL: https://issues.apache.org/jira/browse/HIVE-5087 Project: Hive Issue Type: Bug Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5087.patch.txt
[jira] [Commented] (HIVE-5087) Rename npath UDF
[ https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749807#comment-13749807 ] Alan Gates commented on HIVE-5087: -- From the last [Hive report|http://www.apache.org/foundation/records/minutes/2013/board_minutes_2013_06_19.txt] to the Apache board * In late May Teradata requested that the project remove a UDF ('npath') which was included in the 0.11.0 release. Teradata alleges that this UDF violates a US patent they hold as well as their common law trademark. The Hive PMC has referred this issue to the ASF Legal Board. Rename npath UDF Key: HIVE-5087 URL: https://issues.apache.org/jira/browse/HIVE-5087 Project: Hive Issue Type: Bug Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5087.patch.txt
[jira] [Updated] (HIVE-5146) FilterExprOrExpr changes the order of the rows
[ https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5146: --- Attachment: HIVE-5146.2.patch Updated patch with fixes in the tests. Some tests need to be fixed because of the change in the order of rows. Also, due to the change in order, double computations return slightly different results. With this patch, the expected results match exactly with the non-vector mode computation. FilterExprOrExpr changes the order of the rows -- Key: HIVE-5146 URL: https://issues.apache.org/jira/browse/HIVE-5146 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch FilterExprOrExpr changes the order of the rows, which might break some UDFs that assume an order in the data.
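The ordering problem can be reproduced in miniature. This is a hypothetical Python sketch, not the vectorized filter code: each child of the OR yields the indices of the rows it selects, and simply appending the second child's extra matches after the first child's output reorders rows, while merging the index sets in ascending order preserves the input order.

```python
def naive_or(sel_a, sel_b):
    # Rows matched only by the second predicate end up after all of the
    # first predicate's rows, regardless of their original position.
    seen = set(sel_a)
    return sel_a + [i for i in sel_b if i not in seen]

def order_preserving_or(sel_a, sel_b):
    # Union of the two selected-index lists, kept in ascending row order.
    return sorted(set(sel_a) | set(sel_b))

sel_a = [0, 3]   # rows passing the first predicate
sel_b = [1, 2]   # rows passing the second predicate
naive = naive_or(sel_a, sel_b)              # order broken: 3 precedes 1 and 2
fixed = order_preserving_or(sel_a, sel_b)   # original row order kept
```

Order-sensitive consumers (and floating-point aggregations, as the test-result differences above suggest) see different results under the two strategies even though the selected row set is identical.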
Re: custom Hive artifacts for Shark project
I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive? Do you just include Hive components from Maven, or does the project somehow incorporate our build infrastructure? On Sun, Aug 25, 2013 at 7:42 PM, Konstantin Boudnik c...@apache.org wrote: Guys, considering the absence of input, I take it that it really doesn't matter which way the custom artifact will be published. Is that a correct impression? My first choice would be org.apache.hive.hive-common;0.9-shark0.7 org.apache.hive.hive-cli;0.9-shark0.7 artifacts. If this meets objections from the community here, then I'd like to proceed with org.shark-project.hive-common;0.9.0 org.shark-project.hive-cli;0.9.0 Either way, the artifacts should be published to Maven Central to make them readily available to the development community. Thoughts? Regards, Cos On Sat, Aug 10, 2013 at 10:08PM, Konstantin Boudnik wrote: Guys, I am trying to help the Spark/Shark community (spark-project.org and now http://incubator.apache.org/projects/spark) with a predicament. Shark - that's also known as Hive on Spark - is using some parts of Hive, i.e. the HQL parser, query optimizer, serdes, and codecs. In order to improve some known issues with performance and/or concurrency, Shark developers need to apply a couple of patches on top of the stock Hive: https://issues.apache.org/jira/browse/HIVE-2891 https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk) (as per https://github.com/amplab/shark/wiki/Hive-Patches) The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11 work is underway), and having developers apply the patches and build their own version of Hive is an extra step that can be avoided. One way to address it is to publish Shark-specific versions of Hive artifacts that would have all needed patches applied to the stock release. This way downstream projects can simply reference org.apache.hive with version 0.9.0-shark-0.7 instead of building Hive locally every time. 
Perhaps this approach is a little overkill; alternatively, would the Hive community be willing to consider a maintenance release of Hive 0.9.1, and perhaps 0.11.1, to include the fixes needed by the Shark project? I am willing to step up and produce Hive release bits if any of the committers here can help with publishing. -- Thanks in advance, Cos
Re: custom Hive artifacts for Shark project
Hi Edward, Shark is using two jar files from Hive - hive-common and hive-cli. But the Shark community puts a few patches on top of the stock Hive to fix blocking issues in the latter. The changes aren't proprietary and are either backports from the newer releases or fixes that weren't committed yet (HIVE-3772 is a good example of this). Take, for example, Hive 0.9, which Shark 0.7 uses: Shark backports a few bugfixes that were committed into Hive 0.10 or Hive 0.11 but never made it into Hive 0.9. I believe this is a side effect of Hive always moving forward and (almost) never making maintenance releases. Changes, and especially massive rewrites, bring instability into the software; it needs to be gradually ironed out with subsequent releases. A good example of such a project would be HBase, which does quite a number of minor releases to provide their users with stable and robust server-side software. In the absence of maintenance releases, downstream projects tend to find ways to work around such an obstacle, hence my earlier email. As for 0.11.1: Shark currently doesn't support Hive 0.11 because of significant changes in the APIs of the latter. The support is coming in the next couple of months, so publishing artifacts improving on top of Hive 0.9 might be a more pressing issue. Hope this clarifies the situation, Cos On Sun, Aug 25, 2013 at 11:54PM, Edward Capriolo wrote: I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive? Do you just include Hive components from Maven, or does the project somehow incorporate our build infrastructure? On Sun, Aug 25, 2013 at 7:42 PM, Konstantin Boudnik c...@apache.org wrote: Guys, considering the absence of input, I take it that it really doesn't matter which way the custom artifact will be published. Is that a correct impression? My first choice would be org.apache.hive.hive-common;0.9-shark0.7 org.apache.hive.hive-cli;0.9-shark0.7 artifacts. 
[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749836#comment-13749836 ] Jakob Homan commented on HIVE-4734: --- Reviewed last patch on RB. Everything looks good except for a change in the handling of [T1,Tn,NULL] types. Use custom ObjectInspectors for AvroSerde - Key: HIVE-4734 URL: https://issues.apache.org/jira/browse/HIVE-4734 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mark Wagner Fix For: 0.12.0 Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch Currently, the AvroSerde recursively copies all fields of a record from the GenericRecord to a List row object and provides the standard ObjectInspectors. Performance can be improved by providing ObjectInspectors to the Avro record itself.
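The improvement amounts to inspecting fields in place rather than eagerly copying them into a standard row object. A toy contrast (illustrative Python; `InPlaceInspector` is a made-up name, not Hive's ObjectInspector API):

```python
def eager_row(record, fields):
    # Current behavior sketched: recursively copy every field of the record
    # into a standard row object, whether or not the query reads it.
    return [record[f] for f in fields]

class InPlaceInspector:
    # Proposed behavior sketched: answer field lookups directly against the
    # record, so only the fields the query actually touches are accessed.
    def get_field(self, record, field):
        return record[field]

rec = {"name": "hive", "version": 12}
row = eager_row(rec, ["name", "version"])  # copies both fields up front
insp = InPlaceInspector()
val = insp.get_field(rec, "name")          # touches only what is asked for
```

The saving scales with record width: a query projecting two columns out of a hundred no longer pays for copying the other ninety-eight.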
[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE
[ https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4375: --- Affects Version/s: 0.11.0 Status: Open (was: Patch Available) The following tests failed with the patch: * TestHBaseCliDriver_single_sorced_multi_insert.q * TestCliDriver_union28.q * TestCliDriver_union30.q Single sourced multi insert consists of native and non-native table mixed throws NPE Key: HIVE-4375 URL: https://issues.apache.org/jira/browse/HIVE-4375 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch CREATE TABLE src_x1(key string, value string); CREATE TABLE src_x2(key string, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string"); explain from src a insert overwrite table src_x1 select key,value where a.key > 0 AND a.key < 50 insert overwrite table src_x2 select key,value where a.key > 50 AND a.key < 100; throws, {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {noformat}
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
On July 29, 2013, 10:41 a.m., Jakob Homan wrote: There is still no text covering a map-reduce job on an already existing, non-Avro table into an Avro table. I.e., create a text table, populate it, run a CTAS to manipulate the data into an Avro table. Mohammad Islam wrote: In general, Hive creates internal column names such as col0, col1, etc. Because of this, I wasn't able to copy non-Avro data to Avro data and run a SELECT query. The only option is to change the current behavior to reuse the provided column names. A separate JIRA for this could be an option. Wouldn't select * or using the new column names (they're named deterministically) work? This is a major test, since otherwise we're missing the most important code path... i.e. have a text file c1, c2, c3 create table t1 load data into t1 from text file create table a1 as select c3, c2 where c2 = foo order by c3; select * from a1; describe extended a1; And verify in the q file's result that the table is Avro and that the correct rows and columns got converted. - Jakob --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/#review24149 --- On Aug. 7, 2013, 5:24 p.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated Aug. 7, 2013, 5:24 p.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive doesn't support creating an Avro-based table using the HQL create table command. It currently requires specifying an Avro schema literal or schema file name. In many cases, this is very inconvenient for the user. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying Avro schema. 
Diffs - ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases. Thanks, Mohammad Islam
Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12480/#review25537 --- One issue in the testing and a few formatting issues. Otherwise looks good. serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java https://reviews.apache.org/r/12480/#comment49986 Weird spacing... 2x below as well. serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java https://reviews.apache.org/r/12480/#comment49984 These should never be null, not even in testing. It's better to change the tests to correctly populate the data structure. serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java https://reviews.apache.org/r/12480/#comment49985 And this would indicate a bug. - Jakob Homan On Aug. 6, 2013, 7:13 p.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12480/ --- (Updated Aug. 6, 2013, 7:13 p.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-4732 https://issues.apache.org/jira/browse/HIVE-4732 Repository: hive-git Description --- From our performance analysis, we found that AvroSerde's schema.equals() call consumed a substantial amount (nearly 40%) of time. This patch intends to minimize the number of schema.equals() calls by pushing the check as late as possible and performing it as few times as possible. First, we added a unique ID for each record reader, which is then included in every AvroGenericRecordWritable. Then, we introduced two new data structures (one HashSet and one HashMap) to store intermediate data and avoid duplicate checks. The HashSet contains the IDs of all record readers that don't need any re-encoding. The HashMap contains the already-used re-encoders; it works as a cache and allows re-encoder reuse. With this change, our test shows a nearly 40% reduction in Avro record reading time. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java ed2a9af serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java e994411 serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 3828940 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb Diff: https://reviews.apache.org/r/12480/diff/ Testing --- Thanks, Mohammad Islam
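The caching scheme described in this review request can be sketched as follows (hedged, illustrative Python with made-up names, not the actual AvroDeserializer API): tag each record with its reader's unique ID, remember readers already known to need no re-encoding, and cache one re-encoder per reader that does, so the expensive schema equality check runs once per reader instead of once per record.

```python
class AvroDeserializerSketch:
    def __init__(self, table_schema):
        self.table_schema = table_schema
        self.no_reencode = set()   # reader IDs whose schema matches the table's
        self.reencoders = {}       # reader ID -> cached re-encoder
        self.equals_calls = 0      # instrumentation for this example only

    def deserialize(self, record, reader_id, reader_schema):
        if reader_id in self.no_reencode:
            return record                        # fast path, no schema check
        if reader_id in self.reencoders:
            return self.reencoders[reader_id](record)  # cached re-encoder
        self.equals_calls += 1                   # the expensive schema.equals()
        if reader_schema == self.table_schema:
            self.no_reencode.add(reader_id)
            return record
        enc = lambda r: dict(r, _reencoded=True) # stand-in for a real re-encoder
        self.reencoders[reader_id] = enc
        return enc(record)

d = AvroDeserializerSketch(table_schema="s1")
for _ in range(1000):
    d.deserialize({"x": 1}, reader_id="r1", reader_schema="s1")
# 1000 records from the same reader trigger only one equality check
```

This matches the claimed saving: with per-record checks the equality test would run 1000 times here; with the reader-ID set and re-encoder cache it runs once.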