date:20130121


 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.16.patch

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.1.patch, hive.3403.2.patch, 
 hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, 
 hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.
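
 For readers unfamiliar with the join strategy named in this issue, the
 following is a minimal sketch of a sort-merge join over two inputs that are
 already sorted on the join key, as the matching buckets of two sorted,
 bucketed tables would be. It is an illustration only, not Hive's
 implementation; the function name and the sample rows are hypothetical.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are sorted by key.

    Hypothetical helper for illustration; this is not Hive code.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


# Made-up sample rows, sorted by key.
left_bucket = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_bucket = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(sort_merge_join(left_bucket, right_bucket))
```

 Because both inputs are consumed in key order, neither side has to be loaded
 fully into memory, which is why this strategy depends on the tables being
 sorted on the join key.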

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-3628) Provide a way to use counters in Hive through UDF


 [ 
https://issues.apache.org/jira/browse/HIVE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3628:
--

Attachment: HIVE-3628.D8007.5.patch

navis updated the revision HIVE-3628 [jira] Provide a way to use counters in 
Hive through UDF.
Reviewers: JIRA

  Forgot to amend. Sorry.


REVISION DETAIL
  https://reviews.facebook.net/D8007

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapredContext.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/DummyContextUDF.java
  ql/src/test/queries/clientpositive/udf_context_aware.q
  ql/src/test/results/clientpositive/udf_context_aware.q.out

To: JIRA, navis
Cc: njain


 Provide a way to use counters in Hive through UDF
 -

 Key: HIVE-3628
 URL: https://issues.apache.org/jira/browse/HIVE-3628
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Viji
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3628.D8007.1.patch, HIVE-3628.D8007.2.patch, 
 HIVE-3628.D8007.3.patch, HIVE-3628.D8007.4.patch, HIVE-3628.D8007.5.patch


 Currently it is not possible to generate counters through UDF. We should 
 support this. 
 Pig currently allows this.


[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join


 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.17.patch

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, 
 hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, 
 hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.


[jira] [Commented] (HIVE-3464) Merging join tree may reorder joins which could be invalid


[ 
https://issues.apache.org/jira/browse/HIVE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558666#comment-13558666
 ] 

Phabricator commented on HIVE-3464:
---

navis has commented on the revision HIVE-3464 [jira] Merging join tree may 
reorder joins which could be invalid.

INLINE COMMENTS
  ql/src/test/queries/clientpositive/mergejoins_mixed.q:19
  1. (a-b-c-d)
  A(a.key=b.key) + B(b.key=c.key) + C(a.key=d.key)
  makes a single join ABC(a.key=b.key=c.key=d.key)

  2. ((a-b-d)-c) or (((a-b)-c)-d)
  A(a.key=b.key) + B(b.value=c.key) + C(a.key=d.key)
  Before the patch, Hive tries merging in C-B, C-A, B-A order (outer to 
inner), and only C-A is merged, making two joins: AC(a.key=b.key=d.key) + 
B(b.value=c.key).
  This causes join C to be executed before B, and if the join type of C 
differs from that of B, that is illegal.

  The patch consists of two parts.
  1. Reversed the merging order (inner to outer). This makes it a little 
easier to check the condition below.
  2. Check whether it is possible to switch the join ordering (only if the 
joins have the same join type).
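
  The check described in this comment can be illustrated with a small sketch.
  This is a simplified model under stated assumptions, not the patch itself:
  each join carries one canonical key label (the caller is assumed to have
  already resolved key equivalences such as a.key=b.key), and a join may be
  merged into an earlier join on the same key only if every join it would be
  moved ahead of has the same join type. The names Join and merge_joins are
  hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Join:
    name: str
    join_type: str  # e.g. "inner" or "left outer"
    key: str        # canonical join key label, e.g. "a.key"


def merge_joins(joins):
    """Group joins (listed inner to outer) into multi-way joins.

    Simplified illustration: a join is merged into an earlier group on the
    same key only if every join it would jump over has the same join type.
    """
    groups = []
    for j in joins:
        target = None
        for idx, group in enumerate(groups):
            if group[0].key == j.key:
                target = idx
                break
        if target is not None:
            # Joins in later groups would be executed after j once j is merged.
            skipped = [x for g in groups[target + 1:] for x in g]
            if all(x.join_type == j.join_type for x in skipped):
                groups[target].append(j)
                continue
        groups.append([j])
    return [[x.name for x in g] for g in groups]


# Case 2 from the comment: C shares a key with A but B sits between them.
same_type = [Join("A", "inner", "a.key"),
             Join("B", "inner", "b.value"),
             Join("C", "inner", "a.key")]
mixed_type = [Join("A", "inner", "a.key"),
              Join("B", "inner", "b.value"),
              Join("C", "left outer", "a.key")]
print(merge_joins(same_type))   # [['A', 'C'], ['B']]
print(merge_joins(mixed_type))  # [['A'], ['B'], ['C']]
```

  In the first call C is merged into A because moving it ahead of B does not
  cross a different join type; in the second call the original order is kept.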

REVISION DETAIL
  https://reviews.facebook.net/D5409

To: JIRA, navis
Cc: njain


 Merging join tree may reorder joins which could be invalid
 --

 Key: HIVE-3464
 URL: https://issues.apache.org/jira/browse/HIVE-3464
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-3464.D5409.2.patch, HIVE-3464.D5409.3.patch


 Currently, hive merges join tree from right to left regardless of join types, 
 which may introduce join reordering. For example,
 select * from a join a b on a.key=b.key join a c on b.key=c.key join a d on 
 a.key=d.key; 
 Hive tries to merge join tree in a-d=b-d, a-d=a-b, b-c=a-b order and a-d=a-b 
 and b-c=a-b will be merged. Final join tree is a-(bdc).
 With this, ab-d join will be executed prior to ab-c. But if join type of -c 
 and -d is different, this is not valid.


[jira] [Updated] (HIVE-3833) object inspectors should be initialized based on partition metadata


 [ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3833:
-

Attachment: hive.3833.16.path

 object inspectors should be initialized based on partition metadata
 ---

 Key: HIVE-3833
 URL: https://issues.apache.org/jira/browse/HIVE-3833
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3833.10.patch, hive.3833.11.patch, 
 hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
 hive.3833.16.path, hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, 
 hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, 
 hive.3833.8.patch, hive.3833.9.patch


 Currently, different partitions can be picked up for the same input split 
 based on the serdes etc., and we don't allow changing the schema for 
 LazyColumnarBinarySerDe.
 Instead, different partitions should be part of the same split only if the 
 partition schemas match exactly. The operator tree object inspectors should 
 be based on the partition schema. That would give greater flexibility and 
 also help with using the binary serde with RCFile.
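
 The grouping rule proposed here (partitions may share a split only when
 their schemas match exactly) can be sketched as follows. This is an
 illustration under assumptions, not Hive code: a schema is modeled as an
 ordered list of (column, type) pairs, and the function name and partition
 names are hypothetical.

```python
from collections import defaultdict


def group_partitions_by_schema(partitions):
    """Group partition names whose schemas are exactly equal.

    `partitions` maps a partition name to an ordered list of
    (column_name, column_type) pairs. Hypothetical model for illustration.
    """
    groups = defaultdict(list)
    for name, schema in partitions.items():
        # Exact match: same columns, same types, same order.
        groups[tuple(schema)].append(name)
    return list(groups.values())


# Made-up partitions: ds=3 changed the type of `key`.
partitions = {
    "ds=1": [("key", "int"), ("value", "string")],
    "ds=2": [("key", "int"), ("value", "string")],
    "ds=3": [("key", "bigint"), ("value", "string")],
}
print(group_partitions_by_schema(partitions))  # [['ds=1', 'ds=2'], ['ds=3']]
```

 Only partitions in the same group would be candidates for sharing a split;
 each group could then get object inspectors built from its own schema.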


[jira] [Commented] (HIVE-3921) recursive_dir.q fails on 0.23


[ 
https://issues.apache.org/jira/browse/HIVE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558668#comment-13558668
 ] 

Sushanth Sowmyan commented on HIVE-3921:


For this test, no, I did not, since the test already had 
INCLUDE_HADOOP_MAJOR_VERSIONS(0.23) explicitly set (I did test across versions 
for the others).

 recursive_dir.q fails on 0.23
 -

 Key: HIVE-3921
 URL: https://issues.apache.org/jira/browse/HIVE-3921
 Project: Hive
  Issue Type: Bug
  Components: Tests
 Environment: Hadoop 0.23 (2.0.2-alpha)
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor
  Labels: 0.23, tests
 Attachments: HIVE-3921.D8055.1.patch


 This test fails in 0.23
   - It insists that hive.mapred.supports.subdirectories must be true for 
 mapred.input.dir.recursive to be used. Currently, HiveConf sets that as 
 false. 
   - HIVE-3643 mentions param and says that once HIVE-3276 is in, we 
 should switch the param, and this jira has been committed.
   - Testing with just setting that parameter in the .q file yields a 
 mismatch with the golden file, but one that looks like it should just update 
 the .out file:
 [junit] diff -a 
 /Users/sush/dev/hive.git/build/ql/test/logs/clientpositive/recursive_dir.q.out 
 /Users/sush/dev/hive.git/ql/src/test/results/clientpositive/recursive_dir.q.out
 [junit] 59d58
 [junit] < PREHOOK: Input: default@fact_daily
 [junit] 64d62
 [junit] < POSTHOOK: Input: default@fact_daily


[jira] [Commented] (HIVE-1083) allow sub-directories for an external table/partition

2013-01-21 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558674#comment-13558674
 ] 

Harsh J commented on HIVE-1083:
---

Given that MAPREDUCE-1501 is in MR2 today, and Hive can make use of it, should 
we close this out now?

 allow sub-directories for an external table/partition
 -

 Key: HIVE-1083
 URL: https://issues.apache.org/jira/browse/HIVE-1083
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Namit Jain
Assignee: Zheng Shao
  Labels: inputformat

 Sometimes users want to define an external table/partition based on all files 
 (recursively) inside a directory.
 Currently most of the Hadoop InputFormat classes do not support that. We 
 should extract all files recursively in the directory, and add them to the 
 input path of the job.
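
 The recursive extraction described above can be sketched on a local
 filesystem. This is only an illustration of the idea, assuming a plain
 directory tree rather than HDFS; the function name is hypothetical and no
 Hadoop API is used.

```python
import os


def list_files_recursively(root):
    """Return every regular file under `root`, at any depth, in sorted order.

    Local-filesystem illustration only; a real job would list files through
    the Hadoop filesystem layer instead.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

 Each returned path would then be added to the job's input, so files in
 nested sub-directories are read along with those at the top level.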


[jira] [Commented] (HIVE-2332) If all of the parameters of distinct functions are exists in group by columns, query fails in runtime


[ 
https://issues.apache.org/jira/browse/HIVE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558709#comment-13558709
 ] 

Hudson commented on HIVE-2332:
--

Integrated in Hive-trunk-h0.21 #1928 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1928/])
HIVE-3920 Change test for HIVE-2332
(Ashutosh Chauhan and Navis via namit) (Revision 1436199)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out


 If all of the parameters of distinct functions are exists in group by 
 columns, query fails in runtime
 -

 Key: HIVE-2332
 URL: https://issues.apache.org/jira/browse/HIVE-2332
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0
Reporter: Navis
Assignee: Navis
Priority: Blocker
 Fix For: 0.11.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2332.D663.1.patch, 
 HIVE-2332.1.patch.txt, HIVE-2332.2.patch.txt


 select sum(key_int1), sum(distinct key_int1) from t1 group by key_int1;
 fails with message..
 {code}
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 {code}
 hadoop says..
 {code}
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:95)
   at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:86)
   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:252)
   at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:188)
   at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:197)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:85)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:532)
 {code}
 I think the problem is that the number of key expressions is smaller than 
 the number of key columns; it should be equal or greater. 
 Would it be solved by adding some key expressions? I'll try.


[jira] [Commented] (HIVE-3920) Change test for HIVE-2332


[ 
https://issues.apache.org/jira/browse/HIVE-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558710#comment-13558710
 ] 

Hudson commented on HIVE-3920:
--

Integrated in Hive-trunk-h0.21 #1928 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1928/])
HIVE-3920 Change test for HIVE-2332
(Ashutosh Chauhan and Navis via namit) (Revision 1436199)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out


 Change test for HIVE-2332
 -

 Key: HIVE-3920
 URL: https://issues.apache.org/jira/browse/HIVE-3920
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Namit Jain
Assignee: Ashutosh Chauhan
 Fix For: 0.11.0

 Attachments: HIVE-3920.D8067.1.patch, HIVE-3920.patch


 The test groupby_distinct_samekey.q is run on t1, which is an empty table.
 It would be useful to add some data in the table to verify the fix.


Hive-trunk-h0.21 - Build # 1928 - Fixed

2013-01-21 Thread Apache Jenkins Server

Changes for Build #1926
[hashutosh] HIVE-2332 : If all of the parameters of distinct functions are 
exists in group by columns, query fails in runtime (Navis via Ashutosh Chauhan)


Changes for Build #1927

Changes for Build #1928
[namit] HIVE-3920 Change test for HIVE-2332
(Ashutosh Chauhan and Navis via namit)




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1928)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1928/ to 
view the results.

[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join


[ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558826#comment-13558826
 ] 

Namit Jain commented on HIVE-3403:
--

The support for sub-queries has also been added in this.

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, 
 hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, 
 hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.


[jira] [Updated] (HIVE-3833) object inspectors should be initialized based on partition metadata


 [ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3833:
-

Attachment: hive.3833.17.patch

 object inspectors should be initialized based on partition metadata
 ---

 Key: HIVE-3833
 URL: https://issues.apache.org/jira/browse/HIVE-3833
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3833.10.patch, hive.3833.11.patch, 
 hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
 hive.3833.16.path, hive.3833.17.patch, hive.3833.1.patch, hive.3833.2.patch, 
 hive.3833.3.patch, hive.3833.4.patch, hive.3833.5.patch, hive.3833.6.patch, 
 hive.3833.7.patch, hive.3833.8.patch, hive.3833.9.patch


 Currently, different partitions can be picked up for the same input split 
 based on the serdes etc., and we don't allow changing the schema for 
 LazyColumnarBinarySerDe.
 Instead, different partitions should be part of the same split only if the 
 partition schemas match exactly. The operator tree object inspectors should 
 be based on the partition schema. That would give greater flexibility and 
 also help with using the binary serde with RCFile.


[jira] [Commented] (HIVE-3628) Provide a way to use counters in Hive through UDF


[ 
https://issues.apache.org/jira/browse/HIVE-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558840#comment-13558840
 ] 

Namit Jain commented on HIVE-3628:
--

[~navis], is it ready for review ?

 Provide a way to use counters in Hive through UDF
 -

 Key: HIVE-3628
 URL: https://issues.apache.org/jira/browse/HIVE-3628
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Viji
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3628.D8007.1.patch, HIVE-3628.D8007.2.patch, 
 HIVE-3628.D8007.3.patch, HIVE-3628.D8007.4.patch, HIVE-3628.D8007.5.patch


 Currently it is not possible to generate counters through UDF. We should 
 support this. 
 Pig currently allows this.


[jira] [Commented] (HIVE-3877) Implement equi-depth histograms as a UDAF

2013-01-21 Thread Rahul Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558950#comment-13558950
 ] 

Rahul Jain commented on HIVE-3877:
--

Hi Shreepadma,

We're very interested in this functionality for analytics use cases on 
standard Hive. Has this been implemented at all, or is it still at the concept 
stage? If you've already made progress here, I'd be happy to collaborate to 
bring this to completion.

 Implement equi-depth histograms as a UDAF
 -

 Key: HIVE-3877
 URL: https://issues.apache.org/jira/browse/HIVE-3877
 Project: Hive
  Issue Type: Sub-task
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan

 Implement a space and time efficient algorithm to bin numeric column data 
 such that all bins approximately contain the same number of elements. 
 Implement such an algorithm as a generic UDAF.
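
 To make the target concrete, the following is a minimal sketch of exact
 equi-depth binning. It sorts the whole column in memory, so it is not the
 space- and time-efficient algorithm this issue asks for, and it is not a
 UDAF; it only shows what "all bins contain approximately the same number of
 elements" means. The function name and sample values are hypothetical.

```python
def equi_depth_bins(values, num_bins):
    """Split `values` into `num_bins` sorted bins whose sizes differ by at most one.

    Exact, sort-based illustration; a real UDAF would need a streaming
    approximation instead of holding every value in memory.
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    start = 0
    for b in range(num_bins):
        # The first n % num_bins bins each take one extra element.
        size = n // num_bins + (1 if b < n % num_bins else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


print(equi_depth_bins([5, 1, 9, 3, 7, 2, 8], 3))  # [[1, 2, 3], [5, 7], [8, 9]]
```

 The bin boundaries (here 3, 7 and 9 as upper edges) are what a histogram
 consumer would typically keep; the bins themselves are shown only for
 clarity.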


[jira] [Commented] (HIVE-2332) If all of the parameters of distinct functions are exists in group by columns, query fails in runtime


[ 
https://issues.apache.org/jira/browse/HIVE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558989#comment-13558989
 ] 

Hudson commented on HIVE-2332:
--

Integrated in Hive-trunk-hadoop2 #79 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/79/])
HIVE-3920 Change test for HIVE-2332
(Ashutosh Chauhan and Navis via namit) (Revision 1436199)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out


 If all of the parameters of distinct functions are exists in group by 
 columns, query fails in runtime
 -

 Key: HIVE-2332
 URL: https://issues.apache.org/jira/browse/HIVE-2332
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0
Reporter: Navis
Assignee: Navis
Priority: Blocker
 Fix For: 0.11.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2332.D663.1.patch, 
 HIVE-2332.1.patch.txt, HIVE-2332.2.patch.txt


 select sum(key_int1), sum(distinct key_int1) from t1 group by key_int1;
 fails with message..
 {code}
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 {code}
 hadoop says..
 {code}
 Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:95)
   at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:86)
   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:252)
   at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:188)
   at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:197)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:85)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:532)
 {code}
 I think the problem is that the number of key expressions is smaller than 
 the number of key columns; it should be equal or greater. 
 Would it be solved by adding some key expressions? I'll try.


[jira] [Commented] (HIVE-3920) Change test for HIVE-2332


[ 
https://issues.apache.org/jira/browse/HIVE-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558990#comment-13558990
 ] 

Hudson commented on HIVE-3920:
--

Integrated in Hive-trunk-hadoop2 #79 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/79/])
HIVE-3920 Change test for HIVE-2332
(Ashutosh Chauhan and Navis via namit) (Revision 1436199)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436199
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out


 Change test for HIVE-2332
 -

 Key: HIVE-3920
 URL: https://issues.apache.org/jira/browse/HIVE-3920
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Namit Jain
Assignee: Ashutosh Chauhan
 Fix For: 0.11.0

 Attachments: HIVE-3920.D8067.1.patch, HIVE-3920.patch


 The test groupby_distinct_samekey.q is run on t1, which is an empty table.
 It would be useful to add some data in the table to verify the fix.


Re: [DISCUSS] HCatalog becoming a subproject of Hive

2013-01-21 Thread Carl Steinbach

Hi Alan,

Overall this looks good to me. I have a couple small suggestions:

* Replace occurrences of "Hive's subversion repository" with "Hive's
source code repository".
* In the "Actions" table, the sentence "This also covers the creation of new
sub-projects within the project" should be changed to "This also covers the
creation of new sub-projects and sub-modules within the project."

Thanks.

Carl

On Fri, Jan 18, 2013 at 4:42 PM, Alan Gates ga...@hortonworks.com wrote:

I've created a wiki page for my proposed changes at
https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers

Text to be removed is struck through. Text to be added is in italics.

Any recommended changes before we vote?

Alan.

On Jan 17, 2013, at 2:08 PM, Carl Steinbach wrote:

Sounds like a good plan to me. Since Ashutosh is a member of both the Hive
and HCatalog PMCs it probably makes more sense for him to call the vote,
but I'm willing to do it too.

On Wed, Jan 16, 2013 at 8:24 AM, Alan Gates ga...@hortonworks.com
wrote:

If you think that's the best path forward that's fine. I can't call a
vote I don't think, since I'm not part of the Hive PMC. But I'm happy to
draft a resolution for you and then let you call the vote. Should I do
that?

Alan.

On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote:

Hi Alan,

I agree that submitting this for a vote is the best option.

If anyone has additional proposed modifications please make them.
Otherwise I propose that the Hive PMC vote on this proposal.

In order for the Hive PMC to be able to vote on these changes they need
to be expressed in terms of one or more of the actions listed at the end
of the Hive project bylaws:

https://cwiki.apache.org/confluence/display/Hive/Bylaws

So I think we first need to amend the bylaws in order to define the
rights and privileges of a submodule committer, and then separately vote
the HCatalog committers in as Hive submodule committers. Does this make
sense?

Thanks.

Carl

[jira] [Created] (HIVE-3923) join_filters_overlap.q fails on 0.23

Sushanth Sowmyan created HIVE-3923:
--

 Summary: join_filters_overlap.q fails on 0.23
 Key: HIVE-3923
 URL: https://issues.apache.org/jira/browse/HIVE-3923
 Project: Hive
  Issue Type: Bug
  Components: Tests
 Environment: Hadoop 0.23
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor


As with some of the other broken tests on 0.23, this is broken because the 
order of results generated by the query on 0.23 is different from the order in 
the golden output file. However, there appears to be nothing wrong with the 
query itself.

This can be fixed by adding an order-by clause and regenerating the golden file.
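
The reasoning behind this fix can be shown with a small sketch. It assumes,
purely for illustration, that the two Hadoop versions return the same rows in
different orders; the row values and the helper name are made up and do not
come from the actual test.

```python
def query_output(rows_as_produced, order_by=False):
    """Model a query result: keep engine order unless an order-by is applied."""
    return sorted(rows_as_produced) if order_by else list(rows_as_produced)


# Hypothetical rows: same content, different arrival order per Hadoop version.
rows_on_020 = [(1, "a"), (2, "b"), (3, "c")]
rows_on_023 = [(3, "c"), (1, "a"), (2, "b")]

# Without order-by, a golden file recorded on one version mismatches the other.
print(query_output(rows_on_020) == query_output(rows_on_023))              # False
# With order-by, both versions produce the same output for the golden file.
print(query_output(rows_on_020, True) == query_output(rows_on_023, True))  # True
```

The golden file has to be regenerated once because the sorted output differs
from the unordered output recorded previously.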


[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-21 Thread Alan Gates (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559097#comment-13559097
]

Alan Gates commented on HIVE-896:
-

Harish,

Thanks for your replies. I want to think some more about your explanation in
2 above, but at least I think I understand your rationale now.

One other question. I tried playing around with this but kept getting an
error. I'm not sure what I'm doing wrong. I have a table that I created with
the following statement:

{code}
create table studenttab10k (name string, age int, gpa float);
{code}

When I run
{code}
select avg(gpa) over (cluster by age) from studenttab10k;
{code}

I get

{code}
FAILED: SemanticException 1:43 No partition specification associated with start
of PTF chain . Error encountered near token 'age'
{code}

I looked through the syntax file and I think I'm doing the right thing, but
obviously I'm not.

Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
---

Key: HIVE-896
URL: https://issues.apache.org/jira/browse/HIVE-896
Project: Hive
Issue Type: New Feature
Components: OLAP, UDF
Reporter: Amr Awadallah
Priority: Minor
Attachments: HIVE-896.1.patch.txt

Windowing functions are very useful for click stream processing and similar
time-series/sliding-window analytics.
More details at:
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
-- amr


[jira] [Updated] (HIVE-3923) join_filters_overlap.q fails on 0.23


 [ 
https://issues.apache.org/jira/browse/HIVE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3923:
--

Attachment: HIVE-3923.D8079.1.patch

khorgath requested code review of HIVE-3923 [jira] join_filters_overlap.q 
fails on 0.23.
Reviewers: JIRA

  Adding order-by to tests to fix 0.23 test breakage

  As with some of the other broken tests on 0.23, this is broken because the 
order of results generated by the query on 0.23 is different from the order in 
the golden output file. However, there appears to be nothing wrong with the 
query itself.

  This can be fixed by adding an order-by clause and regenerating the golden 
file.

TEST PLAN
  Patch attached is a test fix

REVISION DETAIL
  https://reviews.facebook.net/D8079

AFFECTED FILES
  ql/src/test/queries/clientpositive/join_filters_overlap.q
  ql/src/test/results/clientpositive/join_filters_overlap.q.out

To: JIRA, khorgath


 join_filters_overlap.q fails on 0.23
 

 Key: HIVE-3923
 URL: https://issues.apache.org/jira/browse/HIVE-3923
 Project: Hive
  Issue Type: Bug
  Components: Tests
 Environment: Hadoop 0.23
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor
 Attachments: HIVE-3923.D8079.1.patch


 As with some of the other broken tests on 0.23, this is broken because the 
 order of results generated by the query on 0.23 is different from the order 
 in the golden output file. However, there appears to be nothing wrong with 
 the query itself.
 This can be fixed by adding an order-by clause and regenerating the golden 
 file.

[jira] [Updated] (HIVE-3923) join_filters_overlap.q fails on 0.23


 [ 
https://issues.apache.org/jira/browse/HIVE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-3923:
---

Status: Patch Available  (was: Open)

Phabricator link : https://reviews.facebook.net/D8079

[jira] [Created] (HIVE-3924) join_nullsafe.q fails on 0.23

Sushanth Sowmyan created HIVE-3924:
--

 Summary: join_nullsafe.q fails on 0.23
 Key: HIVE-3924
 URL: https://issues.apache.org/jira/browse/HIVE-3924
 Project: Hive
  Issue Type: Bug
  Components: Tests
 Environment: Hadoop 0.23
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor


As with some of the other broken tests on 0.23, this is broken because the 
order of results generated by the query on 0.23 is different from the order in 
the golden output file. However, there appears to be nothing wrong with the 
query itself. This can be fixed by adding an order-by clause and regenerating 
the golden file.

Re: [DISCUSS] HCatalog becoming a subproject of Hive

2013-01-21 Thread Alan Gates

Changes made.

Alan.

On Jan 21, 2013, at 12:39 PM, Carl Steinbach wrote:

Hi Alan,

Overall this looks good to me. I have a couple small suggestions:

Thanks.

Carl

On Fri, Jan 18, 2013 at 4:42 PM, Alan Gates ga...@hortonworks.com wrote:

I've created a wiki page for my proposed changes at
https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers

Text to be removed is struck through. Text to be added is in italics.

Any recommended changes before we vote?

Alan.

On Jan 17, 2013, at 2:08 PM, Carl Steinbach wrote:

Sounds like a good plan to me. Since Ashutosh is a member of both the Hive
and HCatalog PMCs it probably makes more sense for him to call the vote,
but I'm willing to do it too.

On Wed, Jan 16, 2013 at 8:24 AM, Alan Gates ga...@hortonworks.com
wrote:

Alan.

On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote:

Hi Alan,

I agree that submitting this for a vote is the best option.

If anyone has additional proposed modifications please make them.
Otherwise I propose that the Hive PMC vote on this proposal.

In order for the Hive PMC to be able to vote on these changes they need
to be expressed in terms of one or more of the actions listed at the end
of the Hive project bylaws:

https://cwiki.apache.org/confluence/display/Hive/Bylaws

Thanks.

Carl

[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-21 Thread Harish Butani (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559154#comment-13559154
 ] 

Harish Butani commented on HIVE-896:


Alan,

Thanks for spending the time.

Yes, your example is going to fail. There was a bug in the patch we posted.
This was fixed in commit 0eff864d765c91e0bece497e0f007c6cd2cec72f in our repo 
on Jan 9th.
I can send you a patch privately or post the updated patch here. 
Sorry about this. 


[jira] [Updated] (HIVE-3924) join_nullsafe.q fails on 0.23


 [ 
https://issues.apache.org/jira/browse/HIVE-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-3924:
---

Status: Patch Available  (was: Open)

Phabricator link : https://reviews.facebook.net/D8085

[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong


 [ 
https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3326:
--

Attachment: HIVE-3326.D8091.1.patch

navis requested code review of HIVE-3326 [jira] plan for multiple mapjoin 
followed by a normal join is wrong.
Reviewers: JIRA

  DPAL-1968 plan for multiple mapjoin followed by a normal join is wrong

  example queries:

  create table yudi(c1 int, c2 int, c3 int, c4 int);
  create table wangmu(c1 int, c2 int, c3 int, c4 int);
  select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c 
on b.c2=c.c2 join yudi d on a.c3=d.c3;

  in explain mode, I got this:

  hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 
join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
  OK
  STAGE DEPENDENCIES:
    Stage-8 is a root stage
    Stage-2 depends on stages: Stage-8
    Stage-7 depends on stages: Stage-2
    Stage-3 depends on stages: Stage-7
    Stage-1 depends on stages: Stage-3

  STAGE PLANS:
    Stage: Stage-8
      Map Reduce Local Work
        Alias -> Map Local Tables:
          b
            Not Important
    Stage: Stage-2
      Map Reduce
        Alias -> Map Operator Tree:
          a
            Not Important
        Local Work:
          Map Reduce Local Work

    Stage: Stage-7
      Map Reduce Local Work
        Alias -> Map Local Tables:
          c
            Not Important
    Stage: Stage-3
      Map Reduce
        Alias -> Map Operator Tree:
          file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
            Not Important
        Local Work:
          Map Reduce Local Work

    Stage: Stage-1
      Map Reduce
        Alias -> Map Operator Tree:
          d
            TableScan
          file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r48gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
            Select Operator
        Reduce Operator Tree:
          Not Important

  As you can see, the mapper of Stage-1 should read from Stage-3 (perhaps 
'.../-mr-10003'), not from Stage-2 (whose result is in '.../-mr-10002').
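The reporter's reasoning can be written down as a small consistency check. The Java sketch below is a hypothetical model, not Hive code: `StageInputCheck`, `Stage`, and `reportedPlan` are invented names, only the map-reduce stages are modelled (the local-work stages 8 and 7 are skipped), and the output directory of Stage-3 is the reporter's guess ('.../-mr-10003'), not something shown in the plan. The table scan of `d` in Stage-1 is also left out; only the intermediate directory is checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the map-reduce stages in the plan; not Hive code. */
final class StageInputCheck {

  static final class Stage {
    final String name;
    final String upstream;  // previous map-reduce stage, or null for the first one
    final String inputDir;  // intermediate directory the mapper reads, or null
    final String outputDir; // intermediate directory the stage writes, or null

    Stage(String name, String upstream, String inputDir, String outputDir) {
      this.name = name;
      this.upstream = upstream;
      this.inputDir = inputDir;
      this.outputDir = outputDir;
    }
  }

  /** Names of stages whose intermediate input is not their upstream stage's output. */
  static List<String> misroutedStages(List<Stage> stages) {
    Map<String, Stage> byName = new HashMap<String, Stage>();
    for (Stage s : stages) {
      byName.put(s.name, s);
    }
    List<String> misrouted = new ArrayList<String>();
    for (Stage s : stages) {
      Stage up = (s.upstream == null) ? null : byName.get(s.upstream);
      if (up == null || s.inputDir == null) {
        continue; // nothing to compare for this stage
      }
      if (!s.inputDir.equals(up.outputDir)) {
        misrouted.add(s.name);
      }
    }
    return misrouted;
  }

  /** The plan as reported; '.../-mr-10003' is an assumed value. */
  static List<Stage> reportedPlan() {
    return Arrays.asList(
        new Stage("Stage-2", null, null, ".../-mr-10002"),
        new Stage("Stage-3", "Stage-2", ".../-mr-10002", ".../-mr-10003"),
        new Stage("Stage-1", "Stage-3", ".../-mr-10002", null));
  }

  public static void main(String[] args) {
    System.out.println(misroutedStages(reportedPlan())); // [Stage-1]
  }
}
```

Under these assumptions only Stage-1 is flagged, which matches the observation that its mapper reads the output of Stage-2 instead of the output of its immediate predecessor.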

  While trying to resolve this bug, I found this code (GenMapRedUtils.java, 
around line 431):
  GenMapRedUtils.java

  if (oldMapJoin == null) {
    if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
        || local || (oldTask != null) && (parTasks != null)) {
      taskTmpDir = mjCtx.getTaskTmpDir();
      tt_desc = mjCtx.getTTDesc();
      rootOp = mjCtx.getRootMapJoinOp();
    }
  } else {
    GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
    assert oldMjCtx != null;
    taskTmpDir = oldMjCtx.getTaskTmpDir();
    tt_desc = oldMjCtx.getTTDesc();
    rootOp = oldMjCtx.getRootMapJoinOp();
  }

  My query goes into the 'else' block and gets the wrong taskTmpDir. I hacked 
the code to make the query go into the 'if' block, and it works.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8091

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/test/queries/clientpositive/mapjoin_mapjoin_join.q
  ql/src/test/results/clientpositive/mapjoin_mapjoin_join.q.out

To: JIRA, navis


 plan for multiple mapjoin followed by a normal join is wrong
 

 Key: HIVE-3326
 URL: https://issues.apache.org/jira/browse/HIVE-3326
 Project: Hive
  Issue Type: Bug
  Components: SQL
 Environment: OS X 10.8; java 1.6.0_33
Reporter: Zhang Xinyu
Assignee: Navis
 Attachments: HIVE-3326.D8091.1.patch, patch.diff


[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-21 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559257#comment-13559257
 ] 

Alan Gates commented on HIVE-896:
-

I'd definitely like to get a new version of the patch.  I'm happy to pull from 
github.  I looked at the repo referenced above ( 
https://github.com/hbutani/SQLWindowing ) but it didn't have any recent 
updates.  

[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-21 Thread Harish Butani (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559263#comment-13559263
 ] 

Harish Butani commented on HIVE-896:


It's https://github.com/hbutani/hive (ptf branch).
The SQLWindowing repo has the work we did on top of Hive.

[jira] [Commented] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong


[ 
https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559378#comment-13559378
 ] 

Phabricator commented on HIVE-3326:
---

njain has commented on the revision HIVE-3326 [jira] plan for multiple mapjoin 
followed by a normal join is wrong.

  Navis, I am not sure we should support this.
  https://issues.apache.org/jira/browse/HIVE-3784 is the right way to go.
  We are adding way more complexity than is needed to solve this problem.

  Let me refresh HIVE-3784 and try to address Ashutosh's concerns.

REVISION DETAIL
  https://reviews.facebook.net/D8091

To: JIRA, navis
Cc: njain


[jira] [Updated] (HIVE-3326) plan for multiple mapjoin followed by a normal join is wrong


 [ 
https://issues.apache.org/jira/browse/HIVE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3326:
-

Status: Open  (was: Patch Available)

comments on phabricator

[jira] [Updated] (HIVE-3784) de-emphasize mapjoin hint


 [ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3784:
-

Attachment: hive.3784.6.patch

 de-emphasize mapjoin hint
 -

 Key: HIVE-3784
 URL: https://issues.apache.org/jira/browse/HIVE-3784
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
 hive.3784.4.patch, hive.3784.5.patch, hive.3784.6.patch


 hive.auto.convert.join has been around for a long time, and is pretty stable.
 When mapjoin hint was created, the above parameter did not exist.
 The only reason for the user to specify a mapjoin currently is if they want
 it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
 Eventually, that should also go away, but that may take some time to 
 stabilize.
 There are many rules in SemanticAnalyzer to handle the following trees:
 ReduceSink -> MapJoin
 Union      -> MapJoin
 MapJoin    -> MapJoin
 This should not be supported anymore. In any of the above scenarios, the
 user can get the mapjoin behavior by setting hive.auto.convert.join to true
 and not specifying the hint. This will simplify the code a lot.
 What does everyone think?

[jira] [Commented] (HIVE-2839) Filters on outer join with mapjoin hint is not applied correctly

2013-01-21 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559423#comment-13559423
 ] 

Navis commented on HIVE-2839:
-

I think all of the patches dealing with the MAPJOIN hint should wait until 
HIVE-3784 is committed.

 Filters on outer join with mapjoin hint is not applied correctly
 

 Key: HIVE-2839
 URL: https://issues.apache.org/jira/browse/HIVE-2839
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2839.D2079.2.patch


 While testing HIVE-2820, I found that some queries with a mapjoin hint throw exceptions.
 {code}
 SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key 
 AND true limit 10;
 FAILED: Hive Internal Error: java.lang.ClassCastException(org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
 java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc
   at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertMapJoin(MapJoinProcessor.java:363)
   at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.generateMapJoinOperator(MapJoinProcessor.java:483)
   at org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.transform(MapJoinProcessor.java:689)
   at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:87)
   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7519)
   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:891)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 {code}
 and 
 {code}
 SELECT /*+ MAPJOIN(a) */ * FROM src a RIGHT OUTER JOIN src b on a.key=b.key 
 AND b.key * 10  '1000' limit 10;
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:198)
   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
   at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1321)
   at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325)
   at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1325)
   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:495)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
   ... 8 more
 {code}
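The first trace reports an ExprNodeConstantDesc being cast to ExprNodeGenericFuncDesc inside MapJoinProcessor.convertMapJoin. The Java sketch below reproduces only that general failure mode with stand-in classes; `FilterExpr`, `ConstantFilter`, `FunctionFilter`, and `FilterCastSketch` are hypothetical and are not Hive's classes. Whether the actual fix uses a type check like this is an assumption, not something stated in the issue.

```java
/** Stand-in expression types for illustration only; these are not Hive classes. */
abstract class FilterExpr {}

/** A constant filter, such as the literal in "... AND true". */
final class ConstantFilter extends FilterExpr {
  final Object value;
  ConstantFilter(Object value) { this.value = value; }
}

/** A function-call filter, such as a comparison. */
final class FunctionFilter extends FilterExpr {
  final String function;
  FunctionFilter(String function) { this.function = function; }
}

final class FilterCastSketch {

  /** Assumes every filter is a function call; fails for constants. */
  static String functionNameUnchecked(FilterExpr filter) {
    return ((FunctionFilter) filter).function; // ClassCastException for ConstantFilter
  }

  /** Checks the runtime type before casting. */
  static String describeChecked(FilterExpr filter) {
    if (filter instanceof FunctionFilter) {
      return "func:" + ((FunctionFilter) filter).function;
    }
    if (filter instanceof ConstantFilter) {
      return "const:" + ((ConstantFilter) filter).value;
    }
    return "unknown";
  }

  public static void main(String[] args) {
    FilterExpr constant = new ConstantFilter(Boolean.TRUE);
    System.out.println(describeChecked(constant)); // const:true
    try {
      functionNameUnchecked(constant);
    } catch (ClassCastException e) {
      System.out.println("unchecked cast failed");
    }
  }
}
```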

[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

[
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559426#comment-13559426
]

Namit Jain commented on HIVE-3784:
--

I was thinking of adding a size parameter. If n-1 tables are below that size
(for an n-way join), the joinTask should be converted to a mapJoin task
(map-only) instead of a conditional task. We would need a further optimization
step to merge 2 map-only tasks into a single map-only task.
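A minimal Java sketch of the proposed rule, under stated assumptions: `MapJoinSizeRule`, `streamedTableIndex`, and the threshold parameter are hypothetical names, table sizes are plain byte counts, and "below that size" is read as strictly smaller. It is not the patch and uses no Hive API. The method returns the index of the one table that would be streamed when at least n-1 tables are small, and -1 when the join should stay as it is.

```java
/** Hypothetical illustration of the proposed size rule; not Hive code. */
final class MapJoinSizeRule {

  /**
   * Returns the index of the table to stream through a map-only join,
   * or -1 if fewer than n-1 tables are below the threshold.
   */
  static int streamedTableIndex(long[] tableSizes, long smallTableThreshold) {
    int n = tableSizes.length;
    if (n < 2) {
      return -1; // not a join
    }
    int smallCount = 0;
    int largest = 0;
    for (int i = 0; i < n; i++) {
      if (tableSizes[i] < smallTableThreshold) {
        smallCount++;
      }
      if (tableSizes[i] > tableSizes[largest]) {
        largest = i;
      }
    }
    // With n-1 small tables, the remaining (largest) table is the streamed one.
    return smallCount >= n - 1 ? largest : -1;
  }

  public static void main(String[] args) {
    System.out.println(streamedTableIndex(new long[] {10L, 5000L, 20L}, 100L));   // 1
    System.out.println(streamedTableIndex(new long[] {10L, 5000L, 6000L}, 100L)); // -1
  }
}
```

Merging consecutive map-only tasks, the second step mentioned above, is outside this sketch.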

[~navis], what do you think? Can you think of a better idea?


[jira] [Created] (HIVE-3925) dependencies of fetch task are not shown by explain