[jira] [Commented] (HIVE-1603) support CSV text file format

2011-12-21 Thread Sam Wilson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174190#comment-13174190
 ] 

Sam Wilson commented on HIVE-1603:
--

Instead of hard-coding it to work only with comma-separated values, why not 
have a DelimitedTextFile and a separate set of options to control the delimiter 
and quoting? Some people need pipe-delimited, for example.
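
A minimal sketch of the suggestion, using Python's standard csv module as a stand-in for the proposed file format. The helper names write_delimited and read_delimited are hypothetical and are not part of Hive; the point is only that the delimiter and quote character are options rather than hard-coded, and that quoting keeps embedded delimiters and newlines inside a single field.

```python
import csv
import io


def write_delimited(rows, delimiter=",", quotechar='"'):
    """Serialize rows using a caller-chosen delimiter and quote character."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quotechar=quotechar,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


def read_delimited(text, delimiter=",", quotechar='"'):
    """Parse text produced by write_delimited back into a list of rows."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
    )
    return [row for row in reader]


# One field contains the delimiter, another contains a newline.
rows = [["1", "a|b"], ["2", "line one\nline two"]]
pipe_text = write_delimited(rows, delimiter="|")
round_tripped = read_delimited(pipe_text, delimiter="|")
```

Reading pipe_text back with the same options should return the original rows, which is the property a plain tab-and-newline text format cannot guarantee.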

 support CSV text file format
 

 Key: HIVE-1603
 URL: https://issues.apache.org/jira/browse/HIVE-1603
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Ning Zhang

 Comma Separated Values (CSV) text format is commonly used in exchanging 
 relational data between heterogeneous systems. Currently Hive uses TextFile 
 format when displaying query results. This can cause confusion when column 
 values contain new lines or tabs. A CSVTextFile format could get around this 
 problem. This will require a new CSVTextInputFormat, CSVTextOutputFormat, and 
 CSVSerDe. 
 A proposed use case looks like:
 {code}
 -- exporting a table to CSV files in a directory
 hive> set hive.io.output.fileformat=CSVTextFile;
 hive> insert overwrite local directory '/tmp/CSVrepos/' select * from S where ... ;
 -- query result in CSV
 hive -e 'set hive.io.output.fileformat=CSVTextFile; select * from T;' | sql_loader_to_other_systems
 -- query CSV files directory from Hive
 hive> create table T (...) stored as CSVTextFile;
 hive> load data local inpath '/my/CSVfiles' into table T;
 hive> select * from T where ...;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive

2011-12-21 Thread Namit Jain (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174211#comment-13174211
 ] 

Namit Jain commented on HIVE-2642:
--

1 general comment about the new test union26.q:

Reduce the test output; you don't need to load all 500 rows for this 
test.
It makes the test output really difficult to review.

Again, none of the above 3 are blockers. I am still reviewing and will file an 
enhancement for all the follow-ups.

 fix Hive-2566 and make union optimization more aggressive 
 --

 Key: HIVE-2642
 URL: https://issues.apache.org/jira/browse/HIVE-2642
 Project: Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2642.D735.1.patch


 HIVE-2566 made some optimizations to union but caused some problems and was 
 then reverted. This issue brings it back, fixes the problems we saw, and also 
 makes union optimization more aggressive.





[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174227#comment-13174227
 ] 

Phabricator commented on HIVE-2642:
---

njain has commented on the revision HIVE-2642 [jira] fix Hive-2566 and make 
union optimization more aggressive.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java:245 remove 
this comment

REVISION DETAIL
  https://reviews.facebook.net/D735






[jira] [Resolved] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive

2011-12-21 Thread Namit Jain (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-2642.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed. Thanks Yongqiang





[jira] [Created] (HIVE-2668) Minor cleanup to HIVE-2642

2011-12-21 Thread Namit Jain (Created) (JIRA)
Minor cleanup to HIVE-2642
--

 Key: HIVE-2668
 URL: https://issues.apache.org/jira/browse/HIVE-2668
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: He Yongqiang


INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java:105 can 
you add some comments here?

This is not really the top operators; it also contains the list of intermediate 
tables. This code is difficult to debug later on, so more comments would 
be helpful.


Look at union22.q.out.

For a map-join followed by a union, an extra stage is introduced.
We don't have to optimize this; I just wanted to make sure it is intentional.



1 general comment about the new test union26.q:

Reduce the test output; you don't need to load all 500 rows for this 
test.
It makes the test output really difficult to review.


ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java:245 remove 
this comment





[jira] [Created] (HIVE-2669) remove special processing for map-join

2011-12-21 Thread Namit Jain (Created) (JIRA)
remove special processing for map-join
--

 Key: HIVE-2669
 URL: https://issues.apache.org/jira/browse/HIVE-2669
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain


With hive.auto.convert.join, there is no need for the user to specify a map-join 
hint.

It should be completely ignored, other than for bucketized join, which can be 
cleaned up later.
There is a lot of code in the optimizer for processing union followed by 
map-join etc., which should be removed.





[jira] [Created] (HIVE-2670) A cluster test utility for Hive

2011-12-21 Thread Alan Gates (Created) (JIRA)
A cluster test utility for Hive
---

 Key: HIVE-2670
 URL: https://issues.apache.org/jira/browse/HIVE-2670
 Project: Hive
  Issue Type: New Feature
  Components: Testing Infrastructure
Reporter: Alan Gates


Hive has an extensive set of unit tests, but it does not have an infrastructure 
for testing in a cluster environment.  Pig and HCatalog have been using a test 
harness for cluster testing for some time.  We have written Hive drivers and 
tests to run in this harness.





[jira] [Resolved] (HIVE-2666) StackOverflowError when using custom UDF in map join

2011-12-21 Thread He Yongqiang (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang resolved HIVE-2666.


Resolution: Fixed

committed, thanks Kevin!

 StackOverflowError when using custom UDF in map join
 

 Key: HIVE-2666
 URL: https://issues.apache.org/jira/browse/HIVE-2666
 Project: Hive
  Issue Type: Bug
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2666.D957.1.patch


 When a custom UDF is used as part of a join which is converted to a map join, 
 the XMLEncoder enters an infinite loop when serializing the map reduce task 
 for the second time, as part of sending it to be executed.  This results in a 
 stack overflow error.





[jira] [Commented] (HIVE-2666) StackOverflowError when using custom UDF in map join

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174279#comment-13174279
 ] 

Phabricator commented on HIVE-2666:
---

heyongqiang has committed the revision HIVE-2666 [jira] StackOverflowError 
when using custom UDF in map join.

REVISION DETAIL
  https://reviews.facebook.net/D957

COMMIT
  https://reviews.facebook.net/rHIVE1221830






[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

2011-12-21 Thread Alan Gates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174292#comment-13174292
 ] 

Alan Gates commented on HIVE-2670:
--

Attached a first patch.  This is not ready for inclusion yet, I'm just putting 
it up here to start getting feedback.  The following will need to be resolved 
before it is checked in:
# Currently it just has the base harness code included as a tar file.  This 
really should be externed from the Pig code base, as HCatalog does.
# I don't know if this is the right place in SVN or not.  I put it all in a 
test-e2e directory right under trunk.  I need feedback on whether this is a 
good spot or somewhere else would be preferred.
# Connect the top level build.xml to this so it is possible to invoke the tests 
from the top level directory.  I was waiting to do this until I had feedback on 
the proper directory structure.

How to use it:

After applying the patch you will need to copy the harness.tar file (attached) 
to test-e2e, since that is not done for you by the patch tool.

First you need an existing Hadoop cluster (it can be very small, just a few 
nodes) and a MySQL database.  I ran my tests against Hadoop 0.20.205.0, but 
this should run against any 0.20.x version of Hadoop.  Then:
# Run the script test-e2e/scripts/create_test_db.sql against your MySQL 
database as a user that can create users and databases, and grant to users 
(root is a good choice)
# Run ant package in the top level Hive directory
# cd test-e2e
# ant -Dharness.hadoop.home=path_to_hadoop_home 
-Dharness.hive.home=path_to_hive_you_want_to_test deploy
# ant -Dharness.hadoop.home=path_to_hadoop_home 
-Dharness.hive.home=path_to_hive_you_want_to_test deploy

Usually path_to_hive_you_want_to_test will be $CWD/../build/dist

The basic design of this test harness is that each test consists of three phases:  
run_test, generate_benchmark, and compare_results.  In run_test a particular 
test is run.  generate_benchmark runs the same or a similar test against a 
known source of truth.  compare_results then compares the results and declares 
the test to have succeeded, failed, or aborted.  The harness delegates each of 
these three functions to drivers that are specific to different types of tests.

This patch includes two drivers, a Hive driver and a Hive command line driver.  
The Hive driver uses the MySQL database as a source of truth.  Each SQL script 
is run against Hive and against MySQL and the results compared using the Unix 
cksum tool.  
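
The three phases described above can be sketched roughly as follows. This is an illustration only, not the harness code: the function names other than the three phase names are hypothetical, zlib.crc32 stands in for the Unix cksum tool (it is a different CRC variant and gives different values), sorting rows before checksumming is an assumption, and mapping an exception in either of the first two phases to "aborted" is also an assumption.

```python
import zlib


def checksum(rows):
    """CRC-32 of the sorted rows. Stand-in for cksum; sorting is an assumption."""
    data = "\n".join(sorted(rows)).encode("utf-8")
    return zlib.crc32(data)


def compare_results(actual_rows, benchmark_rows):
    """Third phase: declare success or failure by comparing checksums."""
    if checksum(actual_rows) == checksum(benchmark_rows):
        return "succeeded"
    return "failed"


def run_one_test(run_test, generate_benchmark):
    """Run the phases in order; an error before comparison counts as aborted."""
    try:
        actual = run_test()               # e.g. the SQL script run against Hive
        benchmark = generate_benchmark()  # e.g. the same script run against MySQL
    except Exception:
        return "aborted"
    return compare_results(actual, benchmark)


def unavailable_source():
    """Hypothetical benchmark phase whose source of truth cannot be reached."""
    raise RuntimeError("source of truth unreachable")


same = run_one_test(lambda: ["1\ta", "2\tb"], lambda: ["2\tb", "1\ta"])
different = run_one_test(lambda: ["1\ta"], lambda: ["1\tz"])
broken = run_one_test(lambda: ["1\ta"], unavailable_source)
```

Passing the first two phases in as callables mirrors the idea that the harness delegates each phase to a driver specific to the type of test.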

For more information on the test harness, including how to add tests to it, see 
https://cwiki.apache.org/confluence/display/PIG/HowToTest

The Hive driver does not yet support running alternate SQL for benchmarking or 
using an old version of Hive for the benchmarks, though those should be added 
sometime.






[jira] [Updated] (HIVE-2670) A cluster test utility for Hive

2011-12-21 Thread Alan Gates (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-2670:
-

Attachment: harness.tar
hive_cluster_test.patch





[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174329#comment-13174329
 ] 

Hudson commented on HIVE-2642:
--

Integrated in Hive-trunk-h0.23.0 #42 (See 
[https://builds.apache.org/job/Hive-trunk-h0.23.0/42/])
HIVE-2642 fix Hive-2566 and make union optimization more aggressive
(Yongqiang He via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221812
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRRedSink3.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/union26.q
* /hive/trunk/ql/src/test/results/clientpositive/auto_join27.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input25.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join35.q.out
* /hive/trunk/ql/src/test/results/clientpositive/lineage1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/merge4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd_union_view.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out
* /hive/trunk/ql/src/test/results/clientpositive/stats1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union12.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union15.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union17.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union18.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union19.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union22.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union24.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union25.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union7.q.out






[jira] [Commented] (HIVE-2566) reduce the number map-reduce jobs for union all

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174330#comment-13174330
 ] 

Hudson commented on HIVE-2566:
--

Integrated in Hive-trunk-h0.23.0 #42 (See 
[https://builds.apache.org/job/Hive-trunk-h0.23.0/42/])
HIVE-2642 fix Hive-2566 and make union optimization more aggressive
(Yongqiang He via namit)


 reduce the number map-reduce jobs for union all
 ---

 Key: HIVE-2566
 URL: https://issues.apache.org/jira/browse/HIVE-2566
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.8.0

 Attachments: HIVE-2566.D405.1.patch, HIVE-2566.D405.2.patch, 
 HIVE-2566.D405.3.patch


 A query like:
 select s.key, s.value from (
   select key, value from src2  where key  10
   union all 
   select key, value from src3  where key  10
   union all 
   select key, value from src4  where key  10
   union all 
   select key, count(1) as value from src5 group by key
 )s;
 should run the last sub-query 
 'select key, count(1) as value from src5 group by key'
 as a map-reduce job.
 And then the union should be a map-only job reading from the first 3 map-only 
 subqueries
 and the output of the last map-reduce job.
 The current plan is very inefficient.





[jira] [Commented] (HIVE-2666) StackOverflowError when using custom UDF in map join

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174331#comment-13174331
 ] 

Hudson commented on HIVE-2666:
--

Integrated in Hive-trunk-h0.23.0 #42 (See 
[https://builds.apache.org/job/Hive-trunk-h0.23.0/42/])
HIVE-2666 [jira] StackOverflowError when using custom UDF in map join
(Kevin Wilfong via Yongqiang He)

Summary:
Resource files are now added to the class path as soon as they are added via the
CLI.  This fixes the stack overflow error mentioned in the JIRA by ensuring a
consistent class loader between serializers and deserializers for the same
query.

Note that serdes which contain a static block to register themselves are now
registered twice, once when adding the file to the class loader, and once when
an instance of the class is created.  Previously, registering a serde twice
resulted in an exception; to avoid this, I have downgraded it to a warning.

When a custom UDF is used as part of a join which is converted to a map join,
the XMLEncoder enters an infinite loop when serializing the map reduce task for
the second time, as part of sending it to be executed.  This results in a stack
overflow error.

Test Plan:
I ran the unit tests to verify nothing was broken.

I ran several queries which used custom UDFs and involved a join which was
converted to a map join.  I verified these completed successfully and consistently.

Reviewers: JIRA, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang, kevinwilfong

Differential Revision: 957

heyongqiang : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221830
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DeleteResourceProcessor.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java






Hive-trunk-h0.21 - Build # 1163 - Still Failing

2011-12-21 Thread Apache Jenkins Server
Changes for Build #1144
[jvs] HIVE-1040 [jira] use sed rather than diff for masking out noise in diff-based tests
(Marek Sapota via John Sichi)

Summary:
Replace diff -I with regex masking in Java

The current diff -I approach has two problems:  (1) it does not allow resolution
finer than line-level, so it's impossible to mask out pattern occurrences within
a line, and (2) it produces unmasked files, so if you run diff on the command
line to compare the result .q.out with the checked-in file, you see the noise.

My suggestion is to first run sed to replace noise patterns with an
unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files
without using any -I.

This would require a one-time hit to update all existing .q.out files so that
they would contain the pre-masked results.
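
The mask-then-compare idea can be sketched as below. The message proposes sed, and the commit summary mentions regex masking in Java; this Python version only illustrates the technique. The two noise patterns are hypothetical examples, not the ones used by the Hive tests; only the placeholder string ZYZZYZVA comes from the message.

```python
import re

# Placeholder string taken from the message above.
MASK = "ZYZZYZVA"

# Hypothetical noise patterns: local file URIs and 10-digit epoch timestamps.
NOISE_PATTERNS = [
    re.compile(r"file:/\S+"),
    re.compile(r"\b\d{10}\b"),
]


def mask_noise(text):
    """Replace every occurrence of a noise pattern, even in mid-line."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(MASK, text)
    return text


def outputs_match(actual, expected):
    """Compare two outputs after masking, with no per-line ignore rules."""
    return mask_noise(actual) == mask_noise(expected)


run_a = "location file:/tmp/run1/t1 created 1324425600 rows 5"
run_b = "location file:/tmp/run2/t1 created 1324512000 rows 5"
run_c = "location file:/tmp/run3/t1 created 1324512000 rows 6"
masked_a = mask_noise(run_a)
```

Because only the matched substring is replaced, a real difference elsewhere on the same line (the row count in run_c) is still detected, which is the finer-than-line resolution the message asks for.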

Test Plan: EMPTY

Reviewers: JIRA, jsichi

Reviewed By: jsichi

CC: jsichi

Differential Revision: 597


Changes for Build #1145

Changes for Build #1146
[namit] HIVE-2640 Add alterPartition to AlterHandler interface
(Kevin Wilfong via namit)


Changes for Build #1147
[namit] HIVE-2617 Insert overwrite table db.tname fails if partition already exists
(Chinna Rao Lalam via namit)


Changes for Build #1148
[heyongqiang] HIVE-2651 [jira] The variable hive.exec.mode.local.auto.tasks.max should be changed
(Namit Jain via Yongqiang He)

Summary:
HIVE-2651

It should be called hive.exec.mode.local.auto.input.files.max instead.
The number of input files is what is currently checked.

Test Plan: EMPTY

Reviewers: JIRA, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang

Differential Revision: 861

[cws] HIVE-727. Hive Server getSchema() returns wrong schema for 'Explain' 
queries (Prasad Mujumdar via cws)

[namit] HIVE-2611 Make index table output of create index command if
index is table based (Kevin Wilfong via namit)


Changes for Build #1150
[jvs] HIVE-2657 [jira] builtins JAR is not being published to Maven repo & hive-cli POM does not depend on it either
(Carl Steinbach via John Sichi)

Summary: Make hive-cli and hive-ql depend on hive-builtins

Test Plan: EMPTY

Reviewers: JIRA, jsichi

Reviewed By: jsichi

CC: jsichi

Differential Revision: 897

[namit] HIVE-2654 hive.querylog.location requires parent directory to be exist or else folder creation fails (Chinna Rao Lalam via namit)


Changes for Build #1151
[hashutosh] HIVE-1892 : show functions also returns internal operators 
(Priyadarshini via Ashutosh Chauhan)


Changes for Build #1152

Changes for Build #1153
[namit] HIVE-2660 Need better exception handling in RCFile tolerate corruptions
mode (Ramkumar Vadali via namit)


Changes for Build #1154
[cws] HIVE-2631. Make Hive work with Hadoop 1.0.0 (Ashutosh Chauhan via cws)


Changes for Build #1155
[cws] HIVE-BUILD. Update RELEASE_NOTES.txt with 0.8.0 release information (cws)


Changes for Build #1156

Changes for Build #1157

Changes for Build #1158
[namit] HIVE-2602 add support for insert partition overwrite(...) if not
  exists (Chinna Rao Lalam via namit)


Changes for Build #1159

Changes for Build #1160
[cws] HIVE-2005. Implement BETWEEN operator (Navis via cws)


Changes for Build #1161
[jvs] HIVE-2433. add DOAP file for Hive


Changes for Build #1162

Changes for Build #1163



2 tests failed.
FAILED:  org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions

Error Message:
Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions(TestCliDriver.java:16918)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)


FAILED:  

[jira] [Created] (HIVE-2671) GenericUDTFJSONTuple ignores IOExceptions

2011-12-21 Thread Dmytro Molkov (Created) (JIRA)
GenericUDTFJSONTuple ignores IOExceptions
-

 Key: HIVE-2671
 URL: https://issues.apache.org/jira/browse/HIVE-2671
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Dmytro Molkov


When running a query that uses GenericUDTFJSONTuple there is a chance of hitting a 
very nasty bug.
If the write pipeline fails, the task will not detect this and will simply start 
skipping all the rows in the input.

The UDTF has a catch (Throwable) that catches an IOException and forwards null 
rows, which, my guess is, are filtered out by the filter operator down the line, 
so the map task never tries to write them out.

This happens for every row in the input.
As a result the query runs forever, since it produces a log message for every 
row (we've seen tasks run for 20 hours instead of 20 minutes).
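
The failure mode can be illustrated with a small sketch. This is not the GenericUDTFJSONTuple code; it is a Python analogue with hypothetical names, in which Exception plays the role of Throwable and OSError plays the role of IOException. The second function shows one possible shape of a fix (an assumption, not the committed change): only the parsing step is guarded, so a downstream write failure propagates.

```python
import json


def failing_sink(row):
    """Hypothetical downstream pipeline: null rows are filtered, writes fail."""
    if row is None:
        return  # filtered out before reaching the writer
    raise OSError("write pipeline failed")


def process_swallowing(line, forward):
    """Anti-pattern: the blanket handler also hides errors raised by forward()."""
    try:
        forward(json.loads(line))
    except Exception:
        forward(None)  # the I/O failure is silently turned into a null row


def process_narrow(line, forward):
    """Guard only the parse; errors from forward() reach the caller."""
    try:
        record = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        forward(None)
        return
    forward(record)


# The swallowing version returns normally even though the write failed.
process_swallowing('{"a": 1}', failing_sink)

# The narrow version surfaces the failure so the task can stop.
write_error_seen = False
try:
    process_narrow('{"a": 1}', failing_sink)
except OSError:
    write_error_seen = True
```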

This is a stack trace of one of the tasks just in case:
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315)
- locked 0x9c174f78 (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
- locked 0x9c18d4f8 (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
- locked 0x9c18d4d8 (a java.io.DataOutputStream)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:894)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:875)
at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.process(GenericUDTFJSONTuple.java:167)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:368)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
at org.apache.hadoop.mapred.Child.main(Child.java:162)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HIVE-2672) CLI fails to start when run on Hadoop 0.23.0

2011-12-21 Thread Carl Steinbach (Created) (JIRA)
CLI fails to start when run on Hadoop 0.23.0


 Key: HIVE-2672
 URL: https://issues.apache.org/jira/browse/HIVE-2672
 Project: Hive
  Issue Type: Bug
  Components: CLI, Shims
Reporter: Carl Steinbach
Assignee: Carl Steinbach








[jira] [Commented] (HIVE-2672) CLI fails to start when run on Hadoop 0.23.0

2011-12-21 Thread Carl Steinbach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174514#comment-13174514
 ] 

Carl Steinbach commented on HIVE-2672:
--

The CLI won't start when run against Hadoop 0.23.0:

{noformat}
% ant clean package -Dhadoop.version=0.23.0 -Dhadoop.security.version=0.23.0 
-Dhadoop.security.version.prefix=0.23
% export HIVE_HOME=`pwd`/build/dist
% export HADOOP_HOME=`pwd`/build/hadoopcore/hadoop-0.23.0
% export PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
% hive -hiveconf hive.root.logger=INFO,console

log4j:ERROR Could not find value for key log4j.appender.NullAppender
log4j:ERROR Could not instantiate appender named NullAppender.
log4j:ERROR Could not find value for key log4j.appender.NullAppender
log4j:ERROR Could not instantiate appender named NullAppender.
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
11/12/20 22:09:41 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/mapred/JobConf
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:805)
at org.apache.hadoop.hive.conf.HiveConf.init(HiveConf.java:772)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:576)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 9 more
{noformat}
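
A quick way to check whether a given class is visible on the classpath is a small probe like the one below. This is only a diagnostic sketch: ClasspathProbe and isLoadable are hypothetical names, and the class would have to be run with the same classpath that the hive script builds for the result to be meaningful.

```java
// Diagnostic sketch: reports whether a class is visible to the current class loader.
class ClasspathProbe {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapred.JobConf";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```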

 CLI fails to start when run on Hadoop 0.23.0
 

 Key: HIVE-2672
 URL: https://issues.apache.org/jira/browse/HIVE-2672
 Project: Hive
  Issue Type: Bug
  Components: CLI, Shims
Reporter: Carl Steinbach
Assignee: Carl Steinbach







[jira] [Commented] (HIVE-2566) reduce the number map-reduce jobs for union all

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174520#comment-13174520
 ] 

Hudson commented on HIVE-2566:
--

Integrated in Hive-trunk-h0.21 #1164 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1164/])
HIVE-2642 fix Hive-2566 and make union optimization more aggressive
(Yongqiang He via namit)


 reduce the number map-reduce jobs for union all
 ---

 Key: HIVE-2566
 URL: https://issues.apache.org/jira/browse/HIVE-2566
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.8.0

 Attachments: HIVE-2566.D405.1.patch, HIVE-2566.D405.2.patch, 
 HIVE-2566.D405.3.patch


 A query like:
 select s.key, s.value from (
   select key, value from src2  where key  10
   union all 
   select key, value from src3  where key  10
   union all 
   select key, value from src4  where key  10
   union all 
   select key, count(1) as value from src5 group by key
 )s;
 should run the last sub-query 
 'select key, count(1) as value from src5 group by key'
 as a map-reduce job.
 And then the union should be a map-only job reading from the first 3 map-only 
 subqueries
 and the output of the last map-reduce job.
 The current plan is very inefficient.





[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174519#comment-13174519
 ] 

Hudson commented on HIVE-2642:
--

Integrated in Hive-trunk-h0.21 #1164 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1164/])
HIVE-2642 fix Hive-2566 and make union optimization more aggressive
(Yongqiang He via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221812
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRRedSink3.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcContext.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/union26.q
* /hive/trunk/ql/src/test/results/clientpositive/auto_join27.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input25.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join35.q.out
* /hive/trunk/ql/src/test/results/clientpositive/lineage1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/merge4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd_union_view.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out
* /hive/trunk/ql/src/test/results/clientpositive/stats1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union12.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union15.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union17.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union18.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union19.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union22.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union24.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union25.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union7.q.out


 fix Hive-2566 and make union optimization more aggressive 
 --

 Key: HIVE-2642
 URL: https://issues.apache.org/jira/browse/HIVE-2642
 Project: Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2642.D735.1.patch


 Hive-2566 did some optimizations to union, but caused some problems and was 
 then reverted. This is to get it back, fix the problems we saw, and also 
 make union optimization more aggressive.





[jira] [Commented] (HIVE-2666) StackOverflowError when using custom UDF in map join

2011-12-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174521#comment-13174521
 ] 

Hudson commented on HIVE-2666:
--

Integrated in Hive-trunk-h0.21 #1164 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1164/])
HIVE-2666 [jira] StackOverflowError when using custom UDF in map join
(Kevin Wilfong via Yongqiang He)

Summary:
Resource files are now added to the class path as soon as they are added via the
CLI.  This fixes the stack overflow error mentioned in the JIRA by ensuring a
consistent class loader between serializers and deserializers for the same
query.

Note that serdes which contain a static block to register themselves are now
registered twice, once when adding the file to the class loader, and once when
an instance of the class is created.  Previously, registering a serde twice
resulted in an exception; to avoid this, I have downgraded it to a warning.
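
The "warn instead of throw" behaviour for a duplicate registration could look roughly like the sketch below. This is not the SerDeUtils code from the patch: SerdeRegistrySketch and register are hypothetical names, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical registry for illustration; Hive's SerDeUtils is not reproduced here.
class SerdeRegistrySketch {
    private static final Logger LOG =
            Logger.getLogger(SerdeRegistrySketch.class.getName());
    private static final Map<String, Class<?>> REGISTRY = new ConcurrentHashMap<>();

    /**
     * Registers a serde class under a name. A second registration of the same
     * name is ignored with a warning rather than rejected with an exception.
     */
    static boolean register(String name, Class<?> serdeClass) {
        Class<?> previous = REGISTRY.putIfAbsent(name, serdeClass);
        if (previous != null) {
            LOG.warning("serde " + name + " is already registered; ignoring duplicate");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(register("example.serde", Object.class)); // first registration
        System.out.println(register("example.serde", Object.class)); // duplicate, warns
    }
}
```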

When a custom UDF is used as part of a join which is converted to a map join,
the XMLEncoder enters an infinite loop when serializing the map reduce task for
the second time, as part of sending it to be executed.  This results in a stack
overflow error.

Test Plan:
I ran the unit tests to verify nothing was broken.

I ran several queries which used custom UDFs and involved a join which was
converted to a map join.  I verified that these consistently completed
successfully.

Reviewers: JIRA, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang, kevinwilfong

Differential Revision: 957

heyongqiang : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221830
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DeleteResourceProcessor.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java


 StackOverflowError when using custom UDF in map join
 

 Key: HIVE-2666
 URL: https://issues.apache.org/jira/browse/HIVE-2666
 Project: Hive
  Issue Type: Bug
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2666.D957.1.patch


 When a custom UDF is used as part of a join which is converted to a map join, 
 the XMLEncoder enters an infinite loop when serializing the map reduce task 
 for the second time, as part of sending it to be executed.  This results in a 
 stack overflow error.





Hive-trunk-h0.21 - Build # 1164 - Still Failing

2011-12-21 Thread Apache Jenkins Server
Changes for Build #1144
[jvs] HIVE-1040 [jira] use sed rather than diff for masking out noise in 
diff-based
tests
(Marek Sapota via John Sichi)

Summary:
Replace diff -I with regex masking in Java

The current diff -I approach has two problems:  (1) it does not allow resolution
finer than line-level, so it's impossible to mask out pattern occurrences within
a line, and (2) it produces unmasked files, so if you run diff on the command
line to compare the result .q.out with the checked-in file, you see the noise.

My suggestion is to first run sed to replace noise patterns with an
unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files
without using any -I.

This would require a one-time hit to update all existing .q.out files so that
they would contain the pre-masked results.
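
As a rough illustration of regex masking in Java, a line-level masker could be sketched as follows. The mask string is the one suggested above; the two noise patterns and the names OutputMaskSketch and mask are assumptions made for this example and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of masking noise inside a line before diffing; patterns are hypothetical.
class OutputMaskSketch {
    /** Unlikely-to-occur replacement string, as suggested in the summary. */
    static final String MASK = "ZYZZYZVA";

    /** Example noise patterns (assumed): local file URIs and epoch-style timestamps. */
    private static final List<Pattern> NOISE = Arrays.asList(
            Pattern.compile("file:/\\S+"),
            Pattern.compile("\\b\\d{10,13}\\b"));

    /** Replaces every occurrence of a noise pattern within the line by MASK. */
    static String mask(String line) {
        String out = line;
        for (Pattern p : NOISE) {
            out = p.matcher(out).replaceAll(MASK);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mask("location file:/tmp/warehouse/t1 owner"));
        System.out.println(mask("transient_lastDdlTime 1324500000"));
    }
}
```

Because the masking works on substrings, the rest of each line still takes part in the comparison, which is the finer-than-line-level resolution the summary asks for.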

Test Plan: EMPTY

Reviewers: JIRA, jsichi

Reviewed By: jsichi

CC: jsichi

Differential Revision: 597


Changes for Build #1145

Changes for Build #1146
[namit] HIVE-2640 Add alterPartition to AlterHandler interface
(Kevin Wilfong via namit)


Changes for Build #1147
[namit] HIVE-2617 Insert overwrite table db.tname fails if partition already 
exists
(Chinna Rao Lalam via namit)


Changes for Build #1148
[heyongqiang] HIVE-2651 [jira] The variable hive.exec.mode.local.auto.tasks.max 
should be
changed
(Namit Jain via Yongqiang He)

Summary:
HIVE-2651

It should be called hive.exec.mode.local.auto.input.files.max instead.
It is the number of input files that is currently checked.

Test Plan: EMPTY

Reviewers: JIRA, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang

Differential Revision: 861

[cws] HIVE-727. Hive Server getSchema() returns wrong schema for 'Explain' 
queries (Prasad Mujumdar via cws)

[namit] HIVE-2611 Make index table output of create index command if
index is table based (Kevin Wilfong via namit)


Changes for Build #1150
[jvs] HIVE-2657 [jira] builtins JAR is not being published to Maven repo  
hive-cli
POM does not depend on it either
(Carl Steinbach via John Sichi)

Summary: Make hive-cli and hive-ql depend on hive-builtins

Test Plan: EMPTY

Reviewers: JIRA, jsichi

Reviewed By: jsichi

CC: jsichi

Differential Revision: 897

[namit] HIVE-2654 hive.querylog.location requires parent directory to be 
exist or
  else folder creation fails (Chinna Rao Lalam via namit)


Changes for Build #1151
[hashutosh] HIVE-1892 : show functions also returns internal operators 
(Priyadarshini via Ashutosh Chauhan)


Changes for Build #1152

Changes for Build #1153
[namit] HIVE-2660 Need better exception handling in RCFile tolerate corruptions
mode (Ramkumar Vadali via namit)


Changes for Build #1154
[cws] HIVE-2631. Make Hive work with Hadoop 1.0.0 (Ashutosh Chauhan via cws)


Changes for Build #1155
[cws] HIVE-BUILD. Update RELEASE_NOTES.txt with 0.8.0 release information (cws)


Changes for Build #1156

Changes for Build #1157

Changes for Build #1158
[namit] HIVE-2602 add support for insert partition overwrite(...) if not
  exists (Chinna Rao Lalam via namit)


Changes for Build #1159

Changes for Build #1160
[cws] HIVE-2005. Implement BETWEEN operator (Navis via cws)


Changes for Build #1161
[jvs] HIVE-2433. add DOAP file for Hive


Changes for Build #1162

Changes for Build #1163

Changes for Build #1164
[heyongqiang] HIVE-2666 [jira] StackOverflowError when using custom UDF in map 
join
(Kevin Wilfong via Yongqiang He)

Summary:
Resource files are now added to the class path as soon as they are added via the
CLI.  This fixes the stack overflow error mentioned in the JIRA by ensuring a
consistent class loader between serializers and deserializers for the same
query.

Note that serdes which contain a static block to register themselves are now
registered twice, once when adding the file to the class loader, and once when
an instance of the class is created.  Previously, registering a serde twice
resulted in an exception; to avoid this, I have downgraded it to a warning.

When a custom UDF is used as part of a join which is converted to a map join,
the XMLEncoder enters an infinite loop when serializing the map reduce task for
the second time, as part of sending it to be executed.  This results in a stack
overflow error.

Test Plan:
I ran the unit tests to verify nothing was broken.

I ran several queries which used custom UDFs and involved a join which was
converted to a map join.  I verified that these consistently completed
successfully.

Reviewers: JIRA, heyongqiang

Reviewed By: heyongqiang

CC: heyongqiang, kevinwilfong

Differential Revision: 957

[namit] HIVE-2642 fix Hive-2566 and make union optimization more aggressive
(Yongqiang He via namit)




7 tests failed.
REGRESSION:  
org.apache.hadoop.hive.ql.exec.TestStatsPublisherEnhanced.testStatsPublisherOneStat

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.Utilities.prepareWithRetry(Utilities.java:2211)
at 

[jira] [Created] (HIVE-2673) Eclipse launch configurations fail due to unsatisfied builtins JAR dependency

2011-12-21 Thread Carl Steinbach (Created) (JIRA)
Eclipse launch configurations fail due to unsatisfied builtins JAR dependency
-

 Key: HIVE-2673
 URL: https://issues.apache.org/jira/browse/HIVE-2673
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: John Sichi
 Fix For: 0.8.1








[jira] [Commented] (HIVE-2673) Eclipse launch configurations fail due to unsatisfied builtins JAR dependency

2011-12-21 Thread Carl Steinbach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174537#comment-13174537
 ] 

Carl Steinbach commented on HIVE-2673:
--

* Generate eclipse templates and load the project into Eclipse.
* Run the TestJdbc launch configuration, get the following exception:

{noformat}

java.lang.RuntimeException: Failed to load Hive builtin functions
at 
org.apache.hadoop.hive.ql.session.SessionState.init(SessionState.java:190)
at 
org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.init(HiveServer.java:135)
at 
org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.init(HiveServer.java:121)
at 
org.apache.hadoop.hive.jdbc.HiveConnection.init(HiveConnection.java:76)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at 
org.apache.hadoop.hive.jdbc.TestJdbcDriver.setUp(TestJdbcDriver.java:87)
at junit.framework.TestCase.runBare(TestCase.java:132)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.init(ZipFile.java:114)
at java.util.jar.JarFile.init(JarFile.java:135)
at java.util.jar.JarFile.init(JarFile.java:72)
at sun.net.www.protocol.jar.URLJarFile.init(URLJarFile.java:72)
at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:48)
at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:55)
at 
sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:104)
at 
sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:132)
at java.net.URL.openStream(URL.java:1010)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerFunctionsFromPluginJar(FunctionRegistry.java:1196)
at 
org.apache.hadoop.hive.ql.session.SessionState.init(SessionState.java:187)
... 20 more
{noformat}

 Eclipse launch configurations fail due to unsatisfied builtins JAR dependency
 -

 Key: HIVE-2673
 URL: https://issues.apache.org/jira/browse/HIVE-2673
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: John Sichi
 Fix For: 0.8.1








[jira] [Commented] (HIVE-2672) CLI fails to start when run on Hadoop 0.23.0

2011-12-21 Thread Carl Steinbach (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174554#comment-13174554
 ] 

Carl Steinbach commented on HIVE-2672:
--

Update: I can get the CLI to start if I first set HADOOP_CLASSPATH as follows:
{noformat}
% export 
HADOOP_CLASSPATH=`pwd`/build/hadoopcore/hadoop-0.23.0/modules/hadoop-mapreduce-client-core-0.23.0.jar
{noformat}


 CLI fails to start when run on Hadoop 0.23.0
 

 Key: HIVE-2672
 URL: https://issues.apache.org/jira/browse/HIVE-2672
 Project: Hive
  Issue Type: Bug
  Components: CLI, Shims
Reporter: Carl Steinbach
Assignee: Carl Steinbach







[jira] [Updated] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-21 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2621:
--

Attachment: HIVE-2621.D567.3.patch

kevinwilfong updated the revision HIVE-2621 [jira] Allow multiple group bys 
with the same input data and spray keys to be run on the same reducer..
Reviewers: JIRA

  Updated the diff again to prevent conflicts.

  Added limits in the test cases to prevent the output from getting too long.

REVISION DETAIL
  https://reviews.facebook.net/D567

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/groupby7_noskew_multi_single_reducer.q.out
  ql/src/test/results/clientpositive/groupby_multi_single_reducer.q.out
  
ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out
  ql/src/test/queries/clientpositive/groupby_multi_single_reducer.q
  ql/src/test/queries/clientpositive/groupby7_noskew_multi_single_reducer.q
  
ql/src/test/queries/clientpositive/groupby_complex_types_multi_single_reducer.q
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java


 Allow multiple group bys with the same input data and spray keys to be run on 
 the same reducer.
 ---

 Key: HIVE-2621
 URL: https://issues.apache.org/jira/browse/HIVE-2621
 Project: Hive
  Issue Type: New Feature
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
 HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch


 Currently, when a user runs a query, such as a multi-insert, where each 
 insertion subclause consists of a simple query followed by a group by, the 
 group bys for each clause are run on a separate reducer.  This requires 
 writing the data for each group by clause to an intermediate file, and then 
 reading it back.  This uses a significant amount of the total CPU consumed by 
 the query for an otherwise simple query.
 If the subclauses are grouped by their distinct expressions and group by 
 keys, with all of the group by expressions for a group of subclauses run on a 
 single reducer, this would reduce the amount of reading/writing to 
 intermediate files for some queries.
 To do this, for each group of subclauses, in the mapper we would execute 
 the filters for each subclause 'or'd together (provided each subclause has a 
 filter) followed by a reduce sink.  In the reducer, the child operators would 
 be each subclause's filter followed by the group by and any subsequent 
 operations.
 Note that this would require turning off map aggregation, so we would need to 
 make using this type of plan configurable.
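
 The shape of the proposed plan can be sketched outside Hive as follows. This is a toy model, not operator code: rows are int pairs of (key, value), each subclause is reduced to a filter plus a count grouped by the key, and the names SharedReducerSketch, Subclause, mapSide and reduceSide are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of the plan: one shuffle shared by several filtered group-bys.
class SharedReducerSketch {

    /** One insert subclause: a filter followed by count(*) grouped by row[0]. */
    static final class Subclause {
        final Predicate<int[]> filter;
        Subclause(Predicate<int[]> filter) { this.filter = filter; }
    }

    /** Map side: forward a row to the reduce sink if any subclause's filter accepts it. */
    static List<int[]> mapSide(List<int[]> rows, List<Subclause> subclauses) {
        Predicate<int[]> any = r -> false;
        for (Subclause s : subclauses) {
            any = any.or(s.filter); // filters 'or'd together
        }
        List<int[]> out = new ArrayList<>();
        for (int[] r : rows) {
            if (any.test(r)) {
                out.add(r);
            }
        }
        return out;
    }

    /** Reduce side: re-apply each subclause's own filter, then count rows per key. */
    static List<Map<Integer, Integer>> reduceSide(List<int[]> shuffled,
                                                  List<Subclause> subclauses) {
        List<Map<Integer, Integer>> results = new ArrayList<>();
        for (Subclause s : subclauses) {
            Map<Integer, Integer> counts = new TreeMap<>();
            for (int[] r : shuffled) {
                if (s.filter.test(r)) {
                    counts.merge(r[0], 1, Integer::sum);
                }
            }
            results.add(counts);
        }
        return results;
    }
}
```

 Note that all aggregation happens on the reduce side in this sketch, which mirrors the remark about map aggregation having to be turned off.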





[jira] [Created] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist

2011-12-21 Thread Kevin Wilfong (Created) (JIRA)
get_partitions_ps throws TApplicationException if table doesn't exist
-

 Key: HIVE-2674
 URL: https://issues.apache.org/jira/browse/HIVE-2674
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Kevin Wilfong


If the table passed to get_partitions_ps doesn't exist, an NPE is thrown by 
getPartitionPsQueryResults.  There should be a check here, which throws a 
NoSuchObjectException if the table doesn't exist.
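
The proposed check amounts to something like the sketch below. It is a stand-alone illustration rather than the ObjectStore code: PartitionLookupSketch, its in-memory table map, and the nested NoSuchObjectException class are hypothetical stand-ins for the metastore types.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone illustration; the real metastore classes are not used here.
class PartitionLookupSketch {

    /** Hypothetical stand-in for the metastore's NoSuchObjectException. */
    static class NoSuchObjectException extends Exception {
        NoSuchObjectException(String message) {
            super(message);
        }
    }

    private final Map<String, String[]> tables = new HashMap<>();

    void addTable(String db, String table, String... partitions) {
        tables.put(db + "." + table, partitions);
    }

    /** Fails with a checked, descriptive exception instead of a later NPE. */
    String[] getPartitionsPs(String db, String table) throws NoSuchObjectException {
        String[] partitions = tables.get(db + "." + table);
        if (partitions == null) {
            throw new NoSuchObjectException(db + "." + table + " table not found");
        }
        return partitions;
    }
}
```

A checked exception declared on the method can be mapped to a declared Thrift exception, whereas an unexpected NPE surfaces to the client as a TApplicationException.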





[jira] [Assigned] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist

2011-12-21 Thread Kevin Wilfong (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong reassigned HIVE-2674:
---

Assignee: Kevin Wilfong

 get_partitions_ps throws TApplicationException if table doesn't exist
 -

 Key: HIVE-2674
 URL: https://issues.apache.org/jira/browse/HIVE-2674
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong

 If the table passed to get_partitions_ps doesn't exist, an NPE is thrown by 
 getPartitionPsQueryResults.  There should be a check here, which throws a 
 NoSuchObjectException if the table doesn't exist.





[jira] [Updated] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist

2011-12-21 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2674:
--

Attachment: HIVE-2674.D987.1.patch

kevinwilfong requested code review of HIVE-2674 [jira] get_partitions_ps 
throws TApplicationException if table doesn't exist.
Reviewers: JIRA

  getPartitionPsQueryResults now throws a NoSuchObjectException instead of an 
NPE if the named table does not exist.  I updated all calls higher up so that 
the exception can propagate to the Thrift client.

  If the table passed to get_partitions_ps doesn't exist, an NPE is thrown by 
getPartitionPsQueryResults.  There should be a check here, which throws a 
NoSuchObjectException if the table doesn't exist.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D987

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
  metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
  metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
  metastore/if/hive_metastore.thrift



 get_partitions_ps throws TApplicationException if table doesn't exist
 -

 Key: HIVE-2674
 URL: https://issues.apache.org/jira/browse/HIVE-2674
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2674.D987.1.patch


 If the table passed to get_partitions_ps doesn't exist, an NPE is thrown by 
 getPartitionPsQueryResults.  There should be a check here that throws a 
 NoSuchObjectException if the table doesn't exist.





[jira] [Updated] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist

2011-12-21 Thread Kevin Wilfong (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-2674:


Status: Patch Available  (was: Open)

 get_partitions_ps throws TApplicationException if table doesn't exist
 -

 Key: HIVE-2674
 URL: https://issues.apache.org/jira/browse/HIVE-2674
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2674.D987.1.patch


 If the table passed to get_partitions_ps doesn't exist, an NPE is thrown by 
 getPartitionPsQueryResults.  There should be a check here that throws a 
 NoSuchObjectException if the table doesn't exist.





[jira] [Commented] (HIVE-2673) Eclipse launch configurations fail due to unsatisfied builtins JAR dependency

2011-12-21 Thread John Sichi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174605#comment-13174605
 ] 

John Sichi commented on HIVE-2673:
--

Hmmm, it's because Eclipse is loading BuiltinUtils.class from its own crazy 
build location instead of from the .jar like it's supposed to:

Should be:

jar:file:/Users/jsichi/open/hive-trunk/build/builtins/hive-builtins-0.9.0-SNAPSHOT.jar!/META-INF/class-info.xml

But is:

jar:file:/Users/jsichi/open/hive-trunk/build/eclipse-classes/!/META-INF/class-info.xml

Do you know how to tell it to load from the jar instead?
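
One way to confirm which location a class is actually loaded from is to ask its protection domain for the code source. The sketch below is a generic diagnostic, not Hive code; the class name ClassOriginSketch and its methods are made up for illustration, and it assumes the class in question (for example BuiltinUtils) is on the classpath of whatever launch configuration is being debugged.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper for checking where a class was loaded from.
class ClassOriginSketch {

    // Returns the code source location of the class as a string, or a
    // placeholder when the JVM does not expose one (e.g. core JDK classes).
    static String originOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null) {
            return "bootstrap";
        }
        URL location = source.getLocation();
        return location == null ? "unknown" : location.toString();
    }

    // True when the class came from a .jar rather than a classes directory.
    static boolean loadedFromJar(Class<?> clazz) {
        return originOf(clazz).endsWith(".jar");
    }
}
```

Calling originOf on the class from inside the failing launch configuration would show whether it resolves to a classes directory or to the jar.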


 Eclipse launch configurations fail due to unsatisfied builtins JAR dependency
 -

 Key: HIVE-2673
 URL: https://issues.apache.org/jira/browse/HIVE-2673
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: John Sichi
 Fix For: 0.8.1








[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

2011-12-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174671#comment-13174671
 ] 

Phabricator commented on HIVE-2621:
---

njain has commented on the revision HIVE-2621 [jira] Allow multiple group bys 
with the same input data and spray keys to be run on the same reducer..

INLINE COMMENTS
  ql/src/test/queries/clientpositive/groupby7_noskew_multi_single_reducer.q:12 
This does not look right.

  We would like to make hive.multigroupby.singlereducer true by default.

  But we are unnecessarily generating 3 MR jobs for this query (with no 
distinct). I think we can get it in 2 MR jobs today (not 100% sure).
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6273 It 
would be good to merge the code path with the above if block 
(optimizeMultiGroupBy).

  The common distinct expression should return the common distinct
  checking for the parameter HIVEMULTIGROUPBYSINGLEREDUCER.

  Or, it might be simpler to remove the above if block (the 
optimizeMultiGroupBy case should be covered by this block).
  Anyway, the above if block (6253-6272) seems broken.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6211 I 
think this code can be simplified.

  The function getCommonDistinctExprs can be removed

REVISION DETAIL
  https://reviews.facebook.net/D567


 Allow multiple group bys with the same input data and spray keys to be run on 
 the same reducer.
 ---

 Key: HIVE-2621
 URL: https://issues.apache.org/jira/browse/HIVE-2621
 Project: Hive
  Issue Type: New Feature
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
 HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch


 Currently, when a user runs a query, such as a multi-insert, where each 
 insertion subclause consists of a simple query followed by a group by, the 
 group bys for each clause are run on a separate reducer.  This requires 
 writing the data for each group by clause to an intermediate file, and then 
 reading it back.  This uses a significant amount of the total CPU consumed by 
 the query for an otherwise simple query.
 If the subclauses are grouped by their distinct expressions and group by 
 keys, with all of the group by expressions for a group of subclauses run on a 
 single reducer, this would reduce the amount of reading/writing to 
 intermediate files for some queries.
 To do this, for each group of subclauses, in the mapper we would execute 
 the filters for each subclause 'or'd together (provided each subclause has a 
 filter) followed by a reduce sink.  In the reducer, the child operators would 
 be each subclause's filter followed by the group by and any subsequent 
 operations.
 Note that this would require turning off map aggregation, so we would need to 
 make using this type of plan configurable.
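
The proposed plan can be illustrated with a small in-memory simulation. This is only a sketch of the idea under simplifying assumptions, not Hive operator code: SharedReducerSketch, Row, mapSide and reduceSide are hypothetical names, every subclause is assumed to group by the same key, and a per-key count stands in for the group by aggregation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical simulation of running several group bys on one reducer.
class SharedReducerSketch {

    // A row with a shared group-by key and a value the filters inspect.
    static final class Row {
        final String key;
        final int value;

        Row(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Map side: forward a row if ANY subclause's filter accepts it
    // (the filters 'or'd together), then hand it to the single shuffle.
    static List<Row> mapSide(List<Row> input, List<Predicate<Row>> filters) {
        Predicate<Row> anyFilter = row -> false;
        for (Predicate<Row> filter : filters) {
            anyFilter = anyFilter.or(filter);
        }
        List<Row> forwarded = new ArrayList<>();
        for (Row row : input) {
            if (anyFilter.test(row)) {
                forwarded.add(row);
            }
        }
        return forwarded;
    }

    // Reduce side: each subclause re-applies its own filter to the shared
    // rows and then aggregates per key (here, a simple count).
    static List<Map<String, Integer>> reduceSide(
            List<Row> shuffled, List<Predicate<Row>> filters) {
        List<Map<String, Integer>> results = new ArrayList<>();
        for (Predicate<Row> filter : filters) {
            Map<String, Integer> counts = new TreeMap<>();
            for (Row row : shuffled) {
                if (filter.test(row)) {
                    counts.merge(row.key, 1, Integer::sum);
                }
            }
            results.add(counts);
        }
        return results;
    }
}
```

The sketch also shows why map aggregation has to be off: the rows reaching the reducer must still be raw rows so that each subclause can apply its own filter before aggregating.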
