[jira] [Commented] (HIVE-1603) support CSV text file format
[ https://issues.apache.org/jira/browse/HIVE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174190#comment-13174190 ] Sam Wilson commented on HIVE-1603: -- Instead of hard-coding it to work only with comma-separated values, why not have a DelimitedTextFile and a separate set of options to control the delimiter and quoting? Some people need pipe-delimited, for example. support CSV text file format Key: HIVE-1603 URL: https://issues.apache.org/jira/browse/HIVE-1603 Project: Hive Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Ning Zhang The Comma Separated Values (CSV) text format is commonly used for exchanging relational data between heterogeneous systems. Currently Hive uses the TextFile format when displaying query results. This can cause confusion when column values contain newlines or tabs. A CSVTextFile format would get around this problem. This will require a new CSVTextInputFormat, CSVTextOutputFormat, and CSVSerDe. A proposed use case:
{code}
-- exporting a table to CSV files in a directory
hive> set hive.io.output.fileformat=CSVTextFile;
hive> insert overwrite local directory '/tmp/CSVrepos/' select * from S where ... ;

-- query result in CSV
hive -e 'set hive.io.output.fileformat=CSVTextFile; select * from T;' | sql_loader_to_other_systems

-- query CSV files directly from Hive
hive> create table T (...) stored as CSVTextFile;
hive> load data local inpath '/my/CSVfiles' into table T;
hive> select * from T where ...;
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
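The ambiguity Ning describes (column values containing newlines or tabs confusing a plain-text format) can be illustrated with Python's csv module as a stand-in for what a CSVTextFile SerDe would do; this is a sketch of the quoting concept, not Hive code:

```python
import csv
import io

# A row whose value contains a newline and a tab: plain delimited text
# (Hive's TextFile) cannot represent this unambiguously, because the
# newline would be read as a record terminator.
row = ["1", "line one\nline two\tindented"]

# RFC 4180-style quoting wraps the troublesome field in double quotes,
# so the embedded newline no longer terminates the record.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
encoded = buf.getvalue()

# Reading it back recovers the original field intact.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == row)  # the round trip is lossless
```

This is also why Sam's suggestion of a configurable delimiter is cheap to support: only the delimiter and quote characters change, not the escaping logic.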
[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive
[ https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174211#comment-13174211 ] Namit Jain commented on HIVE-2642: -- One general comment about the new test union26.q -- reduce the test output; I mean, you don't need to load all 500 rows for this test. It makes the test output really difficult to review. Again, all of the above 3 are not blockers - I am still reviewing, and I will file an enhancement for all the follow-ups. fix Hive-2566 and make union optimization more aggressive -- Key: HIVE-2642 URL: https://issues.apache.org/jira/browse/HIVE-2642 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-2642.D735.1.patch Hive-2566 did some optimizations to union but caused some problems, and was then reverted. This is to get it back, fix the problems we saw, and also make union optimization more aggressive.
[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive
[ https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174227#comment-13174227 ] Phabricator commented on HIVE-2642: --- njain has commented on the revision HIVE-2642 [jira] fix Hive-2566 and make union optimization more aggressive. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java:245 remove this comment REVISION DETAIL https://reviews.facebook.net/D735 fix Hive-2566 and make union optimization more aggressive -- Key: HIVE-2642 URL: https://issues.apache.org/jira/browse/HIVE-2642 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-2642.D735.1.patch Hive-2566 did some optimizations to union but caused some problems, and was then reverted. This is to get it back, fix the problems we saw, and also make union optimization more aggressive.
[jira] [Resolved] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive
[ https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain resolved HIVE-2642. -- Resolution: Fixed Hadoop Flags: Reviewed Committed. Thanks, Yongqiang. fix Hive-2566 and make union optimization more aggressive -- Key: HIVE-2642 URL: https://issues.apache.org/jira/browse/HIVE-2642 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-2642.D735.1.patch Hive-2566 did some optimizations to union but caused some problems, and was then reverted. This is to get it back, fix the problems we saw, and also make union optimization more aggressive.
[jira] [Created] (HIVE-2668) Minor cleanup to HIVE-2642
Minor cleanup to HIVE-2642 -- Key: HIVE-2668 URL: https://issues.apache.org/jira/browse/HIVE-2668 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: He Yongqiang INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java:105 Can you add some comments here? These are not really the top operators - this also contains the list of intermediate tables. This code is difficult to debug later on, so more comments would be helpful. Look at union22.q.out: for a map-join followed by union, an extra stage is introduced. We don't have to optimize this - just wanted to make sure it is intentional. One general comment about the new test union26.q - reduce the test output; I mean, you don't need to load all 500 rows for this test. It makes the test output really difficult to review. ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java:245 remove this comment
[jira] [Created] (HIVE-2669) remove special processing for map-join
remove special processing for map-join -- Key: HIVE-2669 URL: https://issues.apache.org/jira/browse/HIVE-2669 Project: Hive Issue Type: Improvement Reporter: Namit Jain With hive.auto.convert.join, there is no need for the user to specify the map-join hint. It should be completely ignored, other than for bucketized joins, which can be cleaned up later. There is a lot of code in the optimizer for processing union followed by map-join, etc., which should be removed.
[jira] [Created] (HIVE-2670) A cluster test utility for Hive
A cluster test utility for Hive --- Key: HIVE-2670 URL: https://issues.apache.org/jira/browse/HIVE-2670 Project: Hive Issue Type: New Feature Components: Testing Infrastructure Reporter: Alan Gates Hive has an extensive set of unit tests, but it does not have an infrastructure for testing in a cluster environment. Pig and HCatalog have been using a test harness for cluster testing for some time. We have written Hive drivers and tests to run in this harness.
[jira] [Resolved] (HIVE-2666) StackOverflowError when using custom UDF in map join
[ https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang resolved HIVE-2666. Resolution: Fixed committed, thanks Kevin! StackOverflowError when using custom UDF in map join Key: HIVE-2666 URL: https://issues.apache.org/jira/browse/HIVE-2666 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2666.D957.1.patch When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map-reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error.
[jira] [Commented] (HIVE-2666) StackOverflowError when using custom UDF in map join
[ https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174279#comment-13174279 ] Phabricator commented on HIVE-2666: --- heyongqiang has committed the revision HIVE-2666 [jira] StackOverflowError when using custom UDF in map join. REVISION DETAIL https://reviews.facebook.net/D957 COMMIT https://reviews.facebook.net/rHIVE1221830 StackOverflowError when using custom UDF in map join Key: HIVE-2666 URL: https://issues.apache.org/jira/browse/HIVE-2666 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2666.D957.1.patch When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map-reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error.
[jira] [Commented] (HIVE-2670) A cluster test utility for Hive
[ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174292#comment-13174292 ] Alan Gates commented on HIVE-2670: -- Attached a first patch. This is not ready for inclusion yet; I'm just putting it up here to start getting feedback. The following will need to be resolved before it is checked in: # Currently it just has the base harness code included as a tar file. This really should be externed from the Pig code base, as HCatalog does. # I don't know if this is the right place in SVN or not. I put it all in a test-e2e directory right under trunk. I need feedback on whether this is a good spot or somewhere else would be preferred. # Connect the top-level build.xml to this so it is possible to invoke the tests from the top-level directory. I was waiting to do this until I had feedback on the proper directory structure. How to use it: After applying the patch you will need to copy the harness.tar file (attached) to test-e2e, since that is not done for you by the patch tool. First you need an existing Hadoop cluster (it can be very small, just a few nodes) and a MySQL database. I ran my tests against Hadoop 0.20.205.0, but this should run against any 0.20.x version of Hadoop. Then: # Run the script test-e2e/scripts/create_test_db.sql against your MySQL database as a user that can create users and databases, and grant to users (root is a good choice) # Run ant package in the top-level Hive directory # cd test-e2e # ant -Dharness.hadoop.home=path_to_hadoop_home -Dharness.hive.home=path_to_hive_you_want_to_test deploy Usually path_to_hive_you_want_to_test will be $CWD/../build/dist The basic design of this test harness is that each test consists of three phases: run_test, generate_benchmark, and compare_results. In run_test a particular test is run. generate_benchmark runs the same or a similar test against a known source of truth. compare_results then compares the results and declares the test to have succeeded, failed, or aborted. The harness delegates each of these three functions to drivers that are specific to different types of tests. This patch includes two drivers, a Hive driver and a Hive command line driver. The Hive driver uses the MySQL database as a source of truth. Each SQL script is run against Hive and against MySQL and the results are compared using the Unix cksum tool. For more information on the test harness, including how to add tests to it, see https://cwiki.apache.org/confluence/display/PIG/HowToTest The Hive driver does not yet support running alternate SQL for benchmarking, nor using an old version of Hive for the benchmarks, though those should be added at some point. A cluster test utility for Hive --- Key: HIVE-2670 URL: https://issues.apache.org/jira/browse/HIVE-2670 Project: Hive Issue Type: New Feature Components: Testing Infrastructure Reporter: Alan Gates Attachments: harness.tar, hive_cluster_test.patch Hive has an extensive set of unit tests, but it does not have an infrastructure for testing in a cluster environment. Pig and HCatalog have been using a test harness for cluster testing for some time. We have written Hive drivers and tests to run in this harness.
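The three-phase flow Alan describes (run_test, generate_benchmark, compare_results, with the outputs compared checksum-style) can be sketched roughly as follows. The toy "engines", the use of a hash instead of the actual Unix cksum tool, and the row-sorting are illustrative assumptions, not the harness's real code:

```python
import hashlib

def run_test(sql, engine):
    # Stand-in for running a SQL script against Hive or MySQL; here
    # each "engine" is just a callable returning rows.
    return engine(sql)

def checksum(rows):
    # The harness compares outputs with Unix cksum; hashing the
    # sorted, serialized rows plays the same role here, and sorting
    # makes the comparison order-insensitive.
    data = "\n".join(sorted(",".join(map(str, r)) for r in rows))
    return hashlib.md5(data.encode()).hexdigest()

def compare_results(test_rows, benchmark_rows):
    return "succeeded" if checksum(test_rows) == checksum(benchmark_rows) else "failed"

# Toy engines standing in for Hive and the MySQL source of truth.
hive = lambda sql: [(1, "a"), (2, "b")]
mysql = lambda sql: [(2, "b"), (1, "a")]  # same data, different order

result = compare_results(run_test("select *", hive),
                         run_test("select *", mysql))
print(result)  # rows match after sorting, so the test succeeds
```

The point of delegating each phase to a driver is that the same comparison machinery works whether the source of truth is MySQL, alternate SQL, or an older Hive version.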
[jira] [Updated] (HIVE-2670) A cluster test utility for Hive
[ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-2670: - Attachment: harness.tar hive_cluster_test.patch A cluster test utility for Hive --- Key: HIVE-2670 URL: https://issues.apache.org/jira/browse/HIVE-2670 Project: Hive Issue Type: New Feature Components: Testing Infrastructure Reporter: Alan Gates Attachments: harness.tar, hive_cluster_test.patch Hive has an extensive set of unit tests, but it does not have an infrastructure for testing in a cluster environment. Pig and HCatalog have been using a test harness for cluster testing for some time. We have written Hive drivers and tests to run in this harness.
[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive
[ https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174329#comment-13174329 ] Hudson commented on HIVE-2642: -- Integrated in Hive-trunk-h0.23.0 #42 (See [https://builds.apache.org/job/Hive-trunk-h0.23.0/42/]) HIVE-2642 fix Hive-2566 and make union optimization more aggressive (Yongqiang He via namit) namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221812 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRRedSink3.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcContext.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java * /hive/trunk/ql/src/test/queries/clientpositive/union26.q * /hive/trunk/ql/src/test/results/clientpositive/auto_join27.q.out * /hive/trunk/ql/src/test/results/clientpositive/input25.q.out * /hive/trunk/ql/src/test/results/clientpositive/input26.q.out * /hive/trunk/ql/src/test/results/clientpositive/join35.q.out * /hive/trunk/ql/src/test/results/clientpositive/lineage1.q.out * /hive/trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out * /hive/trunk/ql/src/test/results/clientpositive/merge4.q.out * /hive/trunk/ql/src/test/results/clientpositive/ppd_union_view.q.out * /hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out * /hive/trunk/ql/src/test/results/clientpositive/stats1.q.out * /hive/trunk/ql/src/test/results/clientpositive/union10.q.out * /hive/trunk/ql/src/test/results/clientpositive/union11.q.out * /hive/trunk/ql/src/test/results/clientpositive/union12.q.out * /hive/trunk/ql/src/test/results/clientpositive/union14.q.out * /hive/trunk/ql/src/test/results/clientpositive/union15.q.out * /hive/trunk/ql/src/test/results/clientpositive/union17.q.out * /hive/trunk/ql/src/test/results/clientpositive/union18.q.out * /hive/trunk/ql/src/test/results/clientpositive/union19.q.out * /hive/trunk/ql/src/test/results/clientpositive/union20.q.out * /hive/trunk/ql/src/test/results/clientpositive/union22.q.out * /hive/trunk/ql/src/test/results/clientpositive/union24.q.out * /hive/trunk/ql/src/test/results/clientpositive/union25.q.out * /hive/trunk/ql/src/test/results/clientpositive/union26.q.out * /hive/trunk/ql/src/test/results/clientpositive/union3.q.out * /hive/trunk/ql/src/test/results/clientpositive/union4.q.out * /hive/trunk/ql/src/test/results/clientpositive/union5.q.out * /hive/trunk/ql/src/test/results/clientpositive/union6.q.out * /hive/trunk/ql/src/test/results/clientpositive/union7.q.out fix Hive-2566 and make union optimization more aggressive -- Key: HIVE-2642 URL: https://issues.apache.org/jira/browse/HIVE-2642 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-2642.D735.1.patch Hive-2566 did some optimizations to union but caused some problems, and was then reverted. This is to get it back, fix the problems we saw, and also make union optimization more aggressive.
[jira] [Commented] (HIVE-2566) reduce the number map-reduce jobs for union all
[ https://issues.apache.org/jira/browse/HIVE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174330#comment-13174330 ] Hudson commented on HIVE-2566: -- Integrated in Hive-trunk-h0.23.0 #42 (See [https://builds.apache.org/job/Hive-trunk-h0.23.0/42/]) HIVE-2642 fix Hive-2566 and make union optimization more aggressive (Yongqiang He via namit) reduce the number of map-reduce jobs for union all --- Key: HIVE-2566 URL: https://issues.apache.org/jira/browse/HIVE-2566 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.8.0 Attachments: HIVE-2566.D405.1.patch, HIVE-2566.D405.2.patch, HIVE-2566.D405.3.patch A query like: select s.key, s.value from ( select key, value from src2 where key < 10 union all select key, value from src3 where key < 10 union all select key, value from src4 where key < 10 union all select key, count(1) as value from src5 group by key )s; should run the last sub-query 'select key, count(1) as value from src5 group by key' as a map-reduce job, and then the union should be a map-only job reading from the first 3 map-only subqueries and the output of the last map-reduce job. The current plan is very inefficient.
[jira] [Commented] (HIVE-2666) StackOverflowError when using custom UDF in map join
[ https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174331#comment-13174331 ] Hudson commented on HIVE-2666: -- Integrated in Hive-trunk-h0.23.0 #42 (See [https://builds.apache.org/job/Hive-trunk-h0.23.0/42/]) HIVE-2666 [jira] StackOverflowError when using custom UDF in map join (Kevin Wilfong via Yongqiang He) Summary: Resource files are now added to the class path as soon as they are added via the CLI. This fixes the stack overflow error mentioned in the JIRA by ensuring a consistent class loader between serializers and deserializers for the same query. Note that serdes which contain a static block to register themselves are now registered twice: once when adding the file to the class loader, and once when an instance of the class is created. Previously, registering a serde twice resulted in an exception; to avoid this, I have downgraded it to a warning. When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map-reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error. Test Plan: I ran the unit tests to verify nothing was broken. I ran several queries which used custom UDFs and involved a join which was converted to a map join. I verified these completed successfully and consistently. Reviewers: JIRA, heyongqiang Reviewed By: heyongqiang CC: heyongqiang, kevinwilfong Differential Revision: 957 heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221830 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DeleteResourceProcessor.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java StackOverflowError when using custom UDF in map join Key: HIVE-2666 URL: https://issues.apache.org/jira/browse/HIVE-2666 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2666.D957.1.patch When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map-reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error.
Hive-trunk-h0.21 - Build # 1163 - Still Failing
Changes for Build #1144 [jvs] HIVE-1040 [jira] use sed rather than diff for masking out noise in diff-based tests (Marek Sapota via John Sichi) Summary: Replace diff -I with regex masking in Java The current diff -I approach has two problems: (1) it does not allow resolution finer than line-level, so it's impossible to mask out pattern occurrences within a line, and (2) it produces unmasked files, so if you run diff on the command line to compare the result .q.out with the checked-in file, you see the noise. My suggestion is to first run sed to replace noise patterns with an unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files without using any -I. This would require a one-time hit to update all existing .q.out files so that they would contain the pre-masked results. Test Plan: EMPTY Reviewers: JIRA, jsichi Reviewed By: jsichi CC: jsichi Differential Revision: 597 Changes for Build #1145 Changes for Build #1146 [namit] HIVE-2640 Add alterPartition to AlterHandler interface (Kevin Wilfong via namit) Changes for Build #1147 [namit] HIVE-2617 Insert overwrite table db.tname fails if partition already exists (Chinna Rao Lalam via namit) Changes for Build #1148 [heyongqiang] HIVE-2651 [jira] The variable hive.exec.mode.local.auto.tasks.max should be changed (Namit Jain via Yongqiang He) Summary: HIVE-2651 It should be called hive.exec.mode.local.auto.input.files.max instead. The number of input files are checked currently. Test Plan: EMPTY Reviewers: JIRA, heyongqiang Reviewed By: heyongqiang CC: heyongqiang Differential Revision: 861 [cws] HIVE-727. 
Hive Server getSchema() returns wrong schema for 'Explain' queries (Prasad Mujumdar via cws) [namit] HIVE-2611 Make index table output of create index command if index is table based (Kevin Wilfong via namit) Changes for Build #1150 [jvs] HIVE-2657 [jira] builtins JAR is not being published to Maven repo hive-cli POM does not depend on it either (Carl Steinbach via John Sichi) Summary: Make hive-cli and hive-ql depend on hive-builtins Test Plan: EMPTY Reviewers: JIRA, jsichi Reviewed By: jsichi CC: jsichi Differential Revision: 897 [namit] HIVE-2654 hive.querylog.location requires parent directory to be exist or else folder creation fails (Chinna Rao Lalam via namit) Changes for Build #1151 [hashutosh] HIVE-1892 : show functions also returns internal operators (Priyadarshini via Ashutosh Chauhan) Changes for Build #1152 Changes for Build #1153 [namit] HIVE-2660 Need better exception handling in RCFile tolerate corruptions mode (Ramkumar Vadali via namit) Changes for Build #1154 [cws] HIVE-2631. Make Hive work with Hadoop 1.0.0 (Ashutosh Chauhan via cws) Changes for Build #1155 [cws] HIVE-BUILD. Update RELEASE_NOTES.txt with 0.8.0 release information (cws) Changes for Build #1156 Changes for Build #1157 Changes for Build #1158 [namit] HIVE-2602 add support for insert partition overwrite(...) if not exists (Chinna Rao Lalam via namit) Changes for Build #1159 Changes for Build #1160 [cws] HIVE-2005. Implement BETWEEN operator (Navis via cws) Changes for Build #1161 [jvs] HIVE-2433. add DOAP file for Hive Changes for Build #1162 Changes for Build #1163 2 tests failed. FAILED: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions Error Message: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. Stack Trace: junit.framework.AssertionFailedError: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. 
at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions(TestCliDriver.java:16918) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906) FAILED:
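The masking approach from HIVE-1040 above (pre-masking noise patterns with a sentinel like ZYZZYZVA before a plain diff, instead of relying on diff -I) can be sketched as follows; the specific noise patterns are made-up examples, not the actual ones used in Hive's .q.out files:

```python
import re

# Hypothetical noise patterns of the kind a test output contains:
# timestamps and temp paths that differ on every run.
NOISE = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "ZYZZYZVA"),
    (re.compile(r"/tmp/hive-\S+"), "ZYZZYZVA"),
]

def mask(text):
    # Pre-mask noise so a plain diff (no -I) compares clean files.
    # Unlike diff -I, substitution works within a line, not just on
    # whole lines, and the checked-in files themselves stay masked.
    for pattern, sentinel in NOISE:
        text = pattern.sub(sentinel, text)
    return text

run1 = "loaded at 2011-12-20 22:09:41 into /tmp/hive-user/job1 ok"
run2 = "loaded at 2011-12-21 08:15:02 into /tmp/hive-user/job2 ok"
print(mask(run1) == mask(run2))  # noise masked, outputs now compare equal
```

This illustrates both problems diff -I has: it cannot mask a pattern occurrence inside an otherwise-differing line, and it leaves the stored files unmasked so a manual command-line diff still shows the noise.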
[jira] [Created] (HIVE-2671) GenericUDTFJSONTuple ignores IOExceptions
GenericUDTFJSONTuple ignores IOExceptions - Key: HIVE-2671 URL: https://issues.apache.org/jira/browse/HIVE-2671 Project: Hive Issue Type: Bug Components: UDF Reporter: Dmytro Molkov When running a query that uses GenericUDTFJSONTuple there is a chance of hitting a very nasty bug. If the write pipeline fails, the task will not detect this and will simply start skipping all the rows in the input. The UDTF has a catch (Throwable) that catches the IOException and forwards null rows, which (my guess is) are filtered out by the filter operator down the line, so the map task never tries to write them out. This happens for every row in the input. As a result the query runs forever, since it produces a log message for every row (we've seen tasks run for 20 hours instead of 20 minutes). This is a stack trace of one of the tasks, just in case: at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method) at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315) - locked <0x9c174f78> (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor) at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76) at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) - locked <0x9c18d4f8> (a java.io.BufferedOutputStream) at java.io.DataOutputStream.write(DataOutputStream.java:90) - locked <0x9c18d4d8> (a java.io.DataOutputStream) at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:894) at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:875) at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112) at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44) at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81) at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.process(GenericUDTFJSONTuple.java:167) at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:368) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309) at org.apache.hadoop.mapred.Child.main(Child.java:162) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
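The failure mode described above can be sketched in a few lines (in Python rather than Java, with hypothetical names — this is not Hive's actual UDTF code): a per-row processor that catches every error and substitutes a null row will chew through the whole input even after the downstream writer is permanently broken, while one that lets the I/O error escape fails fast.

```python
class BrokenWriter:
    """Stands in for a write pipeline that has already failed for good."""
    def write(self, row):
        raise IOError("downstream write pipeline is dead")

def process_swallowing(rows, writer):
    # Anti-pattern, analogous to the UDTF's catch (Throwable): every failure
    # is hidden and replaced by a null row, so the loop visits all input rows.
    out = []
    for row in rows:
        try:
            writer.write(row)
            out.append(row)
        except Exception:
            out.append(None)  # error swallowed; caller keeps feeding rows
    return out

def process_propagating(rows, writer):
    # Fix: let the I/O error escape so the task aborts on the first row.
    for row in rows:
        writer.write(row)

swallowed = process_swallowing(range(5), BrokenWriter())
```

With the swallowing variant, all five rows are "processed" as nulls and the loop never stops early — scaled up to a large input, that is the 20-hours-instead-of-20-minutes behaviour in the report.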
[jira] [Created] (HIVE-2672) CLI fails to start when run on Hadoop 0.23.0
CLI fails to start when run on Hadoop 0.23.0 Key: HIVE-2672 URL: https://issues.apache.org/jira/browse/HIVE-2672 Project: Hive Issue Type: Bug Components: CLI, Shims Reporter: Carl Steinbach Assignee: Carl Steinbach
[jira] [Commented] (HIVE-2672) CLI fails to start when run on Hadoop 0.23.0
[ https://issues.apache.org/jira/browse/HIVE-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174514#comment-13174514 ] Carl Steinbach commented on HIVE-2672:
--
The CLI won't start when run against Hadoop 0.23.0:
{noformat}
% ant clean package -Dhadoop.version=0.23.0 -Dhadoop.security.version=0.23.0 -Dhadoop.security.version.prefix=0.23
% export HIVE_HOME=`pwd`/build/dist
% export HADOOP_HOME=`pwd`/build/hadoopcore/hadoop-0.23.0
% export PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
% hive -hiveconf hive.root.logger=INFO,console
log4j:ERROR Could not find value for key log4j.appender.NullAppender
log4j:ERROR Could not instantiate appender named "NullAppender".
log4j:ERROR Could not find value for key log4j.appender.NullAppender
log4j:ERROR Could not instantiate appender named "NullAppender".
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
11/12/20 22:09:41 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:805)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:772)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:576)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 9 more
{noformat}

CLI fails to start when run on Hadoop 0.23.0 Key: HIVE-2672 URL: https://issues.apache.org/jira/browse/HIVE-2672 Project: Hive Issue Type: Bug Components: CLI, Shims Reporter: Carl Steinbach Assignee: Carl Steinbach
[jira] [Commented] (HIVE-2566) reduce the number map-reduce jobs for union all
[ https://issues.apache.org/jira/browse/HIVE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174520#comment-13174520 ] Hudson commented on HIVE-2566:
--
Integrated in Hive-trunk-h0.21 #1164 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1164/])
HIVE-2642 fix Hive-2566 and make union optimization more aggressive (Yongqiang He via namit)

reduce the number map-reduce jobs for union all --- Key: HIVE-2566 URL: https://issues.apache.org/jira/browse/HIVE-2566 Project: Hive Issue Type: Improvement Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.8.0 Attachments: HIVE-2566.D405.1.patch, HIVE-2566.D405.2.patch, HIVE-2566.D405.3.patch
A query like:
select s.key, s.value from (
select key, value from src2 where key < 10
union all
select key, value from src3 where key < 10
union all
select key, value from src4 where key < 10
union all
select key, count(1) as value from src5 group by key
)s;
should run the last sub-query 'select key, count(1) as value from src5 group by key' as a map-reduce job. And then the union should be a map-only job reading from the first 3 map-only subqueries and the output of the last map-reduce job. The current plan is very inefficient.
[jira] [Commented] (HIVE-2642) fix Hive-2566 and make union optimization more aggressive
[ https://issues.apache.org/jira/browse/HIVE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174519#comment-13174519 ] Hudson commented on HIVE-2642:
--
Integrated in Hive-trunk-h0.21 #1164 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1164/])
HIVE-2642 fix Hive-2566 and make union optimization more aggressive (Yongqiang He via namit)
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221812
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRRedSink3.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/unionproc/UnionProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/union26.q
* /hive/trunk/ql/src/test/results/clientpositive/auto_join27.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input25.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join35.q.out
* /hive/trunk/ql/src/test/results/clientpositive/lineage1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/load_dyn_part14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/merge4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd_union_view.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out
* /hive/trunk/ql/src/test/results/clientpositive/stats1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union12.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union14.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union15.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union17.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union18.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union19.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union22.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union24.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union25.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union26.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union5.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union7.q.out

fix Hive-2566 and make union optimization more aggressive -- Key: HIVE-2642 URL: https://issues.apache.org/jira/browse/HIVE-2642 Project: Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Attachments: HIVE-2642.D735.1.patch
Hive-2566 did some optimizations to union, but caused some problems and then got reverted. This is to get it back and fix the problems we saw, and also make union optimization more aggressive.
[jira] [Commented] (HIVE-2666) StackOverflowError when using custom UDF in map join
[ https://issues.apache.org/jira/browse/HIVE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174521#comment-13174521 ] Hudson commented on HIVE-2666:
--
Integrated in Hive-trunk-h0.21 #1164 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1164/])
HIVE-2666 [jira] StackOverflowError when using custom UDF in map join (Kevin Wilfong via Yongqiang He)
Summary: Resource files are now added to the class path as soon as they are added via the CLI. This fixes the stack overflow error mentioned in the JIRA by ensuring a consistent class loader between serializers and deserializers for the same query. Note that serdes which contain a static block to register themselves are now registered twice, once when adding the file to the class loader, and once when an instance of the class is created. Previously, registering a serde twice resulted in an exception; to avoid this, I have downgraded it to a warning. When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map-reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error.
Test Plan: I ran the unit tests to verify nothing was broken. I ran several queries which used custom UDFs and involved a join which was converted to a map join. I verified these completed successfully and consistently.
Reviewers: JIRA, heyongqiang Reviewed By: heyongqiang CC: heyongqiang, kevinwilfong Differential Revision: 957
heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1221830
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/DeleteResourceProcessor.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java

StackOverflowError when using custom UDF in map join Key: HIVE-2666 URL: https://issues.apache.org/jira/browse/HIVE-2666 Project: Hive Issue Type: Bug Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2666.D957.1.patch
When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map-reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error.
Hive-trunk-h0.21 - Build # 1164 - Still Failing
Changes for Build #1144
[jvs] HIVE-1040 [jira] use sed rather than diff for masking out noise in diff-based tests (Marek Sapota via John Sichi)
Summary: Replace diff -I with regex masking in Java. The current diff -I approach has two problems: (1) it does not allow resolution finer than line-level, so it's impossible to mask out pattern occurrences within a line, and (2) it produces unmasked files, so if you run diff on the command line to compare the result .q.out with the checked-in file, you see the noise. My suggestion is to first run sed to replace noise patterns with an unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files without using any -I. This would require a one-time hit to update all existing .q.out files so that they would contain the pre-masked results.
Test Plan: EMPTY
Reviewers: JIRA, jsichi Reviewed By: jsichi CC: jsichi Differential Revision: 597

Changes for Build #1145

Changes for Build #1146
[namit] HIVE-2640 Add alterPartition to AlterHandler interface (Kevin Wilfong via namit)

Changes for Build #1147
[namit] HIVE-2617 Insert overwrite table db.tname fails if partition already exists (Chinna Rao Lalam via namit)

Changes for Build #1148
[heyongqiang] HIVE-2651 [jira] The variable hive.exec.mode.local.auto.tasks.max should be changed (Namit Jain via Yongqiang He)
Summary: HIVE-2651 It should be called hive.exec.mode.local.auto.input.files.max instead. The number of input files is what is checked currently.
Test Plan: EMPTY
Reviewers: JIRA, heyongqiang Reviewed By: heyongqiang CC: heyongqiang Differential Revision: 861

Changes for Build #1149
[cws] HIVE-727. Hive Server getSchema() returns wrong schema for 'Explain' queries (Prasad Mujumdar via cws)
[namit] HIVE-2611 Make index table output of create index command if index is table based (Kevin Wilfong via namit)

Changes for Build #1150
[jvs] HIVE-2657 [jira] builtins JAR is not being published to Maven repo; hive-cli POM does not depend on it either (Carl Steinbach via John Sichi)
Summary: Make hive-cli and hive-ql depend on hive-builtins
Test Plan: EMPTY
Reviewers: JIRA, jsichi Reviewed By: jsichi CC: jsichi Differential Revision: 897
[namit] HIVE-2654 hive.querylog.location requires parent directory to be exist or else folder creation fails (Chinna Rao Lalam via namit)

Changes for Build #1151
[hashutosh] HIVE-1892 : show functions also returns internal operators (Priyadarshini via Ashutosh Chauhan)

Changes for Build #1152

Changes for Build #1153
[namit] HIVE-2660 Need better exception handling in RCFile tolerate corruptions mode (Ramkumar Vadali via namit)

Changes for Build #1154
[cws] HIVE-2631. Make Hive work with Hadoop 1.0.0 (Ashutosh Chauhan via cws)

Changes for Build #1155
[cws] HIVE-BUILD. Update RELEASE_NOTES.txt with 0.8.0 release information (cws)

Changes for Build #1156

Changes for Build #1157

Changes for Build #1158
[namit] HIVE-2602 add support for insert partition overwrite(...) if not exists (Chinna Rao Lalam via namit)

Changes for Build #1159

Changes for Build #1160
[cws] HIVE-2005. Implement BETWEEN operator (Navis via cws)

Changes for Build #1161
[jvs] HIVE-2433. add DOAP file for Hive

Changes for Build #1162

Changes for Build #1163

Changes for Build #1164
[heyongqiang] HIVE-2666 [jira] StackOverflowError when using custom UDF in map join (Kevin Wilfong via Yongqiang He)
Summary: Resource files are now added to the class path as soon as they are added via the CLI. This fixes the stack overflow error mentioned in the JIRA by ensuring a consistent class loader between serializers and deserializers for the same query. Note that serdes which contain a static block to register themselves are now registered twice, once when adding the file to the class loader, and once when an instance of the class is created. Previously, registering a serde twice resulted in an exception; to avoid this, I have downgraded it to a warning. When a custom UDF is used as part of a join which is converted to a map join, the XMLEncoder enters an infinite loop when serializing the map-reduce task for the second time, as part of sending it to be executed. This results in a stack overflow error.
Test Plan: I ran the unit tests to verify nothing was broken. I ran several queries which used custom UDFs and involved a join which was converted to a map join. I verified these completed successfully and consistently.
Reviewers: JIRA, heyongqiang Reviewed By: heyongqiang CC: heyongqiang, kevinwilfong Differential Revision: 957
[namit] HIVE-2642 fix Hive-2566 and make union optimization more aggressive (Yongqiang He via namit)

7 tests failed.

REGRESSION: org.apache.hadoop.hive.ql.exec.TestStatsPublisherEnhanced.testStatsPublisherOneStat

Error Message: null

Stack Trace:
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.Utilities.prepareWithRetry(Utilities.java:2211)
at
[jira] [Created] (HIVE-2673) Eclipse launch configurations fail due to unsatisfied builtins JAR dependency
Eclipse launch configurations fail due to unsatisfied builtins JAR dependency - Key: HIVE-2673 URL: https://issues.apache.org/jira/browse/HIVE-2673 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Carl Steinbach Assignee: John Sichi Fix For: 0.8.1
[jira] [Commented] (HIVE-2673) Eclipse launch configurations fail due to unsatisfied builtins JAR dependency
[ https://issues.apache.org/jira/browse/HIVE-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174537#comment-13174537 ] Carl Steinbach commented on HIVE-2673:
--
* Generate eclipse templates and load the project into Eclipse.
* Run the TestJdbc launch configuration, get the following exception:
{noformat}
java.lang.RuntimeException: Failed to load Hive builtin functions
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:190)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.<init>(HiveServer.java:135)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.<init>(HiveServer.java:121)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:76)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.hadoop.hive.jdbc.TestJdbcDriver.setUp(TestJdbcDriver.java:87)
at junit.framework.TestCase.runBare(TestCase.java:132)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:135)
at java.util.jar.JarFile.<init>(JarFile.java:72)
at sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:72)
at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:48)
at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:55)
at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:104)
at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:132)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerFunctionsFromPluginJar(FunctionRegistry.java:1196)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:187)
... 20 more
{noformat}

Eclipse launch configurations fail due to unsatisfied builtins JAR dependency - Key: HIVE-2673 URL: https://issues.apache.org/jira/browse/HIVE-2673 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Carl Steinbach Assignee: John Sichi Fix For: 0.8.1
[jira] [Commented] (HIVE-2672) CLI fails to start when run on Hadoop 0.23.0
[ https://issues.apache.org/jira/browse/HIVE-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174554#comment-13174554 ] Carl Steinbach commented on HIVE-2672:
--
Update: I can get the CLI to start if I first set HADOOP_CLASSPATH as follows:
{noformat}
% export HADOOP_CLASSPATH=`pwd`/build/hadoopcore/hadoop-0.23.0/modules/hadoop-mapreduce-client-core-0.23.0.jar
{noformat}

CLI fails to start when run on Hadoop 0.23.0 Key: HIVE-2672 URL: https://issues.apache.org/jira/browse/HIVE-2672 Project: Hive Issue Type: Bug Components: CLI, Shims Reporter: Carl Steinbach Assignee: Carl Steinbach
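By way of analogy (a Python sketch, not the actual Hadoop mechanics): the NoClassDefFoundError above is a lookup-path problem — the jar containing JobConf simply isn't on the path the CLI searches — and prepending the missing location, as the HADOOP_CLASSPATH export does, makes the class resolvable again.

```python
import importlib
import os
import sys
import tempfile

# Create a module in a directory that is not yet on the search path,
# playing the role of the hadoop-mapreduce-client-core jar.
mod_dir = tempfile.mkdtemp()
with open(os.path.join(mod_dir, "jobconf_stub.py"), "w") as f:
    f.write("class JobConf:\n    pass\n")

try:
    import jobconf_stub  # fails: directory is not on sys.path yet
    found_before = True
except ImportError:
    found_before = False

sys.path.insert(0, mod_dir)  # ~ export HADOOP_CLASSPATH=.../client-core.jar
jobconf_stub = importlib.import_module("jobconf_stub")  # now resolvable
```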
[jira] [Updated] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.
[ https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2621:
--
Attachment: HIVE-2621.D567.3.patch
kevinwilfong updated the revision "HIVE-2621 [jira] Allow multiple group bys with the same input data and spray keys to be run on the same reducer."
Reviewers: JIRA
Updated the diff again to prevent conflicts. Added limits in the test cases to prevent the output from getting too long.
REVISION DETAIL
https://reviews.facebook.net/D567
AFFECTED FILES
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
ql/src/test/results/clientpositive/groupby7_noskew_multi_single_reducer.q.out
ql/src/test/results/clientpositive/groupby_multi_single_reducer.q.out
ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out
ql/src/test/queries/clientpositive/groupby_multi_single_reducer.q
ql/src/test/queries/clientpositive/groupby7_noskew_multi_single_reducer.q
ql/src/test/queries/clientpositive/groupby_complex_types_multi_single_reducer.q
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDesc.java
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

Allow multiple group bys with the same input data and spray keys to be run on the same reducer. --- Key: HIVE-2621 URL: https://issues.apache.org/jira/browse/HIVE-2621 Project: Hive Issue Type: New Feature Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch
Currently, when a user runs a query, such as a multi-insert, where each insertion subclause consists of a simple query followed by a group by, the group bys for each clause are run on a separate reducer. This requires writing the data for each group by clause to an intermediate file, and then reading it back. This uses a significant amount of the total CPU consumed by the query for an otherwise simple query.
If the subclauses are grouped by their distinct expressions and group by keys, with all of the group by expressions for a group of subclauses run on a single reducer, this would reduce the amount of reading/writing to intermediate files for some queries. To do this, for each group of subclauses, in the mapper we would execute the filters for each subclause 'or'd together (provided each subclause has a filter) followed by a reduce sink. In the reducer, the child operators would be each subclause's filter followed by the group by and any subsequent operations. Note that this would require turning off map aggregation, so we would need to make using this type of plan configurable.
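The plan described above can be sketched concretely (in Python with made-up filters — this is an illustration, not Hive's implementation): the mapper forwards any row matching the subclauses' filters 'or'd together, and the single reducer re-applies each subclause's filter before its own aggregation.

```python
from collections import Counter

rows = [("a", 1), ("b", 12), ("a", 30), ("c", 5), ("b", 25)]

def f1(row):   # subclause 1 filter, e.g. ... WHERE value < 10 GROUP BY key
    return row[1] < 10

def f2(row):   # subclause 2 filter, e.g. ... WHERE value >= 20 GROUP BY key
    return row[1] >= 20

# Mapper: one pass over the input, filters or'd together, then a reduce sink.
shuffled = [row for row in rows if f1(row) or f2(row)]

# Reducer: each subclause's filter followed by its group by (a count here).
groupby1 = Counter(key for key, value in shuffled if f1((key, value)))
groupby2 = Counter(key for key, value in shuffled if f2((key, value)))
```

Both aggregations are computed from a single scan and a single shuffle; the price, as the description notes, is that per-branch map-side aggregation cannot be applied before the filters are re-checked on the reduce side.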
[jira] [Created] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist
get_partitions_ps throws TApplicationException if table doesn't exist - Key: HIVE-2674 URL: https://issues.apache.org/jira/browse/HIVE-2674 Project: Hive Issue Type: Bug Components: Metastore Reporter: Kevin Wilfong
If the table passed to get_partition_ps doesn't exist, an NPE is thrown by getPartitionPsQueryResults. There should be a check here, which throws a NoSuchObjectException if the table doesn't exist.
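The proposed fix is a plain guard clause: look the table up first and raise a specific "no such object" error instead of letting a null dereference escape. Sketched in Python with hypothetical names (the real change is in the metastore's Java code):

```python
class NoSuchObjectException(Exception):
    pass

# Toy stand-in for the metastore's table lookup.
_tables = {"t1": {"partitions": ["ds=2011-12-20/hr=00"]}}

def get_partitions_ps(table_name):
    table = _tables.get(table_name)
    if table is None:
        # Without this check, table["partitions"] below would fail with the
        # Python analogue of the NPE described in the report.
        raise NoSuchObjectException("table %r does not exist" % table_name)
    return table["partitions"]
```

A typed exception also lets the Thrift layer map the failure to a declared exception in the IDL instead of a generic TApplicationException.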
[jira] [Assigned] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong reassigned HIVE-2674:
---
Assignee: Kevin Wilfong
get_partitions_ps throws TApplicationException if table doesn't exist - Key: HIVE-2674 URL: https://issues.apache.org/jira/browse/HIVE-2674 Project: Hive Issue Type: Bug Components: Metastore Reporter: Kevin Wilfong Assignee: Kevin Wilfong
If the table passed to get_partition_ps doesn't exist, an NPE is thrown by getPartitionPsQueryResults. There should be a check here, which throws a NoSuchObjectException if the table doesn't exist.
[jira] [Updated] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2674:
--
Attachment: HIVE-2674.D987.1.patch
kevinwilfong requested code review of "HIVE-2674 [jira] get_partitions_ps throws TApplicationException if table doesn't exist."
Reviewers: JIRA
getPartitionPsQueryResults now throws a NoSuchObjectException instead of an NPE if the table named does not exist. I updated all calls higher up so that the exception could propagate to the Thrift client.
If the table passed to get_partition_ps doesn't exist, an NPE is thrown by getPartitionPsQueryResults. There should be a check here, which throws a NoSuchObjectException if the table doesn't exist.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D987
AFFECTED FILES
metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java
metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
metastore/if/hive_metastore.thrift
get_partitions_ps throws TApplicationException if table doesn't exist - Key: HIVE-2674 URL: https://issues.apache.org/jira/browse/HIVE-2674 Project: Hive Issue Type: Bug Components: Metastore Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2674.D987.1.patch
If the table passed to get_partition_ps doesn't exist, an NPE is thrown by getPartitionPsQueryResults. There should be a check here, which throws a NoSuchObjectException if the table doesn't exist.
[jira] [Updated] (HIVE-2674) get_partitions_ps throws TApplicationException if table doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-2674:
Status: Patch Available (was: Open)
get_partitions_ps throws TApplicationException if table doesn't exist - Key: HIVE-2674 URL: https://issues.apache.org/jira/browse/HIVE-2674 Project: Hive Issue Type: Bug Components: Metastore Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2674.D987.1.patch
If the table passed to get_partition_ps doesn't exist, an NPE is thrown by getPartitionPsQueryResults. There should be a check here, which throws a NoSuchObjectException if the table doesn't exist.
[jira] [Commented] (HIVE-2673) Eclipse launch configurations fail due to unsatisfied builtins JAR dependency
[ https://issues.apache.org/jira/browse/HIVE-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174605#comment-13174605 ] John Sichi commented on HIVE-2673:
--
Hmmm, it's because Eclipse is loading BuiltinUtils.class from its own crazy build location instead of from the .jar like it's supposed to:
Should be: jar:file:/Users/jsichi/open/hive-trunk/build/builtins/hive-builtins-0.9.0-SNAPSHOT.jar!/META-INF/class-info.xml
But is: jar:file:/Users/jsichi/open/hive-trunk/build/eclipse-classes/!/META-INF/class-info.xml
Do you know how to tell it to load from the jar instead?

Eclipse launch configurations fail due to unsatisfied builtins JAR dependency - Key: HIVE-2673 URL: https://issues.apache.org/jira/browse/HIVE-2673 Project: Hive Issue Type: Bug Components: Build Infrastructure Reporter: Carl Steinbach Assignee: John Sichi Fix For: 0.8.1
[jira] [Commented] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.
[ https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174671#comment-13174671 ] Phabricator commented on HIVE-2621: --- njain has commented on the revision HIVE-2621 [jira] Allow multiple group bys with the same input data and spray keys to be run on the same reducer. INLINE COMMENTS ql/src/test/queries/clientpositive/groupby7_noskew_multi_single_reducer.q:12 This does not look right. We would like to make hive.multigroupby.singlereducer true by default. But we are unnecessarily generating 3 MR jobs for this query (with no distinct). I think we can get it in 2 MR jobs today (not 100% sure). ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6273 It would be good to merge this code path with the above if block (optimizeMultiGroupBy). The common distinct expression should return the common distinct, checking for the parameter HIVEMULTIGROUPBYSINGLEREDUCER. Or, it might be simpler to remove the above if block (the optimizeMultiGroupBy case should be covered by this block). Anyway, the above if block (6253-6272) seems broken. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6211 I think this code can be simplified. The function getCommonDistinctExprs can be removed. REVISION DETAIL https://reviews.facebook.net/D567 Allow multiple group bys with the same input data and spray keys to be run on the same reducer. --- Key: HIVE-2621 URL: https://issues.apache.org/jira/browse/HIVE-2621 Project: Hive Issue Type: New Feature Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch Currently, when a user runs a query, such as a multi-insert, where each insertion subclause consists of a simple query followed by a group by, the group bys for each clause are run on a separate reducer. 
This requires writing the data for each group by clause to an intermediate file and then reading it back. This uses a significant amount of the total CPU consumed by the query for an otherwise simple query. If the subclauses are grouped by their distinct expressions and group by keys, with all of the group by expressions for a group of subclauses run on a single reducer, this would reduce the amount of reading/writing to intermediate files for some queries. To do this, for each group of subclauses, in the mapper we would execute the filters for each subclause 'or'd together (provided each subclause has a filter) followed by a reduce sink. In the reducer, the child operators would be each subclause's filter followed by the group by and any subsequent operations. Note that this would require turning off map aggregation, so we would need to make using this type of plan configurable.
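As a sketch, the kind of multi-insert this targets looks like the following (table and column names are hypothetical):

{code}
-- Two group bys over the same source and spray keys that today run
-- as separate reducer stages with an intermediate file in between.
FROM src
INSERT OVERWRITE TABLE dest1
  SELECT key, count(value) WHERE key < 100 GROUP BY key
INSERT OVERWRITE TABLE dest2
  SELECT key, sum(value)   WHERE key >= 100 GROUP BY key;
{code}

Under the proposed plan, the mapper would emit rows matching the two filters 'or'd together (key < 100 OR key >= 100) through a single reduce sink, and each reducer branch would re-apply its own filter before running its group by.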