[jira] Updated: (HIVE-1678) NPE in MapJoin
[ https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1678:
    Attachment: patch-1678.txt

The bug is in plan generation when a MapJoin is followed by another MapJoin, which is in turn followed by a ReduceSink. The ReduceSink operator reads its input from oldMapJoin instead of the current MapJoin. The attached patch has a one-line fix in GenMapRedUtils.initMapJoinPlan and also includes a test case.

NPE in MapJoin
    Key: HIVE-1678
    URL: https://issues.apache.org/jira/browse/HIVE-1678
    Project: Hadoop Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Amareshwari Sriramadasu
    Assignee: Amareshwari Sriramadasu
    Attachments: patch-1678.txt

A query with two map joins and a group by fails with the following NPE:

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1681) ObjectStore.commitTransaction() does not properly handle transactions that have already been rolled back
[ https://issues.apache.org/jira/browse/HIVE-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917520#action_12917520 ]

Venkatesh S commented on HIVE-1681:

The query ran successfully with this patch. Thanks, Carl. I would appreciate it if this could be committed quickly.

ObjectStore.commitTransaction() does not properly handle transactions that have already been rolled back
    Key: HIVE-1681
    URL: https://issues.apache.org/jira/browse/HIVE-1681
    Project: Hadoop Hive
    Issue Type: Bug
    Components: Metastore
    Affects Versions: 0.5.0, 0.6.0, 0.7.0
    Reporter: Carl Steinbach
    Assignee: Carl Steinbach
    Attachments: HIVE-1681.1.patch.txt

Here's the code for ObjectStore.commitTransaction() and ObjectStore.rollbackTransaction():

{code}
public boolean commitTransaction() {
  assert (openTrasactionCalls >= 1);
  if (!currentTransaction.isActive()) {
    throw new RuntimeException(
        "Commit is called, but transaction is not active. Either there are"
            + " mismatching open and close calls or rollback was called in the same trasaction");
  }
  openTrasactionCalls--;
  if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
    transactionStatus = TXN_STATUS.COMMITED;
    currentTransaction.commit();
  }
  return true;
}

public void rollbackTransaction() {
  if (openTrasactionCalls < 1) {
    return;
  }
  openTrasactionCalls = 0;
  if (currentTransaction.isActive() && transactionStatus != TXN_STATUS.ROLLBACK) {
    transactionStatus = TXN_STATUS.ROLLBACK;
    // could already be rolled back
    currentTransaction.rollback();
  }
}
{code}

Now suppose a nested transaction throws an exception which results in the nested pseudo-transaction calling rollbackTransaction(). This causes rollbackTransaction() to roll back the actual transaction, as well as to set openTransactionCalls=0 and transactionStatus = TXN_STATUS.ROLLBACK. Suppose also that this nested transaction squelches the original exception.

In this case the stack will unwind and the caller will eventually try to commit the transaction by calling commitTransaction(), which will see that currentTransaction.isActive() returns FALSE and will throw a RuntimeException. The fix for this problem is that commitTransaction() needs to first check transactionStatus and return immediately if transactionStatus==TXN_STATUS.ROLLBACK.
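The fix proposed in HIVE-1681 can be illustrated with a small stand-alone model. This is only a sketch of the control flow, not the patch attached to the issue: FakeTransaction, TxnSketch, and the TxnStatus enum are hypothetical stand-ins for the metastore classes, and returning false from the guarded commit is an assumption (the issue text only says to "return immediately").

```java
// Hypothetical stand-in for the underlying datastore transaction.
final class FakeTransaction {
    private boolean active = true;
    boolean isActive() { return active; }
    void commit() { active = false; }
    void rollback() { active = false; }
}

// Hypothetical model of the nested open/commit/rollback bookkeeping.
final class TxnSketch {
    enum TxnStatus { NO_STATE, OPEN, COMMITTED, ROLLBACK }

    private int openCalls = 0;
    private TxnStatus status = TxnStatus.NO_STATE;
    private FakeTransaction current;

    boolean openTransaction() {
        openCalls++;
        if (openCalls == 1) {
            // Only the outermost call starts a real transaction.
            current = new FakeTransaction();
            status = TxnStatus.OPEN;
        }
        return true;
    }

    boolean commitTransaction() {
        // Guard described in the issue: a nested call already rolled back,
        // so return immediately instead of throwing.
        // Returning false here is an assumption of this sketch.
        if (status == TxnStatus.ROLLBACK) {
            return false;
        }
        if (openCalls < 1 || !current.isActive()) {
            throw new IllegalStateException(
                "Commit is called, but transaction is not active");
        }
        openCalls--;
        if (openCalls == 0) {
            status = TxnStatus.COMMITTED;
            current.commit();
        }
        return true;
    }

    void rollbackTransaction() {
        if (openCalls < 1) {
            return;
        }
        openCalls = 0;
        if (current.isActive() && status != TxnStatus.ROLLBACK) {
            status = TxnStatus.ROLLBACK;
            current.rollback();
        }
    }
}
```

With this guard, an outer caller that opens a transaction, runs a nested open/rollback pair, and then calls commitTransaction() gets a normal return rather than a RuntimeException.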
[jira] Created: (HIVE-1689) Add GROUP_CONCAT to HiveQL
Add GROUP_CONCAT to HiveQL -- Key: HIVE-1689 URL: https://issues.apache.org/jira/browse/HIVE-1689 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Jeff Hammerbacher I often find GROUP_CONCAT to be handy when working with list-type data. See http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat for the MySQL syntax. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )
[ https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1647:
    Status: Open (was: Patch Available)

Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )
    Key: HIVE-1647
    URL: https://issues.apache.org/jira/browse/HIVE-1647
    Project: Hadoop Hive
    Issue Type: Bug
    Components: Server Infrastructure
    Affects Versions: 0.6.0, 0.7.0
    Reporter: Raman Grover
    Assignee: Liyin Tang
    Fix For: 0.7.0
    Attachments: HIVE-1647.patch
    Original Estimate: 0.17h
    Remaining Estimate: 0.17h

Bug in org.apache.hadoop.hive.ql.io.IOContext in relation to initialization of the thread local variable.

    public class IOContext {
      private static ThreadLocal<IOContext> threadLocal = new ThreadLocal<IOContext>(){ };
      static {
        if (threadLocal.get() == null) {
          threadLocal.set(new IOContext());
        }
      }

In a multi-threaded environment, the thread that loads the class first for the JVM (assuming threads share the classloader) initializes itself correctly by executing the code in the static block. Once the class is loaded, any subsequent threads would have their respective thread local variable as null. Since IOContext is set during initialization of HiveRecordReader, a scenario where multiple threads acquire an instance of HiveRecordReader would result in an NPE for all but the first thread that loads the class in the VM.

Is the above scenario of multiple threads initializing HiveRecordReader a typical one? Or we could just provide the following fix:

    private static ThreadLocal<IOContext> threadLocal = new ThreadLocal<IOContext>(){
      protected synchronized IOContext initialValue() {
        return new IOContext();
      }
    };
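The difference between the two initialization styles in HIVE-1647 can be reproduced outside Hive. The following is a minimal sketch using plain Object values; IoContextSketch and its members are hypothetical names chosen for illustration and are not part of Hive.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class IoContextSketch {
    // Static-block style: set() runs once, on whichever thread loads the class,
    // so only that thread ever sees a value.
    static final ThreadLocal<Object> STATIC_BLOCK = new ThreadLocal<>();
    static {
        STATIC_BLOCK.set(new Object());
    }

    // initialValue() style: invoked lazily on the first get() of each thread.
    static final ThreadLocal<Object> INITIAL_VALUE = new ThreadLocal<Object>() {
        @Override
        protected Object initialValue() {
            return new Object();
        }
    };

    /** Starts a fresh thread and reports whether tl.get() was non-null there. */
    static boolean visibleOnNewThread(ThreadLocal<Object> tl) {
        AtomicBoolean seen = new AtomicBoolean(false);
        Thread worker = new Thread(() -> seen.set(tl.get() != null));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }
}
```

Calling visibleOnNewThread(STATIC_BLOCK) returns false, while visibleOnNewThread(INITIAL_VALUE) returns true. Because initialValue() is called separately for each thread, the synchronized modifier in the proposed fix should not be needed for correctness of the per-thread value.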
[jira] Commented: (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917546#action_12917546 ]

Terje Marthinussen commented on HIVE-1545:

Was just quickly looking at this and noticed the following:

    grep lib com/facebook/hive/udf/*java
    com/facebook/hive/udf/UDAFHistogram.java:import com.facebook.hive.udf.lib.Counter;
    com/facebook/hive/udf/UDFJaccard.java:import com.facebook.hive.udf.lib.SetOps;

However, there is no com.facebook.hive.udf.lib included.

Add a bunch of UDFs and UDAFs
    Key: HIVE-1545
    URL: https://issues.apache.org/jira/browse/HIVE-1545
    Project: Hadoop Hive
    Issue Type: New Feature
    Components: UDF
    Reporter: Jonathan Chang
    Assignee: Jonathan Chang
    Priority: Minor
    Attachments: udfs.tar.gz

Here are some UD(A)Fs which can be incorporated into the Hive distribution:

UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 5, 3) returns 1.
UDFBucket - Find the bucket in which the first argument belongs. e.g., BUCKET(x, b_1, b_2, b_3, ...) will return the smallest i such that x > b_{i} but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
UDFFindInArray - Finds the 1-based index of the first argument in the array given as the second argument. Returns 0 if not found. Returns NULL if either argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. FIND_IN_ARRAY(5, array(1,2,3)) will return 0.
UDFGreatCircleDist - Finds the great circle distance (in km) between two lat/long coordinates (in degrees).
UDFLDA - Performs LDA inference on a vector given fixed topics.
UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 whenever any of its parameters changes.
UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 5.
UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches in an array.
UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
UDFWhich - Given a boolean array, return the indices which are TRUE.
UDFJaccard
UDAFCollect - Takes all the values associated with a row and converts them into a list. Make sure to have: set hive.map.aggr = false;
UDAFCollectMap - Like collect except that it takes tuples and generates a map.
UDAFEntropy - Compute the entropy of a column.
UDAFPearson (BROKEN!!!) - Computes the Pearson correlation between two columns.
UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value of VAL.
UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated with the N (passed as the third parameter) largest values of VAL.
UDAFHistogram
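The indexing conventions in the list above (0-based for ARGMAX, 1-based for FIND_IN_ARRAY with 0 meaning "not found") are easy to mix up, so here is a minimal sketch of those two semantics as plain Java methods. It is not the code in udfs.tar.gz and does not use the Hive UDF API; UdfSemanticsSketch is a hypothetical name, and the tie-breaking rule in argMax (first maximum wins) is an assumption the issue does not state.

```java
import java.util.List;

final class UdfSemanticsSketch {
    /** 0-based index of the largest argument; on ties the first one wins (assumption). */
    static int argMax(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("argMax needs at least one argument");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * 1-based index of the first occurrence of needle in haystack,
     * 0 if it is absent, null if either argument is null.
     */
    static Integer findInArray(Integer needle, List<Integer> haystack) {
        if (needle == null || haystack == null) {
            return null;
        }
        for (int i = 0; i < haystack.size(); i++) {
            if (needle.equals(haystack.get(i))) {
                return i + 1;
            }
        }
        return 0;
    }
}
```

For the examples quoted in the issue, argMax(4, 5, 3) gives 1, findInArray(5, [1, 2, 5]) gives 3, and findInArray(5, [1, 2, 3]) gives 0.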
[jira] Updated: (HIVE-1678) NPE in MapJoin
[ https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-1678: -- Status: Patch Available (was: Open) NPE in MapJoin --- Key: HIVE-1678 URL: https://issues.apache.org/jira/browse/HIVE-1678 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: patch-1678.txt The query with two map joins and a group by fails with following NPE: Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Hive-trunk-h0.18 #559
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/559/ -- [...truncated 31015 lines...] [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket23.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv3.txt [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.seq [junit] Loading data to table src_sequencefile [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/complex.seq [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/json.txt [junit] Loading data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out [junit] Done query: unknown_table1.q [junit] Begin query: unknown_table2.q [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12 [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket0.txt [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket1.txt [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket20.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket21.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket22.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket23.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv3.txt [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.seq [junit] Loading data to table
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917677#action_12917677 ] Namit Jain commented on HIVE-1546: -- I will take a look in more detail, but overall it looks good. I had the following comments: 1. Instead of TestSemanticAnalyzerHookLoading.java, add tests in test/queries/clientpositive and test/queries/clientnegative 2. Do you want to set the value of hive.semantic.analyzer.hook to a dummy value in data/conf/hive-site.xml for the unit tests ? Can something meaningful be printed here, which can be used for comparing ? Ability to plug custom Semantic Analyzers for Hive Grammar -- Key: HIVE-1546 URL: https://issues.apache.org/jira/browse/HIVE-1546 Project: Hadoop Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt It will be useful if Semantic Analysis phase is made pluggable such that other projects can do custom analysis of hive queries before doing metastore operations on them. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1678) NPE in MapJoin
[ https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917684#action_12917684 ] Namit Jain commented on HIVE-1678: -- Nice catch - Thanks +1 will commit if the tests pass NPE in MapJoin --- Key: HIVE-1678 URL: https://issues.apache.org/jira/browse/HIVE-1678 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: patch-1678.txt The query with two map joins and a group by fails with following NPE: Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Hive-trunk-h0.20 #382
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/382/ -- [...truncated 14189 lines...] [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq [junit] Loading data to table src_sequencefile [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/complex.seq [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/json.txt [junit] Loading data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out [junit] Done query: unknown_table1.q [junit] Begin query: unknown_table2.q [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12) [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12 [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket0.txt [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: Output: defa...@srcbucket2 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq [junit] Loading data to table
[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting
[ https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917725#action_12917725 ]

He Yongqiang commented on HIVE-1658:

+1. Looks good. Can you do the final patch?

Fix describe [extended] column formatting
    Key: HIVE-1658
    URL: https://issues.apache.org/jira/browse/HIVE-1658
    Project: Hadoop Hive
    Issue Type: Bug
    Affects Versions: 0.7.0
    Reporter: Paul Yang
    Assignee: Thiruvel Thirumoolan
    Attachments: HIVE-1658-PrelimPatch.patch

When displaying the column schema, the formatting should be name<TAB>type<TAB>comment<NEWLINE>, in line with the previous formatting style, for backward compatibility.
[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting
[ https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917771#action_12917771 ]

He Yongqiang commented on HIVE-1658:

One more thing: if the time information (create time, last access time, etc.) is 0, can you put some string like "unknown" in the output of the desc format?

Fix describe [extended] column formatting
    Key: HIVE-1658
    URL: https://issues.apache.org/jira/browse/HIVE-1658
    Project: Hadoop Hive
    Issue Type: Bug
    Affects Versions: 0.7.0
    Reporter: Paul Yang
    Assignee: Thiruvel Thirumoolan
    Attachments: HIVE-1658-PrelimPatch.patch

When displaying the column schema, the formatting should be name<TAB>type<TAB>comment<NEWLINE>, in line with the previous formatting style, for backward compatibility.
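The two formatting requests in HIVE-1658 (tab-separated column rows, and a placeholder for zero timestamps) could look roughly like the sketch below. It is not taken from the attached patch: DescribeFormatSketch is a hypothetical name, the literal "UNKNOWN", the treatment of a null comment as empty, and the interpretation of the time value as seconds since the epoch are all assumptions.

```java
import java.util.Date;

final class DescribeFormatSketch {
    /** One schema row: name, type, and comment separated by tabs, ended by a newline. */
    static String formatColumn(String name, String type, String comment) {
        // Treating a missing comment as empty is an assumption of this sketch.
        String safeComment = (comment == null) ? "" : comment;
        return name + "\t" + type + "\t" + safeComment + "\n";
    }

    /** Renders a timestamp, using a placeholder when the stored value is 0. */
    static String formatTime(long epochSeconds) {
        if (epochSeconds == 0) {
            return "UNKNOWN"; // placeholder text is an assumption
        }
        return new Date(epochSeconds * 1000L).toString();
    }
}
```

Keeping the row layout in one helper makes it straightforward to compare the output against the older describe format when checking backward compatibility.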
[jira] Updated: (HIVE-1678) NPE in MapJoin
[ https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1678: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Amareshwari NPE in MapJoin --- Key: HIVE-1678 URL: https://issues.apache.org/jira/browse/HIVE-1678 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Attachments: patch-1678.txt The query with two map joins and a group by fails with following NPE: Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results
[ https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1674: - Attachment: HIVE-1674.patch count(*) returns wrong result when a mapper returns empty results - Key: HIVE-1674 URL: https://issues.apache.org/jira/browse/HIVE-1674 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1674.patch select count(*) from src where false; will return # of mappers rather than 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results
[ https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Zhang updated HIVE-1674: - Status: Patch Available (was: Open) count(*) returns wrong result when a mapper returns empty results - Key: HIVE-1674 URL: https://issues.apache.org/jira/browse/HIVE-1674 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Ning Zhang Attachments: HIVE-1674.patch select count(*) from src where false; will return # of mappers rather than 0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
[ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1376:
    Status: Patch Available (was: Open)

Simple UDAFs with more than 1 parameter crash on empty row query
    Key: HIVE-1376
    URL: https://issues.apache.org/jira/browse/HIVE-1376
    Project: Hadoop Hive
    Issue Type: Bug
    Components: Query Processor
    Affects Versions: 0.6.0
    Reporter: Mayank Lahiri
    Assignee: Ning Zhang
    Attachments: HIVE-1376.2.patch, HIVE-1376.patch

Simple UDAFs with more than 1 parameter crash when the query returns no rows. Currently, this only seems to affect the percentile() UDAF where the second parameter is the percentile to be computed (of type double). I've also verified the bug by adding a dummy parameter to ExampleMin in contrib.

On an empty query, Hive seems to be trying to resolve an iterate() method with signature {null,null} instead of {null,double}. You can reproduce this bug using:

    CREATE TABLE pct_test ( val INT );
    SELECT percentile(val, 0.5) FROM pct_test;

which produces a lot of errors like:

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) on object org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments {null, null} of size 2
[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
    Attachment: 1570.1.patch

Before running a map-reduce job in local mode we:
1. set a new working directory
2. symlink all added files from that working directory

This is pretty much identical to how Hadoop sets up the task execution environment. All references to scripts and added files using their names only now resolve correctly in local mode.

There was some hacky code in SemanticAnalyzer.java to deal with this that doesn't work in all cases (when the referenced file is not the first item in the command line, or in automatic local mode). I have deleted it.

I duplicated one of the tests so that we get coverage against a real cluster (scriptfile1.q executed against minimr) and local mode (scriptfile2.q). Still running tests.

referencing an added file by it's name in a transform script does not work in hive local mode
    Key: HIVE-1570
    URL: https://issues.apache.org/jira/browse/HIVE-1570
    Project: Hadoop Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Joydeep Sen Sarma
    Assignee: Joydeep Sen Sarma
    Attachments: 1570.1.patch

Yongqiang tried this and it fails in local mode:

    add file ../data/scripts/dumpdata_script.py;
    select count(distinct subq.key) from (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key = 10) subq;

This needs to be fixed because it means we cannot choose local mode automatically in case of transform scripts (since different paths need to be used for cluster vs. local mode execution).
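The two setup steps described in the HIVE-1570 comment (a fresh working directory, plus symlinks to every added file) can be sketched with java.nio.file. This is an illustration of the idea rather than the code in 1570.1.patch; LocalWorkDirSketch, the "hive-local-" directory prefix, and the demo() helper are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

final class LocalWorkDirSketch {
    /** Creates a fresh working directory containing one symlink per added file. */
    static Path prepare(List<Path> addedFiles) {
        try {
            Path workDir = Files.createTempDirectory("hive-local-");
            for (Path file : addedFiles) {
                // The link carries only the bare file name, so a script can be
                // referenced as "dumpdata_script.py" regardless of where it lives.
                Path link = workDir.resolve(file.getFileName());
                Files.createSymbolicLink(link, file.toAbsolutePath());
            }
            return workDir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small self-check: an added file is reachable by name inside the new directory. */
    static boolean demo() {
        try {
            Path script = Files.createTempFile("dumpdata_script", ".py");
            Files.write(script, "print('x')\n".getBytes("UTF-8"));
            Path workDir = prepare(Collections.singletonList(script));
            Path byName = workDir.resolve(script.getFileName());
            return Files.isSymbolicLink(byName) && Files.isReadable(byName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Java cannot change the working directory of its own process, so in a setup like this the transform script would presumably be launched as a child process whose directory is set to the prepared path, for example via ProcessBuilder.directory(workDir.toFile()).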
[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-1570: Attachment: 1570.2.patch working patch. no need for new test. had to modify some other tests to use 'add file'. referencing an added file by it's name in a transform script does not work in hive local mode - Key: HIVE-1570 URL: https://issues.apache.org/jira/browse/HIVE-1570 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Joydeep Sen Sarma Assignee: Joydeep Sen Sarma Attachments: 1570.1.patch, 1570.2.patch Yongqiang tried this and it fails in local mode: add file ../data/scripts/dumpdata_script.py; select count(distinct subq.key) from (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key = 10) subq; this needs to be fixed because it means we cannot choose local mode automatically in case of transform scripts (since different paths need to be used for cluster vs. local mode execution) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode
[ https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1570:
    Status: Patch Available (was: Open)

referencing an added file by it's name in a transform script does not work in hive local mode

    Key: HIVE-1570
    URL: https://issues.apache.org/jira/browse/HIVE-1570
    Project: Hadoop Hive
    Issue Type: Bug
    Components: Query Processor
    Reporter: Joydeep Sen Sarma
    Assignee: Joydeep Sen Sarma
    Attachments: 1570.1.patch, 1570.2.patch

Yongqiang tried this and it fails in local mode:

add file ../data/scripts/dumpdata_script.py;
select count(distinct subq.key) from (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key = 10) subq;

this needs to be fixed because it means we cannot choose local mode automatically in case of transform scripts (since different paths need to be used for cluster vs. local mode execution)
[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar
[ https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917851#action_12917851 ]

Ashutosh Chauhan commented on HIVE-1546:

I did it in junit form because John suggested it that way in his earlier comment:
{quote}
* We need a test for loading a variation on the default semantic analyzer in order to exercise the pluggable configuration. You can create a subclass of the default analyzer (under ql/src/test/org/apache/hadoop/hive/ql/parse) to inject some mock behavior change.
{quote}
I also feel a junit test is better suited to this kind of behavioral testing of code paths (which exercises interface points) than forcing it through the string-comparison approach of test/queries/*, which are more end-to-end tests for hive. Further, if we add a dummy hook name in data/conf/hive-site.xml, that dummy hook will get loaded and all subsequent tests will have it too. Do we want it that way?

Ability to plug custom Semantic Analyzers for Hive Grammar

    Key: HIVE-1546
    URL: https://issues.apache.org/jira/browse/HIVE-1546
    Project: Hadoop Hive
    Issue Type: Improvement
    Components: Metastore
    Affects Versions: 0.7.0
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Fix For: 0.7.0
    Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt

It will be useful if the Semantic Analysis phase is made pluggable so that other projects can do custom analysis of hive queries before doing metastore operations on them.
[jira] Commented: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results
[ https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917856#action_12917856 ]

He Yongqiang commented on HIVE-1674:

will take a look.

count(*) returns wrong result when a mapper returns empty results

    Key: HIVE-1674
    URL: https://issues.apache.org/jira/browse/HIVE-1674
    Project: Hadoop Hive
    Issue Type: Bug
    Reporter: Ning Zhang
    Assignee: Ning Zhang
    Attachments: HIVE-1674.patch

select count(*) from src where false;
will return # of mappers rather than 0.
[jira] Assigned: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks
[ https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russell Melick reassigned HIVE-1501:
    Assignee: Skye Berghel (was: Russell Melick)

when generating reentrant INSERT for index rebuild, quote identifiers using backticks

    Key: HIVE-1501
    URL: https://issues.apache.org/jira/browse/HIVE-1501
    Project: Hadoop Hive
    Issue Type: Bug
    Components: Indexing
    Affects Versions: 0.7.0
    Reporter: John Sichi
    Assignee: Skye Berghel
    Fix For: 0.7.0

Yongqiang, you mentioned that you weren't able to do this due to SORT BY not accepting them. The SORT BY is gone now as of HIVE-1494 (and SORT BY needs to be fixed anyway).
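
As a rough illustration of what HIVE-1501 asks for, the sketch below wraps every identifier in backticks while assembling a statement. Everything here is hypothetical: the class `BacktickQuotingSketch`, its methods, the table and column names, and the simplified INSERT shape are not taken from Hive's index rebuild code. Rejecting identifiers that already contain a backtick is an assumption of this sketch, not documented Hive behaviour.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: not Hive's actual index rebuild code.
final class BacktickQuotingSketch {

    /** Wraps an identifier in backticks so reserved words and odd names parse as identifiers. */
    static String quote(String identifier) {
        // Assumption of this sketch: refuse embedded backticks rather than guess at escaping rules.
        if (identifier.indexOf('`') >= 0) {
            throw new IllegalArgumentException("identifier contains a backtick: " + identifier);
        }
        return "`" + identifier + "`";
    }

    /** Builds a simplified INSERT ... SELECT with every table and column name quoted. */
    static String buildInsert(String indexTable, String baseTable, List<String> columns) {
        String cols = columns.stream()
                .map(BacktickQuotingSketch::quote)
                .collect(Collectors.joining(", "));
        return "INSERT OVERWRITE TABLE " + quote(indexTable)
                + " SELECT " + cols
                + " FROM " + quote(baseTable);
    }

    public static void main(String[] args) {
        // prints: INSERT OVERWRITE TABLE `src_idx` SELECT `key`, `value` FROM `src`
        System.out.println(buildInsert("src_idx", "src", List.of("key", "value")));
    }
}
```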