[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1678:
--

Attachment: patch-1678.txt

The bug is in plan generation when a MapJoin is followed by another MapJoin, which
is in turn followed by a ReduceSink. The ReduceSink operator reads its input from
the old MapJoin instead of the current MapJoin.

The attached patch has a one-line fix in GenMapRedUtils.initMapJoinPlan for the
bug. It also includes the test case.

 NPE in MapJoin 
 ---

 Key: HIVE-1678
 URL: https://issues.apache.org/jira/browse/HIVE-1678
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1678.txt


 The query with two map joins and a group by fails with the following NPE:
 Caused by: java.lang.NullPointerException
   at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1681) ObjectStore.commitTransaction() does not properly handle transactions that have already been rolled back

2010-10-04 Thread Venkatesh S (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917520#action_12917520
 ] 

Venkatesh S commented on HIVE-1681:
---

The query ran successfully with this patch. Thanks, Carl. I would appreciate it if
this could be committed quickly.

 ObjectStore.commitTransaction() does not properly handle transactions that 
 have already been rolled back
 

 Key: HIVE-1681
 URL: https://issues.apache.org/jira/browse/HIVE-1681
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0, 0.6.0, 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1681.1.patch.txt


 Here's the code for ObjectStore.commitTransaction() and
 ObjectStore.rollbackTransaction():
 {code}
  public boolean commitTransaction() {
    assert (openTrasactionCalls >= 1);
    if (!currentTransaction.isActive()) {
      throw new RuntimeException(
          "Commit is called, but transaction is not active. Either there are"
          + " mismatching open and close calls or rollback was called in the same trasaction");
    }
    openTrasactionCalls--;
    if ((openTrasactionCalls == 0) && currentTransaction.isActive()) {
      transactionStatus = TXN_STATUS.COMMITED;
      currentTransaction.commit();
    }
    return true;
  }

  public void rollbackTransaction() {
    if (openTrasactionCalls < 1) {
      return;
    }
    openTrasactionCalls = 0;
    if (currentTransaction.isActive()
        && transactionStatus != TXN_STATUS.ROLLBACK) {
      transactionStatus = TXN_STATUS.ROLLBACK;
      // could already be rolled back
      currentTransaction.rollback();
    }
  }
 {code}
 Now suppose a nested transaction throws an exception, which results in the
 nested pseudo-transaction calling rollbackTransaction(). This causes
 rollbackTransaction() to roll back the actual transaction, and also to set
 openTrasactionCalls = 0 and transactionStatus = TXN_STATUS.ROLLBACK.
 Suppose also that this nested transaction squelches the original exception.
 In this case the stack will unwind, and the caller will eventually try to
 commit the transaction by calling commitTransaction(), which will see that
 currentTransaction.isActive() returns false and will throw a RuntimeException.
 The fix for this problem is for commitTransaction() to first check
 transactionStatus and return immediately if
 transactionStatus == TXN_STATUS.ROLLBACK.
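
 A minimal sketch of the proposed fix, written against a toy model of the quoted
 code rather than ObjectStore itself. FakeTransaction stands in for the JDO
 transaction, openTransaction() is a simplified assumption, and returning false
 from the early exit is this sketch's choice; the report only says "return
 immediately". The identifiers keep the original spellings (openTrasactionCalls,
 COMMITED). The demo method replays the scenario above: the nested call rolls
 back and swallows its exception, then the outer caller commits.

```java
// Toy model of the ObjectStore transaction bookkeeping with the proposed
// early return. FakeTransaction is a hypothetical stand-in, not JDO.
class CommitAfterRollbackSketch {
  enum TXN_STATUS { NO_STATE, OPEN, COMMITED, ROLLBACK }

  static class FakeTransaction {
    private boolean active;
    void begin() { active = true; }
    void commit() { active = false; }
    void rollback() { active = false; }
    boolean isActive() { return active; }
  }

  final FakeTransaction currentTransaction = new FakeTransaction();
  int openTrasactionCalls = 0;
  TXN_STATUS transactionStatus = TXN_STATUS.NO_STATE;

  // Simplified assumption: only the outermost call begins the real transaction.
  boolean openTransaction() {
    openTrasactionCalls++;
    if (openTrasactionCalls == 1) {
      currentTransaction.begin();
      transactionStatus = TXN_STATUS.OPEN;
    }
    return currentTransaction.isActive();
  }

  boolean commitTransaction() {
    // Proposed fix: a nested call already rolled back the real transaction,
    // so report failure instead of throwing.
    if (transactionStatus == TXN_STATUS.ROLLBACK) {
      return false;
    }
    if (!currentTransaction.isActive()) {
      throw new RuntimeException("Commit is called, but transaction is not active.");
    }
    openTrasactionCalls--;
    if (openTrasactionCalls == 0 && currentTransaction.isActive()) {
      transactionStatus = TXN_STATUS.COMMITED;
      currentTransaction.commit();
    }
    return true;
  }

  void rollbackTransaction() {
    if (openTrasactionCalls < 1) {
      return;
    }
    openTrasactionCalls = 0;
    if (currentTransaction.isActive() && transactionStatus != TXN_STATUS.ROLLBACK) {
      transactionStatus = TXN_STATUS.ROLLBACK;
      currentTransaction.rollback();
    }
  }

  // Outer open, nested open, nested failure that is rolled back and swallowed,
  // then the outer commit. Without the early return this commit would throw.
  static boolean outerCommitAfterInnerRollback() {
    CommitAfterRollbackSketch store = new CommitAfterRollbackSketch();
    store.openTransaction();
    store.openTransaction();
    try {
      throw new IllegalStateException("simulated failure in nested work");
    } catch (IllegalStateException swallowed) {
      store.rollbackTransaction();
    }
    return store.commitTransaction();
  }
}
```

 A plain open/commit pair still commits normally; only the commit that follows
 a rollback takes the new early exit.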




[jira] Created: (HIVE-1689) Add GROUP_CONCAT to HiveQL

2010-10-04 Thread Jeff Hammerbacher (JIRA)
Add GROUP_CONCAT to HiveQL
--

 Key: HIVE-1689
 URL: https://issues.apache.org/jira/browse/HIVE-1689
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Jeff Hammerbacher


I often find GROUP_CONCAT to be handy when working with list-type data. See 
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
 for the MySQL syntax.
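
For readers unfamiliar with the MySQL function, a rough Java sketch of its core
semantics: the non-NULL values of each group are joined with a separator (a
comma by default in MySQL), and a group with no non-NULL values yields NULL.
It ignores DISTINCT, ORDER BY, SEPARATOR parsing and the result length limit,
and it is only an illustration, not a proposed Hive UDAF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative GROUP_CONCAT semantics over (key, value) rows.
class GroupConcatSketch {
  static Map<String, String> groupConcat(List<String[]> rows, String separator) {
    // LinkedHashMap keeps groups in first-seen order for predictable output
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String[] row : rows) {
      List<String> values = groups.computeIfAbsent(row[0], k -> new ArrayList<>());
      if (row[1] != null) {
        values.add(row[1]); // NULL values are skipped, as in MySQL
      }
    }
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> e : groups.entrySet()) {
      List<String> values = e.getValue();
      // A group with no non-NULL values yields NULL
      result.put(e.getKey(), values.isEmpty() ? null : String.join(separator, values));
    }
    return result;
  }
}
```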




[jira] Updated: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1647:
-

Status: Open  (was: Patch Available)

 Incorrect initialization of thread local variable inside IOContext ( 
 implementation is not threadsafe ) 
 

 Key: HIVE-1647
 URL: https://issues.apache.org/jira/browse/HIVE-1647
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.6.0, 0.7.0
Reporter: Raman Grover
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: HIVE-1647.patch

   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 Bug in org.apache.hadoop.hive.ql.io.IOContext
 in relation to the initialization of a thread-local variable.

 public class IOContext {

   private static ThreadLocal<IOContext> threadLocal = new ThreadLocal<IOContext>(){ };

   static {
     if (threadLocal.get() == null) {
       threadLocal.set(new IOContext());
     }
   }

 In a multi-threaded environment, the thread that first loads the class into the
 JVM (assuming threads share the classloader) initializes its own thread-local
 value correctly by executing the code in the static block. Once the class is
 loaded, every subsequent thread sees its thread-local variable as null. Since
 IOContext is set during initialization of HiveRecordReader, in a scenario where
 multiple threads acquire an instance of HiveRecordReader, this results in an NPE
 for every thread except the first one to load the class in the VM.

 Is the above scenario of multiple threads initializing HiveRecordReader a
 typical one? Otherwise, we could just provide the following fix:

   private static ThreadLocal<IOContext> threadLocal = new ThreadLocal<IOContext>(){
     protected synchronized IOContext initialValue() {
       return new IOContext();
     }
   };
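
 To see the difference concretely, here is a small self-contained Java sketch.
 Ctx is a hypothetical stand-in for IOContext and readFromNewThread is an
 illustrative helper. The static-block holder leaves a newly started thread
 with null, while the initialValue() override gives every thread its own
 instance on first get().

```java
import java.util.concurrent.atomic.AtomicReference;

// Compares static-block initialization with an initialValue() override.
class ThreadLocalInitSketch {
  static class Ctx {}

  // Pattern from the report: only the thread that initializes the class gets a value.
  static class StaticBlockHolder {
    static final ThreadLocal<Ctx> threadLocal = new ThreadLocal<Ctx>();
    static {
      if (threadLocal.get() == null) {
        threadLocal.set(new Ctx());
      }
    }
  }

  // Proposed pattern: initialValue() runs lazily on each thread's first get().
  static class InitialValueHolder {
    static final ThreadLocal<Ctx> threadLocal = new ThreadLocal<Ctx>() {
      @Override
      protected Ctx initialValue() {
        return new Ctx();
      }
    };
  }

  // Reads the given ThreadLocal from a freshly started thread.
  static Ctx readFromNewThread(ThreadLocal<Ctx> tl) {
    AtomicReference<Ctx> seen = new AtomicReference<>();
    Thread t = new Thread(() -> seen.set(tl.get()));
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen.get();
  }
}
```

 On Java 8 and later, ThreadLocal.withInitial(IOContext::new) is an equivalent
 shorthand. The sketch leaves out synchronized on initialValue(), assuming that
 constructing the context touches no shared state; each thread calls
 initialValue() only for its own slot.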




[jira] Commented: (HIVE-1545) Add a bunch of UDFs and UDAFs

2010-10-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917546#action_12917546
 ] 

Terje Marthinussen commented on HIVE-1545:
--

I was just quickly looking at this and noticed the following:

grep lib com/facebook/hive/udf/*java
com/facebook/hive/udf/UDAFHistogram.java:import com.facebook.hive.udf.lib.Counter;
com/facebook/hive/udf/UDFJaccard.java:import com.facebook.hive.udf.lib.SetOps;

However, the com.facebook.hive.udf.lib package is not included.

 Add a bunch of UDFs and UDAFs
 -

 Key: HIVE-1545
 URL: https://issues.apache.org/jira/browse/HIVE-1545
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Jonathan Chang
Assignee: Jonathan Chang
Priority: Minor
 Attachments: udfs.tar.gz


 Here are some UD(A)Fs which can be incorporated into the Hive distribution:
 UDFArgMax - Finds the 0-based index of the largest argument. e.g., ARGMAX(4, 5, 3)
   returns 1.
 UDFBucket - Finds the bucket in which the first argument belongs. e.g.,
   BUCKET(x, b_1, b_2, b_3, ...) will return the smallest i such that x > b_{i}
   but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
 UDFFindInArray - Finds the 1-based index of the first element of the array
   (given as the second argument) that equals the first argument. Returns 0 if
   not found. Returns NULL if either argument is NULL. E.g., FIND_IN_ARRAY(5,
   array(1,2,5)) will return 3. FIND_IN_ARRAY(5, array(1,2,3)) will return 0.
 UDFGreatCircleDist - Finds the great circle distance (in km) between two
   lat/long coordinates (in degrees).
 UDFLDA - Performs LDA inference on a vector given fixed topics.
 UDFNumberRows - Numbers successive rows starting from 1. The counter resets to 1
   whenever any of its parameters changes.
 UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 5.
 UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches
   in an array.
 UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
 UDFWhich - Given a boolean array, returns the indices which are TRUE.
 UDFJaccard
 UDAFCollect - Takes all the values associated with a row and converts them into
   a list. Make sure to have: set hive.map.aggr = false;
 UDAFCollectMap - Like collect except that it takes tuples and generates a map.
 UDAFEntropy - Computes the entropy of a column.
 UDAFPearson (BROKEN!!!) - Computes the Pearson correlation between two columns.
 UDAFTop - TOP(KEY, VAL) returns the KEY associated with the largest value of VAL.
 UDAFTopN (BROKEN!!!) - Like TOP except that it returns a list of the keys
   associated with the N (passed as the third parameter) largest values of VAL.
 UDAFHistogram
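
 As a rough cross-check of a few of the descriptions above, here are plain-Java
 readings of ARGMAX, PMAX, FIND_IN_ARRAY and the great circle distance. These
 are assumptions about the intended semantics, not the code in udfs.tar.gz:
 tie-breaking in ARGMAX is a guess, and the distance uses the haversine formula
 on a spherical Earth of radius 6371 km, which the original UDF may not.

```java
import java.util.List;

// Plain-Java readings of some UDF descriptions; no Hive Writable or
// ObjectInspector plumbing.
class UdfSemanticsSketch {
  // ARGMAX: 0-based index of the largest argument (first index wins on ties).
  static int argMax(double... xs) {
    int best = 0;
    for (int i = 1; i < xs.length; i++) {
      if (xs[i] > xs[best]) {
        best = i;
      }
    }
    return best;
  }

  // PMAX: the largest of the arguments.
  static double pmax(double... xs) {
    return xs[argMax(xs)];
  }

  // FIND_IN_ARRAY: 1-based position of the first element equal to x,
  // 0 if absent, null if either argument is null.
  static Integer findInArray(Object x, List<?> array) {
    if (x == null || array == null) {
      return null;
    }
    for (int i = 0; i < array.size(); i++) {
      if (x.equals(array.get(i))) {
        return i + 1;
      }
    }
    return 0;
  }

  // Great circle distance in km via the haversine formula (spherical Earth assumed).
  static double greatCircleKm(double lat1, double lon1, double lat2, double lon2) {
    final double earthRadiusKm = 6371.0;
    double phi1 = Math.toRadians(lat1);
    double phi2 = Math.toRadians(lat2);
    double dPhi = Math.toRadians(lat2 - lat1);
    double dLambda = Math.toRadians(lon2 - lon1);
    double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
        + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
    // min() guards against rounding pushing a slightly above 1
    return 2 * earthRadiusKm * Math.asin(Math.sqrt(Math.min(1.0, a)));
  }
}
```

 For example, argMax(4, 5, 3) is 1 and pmax(4, 5, 3) is 5, matching the examples
 in the list; two points a quarter of the equator apart come out at about
 6371 * pi / 2, roughly 10007.5 km.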




[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1678:
--

Status: Patch Available  (was: Open)





Build failed in Hudson: Hive-trunk-h0.18 #559

2010-10-04 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/559/

--
[...truncated 31015 lines...]
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.seq
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/complex.seq
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/json.txt
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out
 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket0.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/data/files/kv1.seq
[junit] Loading data to table 

[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917677#action_12917677
 ] 

Namit Jain commented on HIVE-1546:
--

I will take a look in more detail, but overall it looks good. I had the 
following comments:

1. Instead of TestSemanticAnalyzerHookLoading.java, add tests in
test/queries/clientpositive and test/queries/clientnegative.
2. Do you want to set the value of hive.semantic.analyzer.hook to a dummy value
in data/conf/hive-site.xml for the unit tests? Can something meaningful be
printed here, which can be used for comparison?


 Ability to plug custom Semantic Analyzers for Hive Grammar
 --

 Key: HIVE-1546
 URL: https://issues.apache.org/jira/browse/HIVE-1546
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, 
 hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt


 It would be useful if the semantic analysis phase were made pluggable, so that
 other projects can do custom analysis of Hive queries before performing
 metastore operations on them.
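
 One common way to make such a phase pluggable is to load hook classes by name
 from configuration and call them before analysis. The Java sketch below is
 hypothetical throughout: AnalyzerHook, RejectDropHook, loadHooks and runHooks
 are illustrative names, and although it reads the hive.semantic.analyzer.hook
 key mentioned in the review comment, it does not reflect the interface in the
 attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical plug-in mechanism; none of these names are Hive's API.
class PluggableHookSketch {
  interface AnalyzerHook {
    // Runs before semantic analysis; may rewrite or reject the query text.
    String preAnalyze(String query);
  }

  // Example hook a downstream project might supply.
  static class RejectDropHook implements AnalyzerHook {
    public RejectDropHook() {}

    @Override
    public String preAnalyze(String query) {
      if (query.trim().toUpperCase(Locale.ROOT).startsWith("DROP")) {
        throw new IllegalArgumentException("DROP is not allowed: " + query);
      }
      return query;
    }
  }

  // Instantiates each comma-separated class name configured under the key.
  static List<AnalyzerHook> loadHooks(Map<String, String> conf) {
    List<AnalyzerHook> hooks = new ArrayList<>();
    String names = conf.get("hive.semantic.analyzer.hook");
    if (names == null || names.trim().isEmpty()) {
      return hooks;
    }
    for (String name : names.split(",")) {
      try {
        Class<?> cls = Class.forName(name.trim());
        hooks.add((AnalyzerHook) cls.getDeclaredConstructor().newInstance());
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot load hook " + name, e);
      }
    }
    return hooks;
  }

  // Applies the hooks in configured order.
  static String runHooks(List<AnalyzerHook> hooks, String query) {
    String current = query;
    for (AnalyzerHook hook : hooks) {
      current = hook.preAnalyze(current);
    }
    return current;
  }
}
```

 Loading by class name keeps the core free of compile-time dependencies on the
 projects that supply hooks; an empty or missing key simply means no hooks run.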




[jira] Commented: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917684#action_12917684
 ] 

Namit Jain commented on HIVE-1678:
--

Nice catch, thanks.

+1

Will commit if the tests pass.





Build failed in Hudson: Hive-trunk-h0.20 #382

2010-10-04 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/382/

--
[...truncated 14189 lines...]
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/complex.seq
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/json.txt
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_table1.q.out
 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_table1.q.out
[junit] Done query: unknown_table1.q
[junit] Begin query: unknown_table2.q
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-08, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=11)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table srcpart partition (ds=2008-04-09, hr=12)
[junit] POSTHOOK: Output: default@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket0.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.seq
[junit] Loading data to table 

[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917725#action_12917725
 ] 

He Yongqiang commented on HIVE-1658:


+1. Looks good. Can you do the final patch?

 Fix describe [extended] column formatting
 -

 Key: HIVE-1658
 URL: https://issues.apache.org/jira/browse/HIVE-1658
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-1658-PrelimPatch.patch


 When displaying the column schema, the formatting should be
 name<TAB>type<TAB>comment<NEWLINE>
 to be in line with the previous formatting style, for backward compatibility.
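
 A minimal sketch of the requested layout, assuming each column arrives as a
 {name, type, comment} string array (a hypothetical input shape) and that a
 missing comment becomes an empty third field:

```java
import java.util.List;

// Hypothetical formatter: one line per column, name<TAB>type<TAB>comment<NEWLINE>.
class DescribeFormatSketch {
  static String formatColumns(List<String[]> columns) {
    StringBuilder out = new StringBuilder();
    for (String[] col : columns) {
      // Assumption: a missing comment is rendered as an empty third field
      String comment = (col.length > 2 && col[2] != null) ? col[2] : "";
      out.append(col[0]).append('\t')
          .append(col[1]).append('\t')
          .append(comment).append('\n');
    }
    return out.toString();
  }
}
```

 Keeping exactly two tabs per line, even when the comment is empty, lets
 scripts that split on tabs keep working.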




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917771#action_12917771
 ] 

He Yongqiang commented on HIVE-1658:


One more thing: if the time information (create time, last access time, etc.) is
0, can you print a string such as "unknown" in the output of desc format?
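
A sketch of the requested behaviour, assuming the times are stored as seconds
since the epoch with 0 meaning the value was never recorded. The placeholder
string UNKNOWN and the Date.toString() rendering are choices made here for
illustration, not part of the request.

```java
import java.util.Date;

// Hypothetical helper: renders a stored time, or a placeholder when it is 0.
class DescribeTimeSketch {
  static String formatTime(int epochSeconds) {
    if (epochSeconds == 0) {
      return "UNKNOWN";
    }
    // Multiply as long to avoid int overflow when converting to milliseconds
    return new Date(epochSeconds * 1000L).toString();
  }
}
```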





[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1678:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks, Amareshwari.





[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Attachment: HIVE-1674.patch

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1674.patch


 The query select count(*) from src where false; returns the number of mappers rather than 0.
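
 For reference, the expected answer follows from how a two-phase count combines
 partial results. In the illustrative Java sketch below (not Hive's count
 implementation), each mapper emits the number of rows that survived the filter
 and the final step sums those partials, so with "where false" every split
 contributes 0 and the total is 0 no matter how many mappers ran. The reported
 result equals the number of partial records instead; the sketch makes no claim
 about what in Hive causes that.

```java
import java.util.List;

// Illustration of two-phase COUNT(*): per-mapper partials merged by summation.
class PartialCountSketch {
  // Map side: rows that survived the WHERE clause in one input split.
  static long mapPartial(List<?> filteredRows) {
    return filteredRows.size();
  }

  // Final step: sum the partials, so splits with no rows contribute 0.
  static long mergePartials(List<Long> partials) {
    long total = 0;
    for (long partial : partials) {
      total += partial;
    }
    return total;
  }
}
```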




[jira] Updated: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1674:
-

Status: Patch Available  (was: Open)





[jira] Updated: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-10-04 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1376:
-

Status: Patch Available  (was: Open)

 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Ning Zhang
 Attachments: HIVE-1376.2.patch, HIVE-1376.patch


 Simple UDAFs with more than one parameter crash when the query returns no rows.
 Currently, this only seems to affect the percentile() UDAF, where the second
 parameter is the percentile to be computed (of type double). I've also
 verified the bug by adding a dummy parameter to ExampleMin in contrib.
 On an empty query, Hive seems to be trying to resolve an iterate() method
 with signature {null,null} instead of {null,double}. You can reproduce this
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like:
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) on object org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator@11d13272 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments {null, null} of size 2
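
 The error text is consistent with a general property of Java reflection:
 Method.invoke throws IllegalArgumentException when null is passed for a
 primitive parameter such as double. The sketch below reproduces just that,
 using a hypothetical Evaluator whose iterate(Long, double) loosely mirrors the
 signature in the message (Long stands in for LongWritable). It does not model
 how Hive resolves or invokes UDAF methods.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical stand-in for a simple UDAF evaluator; not Hive's UDAFPercentile.
class NullPrimitiveInvokeSketch {
  public static class Evaluator {
    public boolean iterate(Long value, double percentile) {
      return true;
    }
  }

  // Returns true if the reflective call succeeds, false if the arguments are rejected.
  static boolean tryIterate(Object... args) {
    try {
      Method m = Evaluator.class.getMethod("iterate", Long.class, double.class);
      m.invoke(new Evaluator(), args);
      return true;
    } catch (IllegalArgumentException e) {
      // null cannot be unboxed into the primitive double parameter
      return false;
    } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

 tryIterate(null, 0.5) succeeds, while tryIterate(null, null) is rejected,
 which matches the {null, null} arguments in the report.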




[jira] Updated: (HIVE-1570) referencing an added file by it's name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.1.patch

Before running a map-reduce job in local mode, we:
1. set a new working directory
2. symlink all added files into that working directory

This is pretty much identical to how Hadoop sets up the task execution
environment. All references to scripts and added files by their names alone
now resolve correctly in local mode.
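
A minimal Java sketch of the two steps above, assuming the added files are
available as local paths. The class name, the "local-task-" directory prefix
and the use of java.nio symlinks are illustrative choices, not the code in the
attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: fresh working directory plus one symlink per added file,
// named by the file's base name so scripts can be referenced by name alone.
class LocalModeWorkDirSketch {
  static Path prepareWorkDir(Path parent, List<Path> addedFiles) {
    try {
      Path workDir = Files.createTempDirectory(parent, "local-task-");
      for (Path file : addedFiles) {
        Path link = workDir.resolve(file.getFileName());
        // Absolute target so the link resolves regardless of where it is read from
        Files.createSymbolicLink(link, file.toAbsolutePath());
      }
      return workDir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller could then launch the transform command with
ProcessBuilder.directory(workDir.toFile()), so that a command such as
'python dumpdata_script.py' finds the script in its working directory.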

There was some hacky code in SemanticAnalyzer.java to deal with this that
doesn't work in all cases (when the referenced file is not the first item on
the command line, or in automatic local mode). I have deleted it.

I duplicated one of the tests so that we get coverage against both a real
cluster (scriptfile1.q executed against minimr) and local mode (scriptfile2.q).

Still running tests.

 referencing an added file by it's name in a transform script does not work in 
 hive local mode
 -

 Key: HIVE-1570
 URL: https://issues.apache.org/jira/browse/HIVE-1570
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
 Attachments: 1570.1.patch


 Yongqiang tried this and it fails in local mode:
 add file ../data/scripts/dumpdata_script.py;
 select count(distinct subq.key) from
 (FROM src MAP src.key USING 'python dumpdata_script.py' AS key WHERE src.key 
 = 10) subq;
 This needs to be fixed because it means we cannot choose local mode
 automatically in the case of transform scripts (since different paths need to
 be used for cluster vs. local mode execution).




[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Attachment: 1570.2.patch

working patch. no need for a new test. had to modify some other tests to use 'add 
file'.

 referencing an added file by its name in a transform script does not work in 
 hive local mode
 -

 Key: HIVE-1570
 URL: https://issues.apache.org/jira/browse/HIVE-1570
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
 Attachments: 1570.1.patch, 1570.2.patch





[jira] Updated: (HIVE-1570) referencing an added file by its name in a transform script does not work in hive local mode

2010-10-04 Thread Joydeep Sen Sarma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HIVE-1570:


Status: Patch Available  (was: Open)

 referencing an added file by its name in a transform script does not work in 
 hive local mode
 -

 Key: HIVE-1570
 URL: https://issues.apache.org/jira/browse/HIVE-1570
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
 Attachments: 1570.1.patch, 1570.2.patch





[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-10-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917851#action_12917851
 ] 

Ashutosh Chauhan commented on HIVE-1546:


I wrote it as a JUnit test because John suggested that approach in his earlier 
comment:

{quote}
* We need a test for loading a variation on the default semantic analyzer 
in order to exercise the pluggable configuration. You can create a subclass of 
the default analyzer (under ql/src/test/org/apache/hadoop/hive/ql/parse) to 
inject some mock behavior change.
{quote}

I also feel a JUnit test is better suited for this kind of behavioral testing of 
code paths (which exercises interface points) than forcing it through the 
string-comparison approach of test/queries/*, which are more end-to-end tests 
for Hive. Further, if we add a dummy hook name to data/conf/hive-site.xml, then 
that dummy hook will get loaded and all subsequent tests will have it too. Do 
we want it that way?
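To illustrate the scoping concern, here is a minimal sketch of loading an analyzer hook by class name from a per-test configuration map, so a mock hook is visible only to the test that configures it rather than to every test reading a shared hive-site.xml. The interface, classes, and the config key example.semantic.analyzer.hook are all hypothetical and do not reflect the actual API in the hive-1546 patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical hook contract for this sketch; NOT Hive's actual interface.
interface AnalyzerHook {
    /** Called before the default analysis; may rewrite the query text. */
    String preAnalyze(String query);
}

/** Default hook: passes the query through unchanged. */
class DefaultAnalyzerHook implements AnalyzerHook {
    public String preAnalyze(String query) { return query; }
}

/** Mock hook a unit test can plug in to observe the pluggable code path. */
class MockAnalyzerHook implements AnalyzerHook {
    final List<String> seen = new ArrayList<>();

    public String preAnalyze(String query) {
        seen.add(query);                    // record that the hook was invoked
        return query.trim().toLowerCase();  // injected, observable behavior change
    }
}

class HookLoader {
    /** Hypothetical config key used only in this sketch. */
    static final String HOOK_KEY = "example.semantic.analyzer.hook";

    /** Instantiates the configured hook class by reflection, else the default. */
    static AnalyzerHook load(Map<String, String> conf) throws ReflectiveOperationException {
        String className = conf.get(HOOK_KEY);
        if (className == null) {
            return new DefaultAnalyzerHook();
        }
        Class<?> cls = Class.forName(className);
        return (AnalyzerHook) cls.getDeclaredConstructor().newInstance();
    }
}
```

Because the configuration is a map owned by one test, configuring MockAnalyzerHook there cannot leak into other tests.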

 Ability to plug custom Semantic Analyzers for Hive Grammar
 --

 Key: HIVE-1546
 URL: https://issues.apache.org/jira/browse/HIVE-1546
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, 
 hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt


 It will be useful if the semantic analysis phase is made pluggable so that 
 other projects can do custom analysis of Hive queries before performing 
 metastore operations on them. 




[jira] Commented: (HIVE-1674) count(*) returns wrong result when a mapper returns empty results

2010-10-04 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12917856#action_12917856
 ] 

He Yongqiang commented on HIVE-1674:


will take a look.

 count(*) returns wrong result when a mapper returns empty results
 -

 Key: HIVE-1674
 URL: https://issues.apache.org/jira/browse/HIVE-1674
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1674.patch


 select count(*) from src where false; returns the number of mappers rather than 0. 
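For reference, the expected semantics can be sketched as per-mapper partial counts that are summed afterwards, so a mapper whose split yields no rows must contribute 0. The class CountStar is hypothetical and only illustrates those semantics; the actual root cause of this bug is not described in this thread.

```java
import java.util.List;

// Hypothetical sketch of count(*) as partial counts merged into a total.
class CountStar {
    /** Map side: the partial count for the rows one mapper kept after filtering. */
    static long partialCount(List<?> filteredRows) {
        // An empty split must contribute 0. If each empty mapper contributed 1
        // instead, the merged total would equal the number of mappers, which
        // matches the symptom reported in this issue.
        return filteredRows.size();
    }

    /** Reduce side: the final count is the sum of all partial counts. */
    static long merge(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }
}
```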




[jira] Assigned: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-10-04 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick reassigned HIVE-1501:


Assignee: Skye Berghel  (was: Russell Melick)

 when generating reentrant INSERT for index rebuild, quote identifiers using 
 backticks
 -

 Key: HIVE-1501
 URL: https://issues.apache.org/jira/browse/HIVE-1501
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Skye Berghel
 Fix For: 0.7.0


 Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
 accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
 to be fixed anyway).
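A minimal sketch of backtick-quoting every identifier when generating an INSERT ... SELECT. The helper names are hypothetical, the statement is far simpler than the real index rebuild, and escaping an embedded backtick by doubling it is an assumption that may not be accepted by this Hive version.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helpers; not Hive's index-rebuild code.
class IdentifierQuoting {
    /** Wraps an identifier in backticks, doubling any embedded backtick (assumed escape). */
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    /** Builds a simplified reentrant INSERT ... SELECT with every identifier quoted. */
    static String rebuildInsert(String indexTable, String baseTable, List<String> columns) {
        StringJoiner cols = new StringJoiner(", ");
        for (String c : columns) {
            cols.add(quote(c));
        }
        return "INSERT OVERWRITE TABLE " + quote(indexTable)
            + " SELECT " + cols + " FROM " + quote(baseTable);
    }
}
```

For example, rebuildInsert("idx", "src", List.of("key", "value")) yields INSERT OVERWRITE TABLE `idx` SELECT `key`, `value` FROM `src`.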
