[jira] Commented: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query

2010-09-30 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916573#action_12916573
 ] 

Zheng Shao commented on HIVE-1376:
--

I think (3) makes the most sense.  If (3) does not work for whatever 
hard-to-fix reason, we can do (1).
In any case, the change should be pretty simple.


 Simple UDAFs with more than 1 parameter crash on empty row query 
 -

 Key: HIVE-1376
 URL: https://issues.apache.org/jira/browse/HIVE-1376
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri

 Simple UDAFs with more than 1 parameter crash when the query returns no rows. 
 Currently, this only seems to affect the percentile() UDAF where the second 
 parameter is the percentile to be computed (of type double). I've also 
 verified the bug by adding a dummy parameter to ExampleMin in contrib. 
 On an empty query, Hive seems to be trying to resolve an iterate() method 
 with signature {null,null} instead of {null,double}. You can reproduce this 
 bug using:
 CREATE TABLE pct_test ( val INT );
 SELECT percentile(val, 0.5) FROM pct_test;
 which produces a lot of errors like: 
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to 
 execute method public boolean 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {null, null} of size 2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-22 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913670#action_12913670
 ] 

Zheng Shao commented on HIVE-537:
-

{code}
union<T0,T1,T2> create_union(byte tag, T0 o0, T1 o1, T2 o2, ...)
Some real examples:
union<School,Company> create_union(is_student ? 0 : 1, school, company)
{code}

Depending on the value of the tag, the returned union object will choose to 
store only the object corresponding to that tag.
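The tag-based dispatch described above can be sketched as a toy codec; the class and method names here are made up for illustration and this is not Hive's actual SerDe code. The writer emits one tag byte before the value, and the reader reads the tag first to decide how to deserialize what follows, as for a hypothetical union<0:int,1:string>:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy tag-byte union codec for union<0:int,1:string>: one tag byte, then
// only the value for that tag. Illustrative sketch, not Hive code.
public class UnionSketch {
    static byte[] writeUnion(byte tag, Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(tag);                    // tag byte goes first
            if (tag == 0) {
                out.writeInt((Integer) value);     // tag 0: the int branch
            } else {
                out.writeUTF((String) value);      // tag 1: the string branch
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object readUnion(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte tag = in.readByte();              // tag tells us what follows
            return tag == 0 ? (Object) in.readInt() : in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readUnion(writeUnion((byte) 0, 42)));      // prints 42
        System.out.println(readUnion(writeUnion((byte) 1, "login"))); // prints login
    }
}
```

Because only the chosen branch is serialized, the union costs a single extra byte per value regardless of how many alternative types it declares.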


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt


 There are already some cases inside the code that we use heterogeneous data: 
 JoinOperator, and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use Operator's parentID to distinguish that. However that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >
 Example:
   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte, then we know what is 
 the current type of the following object, so we can deserialize it 
 successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs that are for each of the tags. */
   ObjectInspector[] getObjectInspectors();
   /** Return the tag of the object. */
   byte getTag(Object o);
   /** Return the field based on the tag value associated with the Object. */
   Object getField(Object o);
 };
 An example serialization format (using delimited format, with ' ' as the 
 first-level delimiter and '=' as the second-level delimiter):
 userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
 123 1=login
 123 0=243=helloworld
 123 1=logout
 {code}




[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-20 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912614#action_12912614
 ] 

Zheng Shao commented on HIVE-537:
-

I think so. Let's use a different name for the UDF.

Using 'UNION' as UDF name will not cause grammar ambiguity, but it may cause 
other issues in the future.

Zheng


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt


 There are already some cases inside the code that we use heterogeneous data: 
 JoinOperator, and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use Operator's parentID to distinguish that. However that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: union < tag : type (, tag : type)* >
 Example:
   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte, then we know what is 
 the current type of the following object, so we can deserialize it 
 successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs that are for each of the tags. */
   ObjectInspector[] getObjectInspectors();
   /** Return the tag of the object. */
   byte getTag(Object o);
   /** Return the field based on the tag value associated with the Object. */
   Object getField(Object o);
 };
 An example serialization format (using delimited format, with ' ' as the 
 first-level delimiter and '=' as the second-level delimiter):
 userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
 123 1=login
 123 0=243=helloworld
 123 1=logout
 {code}




[jira] Commented: (HIVE-895) Add SerDe for Avro serialized data

2010-07-19 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889772#action_12889772
 ] 

Zheng Shao commented on HIVE-895:
-

We should just copy the schema information from the file header to the hive 
metastore.


 Add SerDe for Avro serialized data
 --

 Key: HIVE-895
 URL: https://issues.apache.org/jira/browse/HIVE-895
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher

 As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro 
 data seems like a solid win.




[jira] Commented: (HIVE-1468) intermediate data produced for select queries ignores hive.exec.compress.intermediate

2010-07-17 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889528#action_12889528
 ] 

Zheng Shao commented on HIVE-1468:
--

select queries means SELECT without INSERT, correct?

I agree that we should treat these queries differently; specifically, no 
compression (or maybe use LZO to save bandwidth - clients can be in other data 
centers) will be a big win.


 intermediate data produced for select queries ignores 
 hive.exec.compress.intermediate
 -

 Key: HIVE-1468
 URL: https://issues.apache.org/jira/browse/HIVE-1468
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Joydeep Sen Sarma

  set hive.exec.compress.intermediate=false;
  explain extended select xxx from yyy;
 ...
 File Output Operator
   compressed: true
   GlobalTableId: 0
 looks like only intermediate locations identified during splitting of mr 
 tasks follow this directive. this should be fixed because it forces clients 
 to always decompress output data (even if the config setting is altered).




[jira] Created: (HIVE-1460) JOIN should not output rows for NULL values

2010-07-12 Thread Zheng Shao (JIRA)
JOIN should not output rows for NULL values
---

 Key: HIVE-1460
 URL: https://issues.apache.org/jira/browse/HIVE-1460
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao


We should filter out rows with NULL keys from the result of this query
{code}
SELECT * FROM a JOIN b on a.key = b.key
{code}





[jira] Commented: (HIVE-1460) JOIN should not output rows for NULL values

2010-07-12 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887543#action_12887543
 ] 

Zheng Shao commented on HIVE-1460:
--

That's a good use case to consider.

I believe Hive currently does not support that (the condition after ON has to 
be conjunctive), but it's good to keep it in mind.


 JOIN should not output rows for NULL values
 ---

 Key: HIVE-1460
 URL: https://issues.apache.org/jira/browse/HIVE-1460
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao

 We should filter out rows with NULL keys from the result of this query
 {code}
 SELECT * FROM a JOIN b on a.key = b.key
 {code}




[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-09 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886794#action_12886794
 ] 

Zheng Shao commented on HIVE-287:
-

The plan looks good to me.

Just one comment: I think we should change the comment/class name for 
GenericUDAFResolver2.  Let's explicitly say GenericUDAFResolver2 is for UDAFs 
that want to have control over whether DISTINCT or * should be treated 
differently. For normal UDAFs, they should still inherit from 
GenericUDAFResolver.

Does that sound OK?



 count distinct on multiple columns does not work
 

 Key: HIVE-287
 URL: https://issues.apache.org/jira/browse/HIVE-287
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
 Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
 HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch


 The following query does not work:
 select count(distinct col1, col2) from Tbl




[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-09 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886882#action_12886882
 ] 

Zheng Shao commented on HIVE-287:
-

Talked with John offline also.

I agree that we can use the new interface going forward. Can you do these also 
in this patch:
1. Change the comments for the 2 new fields.  It's easy for UDAF writers to 
assume that the UDAF itself needs to handle whether it's distinct or whether 
it's all columns.
2. Deprecate the old interface, and move all existing GenericUDAF to inherit 
from the new one.

{code}
+  /**
+   * @return true if the UDAF invocation was qualified with <tt>DISTINCT</tt>
+   * keyword, false otherwise.
+   */
+  boolean isDistinct();
+
+  /**
+   * @return true if the UDAF invocation was done with a wildcard instead of
+   * explicit parameter list.
+   */
+  boolean isAllColumns();
{code}

After this patch is in, here is a list of follow-ups. Can you open JIRA for 
these:

1. Let UDAF and UDF support * and regex-based column specification
2. Special-case COUNT(*) because that does not require reading any columns, 
while MY_UDAF(*) needs all columns.


 count distinct on multiple columns does not work
 

 Key: HIVE-287
 URL: https://issues.apache.org/jira/browse/HIVE-287
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
 Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
 HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch


 The following query does not work:
 select count(distinct col1, col2) from Tbl




[jira] Commented: (HIVE-287) count distinct on multiple columns does not work

2010-07-07 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886183#action_12886183
 ] 

Zheng Shao commented on HIVE-287:
-

Hi Arvind, sorry for coming late for the party. I have 2 questions on the new 
UDAF2 interface:

1. Why do we put the DISTINCT in the information? DISTINCT is currently done by 
the framework, instead of individual UDAFs.
This is good because the logic of removing duplicates is common to all UDAFs. 
 We do support SUM(DISTINCT val).

2. Why do we special-case *? It seems to me that * is just a short-cut.  
Hive already supports regex-based multi-column specification, so that we can 
say `abc.*` for all columns with name starting with abc. The compiler should 
just expand * and give all the columns to the UDAF.

Since COUNT(*) is a special-case in the SQL standard (COUNT(*) is different 
from COUNT(col) even if the table has a single column col), I think we should 
just special-case that and replace that with count(1) at some place.
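As a rough illustration of why COUNT(*) must stay a special case (the data and class name here are hypothetical, not Hive code): COUNT(*) counts rows, while COUNT(col) skips NULLs, so the two differ even on a one-column table once that column contains a NULL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// COUNT(*) vs COUNT(col) on a single-column table with a NULL.
// Illustrative sketch only; names and data are made up.
public class CountSketch {
    static long countStar(List<Integer> col) {
        return col.size();                                     // every row counts
    }
    static long countCol(List<Integer> col) {
        return col.stream().filter(Objects::nonNull).count();  // NULLs are skipped
    }

    public static void main(String[] args) {
        List<Integer> col = Arrays.asList(1, null, 3);
        System.out.println(countStar(col));  // prints 3
        System.out.println(countCol(col));   // prints 2
    }
}
```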

What do you think?

 count distinct on multiple columns does not work
 

 Key: HIVE-287
 URL: https://issues.apache.org/jira/browse/HIVE-287
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
 Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, 
 HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch


 The following query does not work:
 select count(distinct col1, col2) from Tbl




[jira] Updated: (HIVE-1447) Speed up reflection method calls

2010-07-01 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1447:
-

Attachment: A.java

A.java for performance test. Some of the code is borrowed from 
http://www.jguru.com/faq/view.jsp?EID=246569



 Speed up reflection method calls
 

 Key: HIVE-1447
 URL: https://issues.apache.org/jira/browse/HIVE-1447
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: A.java


 See http://www.cowtowncoder.com/blog/archives/2010/04/entry_396.html and 
 http://www.jguru.com/faq/view.jsp?EID=246569
 There is a huge drop of overhead (more than half) if we do 
 field.setAccessible(true) for the field that we want to access.
 I did a simple experiment and that worked well with methods as well.
 The results are (note that the method just adds 1 to an integer):
 {code}
 1 regular method calls:26 milliseconds.
 1 reflective method calls without lookup:4029 milliseconds.
 1 accessible reflective method calls without lookup:1810 milliseconds.
 {code}
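The experiment above can be sketched roughly as follows; `ReflectSketch` and `invokeAddOne` are illustrative names, not the attached A.java. Calling setAccessible(true) on the Method object disables the per-call access check the JVM otherwise performs on every reflective invocation:

```java
import java.lang.reflect.Method;

// Sketch of the setAccessible(true) experiment; class and method names
// are made up for illustration. The result of the call is identical
// either way; only the per-call overhead differs.
public class ReflectSketch {
    public int addOne(int x) { return x + 1; }

    static int invokeAddOne(boolean makeAccessible, int x) {
        try {
            ReflectSketch obj = new ReflectSketch();
            Method m = ReflectSketch.class.getMethod("addOne", int.class);
            if (makeAccessible) {
                m.setAccessible(true);   // skip access checks on later invokes
            }
            return (Integer) m.invoke(obj, x);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A real benchmark would hoist the Method lookup out of the loop and
        // time millions of invokes, as in the numbers quoted above.
        System.out.println(invokeAddOne(false, 41));  // prints 42
        System.out.println(invokeAddOne(true, 41));   // prints 42
    }
}
```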




[jira] Updated: (HIVE-1447) Speed up reflection method calls

2010-07-01 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1447:
-

Attachment: HIVE-1447.1.patch

Patch that sets setAccessible for both GenericUDFBridge.java and 
GenericUDAFBridge.java

 Speed up reflection method calls
 

 Key: HIVE-1447
 URL: https://issues.apache.org/jira/browse/HIVE-1447
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: A.java, HIVE-1447.1.patch


 See http://www.cowtowncoder.com/blog/archives/2010/04/entry_396.html and 
 http://www.jguru.com/faq/view.jsp?EID=246569
 There is a huge drop of overhead (more than half) if we do 
 field.setAccessible(true) for the field that we want to access.
 I did a simple experiment and that worked well with methods as well.
 The results are (note that the method just adds 1 to an integer):
 {code}
 1 regular method calls:26 milliseconds.
 1 reflective method calls without lookup:4029 milliseconds.
 1 accessible reflective method calls without lookup:1810 milliseconds.
 {code}




[jira] Updated: (HIVE-1447) Speed up reflection method calls

2010-07-01 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1447:
-

Status: Patch Available  (was: Open)

 Speed up reflection method calls
 

 Key: HIVE-1447
 URL: https://issues.apache.org/jira/browse/HIVE-1447
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: A.java, HIVE-1447.1.patch


 See http://www.cowtowncoder.com/blog/archives/2010/04/entry_396.html and 
 http://www.jguru.com/faq/view.jsp?EID=246569
 There is a huge drop of overhead (more than half) if we do 
 field.setAccessible(true) for the field that we want to access.
 I did a simple experiment and that worked well with methods as well.
 The results are (note that the method just adds 1 to an integer):
 {code}
 1 regular method calls:26 milliseconds.
 1 reflective method calls without lookup:4029 milliseconds.
 1 accessible reflective method calls without lookup:1810 milliseconds.
 {code}




[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch

2010-06-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883051#action_12883051
 ] 

Zheng Shao commented on HIVE-1271:
--

I might be too late for the party, but I have a question on removing the field 
name comparison for struct type info.

We have 3 choices:
C1: Compare field names case sensitively.
C2: Compare field names case insensitively.
C3: Don't compare field names at all.

The old implementation was following C1, and the new one is following C3.
Is there any reason that we don't do C2? C2 seems to provide some minimal 
sanity checks that users will need in practice.
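A minimal sketch of what choice C2 could look like, assuming struct field names are available as a plain list of strings (the class and method here are hypothetical, not the actual TypeInfo code):

```java
import java.util.List;

// Choice C2: compare struct field names case-insensitively, so "userId"
// and "userid" match while genuinely different names do not.
// Hypothetical sketch, not Hive's TypeInfo implementation.
public class FieldCompareSketch {
    static boolean sameFieldNames(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            return false;                            // different arity: no match
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equalsIgnoreCase(b.get(i))) {
                return false;                        // names differ beyond case
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameFieldNames(List.of("userId", "y"), List.of("userid", "y")));  // prints true
        System.out.println(sameFieldNames(List.of("userId", "y"), List.of("message", "y"))); // prints false
    }
}
```

This keeps the sanity check (mismatched names are still rejected) without making queries fail over pure case differences like the userId/userid example below.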




 Case sensitiveness of type information specified when using custom reducer 
 causes type mismatch
 ---

 Key: HIVE-1271
 URL: https://issues.apache.org/jira/browse/HIVE-1271
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
 Fix For: 0.6.0

 Attachments: HIVE-1271-1.patch, HIVE-1271.patch


 Type information specified while using a custom reduce script is converted 
 to lower case, and causes type mismatch during query semantic analysis. The 
 following REDUCE query where field name = userId failed.
 hive> CREATE TABLE SS (
 a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 );
 OK
 hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
 INSERT OVERWRITE TABLE SS
 REDUCE *
 USING 'myreduce.py'
 AS
 (a INT,
 b INT,
 vals ARRAY<STRUCT<userId:INT, y:STRING>>
 )
 ;
 FAILED: Error in semantic analysis: line 2:27 Cannot insert into
 target table because column number/types are different SS: Cannot
 convert column 2 from array<struct<userId:int,y:string>> to
 array<struct<userid:int,y:string>>.
 The same query worked fine after changing userId to userid.




[jira] Created: (HIVE-1338) Fix bin/ext/jar.sh to work with hadoop 0.20 and above

2010-05-05 Thread Zheng Shao (JIRA)
Fix bin/ext/jar.sh to work with hadoop 0.20 and above
-

 Key: HIVE-1338
 URL: https://issues.apache.org/jira/browse/HIVE-1338
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao


{{bin/ext/jar.sh}} is not working with hadoop 0.20 and above.




[jira] Updated: (HIVE-1338) Fix bin/ext/jar.sh to work with hadoop 0.20 and above

2010-05-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1338:
-

Attachment: HIVE-1338.1.patch

This patch follows the same way as {{bin/ext/hiveserver.sh}}

 Fix bin/ext/jar.sh to work with hadoop 0.20 and above
 -

 Key: HIVE-1338
 URL: https://issues.apache.org/jira/browse/HIVE-1338
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1338.1.patch


 {{bin/ext/jar.sh}} is not working with hadoop 0.20 and above.




[jira] Updated: (HIVE-1338) Fix bin/ext/jar.sh to work with hadoop 0.20 and above

2010-05-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1338:
-

Status: Patch Available  (was: Open)

 Fix bin/ext/jar.sh to work with hadoop 0.20 and above
 -

 Key: HIVE-1338
 URL: https://issues.apache.org/jira/browse/HIVE-1338
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1338.1.patch


 {{bin/ext/jar.sh}} is not working with hadoop 0.20 and above.




[jira] Updated: (HIVE-1311) bug in use of hadoop supports splittable

2010-04-15 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1311:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
Release Note: HIVE-1311. Bug in use of parameter hadoop supports 
splittable. (Namit Jain via zshao)
  Resolution: Fixed

Committed. Thanks Namit!
(Sorry I didn't see Ning's comment before committing)

 bug in use of hadoop supports splittable
 

 Key: HIVE-1311
 URL: https://issues.apache.org/jira/browse/HIVE-1311
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0

 Attachments: hive.1311.1.patch


 CombineHiveInputFormat: getSplits()
  if (this.mrwork != null && this.mrwork.getHadoopSupportsSplittable()) 
 should check if hadoop supports splittable is false

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (HIVE-1312) hive trunk does not compile with hadoop 0.17 any more

2010-04-15 Thread Zheng Shao (JIRA)
hive trunk does not compile with hadoop 0.17 any more
-

 Key: HIVE-1312
 URL: https://issues.apache.org/jira/browse/HIVE-1312
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: John Sichi


This is caused by HIVE-1295.

{code}
compile:
 [echo] Compiling: hive
[javac] Compiling 527 source files to 
/hadoop_hive_trunk/.ptest_0/build/ql/classes
[javac] 
/hadoop_hive_trunk/.ptest_0/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java:69: cannot find symbol
[javac] symbol  : method getBytes()
[javac] location: class org.apache.hadoop.io.BytesWritable
[javac]   keyWritable.set(bw.getBytes(), 0, bw.getLength());
[javac] ^
[javac] 
/hadoop_hive_trunk/.ptest_0/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java:69: cannot find symbol
[javac] symbol  : method getLength()
[javac] location: class org.apache.hadoop.io.BytesWritable
[javac]   keyWritable.set(bw.getBytes(), 0, bw.getLength());
[javac]   ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors
{code}






[jira] Commented: (HIVE-1280) problem in combinehiveinputformat with nested directories

2010-04-06 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854162#action_12854162
 ] 

Zheng Shao commented on HIVE-1280:
--

splitable -> splittable

 problem in combinehiveinputformat with nested directories
 -

 Key: HIVE-1280
 URL: https://issues.apache.org/jira/browse/HIVE-1280
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1280.1.patch, hive.1280.2.patch, hive.1280.3.patch, 
 hive.1280.4.patch







[jira] Created: (HIVE-1292) Bug in generating partition pruner expression

2010-04-06 Thread Zheng Shao (JIRA)
Bug in generating partition pruner expression
-

 Key: HIVE-1292
 URL: https://issues.apache.org/jira/browse/HIVE-1292
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao


The logic for generating the pruner condition in GenericFuncExprProcessor has a problem.

None of the partitions passed the pruner in the following query:
{code}
SELECT *
FROM mytable a
WHERE pcol0 = '2010-04-03' 
AND
CASE WHEN ((col0 ='a') OR (col0 = 'b')) THEN 'a' ELSE NULL END IS NOT NULL;
{code}

While the partition '2010-04-03' did pass the pruner in the following query:
{code}
SELECT *
FROM mytable a
WHERE pcol0 = '2010-04-03' 
AND
CASE WHEN (col0 ='a') THEN 'a' ELSE NULL END IS NOT NULL;
{code}

The logic for generating the pruner condition is here:
org.apache.hadoop.hive.ql.optimizer.ppr.ExprProcFactory.GenericFuncExprProcessor.process(...)






[jira] Updated: (HIVE-1280) problem in combinehiveinputformat with nested directories

2010-04-06 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1280:
-

Status: Open  (was: Patch Available)

 problem in combinehiveinputformat with nested directories
 -

 Key: HIVE-1280
 URL: https://issues.apache.org/jira/browse/HIVE-1280
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1280.1.patch, hive.1280.2.patch, hive.1280.3.patch, 
 hive.1280.4.patch







[jira] Updated: (HIVE-1280) problem in combinehiveinputformat with nested directories

2010-04-06 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1280:
-

  Resolution: Fixed
Release Note: HIVE-1280. Add option to CombineHiveInputFormat for 
non-splittable inputs. (Namit Jain via zshao)
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Namit!

 problem in combinehiveinputformat with nested directories
 -

 Key: HIVE-1280
 URL: https://issues.apache.org/jira/browse/HIVE-1280
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1280.1.patch, hive.1280.2.patch, hive.1280.3.patch, 
 hive.1280.4.patch, hive.1280.5.patch, hive.1280.6.patch







[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-04-05 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853323#action_12853323
 ] 

Zheng Shao commented on HIVE-1131:
--

Still seeing test failures from HIVE-1131_7.patch

{code}
.ptest_0/test.17.2.1.log:[junit] Begin query: groupby8.q
.ptest_0/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: groupby8_map_skew.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: multi_insert.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: reduce_deduplicate.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: union18.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
.ptest_2/test.17.2.1.log:[junit] Begin query: groupby7.q
.ptest_2/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
.ptest_2/test.17.2.1.log:[junit] Begin query: groupby8_noskew.q
.ptest_2/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
.ptest_2/test.17.2.1.log:[junit] Begin query: input12.q
.ptest_2/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: 
Client execution results failed with error code = 1
--
{code}


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1291) Fix UDAFPercentile IndexOutOfBoundsException

2010-04-05 Thread Zheng Shao (JIRA)
Fix UDAFPercentile IndexOutOfBoundsException
---

 Key: HIVE-1291
 URL: https://issues.apache.org/jira/browse/HIVE-1291
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao


The counts array can be empty. We should directly return null in that case.

{code}
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method 
public org.apache.hadoop.hive.serde2.io.DoubleWritable 
org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate()
  on object 
org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@530d0eae 
of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
with arguments {} of size 0
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:725)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.terminate(GenericUDAFBridge.java:181)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.evaluate(GenericUDAFEvaluator.java:157)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:838)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:885)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:539)
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:300)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:701)
... 9 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at 
org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile(UDAFPercentile.java:97)
at 
org.apache.hadoop.hive.ql.udf.UDAFPercentile.access$300(UDAFPercentile.java:44)
at 
org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate(UDAFPercentile.java:196)
... 14 more
{code}
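The fix described above amounts to an empty-input guard before indexing into the aggregation state. A standalone sketch of that guard follows; this is illustrative code, not the actual UDAFPercentile implementation, and the interpolation details are my assumption:

```java
import java.util.ArrayList;
import java.util.List;

public class PercentileGuard {
    // Sketch of the guard HIVE-1291 describes: with zero input rows the
    // aggregation state is empty, so terminate() must return null instead of
    // indexing into an empty list (the IndexOutOfBoundsException above).
    static Double getPercentile(List<Long> sortedValues, double p) {
        if (sortedValues == null || sortedValues.isEmpty()) {
            return null; // no rows aggregated: no percentile to report
        }
        // Linear interpolation between the two neighboring ranks.
        double pos = p * (sortedValues.size() - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        double frac = pos - lo;
        return sortedValues.get(lo) * (1 - frac) + sortedValues.get(hi) * frac;
    }

    public static void main(String[] args) {
        System.out.println(getPercentile(new ArrayList<Long>(), 0.5)); // null
        System.out.println(getPercentile(List.of(1L, 2L, 3L), 0.5));   // 2.0
    }
}
```

With the guard in place, an empty group simply yields a null percentile, which matches the "directly return null" behavior proposed in the description.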


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1291) Fix UDAFPercentile IndexOutOfBoundsException

2010-04-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1291:
-

Attachment: HIVE-1291.1.patch

This patch fixes the bug.

 Fix UDAFPercentile IndexOutOfBoundsException
 ---

 Key: HIVE-1291
 URL: https://issues.apache.org/jira/browse/HIVE-1291
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1291.1.patch


 The counts array can be empty. We should directly return null in that case.
 {code}
 org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method 
 public org.apache.hadoop.hive.serde2.io.DoubleWritable 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate()
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@530d0eae 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {} of size 0
   at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:725)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.terminate(GenericUDAFBridge.java:181)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.evaluate(GenericUDAFEvaluator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:838)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:885)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:539)
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:300)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:701)
   ... 9 more
 Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile(UDAFPercentile.java:97)
   at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.access$300(UDAFPercentile.java:44)
   at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate(UDAFPercentile.java:196)
   ... 14 more
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1291) Fix UDAFPercentile IndexOutOfBoundsException

2010-04-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1291:
-

Fix Version/s: 0.6.0
   Status: Patch Available  (was: Open)

 Fix UDAFPercentile IndexOutOfBoundsException
 ---

 Key: HIVE-1291
 URL: https://issues.apache.org/jira/browse/HIVE-1291
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1291.1.patch


 The counts array can be empty. We should directly return null in that case.
 {code}
 org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method 
 public org.apache.hadoop.hive.serde2.io.DoubleWritable 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate()
   on object 
 org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@530d0eae 
 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator 
 with arguments {} of size 0
   at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:725)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.terminate(GenericUDAFBridge.java:181)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.evaluate(GenericUDAFEvaluator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:838)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:885)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:539)
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:300)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:701)
   ... 9 more
 Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile(UDAFPercentile.java:97)
   at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile.access$300(UDAFPercentile.java:44)
   at 
 org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate(UDAFPercentile.java:196)
   ... 14 more
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1253) date_sub() function returns wrong date because of daylight saving time difference

2010-04-01 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852279#action_12852279
 ] 

Zheng Shao commented on HIVE-1253:
--

+1. Will test and commit.

 date_sub() function returns wrong date because of daylight saving time 
 difference
 -

 Key: HIVE-1253
 URL: https://issues.apache.org/jira/browse/HIVE-1253
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: mingran wang
 Attachments: HIVE-1253.patch


 date_sub('2010-03-15', 7) returns '2010-03-07'. This is because there is a 
 time shift on 2010-03-14 for daylight saving time.
 Looking at ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java, it is 
 getting a calendar instance in the UTC time zone:
 Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
 and using calendar.add() to subtract 7 days, then converting the time to 
 'yyyy-MM-dd' format.
 If it simply uses the default time zone, the problem is solved:
 Calendar calendar = Calendar.getInstance();
 When people use date_sub('2010-03-15', 7), I think they mean subtract 7 
 days, not subtract 7*24 hours. So it should be an easy fix. The 
 same changes should go to date_add and date_diff.
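The off-by-one can be reproduced outside Hive by doing the calendar arithmetic in UTC while parsing and formatting in a DST-observing zone. A standalone sketch (illustrative names, not UDFDateSub's actual code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

public class DateSubDst {
    // Hypothetical stand-in for the date_sub logic: parse/format in ioZone,
    // do the day arithmetic in arithmeticZone. When arithmeticZone is UTC and
    // ioZone observes DST, subtracting 7 "days" crosses the 2010-03-14
    // spring-forward and lands on the wrong calendar date.
    static String dateSub(String date, int days,
                          TimeZone arithmeticZone, TimeZone ioZone) {
        try {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
            fmt.setTimeZone(ioZone);
            Calendar cal = Calendar.getInstance(arithmeticZone);
            cal.setTime(fmt.parse(date));
            cal.add(Calendar.DAY_OF_MONTH, -days); // calendar-day subtraction
            return fmt.format(cal.getTime());
        } catch (ParseException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        TimeZone la = TimeZone.getTimeZone("America/Los_Angeles");
        TimeZone utc = TimeZone.getTimeZone("UTC");
        // buggy: arithmetic in UTC, I/O in a DST zone
        System.out.println(dateSub("2010-03-15", 7, utc, la)); // 2010-03-07
        // fixed: arithmetic in the same zone as I/O
        System.out.println(dateSub("2010-03-15", 7, la, la));  // 2010-03-08
    }
}
```

The fixed variant keeps the wall-clock date because Calendar.add(DAY_OF_MONTH, ...) preserves the hour-of-day across a DST transition in the zone it operates in.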

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1253) date_sub() function returns wrong date because of daylight saving time difference

2010-04-01 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1253:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: HIVE-1253. Fix Date_sub and Date_add in case of daylight 
saving. (Bryan Talbot via zshao)  (was: Fix off-by-one issue with date_sub and 
date_add when date ranges include a daylight savings time change.)
   Status: Resolved  (was: Patch Available)

Committed. Thanks Bryan!

 date_sub() function returns wrong date because of daylight saving time 
 difference
 -

 Key: HIVE-1253
 URL: https://issues.apache.org/jira/browse/HIVE-1253
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: mingran wang
Assignee: Bryan Talbot
 Fix For: 0.6.0

 Attachments: HIVE-1253.patch


 date_sub('2010-03-15', 7) returns '2010-03-07'. This is because there is a 
 time shift on 2010-03-14 for daylight saving time.
 Looking at ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java, it is 
 getting a calendar instance in the UTC time zone:
 Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
 and using calendar.add() to subtract 7 days, then converting the time to 
 'yyyy-MM-dd' format.
 If it simply uses the default time zone, the problem is solved:
 Calendar calendar = Calendar.getInstance();
 When people use date_sub('2010-03-15', 7), I think they mean subtract 7 
 days, not subtract 7*24 hours. So it should be an easy fix. The 
 same changes should go to date_add and date_diff.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1272) Add SymlinkTextInputFormat to Hive

2010-03-31 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852015#action_12852015
 ] 

Zheng Shao commented on HIVE-1272:
--

Can you add a test case? Take a look at the .q files in 
ql/src/test/clientpositive.


 Add SymlinkTextInputFormat to Hive
 --

 Key: HIVE-1272
 URL: https://issues.apache.org/jira/browse/HIVE-1272
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Zheng Shao
Assignee: Guanghao Shen
 Attachments: HIVE-1272.1.patch


 We'd like to add a symlink text input format so that we can specify the list 
 of files for a table/partition based on the content of a text file.
 For example, the location of the table is /user/hive/mytable.
 There is a file called /user/hive/mytable/myfile.txt.
 Inside the file, there are 2 lines, /user/myname/textfile1.txt and 
 /user/myname/textfile2.txt
 We can do:
 {code}
 CREATE TABLE mytable (...) STORED AS INPUTFORMAT 
 'org.apache.hadoop.hive.io.SymlinkTextInputFormat' LOCATION 
 '/user/hive/mytable';
 SELECT * FROM mytable;
 {code}
 which will return the content of the 2 files: /user/myname/textfile1.txt 
 and /user/myname/textfile2.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1272) Add SymlinkTextInputFormat to Hive

2010-03-31 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852040#action_12852040
 ] 

Zheng Shao commented on HIVE-1272:
--

We can add a file named data/symlink.txt which contains the text 
../src/kv.txt
Then in ql/src/test/clientpositive/mysymlink.q we can do this:

{code}
CREATE TABLE mysymlink (key STRING, value STRING) STORED AS INPUTFORMAT ...
dfs -cp ../data/symlink.txt 
../build/ql/test/data/warehouse/mysymlink/symlink1.txt;
dfs -cp ../data/symlink.txt 
../build/ql/test/data/warehouse/mysymlink/symlink2.txt;

SELECT * FROM mysymlink;
SELECT count(1) FROM mysymlink;
{code}

To generate the expected output, run:
ant test -Doffline=true -Dtestcase=TestCliDriver -Dqfile=mysymlink.q 
-Doverwrite=true
and then do svn add ql/.../mysymlink.q.out

Run again without -Doverwrite=true to verify the result.





 Add SymlinkTextInputFormat to Hive
 --

 Key: HIVE-1272
 URL: https://issues.apache.org/jira/browse/HIVE-1272
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Zheng Shao
Assignee: Guanghao Shen
 Attachments: HIVE-1272.1.patch


 We'd like to add a symlink text input format so that we can specify the list 
 of files for a table/partition based on the content of a text file.
 For example, the location of the table is /user/hive/mytable.
 There is a file called /user/hive/mytable/myfile.txt.
 Inside the file, there are 2 lines, /user/myname/textfile1.txt and 
 /user/myname/textfile2.txt
 We can do:
 {code}
 CREATE TABLE mytable (...) STORED AS INPUTFORMAT 
 'org.apache.hadoop.hive.io.SymlinkTextInputFormat' LOCATION 
 '/user/hive/mytable';
 SELECT * FROM mytable;
 {code}
 which will return the content of the 2 files: /user/myname/textfile1.txt 
 and /user/myname/textfile2.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1286) error/info message being emitted on standard output

2010-03-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1286:
-

  Resolution: Fixed
Release Note: 
HIVE-1286. Remove debug message from stdout in ColumnarSerDe. (Yongqiang He via 
zshao)

Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang!

 error/info message being emitted on standard output
 ---

 Key: HIVE-1286
 URL: https://issues.apache.org/jira/browse/HIVE-1286
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Priority: Minor
 Fix For: 0.6.0

 Attachments: hive.1286.1.patch, hive.1286.2.patch


 'Found class for org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
 should go to stderr where other informational messages are sent.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1289) Make gz text file work with CombineHiveInputFormat

2010-03-30 Thread Zheng Shao (JIRA)
Make gz text file work with CombineHiveInputFormat
--

 Key: HIVE-1289
 URL: https://issues.apache.org/jira/browse/HIVE-1289
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao


If the user has applied MAPREDUCE-1649, he should be able to use 
CombineHiveInputFormat with .gz text files.

We should add an option to enable that.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-30 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851664#action_12851664
 ] 

Zheng Shao commented on HIVE-1131:
--

 S1. Can we make lineage partition-level instead of table-level?
I don't see this implemented in the new patch. After looking at the code more, 
I'd agree that this is too hard (and inefficient) to do when the query ranges 
over a lot of partitions.

 S3. Use {} even for single statement in if, for etc.
I cannot find any instances of these now.


Still have some questions:
 S2. We might want to define formally the concepts of these levels, especially 
 how they are composed (what will be UDAF of UDF, or UDF of UDAF, like 
 round(sum(col)) or sum(round(col)))?
LineageInfo.java: Can you add some comments on what DependencyType the nested 
dependencies like round(sum(col)) or sum(round(col)) have?

S6. The best place to store LineageInfo is probably in the QueryPlan instead of 
SessionState.  Otherwise the LineageInfo will be lost when we run a query that 
is compiled earlier. Thoughts?


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks

2010-03-30 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851718#action_12851718
 ] 

Zheng Shao commented on HIVE-1131:
--

 Look at the DataContainer class. That has a partition in it. And the 
 Dependency has a mapping from Partition to the dependencies. Can you explain 
 more your concerns on inefficiency?

I see. So the DataContainer captures the output partition information, but we 
don't have input partition information (BaseColumnInfo/TableAliasInfo). This is 
reasonable since the input can be lots of partitions.

 For S6 actually the queryplan is the wrong place to store the lineageinfo. 
 Because of the dynamic partitioning work that Ning is doing, I have to 
 generate the partition to dependency mapping at run time. So I would rather 
 store it in a run time structure as opposed to a compile time structure. 
 SessionState fits that bill, though I think we should have another structure 
 called ExecutionCtx for this. But otherwise I think we want to store this in 
 a runtime structure.

+1 on the ExecutionCtx idea. SessionState is at the session level, and 
LineageInfo is at the query level. It will be great to put LineageInfo into 
ExecutionCtx.


 Add column lineage information to the pre execution hooks
 -

 Key: HIVE-1131
 URL: https://issues.apache.org/jira/browse/HIVE-1131
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
 Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
 HIVE-1131_4.patch


 We need a mechanism to pass the lineage information of the various columns of 
 a table to a pre execution hook so that applications can use that for:
 - auditing
 - dependency checking
 and many other applications.
 The proposal is to expose this through a bunch of classes to the pre 
 execution hook interface to the clients and put in the necessary 
 transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1286) error/info message being emitted on standard output

2010-03-29 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851048#action_12851048
 ] 

Zheng Shao commented on HIVE-1286:
--

Shall we use LOG.info or LOG.debug instead?

 error/info message being emitted on standard output
 ---

 Key: HIVE-1286
 URL: https://issues.apache.org/jira/browse/HIVE-1286
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
Priority: Minor
 Fix For: 0.6.0

 Attachments: hive.1286.1.patch


 'Found class for org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
 should go to stderr where other informational messages are sent.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1287) Struct datatype should not use field names for type equivalence.

2010-03-29 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851075#action_12851075
 ] 

Zheng Shao commented on HIVE-1287:
--

I think we should support the following query:
{code}
insert overwrite table sink select CAST(foo AS struct<y: string>) from source;
{code}

This is better than directly converting them, because there can be confusion 
(there are 2 ways to convert between struct<x: string, y: string> and 
struct<y: string, x: string>, and Hive is taking one of them).
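The two possible conversions (match fields by position vs. match by name) can be illustrated with a small standalone sketch; these are hypothetical helpers, not Hive code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StructConvert {
    // Tiny stand-in for a struct value: an insertion-ordered field->value map.
    static Map<String, String> struct(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    // Conversion 1: the i-th source value fills the i-th target field.
    static Map<String, String> byPosition(Map<String, String> src, String[] targetFields) {
        Map<String, String> out = new LinkedHashMap<>();
        int i = 0;
        for (String v : src.values()) out.put(targetFields[i++], v);
        return out;
    }

    // Conversion 2: each target field takes the same-named source field.
    static Map<String, String> byName(Map<String, String> src, String[] targetFields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String f : targetFields) out.put(f, src.get(f));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> src = struct("x", "1", "y", "2");
        String[] target = {"y", "x"};
        System.out.println(byPosition(src, target)); // {y=1, x=2}
        System.out.println(byName(src, target));     // {y=2, x=1}
    }
}
```

Converting struct("x"=1, "y"=2) into target shape (y, x) gives different results under the two rules, which is exactly why an explicit CAST is clearer than an implicit conversion.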


 Struct datatype should not use field names for type equivalence.
 

 Key: HIVE-1287
 URL: https://issues.apache.org/jira/browse/HIVE-1287
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Mac OS X (10.6.2) Java SE 6 ( 1.6.0_17)
Reporter: Arvind Prabhakar

 The field names for {{Struct}} types are currently being matched when testing 
 type equivalence. This is readily seen by running the following example:
 {noformat}
 hive> create table source ( foo struct<x: string> );
 OK
 Time taken: 3.094 seconds
 hive> load data local inpath '/path/to/sample/data.txt' overwrite into table 
 source;
 Copying data from file:/path/to/sample/data.txt
 Loading data to table source
 OK
 Time taken: 0.593 seconds
 hive> create table sink ( bar struct<y: string> );
 OK
 Time taken: 0.11 seconds
 hive> insert overwrite table sink select foo from source;
 FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table 
 because column number/types are different sink: Cannot convert column 0 
 from struct<x:string> to struct<y:string>.
 {noformat}
 Since both {{source.foo}} and {{sink.bar}} are similar in definition, with 
 only the field names being different, data movement between these two should 
 be allowed. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2010-03-26 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850427#action_12850427
 ] 

Zheng Shao commented on HIVE-1019:
--

The concept of a session is longer-lived than a query. See HIVE-584.

We should not start a new session inside a query. Instead, we should introduce a 
separate concept (maybe a combination of session ID and task ID) and use that 
to name the HIVE_PLAN file.


 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.6.0

 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
 HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, 
 HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1256) fix Hive logo img tag to avoid stretching

2010-03-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao resolved HIVE-1256.
--

Resolution: Fixed


 svn info; svn commit -m "Fixed hive_logo_medium.jpg"
Path: .
URL: https://svn.apache.org/repos/asf/hadoop/hive/site
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 915946
Node Kind: directory
Schedule: normal
Last Changed Author: zshao
Last Changed Rev: 915691
Last Changed Date: 2010-02-23 21:54:01 -0800 (Tue, 23 Feb 2010)

Sending    author/src/documentation/content/xdocs/hive_logo_medium.jpg
Sending    publish/images/hive_logo_medium.jpg
Transmitting file data ..
Committed revision 927292.


 fix Hive logo img tag to avoid stretching
 -

 Key: HIVE-1256
 URL: https://issues.apache.org/jira/browse/HIVE-1256
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Zheng Shao
 Fix For: 0.6.0


 From comment on HIVE-422:
 Aaron Newton added a comment - 17/Mar/10 02:32 AM
 Hey guys,
 I saw this article on TC today:
 http://techcrunch.com/2010/03/16/big-data-freedom/
 and noticed the hive logo was all out of whack - all stretched out. Then I 
 noticed it's like that on the Hive home page. Can someone fix the dimensions 
 of the image tag? It looks kinda bad (and people are apparently using it like 
 that elsewhere as seen on the TC article).
 http://hadoop.apache.org/hive/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-03-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849438#action_12849438
 ] 

Zheng Shao commented on HIVE-1255:
--

Edward, can you add back the (unnecessary) type casts in FunctionRegistry.java?
These are required to get hive compilable with hadoop 0.17.

 Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
 --

 Key: HIVE-1255
 URL: https://issues.apache.org/jira/browse/HIVE-1255
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1255-patch.txt


 Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan

2010-03-24 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1255:
-

Status: Open  (was: Patch Available)

 Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
 --

 Key: HIVE-1255
 URL: https://issues.apache.org/jira/browse/HIVE-1255
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1255-patch.txt


 Add support for PI, E, degrees, radians, tan, sign and atan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1272) Add SymlinkTextInputFormat to Hive

2010-03-23 Thread Zheng Shao (JIRA)
Add SymlinkTextInputFormat to Hive
--

 Key: HIVE-1272
 URL: https://issues.apache.org/jira/browse/HIVE-1272
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Zheng Shao


We'd like to add a symlink text input format so that we can specify the list of 
files for a table/partition based on the content of a text file.

For example, the location of the table is /user/hive/mytable.
There is a file called /user/hive/mytable/myfile.txt.
Inside the file, there are 2 lines, /user/myname/textfile1.txt and 
/user/myname/textfile2.txt

We can do:
{code}
CREATE TABLE mytable (...) STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.io.SymlinkTextInputFormat' LOCATION 
'/user/hive/mytable';
SELECT * FROM mytable;
{code}

which will return the content of the 2 files: /user/myname/textfile1.txt and 
/user/myname/textfile2.txt
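At split-generation time, an input format like this would expand each file under the table location into the list of target paths it names, and scan those targets instead of the symlink files themselves. A minimal standalone sketch of that expansion step (illustrative names, not Hive's actual SymlinkTextInputFormat code):

```java
import java.util.ArrayList;
import java.util.List;

public class SymlinkExpander {
    // Treat the contents of a symlink file as one target path per line;
    // blank lines are ignored. In the real input format the returned paths
    // would be handed to the underlying text input format for splitting.
    static List<String> expand(String symlinkFileContents) {
        List<String> targets = new ArrayList<>();
        for (String line : symlinkFileContents.split("\n")) {
            line = line.trim();
            if (!line.isEmpty()) {
                targets.add(line); // each non-empty line names one data file
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        String myfile = "/user/myname/textfile1.txt\n/user/myname/textfile2.txt\n";
        System.out.println(expand(myfile));
        // [/user/myname/textfile1.txt, /user/myname/textfile2.txt]
    }
}
```

Applied to the example above, reading /user/hive/mytable/myfile.txt yields the two target files, so SELECT * FROM mytable returns their contents rather than the symlink file's.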



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1273) UDF_Percentile NullPointerException

2010-03-23 Thread Zheng Shao (JIRA)
UDF_Percentile NullPointerException
---

 Key: HIVE-1273
 URL: https://issues.apache.org/jira/browse/HIVE-1273
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao







[jira] Updated: (HIVE-1273) UDF_Percentile NullPointerException

2010-03-23 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1273:
-

Attachment: HIVE-1273.1.patch

Ignore null in merge.

 UDF_Percentile NullPointerException
 ---

 Key: HIVE-1273
 URL: https://issues.apache.org/jira/browse/HIVE-1273
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1273.1.patch







[jira] Commented: (HIVE-365) Create Table to support multiple levels of delimiters

2010-03-18 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847048#action_12847048
 ] 

Zheng Shao commented on HIVE-365:
-

I am thinking something like:

{code}
CREATE TABLE nested(array_of_arrays ARRAY<ARRAY<INT>>, map_of_maps MAP<STRING, MAP<INT, INT>>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001' '\002' '\003' '\004' '\005';
{code}

Basically, this allows multiple separators after FIELDS TERMINATED BY.

The top level (fields) consumes 1 level of separators. Each level of array 
consumes 1 level of separators, while each level of map consumes 2.


 Create Table to support multiple levels of delimiters
 -

 Key: HIVE-365
 URL: https://issues.apache.org/jira/browse/HIVE-365
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao

 From HIVE-337, the SerDe layer now supports multiple-levels of delimiters, 
 for the purpose of supporting nested map/array/struct.
 Array(the same as List) and struct consume a single level of separator, and 
 Map consumes 2 levels.
 DDL (Create Table) needs to allow users to specify multiple levels of 
 delimiters in order to take advantage of this new feature.




[jira] Commented: (HIVE-1219) More robust handling of metastore connection failures

2010-03-17 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846568#action_12846568
 ] 

Zheng Shao commented on HIVE-1219:
--

nitpick: HiveConf.ConfVars.METATOREATTEMPTS has a typo.


 More robust handling of metastore connection failures
 -

 Key: HIVE-1219
 URL: https://issues.apache.org/jira/browse/HIVE-1219
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Paul Yang
Assignee: Paul Yang
 Fix For: 0.6.0

 Attachments: HIVE-1219.1.patch, HIVE-1219.2.patch, HIVE-1219.3.patch, 
 HIVE-1219.4.patch


 Currently, if metastore's connection to the datastore is broken, the query 
 fails and the exception such as the following is thrown
 {code}
 2010-01-28 11:50:20,885 ERROR exec.MoveTask 
 (SessionState.java:printError(248)) - Failed with exception Unable to fetch 
 table tmp_table
 org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
 tmp_table
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:362)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:333)
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:112)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:99)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:64)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:582)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:462)
 at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:324)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
 at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:200)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:256)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: javax.jdo.JDODataStoreException: Communications link failure
 Last packet sent to the server was 1 ms ago.
 NestedThrowables:
 com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link 
 failure
 Last packet sent to the server was 1 ms ago.
 at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:289)
 {code}
 In order to reduce the impact of transient network issues and momentarily 
 unavailable datastores, two possible improvements are:
 1. Retrying the metastore command in case of connection failure before 
 propagating up the exception.
 2. Retrieving the datastore hostname / connection URL through the use of an 
 extension. This extension would be useful in the case where a remote service 
 maintained the location of the currently available datastore. In case of 
 hostname changes or failovers to a backup datastore, the extension would 
 allow hive clients to run without manual intervention.
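Improvement (1) can be sketched as a generic retry wrapper around a metastore call; the names, attempt count, and back-off policy below are illustrative, not what the eventual patch does:

```java
import java.util.concurrent.Callable;

public class RetryingCall {
    // Run the call up to `attempts` times, sleeping between failures,
    // and only propagate the exception once every attempt has failed.
    static <T> T withRetries(Callable<T> call, int attempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;                    // transient failure: retry
                if (i < attempts - 1) {
                    Thread.sleep(sleepMs);   // back off before the next attempt
                }
            }
        }
        throw last;                          // all attempts failed
    }
}
```

A real implementation would likely retry only on connection-type exceptions (e.g. JDODataStoreException) rather than on everything, so that genuine metadata errors still fail fast.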




[jira] Created: (HIVE-1254) CTAS creates column names starting with _ while the grammar does not allow column names starting with _

2010-03-17 Thread Zheng Shao (JIRA)
CTAS creates column names starting with _ while the grammar does not allow 
column names starting with _
---

 Key: HIVE-1254
 URL: https://issues.apache.org/jira/browse/HIVE-1254
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Ning Zhang


{code}
CREATE TABLE tmp_table AS 
SELECT adid, min(timestamp)
FROM ads
GROUP BY adid;
{code}

The second column name is _c1.
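Until the generated names are fixed (or CTAS rejects them), an explicit alias sidesteps the problem; the alias name here is illustrative:

{code}
CREATE TABLE tmp_table AS 
SELECT adid, min(timestamp) AS min_timestamp
FROM ads
GROUP BY adid;
{code}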





[jira] Commented: (HIVE-1242) CombineHiveInputFormat does not work for compressed text files

2010-03-12 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12844663#action_12844663
 ] 

Zheng Shao commented on HIVE-1242:
--

Talked with Namit offline. HIVE-1200 needs a small fix, which Namit will 
include together with this one.


 CombineHiveInputFormat does not work for compressed text files
 --

 Key: HIVE-1242
 URL: https://issues.apache.org/jira/browse/HIVE-1242
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.5.1, 0.6.0

 Attachments: hive.1242.1.patch







[jira] Commented: (HIVE-1238) Get partitions with a partial specification

2010-03-11 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12844187#action_12844187
 ] 

Zheng Shao commented on HIVE-1238:
--

{{get_partitions_mp_by_name}} will be much more efficient than the other.

HIVE-804 can be used as a test case for the new API if we refactor it.




 Get partitions with a partial specification
 

 Key: HIVE-1238
 URL: https://issues.apache.org/jira/browse/HIVE-1238
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Paul Yang
Assignee: Paul Yang
 Fix For: 0.6.0


 Currently, the metastore API only allows retrieval of all the partitions of a 
 table, or the retrieval of a single partition given a complete partition 
 specification. For HIVE-936, a method to retrieve all partitions that match a 
 partial partition specification would be useful. 




[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer

2010-03-09 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1216:
-

Attachment: HIVE-1216.4.patch

Sorry I forgot to include a newly added file: UDFTestErrorOnFalse.java

HIVE-1216.4.patch should work fine now.


 Show the row with error in mapper/reducer
 -

 Key: HIVE-1216
 URL: https://issues.apache.org/jira/browse/HIVE-1216
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1216.1.patch, HIVE-1216.3.patch, HIVE-1216.4.patch


 It will be very useful for users to debug HiveQL if the mapper/reducer can 
 show the row that caused the error.




[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging

2010-03-08 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: HIVE-1212.3.patch

Addressed the comment from Paul.


 Explicitly say Hive Internal Error to ease debugging
 --

 Key: HIVE-1212
 URL: https://issues.apache.org/jira/browse/HIVE-1212
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1212.1.patch, HIVE-1212.2.patch, HIVE-1212.3.patch


 Our users complain that Hive fails with error messages like FAILED: Unknown 
 exception: null.
 We should explicitly mention that it's an internal error of Hive, and provide 
 more information (stacktrace) on the screen to ease bug reporting and 
 debugging.
 In other cases, we will still put the detailed information (stacktrace) only 
 in the log, since users should be able to figure out what's wrong from a 
 single line of message.




[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer

2010-03-08 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1216:
-

Attachment: HIVE-1216.3.patch

This patch fixes some checkstyle warnings.

 Show the row with error in mapper/reducer
 -

 Key: HIVE-1216
 URL: https://issues.apache.org/jira/browse/HIVE-1216
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1216.1.patch, HIVE-1216.3.patch


 It will be very useful for users to debug HiveQL if the mapper/reducer can 
 show the row that caused the error.




[jira] Commented: (HIVE-1216) Show the row with error in mapper/reducer

2010-03-08 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842908#action_12842908
 ] 

Zheng Shao commented on HIVE-1216:
--

Which test case? I tried them but they were fine for me.

 Show the row with error in mapper/reducer
 -

 Key: HIVE-1216
 URL: https://issues.apache.org/jira/browse/HIVE-1216
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1216.1.patch, HIVE-1216.3.patch


 It will be very useful for users to debug HiveQL if the mapper/reducer can 
 show the row that caused the error.




[jira] Assigned: (HIVE-1179) Add UDF array_contains

2010-03-06 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao reassigned HIVE-1179:


Assignee: Arvind Prabhakar

 Add UDF array_contains
 --

 Key: HIVE-1179
 URL: https://issues.apache.org/jira/browse/HIVE-1179
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Arvind Prabhakar
 Attachments: HIVE-1179.patch


 Returns true or false, depending on whether an element is in an array.
 {{array_contains(T element, array<T> theArray)}}




[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer

2010-03-06 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1216:
-

Status: Patch Available  (was: Open)

 Show the row with error in mapper/reducer
 -

 Key: HIVE-1216
 URL: https://issues.apache.org/jira/browse/HIVE-1216
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1216.1.patch


 It will be very useful for users to debug HiveQL if the mapper/reducer can 
 show the row that caused the error.




[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer

2010-03-06 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1216:
-

Attachment: HIVE-1216.1.patch

 Show the row with error in mapper/reducer
 -

 Key: HIVE-1216
 URL: https://issues.apache.org/jira/browse/HIVE-1216
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1216.1.patch


 It will be very useful for users to debug HiveQL if the mapper/reducer can 
 show the row that caused the error.




[jira] Commented: (HIVE-1179) Add UDF array_contains

2010-03-06 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842237#action_12842237
 ] 

Zheng Shao commented on HIVE-1179:
--

Hi Arvind, we have to restore the seemingly unnecessary type conversion for hadoop 0.17.
Try the following command and you will see why:

{code}
ant -Dhadoop.version=0.17.2.1 clean package
{code}


 Add UDF array_contains
 --

 Key: HIVE-1179
 URL: https://issues.apache.org/jira/browse/HIVE-1179
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Arvind Prabhakar
 Attachments: HIVE-1179.patch


 Returns true or false, depending on whether an element is in an array.
 {{array_contains(T element, array<T> theArray)}}




[jira] Updated: (HIVE-1179) Add UDF array_contains

2010-03-06 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1179:
-

Status: Open  (was: Patch Available)

 Add UDF array_contains
 --

 Key: HIVE-1179
 URL: https://issues.apache.org/jira/browse/HIVE-1179
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Arvind Prabhakar
 Attachments: HIVE-1179.patch


 Returns true or false, depending on whether an element is in an array.
 {{array_contains(T element, array<T> theArray)}}




[jira] Commented: (HIVE-1211) Tapping logs from child processes

2010-03-06 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842239#action_12842239
 ] 

Zheng Shao commented on HIVE-1211:
--

Hi bc, can you talk a bit more about the use case you have in mind?


 Tapping logs from child processes
 -

 Key: HIVE-1211
 URL: https://issues.apache.org/jira/browse/HIVE-1211
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Logging
Reporter: bc Wong
 Attachments: HIVE-1211.1.patch


 Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to 
 the parent's stdout/stderr. There is little one can do to sort out which 
 log is from which query.




[jira] Assigned: (HIVE-1211) Tapping logs from child processes

2010-03-06 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao reassigned HIVE-1211:


Assignee: bc Wong

 Tapping logs from child processes
 -

 Key: HIVE-1211
 URL: https://issues.apache.org/jira/browse/HIVE-1211
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Logging
Reporter: bc Wong
Assignee: bc Wong
 Attachments: HIVE-1211.1.patch


 Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to 
 the parent's stdout/stderr. There is little one can do to sort out which 
 log is from which query.




[jira] Created: (HIVE-1216) Show the row with error in mapper/reducer

2010-03-05 Thread Zheng Shao (JIRA)
Show the row with error in mapper/reducer
-

 Key: HIVE-1216
 URL: https://issues.apache.org/jira/browse/HIVE-1216
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao


It will be very useful for users to debug HiveQL if the mapper/reducer can show 
the row that caused the error.




[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging

2010-03-05 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: HIVE-1212.2.patch

Cleaned up some error processing code.

 Explicitly say Hive Internal Error to ease debugging
 --

 Key: HIVE-1212
 URL: https://issues.apache.org/jira/browse/HIVE-1212
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1212.1.patch, HIVE-1212.2.patch


 Our users complain that Hive fails with error messages like FAILED: Unknown 
 exception: null.
 We should explicitly mention that it's an internal error of Hive, and provide 
 more information (stacktrace) on the screen to ease bug reporting and 
 debugging.
 In other cases, we will still put the detailed information (stacktrace) only 
 in the log, since users should be able to figure out what's wrong from a 
 single line of message.




[jira] Commented: (HIVE-1216) Show the row with error in mapper/reducer

2010-03-05 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841971#action_12841971
 ] 

Zheng Shao commented on HIVE-1216:
--

Thanks for the link, Jeff. This JIRA aims to do something a bit different.
Instead of writing the data into a _skip file and letting the job finish, we will 
print out the row to stderr/stdout, or just use LOG.

The advantages of this:
1. Really easy for debugging - people don't need to use command line tools to 
fetch the _skip file.
2. We should be able to attach column names to their values, because Hive knows 
the column names.
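Point 2 can be sketched as pairing each value with its column name before logging; the method name and output format below are illustrative:

```java
public class RowError {
    // Render a failing row as "{col1=v1, col2=v2}" so the log line is
    // self-describing -- possible because Hive knows the column names.
    static String describeRow(String[] columnNames, Object[] values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < columnNames.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(columnNames[i]).append('=').append(values[i]);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // A null value prints as "null", which is often exactly the clue
        // needed when a UDF fails on unexpected input.
        System.out.println(describeRow(
            new String[] {"adid", "ts"},
            new Object[] {17, null})); // prints {adid=17, ts=null}
    }
}
```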


 Show the row with error in mapper/reducer
 -

 Key: HIVE-1216
 URL: https://issues.apache.org/jira/browse/HIVE-1216
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao

 It will be very useful for user to debug the HiveQL if mapper/reducer can 
 show the row that caused error.




[jira] Commented: (HIVE-431) Auto-add table property select to be the select statement that created the table

2010-03-05 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842014#action_12842014
 ] 

Zheng Shao commented on HIVE-431:
-

I guess the information is already in lineage.

I think it's a good idea to keep lineage information away from the core 
metadata, especially given that we are going to have column lineage etc.
But we should provide an easy way for users to retrieve the lineage information.


 Auto-add table property select to be the select statement that created the 
 table
 --

 Key: HIVE-431
 URL: https://issues.apache.org/jira/browse/HIVE-431
 Project: Hadoop Hive
  Issue Type: Wish
Reporter: Adam Kramer

 A syntactic copy of the query that was used to fill a table would often be 
 AMAZINGLY useful for figuring out where the data in the table came from.
 I think the best way to implement this would be to automatically add a table 
 property which includes the SELECT statement. For partitioned tables, this 
 would need to exist for each partition...or perhaps use some canonical name 
 like selectquery for unpartitioned tables, plus selectquery_ds=DATEID for 
 partitioned tables.
 This problem is growing as more and more tables in our database are generated 
 by either root or by people who are no longer easy to contact.




[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates

2010-03-04 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841714#action_12841714
 ] 

Zheng Shao commented on HIVE-224:
-

Hi James, currently we don't have the bandwidth to do this, but I guess it 
won't be too hard - we just need to use 
http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashMap.html (search 
for LRU).
Are you interested in joining forces on this?
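The LinkedHashMap idea can be sketched like this: an access-ordered map gives LRU eviction out of the box (true LFU would need a per-entry counter). In a real map-side aggregation the evicted entry would be flushed downstream rather than dropped; the class below is an illustration, not Hive code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded map that evicts the least-recently-used entry on overflow.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder = true: get() refreshes recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // A map-side aggregation would flush `eldest` to the reducer here.
        return size() > maxEntries;
    }
}
```

For example, with capacity 3, inserting a, b, c, touching a, then inserting d evicts b — the least recently used key.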


 implement lfu based flushing policy for map side aggregates
 ---

 Key: HIVE-224
 URL: https://issues.apache.org/jira/browse/HIVE-224
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Joydeep Sen Sarma

 currently we flush some random set of rows when the map side hash table 
 approaches memory limits.
 we have discussed a strategy of flushing hash table entries that have 
 been seen the least number of times (effectively an LFU flushing strategy). This 
 will be very effective at reducing the amount of data sent from map to reduce 
 step - as well as reduce the chances for any skews.




[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Affects Version/s: 0.6.0
   Status: Patch Available  (was: Open)

 Explicitly say Hive Internal Error to ease debugging
 --

 Key: HIVE-1212
 URL: https://issues.apache.org/jira/browse/HIVE-1212
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1212.1.patch


 Our users complain that Hive fails with error messages like FAILED: Unknown 
 exception: null.
 We should explicitly mention that it's an internal error of Hive, and provide 
 more information (stacktrace) on the screen to ease bug reporting and 
 debugging.
 In other cases, we will still put the detailed information (stacktrace) only 
 in the log, since users should be able to figure out what's wrong from a 
 single line of message.




[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: HIVE-1212.1.patch

This also fixes UDFArgumentException reporting.


 Explicitly say Hive Internal Error to ease debugging
 --

 Key: HIVE-1212
 URL: https://issues.apache.org/jira/browse/HIVE-1212
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1212.1.patch


 Our users complain that Hive fails with error messages like FAILED: Unknown 
 exception: null.
 We should explicitly mention that it's an internal error of Hive, and provide 
 more information (stacktrace) on the screen to ease bug reporting and 
 debugging.
 In other cases, we will still put the detailed information (stacktrace) only 
 in the log, since users should be able to figure out what's wrong from a 
 single line of message.




[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: (was: HIVE-1212.1.patch)

 Explicitly say Hive Internal Error to ease debugging
 --

 Key: HIVE-1212
 URL: https://issues.apache.org/jira/browse/HIVE-1212
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao

 Our users complain that Hive fails with error messages like FAILED: Unknown 
 exception: null.
 We should explicitly mention that it's an internal error of Hive, and provide 
 more information (stacktrace) on the screen to ease bug reporting and 
 debugging.
 In other cases, we will still put the detailed information (stacktrace) only 
 in the log, since users should be able to figure out what's wrong from a 
 single line of message.




[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging

2010-03-04 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1212:
-

Attachment: HIVE-1212.1.patch

 Explicitly say Hive Internal Error to ease debugging
 --

 Key: HIVE-1212
 URL: https://issues.apache.org/jira/browse/HIVE-1212
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1212.1.patch


 Our users complain that Hive fails with error messages like FAILED: Unknown 
 exception: null.
 We should explicitly mention that it's an internal error of Hive, and provide 
 more information (stacktrace) on the screen to ease bug reporting and 
 debugging.
 In other cases, we will still put the detailed information (stacktrace) only 
 in the log, since users should be able to figure out what's wrong from a 
 single line of message.




[jira] Assigned: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when throwing IOException

2010-03-01 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao reassigned HIVE-1203:


Assignee: Vladimir Klimontovich

 HiveInputFormat.getInputFormatFromCache swallows cause exception when 
 throwing IOException
 

 Key: HIVE-1203
 URL: https://issues.apache.org/jira/browse/HIVE-1203
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0, 0.4.1, 0.5.0
Reporter: Vladimir Klimontovich
Assignee: Vladimir Klimontovich
 Fix For: 0.4.2, 0.5.1, 0.6.0

 Attachments: 0.4.patch, 0.5.patch, trunk.patch


 To fix this, we simply need to pass the cause as a second parameter to the 
 IOException constructor. Patches for 0.4, 0.5 and trunk are available.
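The shape of the fix can be illustrated as follows — the exception messages and class names are made up, but the before/after contrast is the point: only the two-argument IOException constructor preserves the cause chain so a "Caused by: ..." trace survives in the logs.

```java
import java.io.IOException;

public class CausePreserving {
    // Before: the original exception's stack trace is lost; callers only
    // see a flattened message string.
    static IOException swallow(Exception cause) {
        return new IOException("cannot create InputFormat: " + cause.getMessage());
    }

    // After: the cause rides along and is printed as "Caused by: ..." when
    // the IOException is logged.
    static IOException wrap(Exception cause) {
        return new IOException("cannot create InputFormat", cause);
    }
}
```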




[jira] Commented: (HIVE-1202) Unknown exception : null while join

2010-03-01 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839940#action_12839940
 ] 

Zheng Shao commented on HIVE-1202:
--

{code}
select * from 
(
select name from classes 
) a
join classes b
where a.date_partition = '2010-02-01' AND b.date_partition = '2010-03-01';
{code}

It seems with the patch, we won't do partition pruning for this case?
Is that a problem?


 Unknown exception : null while join
 -

 Key: HIVE-1202
 URL: https://issues.apache.org/jira/browse/HIVE-1202
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.4.1
 Environment: hive-0.4.1
 hadoop 0.19.1
Reporter: Mafish
 Fix For: 0.4.1

 Attachments: HIVE-1202.branch-0.4.1.patch


 Hive throws Unknown exception: null with the query:
 select * from 
 (
   select name from classes 
 ) a
   join classes b
 where a.name  b.number
 After tracing the code, I found this bug will occur with following
 conditions:
 1. It is join operation.
 2. At least one of the source of join is physical table (right side in
 above case).
 3. With where condition and condition(s) of where clause must include
 columns from both side of join (a.name and b.number in case)




[jira] Created: (HIVE-1207) ScriptOperator AutoProgressor does not set the interval

2010-03-01 Thread Zheng Shao (JIRA)
ScriptOperator AutoProgressor does not set the interval
---

 Key: HIVE-1207
 URL: https://issues.apache.org/jira/browse/HIVE-1207
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao


As title. I will show more details in the patch.




[jira] Updated: (HIVE-1207) ScriptOperator AutoProgressor does not set the interval

2010-03-01 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1207:
-

Attachment: HIVE-1207.1.patch

 ScriptOperator AutoProgressor does not set the interval
 ---

 Key: HIVE-1207
 URL: https://issues.apache.org/jira/browse/HIVE-1207
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1207.1.patch


 As title. I will show more details in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1207) ScriptOperator AutoProgressor does not set the interval

2010-03-01 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1207:
-

Affects Version/s: 0.6.0
   0.5.0
   Status: Patch Available  (was: Open)

 ScriptOperator AutoProgressor does not set the interval
 ---

 Key: HIVE-1207
 URL: https://issues.apache.org/jira/browse/HIVE-1207
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.0, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1207.1.patch


 As title. I will show more details in the patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1208) Bug with error "cannot find ObjectInspector for VOID"

2010-03-01 Thread Zheng Shao (JIRA)
Bug with error "cannot find ObjectInspector for VOID"
-

 Key: HIVE-1208
 URL: https://issues.apache.org/jira/browse/HIVE-1208
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao


This happens when using a constant null, but not when using CAST(null AS 
STRING).

{code}
explain extended FROM 
(select 1 as a, null as b
 from zshao_tt
 distribute by a) tmp
SELECT transform(a, b)
USING 'cat';
{code}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-259:


Attachment: HIVE-259.5.patch

We take the method recommended by NIST.

See http://en.wikipedia.org/wiki/Percentile#Alternative_methods
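
For reference, the linear interpolation between closest ranks (one of the methods described on the page linked above) can be sketched as below. This is a hypothetical standalone sketch, not the actual UDAFPercentile code; the class and method names are invented for illustration:

```java
public class PercentileSketch {
    // Hypothetical sketch: linear interpolation between the two closest
    // ranks in a sorted array, for a percentile p in [0, 1].
    static double percentile(long[] sorted, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        // 0-based fractional position of the percentile in the array.
        double pos = p * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int higher = (int) Math.ceil(pos);
        if (lower == higher) {
            return sorted[lower];
        }
        // Interpolate between the two neighboring values.
        return sorted[lower] + (pos - lower) * (sorted[higher] - sorted[lower]);
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3, 4};
        System.out.println(percentile(data, 0.5));  // prints 2.5
    }
}
```

Note that with this method the result need not be a member of the input data, which is relevant to question (1) below.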

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, 
 HIVE-259.4.patch, HIVE-259.5.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839393#action_12839393
 ] 

Zheng Shao commented on HIVE-259:
-

 (1) I am not familiar with the exact definition of the percentile function. 
 Must percentile()'s result be a member of the input data?
See the link above.

 (2) HashMap and ArrayList are used to copy and sort. Can we use a TreeMap 
 here? This is a small issue and can be ignored.
I think HashMap is better here. The reason is that the number of iterate() 
calls is usually much higher than the number of unique values (the size of the 
HashMap). By using a HashMap we reduce the cost of each iterate() call.
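
The trade-off can be sketched as follows (a hypothetical illustration, not the patch's actual code; the class and method names are invented): iterate() does O(1) work per input row via a HashMap, and sorting happens only once at the end, over the unique values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequencyHistogram {
    // Count occurrences of each value; iterate() is called once per row.
    private final Map<Long, Long> counts = new HashMap<>();

    void iterate(long value) {
        counts.merge(value, 1L, Long::sum);  // constant-time per input row
    }

    // Sort the (typically few) unique values only once, at the end.
    List<Long> sortedUniqueValues() {
        List<Long> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        return keys;
    }

    long countOf(long value) {
        return counts.getOrDefault(value, 0L);
    }
}
```

A TreeMap would pay O(log n) on every iterate() call just to keep the keys sorted the whole time, which is wasted work when rows greatly outnumber unique values.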

 "In the beginning of new test case, ..." appears two times
Fixed in HIVE-259.5.patch


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, 
 HIVE-259.4.patch, HIVE-259.5.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1201) Add a python command-line interface for Hive

2010-02-27 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839200#action_12839200
 ] 

Zheng Shao commented on HIVE-1201:
--

Yes, this is a client module (using the Metastore Thrift API) that we can use 
in the Python interpreter. 

 Add a python command-line interface for Hive
 

 Key: HIVE-1201
 URL: https://issues.apache.org/jira/browse/HIVE-1201
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Venky Iyer

 Venky has a nice Python command-line interface for Hive. It uses the Thrift 
 API to talk to the metastore. It uses the hadoop command line to submit jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file

2010-02-26 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838763#action_12838763
 ] 

Zheng Shao commented on HIVE-1197:
--

Can you explain what "a mapper spans a file" means?


 create a new input format where a mapper spans a file
 -

 Key: HIVE-1197
 URL: https://issues.apache.org/jira/browse/HIVE-1197
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.6.0


 This will be needed for Sort merge joins.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function

2010-02-26 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-259:


Attachment: HIVE-259.4.patch

This one fixes all checkstyle errors, and uses *Writable classes to avoid 
creating new objects as much as possible.


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, 
 HIVE-259.4.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1137) build references IVY_HOME incorrectly

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1137:
-

  Resolution: Fixed
Release Note: HIVE-1137. Fix build.xml for references to IVY_HOME. (Carl 
Steinbach via zshao)
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Carl!

 build references IVY_HOME incorrectly
 -

 Key: HIVE-1137
 URL: https://issues.apache.org/jira/browse/HIVE-1137
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Carl Steinbach
 Fix For: 0.6.0

 Attachments: HIVE-1137.patch


 The build references env.IVY_HOME, but doesn't actually import env as it 
 should (via <property environment="env"/>).
 It's not clear what the IVY_HOME reference is for, since the build doesn't 
 even use ivy.home (instead, it installs under the build/ivy directory).
 It looks like someone copied bits and pieces from the "Automatically" section 
 here:
 http://ant.apache.org/ivy/history/latest-milestone/install.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition

2010-02-25 Thread Zheng Shao (JIRA)
Fix CombineHiveInputFormat to work with multi-level of directories in a single 
table/partition
--

 Key: HIVE-1200
 URL: https://issues.apache.org/jira/browse/HIVE-1200
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.1, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao


The CombineHiveInputFormat does not work with multiple levels of directories in 
a single table/partition, because it uses exact-match logic instead of the 
relativize logic used in MapOperator:

{code}
MapOperator.java:
  if (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) 
{
{code}
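
The relativize() check quoted above can be illustrated as below (a hypothetical standalone sketch; the paths and the helper name `covers` are invented). relativize() returns a relative URI when fpath lies anywhere under onepath, and returns fpath unchanged otherwise, so the negated equality test matches nested files that exact string comparison would miss:

```java
import java.net.URI;

public class RelativizeDemo {
    // True iff fpath is onepath itself or lies under it, at any depth.
    static boolean covers(URI onepath, URI fpath) {
        return !onepath.relativize(fpath).equals(fpath);
    }

    public static void main(String[] args) {
        URI table = URI.create("hdfs://nn/warehouse/t/");
        URI nested = URI.create("hdfs://nn/warehouse/t/sub/part-0000");
        URI other = URI.create("hdfs://nn/warehouse/u/part-0000");
        System.out.println(covers(table, nested)); // true: nested under t/
        System.out.println(covers(table, other));  // false: different table
    }
}
```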


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1032) Better Error Messages for Execution Errors

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1032:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: HIVE-1032. Better Error Messages for Execution Errors. (Paul 
Yang via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Paul!

 Better Error Messages for Execution Errors
 --

 Key: HIVE-1032
 URL: https://issues.apache.org/jira/browse/HIVE-1032
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Fix For: 0.6.0

 Attachments: HIVE-1032.1.patch, HIVE-1032.2.patch, HIVE-1032.3.patch, 
 HIVE-1032.4.patch, HIVE-1032.5.patch, HIVE-1032.6.patch


 Three common errors that occur during execution are:
 1. Map-side group-by causing an out of memory exception due to large 
 aggregation hash tables
 2. ScriptOperator failing due to the user's script throwing an exception or 
 otherwise returning a non-zero error code
 3. Incorrectly specifying the join order of small and large tables, causing 
 the large table to be loaded into memory and producing an out of memory 
 exception.
 These errors are typically discovered by manually examining the error log 
 files of the failed task. This task proposes to create a feature that would 
 automatically read the error logs and output a probable cause and solution to 
 the command line.
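
The log-scanning feature described above could be sketched as a signature table mapped to advice (a hypothetical illustration only, not the HIVE-1032 patch; the class name, patterns, and advice strings are invented):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TaskLogHeuristics {
    // Known error signatures, checked in insertion order, mapped to advice.
    private static final Map<String, String> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("OutOfMemoryError",
            "Probable cause: map-side aggregation hash table or join table "
            + "too large for the task heap.");
        PATTERNS.put("Broken pipe",
            "Probable cause: user script failed; check the script's stderr.");
    }

    // Return advice for the first signature found in the failed task's log.
    static String diagnose(String logText) {
        for (Map.Entry<String, String> e : PATTERNS.entrySet()) {
            if (logText.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return "No known error signature matched.";
    }
}
```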

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1201) Add a python command-line interface for Hive

2010-02-25 Thread Zheng Shao (JIRA)
Add a python command-line interface for Hive


 Key: HIVE-1201
 URL: https://issues.apache.org/jira/browse/HIVE-1201
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Venky Iyer


Venky has a nice Python command-line interface for Hive. It uses the Thrift 
API to talk to the metastore. It uses the hadoop command line to submit jobs.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1200:
-

Attachment: HIVE-1200.1.branch-0.5.patch
HIVE-1200.1.patch

 Fix CombineHiveInputFormat to work with multi-level of directories in a 
 single table/partition
 --

 Key: HIVE-1200
 URL: https://issues.apache.org/jira/browse/HIVE-1200
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.1, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1200.1.branch-0.5.patch, HIVE-1200.1.patch


 The CombineHiveInputFormat does not work with multiple levels of directories in 
 a single table/partition, because it uses exact-match logic instead of the 
 relativize logic used in MapOperator:
 {code}
 MapOperator.java:
   if 
 (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) {
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition

2010-02-25 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1200:
-

Status: Patch Available  (was: Open)

 Fix CombineHiveInputFormat to work with multi-level of directories in a 
 single table/partition
 --

 Key: HIVE-1200
 URL: https://issues.apache.org/jira/browse/HIVE-1200
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.1, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1200.1.branch-0.5.patch, HIVE-1200.1.patch


 The CombineHiveInputFormat does not work with multiple levels of directories in 
 a single table/partition, because it uses exact-match logic instead of the 
 relativize logic used in MapOperator:
 {code}
 MapOperator.java:
   if 
 (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) {
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838718#action_12838718
 ] 

Zheng Shao commented on HIVE-259:
-

Hi Jerome, using ArrayList<Integer> won't cause unnecessary object creation. We 
will just create a single ArrayList<Integer> and use it forever.
Does that make sense?


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1193) ensure sorting properties for a table

2010-02-25 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838737#action_12838737
 ] 

Zheng Shao commented on HIVE-1193:
--

Can we have some more description on the JIRA?
The patch contains 2 properties: enforceBucketing and enforceSorting. But I 
don't see them mentioned in the JIRA.

1. How do we make sure that the data is bucketed / sorted? By adding an 
additional map-reduce job?
2. What if the user already specified CLUSTER BY key in his query?
3. Do we disable merging of small files when we do this?


 ensure sorting properties for a table
 -

 Key: HIVE-1193
 URL: https://issues.apache.org/jira/browse/HIVE-1193
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.6.0

 Attachments: hive.1193.1.patch


 If a table is sorted, and data is being inserted into it - currently, we 
 don't make sure that the data is sorted. That might be useful for some 
 downstream operations.
 This cannot be made the default due to backward compatibility, but an option 
 can be added for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1195) Increase ObjectInspector[] length on demand

2010-02-24 Thread Zheng Shao (JIRA)
Increase ObjectInspector[] length on demand
---

 Key: HIVE-1195
 URL: https://issues.apache.org/jira/browse/HIVE-1195
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao


{code}
Operator.java
  protected transient ObjectInspector[] inputObjInspectors = new 
ObjectInspector[Short.MAX_VALUE];
{code}

An array of 32K elements takes 256KB of memory under 64-bit Java.
We are seeing the Hive client run out of memory because of that.
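
The "on demand" growth suggested in the title could look like the following sketch (hypothetical illustration, not the actual Operator.java patch; the class name and use of Object[] in place of ObjectInspector[] are simplifications): start with a tiny array and double it only when a higher tag is actually seen, so operators with one or two parents never pay for 32K slots.

```java
import java.util.Arrays;

public class GrowOnDemand {
    // Start with a single slot instead of Short.MAX_VALUE (32K) slots.
    private Object[] inputObjInspectors = new Object[1];

    void setInspector(int tag, Object inspector) {
        if (tag >= inputObjInspectors.length) {
            int newLen = inputObjInspectors.length;
            while (newLen <= tag) {
                newLen *= 2;  // doubling gives amortized O(1) growth
            }
            inputObjInspectors = Arrays.copyOf(inputObjInspectors, newLen);
        }
        inputObjInspectors[tag] = inspector;
    }

    int capacity() {
        return inputObjInspectors.length;
    }
}
```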


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1195) Increase ObjectInspector[] length on demand

2010-02-24 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1195:
-

Attachment: HIVE-1195.1.patch

 Increase ObjectInspector[] length on demand
 ---

 Key: HIVE-1195
 URL: https://issues.apache.org/jira/browse/HIVE-1195
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1195.1.patch


 {code}
 Operator.java
   protected transient ObjectInspector[] inputObjInspectors = new 
 ObjectInspector[Short.MAX_VALUE];
 {code}
 An array of 32K elements takes 256KB of memory under 64-bit Java.
 We are seeing the Hive client run out of memory because of that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1195) Increase ObjectInspector[] length on demand

2010-02-24 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1195:
-

Fix Version/s: 0.6.0
   0.5.1
   Status: Patch Available  (was: Open)

 Increase ObjectInspector[] length on demand
 ---

 Key: HIVE-1195
 URL: https://issues.apache.org/jira/browse/HIVE-1195
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.5.1, 0.6.0

 Attachments: HIVE-1195.1.patch


 {code}
 Operator.java
   protected transient ObjectInspector[] inputObjInspectors = new 
 ObjectInspector[Short.MAX_VALUE];
 {code}
 An array of 32K elements takes 256KB of memory under 64-bit Java.
 We are seeing the Hive client run out of memory because of that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1195) Increase ObjectInspector[] length on demand

2010-02-24 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1195:
-

Attachment: HIVE-1195.2.patch
HIVE-1195.2.branch-0.5.patch

Fixed an obvious bug which caused unit test failures.

 Increase ObjectInspector[] length on demand
 ---

 Key: HIVE-1195
 URL: https://issues.apache.org/jira/browse/HIVE-1195
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0, 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.5.1, 0.6.0

 Attachments: HIVE-1195-branch-0.5.patch, HIVE-1195.1.patch, 
 HIVE-1195.2.branch-0.5.patch, HIVE-1195.2.patch


 {code}
 Operator.java
   protected transient ObjectInspector[] inputObjInspectors = new 
 ObjectInspector[Short.MAX_VALUE];
 {code}
 An array of 32K elements takes 256KB of memory under 64-bit Java.
 We are seeing the Hive client run out of memory because of that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838118#action_12838118
 ] 

Zheng Shao commented on HIVE-259:
-

Also see http://wiki.apache.org/hadoop/Hive/HowToContribute#Coding_Convention

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838119#action_12838119
 ] 

Zheng Shao commented on HIVE-259:
-

The test cases look a bit too trivial, or the results have problems? They 
always return the same number for the 3 different percentile values.


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1194) sorted merge join

2010-02-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838120#action_12838120
 ] 

Zheng Shao commented on HIVE-1194:
--

Why does SortMergeJoinOperator extend MapJoinOperator?
It seems to me that SortMergeJoinOperator does NOT need the 
in-memory/disk-backed HashMap that MapJoinOperator has, correct?


 sorted merge join
 -

 Key: HIVE-1194
 URL: https://issues.apache.org/jira/browse/HIVE-1194
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0


 If the input tables are sorted on the join key, and a mapjoin is being 
 performed, it is useful to exploit the sorted properties of the table.
 This can lead to substantial CPU savings - this needs to work across bucketed 
 map joins also.
 Since sorted properties of a table are not currently enforced, a new 
 parameter can be added to specify the use of the sort-merge join.
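
The core idea above is the classic merge join: when both inputs are already sorted on the join key, a single forward pass over each side produces the join with no hash table at all. The following is an illustrative standalone sketch (invented class and method names, keys only, no payload columns), not the HIVE-1194 operator:

```java
import java.util.ArrayList;
import java.util.List;

public class SortMergeJoinSketch {
    // Inner-join two sorted key arrays; emits one [key, key] row per match.
    static List<long[]> join(long[] left, long[] right) {
        List<long[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++;                      // advance the smaller side
            } else if (left[i] > right[j]) {
                j++;
            } else {
                // Emit the cross product of duplicates for this key.
                long key = left[i];
                int jStart = j;
                while (i < left.length && left[i] == key) {
                    for (j = jStart; j < right.length && right[j] == key; j++) {
                        out.add(new long[]{key, key});
                    }
                    i++;
                }
            }
        }
        return out;
    }
}
```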

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1194) sorted merge join

2010-02-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838132#action_12838132
 ] 

Zheng Shao commented on HIVE-1194:
--

If it does not inherit any methods, shall we add an AbstractMapJoinOperator as 
the common parent?
That AbstractMapJoinOperator can be converted to MapJoinOperator (or 
HashBasedMapJoinOperator, to be accurate) or SortMergeJoinOperator depending on 
the configuration/table properties.


 sorted merge join
 -

 Key: HIVE-1194
 URL: https://issues.apache.org/jira/browse/HIVE-1194
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0


 If the input tables are sorted on the join key, and a mapjoin is being 
 performed, it is useful to exploit the sorted properties of the table.
 This can lead to substantial CPU savings - this needs to work across bucketed 
 map joins also.
 Since sorted properties of a table are not currently enforced, a new 
 parameter can be added to specify the use of the sort-merge join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1189) Add package-info.java to Hive

2010-02-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838148#action_12838148
 ] 

Zheng Shao commented on HIVE-1189:
--

I am checking the BuildVersion which contains everything.
I need to think of a way to do a negative test.


 Add package-info.java to Hive
 -

 Key: HIVE-1189
 URL: https://issues.apache.org/jira/browse/HIVE-1189
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1189.1.patch


 Hadoop automatically generates build/src/org/apache/hadoop/package-info.java 
 with information like this:
 {code}
 /*
  * Generated by src/saveVersion.sh
  */
 @HadoopVersionAnnotation(version="0.20.2-dev", revision="826568",
  user="zshao", date="Sun Oct 18 17:46:56 PDT 2009",
  url="http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20")
 package org.apache.hadoop;
 {code}
 Hive should do the same thing so that we can easily know the version of the 
 code at runtime.
 This will help us identify whether we are still running the same version of 
 Hive, if we serialize the plan and later continue the execution (See 
 HIVE-1100).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




[jira] Commented: (HIVE-1032) Better Error Messages for Execution Errors

2010-02-24 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838156#action_12838156
 ] 

Zheng Shao commented on HIVE-1032:
--

That makes sense to me. As long as it's compilable with 0.17 it should be OK.

Sorry, there is one last thing :) Can you run "ant checkstyle" and fix the 
checkstyle warnings introduced by this patch (especially in the new files)?

 Better Error Messages for Execution Errors
 --

 Key: HIVE-1032
 URL: https://issues.apache.org/jira/browse/HIVE-1032
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1032.1.patch, HIVE-1032.2.patch, HIVE-1032.3.patch, 
 HIVE-1032.4.patch, HIVE-1032.5.patch


 Three common errors that occur during execution are:
 1. Map-side group-by causing an out of memory exception due to large 
 aggregation hash tables
 2. ScriptOperator failing due to the user's script throwing an exception or 
 otherwise returning a non-zero error code
 3. Incorrectly specifying the join order of small and large tables, causing 
 the large table to be loaded into memory and producing an out of memory 
 exception.
 These errors are typically discovered by manually examining the error log 
 files of the failed task. This task proposes to create a feature that would 
 automatically read the error logs and output a probable cause and solution to 
 the command line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1184) Expression Not In Group By Key error is sometimes masked

2010-02-23 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1184:
-

Status: Open  (was: Patch Available)

 Expression Not In Group By Key error is sometimes masked
 

 Key: HIVE-1184
 URL: https://issues.apache.org/jira/browse/HIVE-1184
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1184.1.patch


 Depending on the order of expressions, the error message for an expression not 
 in the group-by key is not displayed; instead it is null.
 {code}
 hive> select concat(value, concat(value)) from src group by concat(value);
 FAILED: Error in semantic analysis: null
 hive> select concat(concat(value), value) from src group by concat(value);
 FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key 
 value
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1184) Expression Not In Group By Key error is sometimes masked

2010-02-23 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837476#action_12837476
 ] 

Zheng Shao commented on HIVE-1184:
--

The explanation looks good to me, but I am not convinced the solution will 
solve the problem.

When processing concat(value, concat(value)), we will set the error when 
processing the first value, then overwrite the error when processing the 
second value, correct?
I think the error should be part of the return value of the process 
function, instead of a global field in the context.

Does that make sense?
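
The suggestion above, returning the error alongside each result rather than stashing it in a shared context field, can be sketched as follows (a hypothetical illustration; the Result/processChildren names are invented and this is not the Hive semantic analyzer):

```java
public class ErrorPropagation {
    // Pair a value with an optional error, so each node's outcome is
    // self-contained instead of writing to a shared mutable field.
    static class Result {
        final Object value;   // null when an error occurred
        final String error;   // null on success
        Result(Object value, String error) {
            this.value = value;
            this.error = error;
        }
    }

    // Process children left to right; the first error is returned verbatim
    // and cannot be clobbered by a later child's success or failure.
    static Result processChildren(Result... children) {
        for (Result child : children) {
            if (child.error != null) {
                return child;  // first error wins
            }
        }
        return new Result("ok", null);
    }
}
```

With a single shared error field, a later child that succeeds (and resets the field) or fails differently would mask the first message, which matches the null-error symptom reported in this issue.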


 Expression Not In Group By Key error is sometimes masked
 

 Key: HIVE-1184
 URL: https://issues.apache.org/jira/browse/HIVE-1184
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1184.1.patch


 Depending on the order of expressions, the error message for an expression not 
 in the group-by key is not displayed; instead it is null.
 {code}
 hive> select concat(value, concat(value)) from src group by concat(value);
 FAILED: Error in semantic analysis: null
 hive> select concat(concat(value), value) from src group by concat(value);
 FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key 
 value
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



  1   2   3   4   5   6   7   8   9   10   >