[jira] Commented: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
[ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916573#action_12916573 ] Zheng Shao commented on HIVE-1376:
----------------------------------

I think (3) makes the most sense. If (3) does not work for whatever hard-to-fix reason, we can do (1). In any case, the change should be pretty simple.

Simple UDAFs with more than 1 parameter crash on empty row query
----------------------------------------------------------------

Key: HIVE-1376
URL: https://issues.apache.org/jira/browse/HIVE-1376
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri

Simple UDAFs with more than 1 parameter crash when the query returns no rows. Currently, this only seems to affect the percentile() UDAF, where the second parameter is the percentile to be computed (of type double). I've also verified the bug by adding a dummy parameter to ExampleMin in contrib. On an empty query, Hive seems to be trying to resolve an iterate() method with signature {null,null} instead of {null,double}.

You can reproduce this bug using:

{code}
CREATE TABLE pct_test ( val INT );
SELECT percentile(val, 0.5) FROM pct_test;
{code}

which produces a lot of errors like:

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double) on object org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@11d13272 of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments {null, null} of size 2
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
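A minimal sketch of the failure mode and one defensive workaround (not Hive's actual evaluator): with no input rows the framework hands iterate() `{null, null}`, and a primitive `double` parameter cannot accept null. Using boxed types lets the evaluator skip the empty-row case instead of crashing. The class and field names below are illustrative only.

```java
// Hypothetical evaluator illustrating the bug report above: boxed parameters
// let a {null, null} invocation (empty-row query) be matched and skipped,
// where a primitive double second parameter would fail method resolution.
public class PercentileLikeEvaluator {
    public long rows = 0; // rows actually aggregated

    public boolean iterate(Long value, Double percentile) {
        if (value == null || percentile == null) {
            return true; // empty-row case: nothing to aggregate, don't crash
        }
        rows++;
        // ... real percentile bookkeeping would go here ...
        return true;
    }
}
```

The same idea underlies returning null from terminate() when nothing was aggregated.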
[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
[ https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913670#action_12913670 ] Zheng Shao commented on HIVE-537:
---------------------------------

{code}
union<T0,T1,T2> create_union(byte tag, T0 o0, T1 o1, T2 o2, ...)

Some real examples:
union<School,Company> create_union( is_student ? 0 : 1, school, company)
{code}

Depending on the value of the tag, the returned union object will choose to store only the object corresponding to that tag.

Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
-------------------------------------------------------------------------------

Key: HIVE-537
URL: https://issues.apache.org/jira/browse/HIVE-537
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt

There are already some cases inside the code where we use heterogeneous data: JoinOperator, and UnionOperator (in the sense that different parents can pass in records with different ObjectInspectors). We currently use Operator's parentID to distinguish that. However, that approach does not extend to more complex plans that might be needed in the future.

We will support the union type like this:

{code}
TypeDefinition:
  type: primitivetype | structtype | arraytype | maptype | uniontype
  uniontype: union<tag : type (, tag : type)*>

Example:
  union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>

Example of serialized data format:
We will first store the tag byte before we serialize the object. On deserialization, we will first read out the tag byte, then we know what is the current type of the following object, so we can deserialize it successfully.

Interface for ObjectInspector:
interface UnionObjectInspector {
  /** Returns the array of OIs that are for each of the tags */
  ObjectInspector[] getObjectInspectors();
  /** Return the tag of the object. */
  byte getTag(Object o);
  /** Return the field based on the tag value associated with the Object. */
  Object getField(Object o);
};

An example serialization format (using delimited format, with ' ' as first-level delimiter and '=' as second-level delimiter):
userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
123 1=login
123 0=243=helloworld
123 1=logout
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
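The tag-byte layout described above can be sketched in a few lines. This is not Hive's serializer, just an illustration of the scheme for a hypothetical `union<0:int,1:string>`: write the tag first, then only the payload for that tag; on read, the tag tells you which type follows.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration of the proposal's wire format for union<0:int,1:string>:
// one tag byte, followed by the serialized object for that tag only.
public class TaggedUnion {
    public static byte[] serialize(byte tag, Object value) {
        if (tag == 0) { // tag 0 carries an int payload
            return ByteBuffer.allocate(5).put(tag).putInt((Integer) value).array();
        }
        // tag 1 carries a string payload
        byte[] s = ((String) value).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + s.length).put(tag).put(s).array();
    }

    public static Object deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte tag = buf.get(); // read the tag first, as the design requires
        if (tag == 0) return buf.getInt();
        byte[] s = new byte[buf.remaining()];
        buf.get(s);
        return new String(s, StandardCharsets.UTF_8);
    }
}
```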
[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
[ https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912614#action_12912614 ] Zheng Shao commented on HIVE-537:
---------------------------------

I think so. Let's use a different name for the UDF. Using 'UNION' as the UDF name will not cause grammar ambiguity, but it may cause other issues in the future.

Zheng

Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
-------------------------------------------------------------------------------

Key: HIVE-537
URL: https://issues.apache.org/jira/browse/HIVE-537
Project: Hadoop Hive
Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt

There are already some cases inside the code where we use heterogeneous data: JoinOperator, and UnionOperator (in the sense that different parents can pass in records with different ObjectInspectors). We currently use Operator's parentID to distinguish that. However, that approach does not extend to more complex plans that might be needed in the future.

We will support the union type like this:

{code}
TypeDefinition:
  type: primitivetype | structtype | arraytype | maptype | uniontype
  uniontype: union<tag : type (, tag : type)*>

Example:
  union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>

Example of serialized data format:
We will first store the tag byte before we serialize the object. On deserialization, we will first read out the tag byte, then we know what is the current type of the following object, so we can deserialize it successfully.

Interface for ObjectInspector:
interface UnionObjectInspector {
  /** Returns the array of OIs that are for each of the tags */
  ObjectInspector[] getObjectInspectors();
  /** Return the tag of the object. */
  byte getTag(Object o);
  /** Return the field based on the tag value associated with the Object. */
  Object getField(Object o);
};

An example serialization format (using delimited format, with ' ' as first-level delimiter and '=' as second-level delimiter):
userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
123 1=login
123 0=243=helloworld
123 1=logout
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-895) Add SerDe for Avro serialized data
[ https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889772#action_12889772 ] Zheng Shao commented on HIVE-895:
---------------------------------

We should just copy the schema information from the file header to the hive metastore.

Add SerDe for Avro serialized data
----------------------------------

Key: HIVE-895
URL: https://issues.apache.org/jira/browse/HIVE-895
Project: Hadoop Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Reporter: Jeff Hammerbacher

As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems like a solid win.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1468) intermediate data produced for select queries ignores hive.exec.compress.intermediate
[ https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889528#action_12889528 ] Zheng Shao commented on HIVE-1468:
----------------------------------

"select queries" means SELECT without INSERT, correct? I agree that we should treat these queries differently. Specifically, no compression (or maybe use lzo to save bandwidth - clients can be in other data centers) will be a big win.

intermediate data produced for select queries ignores hive.exec.compress.intermediate
-------------------------------------------------------------------------------------

Key: HIVE-1468
URL: https://issues.apache.org/jira/browse/HIVE-1468
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Joydeep Sen Sarma

{code}
set hive.exec.compress.intermediate=false;
explain extended select xxx from yyy;
...
File Output Operator
  compressed: true
  GlobalTableId: 0
{code}

It looks like only the intermediate locations identified during splitting of MR tasks follow this directive. This should be fixed, because it forces clients to always decompress output data (even if the config setting is altered).

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1460) JOIN should not output rows for NULL values
JOIN should not output rows for NULL values
-------------------------------------------

Key: HIVE-1460
URL: https://issues.apache.org/jira/browse/HIVE-1460
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao

We should filter out rows with NULL keys from the result of this query:

{code}
SELECT * FROM a JOIN b on a.key = b.key
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
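The reason for the filter is SQL's NULL semantics: `NULL = NULL` evaluates to NULL, not true, so an equi-join must never emit a row for a NULL key. A toy hash-join sketch (not Hive's JoinOperator) of that rule:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy single-key equi-join: rows whose join key is NULL are dropped,
// mirroring SQL semantics where NULL = NULL is not true.
public class NullSafeEquiJoin {
    public static List<String> join(Map<Object, String> left, Map<Object, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Object, String> e : left.entrySet()) {
            if (e.getKey() == null) continue; // NULL keys never match anything
            String match = right.get(e.getKey());
            if (match != null) out.add(e.getValue() + "," + match);
        }
        return out;
    }
}
```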
[jira] Commented: (HIVE-1460) JOIN should not output rows for NULL values
[ https://issues.apache.org/jira/browse/HIVE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887543#action_12887543 ] Zheng Shao commented on HIVE-1460:
----------------------------------

That's a good use case to consider. I believe Hive currently does not support that (the condition after ON has to be conjunctive), but it's good to keep it in mind.

JOIN should not output rows for NULL values
-------------------------------------------

Key: HIVE-1460
URL: https://issues.apache.org/jira/browse/HIVE-1460
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao

We should filter out rows with NULL keys from the result of this query:

{code}
SELECT * FROM a JOIN b on a.key = b.key
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-287) count distinct on multiple columns does not work
[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886794#action_12886794 ] Zheng Shao commented on HIVE-287:
---------------------------------

The plan looks good to me. Just one comment: I think we should change the comment/class name for GenericUDAFResolver2. Let's explicitly say GenericUDAFResolver2 is for UDAFs that want to have control over whether DISTINCT or * should be treated differently. Normal UDAFs should still inherit from GenericUDAFResolver. Does that sound OK?

count distinct on multiple columns does not work
------------------------------------------------

Key: HIVE-287
URL: https://issues.apache.org/jira/browse/HIVE-287
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch

The following query does not work:

{code}
select count(distinct col1, col2) from Tbl
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-287) count distinct on multiple columns does not work
[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886882#action_12886882 ] Zheng Shao commented on HIVE-287:
---------------------------------

Talked with John offline also. I agree that we can use the new interface going forward. Can you do these also in this patch:

1. Change the comments for the 2 new fields. It's easy for UDAF writers to assume that the UDAF itself needs to handle whether it's distinct or whether it's all columns.
2. Deprecate the old interface, and move all existing GenericUDAFs to inherit from the new one.

{code}
+  /**
+   * @return true if the UDAF invocation was qualified with <tt>DISTINCT</tt>
+   * keyword, false otherwise.
+   */
+  boolean isDistinct();
+
+  /**
+   * @return true if the UDAF invocation was done with a wildcard instead of
+   * explicit parameter list.
+   */
+  boolean isAllColumns();
{code}

After this patch is in, here is a list of follow-ups. Can you open JIRAs for these:

1. Let UDAF and UDF support * and regex-based column specification.
2. Special-case COUNT(*), because that does not require reading any columns, while MY_UDAF(*) needs all columns.

count distinct on multiple columns does not work
------------------------------------------------

Key: HIVE-287
URL: https://issues.apache.org/jira/browse/HIVE-287
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch

The following query does not work:

{code}
select count(distinct col1, col2) from Tbl
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
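A sketch of how a resolver might consume the two flags quoted in the patch hunk. `ParameterInfoSketch` is a stand-in for Hive's actual parameter-info class, and the resolver below is purely illustrative; the point is that COUNT(*) can skip column data entirely while COUNT(DISTINCT ...) defers de-duplication to the framework:

```java
// Stand-in for the parameter-info object carrying the two new flags.
interface ParameterInfoSketch {
    boolean isDistinct();
    boolean isAllColumns();
}

// Hypothetical COUNT-like resolver: picks an evaluation strategy from
// the DISTINCT / wildcard flags instead of inspecting parameter types.
public class CountResolverSketch {
    public String describe(ParameterInfoSketch info) {
        if (info.isAllColumns()) return "count-all-rows";   // COUNT(*): no column reads needed
        if (info.isDistinct())   return "count-distinct";   // framework de-duplicates first
        return "count-values";                              // plain COUNT(col1, col2, ...)
    }
}
```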
[jira] Commented: (HIVE-287) count distinct on multiple columns does not work
[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886183#action_12886183 ] Zheng Shao commented on HIVE-287:
---------------------------------

Hi Arvind, sorry for coming late to the party. I have 2 questions on the new UDAF2 interface:

1. Why do we put the DISTINCT in the information? DISTINCT is currently done by the framework, instead of individual UDAFs. This is good because the logic of removing duplicates is common for all UDAFs. We do support SUM(DISTINCT val).

2. Why do we special-case *? It seems to me that * is just a short-cut. Hive already supports regex-based multi-column specification, so that we can say `abc.*` for all columns with names starting with abc. The compiler should just expand * and give all the columns to the UDAF. Since COUNT(*) is a special case in the SQL standard (COUNT(*) is different from COUNT(col) even if the table has a single column col), I think we should just special-case that and replace it with count(1) at some place.

What do you think?

count distinct on multiple columns does not work
------------------------------------------------

Key: HIVE-287
URL: https://issues.apache.org/jira/browse/HIVE-287
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, HIVE-287-4.patch, HIVE-287-5-branch-0.6.patch, HIVE-287-5-trunk.patch

The following query does not work:

{code}
select count(distinct col1, col2) from Tbl
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1447) Speed up reflection method calls
[ https://issues.apache.org/jira/browse/HIVE-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1447:
-----------------------------

Attachment: A.java

A.java for performance test. Some of the code is borrowed from http://www.jguru.com/faq/view.jsp?EID=246569

Speed up reflection method calls
--------------------------------

Key: HIVE-1447
URL: https://issues.apache.org/jira/browse/HIVE-1447
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: A.java

See http://www.cowtowncoder.com/blog/archives/2010/04/entry_396.html and http://www.jguru.com/faq/view.jsp?EID=246569

There is a huge drop of overhead (more than half) if we do field.setAccessible(true) for the field that we want to access. I did a simple experiment and that worked well with methods as well. The results are (note that the method just adds 1 to an integer):

{code}
1 regular method calls:26 milliseconds.
1 reflective method calls without lookup:4029 milliseconds.
1 accessible reflective method calls without lookup:1810 milliseconds.
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
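The experiment can be reproduced with a small benchmark along these lines (a sketch in the spirit of the attached A.java, not the attachment itself; absolute timings will vary by JVM). `setAccessible(true)` on a cached `Method` suppresses the per-invocation access check, which is where the roughly 2x saving comes from:

```java
import java.lang.reflect.Method;

// Micro-benchmark sketch: direct calls vs. cached reflective calls with
// setAccessible(true). The invoked method just adds 1 to an integer.
public class ReflectionBench {
    public int counter = 0;
    public void inc() { counter++; }

    static long millis(Runnable body) {
        long start = System.nanoTime();
        body.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        ReflectionBench b = new ReflectionBench();
        Method m = ReflectionBench.class.getMethod("inc");
        m.setAccessible(true); // skip the access check on every invoke()

        long direct = millis(() -> { for (int i = 0; i < 1_000_000; i++) b.inc(); });
        long reflective = millis(() -> {
            try { for (int i = 0; i < 1_000_000; i++) m.invoke(b); }
            catch (Exception e) { throw new RuntimeException(e); }
        });
        System.out.println("direct=" + direct + "ms accessible-reflective=" + reflective + "ms");
    }
}
```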
[jira] Updated: (HIVE-1447) Speed up reflection method calls
[ https://issues.apache.org/jira/browse/HIVE-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1447:
-----------------------------

Attachment: HIVE-1447.1.patch

Patch that sets setAccessible for both GenericUDFBridge.java and GenericUDAFBridge.java

Speed up reflection method calls
--------------------------------

Key: HIVE-1447
URL: https://issues.apache.org/jira/browse/HIVE-1447
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: A.java, HIVE-1447.1.patch

See http://www.cowtowncoder.com/blog/archives/2010/04/entry_396.html and http://www.jguru.com/faq/view.jsp?EID=246569

There is a huge drop of overhead (more than half) if we do field.setAccessible(true) for the field that we want to access. I did a simple experiment and that worked well with methods as well. The results are (note that the method just adds 1 to an integer):

{code}
1 regular method calls:26 milliseconds.
1 reflective method calls without lookup:4029 milliseconds.
1 accessible reflective method calls without lookup:1810 milliseconds.
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1447) Speed up reflection method calls
[ https://issues.apache.org/jira/browse/HIVE-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1447:
-----------------------------

Status: Patch Available (was: Open)

Speed up reflection method calls
--------------------------------

Key: HIVE-1447
URL: https://issues.apache.org/jira/browse/HIVE-1447
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: A.java, HIVE-1447.1.patch

See http://www.cowtowncoder.com/blog/archives/2010/04/entry_396.html and http://www.jguru.com/faq/view.jsp?EID=246569

There is a huge drop of overhead (more than half) if we do field.setAccessible(true) for the field that we want to access. I did a simple experiment and that worked well with methods as well. The results are (note that the method just adds 1 to an integer):

{code}
1 regular method calls:26 milliseconds.
1 reflective method calls without lookup:4029 milliseconds.
1 accessible reflective method calls without lookup:1810 milliseconds.
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883051#action_12883051 ] Zheng Shao commented on HIVE-1271:
----------------------------------

I might be too late for the party, but I have a question on removing the field name comparison for struct type info. We have 3 choices:

C1: Compare field names case-sensitively.
C2: Compare field names case-insensitively.
C3: Don't compare field names at all.

The old implementation was following C1, and the new one is following C3. Is there any reason that we don't do C2? C2 seems to provide some minimal sanity checks that users will need in practice.

Case sensitiveness of type information specified when using custom reducer causes type mismatch
-----------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where field name = userId, failed:

{code}
hive> CREATE TABLE SS (
        a INT,
        b INT,
        vals ARRAY<STRUCT<userId:INT, y:STRING>>
      );
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
      INSERT OVERWRITE TABLE SS
      REDUCE * USING 'myreduce.py' AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.
{code}

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
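Choice C2 from the comment above amounts to comparing struct field names with a case-insensitive equality check, so `userId` still matches `userid` while a genuinely different field name is rejected. A minimal sketch (not Hive's TypeInfo code):

```java
import java.util.List;

// Sketch of choice C2: case-insensitive struct field-name comparison.
// array<struct<userId:int>> would match array<struct<userid:int>>,
// but a struct with a different field name would not.
public class StructFieldCompare {
    public static boolean sameFields(List<String> a, List<String> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equalsIgnoreCase(b.get(i))) return false;
        }
        return true;
    }
}
```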
[jira] Created: (HIVE-1338) Fix bin/ext/jar.sh to work with hadoop 0.20 and above
Fix bin/ext/jar.sh to work with hadoop 0.20 and above
-----------------------------------------------------

Key: HIVE-1338
URL: https://issues.apache.org/jira/browse/HIVE-1338
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao

{{bin/ext/jar.sh}} is not working with hadoop 0.20 and above.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1338) Fix bin/ext/jar.sh to work with hadoop 0.20 and above
[ https://issues.apache.org/jira/browse/HIVE-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1338:
-----------------------------

Attachment: HIVE-1338.1.patch

This patch follows the same way as {{bin/ext/hiveserver.sh}}

Fix bin/ext/jar.sh to work with hadoop 0.20 and above
-----------------------------------------------------

Key: HIVE-1338
URL: https://issues.apache.org/jira/browse/HIVE-1338
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: HIVE-1338.1.patch

{{bin/ext/jar.sh}} is not working with hadoop 0.20 and above.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1338) Fix bin/ext/jar.sh to work with hadoop 0.20 and above
[ https://issues.apache.org/jira/browse/HIVE-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1338:
-----------------------------

Status: Patch Available (was: Open)

Fix bin/ext/jar.sh to work with hadoop 0.20 and above
-----------------------------------------------------

Key: HIVE-1338
URL: https://issues.apache.org/jira/browse/HIVE-1338
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao
Assignee: Zheng Shao
Attachments: HIVE-1338.1.patch

{{bin/ext/jar.sh}} is not working with hadoop 0.20 and above.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1311) bug in use of hadoop supports splittable
[ https://issues.apache.org/jira/browse/HIVE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1311:
-----------------------------

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Release Note: HIVE-1311. Bug in use of parameter hadoop supports splittable. (Namit Jain via zshao)
Resolution: Fixed

Committed. Thanks Namit! (Sorry I didn't see Ning's comment before committing)

bug in use of hadoop supports splittable
----------------------------------------

Key: HIVE-1311
URL: https://issues.apache.org/jira/browse/HIVE-1311
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Fix For: 0.6.0
Attachments: hive.1311.1.patch

CombineHiveInputFormat: getSplits()

{code}
if (this.mrwork != null && this.mrwork.getHadoopSupportsSplittable())
{code}

should check if hadoop supports splittable is false

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-1312) hive trunk does not compile with hadoop 0.17 anymore
hive trunk does not compile with hadoop 0.17 anymore
----------------------------------------------------

Key: HIVE-1312
URL: https://issues.apache.org/jira/browse/HIVE-1312
Project: Hadoop Hive
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: John Sichi

This is caused by HIVE-1295.

{code}
compile:
[echo] Compiling: hive
[javac] Compiling 527 source files to /hadoop_hive_trunk/.ptest_0/build/ql/classes
[javac] /hadoop_hive_trunk/.ptest_0/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java:69: cannot find symbol
[javac] symbol  : method getBytes()
[javac] location: class org.apache.hadoop.io.BytesWritable
[javac]     keyWritable.set(bw.getBytes(), 0, bw.getLength());
[javac]                       ^
[javac] /hadoop_hive_trunk/.ptest_0/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java:69: cannot find symbol
[javac] symbol  : method getLength()
[javac] location: class org.apache.hadoop.io.BytesWritable
[javac]     keyWritable.set(bw.getBytes(), 0, bw.getLength());
[javac]                                          ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1280) problem in combinehiveinputformat with nested directories
[ https://issues.apache.org/jira/browse/HIVE-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854162#action_12854162 ] Zheng Shao commented on HIVE-1280:
----------------------------------

splitable -> splittable

problem in combinehiveinputformat with nested directories
---------------------------------------------------------

Key: HIVE-1280
URL: https://issues.apache.org/jira/browse/HIVE-1280
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.1280.1.patch, hive.1280.2.patch, hive.1280.3.patch, hive.1280.4.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1292) Bug in generating partition pruner expression
Bug in generating partition pruner expression
---------------------------------------------

Key: HIVE-1292
URL: https://issues.apache.org/jira/browse/HIVE-1292
Project: Hadoop Hive
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Zheng Shao

The logic for generating the pruner expression in GenericFuncExprProcessor has a problem. None of the partitions passed the pruner in the following query:

{code}
SELECT * FROM mytable a
WHERE pcol0 = '2010-04-03'
  AND CASE WHEN ((col0 = 'a') OR (col0 = 'b')) THEN 'a' ELSE NULL END IS NOT NULL;
{code}

While the partition '2010-04-03' did pass the pruner in the following query:

{code}
SELECT * FROM mytable a
WHERE pcol0 = '2010-04-03'
  AND CASE WHEN (col0 = 'a') THEN 'a' ELSE NULL END IS NOT NULL;
{code}

The logic for generating the pruner condition is here:
org.apache.hadoop.hive.ql.optimizer.ppr.ExprProcFactory.GenericFuncExprProcessor.process(...)

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1280) problem in combinehiveinputformat with nested directories
[ https://issues.apache.org/jira/browse/HIVE-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1280:
-----------------------------

Status: Open (was: Patch Available)

problem in combinehiveinputformat with nested directories
---------------------------------------------------------

Key: HIVE-1280
URL: https://issues.apache.org/jira/browse/HIVE-1280
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.1280.1.patch, hive.1280.2.patch, hive.1280.3.patch, hive.1280.4.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1280) problem in combinehiveinputformat with nested directories
[ https://issues.apache.org/jira/browse/HIVE-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1280:
-----------------------------

Resolution: Fixed
Release Note: HIVE-1280. Add option to CombineHiveInputFormat for non-splittable inputs. (Namit Jain via zshao)
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Committed. Thanks Namit!

problem in combinehiveinputformat with nested directories
---------------------------------------------------------

Key: HIVE-1280
URL: https://issues.apache.org/jira/browse/HIVE-1280
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Attachments: hive.1280.1.patch, hive.1280.2.patch, hive.1280.3.patch, hive.1280.4.patch, hive.1280.5.patch, hive.1280.6.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853323#action_12853323 ] Zheng Shao commented on HIVE-1131:
----------------------------------

Still seeing test failures from HIVE-1131_7.patch

{code}
.ptest_0/test.17.2.1.log:[junit] Begin query: groupby8.q
.ptest_0/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: groupby8_map_skew.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: multi_insert.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: reduce_deduplicate.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
.ptest_1/test.17.2.1.log:[junit] Begin query: union18.q
.ptest_1/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
.ptest_2/test.17.2.1.log:[junit] Begin query: groupby7.q
.ptest_2/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
.ptest_2/test.17.2.1.log:[junit] Begin query: groupby8_noskew.q
.ptest_2/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
.ptest_2/test.17.2.1.log:[junit] Begin query: input12.q
.ptest_2/test.17.2.1.log:[junit] junit.framework.AssertionFailedError: Client execution results failed with error code = 1
--
{code}

Add column lineage information to the pre execution hooks
---------------------------------------------------------

Key: HIVE-1131
URL: https://issues.apache.org/jira/browse/HIVE-1131
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch

We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook, so that applications can use that for:
- auditing
- dependency checking
and many other applications.

The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients, and put in the necessary transformation logic in the optimizer to generate this information.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1291) Fix UDAFPercentile IndexOutOfBoundsException
Fix UDAFPercentile ndexOutOfBoundsException --- Key: HIVE-1291 URL: https://issues.apache.org/jira/browse/HIVE-1291 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao The counts array can be empty. We should directly return null in that case. {code} org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.hive.serde2.io.DoubleWritable org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate() on object org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@530d0eae of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments {} of size 0 at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:725) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.terminate(GenericUDAFBridge.java:181) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.evaluate(GenericUDAFEvaluator.java:157) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:885) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:539) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:300) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412) at org.apache.hadoop.mapred.Child.main(Child.java:159) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:701) ... 
9 more Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile(UDAFPercentile.java:97) at org.apache.hadoop.hive.ql.udf.UDAFPercentile.access$300(UDAFPercentile.java:44) at org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate(UDAFPercentile.java:196) ... 14 more {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
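The fix described above ("directly return null" when the counts array is empty) amounts to an empty-input guard in the evaluator's terminate path. A minimal sketch of that guard follows; the class and method here are simplified stand-ins, not Hive's actual UDAFPercentile$PercentileLongEvaluator code:

```java
import java.util.List;

public class PercentileGuard {
    // Returns null when no rows were aggregated, instead of letting an
    // ArrayList.get(...) on an empty counts list throw
    // IndexOutOfBoundsException as in the stack trace above.
    public static Double terminate(List<Long> counts, double p) {
        if (counts == null || counts.isEmpty()) {
            return null; // empty row query: nothing to compute
        }
        // ... the real percentile interpolation would go here; we just
        // return the first count to keep the sketch short ...
        return (double) counts.get(0);
    }
}
```

With this guard, `SELECT percentile(val, 0.5)` over an empty table would yield NULL rather than an exception.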
[jira] Updated: (HIVE-1291) Fix UDAFPercentile IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1291: - Attachment: HIVE-1291.1.patch This patch fixes the bug. Fix UDAFPercentile IndexOutOfBoundsException --- Key: HIVE-1291 URL: https://issues.apache.org/jira/browse/HIVE-1291 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1291.1.patch The counts array can be empty. We should directly return null in that case. {code} org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.hive.serde2.io.DoubleWritable org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate() on object org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@530d0eae of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments {} of size 0 at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:725) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.terminate(GenericUDAFBridge.java:181) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.evaluate(GenericUDAFEvaluator.java:157) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:885) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:539) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:300) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412) at org.apache.hadoop.mapred.Child.main(Child.java:159) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:701) ... 9 more Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile(UDAFPercentile.java:97) at org.apache.hadoop.hive.ql.udf.UDAFPercentile.access$300(UDAFPercentile.java:44) at org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate(UDAFPercentile.java:196) ... 14 more {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1291) Fix UDAFPercentile IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1291: - Fix Version/s: 0.6.0 Status: Patch Available (was: Open) Fix UDAFPercentile IndexOutOfBoundsException --- Key: HIVE-1291 URL: https://issues.apache.org/jira/browse/HIVE-1291 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.6.0 Attachments: HIVE-1291.1.patch The counts array can be empty. We should directly return null in that case. {code} org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.hive.serde2.io.DoubleWritable org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate() on object org.apache.hadoop.hive.ql.udf.udafpercentile$percentilelongevalua...@530d0eae of class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments {} of size 0 at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:725) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.terminate(GenericUDAFBridge.java:181) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.evaluate(GenericUDAFEvaluator.java:157) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:838) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:885) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:539) at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:300) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412) at org.apache.hadoop.mapred.Child.main(Child.java:159) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:701) ... 9 more Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.udf.UDAFPercentile.getPercentile(UDAFPercentile.java:97) at org.apache.hadoop.hive.ql.udf.UDAFPercentile.access$300(UDAFPercentile.java:44) at org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.terminate(UDAFPercentile.java:196) ... 14 more {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1253) date_sub() function returns wrong date because of daylight saving time difference
[ https://issues.apache.org/jira/browse/HIVE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852279#action_12852279 ] Zheng Shao commented on HIVE-1253: -- +1. Will test and commit. date_sub() function returns wrong date because of daylight saving time difference - Key: HIVE-1253 URL: https://issues.apache.org/jira/browse/HIVE-1253 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: mingran wang Attachments: HIVE-1253.patch date_sub('2010-03-15', 7) returns '2010-03-07'. This is because we have a time shift on 2010-03-14 for daylight saving time. Looking at ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java, it is getting a calendar instance in the UTC time zone: Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC")); and using calendar.add() to subtract 7 days, then converting the time to 'yyyy-MM-dd' format. If it simply uses the default time zone, the problem is solved: Calendar calendar = Calendar.getInstance(); When people use date_sub('2010-03-15', 7), I think they mean subtract 7 days, instead of subtracting 7*24 hours. So it should be an easy fix. The same changes should go to date_add and date_diff. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
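The proposed change can be illustrated with a small standalone sketch (not the actual UDFDateSub code): parsing, day arithmetic, and formatting all happen in the default time zone, and the subtraction uses Calendar field arithmetic on whole days rather than 7*24 hours, so the result stays correct even when the range crosses the 2010-03-14 DST transition.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class DateSubSketch {
    // Subtract whole calendar days in the default time zone, per the fix.
    public static String dateSub(String day, int days) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        Calendar cal = Calendar.getInstance();   // default zone, not UTC
        cal.setTime(fmt.parse(day));
        cal.add(Calendar.DAY_OF_MONTH, -days);   // field arithmetic, DST-safe
        return fmt.format(cal.getTime());
    }

    public static void main(String[] args) throws Exception {
        // Crosses the 2010-03-14 US DST change, yet lands exactly 7 days back.
        System.out.println(dateSub("2010-03-15", 7)); // 2010-03-08
    }
}
```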
[jira] Updated: (HIVE-1253) date_sub() function returns wrong date because of daylight saving time difference
[ https://issues.apache.org/jira/browse/HIVE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1253: - Resolution: Fixed Fix Version/s: 0.6.0 Release Note: HIVE-1253. Fix Date_sub and Date_add in case of daylight saving. (Bryan Talbot via zshao) (was: Fix off-by-one issue with date_sub and date_add when date ranges include a daylight savings time change.) Status: Resolved (was: Patch Available) Committed. Thanks Bryan! date_sub() function returns wrong date because of daylight saving time difference - Key: HIVE-1253 URL: https://issues.apache.org/jira/browse/HIVE-1253 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: mingran wang Assignee: Bryan Talbot Fix For: 0.6.0 Attachments: HIVE-1253.patch date_sub('2010-03-15', 7) returns '2010-03-07'. This is because we have a time shift on 2010-03-14 for daylight saving time. Looking at ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java, it is getting a calendar instance in the UTC time zone: Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC")); and using calendar.add() to subtract 7 days, then converting the time to 'yyyy-MM-dd' format. If it simply uses the default time zone, the problem is solved: Calendar calendar = Calendar.getInstance(); When people use date_sub('2010-03-15', 7), I think they mean subtract 7 days, instead of subtracting 7*24 hours. So it should be an easy fix. The same changes should go to date_add and date_diff. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1272) Add SymlinkTextInputFormat to Hive
[ https://issues.apache.org/jira/browse/HIVE-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852015#action_12852015 ] Zheng Shao commented on HIVE-1272: -- Can you add a test case? Take a look at the .q files in ql/src/test/clientpositive Add SymlinkTextInputFormat to Hive -- Key: HIVE-1272 URL: https://issues.apache.org/jira/browse/HIVE-1272 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.5.0 Reporter: Zheng Shao Assignee: Guanghao Shen Attachments: HIVE-1272.1.patch We'd like to add a symlink text input format so that we can specify the list of files for a table/partition based on the content of a text file. For example, the location of the table is /user/hive/mytable. There is a file called /user/hive/mytable/myfile.txt. Inside the file, there are 2 lines, /user/myname/textfile1.txt and /user/myname/textfile2.txt We can do: {code} CREATE TABLE mytable (...) STORED AS INPUTFORMAT 'org.apache.hadoop.hive.io.SymlinkTextInputFormat' LOCATION '/user/hive/mytable'; SELECT * FROM mytable; {code} which will return the content of the 2 files: /user/myname/textfile1.txt and /user/myname/textfile2.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1272) Add SymlinkTextInputFormat to Hive
[ https://issues.apache.org/jira/browse/HIVE-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852040#action_12852040 ] Zheng Shao commented on HIVE-1272: -- We can add a file with name data/symlink.txt which contains the text ../src/kv.txt then in ql/src/test/clientpositive/mysymlink.q we can do this: {code} CREATE TABLE mysymlink (key STRING, value STRING) STORED AS INPUTFORMAT ... dfs -cp ../data/symlink.txt ../build/ql/test/data/warehouse/mysymlink/symlink1.txt; dfs -cp ../data/symlink.txt ../build/ql/test/data/warehouse/mysymlink/symlink2.txt; SELECT * FROM mysymlink; SELECT count(1) FROM mysymlink; {code} In order to test, run: ant test -Doffline=true -Dtestcase=TestCliDriver -Dqfile=mysymlink.q -Doverwrite=true And do svn add ql/.../mysymlink.q.out Run without -Doverwrite=true to verify the result. Add SymlinkTextInputFormat to Hive -- Key: HIVE-1272 URL: https://issues.apache.org/jira/browse/HIVE-1272 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.5.0 Reporter: Zheng Shao Assignee: Guanghao Shen Attachments: HIVE-1272.1.patch We'd like to add a symlink text input format so that we can specify the list of files for a table/partition based on the content of a text file. For example, the location of the table is /user/hive/mytable. There is a file called /user/hive/mytable/myfile.txt. Inside the file, there are 2 lines, /user/myname/textfile1.txt and /user/myname/textfile2.txt We can do: {code} CREATE TABLE mytable (...) STORED AS INPUTFORMAT 'org.apache.hadoop.hive.io.SymlinkTextInputFormat' LOCATION '/user/hive/mytable'; SELECT * FROM mytable; {code} which will return the content of the 2 files: /user/myname/textfile1.txt and /user/myname/textfile2.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1286) error/info message being emitted on standard output
[ https://issues.apache.org/jira/browse/HIVE-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1286: - Resolution: Fixed Release Note: HIVE-1286. Remove debug message from stdout in ColumnarSerDe. (Yongqiang He via zshao) Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Yongqiang! error/info message being emitted on standard output --- Key: HIVE-1286 URL: https://issues.apache.org/jira/browse/HIVE-1286 Project: Hadoop Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Priority: Minor Fix For: 0.6.0 Attachments: hive.1286.1.patch, hive.1286.2.patch 'Found class for org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' should go to stderr where other informational messages are sent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1289) Make gz text file work with CombineHiveInputFormat
Make gz text file work with CombineHiveInputFormat -- Key: HIVE-1289 URL: https://issues.apache.org/jira/browse/HIVE-1289 Project: Hadoop Hive Issue Type: Improvement Reporter: Zheng Shao If the user has applied MAPREDUCE-1649, he should be able to use CombineHiveInputFormat with .gz text files. We should add an option to enable that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851664#action_12851664 ] Zheng Shao commented on HIVE-1131: -- S1. Can we make lineage partition-level instead of table-level? I don't see this implemented in the new patch. After looking at the code more, I'd agree that this is too hard (and inefficient) to do, when the query has a range over a lot of partitions. S3. Use {} even for single statement in if, for etc. I cannot find any instances of these now. Still have some questions: S2. We might want to define formally the concepts of these levels, especially how they are composited (What will be UDAF of UDF, or UDF of UDAF, like round(sum(col)), or sum(round(col))) LineageInfo.java: Can you add some comments on what DependencyType the nested dependencies like round(sum(col)) or sum(round(col))) have? S6. The best place to store LineageInfo is probably in the QueryPlan instead of SessionState. Otherwise the LineageInfo will be lost when we run a query that is compiled earlier. Thoughts? Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851718#action_12851718 ] Zheng Shao commented on HIVE-1131: -- Look at the DataContainer class. That has a partition in it. And the Dependency has a mapping from Partition to the dependencies. Can you explain more your concerns on inefficiency? I see. So the DataContainer captures the output partition information, but we don't have input partition information (BaseColumnInfo/TableAliasInfo). This is reasonable since the input can be lots of partitions. For S6 actually the queryplan is the wrong place to store the lineageinfo. Because of the dynamic partitioning work that Ning is doing, I have to generate the partition to dependency mapping at run time. So I would rather store it in a run time structure as opposed to a compile time structure. SessionState fits that bill, though I think we should have another structure called ExecutionCtx for this. But otherwise I think we want to store this in a runtime structure. +1 on the ExecutionCtx idea. SessionState is at the session level, and LineageInfo is at the query level. It will be great to put LineageInfo into ExecutionCtx. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1286) error/info message being emitted on standard output
[ https://issues.apache.org/jira/browse/HIVE-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851048#action_12851048 ] Zheng Shao commented on HIVE-1286: -- Shall we use LOG.info or LOG.debug instead? error/info message being emitted on standard output --- Key: HIVE-1286 URL: https://issues.apache.org/jira/browse/HIVE-1286 Project: Hadoop Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Priority: Minor Fix For: 0.6.0 Attachments: hive.1286.1.patch 'Found class for org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' should go to stderr where other informational messages are sent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1287) Struct datatype should not use field names for type equivalence.
[ https://issues.apache.org/jira/browse/HIVE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851075#action_12851075 ] Zheng Shao commented on HIVE-1287: -- I think we should support the following query: {code} insert overwrite table sink select CAST(foo AS struct<y: string>) from source; {code} This is better than directly converting them, because there can be confusions (There are 2 ways to convert between struct<x: string, y: string> and struct<y: string, x: string>, and Hive is taking one of them). Struct datatype should not use field names for type equivalence. Key: HIVE-1287 URL: https://issues.apache.org/jira/browse/HIVE-1287 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Environment: Mac OS X (10.6.2) Java SE 6 ( 1.6.0_17) Reporter: Arvind Prabhakar The field names for {{Struct}} types are currently being matched for testing type equivalence. This is readily seen by running the following example: {noformat} hive> create table source ( foo struct<x : string> ); OK Time taken: 3.094 seconds hive> load data local inpath '/path/to/sample/data.txt' overwrite into table source; Copying data from file:/path/to/sample/data.txt Loading data to table source OK Time taken: 0.593 seconds hive> create table sink ( bar struct<y : string> ); OK Time taken: 0.11 seconds hive> insert overwrite table sink select foo from source; FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table because column number/types are different sink: Cannot convert column 0 from struct<x:string> to struct<y:string>. {noformat} Since both {{source.foo}} and {{sink.bar}} are similar in definition with only field names being different, data movement between these two should be allowed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
[ https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850427#action_12850427 ] Zheng Shao commented on HIVE-1019: -- The concept of session is longer than a query. See HIVE-584. We should not start a new session inside a query. Instead we should introduce a separate concept (maybe a combination of session id and task id) for that and use that for the PLAN. java.io.FileNotFoundException: HIVE_PLAN (No such file or directory) Key: HIVE-1019 URL: https://issues.apache.org/jira/browse/HIVE-1019 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0 Reporter: Bennie Schut Assignee: Bennie Schut Priority: Minor Fix For: 0.6.0 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt I keep getting errors like this: java.io.FileNotFoundException: HIVE_PLAN (No such file or directory) and : java.io.IOException: cannot find dir = hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in partToPartitionInfo! when running multiple threads with roughly similar queries. I have a patch for this which works for me. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1256) fix Hive logo img tag to avoid stretching
[ https://issues.apache.org/jira/browse/HIVE-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao resolved HIVE-1256. -- Resolution: Fixed svn info; svn commit -m "Fixed hive_logo_medium.jpg" Path: . URL: https://svn.apache.org/repos/asf/hadoop/hive/site Repository Root: https://svn.apache.org/repos/asf Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 Revision: 915946 Node Kind: directory Schedule: normal Last Changed Author: zshao Last Changed Rev: 915691 Last Changed Date: 2010-02-23 21:54:01 -0800 (Tue, 23 Feb 2010) Sending author/src/documentation/content/xdocs/hive_logo_medium.jpg Sending publish/images/hive_logo_medium.jpg Transmitting file data .. Committed revision 927292. fix Hive logo img tag to avoid stretching - Key: HIVE-1256 URL: https://issues.apache.org/jira/browse/HIVE-1256 Project: Hadoop Hive Issue Type: Bug Components: Documentation Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Zheng Shao Fix For: 0.6.0 From comment on HIVE-422: Aaron Newton added a comment - 17/Mar/10 02:32 AM Hey guys, I saw this article on TC today: http://techcrunch.com/2010/03/16/big-data-freedom/ and noticed the hive logo was all out of whack - all stretched out. Then I noticed it's like that on the Hive home page. Can someone fix the dimensions of the image tag? It looks kinda bad (and people are apparently using it like that elsewhere as seen on the TC article). http://hadoop.apache.org/hive/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
[ https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849438#action_12849438 ] Zheng Shao commented on HIVE-1255: -- Edward, can you add back the (unnecessary) type casts in FunctionRegistry.java? These are required to get hive compilable with hadoop 0.17. Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan -- Key: HIVE-1255 URL: https://issues.apache.org/jira/browse/HIVE-1255 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1255-patch.txt Add support for PI, E, degrees, radians, tan, sign and atan -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1255) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan
[ https://issues.apache.org/jira/browse/HIVE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1255: - Status: Open (was: Patch Available) Add mathematical UDFs PI, E, degrees, radians, tan, sign, and atan -- Key: HIVE-1255 URL: https://issues.apache.org/jira/browse/HIVE-1255 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1255-patch.txt Add support for PI, E, degrees, radians, tan, sign and atan -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1272) Add SymlinkTextInputFormat to Hive
Add SymlinkTextInputFormat to Hive -- Key: HIVE-1272 URL: https://issues.apache.org/jira/browse/HIVE-1272 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.5.0 Reporter: Zheng Shao We'd like to add a symlink text input format so that we can specify the list of files for a table/partition based on the content of a text file. For example, the location of the table is /user/hive/mytable. There is a file called /user/hive/mytable/myfile.txt. Inside the file, there are 2 lines, /user/myname/textfile1.txt and /user/myname/textfile2.txt We can do: {code} CREATE TABLE mytable (...) STORED AS INPUTFORMAT 'org.apache.hadoop.hive.io.SymlinkTextInputFormat' LOCATION '/user/hive/mytable'; SELECT * FROM mytable; {code} which will return the content of the 2 files: /user/myname/textfile1.txt and /user/myname/textfile2.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1273) UDF_Percentile NullPointerException
UDF_Percentile NullPointerException --- Key: HIVE-1273 URL: https://issues.apache.org/jira/browse/HIVE-1273 Project: Hadoop Hive Issue Type: Bug Reporter: Zheng Shao Assignee: Zheng Shao -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1273) UDF_Percentile NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1273: - Attachment: HIVE-1273.1.patch Ignore null in merge. UDF_Percentile NullPointerException --- Key: HIVE-1273 URL: https://issues.apache.org/jira/browse/HIVE-1273 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.6.0 Attachments: HIVE-1273.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
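The one-line "Ignore null in merge" fix can be sketched as follows. The assumption (labeled hypothetical — this is not Hive's actual evaluator code) is that a combiner can hand merge() a null partial state, for example from a map task that saw no rows, and the merge should simply skip it rather than dereference it; the value-to-count histogram used here is an illustrative stand-in for percentile's intermediate state.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeNullGuard {
    // Illustrative intermediate state: a value -> count histogram.
    static final Map<Long, Long> counts = new HashMap<>();

    // Skip null partial aggregations instead of throwing NullPointerException.
    public static boolean merge(Map<Long, Long> other) {
        if (other == null) {
            return true; // nothing to fold in; not an error
        }
        for (Map.Entry<Long, Long> e : other.entrySet()) {
            counts.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return true;
    }
}
```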
[jira] Commented: (HIVE-365) Create Table to support multiple levels of delimiters
[ https://issues.apache.org/jira/browse/HIVE-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847048#action_12847048 ] Zheng Shao commented on HIVE-365: - I am thinking something like: {code} CREATE TABLE nested(array_of_arrays ARRAY<ARRAY<INT>>, map_of_maps MAP<STRING, MAP<INT, INT>>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' '\002' '\003' '\004' '\005'; {code} Basically allowing multiple separators after FIELDS TERMINATED. The top level (fields) consumes 1 level of separators. Each level of array consumes 1 level of separators, while each level of map consumes 2. Create Table to support multiple levels of delimiters - Key: HIVE-365 URL: https://issues.apache.org/jira/browse/HIVE-365 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao From HIVE-337, the SerDe layer now supports multiple levels of delimiters, for the purpose of supporting nested map/array/struct. Array (the same as List) and struct consume a single level of separator, and Map consumes 2 levels. DDL (Create Table) needs to allow users to specify multiple levels of delimiters in order to take advantage of this new feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
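The level-consumption rule above can be checked with a tiny stand-alone parser for an `ARRAY<ARRAY<INT>>` field: fields split on \001, the outer array on \002, and the inner array on \003. This is an illustration of the separator scheme only, not Hive's actual SerDe code:

```java
import java.util.ArrayList;
import java.util.List;

public class NestedDelimiters {
    // Parse one ARRAY<ARRAY<INT>> field: outer elements are separated by
    // \002 (level 2), inner INTs by \003 (level 3).
    public static List<List<Integer>> parse(String field) {
        List<List<Integer>> out = new ArrayList<>();
        for (String inner : field.split("\002", -1)) {   // outer array elements
            List<Integer> elems = new ArrayList<>();
            for (String n : inner.split("\003", -1)) {   // inner array elements
                elems.add(Integer.parseInt(n));
            }
            out.add(elems);
        }
        return out;
    }

    public static void main(String[] args) {
        // [[1,2],[3]] serialized under those separators
        System.out.println(parse("1\0032\0023")); // [[1, 2], [3]]
    }
}
```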
[jira] Commented: (HIVE-1219) More robust handling of metastore connection failures
[ https://issues.apache.org/jira/browse/HIVE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846568#action_12846568 ] Zheng Shao commented on HIVE-1219: -- nitpick: HiveConf.ConfVars.METATOREATTEMPTS has a typo. More robust handling of metastore connection failures - Key: HIVE-1219 URL: https://issues.apache.org/jira/browse/HIVE-1219 Project: Hadoop Hive Issue Type: New Feature Components: Metastore Reporter: Paul Yang Assignee: Paul Yang Fix For: 0.6.0 Attachments: HIVE-1219.1.patch, HIVE-1219.2.patch, HIVE-1219.3.patch, HIVE-1219.4.patch Currently, if metastore's connection to the datastore is broken, the query fails and the exception such as the following is thrown {code} 2010-01-28 11:50:20,885 ERROR exec.MoveTask (SessionState.java:printError(248)) - Failed with exception Unable to fetch table tmp_table org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table tmp_table at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:362) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:333) at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:112) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:99) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:64) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:582) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:462) at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:324) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:312) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:200) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:256) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: javax.jdo.JDODataStoreException: Communications link failure Last packet sent to the server was 1 ms ago. NestedThrowables: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure Last packet sent to the server was 1 ms ago. at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:289) {code} In order to reduce the impact of transient network issues and momentarily unavailable datastores, two possible improvements are: 1. Retrying the metastore command in case of connection failure before propagating up the exception. 2. Retrieving the datastore hostname / connection URL through the use of an extension. This extension would be useful in the case where a remote service maintained the location of the currently available datastore. In case of hostname changes or failovers to a backup datastore, the extension would allow hive clients to run without manual intervention. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
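Improvement (1) above — retrying the metastore command before propagating the exception — can be sketched as a generic bounded-retry wrapper. The `Call` interface and parameter names here are illustrative assumptions, not Hive's actual metastore client API:

```java
public class RetryingInvoker {
    // Minimal functional interface for a metastore call that may fail
    // with a transient error (e.g. a JDODataStoreException).
    interface Call<T> {
        T run() throws Exception;
    }

    // Try the call up to `attempts` times, sleeping between tries,
    // and rethrow the last failure only after all attempts are spent.
    public static <T> T withRetries(Call<T> call, int attempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.run();
            } catch (InterruptedException ie) {
                throw ie;              // don't swallow interruption
            } catch (Exception e) {
                last = e;              // remember, back off, and retry
                Thread.sleep(sleepMs);
            }
        }
        throw last;                    // every attempt failed
    }
}
```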
[jira] Created: (HIVE-1254) CTAS creates column names starting with _ while the grammar does not allow column names starting with _
CTAS creates column names starting with _ while the grammar does not allow column names starting with _ --- Key: HIVE-1254 URL: https://issues.apache.org/jira/browse/HIVE-1254 Project: Hadoop Hive Issue Type: Bug Reporter: Zheng Shao Assignee: Ning Zhang {code} CREATE TABLE tmp_table AS SELECT adid, min(timestamp) FROM ads GROUP BY adid; {code} The second column name is _c1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1242) CombineHiveInputFormat does not work for compressed text files
[ https://issues.apache.org/jira/browse/HIVE-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12844663#action_12844663 ] Zheng Shao commented on HIVE-1242: -- Talked with Namit offline. HIVE-1200 needs a small fix that will be included together by Namit. CombineHiveInputFormat does not work for compressed text files -- Key: HIVE-1242 URL: https://issues.apache.org/jira/browse/HIVE-1242 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0 Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.5.1, 0.6.0 Attachments: hive.1242.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1238) Get partitions with a partial specification
[ https://issues.apache.org/jira/browse/HIVE-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12844187#action_12844187 ] Zheng Shao commented on HIVE-1238: -- {{get_partitions_mp_by_name}} will be much more efficient than the other. HIVE-804 can be used as a test case for the new API if we refactor it. Get partitions with a partial specification Key: HIVE-1238 URL: https://issues.apache.org/jira/browse/HIVE-1238 Project: Hadoop Hive Issue Type: New Feature Components: Metastore Reporter: Paul Yang Assignee: Paul Yang Fix For: 0.6.0 Currently, the metastore API only allows retrieval of all the partitions of a table, or the retrieval of a single partition given a complete partition specification. For HIVE-936, a method to retrieve all partitions that match a partial partition specification would be useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer
[ https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1216: - Attachment: HIVE-1216.4.patch Sorry I forgot to include a newly added file: UDFTestErrorOnFalse.java HIVE-1216.4.patch should work fine now. Show the row with error in mapper/reducer - Key: HIVE-1216 URL: https://issues.apache.org/jira/browse/HIVE-1216 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1216.1.patch, HIVE-1216.3.patch, HIVE-1216.4.patch It will be very useful for user to debug the HiveQL if mapper/reducer can show the row that caused error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging
[ https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1212: - Attachment: HIVE-1212.3.patch Addressed the comment from Paul. Explicitly say Hive Internal Error to ease debugging -- Key: HIVE-1212 URL: https://issues.apache.org/jira/browse/HIVE-1212 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1212.1.patch, HIVE-1212.2.patch, HIVE-1212.3.patch Our users complain that hive fails error messages like FAILED: Unknown exception: null. We should explicitly mention that's an internal error of Hive, and provide more information (stacktrace) on the screen to ease bug reporting and debugging. In other cases, we will still put the detailed information (stacktrace) in the log, since users should be able to figure out what's wrong with a single line of message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer
[ https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1216: - Attachment: HIVE-1216.3.patch This patch fixes some checkstyle warnings. Show the row with error in mapper/reducer - Key: HIVE-1216 URL: https://issues.apache.org/jira/browse/HIVE-1216 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1216.1.patch, HIVE-1216.3.patch It will be very useful for user to debug the HiveQL if mapper/reducer can show the row that caused error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1216) Show the row with error in mapper/reducer
[ https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842908#action_12842908 ] Zheng Shao commented on HIVE-1216: -- Which test case? I tried them but they were fine for me. Show the row with error in mapper/reducer - Key: HIVE-1216 URL: https://issues.apache.org/jira/browse/HIVE-1216 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1216.1.patch, HIVE-1216.3.patch It will be very useful for user to debug the HiveQL if mapper/reducer can show the row that caused error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1179) Add UDF array_contains
[ https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao reassigned HIVE-1179: Assignee: Arvind Prabhakar Add UDF array_contains -- Key: HIVE-1179 URL: https://issues.apache.org/jira/browse/HIVE-1179 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Arvind Prabhakar Attachments: HIVE-1179.patch Returns true or false, depending on whether an element is in an array. {{array_contains(T element, array<T> theArray)}} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
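The requested contract is simple enough to sketch in plain Java. This is only an illustration of the semantics (including an assumed "null never matches" rule), not the actual UDF in HIVE-1179.patch, which operates on ObjectInspectors:

```java
public class ArrayContains {
    // array_contains(element, theArray): true iff element occurs in theArray.
    // Null element or null array yields false -- an assumption for this sketch.
    public static <T> boolean arrayContains(T element, T[] theArray) {
        if (element == null || theArray == null) {
            return false;
        }
        for (T item : theArray) {
            if (element.equals(item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(arrayContains(2, new Integer[]{1, 2, 3}));   // true
        System.out.println(arrayContains("x", new String[]{"a", "b"})); // false
    }
}
```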
[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer
[ https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1216: - Status: Patch Available (was: Open) Show the row with error in mapper/reducer - Key: HIVE-1216 URL: https://issues.apache.org/jira/browse/HIVE-1216 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1216.1.patch It will be very useful for user to debug the HiveQL if mapper/reducer can show the row that caused error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1216) Show the row with error in mapper/reducer
[ https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1216: - Attachment: HIVE-1216.1.patch Show the row with error in mapper/reducer - Key: HIVE-1216 URL: https://issues.apache.org/jira/browse/HIVE-1216 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1216.1.patch It will be very useful for user to debug the HiveQL if mapper/reducer can show the row that caused error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1179) Add UDF array_contains
[ https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842237#action_12842237 ] Zheng Shao commented on HIVE-1179: -- Hi Arvind, we have to restore the unnecessary type conversion for hadoop 0.17. Try the following command and you will see why: {code} ant -Dhadoop.version=0.17.2.1 clean package {code} Add UDF array_contains -- Key: HIVE-1179 URL: https://issues.apache.org/jira/browse/HIVE-1179 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Arvind Prabhakar Attachments: HIVE-1179.patch Returns true or false, depending on whether an element is in an array. {{array_contains(T element, arrayT theArray)}} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1179) Add UDF array_contains
[ https://issues.apache.org/jira/browse/HIVE-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1179: - Status: Open (was: Patch Available) Add UDF array_contains -- Key: HIVE-1179 URL: https://issues.apache.org/jira/browse/HIVE-1179 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Arvind Prabhakar Attachments: HIVE-1179.patch Returns true or false, depending on whether an element is in an array. {{array_contains(T element, arrayT theArray)}} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1211) Tapping logs from child processes
[ https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842239#action_12842239 ] Zheng Shao commented on HIVE-1211: -- Hi bc, can you talk a bit more about the use case in your mind? Tapping logs from child processes - Key: HIVE-1211 URL: https://issues.apache.org/jira/browse/HIVE-1211 Project: Hadoop Hive Issue Type: Improvement Components: Logging Reporter: bc Wong Attachments: HIVE-1211.1.patch Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to the parent's stdout/stderr. There is little one can do to to sort out which log is from which query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1211) Tapping logs from child processes
[ https://issues.apache.org/jira/browse/HIVE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao reassigned HIVE-1211: Assignee: bc Wong Tapping logs from child processes - Key: HIVE-1211 URL: https://issues.apache.org/jira/browse/HIVE-1211 Project: Hadoop Hive Issue Type: Improvement Components: Logging Reporter: bc Wong Assignee: bc Wong Attachments: HIVE-1211.1.patch Stdout/stderr from child processes (e.g. {{MapRedTask}}) are redirected to the parent's stdout/stderr. There is little one can do to to sort out which log is from which query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1216) Show the row with error in mapper/reducer
Show the row with error in mapper/reducer - Key: HIVE-1216 URL: https://issues.apache.org/jira/browse/HIVE-1216 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao It will be very useful for user to debug the HiveQL if mapper/reducer can show the row that caused error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging
[ https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1212: - Attachment: HIVE-1212.2.patch Cleaned up some error processing code. Explicitly say Hive Internal Error to ease debugging -- Key: HIVE-1212 URL: https://issues.apache.org/jira/browse/HIVE-1212 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1212.1.patch, HIVE-1212.2.patch Our users complain that hive fails error messages like FAILED: Unknown exception: null. We should explicitly mention that's an internal error of Hive, and provide more information (stacktrace) on the screen to ease bug reporting and debugging. In other cases, we will still put the detailed information (stacktrace) in the log, since users should be able to figure out what's wrong with a single line of message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1216) Show the row with error in mapper/reducer
[ https://issues.apache.org/jira/browse/HIVE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841971#action_12841971 ] Zheng Shao commented on HIVE-1216: -- Thanks for the link, Jeff. This JIRA aims to do something a bit different. Instead of writing the data into a _skip file and letting the job finish, we will print the row to stderr/stdout or just use LOG. The advantages of this: 1. Really easy for debugging - people don't need to use command-line tools to fetch the _skip file. 2. We should be able to attach column names to their values, because Hive knows the column names. Show the row with error in mapper/reducer - Key: HIVE-1216 URL: https://issues.apache.org/jira/browse/HIVE-1216 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao It will be very useful for users to debug HiveQL if the mapper/reducer can show the row that caused the error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
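Advantage (2) is the interesting part: since Hive knows the column names, the logged row can be made self-describing. A hypothetical formatting helper (not from the HIVE-1216 patch) might look like:

```java
public class ErrorRowFormatter {
    // Pair each column name with its value so the failing row is
    // readable in stderr or the task log.
    public static String describe(String[] cols, Object[] row) {
        StringBuilder sb = new StringBuilder("row that caused the error: {");
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(cols[i]).append('=').append(row[i]); // a null value prints as "null"
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new String[]{"adid", "ts"},
                                    new Object[]{42, null}));
        // row that caused the error: {adid=42, ts=null}
    }
}
```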
[jira] Commented: (HIVE-431) Auto-add table property select to be the select statement that created the table
[ https://issues.apache.org/jira/browse/HIVE-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842014#action_12842014 ] Zheng Shao commented on HIVE-431: - I guess the information is already in lineage. I think it's a good idea to keep lineage information away from the core metadata, especially given that we are going to have column lineage etc. But we should provide an easy way for users to retrieve the lineage information. Auto-add table property select to be the select statement that created the table -- Key: HIVE-431 URL: https://issues.apache.org/jira/browse/HIVE-431 Project: Hadoop Hive Issue Type: Wish Reporter: Adam Kramer A syntactic copy of the query that was used to fill a table would often be AMAZINGLY useful for figuring out where the data in the table came from. I think the best way to implement this would be to automatically add a table property which includes the SELECT statement. For partitioned tables, this would need to exist for each partition...or perhaps use some canonical name like selectquery for unpartitioned tables, plus selectquery_ds=DATEID for partitioned tables. This problem is growing as more and more tables in our database are generated by either root or by people who are no longer easy to contact. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-224) implement lfu based flushing policy for map side aggregates
[ https://issues.apache.org/jira/browse/HIVE-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841714#action_12841714 ] Zheng Shao commented on HIVE-224: - Hi James, currently we don't have the bandwidth to do this, but I guess it won't be too hard - we just need to use http://java.sun.com/j2se/1.4.2/docs/api/java/util/LinkedHashMap.html (search for LRU). Are you interested in joining forces on this? implement lfu based flushing policy for map side aggregates --- Key: HIVE-224 URL: https://issues.apache.org/jira/browse/HIVE-224 Project: Hadoop Hive Issue Type: Improvement Reporter: Joydeep Sen Sarma Currently we flush some random set of rows when the map-side hash table approaches memory limits. We have discussed a strategy of flushing the hash table entries that have been seen the least number of times (effectively an LFU flushing strategy). This will be very effective at reducing the amount of data sent from the map to the reduce step, as well as reducing the chances of any skew. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
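The LinkedHashMap pointer refers to its access-order mode, which gives an LRU structure almost for free. A sketch of an eviction-on-insert hash table built that way (note this is LRU rather than the LFU policy the issue title asks for; a true LFU would need per-entry counters):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruHashTable<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruHashTable(int capacity) {
        // accessOrder=true reorders entries on get(), so the eldest
        // entry is always the least recently used one.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict (i.e. "flush") the LRU entry once we exceed capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruHashTable<String, Integer> agg = new LruHashTable<>(2);
        agg.put("a", 1);
        agg.put("b", 2);
        agg.get("a");                     // touch "a", so "b" becomes eldest
        agg.put("c", 3);                  // evicts "b"
        System.out.println(agg.keySet()); // [a, c]
    }
}
```

In the real aggregation operator, removeEldestEntry would flush the evicted partial aggregate downstream instead of simply discarding it.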
[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging
[ https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1212: - Affects Version/s: 0.6.0 Status: Patch Available (was: Open) Explicitly say Hive Internal Error to ease debugging -- Key: HIVE-1212 URL: https://issues.apache.org/jira/browse/HIVE-1212 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1212.1.patch Our users complain that hive fails error messages like FAILED: Unknown exception: null. We should explicitly mention that's an internal error of Hive, and provide more information (stacktrace) on the screen to ease bug reporting and debugging. In other cases, we will still put the detailed information (stacktrace) in the log, since users should be able to figure out what's wrong with a single line of message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging
[ https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1212: - Attachment: HIVE-1212.1.patch This also fixes UDFArgumentException reporting. Explicitly say Hive Internal Error to ease debugging -- Key: HIVE-1212 URL: https://issues.apache.org/jira/browse/HIVE-1212 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1212.1.patch Our users complain that hive fails error messages like FAILED: Unknown exception: null. We should explicitly mention that's an internal error of Hive, and provide more information (stacktrace) on the screen to ease bug reporting and debugging. In other cases, we will still put the detailed information (stacktrace) in the log, since users should be able to figure out what's wrong with a single line of message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging
[ https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1212: - Attachment: (was: HIVE-1212.1.patch) Explicitly say Hive Internal Error to ease debugging -- Key: HIVE-1212 URL: https://issues.apache.org/jira/browse/HIVE-1212 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Our users complain that hive fails error messages like FAILED: Unknown exception: null. We should explicitly mention that's an internal error of Hive, and provide more information (stacktrace) on the screen to ease bug reporting and debugging. In other cases, we will still put the detailed information (stacktrace) in the log, since users should be able to figure out what's wrong with a single line of message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1212) Explicitly say Hive Internal Error to ease debugging
[ https://issues.apache.org/jira/browse/HIVE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1212: - Attachment: HIVE-1212.1.patch Explicitly say Hive Internal Error to ease debugging -- Key: HIVE-1212 URL: https://issues.apache.org/jira/browse/HIVE-1212 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1212.1.patch Our users complain that hive fails error messages like FAILED: Unknown exception: null. We should explicitly mention that's an internal error of Hive, and provide more information (stacktrace) on the screen to ease bug reporting and debugging. In other cases, we will still put the detailed information (stacktrace) in the log, since users should be able to figure out what's wrong with a single line of message. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1203) HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion
[ https://issues.apache.org/jira/browse/HIVE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao reassigned HIVE-1203: Assignee: Vladimir Klimontovich HiveInputFormat.getInputFormatFromCache swallows cause exception when trowing IOExcpetion Key: HIVE-1203 URL: https://issues.apache.org/jira/browse/HIVE-1203 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0, 0.4.1, 0.5.0 Reporter: Vladimir Klimontovich Assignee: Vladimir Klimontovich Fix For: 0.4.2, 0.5.1, 0.6.0 Attachments: 0.4.patch, 0.5.patch, trunk.patch To fix this, we simply need to pass the cause as the second parameter of the IOException constructor. Patches for 0.4, 0.5, and trunk are available. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
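The fix pattern is just the two-argument IOException constructor, which keeps the root cause in the stack trace instead of swallowing it (messages below are illustrative, not the actual Hive code):

```java
import java.io.IOException;

public class CauseChaining {
    // Before the fix: the original failure is swallowed.
    public static IOException swallowed(Throwable cause) {
        return new IOException("cannot find class");
    }

    // After the fix: the cause rides along and shows up in stack traces.
    public static IOException chained(Throwable cause) {
        return new IOException("cannot find class", cause);
    }

    public static void main(String[] args) {
        Throwable root = new ClassNotFoundException("some.InputFormat");
        System.out.println(swallowed(root).getCause()); // null
        System.out.println(chained(root).getCause());
        // java.lang.ClassNotFoundException: some.InputFormat
    }
}
```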
[jira] Commented: (HIVE-1202) Unknown exception : null while join
[ https://issues.apache.org/jira/browse/HIVE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839940#action_12839940 ] Zheng Shao commented on HIVE-1202: -- {code} select * from ( select name from classes ) a join classes b where a.date_partition = '2010-02-01' AND b.date_partition = '2010-03-01'; {code} It seems with the patch, we won't do partition pruning for this case? Is that a problem? Unknown exception : null while join - Key: HIVE-1202 URL: https://issues.apache.org/jira/browse/HIVE-1202 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.4.1 Environment: hive-0.4.1 hadoop 0.19.1 Reporter: Mafish Fix For: 0.4.1 Attachments: HIVE-1202.branch-0.4.1.patch Hive throws Unknown exception : null with query: select * from ( select name from classes ) a join classes b where a.name b.number After tracing the code, I found this bug will occur with following conditions: 1. It is join operation. 2. At least one of the source of join is physical table (right side in above case). 3. With where condition and condition(s) of where clause must include columns from both side of join (a.name and b.number in case) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1207) ScriptOperator AutoProgressor does not set the interval
ScriptOperator AutoProgressor does not set the interval --- Key: HIVE-1207 URL: https://issues.apache.org/jira/browse/HIVE-1207 Project: Hadoop Hive Issue Type: Bug Reporter: Zheng Shao Assignee: Zheng Shao As title. I will show more details in the patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1207) ScriptOperator AutoProgressor does not set the interval
[ https://issues.apache.org/jira/browse/HIVE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1207: - Attachment: HIVE-1207.1.patch ScriptOperator AutoProgressor does not set the interval --- Key: HIVE-1207 URL: https://issues.apache.org/jira/browse/HIVE-1207 Project: Hadoop Hive Issue Type: Bug Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1207.1.patch As title. I will show more details in the patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1207) ScriptOperator AutoProgressor does not set the interval
[ https://issues.apache.org/jira/browse/HIVE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1207: - Affects Version/s: 0.6.0 0.5.0 Status: Patch Available (was: Open) ScriptOperator AutoProgressor does not set the interval --- Key: HIVE-1207 URL: https://issues.apache.org/jira/browse/HIVE-1207 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.5.0, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1207.1.patch As title. I will show more details in the patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1208) Bug with error cannot find ObjectInspector for VOID
Bug with error cannot find ObjectInspector for VOID - Key: HIVE-1208 URL: https://issues.apache.org/jira/browse/HIVE-1208 Project: Hadoop Hive Issue Type: Bug Reporter: Zheng Shao This happens when using constant null, but not when using CAST(null AS STRING). {code} explain extended FROM (select 1 as a, null as b from zshao_tt distribute by a) tmp SELECT transform(a, b) USING 'cat'; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-259: Attachment: HIVE-259.5.patch We take the method recommended by NIST. See http://en.wikipedia.org/wiki/Percentile#Alternative_methods Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.4.patch, HIVE-259.5.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute atleast 25, 50, 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839393#action_12839393 ] Zheng Shao commented on HIVE-259: - (1) I am not familiar with the exact definition of the percentile function. Must percentile()'s result be a member of the input data? See the link above. (2) On "HashMap and ArrayList are used to copy and sort. Can we use a TreeMap here? This is small and can be ignored.": I think HashMap is better here. The reason is that the number of iterate() calls is usually much higher than the number of unique values (the size of the HashMap). By using a HashMap we reduce the cost of iterate(). Also, "In the beginning of new test case, .." appears two times - fixed in HIVE-259.5.patch. Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.4.patch, HIVE-259.5.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
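On question (1): under the NIST method referenced above, the result is generally not a member of the input data, because it linearly interpolates between the two closest ranks. A standalone sketch of that method (taking the percentile as a fraction in [0, 1], as percentile(val, 0.5) does; this is not the actual UDAFPercentile code, which first aggregates counts in a HashMap):

```java
import java.util.Arrays;

public class PercentileSketch {
    // NIST-style percentile: rank = p * (N + 1), clamped to the data range,
    // with linear interpolation between the two neighboring order statistics.
    public static double percentile(double[] values, double p) {
        double[] v = values.clone();
        Arrays.sort(v);
        double rank = p * (v.length + 1);       // p is a fraction in [0, 1]
        if (rank <= 1.0) return v[0];           // below the first rank
        if (rank >= v.length) return v[v.length - 1]; // above the last rank
        int lo = (int) Math.floor(rank);
        double frac = rank - lo;                // interpolate between neighbors
        return v[lo - 1] + frac * (v[lo] - v[lo - 1]);
    }

    public static void main(String[] args) {
        double[] data = {15, 20, 35, 40, 50};
        System.out.println(percentile(data, 0.5));  // 35.0 (the median)
        System.out.println(percentile(data, 0.25)); // 17.5 -- not in the input
    }
}
```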
[jira] Commented: (HIVE-1201) Add a python command-line interface for Hive
[ https://issues.apache.org/jira/browse/HIVE-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839200#action_12839200 ] Zheng Shao commented on HIVE-1201: -- Yes this is a client module (using Metastore Thrift API) that we can use in Python interpreter. Add a python command-line interface for Hive Key: HIVE-1201 URL: https://issues.apache.org/jira/browse/HIVE-1201 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Venky Iyer Venky has a nice python command-line interface for Hive. It uses thrift API to talk with metastore. It uses hadoop command line to submit jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file
[ https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838763#action_12838763 ] Zheng Shao commented on HIVE-1197: -- Can you explain what "a mapper spans a file" means? create a new input format where a mapper spans a file - Key: HIVE-1197 URL: https://issues.apache.org/jira/browse/HIVE-1197 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Siying Dong Fix For: 0.6.0 This will be needed for sort-merge joins. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-259: Attachment: HIVE-259.4.patch This one fixes all checkstyle errors, and uses *Writable classes to avoid creating new objects as much as possible. Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.4.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute atleast 25, 50, 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1137) build references IVY_HOME incorrectly
[ https://issues.apache.org/jira/browse/HIVE-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1137: - Resolution: Fixed Release Note: HIVE-1137. Fix build.xml for references to IVY_HOME. (Carl Steinbach via zshao) Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Carl! build references IVY_HOME incorrectly - Key: HIVE-1137 URL: https://issues.apache.org/jira/browse/HIVE-1137 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Carl Steinbach Fix For: 0.6.0 Attachments: HIVE-1137.patch The build references env.IVY_HOME, but doesn't actually import env as it should (via property environment=env/). It's not clear what the IVY_HOME reference is for since the build doesn't even use ivy.home (instead, it installs under the build/ivy directory). It looks like someone copied bits and pieces from the Automatically section here: http://ant.apache.org/ivy/history/latest-milestone/install.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition
Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition -- Key: HIVE-1200 URL: https://issues.apache.org/jira/browse/HIVE-1200 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.5.1, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao The CombineHiveInputFormat does not work with multiple levels of directories in a single table/partition, because it uses exact-match logic instead of the relativize logic used in MapOperator: {code} MapOperator.java: if (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
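The relativize check quoted above can be illustrated standalone. The following is a minimal sketch (the class name RelativizeDemo and the paths are hypothetical, not part of Hive):

```java
import java.net.URI;

public class RelativizeDemo {
    /** True if fpath lies under dir, at any depth. */
    static boolean isUnder(URI dir, URI fpath) {
        // URI.relativize returns fpath unchanged when dir is not a prefix,
        // so a changed result means fpath is somewhere inside dir.
        return !dir.relativize(fpath).equals(fpath);
    }

    public static void main(String[] args) {
        URI table = URI.create("hdfs://nn/warehouse/t/");
        // A file two directories deep still matches; exact-match logic would miss it.
        System.out.println(isUnder(table, URI.create("hdfs://nn/warehouse/t/p1/p2/f"))); // true
        System.out.println(isUnder(table, URI.create("hdfs://nn/warehouse/other/f")));   // false
    }
}
```

This is why the relativize form handles nested partition directories that a plain path-equality check rejects.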
[jira] Updated: (HIVE-1032) Better Error Messages for Execution Errors
[ https://issues.apache.org/jira/browse/HIVE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1032: - Resolution: Fixed Fix Version/s: 0.6.0 Release Note: HIVE-1032. Better Error Messages for Execution Errors. (Paul Yang via zshao) Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Paul! Better Error Messages for Execution Errors -- Key: HIVE-1032 URL: https://issues.apache.org/jira/browse/HIVE-1032 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: Paul Yang Assignee: Paul Yang Fix For: 0.6.0 Attachments: HIVE-1032.1.patch, HIVE-1032.2.patch, HIVE-1032.3.patch, HIVE-1032.4.patch, HIVE-1032.5.patch, HIVE-1032.6.patch Three common errors that occur during execution are: 1. Map-side group-by causing an out of memory exception due to large aggregation hash tables 2. ScriptOperator failing due to the user's script throwing an exception or otherwise returning a non-zero error code 3. Incorrectly specifying the join order of small and large tables, causing the large table to be loaded into memory and producing an out of memory exception. These errors are typically discovered by manually examining the error log files of the failed task. This task proposes to create a feature that would automatically read the error logs and output a probable cause and solution to the command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1201) Add a python command-line interface for Hive
Add a python command-line interface for Hive Key: HIVE-1201 URL: https://issues.apache.org/jira/browse/HIVE-1201 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao Assignee: Venky Iyer Venky has a nice Python command-line interface for Hive. It uses the Thrift API to talk to the metastore and the Hadoop command line to submit jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition
[ https://issues.apache.org/jira/browse/HIVE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1200: - Attachment: HIVE-1200.1.branch-0.5.patch HIVE-1200.1.patch Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition -- Key: HIVE-1200 URL: https://issues.apache.org/jira/browse/HIVE-1200 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.5.1, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1200.1.branch-0.5.patch, HIVE-1200.1.patch The CombineHiveInputFormat does not work with multiple levels of directories in a single table/partition, because it uses exact-match logic instead of the relativize logic used in MapOperator: {code} MapOperator.java: if (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1200) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition
[ https://issues.apache.org/jira/browse/HIVE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1200: - Status: Patch Available (was: Open) Fix CombineHiveInputFormat to work with multi-level of directories in a single table/partition -- Key: HIVE-1200 URL: https://issues.apache.org/jira/browse/HIVE-1200 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.5.1, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1200.1.branch-0.5.patch, HIVE-1200.1.patch The CombineHiveInputFormat does not work with multiple levels of directories in a single table/partition, because it uses exact-match logic instead of the relativize logic used in MapOperator: {code} MapOperator.java: if (!onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri())) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838718#action_12838718 ] Zheng Shao commented on HIVE-259: - Hi Jerome, using ArrayList<Integer> won't cause unnecessary Object creation. We will just create a single ArrayList<Integer> and use it forever. Does that make sense? Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
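The reuse pattern described in the comment can be sketched as follows; ReuseDemo and fill are hypothetical names, not Hive code. Note that autoboxing of int values may still allocate Integer objects outside the small-value cache:

```java
import java.util.ArrayList;

public class ReuseDemo {
    // Allocated once and reused; clear() keeps the backing array around.
    private final ArrayList<Integer> buffer = new ArrayList<Integer>();

    /** Refills the single shared buffer instead of creating a new list per call. */
    public ArrayList<Integer> fill(int n) {
        buffer.clear();
        for (int i = 0; i < n; i++) {
            buffer.add(i); // autoboxing may still allocate Integer objects
        }
        return buffer;
    }
}
```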
[jira] Commented: (HIVE-1193) ensure sorting properties for a table
[ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838737#action_12838737 ] Zheng Shao commented on HIVE-1193: -- Can we have some more description on the JIRA? The patch contains 2 properties, enforceBucketing and enforceSorting, but I don't see them described in the JIRA. 1. How do we make sure that the data is bucketed / sorted? By adding an additional map-reduce job? 2. What if the user already specified CLUSTER BY key in his query? 3. Do we disable merging of small files when we do this? ensure sorting properties for a table - Key: HIVE-1193 URL: https://issues.apache.org/jira/browse/HIVE-1193 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.6.0 Attachments: hive.1193.1.patch If a table is sorted and data is being inserted into it, we currently don't make sure that the data is sorted. That might be useful for some downstream operations. This cannot be made the default due to backward compatibility, but an option can be added for it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1195) Increase ObjectInspector[] length on demand
Increase ObjectInspector[] length on demand --- Key: HIVE-1195 URL: https://issues.apache.org/jira/browse/HIVE-1195 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao {code} Operator.java protected transient ObjectInspector[] inputObjInspectors = new ObjectInspector[Short.MAX_VALUE]; {code} An array of 32K elements takes 256KB of memory under 64-bit Java. We are seeing the Hive client go out of memory because of that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
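The grow-on-demand idea can be sketched like this (GrowDemo is a hypothetical illustration, not the actual Operator.java change; it stores plain Objects rather than ObjectInspectors to stay self-contained):

```java
import java.util.Arrays;

public class GrowDemo {
    // Start small instead of preallocating Short.MAX_VALUE entries.
    private Object[] inspectors = new Object[1];

    /** Stores oi at index tag, doubling the array when needed (amortized O(1)). */
    public void set(int tag, Object oi) {
        if (tag >= inspectors.length) {
            int newLength = Math.max(inspectors.length * 2, tag + 1);
            inspectors = Arrays.copyOf(inspectors, newLength);
        }
        inspectors[tag] = oi;
    }

    public Object get(int tag) {
        return tag < inspectors.length ? inspectors[tag] : null;
    }
}
```

Operators with one or two parents then pay for one or two slots instead of 32K.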
[jira] Updated: (HIVE-1195) Increase ObjectInspector[] length on demand
[ https://issues.apache.org/jira/browse/HIVE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1195: - Attachment: HIVE-1195.1.patch Increase ObjectInspector[] length on demand --- Key: HIVE-1195 URL: https://issues.apache.org/jira/browse/HIVE-1195 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1195.1.patch {code} Operator.java protected transient ObjectInspector[] inputObjInspectors = new ObjectInspector[Short.MAX_VALUE]; {code} An array of 32K elements takes 256KB of memory under 64-bit Java. We are seeing the Hive client go out of memory because of that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1195) Increase ObjectInspector[] length on demand
[ https://issues.apache.org/jira/browse/HIVE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1195: - Fix Version/s: 0.6.0 0.5.1 Status: Patch Available (was: Open) Increase ObjectInspector[] length on demand --- Key: HIVE-1195 URL: https://issues.apache.org/jira/browse/HIVE-1195 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.5.1, 0.6.0 Attachments: HIVE-1195.1.patch {code} Operator.java protected transient ObjectInspector[] inputObjInspectors = new ObjectInspector[Short.MAX_VALUE]; {code} An array of 32K elements takes 256KB of memory under 64-bit Java. We are seeing the Hive client go out of memory because of that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1195) Increase ObjectInspector[] length on demand
[ https://issues.apache.org/jira/browse/HIVE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1195: - Attachment: HIVE-1195.2.patch HIVE-1195.2.branch-0.5.patch Fixed an obvious bug that caused unit test failures. Increase ObjectInspector[] length on demand --- Key: HIVE-1195 URL: https://issues.apache.org/jira/browse/HIVE-1195 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0, 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.5.1, 0.6.0 Attachments: HIVE-1195-branch-0.5.patch, HIVE-1195.1.patch, HIVE-1195.2.branch-0.5.patch, HIVE-1195.2.patch {code} Operator.java protected transient ObjectInspector[] inputObjInspectors = new ObjectInspector[Short.MAX_VALUE]; {code} An array of 32K elements takes 256KB of memory under 64-bit Java. We are seeing the Hive client go out of memory because of that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838118#action_12838118 ] Zheng Shao commented on HIVE-259: - Also see http://wiki.apache.org/hadoop/Hive/HowToContribute#Coding_Convention Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838119#action_12838119 ] Zheng Shao commented on HIVE-259: - The test cases look a bit too trivial, or the results have problems: they always return the same number for the 3 different percentile values. Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx Compute at least the 25th, 50th, and 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1194) sorted merge join
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838120#action_12838120 ] Zheng Shao commented on HIVE-1194: -- Why does SortMergeJoinOperator extend MapJoinOperator? It seems to me that SortMergeJoinOperator does NOT need the in-memory/disk-backed HashMap that MapJoinOperator has, correct? sorted merge join - Key: HIVE-1194 URL: https://issues.apache.org/jira/browse/HIVE-1194 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: He Yongqiang Fix For: 0.6.0 If the input tables are sorted on the join key and a mapjoin is being performed, it is useful to exploit the sorted properties of the tables. This can lead to substantial CPU savings - this needs to work across bucketed map joins also. Since sorted properties of a table are not currently enforced, a new parameter can be added to specify the use of the sort-merge join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1194) sorted merge join
[ https://issues.apache.org/jira/browse/HIVE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838132#action_12838132 ] Zheng Shao commented on HIVE-1194: -- If it does not inherit any methods, shall we add an AbstractMapJoinOperator as the common parent? That AbstractMapJoinOperator can be converted to MapJoinOperator (or HashBasedMapJoinOperator, to be accurate) or SortMergeJoinOperator depending on the configuration/table properties. sorted merge join - Key: HIVE-1194 URL: https://issues.apache.org/jira/browse/HIVE-1194 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: He Yongqiang Fix For: 0.6.0 If the input tables are sorted on the join key and a mapjoin is being performed, it is useful to exploit the sorted properties of the tables. This can lead to substantial CPU savings - this needs to work across bucketed map joins also. Since sorted properties of a table are not currently enforced, a new parameter can be added to specify the use of the sort-merge join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
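The hierarchy proposed in the comment might look roughly like this; a hypothetical sketch, with placeholder method names and bodies rather than the real Hive operator API:

```java
import java.util.HashMap;

/** Common parent carrying only what both join flavors share. */
abstract class AbstractMapJoinOperator {
    abstract void processJoin(Object row);
}

/** Hash-based variant: owns the in-memory/disk-backed hash table. */
class HashBasedMapJoinOperator extends AbstractMapJoinOperator {
    private final HashMap<Object, Object> hashTable = new HashMap<Object, Object>();

    @Override
    void processJoin(Object row) {
        // probe hashTable for matching keys
        hashTable.get(row);
    }
}

/** Sort-merge variant: streams both sorted inputs, no hash table needed. */
class SortMergeJoinOperator extends AbstractMapJoinOperator {
    @Override
    void processJoin(Object row) {
        // advance the sorted iterators in lock step
    }
}
```

With this split, the planner can pick either subclass at conversion time without the sort-merge path dragging in the hash table.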
[jira] Commented: (HIVE-1189) Add package-info.java to Hive
[ https://issues.apache.org/jira/browse/HIVE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838148#action_12838148 ] Zheng Shao commented on HIVE-1189: -- I am checking the BuildVersion, which contains everything. I need to think of a way to do a negative test. Add package-info.java to Hive - Key: HIVE-1189 URL: https://issues.apache.org/jira/browse/HIVE-1189 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.6.0 Attachments: HIVE-1189.1.patch Hadoop automatically generates build/src/org/apache/hadoop/package-info.java with information like this: {code} /* * Generated by src/saveVersion.sh */ @HadoopVersionAnnotation(version="0.20.2-dev", revision="826568", user="zshao", date="Sun Oct 18 17:46:56 PDT 2009", url="http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20") package org.apache.hadoop; {code} Hive should do the same thing so that we can easily know the version of the code at runtime. This will help us identify whether we are still running the same version of Hive, if we serialize the plan and later continue the execution (See HIVE-1100). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
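Reading such a version annotation back at runtime works via reflection. A minimal self-contained sketch follows; for simplicity the annotation is placed on a class here, whereas Hadoop declares it at package level in the generated package-info.java, and all names are hypothetical:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class VersionDemo {
    // RUNTIME retention is what makes the value readable via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @interface VersionAnnotation {
        String version();
        String user();
    }

    // For a self-contained demo the annotation sits on a class; Hadoop/Hive
    // put it on the package, declared in the generated package-info.java.
    @VersionAnnotation(version = "0.6.0-dev", user = "zshao")
    static class Build {}

    /** Reads the version string back at runtime. */
    public static String version() {
        return Build.class.getAnnotation(VersionAnnotation.class).version();
    }
}
```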
[jira] Commented: (HIVE-1032) Better Error Messages for Execution Errors
[ https://issues.apache.org/jira/browse/HIVE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838156#action_12838156 ] Zheng Shao commented on HIVE-1032: -- That makes sense to me. As long as it's compilable with 0.17, it should be OK. Sorry, there is another last thing :) Can you run ant checkstyle and fix the checkstyle warnings introduced by this patch (especially in the new files)? Better Error Messages for Execution Errors -- Key: HIVE-1032 URL: https://issues.apache.org/jira/browse/HIVE-1032 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: Paul Yang Assignee: Paul Yang Attachments: HIVE-1032.1.patch, HIVE-1032.2.patch, HIVE-1032.3.patch, HIVE-1032.4.patch, HIVE-1032.5.patch Three common errors that occur during execution are: 1. Map-side group-by causing an out of memory exception due to large aggregation hash tables 2. ScriptOperator failing due to the user's script throwing an exception or otherwise returning a non-zero error code 3. Incorrectly specifying the join order of small and large tables, causing the large table to be loaded into memory and producing an out of memory exception. These errors are typically discovered by manually examining the error log files of the failed task. This task proposes to create a feature that would automatically read the error logs and output a probable cause and solution to the command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1184) Expression Not In Group By Key error is sometimes masked
[ https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-1184: - Status: Open (was: Patch Available) Expression Not In Group By Key error is sometimes masked Key: HIVE-1184 URL: https://issues.apache.org/jira/browse/HIVE-1184 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0 Reporter: Paul Yang Assignee: Paul Yang Attachments: HIVE-1184.1.patch Depending on the order of expressions, the error message for an expression not in the group-by key is not displayed; instead it is null. {code} hive> select concat(value, concat(value)) from src group by concat(value); FAILED: Error in semantic analysis: null hive> select concat(concat(value), value) from src group by concat(value); FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key value {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1184) Expression Not In Group By Key error is sometimes masked
[ https://issues.apache.org/jira/browse/HIVE-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837476#action_12837476 ] Zheng Shao commented on HIVE-1184: -- The explanation looks good to me, but I am not convinced the solution will solve the problem. When processing concat(value, concat(value)), we will set the error when processing the first value, then overwrite the error when processing the second value, correct? I think the error should be part of the return value of the process function, instead of a global field in the context. Does that make sense? Expression Not In Group By Key error is sometimes masked Key: HIVE-1184 URL: https://issues.apache.org/jira/browse/HIVE-1184 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.6.0 Reporter: Paul Yang Assignee: Paul Yang Attachments: HIVE-1184.1.patch Depending on the order of expressions, the error message for an expression not in the group-by key is not displayed; instead it is null. {code} hive> select concat(value, concat(value)) from src group by concat(value); FAILED: Error in semantic analysis: null hive> select concat(concat(value), value) from src group by concat(value); FAILED: Error in semantic analysis: line 1:29 Expression Not In Group By Key value {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
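The suggestion of carrying the error in the return value rather than in a shared context field can be sketched as follows (ProcessResult and its methods are a hypothetical illustration, not the actual patch):

```java
public class ProcessResult {
    final Object value;  // translated expression, null on failure
    final String error;  // non-null iff this subtree failed

    private ProcessResult(Object value, String error) {
        this.value = value;
        this.error = error;
    }

    static ProcessResult ok(Object v) {
        return new ProcessResult(v, null);
    }

    static ProcessResult fail(String message) {
        return new ProcessResult(null, message);
    }

    /** Combines child results: the first error wins and is never overwritten. */
    static ProcessResult combine(ProcessResult first, ProcessResult second) {
        return first.error != null ? first : second;
    }
}
```

Because a later successful child cannot clobber an earlier failure, the "Expression Not In Group By Key" message would survive regardless of expression order.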