[jira] [Updated] (HIVE-4345) Pushing down query conditions to support on-the-fly filtering at file parsing
[ https://issues.apache.org/jira/browse/HIVE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifeng Geng updated HIVE-4345: -- Attachment: hive-0.10.0.patch2 Pushing down query conditions to support on-the-fly filtering at file parsing - Key: HIVE-4345 URL: https://issues.apache.org/jira/browse/HIVE-4345 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: Yifeng Geng Labels: patch Fix For: 0.10.0 Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4345) Pushing down query conditions to support on-the-fly filtering at file parsing
[ https://issues.apache.org/jira/browse/HIVE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifeng Geng updated HIVE-4345: -- Attachment: (was: hive-0.10.0.patch2) Pushing down query conditions to support on-the-fly filtering at file parsing - Key: HIVE-4345 URL: https://issues.apache.org/jira/browse/HIVE-4345 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: Yifeng Geng Labels: patch Fix For: 0.10.0 Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4345) Pushing down query conditions to support on-the-fly filtering at file parsing
[ https://issues.apache.org/jira/browse/HIVE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifeng Geng updated HIVE-4345: -- Status: Patch Available (was: Open) Pushing down query conditions to support on-the-fly filtering at file parsing - Key: HIVE-4345 URL: https://issues.apache.org/jira/browse/HIVE-4345 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: Yifeng Geng Labels: patch Fix For: 0.10.0 Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4345) Pushing down query conditions to support on-the-fly filtering at file parsing
[ https://issues.apache.org/jira/browse/HIVE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifeng Geng updated HIVE-4345: -- Attachment: HIVE-4345.patch Pushing down query conditions to support on-the-fly filtering at file parsing - Key: HIVE-4345 URL: https://issues.apache.org/jira/browse/HIVE-4345 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: Yifeng Geng Labels: patch Fix For: 0.10.0 Attachments: HIVE-4345.patch Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4345) Pushing down query conditions to support on-the-fly filtering at file parsing
[ https://issues.apache.org/jira/browse/HIVE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifeng Geng updated HIVE-4345: -- Description: Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files(NetCDF files for example). (was: Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files.) Pushing down query conditions to support on-the-fly filtering at file parsing - Key: HIVE-4345 URL: https://issues.apache.org/jira/browse/HIVE-4345 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: Yifeng Geng Labels: patch Fix For: 0.10.0 Attachments: HIVE-4345.patch Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files(NetCDF files for example). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4345) Pushing down query conditions to support on-the-fly filtering at the file parsing
[ https://issues.apache.org/jira/browse/HIVE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifeng Geng updated HIVE-4345: -- Summary: Pushing down query conditions to support on-the-fly filtering at the file parsing (was: Pushing down query conditions to support on-the-fly filtering at file parsing) Pushing down query conditions to support on-the-fly filtering at the file parsing - Key: HIVE-4345 URL: https://issues.apache.org/jira/browse/HIVE-4345 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: Yifeng Geng Labels: patch Fix For: 0.10.0 Attachments: HIVE-4345.patch Serialize predicate conditions in query plan to MapredWork class, so the FileFormat class can use the conditions to do on-the-fly filtering on the files. It can improve the performance a lot for processsing certain binary files(NetCDF files for example). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4294) Single sourced multi query cannot handle lateral view
[ https://issues.apache.org/jira/browse/HIVE-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4294: - Attachment: hive.4294.3.patch Single sourced multi query cannot handle lateral view - Key: HIVE-4294 URL: https://issues.apache.org/jira/browse/HIVE-4294 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: hive.4294.3.patch, HIVE-4294.D10161.1.patch, HIVE-4294.D10161.2.patch For example, {noformat} hive explain from src select key, C lateral view explode(array(key, value)) A as C; FAILED: ParseException line 3:22 missing EOF at 'view' near 'lateral' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4280) TestRetryingHMSHandler is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4280: - Attachment: HIVE-4280-2.patch.txt HIVE-4280-1.patch.txt I uploaded two patches. HIVE-4280-2.patch.txt is [~ashutoshc]'s suggestion. In HIVE-4280-1.patch.txt, other database names are changed, too. Both of them passed tests. TestRetryingHMSHandler is failing on trunk. --- Key: HIVE-4280 URL: https://issues.apache.org/jira/browse/HIVE-4280 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Ashutosh Chauhan Assignee: Teddy Choi Attachments: HIVE-4280-1.patch.txt, HIVE-4280-2.patch.txt Newly added testcase TestRetryingHMSHandler fails on trunk. https://builds.apache.org/job/Hive-trunk-h0.21/2040/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4280) TestRetryingHMSHandler is failing on trunk.
[ https://issues.apache.org/jira/browse/HIVE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-4280: - Fix Version/s: 0.11.0 Status: Patch Available (was: Open) TestRetryingHMSHandler is failing on trunk. --- Key: HIVE-4280 URL: https://issues.apache.org/jira/browse/HIVE-4280 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Ashutosh Chauhan Assignee: Teddy Choi Fix For: 0.11.0 Attachments: HIVE-4280-1.patch.txt, HIVE-4280-2.patch.txt Newly added testcase TestRetryingHMSHandler fails on trunk. https://builds.apache.org/job/Hive-trunk-h0.21/2040/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4241) optimize hive.enforce.sorting and hive.enforce bucketing join
[ https://issues.apache.org/jira/browse/HIVE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629901#comment-13629901 ] Hudson commented on HIVE-4241: -- Integrated in Hive-trunk-h0.21 #2058 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2058/]) HIVE-4241 optimize hive.enforce.sorting and hive.enforce bucketing join (Namit Jain via Gang Tim Liu) (Revision 1467174) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467174 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_1.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_2.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_3.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_4.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_5.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_6.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_7.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_8.q * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_1.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_2.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_3.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_5.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_7.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_8.q.out optimize hive.enforce.sorting and hive.enforce bucketing join - Key: HIVE-4241 URL: https://issues.apache.org/jira/browse/HIVE-4241 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.11.0 Attachments: hive.4241.1.patch, hive.4241.1.patch-nohcat, hive.4241.2.patch-nohcat, hive.4241.3.patch, hive.4241.4.patch Consider the following scenario: T1: sorted and bucketed by key into 2 buckets T2: sorted and bucketed by key into 2 buckets T3: sorted and bucketed by key into 2 buckets set hive.enforce.sorting=true; set hive.enforce.bucketing=true; insert overwrite table T3 select .. from T1 join T2 on T1.key = T2.key; Since T1, T2 and T3 are sorted/bucketed by the join, and the above join is being performed as a sort-merge join, T3 should be bucketed/sorted without the need for an extra reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4241) optimize hive.enforce.sorting and hive.enforce bucketing join
[ https://issues.apache.org/jira/browse/HIVE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629900#comment-13629900 ] Hudson commented on HIVE-4241: -- Integrated in Hive-trunk-hadoop2 #153 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/153/]) HIVE-4241 optimize hive.enforce.sorting and hive.enforce bucketing join (Namit Jain via Gang Tim Liu) (Revision 1467174) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467174 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_1.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_2.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_3.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_4.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_5.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_6.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_7.q * /hive/trunk/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_8.q * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_1.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_2.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_3.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_5.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_7.q.out * /hive/trunk/ql/src/test/results/clientpositive/bucketsortoptimize_insert_8.q.out optimize hive.enforce.sorting and hive.enforce bucketing join - Key: HIVE-4241 URL: https://issues.apache.org/jira/browse/HIVE-4241 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.11.0 Attachments: hive.4241.1.patch, hive.4241.1.patch-nohcat, hive.4241.2.patch-nohcat, hive.4241.3.patch, hive.4241.4.patch Consider the following scenario: T1: sorted and bucketed by key into 2 buckets T2: sorted and bucketed by key into 2 buckets T3: sorted and bucketed by key into 2 buckets set hive.enforce.sorting=true; set hive.enforce.bucketing=true; insert overwrite table T3 select .. from T1 join T2 on T1.key = T2.key; Since T1, T2 and T3 are sorted/bucketed by the join, and the above join is being performed as a sort-merge join, T3 should be bucketed/sorted without the need for an extra reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4294) Single sourced multi query cannot handle lateral view
[ https://issues.apache.org/jira/browse/HIVE-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4294: - Attachment: hive.4294.4.patch Single sourced multi query cannot handle lateral view - Key: HIVE-4294 URL: https://issues.apache.org/jira/browse/HIVE-4294 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: hive.4294.3.patch, hive.4294.4.patch, HIVE-4294.D10161.1.patch, HIVE-4294.D10161.2.patch For example, {noformat} hive explain from src select key, C lateral view explode(array(key, value)) A as C; FAILED: ParseException line 3:22 missing EOF at 'view' near 'lateral' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4294) Single sourced multi query cannot handle lateral view
[ https://issues.apache.org/jira/browse/HIVE-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4294: - Resolution: Fixed Fix Version/s: 0.11.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed. Thanks Navis Single sourced multi query cannot handle lateral view - Key: HIVE-4294 URL: https://issues.apache.org/jira/browse/HIVE-4294 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.11.0 Attachments: hive.4294.3.patch, hive.4294.4.patch, HIVE-4294.D10161.1.patch, HIVE-4294.D10161.2.patch For example, {noformat} hive explain from src select key, C lateral view explode(array(key, value)) A as C; FAILED: ParseException line 3:22 missing EOF at 'view' near 'lateral' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3891: - Attachment: hive.3891.10.patch physical optimizer changes for auto sort-merge join --- Key: HIVE-3891 URL: https://issues.apache.org/jira/browse/HIVE-3891 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: auto_sortmerge_join_1.q, auto_sortmerge_join_1.q.out, hive.3891.10.patch, hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch, HIVE-3891_8.patch, hive.3891.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4294) Single sourced multi query cannot handle lateral view
[ https://issues.apache.org/jira/browse/HIVE-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629933#comment-13629933 ] Hudson commented on HIVE-4294: -- Integrated in Hive-trunk-h0.21 #2059 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2059/]) HIVE-4294 Single sourced multi query cannot handle lateral view (Navis via namit) (Revision 1467196) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467196 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_lateral_view.q * /hive/trunk/ql/src/test/results/clientpositive/multi_insert_lateral_view.q.out Single sourced multi query cannot handle lateral view - Key: HIVE-4294 URL: https://issues.apache.org/jira/browse/HIVE-4294 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.11.0 Attachments: hive.4294.3.patch, hive.4294.4.patch, HIVE-4294.D10161.1.patch, HIVE-4294.D10161.2.patch For example, {noformat} hive explain from src select key, C lateral view explode(array(key, value)) A as C; FAILED: ParseException line 3:22 missing EOF at 'view' near 'lateral' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4294) Single sourced multi query cannot handle lateral view
[ https://issues.apache.org/jira/browse/HIVE-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629936#comment-13629936 ] Hudson commented on HIVE-4294: -- Integrated in Hive-trunk-hadoop2 #154 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/154/]) HIVE-4294 Single sourced multi query cannot handle lateral view (Navis via namit) (Revision 1467196) Result = FAILURE namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467196 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java * /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_lateral_view.q * /hive/trunk/ql/src/test/results/clientpositive/multi_insert_lateral_view.q.out Single sourced multi query cannot handle lateral view - Key: HIVE-4294 URL: https://issues.apache.org/jira/browse/HIVE-4294 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.11.0 Attachments: hive.4294.3.patch, hive.4294.4.patch, HIVE-4294.D10161.1.patch, HIVE-4294.D10161.2.patch For example, {noformat} hive explain from src select key, C lateral view explode(array(key, value)) A as C; FAILED: ParseException line 3:22 missing EOF at 'view' near 'lateral' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Hive-trunk-hadoop2 - Build # 154 - Still Failing
Changes for Build #138 [namit] HIVE-4289 HCatalog build fails when behind a firewall (Samuel Yuan via namit) [namit] HIVE-4281 add hive.map.groupby.sorted.testmode (Namit via Gang Tim Liu) [hashutosh] Moving hcatalog site outside of trunk [hashutosh] Moving hcatalog branches outside of trunk [hashutosh] HIVE-4259 : SEL operator created with missing columnExprMap for unions (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4156 : need to add protobuf classes to hive-exec.jar (Owen Omalley via Ashutosh Chauhan) [hashutosh] HIVE-3464 : Merging join tree may reorder joins which could be invalid (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4138 : ORC's union object inspector returns a type name that isn't parseable by TypeInfoUtils (Owen Omalley via Ashutosh Chauhan) [cws] HIVE-4119. ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty (Shreepadma Venugopalan via cws) [hashutosh] HIVE-4252 : hiveserver2 string representation of complex types are inconsistent with cli (Thejas Nair via Ashutosh Chauhan) [hashutosh] HIVE-4179 : NonBlockingOpDeDup does not merge SEL operators correctly (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4269 : fix handling of binary type in hiveserver2, jdbc driver (Thejas Nair via Ashutosh Chauhan) [namit] HIVE-4174 Round UDF converts BigInts to double (Chen Chun via namit) [namit] HIVE-4240 optimize hive.enforce.bucketing and hive.enforce sorting insert (Gang Tim Liu via namit) [navis] HIVE-4288 Add IntelliJ project files files to .gitignore (Roshan Naik via Navis) [namit] HIVE-4272 partition wise metadata does not work for text files [hashutosh] HIVE-896 : Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. (Harish Butani via Ashutosh Chauhan) [namit] HIVE-4260 union_remove_12, union_remove_13 are failing on hadoop2 (Gunther Hagleitner via namit) [hashutosh] HIVE-3951 : Allow Decimal type columns in Regex Serde (Mark Grover via Ashutosh Chauhan) [namit] HIVE-4270 bug in hive.map.groupby.sorted in the presence of multiple input partitions (Namit via Gang Tim Liu) [hashutosh] HIVE-3850 : hour() function returns 12 hour clock value when using timestamp datatype (Anandha and Franklin via Ashutosh Chauhan) [hashutosh] HIVE-4122 : Queries fail if timestamp data not in expected format (Prasad Mujumdar via Ashutosh Chauhan) [hashutosh] HIVE-4170 : [REGRESSION] FsShell.close closes filesystem, removing temporary directories (Navis via Ashutosh Chauhan) [gates] HIVE-4264 Moved hcatalog trunk code up to hive/trunk/hcatalog [hashutosh] HIVE-4263 : Adjust build.xml package command to move all hcat jars and binaries into build (Alan Gates via Ashutosh Chauhan) [namit] HIVE-4258 Log logical plan tree for debugging (Navis via namit) [navis] HIVE-2264 Hive server is SHUTTING DOWN when invalid queries beeing executed [kevinwilfong] HIVE-4235. CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists. (Gang Tim Liu via kevinwilfong) [gangtimliu] HIVE-4157: ORC runs out of heap when writing (Kevin Wilfong vi Gang Tim Liu) [gangtimliu] HIVE-4155: Expose ORC's FileDump as a service [gangtimliu] HIVE-4159:RetryingHMSHandler doesn't retry in enough cases (Kevin Wilfong vi Gang Tim Liu) [namit] HIVE-4149 wrong results big outer joins with array of ints (Navis via namit) [namit] HIVE-3958 support partial scan for analyze command - RCFile (Gang Tim Liu via namit) [gates] Removing old branches to limit size of Hive downloads. [gates] Removing tags directory as we no longer need them and they're in the history. [gates] Moving HCatalog into Hive. [gates] Test that perms work for hcatalog [hashutosh] HIVE-4007 : Create abstract classes for serializer and deserializer (Namit Jain via Ashutosh Chauhan) [hashutosh] HIVE-3381 : Result of outer join is not valid (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3980 : Cleanup after 3403 (Namit Jain via Ashutosh Chauhan) [hashutosh] HIVE-4042 : ignore mapjoin hint (Namit Jain via Ashutosh Chauhan) [namit] HIVE-3348 semi-colon in comments in .q file does not work (Nick Collins via namit) [namit] HIVE-4212 sort merge join should work for outer joins for more than 8 inputs (Namit via Gang Tim Liu) [namit] HIVE-4219 explain dependency does not capture the input table (Namit via Gang Tim Liu) [kevinwilfong] HIVE-4092. Store complete names of tables in column access analyzer (Samuel Yuan via kevinwilfong) [namit] HIVE-4208 Clientpositive test parenthesis_star_by is non-deteministic (Mark Grover via namit) [cws] HIVE-4217. Fix show_create_table_*.q test failures (Carl Steinbach via cws) [namit] HIVE-4206 Sort merge join does not work for outer joins for 7 inputs (Namit via Gang Tim Liu) [kevinwilfong] HIVE-4188. TestJdbcDriver2.testDescribeTable failing consistently. (Prasad Mujumdar via kevinwilfong) [hashutosh] HIVE-3820 Consider creating a literal like D or BD for representing Decimal type constants (Gunther Hagleitner
[jira] [Updated] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3891: - Attachment: hive.3891.11.patch physical optimizer changes for auto sort-merge join --- Key: HIVE-3891 URL: https://issues.apache.org/jira/browse/HIVE-3891 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: auto_sortmerge_join_1.q, auto_sortmerge_join_1.q.out, hive.3891.10.patch, hive.3891.11.patch, hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch, HIVE-3891_8.patch, hive.3891.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4346) when writing data into filesystem from queries ,the output files could contain a line of column names
caofangkun created HIVE-4346: Summary: when writing data into filesystem from queries ,the output files could contain a line of column names Key: HIVE-4346 URL: https://issues.apache.org/jira/browse/HIVE-4346 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: caofangkun Priority: Minor For example : hivedesc src; key string value string hiveselect * from src; 1 10 2 20 hiveset hive.output.contain.columnnames=true; hiveinsert overwrite local directory './test1' select * from src ; hive!cat './test1/00_0'; key^Avalue 1^A10 2^A20 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain reassigned HIVE-4167: Assignee: Namit Jain (was: Vikram Dixit K) Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629948#comment-13629948 ] Namit Jain commented on HIVE-4167: -- I was able to reproduce it: CREATE TABLE bucket_small (key string, value string) partitioned by (ds string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; load data local inpath '../data/files/smallsrcsortbucket1outof4.txt' INTO TABLE bucket_small partition(ds='2008-04-08'); load data local inpath '../data/files/smallsrcsortbucket2outof4.txt' INTO TABLE bucket_small partition(ds='2008-04-08'); CREATE TABLE bucket_big (key string, value string) partitioned by (ds string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE; load data local inpath '../data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-08'); load data local inpath '../data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-08'); load data local inpath '../data/files/srcsortbucket3outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-08'); load data local inpath '../data/files/srcsortbucket4outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-08'); load data local inpath '../data/files/srcsortbucket1outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-09'); load data local inpath '../data/files/srcsortbucket2outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-09'); load data local inpath '../data/files/srcsortbucket3outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-09'); load data local inpath '../data/files/srcsortbucket4outof4.txt' INTO TABLE bucket_big partition(ds='2008-04-09'); set hive.auto.convert.join=true; set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; -- Since size is being used to find the big table, the order of the tables in the join does not matter explain extended select count(*) FROM bucket_small a JOIN bucket_big b ON a.key = b.key; select count(*) FROM bucket_small a JOIN bucket_big b ON a.key = b.key; explain extended select count(*) FROM bucket_big a JOIN bucket_small b ON a.key = b.key; select count(*) FROM bucket_big a JOIN bucket_small b ON a.key = b.key; Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629956#comment-13629956 ] Vikram Dixit K commented on HIVE-4167: -- Hi [~namit] I have a rebased patch on trunk. I was trying to produce a test using the tables available in the unit tests. Can I use the test you have provided in this jira? Thanks Vikram. Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629957#comment-13629957 ] Gunther Hagleitner commented on HIVE-4318: -- [~kevinwilfong]: Here are the additional numbers. Summary: You were right about counters having a significant effect despite the flag, but OperatorHooks are definitely expensive too. All tests were run on EC2, single node setup. I used ~3m rows, single table, stored in rc file. Query was count\(*\) with a simple not very selective where clause. I've ran each different build 5 times and averaged the last 3 runs. There was little difference between the runs. Hive.task.progress was off in all runs, no actual operator hooks were installed. I've also tested both removing counters and a fixed version of counters. The fixed version places the check for the flag at the right place to avoid unnecessary calls to System.currentTimeMillis(), as well as unnecessary counting of the rows, etc. Numbers: {noformat} Current trunk: 44.5 seconds Fix for counters, unchanged operator hooks: 33.5 seconds (Kevin, that's the run you asked for) Fix for counters, removal of operator hooks: 29.3 seconds Removal of both operator hooks and counters completely: 27.9 seconds {noformat} Proposal: - Remove operator hooks and backport to 0.11 branch. That's a regression that was introduced between 0.10 and 0.11, I believe. - Remove profiler for now and backport to 0.11 branch. Profiler doesn't work without operator hooks right now. I'll open a jira to re-introduce profiler in a way that doesn't add any code to the inner loop (maybe hidden behind static final var that is false, so compiler removes it). - Counters: Change this patch to include my fix for counters and backport to 0.11. This gives us a significant boost, but isn't a regression from the last version. I'll open a jira to dig deeper and see if we can get even closer to the result with the counters completely removed. How does that sound? OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-3996: - Status: Patch Available (was: Open) Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996_4.patch, HIVE-3996_5.patch, HIVE-3996_6.patch, HIVE-3996_7.patch, HIVE-3996_8.patch, HIVE-3996_9.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-446) Implement TRUNCATE
[ https://issues.apache.org/jira/browse/HIVE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629960#comment-13629960 ] caofangkun commented on HIVE-446: - Hi ALL: Whether it is necessary to enhance the syntax like TRUNCATE TABLE srcpart_truncate PARTITION (dt='201130412') FORCE; to remove data from EXTERNAL table ? Implement TRUNCATE -- Key: HIVE-446 URL: https://issues.apache.org/jira/browse/HIVE-446 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Prasad Chakka Assignee: Navis Fix For: 0.11.0 Attachments: HIVE-446.D7371.1.patch, HIVE-446.D7371.2.patch, HIVE-446.D7371.3.patch, HIVE-446.D7371.4.patch truncate the data but leave the table and metadata intact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629965#comment-13629965 ] Vikram Dixit K commented on HIVE-4167: -- Hi Namit, I was able to reproduce this issue so far on my setup. However, I wasn't sure on how to reproduce this issue on using tables in unit-tests. I can provide an updated patch with your test right away. I am still actively working on this issue. Thanks Vikram. -- Nothing better than when appreciated for hard work. -Mark Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4167: - Attachment: hive.4167.1.patch Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: hive.4167.1.patch, HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629966#comment-13629966 ] Namit Jain commented on HIVE-4167: -- I have a fix, and a testcase for this. Can you take a look ? Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: hive.4167.1.patch, HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629967#comment-13629967 ] Namit Jain commented on HIVE-4167: -- https://reviews.facebook.net/D10209 Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: hive.4167.1.patch, HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629969#comment-13629969 ] Namit Jain commented on HIVE-3996: -- +1 Running tests Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996_4.patch, HIVE-3996_5.patch, HIVE-3996_6.patch, HIVE-3996_7.patch, HIVE-3996_8.patch, HIVE-3996_9.patch, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4106) SMB joins fail in multi-way joins
[ https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629979#comment-13629979 ] Namit Jain commented on HIVE-4106: -- [~vikram.dixit], can you load the failing query ? SMB joins fail in multi-way joins - Key: HIVE-4106 URL: https://issues.apache.org/jira/browse/HIVE-4106 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Blocker Attachments: auto_sortmerge_join_12.q, HIVE-4106.patch I see array out of bounds exception in case of multi way smb joins. This is related to changes that went in as part of HIVE-3403. This issue has been discussed in HIVE-3891. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4106) SMB joins fail in multi-way joins
[ https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629994#comment-13629994 ] Vikram Dixit K commented on HIVE-4106: -- [~namit] {noformat} select i_item_id ,i_item_desc ,i_current_price from item i join inventory inv on (inv.inv_item_sk = i.i_item_sk) join date_dim d on (d.d_date_sk = inv.inv_date_sk) join store_sales ss on (ss.ss_item_sk = i.i_item_sk) where i_current_price between 62 and 62+30.0 and d_date between '2000-05-25' and '2000-07-27' and i_manufact_id in (129,270,821,423) and inv_quantity_on_hand between 100 and 500 group by i_item_id,i_item_desc,i_current_price order by i_item_id limit 100; {noformat} This is the TPC-DS benchmark query (scale 1 but this does not matter) where store_sales and inventory are sorted as follows: store_sales: sorted by ss_item_sk, partitioned on ss_sold_date, clustered by ss_item_sk inventory: partitioned by (inv_date string) clustered by (inv_item_sk) sorted by (inv_item_sk) item: non-partitioned, bucketed clustered by (i_item_sk) sorted by (i_item_sk) date_dim: non-partitioned, non-bucketed. SMB joins fail in multi-way joins - Key: HIVE-4106 URL: https://issues.apache.org/jira/browse/HIVE-4106 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Blocker Attachments: auto_sortmerge_join_12.q, HIVE-4106.patch I see array out of bounds exception in case of multi way smb joins. This is related to changes that went in as part of HIVE-3403. This issue has been discussed in HIVE-3891. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join
[ https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3996: - Attachment: hive.3996.9.patch-nohcat Correctly enforce the memory limit on the multi-table map-join -- Key: HIVE-3996 URL: https://issues.apache.org/jira/browse/HIVE-3996 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-3996_2.patch, HIVE-3996_3.patch, HIVE-3996_4.patch, HIVE-3996_5.patch, HIVE-3996_6.patch, HIVE-3996_7.patch, HIVE-3996_8.patch, HIVE-3996_9.patch, hive.3996.9.patch-nohcat, HIVE-3996.patch Currently with HIVE-3784, the joins are converted to map-joins based on checks of the table size against the config variable: hive.auto.convert.join.noconditionaltask.size. However, the current implementation will also merge multiple mapjoin operators into a single task regardless of whether the sum of the table sizes will exceed the configured value. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4106) SMB joins fail in multi-way joins
[ https://issues.apache.org/jira/browse/HIVE-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629996#comment-13629996 ] Namit Jain commented on HIVE-4106: -- This should not a mutli-way SMB join - this is due to HIVE-4167. Can you apply the patch for 4167 and check if this works ? SMB joins fail in multi-way joins - Key: HIVE-4106 URL: https://issues.apache.org/jira/browse/HIVE-4106 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Blocker Attachments: auto_sortmerge_join_12.q, HIVE-4106.patch I see array out of bounds exception in case of multi way smb joins. This is related to changes that went in as part of HIVE-3403. This issue has been discussed in HIVE-3891. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4275) Hive does not differentiate scheme and authority in file uris
[ https://issues.apache.org/jira/browse/HIVE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4275: - Status: Patch Available (was: Open) Hive does not differentiate scheme and authority in file uris - Key: HIVE-4275 URL: https://issues.apache.org/jira/browse/HIVE-4275 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.0 Attachments: HIVE-4275.patch Consider the following set of queries: ALTER TABLE abc ADD PARTITION (x='0') LOCATION 'file:///foo'; ALTER TABLE abc ADD PARTITION (x='1') LOCATION '/foo'; select count(*) from abc; Even though there are different files under these directories, depending on number of mappers, the count produces a value = num of mappers * num of files in the 2 directories. This is incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4275) Hive does not differentiate scheme and authority in file uris
[ https://issues.apache.org/jira/browse/HIVE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4275: - Attachment: HIVE-4275.patch Without the change, hive generates count as 4 (when it should be 2). Hive does not differentiate scheme and authority in file uris - Key: HIVE-4275 URL: https://issues.apache.org/jira/browse/HIVE-4275 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.11.0 Attachments: HIVE-4275.patch Consider the following set of queries: ALTER TABLE abc ADD PARTITION (x='0') LOCATION 'file:///foo'; ALTER TABLE abc ADD PARTITION (x='1') LOCATION '/foo'; select count(*) from abc; Even though there are different files under these directories, depending on number of mappers, the count produces a value = num of mappers * num of files in the 2 directories. This is incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4278) HCat needs to get current Hive jars instead of pulling them from maven repo
[ https://issues.apache.org/jira/browse/HIVE-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-4278: --- Attachment: HIVE-4278.approach2.patch Hi folks, I tried another approach that might hopefully be acceptable, and have uploaded a patch for that. The premise: a) We want to unblock HCat's ability to build from inside hive, picking the latest jars hive builds. b) We want to do this without changing how the hive workflow looks like, including not changing publishing to the current ivy cache dir. c) We don't want to perform invasive surgery to HCat switching it back to ivy either, till we can form a consensus as to which build tool we should standardize around. Assumption: a) Builtins can be removed ( HIVE-4304 and hive-dev mailing list discussion ) or, rather, at least, in the meanwhile, be disabled as a transitive dependency from other targets. b) It is okay for hcat, during its build, to look at the jars hive has just built, and publish that to the local maven cache. Future work: a) Trying to unify version numbers between hcat and hive - I'm still unhappy about the number of files in which the string 0.12.0-SNAPSHOT occurs. Is this compromise acceptable to all? Thoughts? HCat needs to get current Hive jars instead of pulling them from maven repo --- Key: HIVE-4278 URL: https://issues.apache.org/jira/browse/HIVE-4278 Project: Hive Issue Type: Sub-task Components: Build Infrastructure, HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Travis Crawford Priority: Blocker Fix For: 0.11.0 Attachments: HIVE-4278.approach2.patch, HIVE-4278.D9981.1.patch The HCatalog build is currently pulling Hive jars from the maven repo instead of using the ones built as part of the current build. Now that it is part of Hive it should use the jars being built instead of pulling them from maven. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4167) Hive converts bucket map join to SMB join even when tables are not sorted
[ https://issues.apache.org/jira/browse/HIVE-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4167: - Attachment: hive.4167.2.patch Hive converts bucket map join to SMB join even when tables are not sorted - Key: HIVE-4167 URL: https://issues.apache.org/jira/browse/HIVE-4167 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Namit Jain Priority: Blocker Attachments: hive.4167.1.patch, hive.4167.2.patch, HIVE-4167.patch If tables are just bucketed but not sorted, we are generating smb join operator. This results in loss of rows in queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4304) Remove unused builtins and pdk submodules
[ https://issues.apache.org/jira/browse/HIVE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630154#comment-13630154 ] Ashutosh Chauhan commented on HIVE-4304: [~traviscrawford] Are you planning to upload a patch for this soon? I believe this will accelerate HIVE-4278 Remove unused builtins and pdk submodules - Key: HIVE-4304 URL: https://issues.apache.org/jira/browse/HIVE-4304 Project: Hive Issue Type: Improvement Reporter: Travis Crawford Assignee: Travis Crawford Attachments: HIVE-4304.1.patch Moving from email. The [builtins|http://svn.apache.org/repos/asf/hive/trunk/builtins/] and [pdk|http://svn.apache.org/repos/asf/hive/trunk/pdk/] submodules are not believed to be in use and should be removed. The main benefits are simplification and maintainability of the Hive code base. Forwarded conversation Subject: builtins submodule - is it still needed? From: Travis Crawford traviscrawf...@gmail.com Date: Thu, Apr 4, 2013 at 2:01 PM To: u...@hive.apache.org, dev@hive.apache.org Hey hive gurus - Is the builtins hive submodule in use? The submodule was added in HIVE-2523 as a location for builtin-UDFs, but it appears to not have taken off. Any objections to removing it? DETAILS For HIVE-4278 I'm making some build changes for the HCatalog integration. The builtins submodule causes issues because it delays building until the packaging phase - so HCatalog can't depend on builtins, which it does transitively. While investigating a path forward I discovered the builtins submodule contains very little code, and likely could either go away entirely or merge into ql, simplifying things both for users and developers. Thoughts? Can anyone with context help me understand builtins, both in general and around its non-standard build? For your trouble I'll either make the submodule go away/merge into another submodule, or update the docs with what we learn. Thanks! Travis -- From: Ashutosh Chauhan ashutosh.chau...@gmail.com Date: Fri, Apr 5, 2013 at 3:10 PM To: dev@hive.apache.org Cc: u...@hive.apache.org u...@hive.apache.org I haven't used it myself anytime till now. Neither have met anyone who used it or plan to use it. Ashutosh On Thu, Apr 4, 2013 at 2:01 PM, Travis Crawford traviscrawf...@gmail.comwrote: -- From: Gunther Hagleitner ghagleit...@hortonworks.com Date: Fri, Apr 5, 2013 at 3:11 PM To: dev@hive.apache.org Cc: u...@hive.apache.org +1 I would actually go a step further and propose to remove both PDK and builtins. I've went through the code for both and here is what I found: Builtins: - BuiltInUtils.java: Empty file - UDAFUnionMap: Merges maps. Doesn't seem to be useful by itself, but was intended as a building block for PDK PDK: - some helper build.xml/test setup + teardown scripts - Classes/annotations to help run unit tests - rot13 as an example From what I can tell it's a fair assessment that it hasn't taken off, last commits to it seem to have happened more than 1.5 years ago. Thanks, Gunther. On Thu, Apr 4, 2013 at 2:01 PM, Travis Crawford traviscrawf...@gmail.comwrote: -- From: Owen O'Malley omal...@apache.org Date: Fri, Apr 5, 2013 at 4:45 PM To: u...@hive.apache.org +1 to removing them. We have a Rot13 example in ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13{In,Out}putFormat.java anyways. *smile* -- Owen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4275) Hive does not differentiate scheme and authority in file uris
[ https://issues.apache.org/jira/browse/HIVE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4275: --- Fix Version/s: (was: 0.11.0) Status: Open (was: Patch Available) I get following error after I applied your patch and ran the included test. {noformat} [junit] Failed query: schemeAuthority.q [junit] mkdir: Incomplete HDFS URI, no host: hdfs:///tmp/test {noformat} Hive does not differentiate scheme and authority in file uris - Key: HIVE-4275 URL: https://issues.apache.org/jira/browse/HIVE-4275 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4275.patch Consider the following set of queries: ALTER TABLE abc ADD PARTITION (x='0') LOCATION 'file:///foo'; ALTER TABLE abc ADD PARTITION (x='1') LOCATION '/foo'; select count(*) from abc; Even though there are different files under these directories, depending on number of mappers, the count produces a value = num of mappers * num of files in the 2 directories. This is incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-446) Implement TRUNCATE
[ https://issues.apache.org/jira/browse/HIVE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630258#comment-13630258 ] Gang Tim Liu commented on HIVE-446: --- External table is used in the context where data is not fully managed. If it ends up that there is a need to remove data behind external table, a question can be asked why do you define it as external table?. Saying that, possibly the proposed syntax and semantics are not consistent to external table use case. thanks Implement TRUNCATE -- Key: HIVE-446 URL: https://issues.apache.org/jira/browse/HIVE-446 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Prasad Chakka Assignee: Navis Fix For: 0.11.0 Attachments: HIVE-446.D7371.1.patch, HIVE-446.D7371.2.patch, HIVE-446.D7371.3.patch, HIVE-446.D7371.4.patch truncate the data but leave the table and metadata intact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4348) Unit test compile fail at hbase-handler project on Windows becuase of illegal escape character
Shuaishuai Nie created HIVE-4348: Summary: Unit test compile fail at hbase-handler project on Windows becuase of illegal escape character Key: HIVE-4348 URL: https://issues.apache.org/jira/browse/HIVE-4348 Project: Hive Issue Type: Bug Components: HBase Handler, Testing Infrastructure, Windows Affects Versions: 0.11.0 Environment: Windows 8 Reporter: Shuaishuai Nie The problem is because the automatically generated test case hardcoded file path string of query file using \ instead of \\ as escape character. The change should be in the TestHBaseCliDriver.vm and TestHBaseNegativeCliDriver.vm -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4161) create clean and small default set of tests for TestBeeLineDriver
[ https://issues.apache.org/jira/browse/HIVE-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630298#comment-13630298 ] Rob Weltman commented on HIVE-4161: --- See the new tests in the patch to HIVE-4268. That is probably a better place to put BeeLine tests than in ql. create clean and small default set of tests for TestBeeLineDriver - Key: HIVE-4161 URL: https://issues.apache.org/jira/browse/HIVE-4161 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: HiveServer2 Fix For: 0.11.0 HiveServer2 (HIVE-2935) has added TestBeeLineDriver on the lines of TestCliDriver, which runs all the tests in TestCliDriver through the beeline commandline, which uses jdbc+hive server2. There are failures in many of the test cases after the rebase of the patch against latest hive code. The tests also almost double the time taken to run hive unit tests because TestCliDriver takes bulk of the hive unit test runtime. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Hive-trunk-h0.21 - Build # 2060 - Still Failing
Changes for Build #2032 [namit] HIVE-4219 explain dependency does not capture the input table (Namit via Gang Tim Liu) Changes for Build #2033 [gates] Removing old branches to limit size of Hive downloads. [gates] Removing tags directory as we no longer need them and they're in the history. [gates] Moving HCatalog into Hive. [gates] Test that perms work for hcatalog [hashutosh] HIVE-4007 : Create abstract classes for serializer and deserializer (Namit Jain via Ashutosh Chauhan) [hashutosh] HIVE-3381 : Result of outer join is not valid (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3980 : Cleanup after 3403 (Namit Jain via Ashutosh Chauhan) [hashutosh] HIVE-4042 : ignore mapjoin hint (Namit Jain via Ashutosh Chauhan) [namit] HIVE-3348 semi-colon in comments in .q file does not work (Nick Collins via namit) [namit] HIVE-4212 sort merge join should work for outer joins for more than 8 inputs (Namit via Gang Tim Liu) Changes for Build #2034 [namit] HIVE-3958 support partial scan for analyze command - RCFile (Gang Tim Liu via namit) Changes for Build #2035 [kevinwilfong] HIVE-4235. CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists. (Gang Tim Liu via kevinwilfong) [gangtimliu] HIVE-4157: ORC runs out of heap when writing (Kevin Wilfong vi Gang Tim Liu) [gangtimliu] HIVE-4155: Expose ORC's FileDump as a service [gangtimliu] HIVE-4159:RetryingHMSHandler doesn't retry in enough cases (Kevin Wilfong vi Gang Tim Liu) [namit] HIVE-4149 wrong results big outer joins with array of ints (Navis via namit) Changes for Build #2036 [gates] HIVE-4264 Moved hcatalog trunk code up to hive/trunk/hcatalog [hashutosh] HIVE-4263 : Adjust build.xml package command to move all hcat jars and binaries into build (Alan Gates via Ashutosh Chauhan) [namit] HIVE-4258 Log logical plan tree for debugging (Navis via namit) [navis] HIVE-2264 Hive server is SHUTTING DOWN when invalid queries beeing executed Changes for Build #2037 Changes for Build #2038 [hashutosh] HIVE-4122 : Queries fail if timestamp data not in expected format (Prasad Mujumdar via Ashutosh Chauhan) [hashutosh] HIVE-4170 : [REGRESSION] FsShell.close closes filesystem, removing temporary directories (Navis via Ashutosh Chauhan) Changes for Build #2039 [hashutosh] HIVE-3850 : hour() function returns 12 hour clock value when using timestamp datatype (Anandha and Franklin via Ashutosh Chauhan) Changes for Build #2040 [hashutosh] HIVE-3951 : Allow Decimal type columns in Regex Serde (Mark Grover via Ashutosh Chauhan) [namit] HIVE-4270 bug in hive.map.groupby.sorted in the presence of multiple input partitions (Namit via Gang Tim Liu) Changes for Build #2041 Changes for Build #2042 Changes for Build #2043 [hashutosh] HIVE-4252 : hiveserver2 string representation of complex types are inconsistent with cli (Thejas Nair via Ashutosh Chauhan) [hashutosh] HIVE-4179 : NonBlockingOpDeDup does not merge SEL operators correctly (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4269 : fix handling of binary type in hiveserver2, jdbc driver (Thejas Nair via Ashutosh Chauhan) [namit] HIVE-4174 Round UDF converts BigInts to double (Chen Chun via namit) [namit] HIVE-4240 optimize hive.enforce.bucketing and hive.enforce sorting insert (Gang Tim Liu via namit) [navis] HIVE-4288 Add IntelliJ project files files to .gitignore (Roshan Naik via Navis) Changes for Build #2044 [namit] HIVE-4289 HCatalog build fails when behind a firewall (Samuel Yuan via namit) [namit] HIVE-4281 add hive.map.groupby.sorted.testmode (Namit via Gang Tim Liu) [hashutosh] Moving hcatalog site outside of trunk [hashutosh] Moving hcatalog branches outside of trunk [hashutosh] HIVE-4259 : SEL operator created with missing columnExprMap for unions (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4156 : need to add protobuf classes to hive-exec.jar (Owen Omalley via Ashutosh Chauhan) [hashutosh] HIVE-3464 : Merging join tree may reorder joins which could be invalid (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4138 : ORC's union object inspector returns a type name that isn't parseable by TypeInfoUtils (Owen Omalley via Ashutosh Chauhan) [cws] HIVE-4119. ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty (Shreepadma Venugopalan via cws) Changes for Build #2045 Changes for Build #2046 [hashutosh] HIVE-4067 : Followup to HIVE-701: reduce ambiguity in grammar (Samuel Yuan via Ashutosh Chauhan) Changes for Build #2047 Changes for Build #2048 [gangtimliu] HIVE-4298: add tests for distincts for hive.map.groutp.sorted. (Namit via Gang Tim Liu) [hashutosh] HIVE-4128 : Support avg(decimal) (Brock Noland via Ashutosh Chauhan) [kevinwilfong] HIVE-4151. HiveProfiler NPE with ScriptOperator. (Pamela Vagata via kevinwilfong) Changes for Build #2049 [hashutosh] HIVE-3985 : Update new UDAFs introduced for Windowing to work with new Decimal Type (Brock Noland via Ashutosh Chauhan)
Hive-trunk-hadoop2 - Build # 155 - Still Failing
Changes for Build #138 [namit] HIVE-4289 HCatalog build fails when behind a firewall (Samuel Yuan via namit) [namit] HIVE-4281 add hive.map.groupby.sorted.testmode (Namit via Gang Tim Liu) [hashutosh] Moving hcatalog site outside of trunk [hashutosh] Moving hcatalog branches outside of trunk [hashutosh] HIVE-4259 : SEL operator created with missing columnExprMap for unions (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4156 : need to add protobuf classes to hive-exec.jar (Owen Omalley via Ashutosh Chauhan) [hashutosh] HIVE-3464 : Merging join tree may reorder joins which could be invalid (Navis via Ashutosh Chauhan) [hashutosh] HIVE-4138 : ORC's union object inspector returns a type name that isn't parseable by TypeInfoUtils (Owen Omalley via Ashutosh Chauhan) [cws] HIVE-4119. ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty (Shreepadma Venugopalan via cws) [hashutosh] HIVE-4252 : hiveserver2 string representation of complex types are inconsistent with cli (Thejas Nair via Ashutosh Chauhan) [hashutosh] HIVE-4179 : NonBlockingOpDeDup does not merge SEL operators correctly (Gunther Hagleitner via Ashutosh Chauhan) [hashutosh] HIVE-4269 : fix handling of binary type in hiveserver2, jdbc driver (Thejas Nair via Ashutosh Chauhan) [namit] HIVE-4174 Round UDF converts BigInts to double (Chen Chun via namit) [namit] HIVE-4240 optimize hive.enforce.bucketing and hive.enforce sorting insert (Gang Tim Liu via namit) [navis] HIVE-4288 Add IntelliJ project files files to .gitignore (Roshan Naik via Navis) [namit] HIVE-4272 partition wise metadata does not work for text files [hashutosh] HIVE-896 : Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive. (Harish Butani via Ashutosh Chauhan) [namit] HIVE-4260 union_remove_12, union_remove_13 are failing on hadoop2 (Gunther Hagleitner via namit) [hashutosh] HIVE-3951 : Allow Decimal type columns in Regex Serde (Mark Grover via Ashutosh Chauhan) [namit] HIVE-4270 bug in hive.map.groupby.sorted in the presence of multiple input partitions (Namit via Gang Tim Liu) [hashutosh] HIVE-3850 : hour() function returns 12 hour clock value when using timestamp datatype (Anandha and Franklin via Ashutosh Chauhan) [hashutosh] HIVE-4122 : Queries fail if timestamp data not in expected format (Prasad Mujumdar via Ashutosh Chauhan) [hashutosh] HIVE-4170 : [REGRESSION] FsShell.close closes filesystem, removing temporary directories (Navis via Ashutosh Chauhan) [gates] HIVE-4264 Moved hcatalog trunk code up to hive/trunk/hcatalog [hashutosh] HIVE-4263 : Adjust build.xml package command to move all hcat jars and binaries into build (Alan Gates via Ashutosh Chauhan) [namit] HIVE-4258 Log logical plan tree for debugging (Navis via namit) [navis] HIVE-2264 Hive server is SHUTTING DOWN when invalid queries beeing executed [kevinwilfong] HIVE-4235. CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists. (Gang Tim Liu via kevinwilfong) [gangtimliu] HIVE-4157: ORC runs out of heap when writing (Kevin Wilfong vi Gang Tim Liu) [gangtimliu] HIVE-4155: Expose ORC's FileDump as a service [gangtimliu] HIVE-4159:RetryingHMSHandler doesn't retry in enough cases (Kevin Wilfong vi Gang Tim Liu) [namit] HIVE-4149 wrong results big outer joins with array of ints (Navis via namit) [namit] HIVE-3958 support partial scan for analyze command - RCFile (Gang Tim Liu via namit) [gates] Removing old branches to limit size of Hive downloads. [gates] Removing tags directory as we no longer need them and they're in the history. [gates] Moving HCatalog into Hive. [gates] Test that perms work for hcatalog [hashutosh] HIVE-4007 : Create abstract classes for serializer and deserializer (Namit Jain via Ashutosh Chauhan) [hashutosh] HIVE-3381 : Result of outer join is not valid (Navis via Ashutosh Chauhan) [hashutosh] HIVE-3980 : Cleanup after 3403 (Namit Jain via Ashutosh Chauhan) [hashutosh] HIVE-4042 : ignore mapjoin hint (Namit Jain via Ashutosh Chauhan) [namit] HIVE-3348 semi-colon in comments in .q file does not work (Nick Collins via namit) [namit] HIVE-4212 sort merge join should work for outer joins for more than 8 inputs (Namit via Gang Tim Liu) [namit] HIVE-4219 explain dependency does not capture the input table (Namit via Gang Tim Liu) [kevinwilfong] HIVE-4092. Store complete names of tables in column access analyzer (Samuel Yuan via kevinwilfong) [namit] HIVE-4208 Clientpositive test parenthesis_star_by is non-deteministic (Mark Grover via namit) [cws] HIVE-4217. Fix show_create_table_*.q test failures (Carl Steinbach via cws) [namit] HIVE-4206 Sort merge join does not work for outer joins for 7 inputs (Namit via Gang Tim Liu) [kevinwilfong] HIVE-4188. TestJdbcDriver2.testDescribeTable failing consistently. (Prasad Mujumdar via kevinwilfong) [hashutosh] HIVE-3820 Consider creating a literal like D or BD for representing Decimal type constants (Gunther Hagleitner
Preferred way to run unit tests
Hello, I have been trying to run the unit tests for the last hive release (0.10). For me they have been taking in access of 10 hrs to run (not to mention the occasional failures with some of the flaky tests). Current I am just doing a ant clean package test. Is there a better way to run these? Also is it possible for the build to ignore any test failures and complete? Thanks for any help. -- Swarnim
[jira] [Commented] (HIVE-4304) Remove unused builtins and pdk submodules
[ https://issues.apache.org/jira/browse/HIVE-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630350#comment-13630350 ] Travis Crawford commented on HIVE-4304: --- Just started tests at https://travis.ci.cloudbees.com/job/HIVE-4304_rm_builtins_pdk/ and will post if they pass. Remove unused builtins and pdk submodules - Key: HIVE-4304 URL: https://issues.apache.org/jira/browse/HIVE-4304 Project: Hive Issue Type: Improvement Reporter: Travis Crawford Assignee: Travis Crawford Attachments: HIVE-4304.1.patch Moving from email. The [builtins|http://svn.apache.org/repos/asf/hive/trunk/builtins/] and [pdk|http://svn.apache.org/repos/asf/hive/trunk/pdk/] submodules are not believed to be in use and should be removed. The main benefits are simplification and maintainability of the Hive code base. Forwarded conversation Subject: builtins submodule - is it still needed? From: Travis Crawford traviscrawf...@gmail.com Date: Thu, Apr 4, 2013 at 2:01 PM To: u...@hive.apache.org, dev@hive.apache.org Hey hive gurus - Is the builtins hive submodule in use? The submodule was added in HIVE-2523 as a location for builtin-UDFs, but it appears to not have taken off. Any objections to removing it? DETAILS For HIVE-4278 I'm making some build changes for the HCatalog integration. The builtins submodule causes issues because it delays building until the packaging phase - so HCatalog can't depend on builtins, which it does transitively. While investigating a path forward I discovered the builtins submodule contains very little code, and likely could either go away entirely or merge into ql, simplifying things both for users and developers. Thoughts? Can anyone with context help me understand builtins, both in general and around its non-standard build? For your trouble I'll either make the submodule go away/merge into another submodule, or update the docs with what we learn. Thanks! Travis -- From: Ashutosh Chauhan ashutosh.chau...@gmail.com Date: Fri, Apr 5, 2013 at 3:10 PM To: dev@hive.apache.org Cc: u...@hive.apache.org u...@hive.apache.org I haven't used it myself anytime till now. Neither have met anyone who used it or plan to use it. Ashutosh On Thu, Apr 4, 2013 at 2:01 PM, Travis Crawford traviscrawf...@gmail.comwrote: -- From: Gunther Hagleitner ghagleit...@hortonworks.com Date: Fri, Apr 5, 2013 at 3:11 PM To: dev@hive.apache.org Cc: u...@hive.apache.org +1 I would actually go a step further and propose to remove both PDK and builtins. I've went through the code for both and here is what I found: Builtins: - BuiltInUtils.java: Empty file - UDAFUnionMap: Merges maps. Doesn't seem to be useful by itself, but was intended as a building block for PDK PDK: - some helper build.xml/test setup + teardown scripts - Classes/annotations to help run unit tests - rot13 as an example From what I can tell it's a fair assessment that it hasn't taken off, last commits to it seem to have happened more than 1.5 years ago. Thanks, Gunther. On Thu, Apr 4, 2013 at 2:01 PM, Travis Crawford traviscrawf...@gmail.comwrote: -- From: Owen O'Malley omal...@apache.org Date: Fri, Apr 5, 2013 at 4:45 PM To: u...@hive.apache.org +1 to removing them. We have a Rot13 example in ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13{In,Out}putFormat.java anyways. *smile* -- Owen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4349) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters
Xi Fang created HIVE-4349: - Summary: Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters Key: HIVE-4349 URL: https://issues.apache.org/jira/browse/HIVE-4349 Project: Hive Issue Type: Bug Reporter: Xi Fang Fix For: 0.11.0 If the Hive enlistment root path is longer than 12 chars then test classpath “hadoop.testcp” is exceeding the 8K chars so we are unable to run most of the Hive unit tests on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4349) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters
[ https://issues.apache.org/jira/browse/HIVE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Fang updated HIVE-4349: -- Attachment: HIVE-4349.patch Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters Key: HIVE-4349 URL: https://issues.apache.org/jira/browse/HIVE-4349 Project: Hive Issue Type: Bug Reporter: Xi Fang Fix For: 0.11.0 Attachments: HIVE-4349.patch If the Hive enlistment root path is longer than 12 chars then test classpath “hadoop.testcp” is exceeding the 8K chars so we are unable to run most of the Hive unit tests on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4349) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters
[ https://issues.apache.org/jira/browse/HIVE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Fang updated HIVE-4349: -- Attachment: (was: HIVE-4349.patch) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters Key: HIVE-4349 URL: https://issues.apache.org/jira/browse/HIVE-4349 Project: Hive Issue Type: Bug Reporter: Xi Fang Fix For: 0.11.0 Attachments: HIVE-4349.1.patch If the Hive enlistment root path is longer than 12 chars then test classpath “hadoop.testcp” is exceeding the 8K chars so we are unable to run most of the Hive unit tests on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4349) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters
[ https://issues.apache.org/jira/browse/HIVE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Fang updated HIVE-4349: -- Attachment: HIVE-4349.1.patch Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters Key: HIVE-4349 URL: https://issues.apache.org/jira/browse/HIVE-4349 Project: Hive Issue Type: Bug Reporter: Xi Fang Fix For: 0.11.0 Attachments: HIVE-4349.1.patch If the Hive enlistment root path is longer than 12 chars then test classpath “hadoop.testcp” is exceeding the 8K chars so we are unable to run most of the Hive unit tests on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4339) build fails after branch (hcatalog version not updated)
[ https://issues.apache.org/jira/browse/HIVE-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4339: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) Committed to trunk. build fails after branch (hcatalog version not updated) --- Key: HIVE-4339 URL: https://issues.apache.org/jira/browse/HIVE-4339 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.12.0 Attachments: HIVE-4339.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4349) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters
[ https://issues.apache.org/jira/browse/HIVE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630405#comment-13630405 ] Xi Fang commented on HIVE-4349: --- This is the current solution. 1) Before setting up the class path environment variable, find the list of JARs in the “test-classpath” and copy all of them to a test jar folder from various folders. This is done in a task shortenclasspath. That means, all the required JARs will be in a single folder. 2) Include the “test jar*” in the class path to reduce the class path size. 3) Set the environment variable Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters Key: HIVE-4349 URL: https://issues.apache.org/jira/browse/HIVE-4349 Project: Hive Issue Type: Bug Reporter: Xi Fang Fix For: 0.11.0 Attachments: HIVE-4349.1.patch If the Hive enlistment root path is longer than 12 chars then test classpath “hadoop.testcp” is exceeding the 8K chars so we are unable to run most of the Hive unit tests on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4349) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters
[ https://issues.apache.org/jira/browse/HIVE-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Fang updated HIVE-4349: -- Affects Version/s: 0.11.0 Status: Patch Available (was: Open) Fix the Hive unit test failures when the Hive enlistment root path is longer than ~12 characters Key: HIVE-4349 URL: https://issues.apache.org/jira/browse/HIVE-4349 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Xi Fang Fix For: 0.11.0 Attachments: HIVE-4349.1.patch If the Hive enlistment root path is longer than 12 chars then test classpath “hadoop.testcp” is exceeding the 8K chars so we are unable to run most of the Hive unit tests on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4347) Hcatalog build fail on Windows because javadoc command exceed length limit
[ https://issues.apache.org/jira/browse/HIVE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-4347: - Attachment: HIVE-4347.patch Hcatalog build fail on Windows because javadoc command exceed length limit -- Key: HIVE-4347 URL: https://issues.apache.org/jira/browse/HIVE-4347 Project: Hive Issue Type: Bug Components: Build Infrastructure, HCatalog, Windows Affects Versions: 0.11.0 Environment: Windows 8 Reporter: Shuaishuai Nie Labels: build, patch Attachments: HIVE-4347.patch Original Estimate: 24h Remaining Estimate: 24h When building Hcatalog on Window 8, build fail because HIVE_DIR\hcatalog\build.xml:213: Javadoc failed: java.io.IOException: Cannot run program JAVA_HOME\bin\javadoc.exe: CreateProces s error=206, The filename or extension is too long -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4350) support AS keyword for table alias
Thejas M Nair created HIVE-4350: --- Summary: support AS keyword for table alias Key: HIVE-4350 URL: https://issues.apache.org/jira/browse/HIVE-4350 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Thejas M Nair SQL standard supports AS optional keyword, while creating an table alias. http://savage.net.au/SQL/sql-92.bnf.html#table reference Hive gives a error when the optional keyword is used - select * from tiny as t1; org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: ParseException line 1:19 mismatched input 'as' expecting EOF near 'tiny' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4342) NPE for query involving UNION ALL with nested JOIN and UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihir Kulkarni updated HIVE-4342: - Priority: Critical (was: Major) NPE for query involving UNION ALL with nested JOIN and UNION ALL Key: HIVE-4342 URL: https://issues.apache.org/jira/browse/HIVE-4342 Project: Hive Issue Type: Bug Components: Logging, Metastore, Query Processor Affects Versions: 0.9.0 Environment: Red Hat Linux VM with Hive 0.9 and Hadoop 2.0 Reporter: Mihir Kulkarni Priority: Critical Attachments: example.txt UNION ALL query with JOIN in first part and another UNION ALL in second part gives NPE. bq. JOIN UNION ALL bq. UNION ALL Attached file (example.txt) contains the schema and exact query which fails on Hive 0.9. It is worthwhile to note that the same query executes successfully on Hive 0.7. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630511#comment-13630511 ] Pamela Vagata commented on HIVE-4318: - Thanks for running these separately :) I just looked in OperatorHookUtils.java which is where the opHooks list is being initialized - it looks like the opHooks list is always being initialized even if there are no OperatorHooks installed. My suspicion is that if we returned null instead of an empty list, the numbers would be different since a null check should be much cheaper. Would you mind modifying OperatorHookUtils.getOperatorHooks to return null instead of an empty list and then rerun the MBM with the code for the OperatorHooks left in and also commented out to see what the difference is? OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python
[ https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630546#comment-13630546 ] Gang Tim Liu commented on HIVE-4322: +1 after test passes SkewedInfo in Metastore Thrift API cannot be deserialized in Python --- Key: HIVE-4322 URL: https://issues.apache.org/jira/browse/HIVE-4322 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch The Thrift-generated Python code that deserializes Thrift objects fails whenever a complex type is used as a map key, because by default mutable Python objects such as lists do not have a hash function. See https://issues.apache.org/jira/browse/THRIFT-162 for related discussion. The SkewedInfo struct contains a map which uses a list as a key, breaking the Python Thrift interface. It is not possible to specify the mapping from Thrift types to Python types, or otherwise we could map Thrift lists to Python tuples. Instead, the proposed workaround wraps the list inside a new struct. This alone does not accomplish anything, but allows Python clients to define a hash function for the struct class, e.g.: def f(object): return hash(tuple(object.skewedValueList)) SkewedValueList.__hash__ = f In practice a more efficient hash might be defined that does not involve copying the list. The advantage of wrapping the list inside a struct is that the client does not have to define the hash on the list itself, which would change the behaviour of lists everywhere else in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python
[ https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630549#comment-13630549 ] Phabricator commented on HIVE-4322: --- gangtimliu has commented on the revision HIVE-4322 [jira] SkewedInfo in Metastore Thrift API cannot be deserialized in Python. code lgtm. looking at hadoop 23 tests. INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1127 what's difference in this line? thanks REVISION DETAIL https://reviews.facebook.net/D10203 To: gangtimliu, sxyuan Cc: kevinwilfong, JIRA SkewedInfo in Metastore Thrift API cannot be deserialized in Python --- Key: HIVE-4322 URL: https://issues.apache.org/jira/browse/HIVE-4322 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch The Thrift-generated Python code that deserializes Thrift objects fails whenever a complex type is used as a map key, because by default mutable Python objects such as lists do not have a hash function. See https://issues.apache.org/jira/browse/THRIFT-162 for related discussion. The SkewedInfo struct contains a map which uses a list as a key, breaking the Python Thrift interface. It is not possible to specify the mapping from Thrift types to Python types, or otherwise we could map Thrift lists to Python tuples. Instead, the proposed workaround wraps the list inside a new struct. This alone does not accomplish anything, but allows Python clients to define a hash function for the struct class, e.g.: def f(object): return hash(tuple(object.skewedValueList)) SkewedValueList.__hash__ = f In practice a more efficient hash might be defined that does not involve copying the list. The advantage of wrapping the list inside a struct is that the client does not have to define the hash on the list itself, which would change the behaviour of lists everywhere else in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python
[ https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630553#comment-13630553 ] Phabricator commented on HIVE-4322: --- sxyuan has commented on the revision HIVE-4322 [jira] SkewedInfo in Metastore Thrift API cannot be deserialized in Python. INLINE COMMENTS metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1127 Just fixed the name of the function. REVISION DETAIL https://reviews.facebook.net/D10203 To: gangtimliu, sxyuan Cc: kevinwilfong, JIRA SkewedInfo in Metastore Thrift API cannot be deserialized in Python --- Key: HIVE-4322 URL: https://issues.apache.org/jira/browse/HIVE-4322 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch The Thrift-generated Python code that deserializes Thrift objects fails whenever a complex type is used as a map key, because by default mutable Python objects such as lists do not have a hash function. See https://issues.apache.org/jira/browse/THRIFT-162 for related discussion. The SkewedInfo struct contains a map which uses a list as a key, breaking the Python Thrift interface. It is not possible to specify the mapping from Thrift types to Python types, or otherwise we could map Thrift lists to Python tuples. Instead, the proposed workaround wraps the list inside a new struct. This alone does not accomplish anything, but allows Python clients to define a hash function for the struct class, e.g.: def f(object): return hash(tuple(object.skewedValueList)) SkewedValueList.__hash__ = f In practice a more efficient hash might be defined that does not involve copying the list. The advantage of wrapping the list inside a struct is that the client does not have to define the hash on the list itself, which would change the behaviour of lists everywhere else in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4351) Thrift code generation fails due to hcatalog
Gang Tim Liu created HIVE-4351: -- Summary: Thrift code generation fails due to hcatalog Key: HIVE-4351 URL: https://issues.apache.org/jira/browse/HIVE-4351 Project: Hive Issue Type: Bug Components: Thrift API Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Ashutosh Chauhan It fails to generate thrift code since hcatalog doesn't have Target thriftif ant thriftif -Dthrift.home=/usr/local . BUILD FAILED Target thriftif does not exist in the project hcatalog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4275) Hive does not differentiate scheme and authority in file uris
[ https://issues.apache.org/jira/browse/HIVE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4275: - Attachment: HIVE-4275.2.patch The test was actually for TestMinimrCliDriver. I had missed the changes in build-common.xml. Hive does not differentiate scheme and authority in file uris - Key: HIVE-4275 URL: https://issues.apache.org/jira/browse/HIVE-4275 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4275.2.patch, HIVE-4275.patch Consider the following set of queries: ALTER TABLE abc ADD PARTITION (x='0') LOCATION 'file:///foo'; ALTER TABLE abc ADD PARTITION (x='1') LOCATION '/foo'; select count(*) from abc; Even though there are different files under these directories, depending on number of mappers, the count produces a value = num of mappers * num of files in the 2 directories. This is incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4275) Hive does not differentiate scheme and authority in file uris
[ https://issues.apache.org/jira/browse/HIVE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4275: - Status: Patch Available (was: Open) Hive does not differentiate scheme and authority in file uris - Key: HIVE-4275 URL: https://issues.apache.org/jira/browse/HIVE-4275 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4275.2.patch, HIVE-4275.patch Consider the following set of queries: ALTER TABLE abc ADD PARTITION (x='0') LOCATION 'file:///foo'; ALTER TABLE abc ADD PARTITION (x='1') LOCATION '/foo'; select count(*) from abc; Even though there are different files under these directories, depending on number of mappers, the count produces a value = num of mappers * num of files in the 2 directories. This is incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4275) Hive does not differentiate scheme and authority in file uris
[ https://issues.apache.org/jira/browse/HIVE-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630576#comment-13630576 ] Vikram Dixit K commented on HIVE-4275: -- Review board request. https://reviews.apache.org/r/10429/ Hive does not differentiate scheme and authority in file uris - Key: HIVE-4275 URL: https://issues.apache.org/jira/browse/HIVE-4275 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4275.2.patch, HIVE-4275.patch Consider the following set of queries: ALTER TABLE abc ADD PARTITION (x='0') LOCATION 'file:///foo'; ALTER TABLE abc ADD PARTITION (x='1') LOCATION '/foo'; select count(*) from abc; Even though there are different files under these directories, depending on number of mappers, the count produces a value = num of mappers * num of files in the 2 directories. This is incorrect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4352) Guava not getting included in build package
Mark Wagner created HIVE-4352: - Summary: Guava not getting included in build package Key: HIVE-4352 URL: https://issues.apache.org/jira/browse/HIVE-4352 Project: Hive Issue Type: Bug Reporter: Mark Wagner Since HIVE-4148, Guava is not getting included in the appropriate packages. This manifests as a ClassNotFoundException. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4352) Guava not getting included in build package
[ https://issues.apache.org/jira/browse/HIVE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner reassigned HIVE-4352: - Assignee: Mark Wagner Guava not getting included in build package --- Key: HIVE-4352 URL: https://issues.apache.org/jira/browse/HIVE-4352 Project: Hive Issue Type: Bug Reporter: Mark Wagner Assignee: Mark Wagner Since HIVE-4148, Guava is not getting included in the appropriate packages. This manifests as a ClassNotFoundException. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4353) Add AvroObjectInspectorGenerator API
Edward C. Skoviak created HIVE-4353: --- Summary: Add AvroObjectInspectorGenerator API Key: HIVE-4353 URL: https://issues.apache.org/jira/browse/HIVE-4353 Project: Hive Issue Type: Improvement Reporter: Edward C. Skoviak Priority: Minor Whilst working on a Hive project where I am auto-generating a create table hive command for clients, I became very aware how helpful an API would be for the AvroObjectInspectorGenerator. This functionality would make it very easy for consumer's to pull out column names and types. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4353) Add AvroObjectInspectorGenerator API
[ https://issues.apache.org/jira/browse/HIVE-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward C. Skoviak updated HIVE-4353: Description: Whilst working on a Hive project where I am auto-generating a create table hive command for clients, I became very aware how helpful an API would be for the AvroObjectInspectorGenerator. This functionality would make it very easy for consumer's to pull out column names and types, especially in the scenario where you can not use the auto-generate flag. (was: Whilst working on a Hive project where I am auto-generating a create table hive command for clients, I became very aware how helpful an API would be for the AvroObjectInspectorGenerator. This functionality would make it very easy for consumer's to pull out column names and types.) Add AvroObjectInspectorGenerator API Key: HIVE-4353 URL: https://issues.apache.org/jira/browse/HIVE-4353 Project: Hive Issue Type: Improvement Reporter: Edward C. Skoviak Priority: Minor Whilst working on a Hive project where I am auto-generating a create table hive command for clients, I became very aware how helpful an API would be for the AvroObjectInspectorGenerator. This functionality would make it very easy for consumer's to pull out column names and types, especially in the scenario where you can not use the auto-generate flag. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4353) Add AvroObjectInspectorGenerator API
[ https://issues.apache.org/jira/browse/HIVE-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward C. Skoviak updated HIVE-4353: Description: Whilst working on a Hive project where I am auto-generating a create table hive command for clients, I became very aware how helpful an API would be for the AvroObjectInspectorGenerator. This functionality would make it very easy for consumer's to pull out column names and types, and would be especially helpful in the scenario where you can not use the auto-generate flag. (was: Whilst working on a Hive project where I am auto-generating a create table hive command for clients, I became very aware how helpful an API would be for the AvroObjectInspectorGenerator. This functionality would make it very easy for consumer's to pull out column names and types, especially in the scenario where you can not use the auto-generate flag.) Add AvroObjectInspectorGenerator API Key: HIVE-4353 URL: https://issues.apache.org/jira/browse/HIVE-4353 Project: Hive Issue Type: Improvement Reporter: Edward C. Skoviak Priority: Minor Whilst working on a Hive project where I am auto-generating a create table hive command for clients, I became very aware how helpful an API would be for the AvroObjectInspectorGenerator. This functionality would make it very easy for consumer's to pull out column names and types, and would be especially helpful in the scenario where you can not use the auto-generate flag. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python
[ https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630605#comment-13630605 ] Phabricator commented on HIVE-4322: --- gangtimliu has accepted the revision HIVE-4322 [jira] SkewedInfo in Metastore Thrift API cannot be deserialized in Python. thanks REVISION DETAIL https://reviews.facebook.net/D10203 BRANCH svn ARCANIST PROJECT hive To: gangtimliu, sxyuan Cc: kevinwilfong, JIRA SkewedInfo in Metastore Thrift API cannot be deserialized in Python --- Key: HIVE-4322 URL: https://issues.apache.org/jira/browse/HIVE-4322 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch The Thrift-generated Python code that deserializes Thrift objects fails whenever a complex type is used as a map key, because by default mutable Python objects such as lists do not have a hash function. See https://issues.apache.org/jira/browse/THRIFT-162 for related discussion. The SkewedInfo struct contains a map which uses a list as a key, breaking the Python Thrift interface. It is not possible to specify the mapping from Thrift types to Python types, or otherwise we could map Thrift lists to Python tuples. Instead, the proposed workaround wraps the list inside a new struct. This alone does not accomplish anything, but allows Python clients to define a hash function for the struct class, e.g.: def f(object): return hash(tuple(object.skewedValueList)) SkewedValueList.__hash__ = f In practice a more efficient hash might be defined that does not involve copying the list. The advantage of wrapping the list inside a struct is that the client does not have to define the hash on the list itself, which would change the behaviour of lists everywhere else in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630607#comment-13630607 ] Gunther Hagleitner commented on HIVE-4318: -- [~pamelavagata]: I saw that too and I am sure it would make the numbers slightly better. There's also the issue with allocating a new object for each invocation which is probably even worse than the empty list. My point though is this: Even if we get it down to where I fixed counters too, you would still pay a price for the feature. No counters v fixed counters is still faster (see above). From this thread it seems that the profiler is a valuable feature for keeping taps on performance in the dev cycle, operator hooks on the other hand are not that useful. Anything you add there has a tremendously bad effect on performance. From that I concluded that we should change the profiler not to rely on operator hooks and also not to contribute to run time in production. The best way to me is to remove it temporarily and handle it in a new jira (where we can discuss the how in more detail). Does that make sense? OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4318: - Attachment: HIVE-4318.2.patch Here's the patch that goes with the proposal OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-1708) make hive history file configurable
[ https://issues.apache.org/jira/browse/HIVE-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630609#comment-13630609 ] Nitin Pawar commented on HIVE-1708: --- I did add a new setting to hive-site.xml and made some change in the cli code and tested it for making hive history optional. I wanted to add one more property for the hive history file path but currently it is set to .hivehistory inside each individual users home directory. If I have to retain this property how will I keep the default value in hive-site.xml. As all the users will have different home directories on different linux distributions, how do we default the path then? can we change the file path to something like log location which resides inside /tmp ? Is that an acceptable change? make hive history file configurable --- Key: HIVE-1708 URL: https://issues.apache.org/jira/browse/HIVE-1708 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Currentlly, it is derived from System.getProperty(user.home)/.hivehistory; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3129) Create windows native scripts (CMD files) to run hive on windows without Cygwin
[ https://issues.apache.org/jira/browse/HIVE-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Fang updated HIVE-3129: -- Attachment: HIVE-3129.1.patch The attached patch has the windows native command scripts, which can run hive on windows without Cygwin. We attach this patch because we already have the windows specific command scripts. We will deal with the unification of scripts in a separate JIRA. Additionally, unit test scripts will follow. Create windows native scripts (CMD files) to run hive on windows without Cygwin Key: HIVE-3129 URL: https://issues.apache.org/jira/browse/HIVE-3129 Project: Hive Issue Type: Bug Components: CLI, Windows Reporter: Kanna Karanam Labels: Windows Attachments: HIVE-3129.1.patch Create the cmd files equivalent to a)Bin\hive b)Bin\hive-config.sh c)Bin\Init-hive-dfs.sh d)Bin\ext\cli.sh e)Bin\ext\debug.sh f)Bin\ext\help.sh g)Bin\ext\hiveserver.sh h)Bin\ext\jar.sh i)Bin\ext\hwi.sh j)Bin\ext\lineage.sh k)Bin\ext\metastore.sh l)Bin\ext\rcfilecat.sh -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3129) Create windows native scripts (CMD files) to run hive on windows without Cygwin
[ https://issues.apache.org/jira/browse/HIVE-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xi Fang updated HIVE-3129: -- Affects Version/s: 0.11.0 Status: Patch Available (was: Open) Create windows native scripts (CMD files) to run hive on windows without Cygwin Key: HIVE-3129 URL: https://issues.apache.org/jira/browse/HIVE-3129 Project: Hive Issue Type: Bug Components: CLI, Windows Affects Versions: 0.11.0 Reporter: Kanna Karanam Labels: Windows Attachments: HIVE-3129.1.patch Create the cmd files equivalent to a)Bin\hive b)Bin\hive-config.sh c)Bin\Init-hive-dfs.sh d)Bin\ext\cli.sh e)Bin\ext\debug.sh f)Bin\ext\help.sh g)Bin\ext\hiveserver.sh h)Bin\ext\jar.sh i)Bin\ext\hwi.sh j)Bin\ext\lineage.sh k)Bin\ext\metastore.sh l)Bin\ext\rcfilecat.sh -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #345
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/345/ -- [...truncated 36424 lines...] [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2013-04-12_13-54-44_549_3988602030308142448/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201304121354_725414661.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] Copying file: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2013-04-12_13-54-48_381_7641344880379777334/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2013-04-12_13-54-48_381_7641344880379777334/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201304121354_1471248745.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201304121354_1742716093.txt [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201304121354_1105327587.txt [junit] Copying file: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK:
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630637#comment-13630637 ] Kevin Wilfong commented on HIVE-4318: - It's not clear to me that we can't cut down the cost added by operator hooks when there are no operator hooks present to the point where it does not significantly affect performance. Pam, could you provide Gunther a patch which sets the list of operator hooks to null rather than the empty list, and initializes the OperatorHookContext in the calls to enterOperatorHooks and exitOperatorHooks after the check if the list is null. This should limit the impact of operator hooks, to two method calls and two null checks. We could even put the check if this.operatorHooks==null around the method calls themselves, in case the Java compiler isn't inlining it for some reason. If after that, they still introduce a substantial amount of overhead, there's not much more we can do, and I'd be ok with removing operator hooks. OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4284) Implement class for vectorized row batch
[ https://issues.apache.org/jira/browse/HIVE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4284: -- Summary: Implement class for vectorized row batch (was: Implement class for vectorized row group.) Implement class for vectorized row batch Key: HIVE-4284 URL: https://issues.apache.org/jira/browse/HIVE-4284 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Eric Hanson Vectorized row group object will represent the row group that vectorized operators will work on. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4296) ant thriftif fails on hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630687#comment-13630687 ] Ashutosh Chauhan commented on HIVE-4296: Thanks, Travis for review. Committed to trunk. Thanks, Roshan! ant thriftif fails on hcatalog Key: HIVE-4296 URL: https://issues.apache.org/jira/browse/HIVE-4296 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.10.0 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4296.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4351) Thrift code generation fails due to hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4351. Resolution: Duplicate Fix Version/s: 0.12.0 Fixed via HIVE-4296 Thrift code generation fails due to hcatalog Key: HIVE-4351 URL: https://issues.apache.org/jira/browse/HIVE-4351 Project: Hive Issue Type: Bug Components: Thrift API Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Ashutosh Chauhan Fix For: 0.12.0 It fails to generate thrift code since hcatalog doesn't have Target thriftif ant thriftif -Dthrift.home=/usr/local . BUILD FAILED Target thriftif does not exist in the project hcatalog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python
[ https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630689#comment-13630689 ] Gang Tim Liu commented on HIVE-4322: Committed. thank Samuel Yuan. SkewedInfo in Metastore Thrift API cannot be deserialized in Python --- Key: HIVE-4322 URL: https://issues.apache.org/jira/browse/HIVE-4322 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch The Thrift-generated Python code that deserializes Thrift objects fails whenever a complex type is used as a map key, because by default mutable Python objects such as lists do not have a hash function. See https://issues.apache.org/jira/browse/THRIFT-162 for related discussion. The SkewedInfo struct contains a map which uses a list as a key, breaking the Python Thrift interface. It is not possible to specify the mapping from Thrift types to Python types, or otherwise we could map Thrift lists to Python tuples. Instead, the proposed workaround wraps the list inside a new struct. This alone does not accomplish anything, but allows Python clients to define a hash function for the struct class, e.g.: def f(object): return hash(tuple(object.skewedValueList)) SkewedValueList.__hash__ = f In practice a more efficient hash might be defined that does not involve copying the list. The advantage of wrapping the list inside a struct is that the client does not have to define the hash on the list itself, which would change the behaviour of lists everywhere else in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python
[ https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-4322: --- Resolution: Fixed Fix Version/s: 0.11.0 Status: Resolved (was: Patch Available) SkewedInfo in Metastore Thrift API cannot be deserialized in Python --- Key: HIVE-4322 URL: https://issues.apache.org/jira/browse/HIVE-4322 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch The Thrift-generated Python code that deserializes Thrift objects fails whenever a complex type is used as a map key, because by default mutable Python objects such as lists do not have a hash function. See https://issues.apache.org/jira/browse/THRIFT-162 for related discussion. The SkewedInfo struct contains a map which uses a list as a key, breaking the Python Thrift interface. It is not possible to specify the mapping from Thrift types to Python types, or otherwise we could map Thrift lists to Python tuples. Instead, the proposed workaround wraps the list inside a new struct. This alone does not accomplish anything, but allows Python clients to define a hash function for the struct class, e.g.: def f(object): return hash(tuple(object.skewedValueList)) SkewedValueList.__hash__ = f In practice a more efficient hash might be defined that does not involve copying the list. The advantage of wrapping the list inside a struct is that the client does not have to define the hash on the list itself, which would change the behaviour of lists everywhere else in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4296) ant thriftif fails on hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4296: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) ant thriftif fails on hcatalog Key: HIVE-4296 URL: https://issues.apache.org/jira/browse/HIVE-4296 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.10.0 Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.12.0 Attachments: HIVE-4296.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4351) Thrift code generation fails due to hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630691#comment-13630691 ] Gang Tim Liu commented on HIVE-4351: thank [~ashutoshc] very much Thrift code generation fails due to hcatalog Key: HIVE-4351 URL: https://issues.apache.org/jira/browse/HIVE-4351 Project: Hive Issue Type: Bug Components: Thrift API Affects Versions: 0.11.0 Reporter: Gang Tim Liu Assignee: Ashutosh Chauhan Fix For: 0.12.0 It fails to generate thrift code since hcatalog doesn't have Target thriftif ant thriftif -Dthrift.home=/usr/local . BUILD FAILED Target thriftif does not exist in the project hcatalog. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630703#comment-13630703 ] Gunther Hagleitner commented on HIVE-4318: -- What I was trying to say is that you'll end up in the exact same place that you are with removing counters v fixing counters. Two method calls, two null checks. And from my testing there *is* still overhead (29.3 v 27.9). If you think that's not a valid conclusion, I'll rerun the stuff, but otherwise we should just skip that step. Am I missing something? OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4344) CREATE VIEW fails when redundant casts are rewritten
[ https://issues.apache.org/jira/browse/HIVE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Yuan updated HIVE-4344: -- Description: e.g. create view v as select cast(key as string) from src; The rewriter tries to replace both cast(key as string) and key as `src`.`key`, because cast(key as string) is a no-op. There may be other cases like this one. See HIVE-2439 for context. was: e.g. create view v as select cast(key as string) from src; The rewriter tries to replace both cast(key as string) and key as `src`.`key`, because cast(key as string) is a no-op. There may be other cases like this one. CREATE VIEW fails when redundant casts are rewritten Key: HIVE-4344 URL: https://issues.apache.org/jira/browse/HIVE-4344 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan e.g. create view v as select cast(key as string) from src; The rewriter tries to replace both cast(key as string) and key as `src`.`key`, because cast(key as string) is a no-op. There may be other cases like this one. See HIVE-2439 for context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4344) CREATE VIEW fails when redundant casts are rewritten
[ https://issues.apache.org/jira/browse/HIVE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4344: -- Attachment: HIVE-4344.HIVE-4344.HIVE-4344.HIVE-4344.D10221.1.patch sxyuan requested code review of HIVE-4344 [jira] CREATE VIEW fails when redundant casts are rewritten. Reviewers: kevinwilfong See JIRA for a description of the problem. This change relaxes the constraints on translations. Previously, if a new translation overlaps with an existing one, one must be a prefix or suffix of the other. This allows the case when one is completely contained inside the other. TEST PLAN Run tests. REVISION DETAIL https://reviews.facebook.net/D10221 AFFECTED FILES ql/src/test/results/clientpositive/create_view_translate.q.out ql/src/test/queries/clientpositive/create_view_translate.q ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/24441/ To: kevinwilfong, sxyuan Cc: sambavim, JIRA CREATE VIEW fails when redundant casts are rewritten Key: HIVE-4344 URL: https://issues.apache.org/jira/browse/HIVE-4344 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Attachments: HIVE-4344.HIVE-4344.HIVE-4344.HIVE-4344.D10221.1.patch e.g. create view v as select cast(key as string) from src; The rewriter tries to replace both cast(key as string) and key as `src`.`key`, because cast(key as string) is a no-op. There may be other cases like this one. See HIVE-2439 for context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4344) CREATE VIEW fails when redundant casts are rewritten
[ https://issues.apache.org/jira/browse/HIVE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Yuan updated HIVE-4344: -- Status: Patch Available (was: Open) CREATE VIEW fails when redundant casts are rewritten Key: HIVE-4344 URL: https://issues.apache.org/jira/browse/HIVE-4344 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Samuel Yuan Assignee: Samuel Yuan Attachments: HIVE-4344.HIVE-4344.HIVE-4344.HIVE-4344.D10221.1.patch e.g. create view v as select cast(key as string) from src; The rewriter tries to replace both cast(key as string) and key as `src`.`key`, because cast(key as string) is a no-op. There may be other cases like this one. See HIVE-2439 for context. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4320) Consider extending max limit for precision to 38
[ https://issues.apache.org/jira/browse/HIVE-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4320: - Attachment: HIVE-4320.1.patch Consider extending max limit for precision to 38 Key: HIVE-4320 URL: https://issues.apache.org/jira/browse/HIVE-4320 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4320.1.patch Max precision of 38 still fits in 128. It changes the way you do math on these numbers though. Need to see if there will be perf implications, but there's a strong case to support 38 (instead of 36) to comply with other DBs. (Oracle, SQL Server, Teradata). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4320) Consider extending max limit for precision to 38
[ https://issues.apache.org/jira/browse/HIVE-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630754#comment-13630754 ] Gunther Hagleitner commented on HIVE-4320: -- Review: https://reviews.facebook.net/D10227 Consider extending max limit for precision to 38 Key: HIVE-4320 URL: https://issues.apache.org/jira/browse/HIVE-4320 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4320.1.patch Max precision of 38 still fits in 128. It changes the way you do math on these numbers though. Need to see if there will be perf implications, but there's a strong case to support 38 (instead of 36) to comply with other DBs. (Oracle, SQL Server, Teradata). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4315) enable doAs in unsecure mode for hive server2, when MR job runs locally
[ https://issues.apache.org/jira/browse/HIVE-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4315: Attachment: HIVE-4315.1.patch enable doAs in unsecure mode for hive server2, when MR job runs locally --- Key: HIVE-4315 URL: https://issues.apache.org/jira/browse/HIVE-4315 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.11.0 Attachments: HIVE-4315.1.patch When MR job is run locally by hive (instead of hadoop cluster), the MR job ends up running as hiveserver user instead of the user submitting the query, even if doAs configuration is enabled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4130) Bring the Lead/Lag UDFs interface in line with Lead/Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4130: -- Attachment: HIVE-4130.D10233.1.patch hbutani requested code review of HIVE-4130 [jira] Bring the Lead/Lag UDFs interface in line with Lead/Lag UDAFs. Reviewers: JIRA, ashutoshc support default vals for Lead/Lag UDFs support a default value arg both amt and defaultValue args can be optional TEST PLAN existing tests REVISION DETAIL https://reviews.facebook.net/D10233 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java ql/src/test/queries/clientpositive/windowing_expressions.q ql/src/test/results/clientpositive/windowing_expressions.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/24459/ To: JIRA, ashutoshc, hbutani Bring the Lead/Lag UDFs interface in line with Lead/Lag UDAFs - Key: HIVE-4130 URL: https://issues.apache.org/jira/browse/HIVE-4130 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Attachments: HIVE-4130.D10233.1.patch - support a default value arg - both amt and defaultValue args can be optional -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pamela Vagata updated HIVE-4318: Attachment: HIVE-4318.patch.pam.txt OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.patch.pam.txt Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630814#comment-13630814 ] Kevin Wilfong commented on HIVE-4318: - I'm just really surprised that a couple of null checks increase the amount of time by ~5% especially given that we do maybe 4 null checks in the FileSinkOperator's process method alone. Of course, I can't argue with facts, so if you could try such a patch once it's available and post your results I'd really appreciate it. OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.patch.pam.txt Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4318) OperatorHooks hit performance even when not used
[ https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630820#comment-13630820 ] Pamela Vagata commented on HIVE-4318: - I agree - I've just posted the patch, it would be great if you could post some results with this one too :) OperatorHooks hit performance even when not used Key: HIVE-4318 URL: https://issues.apache.org/jira/browse/HIVE-4318 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC (64 bit) Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: HIVE-4318.1.patch, HIVE-4318.2.patch, HIVE-4318.patch.pam.txt Operator Hooks inserted into Operator.java cause a performance hit even when it is not being used. For a count(1) query tested with without the operator hook calls. {code:title=with} 2013-04-09 07:33:58,920 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 84.07 sec Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec OK 28800991 Time taken: 40.407 seconds, Fetched: 1 row(s) {code} {code:title=without} 2013-04-09 07:33:02,355 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 68.48 sec ... Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec OK 28800991 Time taken: 35.907 seconds, Fetched: 1 row(s) {code} The effect is multiplied by the number of operators in the pipeline that has to forward the row - the more operators there are the, the slower the query. The modification made to test this was {code:title=Operator.java} --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws HiveException { return; } OperatorHookContext opHookContext = new OperatorHookContext(this, row, tag); -preProcessCounter(); -enterOperatorHooks(opHookContext); +//preProcessCounter(); +//enterOperatorHooks(opHookContext); processOp(row, tag); -exitOperatorHooks(opHookContext); -postProcessCounter(); +//exitOperatorHooks(opHookContext); +//postProcessCounter(); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4261) union_remove_10 is failing on hadoop2 with assertion (root task with non-empty set of parents)
[ https://issues.apache.org/jira/browse/HIVE-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4261: - Attachment: HIVE-4261.2.patch union_remove_10 is failing on hadoop2 with assertion (root task with non-empty set of parents) -- Key: HIVE-4261 URL: https://issues.apache.org/jira/browse/HIVE-4261 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Critical Fix For: 0.11.0 Attachments: HIVE-4261.1.patch, HIVE-4261.2.patch Output seems to indicate that the stage plan is broken. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4261) union_remove_10 is failing on hadoop2 with assertion (root task with non-empty set of parents)
[ https://issues.apache.org/jira/browse/HIVE-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630821#comment-13630821 ] Gunther Hagleitner commented on HIVE-4261: -- Thank you [~navis]! New patch is on phabricator. Still running tests, will report back with results. union_remove_10 is failing on hadoop2 with assertion (root task with non-empty set of parents) -- Key: HIVE-4261 URL: https://issues.apache.org/jira/browse/HIVE-4261 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Critical Fix For: 0.11.0 Attachments: HIVE-4261.1.patch, HIVE-4261.2.patch Output seems to indicate that the stage plan is broken. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4347) Hcatalog build fail on Windows because javadoc command exceed length limit
[ https://issues.apache.org/jira/browse/HIVE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-4347: - Fix Version/s: 0.11.0 Status: Patch Available (was: Open) Hcatalog build fail on Windows because javadoc command exceed length limit -- Key: HIVE-4347 URL: https://issues.apache.org/jira/browse/HIVE-4347 Project: Hive Issue Type: Bug Components: Build Infrastructure, HCatalog, Windows Affects Versions: 0.11.0 Environment: Windows 8 Reporter: Shuaishuai Nie Labels: build, patch Fix For: 0.11.0 Attachments: HIVE-4347.patch Original Estimate: 24h Remaining Estimate: 24h When building Hcatalog on Window 8, build fail because HIVE_DIR\hcatalog\build.xml:213: Javadoc failed: java.io.IOException: Cannot run program JAVA_HOME\bin\javadoc.exe: CreateProces s error=206, The filename or extension is too long -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4348) Unit test compile fail at hbase-handler project on Windows becuase of illegal escape character
[ https://issues.apache.org/jira/browse/HIVE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-4348: - Status: Patch Available (was: Open) Unit test compile fail at hbase-handler project on Windows becuase of illegal escape character -- Key: HIVE-4348 URL: https://issues.apache.org/jira/browse/HIVE-4348 Project: Hive Issue Type: Bug Components: HBase Handler, Testing Infrastructure, Windows Affects Versions: 0.11.0 Environment: Windows 8 Reporter: Shuaishuai Nie Attachments: HIVE-4348.patch Original Estimate: 24h Remaining Estimate: 24h The problem is because the automatically generated test case hardcoded file path string of query file using \ instead of \\ as escape character. The change should be in the TestHBaseCliDriver.vm and TestHBaseNegativeCliDriver.vm -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira