[jira] [Resolved] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4963. Resolution: Fixed Fix Version/s: 0.12.0 Committed to trunk. Thanks, Harish! Support in memory PTF partitions Key: HIVE-4963 URL: https://issues.apache.org/jira/browse/HIVE-4963 Project: Hive Issue Type: New Feature Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.12.0 Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch PTF partitions apply the defensive mode of assuming that partitions will not fit in memory. Because of this there is a significant deserialization overhead when accessing elements. Allow the user to specify that there is enough memory to hold partitions through a 'hive.ptf.partition.fits.in.mem' option. Savings depends on partition size and in case of windowing the number of UDAFs and the window ranges. For eg for the following (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs. {noformat} select t, s, i, b, f, d, min(t) over(partition by 1 rows between unbounded preceding and current row), min(s) over(partition by 1 rows between unbounded preceding and current row), min(i) over(partition by 1 rows between unbounded preceding and current row), min(b) over(partition by 1 rows between unbounded preceding and current row) from over10k {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
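The tradeoff the issue describes — paying deserialization cost on every element access versus holding live objects — can be sketched outside Hive. This is a hedged illustration with hypothetical names, not the actual PTFRowContainer/PTFPartition API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class RowAccessSketch {
    // Disk-safe mode: rows are held serialized, so every read pays a
    // full deserialization round-trip (the overhead HIVE-4963 avoids).
    static Object readSerialized(byte[] rowBytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(rowBytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory mode: rows kept as live objects; a read is a plain list lookup.
        List<String> inMemory = new ArrayList<>();
        inMemory.add("row-0");
        String fast = inMemory.get(0); // no deserialization on access

        // Serialized mode: the same row, but each access goes through bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject("row-0");
        }
        String slow = (String) readSerialized(buf.toByteArray());

        System.out.println(fast.equals(slow)); // same data, very different per-access cost
    }
}
```

With many window functions scanning the same partition repeatedly (as in the four `min(...) over(...)` query above), the per-access cost is multiplied, which is consistent with the reported 39s-to-8s improvement.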
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749565#comment-13749565 ] Ashutosh Chauhan commented on HIVE-4964: [~rhbutani] The patch is not applying cleanly. Can you rebase it on trunk? Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced --- Key: HIVE-4964 URL: https://issues.apache.org/jira/browse/HIVE-4964 Project: Hive Issue Type: Bug Reporter: Harish Butani Priority: Minor Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch There are still pieces of code that deal with: - supporting select expressions with Windowing - supporting a filter with windowing Need to do this before introducing Perf. improvements.
[jira] [Created] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk
Ashutosh Chauhan created HIVE-5147: -- Summary: Newly added test TestSessionHooks is failing on trunk Key: HIVE-5147 URL: https://issues.apache.org/jira/browse/HIVE-5147 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.12.0 Reporter: Ashutosh Chauhan This was recently added via HIVE-4588
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749567#comment-13749567 ] Hudson commented on HIVE-4963: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #69 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/69/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749572#comment-13749572 ] Hudson commented on HIVE-4963: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #137 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/137/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out
[jira] [Comment Edited] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749694#comment-13749694 ] Edward Capriolo edited comment on HIVE-4964 at 8/25/13 5:10 PM: One more cleanup: please remove 'while (true)' + 'break' constructs unless they are needed. They do not read well, and introducing break logic is generally discouraged. {quote} while (true) { if (iDef instanceof PartitionedTableFunctionDef) { {quote} Instead try: {quote} Item found = null; while (found == null) { } {quote} or even better {quote} for (Item item : list) { if (matchesCriteria(item)) { return item; } } {quote} was (Author: appodictic): One more cleanup: Please remove 'while (true)' + 'break' constructs unless they are needed. The do not read well and introducing break logic is generally not suggested. {quote} while (true) { if (iDef instanceof PartitionedTableFunctionDef) { {quote} Instead try: {quote} Item found == null; while (found!=null){ } {quote} or even better {quote} for (item: list){ if (matchesCriteria(item) ){ return item; } } {quote}
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749694#comment-13749694 ] Edward Capriolo commented on HIVE-4964: --- One more cleanup: please remove 'while (true)' + 'break' constructs unless they are needed. They do not read well, and introducing break logic is generally discouraged. {quote} while (true) { if (iDef instanceof PartitionedTableFunctionDef) { {quote} Instead try: {quote} Item found = null; while (found == null) { } {quote} or even better {quote} for (Item item : list) { if (matchesCriteria(item)) { return item; } } {quote}
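The loop-to-for refactor suggested in the comment can be made concrete. The sketch below uses hypothetical names (`findFirstMatch`, a `"ptf:"` prefix standing in for the `instanceof PartitionedTableFunctionDef` check); it is not the actual PTFTranslator code:

```java
import java.util.Arrays;
import java.util.List;

public class FindFirst {
    // Instead of while(true) + break, iterate and return on the first match.
    // The "ptf:" prefix is a hypothetical stand-in for the instanceof check.
    static String findFirstMatch(List<String> defs) {
        for (String def : defs) {
            if (def.startsWith("ptf:")) {
                return def;
            }
        }
        return null; // no match found
    }

    public static void main(String[] args) {
        List<String> defs = Arrays.asList("win:a", "ptf:b", "ptf:c");
        System.out.println(findFirstMatch(defs)); // prints "ptf:b"
    }
}
```

The early return makes the termination condition explicit, which is the readability point the comment is making about `break`.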
[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced
[ https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749696#comment-13749696 ] Edward Capriolo commented on HIVE-4964: --- Also, when possible, avoid Stack: {quote} Stack<PartitionedTableFunctionDef> fnDefs = new Stack<PartitionedTableFunctionDef>(); {quote} instead use {quote} Deque<PartitionedTableFunctionDef> d = new ArrayDeque<PartitionedTableFunctionDef>(); {quote} Stack is synchronized and has overhead. (I know some things in Hive use Stack already, so this is sometimes unavoidable.)
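ArrayDeque is a drop-in replacement for Stack's LIFO usage without the per-call synchronization of the Vector-based java.util.Stack. A minimal sketch of the substitution:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeAsStack {
    public static void main(String[] args) {
        // Deque.push/pop/peek give the same LIFO behavior as Stack,
        // but ArrayDeque is unsynchronized (no monitor overhead per call).
        Deque<Integer> stack = new ArrayDeque<Integer>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack.pop());  // 3 (last in, first out)
        System.out.println(stack.peek()); // 2
    }
}
```

The Deque javadoc itself recommends ArrayDeque over Stack when a thread-safe structure is not needed, which is the overhead point made in the comment.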
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749703#comment-13749703 ] Hudson commented on HIVE-4963: -- FAILURE: Integrated in Hive-trunk-h0.21 #2288 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2288/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out
[jira] [Commented] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE
[ https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749706#comment-13749706 ] Phabricator commented on HIVE-4375: --- ashutoshc has accepted the revision HIVE-4375 [jira] Single sourced multi insert consists of native and non-native table mixed throws NPE. +1 REVISION DETAIL https://reviews.facebook.net/D10329 BRANCH HIVE-4375 ARCANIST PROJECT hive To: JIRA, ashutoshc, navis Cc: njain Single sourced multi insert consists of native and non-native table mixed throws NPE Key: HIVE-4375 URL: https://issues.apache.org/jira/browse/HIVE-4375 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch CREATE TABLE src_x1(key string, value string); CREATE TABLE src_x2(key string, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string"); explain from src a insert overwrite table src_x1 select key,value where a.key > 0 AND a.key < 50 insert overwrite table src_x2 select key,value where a.key > 50 AND a.key < 100; throws, {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {noformat}
[jira] [Commented] (HIVE-3969) Session state for hive server should be cleanup
[ https://issues.apache.org/jira/browse/HIVE-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749712#comment-13749712 ] Ashutosh Chauhan commented on HIVE-3969: Now that HS2 is committed, which I believe does clean up its state between different sessions, this should no longer be a problem. Or do you still see this leak even with HS2? Session state for hive server should be cleanup --- Key: HIVE-3969 URL: https://issues.apache.org/jira/browse/HIVE-3969 Project: Hive Issue Type: Bug Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3969.D8325.1.patch Currently, 'add jar' commands from clients cumulatively add child ClassLoaders to the worker thread, causing various problems.
[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749714#comment-13749714 ] Edward Capriolo commented on HIVE-4002: --- {quote} [edward@jackintosh hive-trunk]$ patch -p0 D8739\?download\=true patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java Hunk #3 succeeded at 119 (offset 9 lines). Hunk #4 succeeded at 679 (offset 26 lines). patching file ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java patching file ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java Hunk #1 succeeded at 3503 (offset -19 lines). Hunk #2 succeeded at 3609 (offset -19 lines). Hunk #3 succeeded at 3622 (offset -19 lines). Hunk #4 succeeded at 3634 (offset -19 lines). Hunk #5 succeeded at 3684 (offset -19 lines). Hunk #6 succeeded at 3713 (offset -19 lines). Hunk #7 succeeded at 3820 (offset -19 lines). Hunk #8 succeeded at 6964 (offset -18 lines). 
Hunk #9 succeeded at 6990 (offset -18 lines). patching file ql/src/test/queries/clientpositive/fetch_aggregation.q patching file ql/src/test/results/clientpositive/fetch_aggregation.q.out patching file ql/src/test/results/compiler/plan/groupby1.q.xml Hunk #5 succeeded at 1312 (offset -10 lines). Hunk #6 succeeded at 1326 (offset -10 lines). Hunk #7 succeeded at 1345 (offset -10 lines). Hunk #8 succeeded at 1426 (offset -10 lines). Hunk #9 succeeded at 1478 (offset -10 lines). patching file ql/src/test/results/compiler/plan/groupby2.q.xml Hunk #10 succeeded at 1087 (offset -10 lines). Hunk #11 succeeded at 1428 (offset -10 lines). Hunk #12 succeeded at 1482 (offset -10 lines). Hunk #13 succeeded at 1508 (offset -10 lines). Hunk #14 succeeded at 1541 (offset -10 lines). Hunk #15 succeeded at 1618 (offset -10 lines). Hunk #16 succeeded at 1647 (offset -10 lines). Hunk #17 succeeded at 1715 (offset -10 lines). Hunk #18 succeeded at 1734 (offset -10 lines). Hunk #19 succeeded at 1819 (offset -10 lines). Hunk #20 succeeded at 1832 (offset -10 lines). patching file ql/src/test/results/compiler/plan/groupby3.q.xml Hunk #8 succeeded at 1299 (offset -7 lines). Hunk #9 succeeded at 1627 (offset -7 lines). Hunk #10 succeeded at 1640 (offset -7 lines). Hunk #11 succeeded at 1653 (offset -7 lines). Hunk #12 succeeded at 1695 (offset -7 lines). Hunk #13 succeeded at 1709 (offset -7 lines). Hunk #14 succeeded at 1723 (offset -7 lines). Hunk #15 succeeded at 1770 (offset -7 lines). Hunk #16 succeeded at 1846 (offset -7 lines). Hunk #17 succeeded at 1859 (offset -7 lines). Hunk #18 succeeded at 1872 (offset -7 lines). Hunk #19 succeeded at 1938 (offset -7 lines). Hunk #20 succeeded at 2144 (offset -7 lines). Hunk #21 succeeded at 2157 (offset -7 lines). Hunk #22 succeeded at 2170 (offset -7 lines). patching file ql/src/test/results/compiler/plan/groupby5.q.xml Hunk #5 succeeded at 1175 (offset -10 lines). Hunk #6 succeeded at 1189 (offset -10 lines). 
Hunk #7 succeeded at 1208 (offset -10 lines). Hunk #8 succeeded at 1295 (offset -10 lines). Hunk #9 succeeded at 1347 (offset -10 lines). patching file serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java {quote} This did not patch perfectly cleanly. Running the tests manually now. Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, HIVE-4002.D8739.3.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute final aggregation in a single reduce task. But it's too small even for a single reducer because most of the UDAF
Trying to drive through https://issues.apache.org/jira/browse/HIVE-4002
Hey all, HIVE-4002 is something I would really like to get into trunk. This group-by optimization can help very many use cases. A couple of times now, every time I go to review and commit it, something else ends up touching the same code it touches. It has been patch-available since February; if possible, could you sideline any commits that you suspect may affect it until I can run the tests and get it committed. TX
[jira] [Created] (HIVE-5148) Jam sessions w/ Tez
Gunther Hagleitner created HIVE-5148: Summary: Jam sessions w/ Tez Key: HIVE-5148 URL: https://issues.apache.org/jira/browse/HIVE-5148 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Tez introduced a session API that lets you reuse certain resources during a session (AM, localized files, etc). Hive needs to tie these into Hive sessions (for both CLI and HS2) NO PRECOMMIT TESTS (this is wip for the tez branch)
[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez
[ https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5148: - Attachment: HIVE-5148.1.patch
[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez
[ https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5148: - Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-4963) Support in memory PTF partitions
[ https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749764#comment-13749764 ] Hudson commented on HIVE-4963: -- ABORTED: Integrated in Hive-trunk-hadoop2 #380 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/380/]) HIVE-4963 : Support in memory PTF partitions (Harish Butani via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517236) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java * /hive/trunk/ql/src/test/queries/clientpositive/ptf_reuse_memstore.q * /hive/trunk/ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q * /hive/trunk/ql/src/test/results/clientpositive/ptf_reuse_memstore.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out Support in memory PTF partitions Key: HIVE-4963 URL: https://issues.apache.org/jira/browse/HIVE-4963 Project: Hive Issue Type: New Feature Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.12.0 Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch PTF partitions apply the defensive mode of assuming that partitions will not fit in memory. Because of this there is a significant deserialization overhead when accessing elements. Allow the user to specify that there is enough memory to hold partitions through a 'hive.ptf.partition.fits.in.mem' option. Savings depend on partition size and, in the case of windowing, on the number of UDAFs and the window ranges. For example, for the following (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 secs. {noformat} select t, s, i, b, f, d, min(t) over(partition by 1 rows between unbounded preceding and current row), min(s) over(partition by 1 rows between unbounded preceding and current row), min(i) over(partition by 1 rows between unbounded preceding and current row), min(b) over(partition by 1 rows between unbounded preceding and current row) from over10k {noformat}
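To see where the defensive mode's overhead comes from, here is a small, self-contained sketch (Python used purely for illustration; `SpillingPartition` and `InMemoryPartition` are hypothetical stand-ins, not Hive's actual PTFPartition/PTFRowContainer classes). A windowing function over an unbounded preceding frame reads partition rows repeatedly, so paying a deserialization on every read dominates; an in-memory partition makes each read a plain list access.

```python
import pickle

class SpillingPartition:
    """Defensive mode: rows are kept serialized; every read pays a deserialize."""
    def __init__(self):
        self._rows = []
    def append(self, row):
        self._rows.append(pickle.dumps(row))
    def get(self, i):
        return pickle.loads(self._rows[i])  # deserialization on every access

class InMemoryPartition:
    """'fits in memory' mode: rows are stored as live objects."""
    def __init__(self):
        self._rows = []
    def append(self, row):
        self._rows.append(row)
    def get(self, i):
        return self._rows[i]  # plain list access

def running_min(part, n):
    # A windowing UDAF such as min() over "rows between unbounded preceding
    # and current row" touches the partition once per output row, so the
    # per-access cost is multiplied by the partition size.
    out, cur = [], None
    for i in range(n):
        v = part.get(i)
        cur = v if cur is None else min(cur, v)
        out.append(cur)
    return out
```

Both containers produce identical results; only the cost of `get()` differs, which is why the option is a pure performance knob.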
Re: custom Hive artifacts for Shark project
Guys, considering the absence of input, I take it that it really doesn't matter which way the custom artifact will be published. Is that a correct impression? My first choice would be org.apache.hive.hive-common;0.9-shark0.7 org.apache.hive.hive-cli;0.9-shark0.7 artifacts. If this meets objections from the community here, then I'd like to proceed with org.shark-project.hive-common;0.9.0 org.shark-project.hive-cli;0.9.0 Either way, the artifacts should be published to Maven Central to make them readily available to the development community. Thoughts? Regards, Cos On Sat, Aug 10, 2013 at 10:08PM, Konstantin Boudnik wrote: Guys, I am trying to help the Spark/Shark community (spark-project.org and now http://incubator.apache.org/projects/spark) with a predicament. Shark - that's also known as Hive on Spark - is using some parts of Hive, i.e. the HQL parser, query optimizer, serdes, and codecs. In order to improve some known issues with performance and/or concurrency, Shark developers need to apply a couple of patches on top of the stock Hive: https://issues.apache.org/jira/browse/HIVE-2891 https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk) (as per https://github.com/amplab/shark/wiki/Hive-Patches) The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11 work is underway), and having developers apply the patches and build their own version of Hive is an extra step that can be avoided. One way to address it is to publish Shark-specific versions of Hive artifacts that would have all needed patches applied to the stock release. This way downstream projects can simply reference org.apache.hive with version 0.9.0-shark-0.7 instead of building Hive locally every time. Perhaps this approach is a little overkill; alternatively, would the Hive community be willing to consider a maintenance release of Hive 0.9.1, and perhaps 0.11.1, to include the fixes needed by the Shark project? 
I am willing to step up and produce Hive release bits if any of the committers here can help with publishing. -- Thanks in advance, Cos
[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749766#comment-13749766 ] Yin Huai commented on HIVE-4002: [~appodictic] Sorry for jumping in late. It seems the changes in DemuxOperator and MuxOperator will break plans optimized by the Correlation Optimizer. Let me take a look and leave my comments on phabricator. Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, HIVE-4002.D8739.3.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But the job is too small even for a single reducer, because most UDAFs generate just a single row per map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, shuffling time can be removed. This optimization transforms an operator tree like TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS) With the patch, time taken for the auto_join_filters.q test reduced to 6 min (from 10 min before).
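The plan rewrite described in HIVE-4002 boils down to replacing the shuffle-to-one-reducer step with a final merge done by the client-side fetch task. A minimal sketch of that idea (illustrative Python, not Hive code; the function names are made up):

```python
# GBY1: map-side hash aggregation produces one partial result per map task.
def map_side_partial_count(rows):
    return sum(1 for _ in rows)

# GBY2 running in the fetch task: merge the per-map partials directly,
# instead of shuffling them to a single reduce task.
def fetch_task_final_count(partials):
    return sum(partials)

splits = [[1, 2, 3], [4, 5], [6]]                      # three map tasks' inputs
partials = [map_side_partial_count(s) for s in splits]  # [3, 2, 1]
total = fetch_task_final_count(partials)                # select count(*) => 6
```

The shuffle can be skipped here precisely because each map task emits only one tiny row, so the "reduce" work is cheap enough to run in the fetch task.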
Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases
Seems ReduceSinkDeDuplication picked the wrong partitioning columns. On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP s...@rocketfuel.com wrote: I think the problem lies within the group by operation. For this optimization to work, the GROUP BY's partitioning should be on column 1 only. It won't affect the correctness of the GROUP BY; it can make it slower, but in this case it will speed up the overall query. On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: I have attached the hive 10 and 11 query plans, for the sample query below, for illustration. On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: Hi, We are using DISTRIBUTE BY with custom reducer scripts in our query workload. After upgrading to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY and custom reducer scripts produced incorrect results. In particular, rows with the same value in the DISTRIBUTE BY column end up in multiple reducers and thus produce multiple rows in the final result, when we expect only one. I investigated a little bit and discovered the following behavior for Hive 0.11: - Hive 0.11 produces a different plan for these queries with incorrect results. The extra stage for the DISTRIBUTE BY + Transform is missing, and the Transform operator for the custom reducer script is pushed into the reduce operator tree containing the GROUP BY itself. - However, *if the SORT BY in the query has a DESC order in it*, the right plan is produced, and the results look correct too. Hive 0.10 produces the expected plan with the right results in all cases. 
To illustrate, here is a simplified repro setup: Table: CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3 STRING, val4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE; Query: ADD FILE reducer.py; FROM ( SELECT grp, val2 FROM test_cluster GROUP BY grp, val2 DISTRIBUTE BY grp SORT BY grp, val2 -- add DESC here to get correct results ) a REDUCE a.* USING 'reducer.py' AS grp, reducedValue If I understand correctly, this is a bug. Is this a known issue? Any other insights? We have reverted to Hive 0.10 to avoid the incorrect results while we investigate this. I have the repro sample, with test data and scripts, if anybody is interested. Thanks, pala
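As a rough illustration of the suspected behavior (a hypothetical simulation, not Hive's ReduceSinkDeDuplication code): if the merged reduce sink partitions on (grp, val2) instead of grp alone, rows that share a grp can be routed to different reducers, so a reducer script that emits one row per group can emit the same group more than once.

```python
def route(rows, key_cols, n_reducers=2):
    # Simulate shuffle routing: hash the chosen key columns to pick a reducer.
    buckets = [[] for _ in range(n_reducers)]
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        buckets[hash(key) % n_reducers].append(row)
    return buckets

rows = [{"grp": "a", "val2": 1}, {"grp": "a", "val2": 2}]

# Correct: DISTRIBUTE BY grp -- all rows of group 'a' reach one reducer,
# so the reducer script sees the whole group and emits one row for it.
correct = route(rows, ["grp"])

# Suspect: partitioning on (grp, val2) -- the two 'a' rows hash on different
# keys and may land on different reducers, each of which emits an 'a' row.
suspect = route(rows, ["grp", "val2"])
```

With the correct routing, exactly one bucket holds both 'a' rows; with the suspect routing the outcome depends on the hash values, which is how duplicate output rows can appear.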
Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases
Created a jira https://issues.apache.org/jira/browse/HIVE-5149 On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai huaiyin@gmail.com wrote: Seems ReduceSinkDeDuplication picked the wrong partitioning columns.
[jira] [Created] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
Yin Huai created HIVE-5149: -- Summary: ReduceSinkDeDuplication can pick the wrong partitioning columns Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
[jira] [Commented] (HIVE-5087) Rename npath UDF
[ https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749796#comment-13749796 ] Alex Breshears commented on HIVE-5087: -- A couple of quick questions: what's driving the rename, and what will the new function be named? Rename npath UDF Key: HIVE-5087 URL: https://issues.apache.org/jira/browse/HIVE-5087 Project: Hive Issue Type: Bug Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5087.patch.txt
[jira] [Commented] (HIVE-5087) Rename npath UDF
[ https://issues.apache.org/jira/browse/HIVE-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749807#comment-13749807 ] Alan Gates commented on HIVE-5087: -- From the last [Hive report|http://www.apache.org/foundation/records/minutes/2013/board_minutes_2013_06_19.txt] to the Apache board * In late May Teradata requested that the project remove a UDF ('npath') which was included in the 0.11.0 release. Teradata alleges that this UDF violates a US patent they hold as well as their common law trademark. The Hive PMC has referred this issue to the ASF Legal Board. Rename npath UDF Key: HIVE-5087 URL: https://issues.apache.org/jira/browse/HIVE-5087 Project: Hive Issue Type: Bug Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5087.patch.txt
[jira] [Updated] (HIVE-5146) FilterExprOrExpr changes the order of the rows
[ https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5146: --- Attachment: HIVE-5146.2.patch Updated patch with fixes in the tests. Some tests need to be fixed because of the change in the order of rows. Also, due to the change in order, double computations return slightly different results. With this patch, the expected results match exactly with the non-vector mode computation. FilterExprOrExpr changes the order of the rows -- Key: HIVE-5146 URL: https://issues.apache.org/jira/browse/HIVE-5146 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch FilterExprOrExpr changes the order of the rows, which might break some UDFs that assume an order in the data.
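The ordering problem can be reproduced in miniature. This is a hypothetical Python sketch, not the vectorized filter code: each child of the OR yields the indices of the rows it selects, and simply appending the second child's extra matches after the first child's output reorders rows, while merging the index sets in ascending order preserves the input order.

```python
def naive_or(sel_a, sel_b):
    # Rows matched only by the second predicate end up after all of the
    # first predicate's rows, regardless of their original position.
    seen = set(sel_a)
    return sel_a + [i for i in sel_b if i not in seen]

def order_preserving_or(sel_a, sel_b):
    # Union of the two selected-index lists, kept in ascending row order.
    return sorted(set(sel_a) | set(sel_b))

sel_a = [0, 3]   # rows passing the first predicate
sel_b = [1, 2]   # rows passing the second predicate
naive = naive_or(sel_a, sel_b)              # order broken: 3 precedes 1 and 2
fixed = order_preserving_or(sel_a, sel_b)   # original row order kept
```

Order-sensitive consumers (and floating-point aggregations, as the test-result differences above suggest) see different results under the two strategies even though the selected row set is identical.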
Re: custom Hive artifacts for Shark project
I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive? Do you just include Hive components from Maven, or does the project somehow incorporate our build infrastructure? On Sun, Aug 25, 2013 at 7:42 PM, Konstantin Boudnik c...@apache.org wrote: Guys, considering the absence of input, I take it that it really doesn't matter which way the custom artifact will be published. Is that a correct impression? My first choice would be org.apache.hive.hive-common;0.9-shark0.7 org.apache.hive.hive-cli;0.9-shark0.7 artifacts. If this meets objections from the community here, then I'd like to proceed with org.shark-project.hive-common;0.9.0 org.shark-project.hive-cli;0.9.0 Either way, the artifacts should be published to Maven Central to make them readily available to the development community. Thoughts? Regards, Cos On Sat, Aug 10, 2013 at 10:08PM, Konstantin Boudnik wrote: Guys, I am trying to help the Spark/Shark community (spark-project.org and now http://incubator.apache.org/projects/spark) with a predicament. Shark - that's also known as Hive on Spark - is using some parts of Hive, i.e. the HQL parser, query optimizer, serdes, and codecs. In order to improve some known issues with performance and/or concurrency, Shark developers need to apply a couple of patches on top of the stock Hive: https://issues.apache.org/jira/browse/HIVE-2891 https://issues.apache.org/jira/browse/HIVE-3772 (just committed to trunk) (as per https://github.com/amplab/shark/wiki/Hive-Patches) The issue here is that the latest Shark works on top of Hive 0.9 (Hive 0.11 work is underway), and having developers apply the patches and build their own version of Hive is an extra step that can be avoided. One way to address it is to publish Shark-specific versions of Hive artifacts that would have all needed patches applied to the stock release. This way downstream projects can simply reference org.apache.hive with version 0.9.0-shark-0.7 instead of building Hive locally every time. 
Perhaps this approach is a little overkill; alternatively, would the Hive community be willing to consider a maintenance release of Hive 0.9.1, and perhaps 0.11.1, to include the fixes needed by the Shark project? I am willing to step up and produce Hive release bits if any of the committers here can help with publishing. -- Thanks in advance, Cos
Re: custom Hive artifacts for Shark project
Hi Edward, Shark is using two jar files from Hive - hive-common and hive-cli. But the Shark community puts a few patches on top of the stock Hive to fix blocking issues in the latter. The changes aren't proprietary and are either backports from the newer releases or fixes that weren't committed yet (HIVE-3772 is a good example of this). Take, for example, Hive 0.9, which Shark 0.7 uses: Shark backports a few bugfixes that were committed into Hive 0.10 or Hive 0.11 but never made it into Hive 0.9. I believe this is a side effect of Hive always moving forward and (almost) never making maintenance releases. Changes, and especially massive rewrites, bring instability into the software; it needs to be gradually ironed out with subsequent releases. A good example of such a project would be HBase, which does quite a number of minor releases to provide their users with stable and robust server-side software. In the absence of maintenance releases, downstream projects tend to find ways to work around such an obstacle, hence my earlier email. As for 0.11.1: Shark currently doesn't support Hive 0.11 because of significant changes in the APIs of the latter. The support is coming in the next couple of months, so publishing artifacts improving on top of Hive 0.9 might be a more pressing issue. Hope this clarifies the situation, Cos On Sun, Aug 25, 2013 at 11:54PM, Edward Capriolo wrote: I think we plan on doing an 11.1 or just a 12.0. How does Shark use Hive? Do you just include Hive components from Maven, or does the project somehow incorporate our build infrastructure? On Sun, Aug 25, 2013 at 7:42 PM, Konstantin Boudnik c...@apache.org wrote: Guys, considering the absence of input, I take it that it really doesn't matter which way the custom artifact will be published. Is that a correct impression? My first choice would be org.apache.hive.hive-common;0.9-shark0.7 org.apache.hive.hive-cli;0.9-shark0.7 artifacts. 
[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749836#comment-13749836 ] Jakob Homan commented on HIVE-4734: --- Reviewed last patch on RB. Everything looks good except for a change in the handling of [T1,Tn,NULL] types. Use custom ObjectInspectors for AvroSerde - Key: HIVE-4734 URL: https://issues.apache.org/jira/browse/HIVE-4734 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mark Wagner Fix For: 0.12.0 Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch Currently, the AvroSerde recursively copies all fields of a record from the GenericRecord to a List row object and provides the standard ObjectInspectors. Performance can be improved by providing ObjectInspectors to the Avro record itself.
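The improvement amounts to inspecting fields in place rather than eagerly copying them into a standard row object. A toy contrast (illustrative Python; `InPlaceInspector` is a made-up name, not Hive's ObjectInspector API):

```python
def eager_row(record, fields):
    # Current behavior sketched: recursively copy every field of the record
    # into a standard row object, whether or not the query reads it.
    return [record[f] for f in fields]

class InPlaceInspector:
    # Proposed behavior sketched: answer field lookups directly against the
    # record, so only the fields the query actually touches are accessed.
    def get_field(self, record, field):
        return record[field]

rec = {"name": "hive", "version": 12}
row = eager_row(rec, ["name", "version"])  # copies both fields up front
insp = InPlaceInspector()
val = insp.get_field(rec, "name")          # touches only what is asked for
```

The saving scales with record width: a query projecting two columns out of a hundred no longer pays for copying the other ninety-eight.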
[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE
[ https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4375: --- Affects Version/s: 0.11.0 Status: Open (was: Patch Available) The following tests failed with the patch: * TestHBaseCliDriver_single_sorced_multi_insert.q * TestCliDriver_union28.q * TestCliDriver_union30.q Single sourced multi insert consists of native and non-native table mixed throws NPE Key: HIVE-4375 URL: https://issues.apache.org/jira/browse/HIVE-4375 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch CREATE TABLE src_x1(key string, value string); CREATE TABLE src_x2(key string, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string"); explain from src a insert overwrite table src_x1 select key,value where a.key > 0 AND a.key < 50 insert overwrite table src_x2 select key,value where a.key > 50 AND a.key < 100; throws, {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {noformat}
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
On July 29, 2013, 10:41 a.m., Jakob Homan wrote: There is still no text covering a map-reduce job on an already existing, non-Avro table into an Avro table. I.e., create a text table, populate it, run a CTAS to manipulate the data into an Avro table. Mohammad Islam wrote: In general, Hive creates internal column names such as col0, col1, etc. Because of this, I wasn't able to copy non-Avro data to Avro data and run a SELECT query. The only option is to change the current behavior to reuse the provided column names. A separate JIRA for this could be an option. Wouldn't select * or using the new column names (they're named deterministically) work? This is a major test, since otherwise we're missing the most important code path... i.e. have a text file c1, c2, c3 create table t1 load data into t1 from text file create table a1 as select c3, c2 where c2 = foo order by c3; select * from a1; describe extended a1; And verify in the q file's result that the table is Avro and that the correct rows and columns got converted. - Jakob --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/#review24149 --- On Aug. 7, 2013, 5:24 p.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated Aug. 7, 2013, 5:24 p.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive doesn't support creating an Avro-based table using the HQL create table command. It currently requires specifying an Avro schema literal or schema file name. In many cases, this is very inconvenient for the user. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying Avro schema. 
Diffs - ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases. Thanks, Mohammad Islam
Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12480/#review25537 --- One issue in the testing and a few formatting issues. Otherwise looks good. serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java https://reviews.apache.org/r/12480/#comment49986 Weird spacing... 2x below as well. serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java https://reviews.apache.org/r/12480/#comment49984 These should never be null, not even in testing. It's better to change the tests to correctly populate the data structure. serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java https://reviews.apache.org/r/12480/#comment49985 And this would indicate a bug. - Jakob Homan On Aug. 6, 2013, 7:13 p.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12480/ --- (Updated Aug. 6, 2013, 7:13 p.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-4732 https://issues.apache.org/jira/browse/HIVE-4732 Repository: hive-git Description --- From our performance analysis, we found that AvroSerde's schema.equals() call consumed a substantial amount (nearly 40%) of time. This patch intends to minimize the number of schema.equals() calls by pushing the check as late as possible and performing it as few times as possible. First, we added a unique ID for each record reader, which is then included in every AvroGenericRecordWritable. Then, we introduced two new data structures (one HashSet and one HashMap) to store intermediate data and avoid duplicate checks. The HashSet contains the IDs of all record readers that don't need any re-encoding. The HashMap contains the already-used re-encoders; it works as a cache and allows re-encoder reuse. With this change, our test shows a nearly 40% reduction in Avro record reading time. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java ed2a9af serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java e994411 serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 3828940 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb Diff: https://reviews.apache.org/r/12480/diff/ Testing --- Thanks, Mohammad Islam
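The caching scheme described in this review request can be sketched as follows (hedged, illustrative Python with made-up names, not the actual AvroDeserializer API): tag each record with its reader's unique ID, remember readers already known to need no re-encoding, and cache one re-encoder per reader that does, so the expensive schema equality check runs once per reader instead of once per record.

```python
class AvroDeserializerSketch:
    def __init__(self, table_schema):
        self.table_schema = table_schema
        self.no_reencode = set()   # reader IDs whose schema matches the table's
        self.reencoders = {}       # reader ID -> cached re-encoder
        self.equals_calls = 0      # instrumentation for this example only

    def deserialize(self, record, reader_id, reader_schema):
        if reader_id in self.no_reencode:
            return record                        # fast path, no schema check
        if reader_id in self.reencoders:
            return self.reencoders[reader_id](record)  # cached re-encoder
        self.equals_calls += 1                   # the expensive schema.equals()
        if reader_schema == self.table_schema:
            self.no_reencode.add(reader_id)
            return record
        enc = lambda r: dict(r, _reencoded=True) # stand-in for a real re-encoder
        self.reencoders[reader_id] = enc
        return enc(record)

d = AvroDeserializerSketch(table_schema="s1")
for _ in range(1000):
    d.deserialize({"x": 1}, reader_id="r1", reader_schema="s1")
# 1000 records from the same reader trigger only one equality check
```

This matches the claimed saving: with per-record checks the equality test would run 1000 times here; with the reader-ID set and re-encoder cache it runs once.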