[jira] [Updated] (HIVE-13336) Transform unix_timestamp(args) into to_unix_timestamp(args)
[ https://issues.apache.org/jira/browse/HIVE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13336: --- Attachment: HIVE-13336.2.patch > Transform unix_timestamp(args) into to_unix_timestamp(args) > --- > > Key: HIVE-13336 > URL: https://issues.apache.org/jira/browse/HIVE-13336 > Project: Hive > Issue Type: Improvement > Components: UDF >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-13336.1.patch, HIVE-13336.2.patch > > > Transformation is necessary because isDeterministic is a class annotation > and is not dependent on the argument count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13336) Transform unix_timestamp(args) into to_unix_timestamp(args)
[ https://issues.apache.org/jira/browse/HIVE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13336: --- Status: Patch Available (was: Open) > Transform unix_timestamp(args) into to_unix_timestamp(args) > --- > > Key: HIVE-13336 > URL: https://issues.apache.org/jira/browse/HIVE-13336 > Project: Hive > Issue Type: Improvement > Components: UDF >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-13336.1.patch, HIVE-13336.2.patch > > > Transformation is necessary because isDeterministic is a class annotation > and is not dependent on the argument count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
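[Editorial note] The rationale in HIVE-13336 is that `unix_timestamp()` with no arguments returns the current time and so must stay non-deterministic, while `unix_timestamp(arg)` depends only on its arguments; since determinism is declared once per UDF class, calls with arguments are rewritten to the deterministic `to_unix_timestamp`. A minimal, self-contained sketch of that idea — the annotation, class names, and transform below are illustrative, not Hive's actual code:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrative stand-in for a class-level determinism marker: it applies to
// the whole class, so it cannot vary with the argument count.
@Retention(RetentionPolicy.RUNTIME)
@interface UdfType { boolean deterministic(); }

// unix_timestamp() with no arguments returns the current time, so the whole
// class must be marked non-deterministic.
@UdfType(deterministic = false)
class UnixTimestampUdf {}

// to_unix_timestamp(ts[, pattern]) depends only on its arguments.
@UdfType(deterministic = true)
class ToUnixTimestampUdf {}

public class Transform {
    // Rewrite any call that passes arguments to the deterministic variant.
    public static String transform(String fn, int argCount) {
        if (fn.equals("unix_timestamp") && argCount > 0) {
            return "to_unix_timestamp";
        }
        return fn;
    }

    public static void main(String[] args) {
        // The class annotation is fixed regardless of how the UDF is called:
        System.out.println(UnixTimestampUdf.class
                .getAnnotation(UdfType.class).deterministic()); // false
        System.out.println(transform("unix_timestamp", 1)); // to_unix_timestamp
        System.out.println(transform("unix_timestamp", 0)); // unix_timestamp
    }
}
```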
[jira] [Resolved] (HIVE-13331) Failures when concatenating ORC files using tez
[ https://issues.apache.org/jira/browse/HIVE-13331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-13331. -- Resolution: Won't Fix Closing this issue as it has been fixed already. > Failures when concatenating ORC files using tez > --- > > Key: HIVE-13331 > URL: https://issues.apache.org/jira/browse/HIVE-13331 > Project: Hive > Issue Type: Bug > Environment: HDP 2.2 > Hive 0.14 with Tez as execution engine >Reporter: Ashish Shenoy >Assignee: Prasanth Jayachandran > > I hit this issue consistently when I try to concatenate the ORC files in a > hive partition using 'ALTER TABLE ... PARTITION(...) CONCATENATE'. In an > email thread on the hive users mailing list > [http://mail-archives.apache.org/mod_mbox/hive-user/201504.mbox/%3c553a2a9e.70...@uib.no%3E], > I read that tez should be used as the execution engine for hive, so I > updated my hive configs to use tez as the exec engine. > Here's the stack trace when I use the Tez execution engine: > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > File Merge FAILED -1 0 0 -1 0 0 > > VERTICES: 00/01 [>>--] 0% ELAPSED TIME: 1458666880.00 > s > > Status: Failed > Vertex failed, vertexName=File Merge, > vertexId=vertex_1455906569416_0009_1_00, diagnostics=[Vertex > vertex_1455906569416_0009_1_00 [File Merge] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: [] initializer > failed, vertex=vertex_1455906569416_0009_1_00 [File Merge], > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:452) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:441) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295) > at > 
org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:124) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > ] > DAG failed due to vertex failure. failedVertices:1 killedVertices:0 > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.DDLTask > Please let me know if this has been fixed ? This seems like a very basic > thing for Hive to get wrong, so I am wondering if I am using the right > configs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207886#comment-15207886 ] Wei Zheng commented on HIVE-11388: -- Thanks Eugene. 1. I see. It would be helpful to have a comment by the first stmt.executeQuery since it's not explicit. I didn't realize that in the first round :) 2. I mean we do need such logic to filter out compactions cleaned by other Cleaners. I'm saying we can have simpler code by directly using toClean. But I just realized that we need to extract id from CompactionInfo to have a convenient set, so never mind. 3. Agree. Btw what's the purpose of having column MT_KEY2? > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, > HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch > > > (this description is no longer accurate; see further comments) > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13262) LLAP: Remove log levels from DebugUtils
[ https://issues.apache.org/jira/browse/HIVE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207871#comment-15207871 ] Hive QA commented on HIVE-13262: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12794566/HIVE-13262.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 9836 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7341/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7341/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7341/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12794566 - PreCommit-HIVE-TRUNK-Build > LLAP: Remove log levels from DebugUtils > --- > > Key: HIVE-13262 > URL: https://issues.apache.org/jira/browse/HIVE-13262 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13262.1.patch, HIVE-13262.2.patch, > HIVE-13262.2.patch > > > DebugUtils has many hardcoded log levels. To enable logging we need to > recompile the code with the desired value. Instead, configure loggers for these > classes with log levels via log4j properties. Also use parametrized logging > in the IO elevator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
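[Editorial note] The two ideas in HIVE-13262 — log levels driven by configuration instead of recompilation, and parametrized logging — can be sketched with the JDK's own logger. Hive itself uses SLF4J/log4j `{}` placeholders; this stdlib stand-in only illustrates the deferral: when the level is disabled, the call returns before the message or its arguments are ever rendered.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogDemo {
    // An argument whose rendering is expensive; toString() counts how often
    // it is actually invoked.
    static class Expensive {
        static int renders = 0;
        @Override public String toString() { renders++; return "expensive"; }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setLevel(Level.INFO);  // in practice this comes from config, not code

        // FINE is disabled: the call short-circuits, so no formatting and no
        // toString() happens -- the point of parametrized logging.
        log.log(Level.FINE, "value: {0}", new Expensive());
        System.out.println(Expensive.renders); // 0
    }
}
```

With log4j/SLF4J the idiom is `LOG.debug("read {} bytes", n)`: identical shape, same deferred-formatting benefit, and the effective level is set in `log4j.properties` rather than in recompiled constants.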
[jira] [Updated] (HIVE-13336) Transform unix_timestamp(args) into to_unix_timestamp(args)
[ https://issues.apache.org/jira/browse/HIVE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13336: --- Attachment: HIVE-13336.1.patch > Transform unix_timestamp(args) into to_unix_timestamp(args) > --- > > Key: HIVE-13336 > URL: https://issues.apache.org/jira/browse/HIVE-13336 > Project: Hive > Issue Type: Improvement > Components: UDF >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-13336.1.patch > > > Transformation is necessary because isDeterministic is a class annotation > and is not dependent on the argument count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-13336) Transform unix_timestamp(args) into to_unix_timestamp(args)
[ https://issues.apache.org/jira/browse/HIVE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-13336: -- Assignee: Gopal V (was: Jason Dere) > Transform unix_timestamp(args) into to_unix_timestamp(args) > --- > > Key: HIVE-13336 > URL: https://issues.apache.org/jira/browse/HIVE-13336 > Project: Hive > Issue Type: Improvement > Components: UDF >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > > Transformation is necessary because -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13336) Transform unix_timestamp(args) into to_unix_timestamp(args)
[ https://issues.apache.org/jira/browse/HIVE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13336: --- Description: Transformation is necessary because isDeterministic is a class annotation and is not dependent on the argument count. (was: Transformation is necessary because ) > Transform unix_timestamp(args) into to_unix_timestamp(args) > --- > > Key: HIVE-13336 > URL: https://issues.apache.org/jira/browse/HIVE-13336 > Project: Hive > Issue Type: Improvement > Components: UDF >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > > Transformation is necessary because isDeterministic is a class annotation > and is not dependent on the argument count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13336) Transform unix_timestamp(args) into to_unix_timestamp(args)
[ https://issues.apache.org/jira/browse/HIVE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13336: --- Description: Transformation is necessary because > Transform unix_timestamp(args) into to_unix_timestamp(args) > --- > > Key: HIVE-13336 > URL: https://issues.apache.org/jira/browse/HIVE-13336 > Project: Hive > Issue Type: Improvement > Components: UDF >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Jason Dere > > Transformation is necessary because -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
[ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207820#comment-15207820 ] Sergey Shelukhin commented on HIVE-9660: [~prasanth_j] fyi > store end offset of compressed data for RG in RowIndex in ORC > - > > Key: HIVE-9660 > URL: https://issues.apache.org/jira/browse/HIVE-9660 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch, HIVE-9660.patch > > > Right now the end offset is estimated, which in some cases results in tons of > extra data being read. > We can add a separate array to RowIndex (positions_v2?) that stores number of > compressed buffers for each RG, or end offset, or something, to remove this > estimation magic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
[ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9660: --- Attachment: HIVE-9660.patch Attempt #1. > store end offset of compressed data for RG in RowIndex in ORC > - > > Key: HIVE-9660 > URL: https://issues.apache.org/jira/browse/HIVE-9660 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch, HIVE-9660.patch > > > Right now the end offset is estimated, which in some cases results in tons of > extra data being read. > We can add a separate array to RowIndex (positions_v2?) that stores number of > compressed buffers for each RG, or end offset, or something, to remove this > estimation magic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13149) Remove some unnecessary HMS connections from HS2
[ https://issues.apache.org/jira/browse/HIVE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207731#comment-15207731 ] Hive QA commented on HIVE-13149: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12794542/HIVE-13149.6.patch {color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9807 tests executed *Failed tests:* {noformat} TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file TestMiniTezCliDriver-auto_sortmerge_join_13.q-alter_merge_2_orc.q-vector_outer_join2.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_partition_diff_num_cols.q-tez_joins_explain.q-vector_decimal_aggregate.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7340/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7340/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7340/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12794542 - PreCommit-HIVE-TRUNK-Build > Remove some unnecessary HMS connections from HS2 > - > > Key: HIVE-13149 > URL: https://issues.apache.org/jira/browse/HIVE-13149 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-13149.1.patch, HIVE-13149.2.patch, > HIVE-13149.3.patch, HIVE-13149.4.patch, HIVE-13149.5.patch, HIVE-13149.6.patch > > > In SessionState class, currently we will always try to get a HMS connection > in {{start(SessionState startSs, boolean isAsync, LogHelper console)}} > regardless of whether the connection will be used later or not. > When SessionState is accessed by the tasks in TaskRunner.java, most > of the tasks, other than some like StatsTask, don't need to access HMS. > Currently a new HMS connection will be established for each Task thread. If > HiveServer2 is configured to run in parallel and the query involves many > tasks, then the connections are created but unused. > {noformat} > @Override > public void run() { > runner = Thread.currentThread(); > try { > OperationLog.setCurrentOperationLog(operationLog); > SessionState.start(ss); > runSequential(); > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
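[Editorial note] The gist of the HIVE-13149 description — don't open a metastore connection eagerly in `SessionState.start()` when most task threads never use it — is the standard lazy-initialization pattern. A hedged, single-threaded sketch (names are illustrative; Hive's real client interface is `IMetaStoreClient`, and the sketch omits the synchronization a parallel task runner would need):

```java
public class LazyClient {
    interface MetaStoreClient { }       // stand-in for Hive's IMetaStoreClient

    private MetaStoreClient client;     // NOT created in start(); stays null
    int connects = 0;                   // counts real connections, for illustration

    // Called only by tasks that genuinely need the metastore (e.g. StatsTask).
    MetaStoreClient get() {
        if (client == null) {           // first use: connect now
            connects++;
            client = new MetaStoreClient() { };
        }
        return client;                  // later uses: reuse the same connection
    }

    public static void main(String[] args) {
        LazyClient session = new LazyClient();
        // A task thread that never touches HMS costs zero connections:
        System.out.println(session.connects); // 0
        session.get();
        session.get();
        System.out.println(session.connects); // 1 -- connection is reused
    }
}
```

In HiveServer2's parallel mode this field would need to be synchronized or scoped per thread; the sketch only shows why deferring creation removes the "created but unused" connections the description complains about.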
[jira] [Updated] (HIVE-10249) ACID: show locks should show who the lock is waiting for
[ https://issues.apache.org/jira/browse/HIVE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10249: -- Status: Patch Available (was: Open) > ACID: show locks should show who the lock is waiting for > > > Key: HIVE-10249 > URL: https://issues.apache.org/jira/browse/HIVE-10249 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-10249.patch > > > Instead of just showing state WAITING, we should include what the lock is > waiting for. It will make diagnostics easier. > It would also be useful to add QueryPlan.getQueryId() so it's easy to see > which query the lock belongs to. > # need to store this in HIVE_LOCKS (additional field); this has a perf hit to > do another update on failed attempt and to clear field on successful attempt. > (Actually on success, we update anyway). How exactly would this be > displayed? Each lock can block but we acquire all parts of external lock at > once. Since we stop at first one that blocked, we’d only update that one… > # This needs a matching Thrift change to pass to client: ShowLocksResponse > # Perhaps we can start updating this info after lock was in W state for some > time to reduce perf hit. > # This is mostly useful for “Why is my query stuck” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207696#comment-15207696 ] Eugene Koifman commented on HIVE-11388: --- 1. The purpose is to run Select For Update. So if the key is already there, the 1st "rs = stmt.executeQuery(sqlStmt);" will do it. 2. We do need this. Since you may have several Cleaner processes running, they will each accumulate state in these data structures. But you don't know which instance will end up actually cleaning files, so if you remove data from these structures you'll have a memory leak. 3. What would that confirm? If the counts are off at this point, it means the 2nd thread somehow ran ahead and thus it will see its counts being different. > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, > HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch > > > (this description is no longer accurate; see further comments) > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10249) ACID: show locks should show who the lock is waiting for
[ https://issues.apache.org/jira/browse/HIVE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10249: -- Attachment: HIVE-10249.patch > ACID: show locks should show who the lock is waiting for > > > Key: HIVE-10249 > URL: https://issues.apache.org/jira/browse/HIVE-10249 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-10249.patch > > > Instead of just showing state WAITING, we should include what the lock is > waiting for. It will make diagnostics easier. > It would also be useful to add QueryPlan.getQueryId() so it's easy to see > which query the lock belongs to. > # need to store this in HIVE_LOCKS (additional field); this has a perf hit to > do another update on failed attempt and to clear field on successful attempt. > (Actually on success, we update anyway). How exactly would this be > displayed? Each lock can block but we acquire all parts of external lock at > once. Since we stop at first one that blocked, we’d only update that one… > # This needs a matching Thrift change to pass to client: ShowLocksResponse > # Perhaps we can start updating this info after lock was in W state for some > time to reduce perf hit. > # This is mostly useful for “Why is my query stuck” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10600) optimize group by for GC
[ https://issues.apache.org/jira/browse/HIVE-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline resolved HIVE-10600. - Resolution: Duplicate HIVE-12369 > optimize group by for GC > > > Key: HIVE-10600 > URL: https://issues.apache.org/jira/browse/HIVE-10600 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Matt McCline > > Quoting [~gopalv]: > {noformat} > So, something like a sum() GROUP BY will create a few hundred thousand > AbstractAggregationBuffer objects all of which will suddenly go out of > scope when the map.aggr flushes it down to the sort buffer. > That particular GC collection takes forever because the tiny buffers take > a lot of time to walk over and then they leave the memory space > fragmented, which requires a compaction pass (which btw, writes to a > page-interleaved NUMA zone). > And to make things worse, the pre-allocated sort buffers with absolutely > zero data in them take up most of the tenured regions causing these chunks > of memory to be visited more and more often as they are part of the Eden > space. > {noformat} > We need flat data structures to be GC friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
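[Editorial note] The "flat data structures" remedy quoted above replaces hundreds of thousands of short-lived per-key AbstractAggregationBuffer objects with a single primitive array: the collector then traces one large object instead of walking and compacting a fragmented sea of tiny ones. A minimal sketch of a flat sum() buffer for GROUP BY (illustrative only, not Hive's implementation):

```java
public class FlatAgg {
    // One primitive slot per group key instead of one aggregation-buffer
    // object per key: a single allocation, no per-key garbage when the
    // map-side aggregation flushes to the sort buffer.
    private final long[] sums;

    public FlatAgg(int groups) {
        sums = new long[groups];
    }

    // Accumulate a value into the slot assigned to a group key.
    public void add(int slot, long value) {
        sums[slot] += value;
    }

    public long get(int slot) {
        return sums[slot];
    }

    public static void main(String[] args) {
        FlatAgg agg = new FlatAgg(4);       // e.g. 4 distinct group-by keys
        agg.add(1, 10);
        agg.add(1, 5);
        agg.add(3, 7);
        System.out.println(agg.get(1));     // 15
        System.out.println(agg.get(3));     // 7
    }
}
```

The design trade-off is that slots must be assigned up front (or the array grown), but flushing the aggregation drops exactly one object instead of one per key.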
[jira] [Updated] (HIVE-10249) ACID: show locks should show who the lock is waiting for
[ https://issues.apache.org/jira/browse/HIVE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10249: -- Priority: Critical (was: Major) > ACID: show locks should show who the lock is waiting for > > > Key: HIVE-10249 > URL: https://issues.apache.org/jira/browse/HIVE-10249 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-10249.patch > > > Instead of just showing state WAITING, we should include what the lock is > waiting for. It will make diagnostics easier. > It would also be useful to add QueryPlan.getQueryId() so it's easy to see > which query the lock belongs to. > # need to store this in HIVE_LOCKS (additional field); this has a perf hit to > do another update on failed attempt and to clear field on successful attempt. > (Actually on success, we update anyway). How exactly would this be > displayed? Each lock can block but we acquire all parts of external lock at > once. Since we stop at first one that blocked, we’d only update that one… > # This needs a matching Thrift change to pass to client: ShowLocksResponse > # Perhaps we can start updating this info after lock was in W state for some > time to reduce perf hit. > # This is mostly useful for “Why is my query stuck” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11386) Improve Vectorized GROUP BY Performance (Phase 1)
[ https://issues.apache.org/jira/browse/HIVE-11386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-11386: Resolution: Duplicate Status: Resolved (was: Patch Available) HIVE-12369 > Improve Vectorized GROUP BY Performance (Phase 1) > - > > Key: HIVE-11386 > URL: https://issues.apache.org/jira/browse/HIVE-11386 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-11386.01.patch, HIVE-11386.02.patch > > > Improve vectorized GROUP BY performance, with an eye towards the new LLAP > memory management (dramatically reduce the number of Java object, allocate > very large objects, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-13334) stats state is not captured correctly
[ https://issues.apache.org/jira/browse/HIVE-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-13334: -- Assignee: Pengcheng Xiong > stats state is not captured correctly > - > > Key: HIVE-13334 > URL: https://issues.apache.org/jira/browse/HIVE-13334 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Statistics >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Pengcheng Xiong > > As a result, StatsOptimizer gives incorrect results. Can be reproduced with > the following queries: > {code} > mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true > -Dqfile=insert_orig_table.q,insert_values_orig_table.q,orc_merge9.q,sample_islocalmode_hook.-Dhive.compute.query.using.stats=true > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13334) stats state is not captured correctly
[ https://issues.apache.org/jira/browse/HIVE-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207671#comment-15207671 ] Pengcheng Xiong commented on HIVE-13334: [~ashutoshc], sure. I have wanted to turn this on by default for quite a long time. :) > stats state is not captured correctly > - > > Key: HIVE-13334 > URL: https://issues.apache.org/jira/browse/HIVE-13334 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Statistics >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan > > As a result, StatsOptimizer gives incorrect results. Can be reproduced with > the following queries: > {code} > mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true > -Dqfile=insert_orig_table.q,insert_values_orig_table.q,orc_merge9.q,sample_islocalmode_hook.-Dhive.compute.query.using.stats=true > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13041) Backport to branch-1 HIVE-9862 Vectorized execution corrupts timestamp values
[ https://issues.apache.org/jira/browse/HIVE-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207660#comment-15207660 ] Matt McCline commented on HIVE-13041: - Very large change. Holding off for now. > Backport to branch-1 HIVE-9862 Vectorized execution corrupts timestamp values > - > > Key: HIVE-13041 > URL: https://issues.apache.org/jira/browse/HIVE-13041 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13041.1-branch1.patch, HIVE-13041.2-branch1.patch > > > Backport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13041) Backport to branch-1 HIVE-9862 Vectorized execution corrupts timestamp values
[ https://issues.apache.org/jira/browse/HIVE-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13041: Resolution: Won't Fix Status: Resolved (was: Patch Available) > Backport to branch-1 HIVE-9862 Vectorized execution corrupts timestamp values > - > > Key: HIVE-13041 > URL: https://issues.apache.org/jira/browse/HIVE-13041 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13041.1-branch1.patch, HIVE-13041.2-branch1.patch > > > Backport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13334) stats state is not captured correctly
[ https://issues.apache.org/jira/browse/HIVE-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207636#comment-15207636 ] Ashutosh Chauhan commented on HIVE-13334: - [~pxiong] Can you take a look at this one? There might be different root causes for these failures. If so, lets create separate jira for each. > stats state is not captured correctly > - > > Key: HIVE-13334 > URL: https://issues.apache.org/jira/browse/HIVE-13334 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Statistics >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan > > As a result, StatsOptimizer gives incorrect results. Can be reproduced with > the following queries: > {code} > mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true > -Dqfile=insert_orig_table.q,insert_values_orig_table.q,orc_merge9.q,sample_islocalmode_hook.-Dhive.compute.query.using.stats=true > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13334) stats state is not captured correctly
[ https://issues.apache.org/jira/browse/HIVE-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-13334: Description: As a result, StatsOptimizer gives incorrect results. Can be reproduced with the following queries: {code} mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true -Dqfile=insert_orig_table.q,insert_values_orig_table.q,orc_merge9.q,sample_islocalmode_hook.-Dhive.compute.query.using.stats=true {code} was: As a result, StatsOptimizer gives incorrect results. Can be reproduced with the following queries: {code} mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true -Dqfile=insert_orig_table.q,insert_values_orig_table.q,orc_merge9.q,sample_islocalmode_hook.-Dhive.compute.query.using.stats=true {code} [~pxiong] Can you take a look at this one? > stats state is not captured correctly > - > > Key: HIVE-13334 > URL: https://issues.apache.org/jira/browse/HIVE-13334 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer, Statistics >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan > > As a result, StatsOptimizer gives incorrect results. Can be reproduced with > the following queries: > {code} > mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true > -Dqfile=insert_orig_table.q,insert_values_orig_table.q,orc_merge9.q,sample_islocalmode_hook.-Dhive.compute.query.using.stats=true > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13333) StatsOptimizer throws ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207618#comment-15207618 ] Ashutosh Chauhan commented on HIVE-13333: - {code} Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector.get(JavaIntObjectInspector.java:40) at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:239) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247) at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:72) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231) at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71) at org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40) at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:99) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer at org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:102) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145) {code} from query: {code} select f,a,e,b from (select count(*) as a, count(c_int) as b, sum(c_int) as c, avg(c_int) as d, max(c_int) as e, min(c_int) as f from cbo_t1) cbo_t1 {code} > StatsOptimizer throws ClassCastException > > > Key: HIVE-13333 > URL: https://issues.apache.org/jira/browse/HIVE-13333 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan > > mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true > -Dqfile=cbo_rp_udf_udaf.q -Dhive.compute.query.using.stats=true repros the > issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
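The root cause is visible in the first stack frame: StatsOptimizer answers aggregates such as max(c_int) straight from metastore statistics, which hold the value as a Long, while JavaIntObjectInspector still expects an Integer for the int column. A minimal standalone sketch of that cast failure (the class and method below are illustrative stand-ins, not Hive's actual code):

```java
// Sketch of the HIVE-13333 failure mode: a stats-derived Long fed to an
// inspector that performs an unchecked cast to Integer.
public class StatsCastSketch {
    // Stand-in for JavaIntObjectInspector.get(Object): the unchecked cast
    // below is what throws at runtime.
    static int getInt(Object o) {
        return (Integer) o;  // ClassCastException when o is actually a Long
    }

    public static void main(String[] args) {
        Object fromStats = Long.valueOf(40);  // e.g. max(c_int) read from stats
        try {
            System.out.println(getInt(fromStats));
        } catch (ClassCastException e) {
            // Same exception type and message shape as the quoted trace
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

Running it reproduces the "java.lang.Long cannot be cast to java.lang.Integer" failure in isolation, without any Hive machinery.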
[jira] [Commented] (HIVE-13333) StatsOptimizer throws ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207620#comment-15207620 ] Ashutosh Chauhan commented on HIVE-13333: - [~pxiong] Can you take a look at this one? > StatsOptimizer throws ClassCastException > > > Key: HIVE-13333 > URL: https://issues.apache.org/jira/browse/HIVE-13333 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan > > mvn test -Dtest=TestCliDriver -Dtest.output.overwrite=true > -Dqfile=cbo_rp_udf_udaf.q -Dhive.compute.query.using.stats=true repros the > issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207615#comment-15207615 ] Wei Zheng commented on HIVE-11388: -- [~ekoifman] I have several questions regarding patch 7. 1. In the TxnHandler.acquireLock implementation, there's a {code}if (!rs.next()){code} block; after that, shouldn't there be an else block that deals with the case when there's an existing key in AUX_TABLE (thus roll back the select for update and retry)? 2. In Cleaner.run(), I'm not sure if we need currentToCleanSet, since we're essentially checking the existence of compactId2CompactInfoMap members in the toClean set. 3. In TestTxnHandler.testMutexAPI, we can add two more asserts after //now 2 and //now 3 to confirm. > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, > HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch > > > (this description is no longer accurate; see further comments) > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
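The mutex shape under discussion is insert-if-missing-then-retry on top of SELECT ... FOR UPDATE. A hedged sketch of that control flow, with the AUX_TABLE simulated by an in-memory set (a real TxnHandler runs the select inside a database transaction and the row lock held until commit is the mutex; all names here are illustrative, not Hive's actual code):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the acquireLock flow questioned in the comment above.
public class MutexSketch {
    static final Set<String> AUX_TABLE = new HashSet<>();

    static boolean acquireLock(String key) {
        boolean rowExists = AUX_TABLE.contains(key);  // ~ SELECT ... FOR UPDATE
        if (!rowExists) {
            // No row yet: FOR UPDATE locked nothing, so insert the key,
            // abandon this attempt, and retry against the now-present row.
            AUX_TABLE.add(key);
            return acquireLock(key);
        } else {
            // Row exists: in the real code the row-level lock taken by
            // FOR UPDATE is now held; the sketch just reports success.
            // This is the explicit else branch the comment asks about.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(acquireLock("CheckLock"));
    }
}
```

The sketch is single-threaded and only illustrates the branch structure; it does not model concurrent metastores racing on the same key.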
[jira] [Commented] (HIVE-13332) support dumping all row indexes in ORC FileDump
[ https://issues.apache.org/jira/browse/HIVE-13332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207613#comment-15207613 ] Prasanth Jayachandran commented on HIVE-13332: -- LGTM, +1 > support dumping all row indexes in ORC FileDump > --- > > Key: HIVE-13332 > URL: https://issues.apache.org/jira/browse/HIVE-13332 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13332.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13261) Can not compute column stats for partition when schema evolves
[ https://issues.apache.org/jira/browse/HIVE-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207588#comment-15207588 ] Ashutosh Chauhan commented on HIVE-13261: - +1 > Can not compute column stats for partition when schema evolves > -- > > Key: HIVE-13261 > URL: https://issues.apache.org/jira/browse/HIVE-13261 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13261.01.patch > > > To repro > {code} > CREATE TABLE partitioned1(a INT, b STRING) PARTITIONED BY(part INT) STORED AS > TEXTFILE; > insert into table partitioned1 partition(part=1) values(1, 'original'),(2, > 'original'), (3, 'original'),(4, 'original'); > -- Table-Non-Cascade ADD COLUMNS ... > alter table partitioned1 add columns(c int, d string); > insert into table partitioned1 partition(part=2) values(1, 'new', 10, > 'ten'),(2, 'new', 20, 'twenty'), (3, 'new', 30, 'thirty'),(4, 'new', 40, > 'forty'); > insert into table partitioned1 partition(part=1) values(5, 'new', 100, > 'hundred'),(6, 'new', 200, 'two hundred'); > analyze table partitioned1 compute statistics for columns; > {code} > Error msg: > {code} > 2016-03-10T14:55:43,205 ERROR [abc3eb8d-7432-47ae-b76f-54c8d7020312 main[]]: > metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(177)) - > NoSuchObjectException(message:Column c for which stats gathering is requested > doesn't exist.) > at > org.apache.hadoop.hive.metastore.ObjectStore.writeMPartitionColumnStatistics(ObjectStore.java:6492) > at > org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:6574) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13332) support dumping all row indexes in ORC FileDump
[ https://issues.apache.org/jira/browse/HIVE-13332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13332: Attachment: HIVE-13332.patch [~prasanth_j] can you take a look? most of the changes are in out files. > support dumping all row indexes in ORC FileDump > --- > > Key: HIVE-13332 > URL: https://issues.apache.org/jira/browse/HIVE-13332 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13332.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12960) Migrate Column Stats Extrapolation to HBaseStore
[ https://issues.apache.org/jira/browse/HIVE-12960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207561#comment-15207561 ] Ashutosh Chauhan commented on HIVE-12960: - There is an incorrect import {{import antlr.SemanticException;}} It would be great if the aggregate computation actually happened on the hbase server, but I guess that's not possible without a co-processor. Looks good otherwise, +1 > Migrate Column Stats Extrapolation to HBaseStore > > > Key: HIVE-12960 > URL: https://issues.apache.org/jira/browse/HIVE-12960 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-12960.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13310) Vectorized Projection Comparison Number Column to Scalar broken for !noNulls and selectedInUse
[ https://issues.apache.org/jira/browse/HIVE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13310: Resolution: Fixed Status: Resolved (was: Patch Available) > Vectorized Projection Comparison Number Column to Scalar broken for !noNulls > and selectedInUse > -- > > Key: HIVE-13310 > URL: https://issues.apache.org/jira/browse/HIVE-13310 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 2.1.0 > > Attachments: HIVE-13310.01.patch, HIVE-13310.02.patch > > > LongColEqualLongScalar.java > LongColGreaterEqualLongScalar.java > LongColGreaterLongScalar.java > LongColLessEqualLongScalar.java > LongColLessLongScalar.java > LongColNotEqualLongScalar.java > LongScalarEqualLongColumn.java > LongScalarGreaterEqualLongColumn.java > LongScalarGreaterLongColumn.java > LongScalarLessEqualLongColumn.java > LongScalarLessLongColumn.java > LongScalarNotEqualLongColumn.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13310) Vectorized Projection Comparison Number Column to Scalar broken for !noNulls and selectedInUse
[ https://issues.apache.org/jira/browse/HIVE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207550#comment-15207550 ] Matt McCline commented on HIVE-13310: - Not a bug in branch-1. > Vectorized Projection Comparison Number Column to Scalar broken for !noNulls > and selectedInUse > -- > > Key: HIVE-13310 > URL: https://issues.apache.org/jira/browse/HIVE-13310 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 2.1.0 > > Attachments: HIVE-13310.01.patch, HIVE-13310.02.patch > > > LongColEqualLongScalar.java > LongColGreaterEqualLongScalar.java > LongColGreaterLongScalar.java > LongColLessEqualLongScalar.java > LongColLessLongScalar.java > LongColNotEqualLongScalar.java > LongScalarEqualLongColumn.java > LongScalarGreaterEqualLongColumn.java > LongScalarGreaterLongColumn.java > LongScalarLessEqualLongColumn.java > LongScalarLessLongColumn.java > LongScalarNotEqualLongColumn.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13318) Cache the result of getTable from metaStore
[ https://issues.apache.org/jira/browse/HIVE-13318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13318: --- Attachment: HIVE-13318.01.patch > Cache the result of getTable from metaStore > --- > > Key: HIVE-13318 > URL: https://issues.apache.org/jira/browse/HIVE-13318 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13318.01.patch > > > getTable by name from metaStore is called many times. We plan to cache it to > save calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13318) Cache the result of getTable from metaStore
[ https://issues.apache.org/jira/browse/HIVE-13318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13318: --- Status: Patch Available (was: Open) > Cache the result of getTable from metaStore > --- > > Key: HIVE-13318 > URL: https://issues.apache.org/jira/browse/HIVE-13318 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13318.01.patch > > > getTable by name from metaStore is called many times. We plan to cache it to > save calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values
[ https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207456#comment-15207456 ] Hive QA commented on HIVE-10176: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12794491/HIVE-10176.6.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7339/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7339/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7339/ Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] --- maven-jar-plugin:2.2:test-jar (default) @ hive-service-rpc --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.1.0-SNAPSHOT-tests.jar [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-service-rpc --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.1.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.1.0-SNAPSHOT/hive-service-rpc-2.1.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/service-rpc/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.1.0-SNAPSHOT/hive-service-rpc-2.1.0-SNAPSHOT.pom [INFO] Installing /data/hive-ptest/working/apache-github-source-source/service-rpc/target/hive-service-rpc-2.1.0-SNAPSHOT-tests.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-service-rpc/2.1.0-SNAPSHOT/hive-service-rpc-2.1.0-SNAPSHOT-tests.jar [INFO] [INFO] [INFO] Building Spark Remote Client 2.1.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ spark-client --- [INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/spark-client/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/spark-client (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ spark-client --- [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ spark-client --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ spark-client --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ spark-client --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ spark-client --- [INFO] Compiling 28 source files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/classes [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java: /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java uses or overrides a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java: Recompile with -Xlint:deprecation for details. [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java: Some input files use unchecked or unsafe operations. [WARNING] /data/hive-ptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcDispatcher.java: Recompile with -Xlint:unchecked for details. 
[INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ spark-client --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] Copying 1 resource [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ spark-client --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf [copy] Copying 16 files to /data/hive-ptest/working/apache-github-source-source/spark-client/target/tmp/conf
[jira] [Commented] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions
[ https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207443#comment-15207443 ] Hive QA commented on HIVE-13178: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12794725/HIVE-13178.06.patch {color:green}SUCCESS:{color} +1 due to 33 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9836 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-schema_evol_orc_nonvec_mapwork_table.q-insert_update_delete.q-selectDistinctStar.q-and-6-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acid_mapwork_part org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_part org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_acidvec_mapwork_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_schema_evol_orc_nonvec_mapwork_part_other_incompatible {noformat} Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7338/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7338/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7338/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12794725 - PreCommit-HIVE-TRUNK-Build > Enhance ORC Schema Evolution to handle more standard data type conversions > -- > > Key: HIVE-13178 > URL: https://issues.apache.org/jira/browse/HIVE-13178 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, > HIVE-13178.03.patch, HIVE-13178.04.patch, HIVE-13178.05.patch, > HIVE-13178.06.patch > > > Currently, SHORT -> INT -> BIGINT is supported. > Handle the ORC data type conversions permitted by implicit conversion, as allowed by the > TypeInfoUtils.implicitConvertible method. >* STRING_GROUP -> DOUBLE >* STRING_GROUP -> DECIMAL >* DATE_GROUP -> STRING >* NUMERIC_GROUP -> STRING >* STRING_GROUP -> STRING_GROUP >* >* // Upward from "lower" type to "higher" numeric type: >* BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
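The "upward" numeric ordering at the end of that description amounts to a ranked list: a conversion is implicit only from a lower-ranked type to a higher-ranked one. A hedged sketch of that idea for the numeric group only (this is an illustration of the rule, not Hive's actual TypeInfoUtils.implicitConvertible implementation, which covers more groups):

```java
import java.util.Arrays;
import java.util.List;

// Implicit numeric widening check:
// BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL
public class WideningSketch {
    static final List<String> NUMERIC_ORDER =
        Arrays.asList("byte", "short", "int", "bigint", "float", "double", "decimal");

    static boolean canWiden(String from, String to) {
        int f = NUMERIC_ORDER.indexOf(from), t = NUMERIC_ORDER.indexOf(to);
        // Unknown types never widen implicitly; otherwise only lower -> higher.
        return f >= 0 && t >= 0 && f <= t;
    }

    public static void main(String[] args) {
        System.out.println(canWiden("short", "bigint"));  // true
        System.out.println(canWiden("double", "int"));    // false: would narrow
    }
}
```

Schema evolution in ORC then only has to honor the same direction: a reader schema may widen what the writer stored, never narrow it.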
[jira] [Comment Edited] (HIVE-13302) direct SQL: cast to date doesn't work on Oracle
[ https://issues.apache.org/jira/browse/HIVE-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207426#comment-15207426 ] Sergey Shelukhin edited comment on HIVE-13302 at 3/22/16 10:24 PM: --- Committed to master and branch-1. Verified the patch works on Oracle. was (Author: sershe): Committed to master and branch-1 > direct SQL: cast to date doesn't work on Oracle > --- > > Key: HIVE-13302 > URL: https://issues.apache.org/jira/browse/HIVE-13302 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-13302.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13302) direct SQL: cast to date doesn't work on Oracle
[ https://issues.apache.org/jira/browse/HIVE-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13302: Resolution: Fixed Fix Version/s: 2.1.0 1.3.0 Status: Resolved (was: Patch Available) Committed to master and branch-1 > direct SQL: cast to date doesn't work on Oracle > --- > > Key: HIVE-13302 > URL: https://issues.apache.org/jira/browse/HIVE-13302 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-13302.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11221) In Tez mode, alter table concatenate orc files can intermittently fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-11221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207384#comment-15207384 ] Aaron Dossett commented on HIVE-11221: -- [~ashishen...@gmail.com] HDP 2.3.4 does include this fix backported to 1.2.1 (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_HDP_RelNotes/content/patch_hive.html). We recently upgraded to 2.3.4 and concatenation is working fine so far. > In Tez mode, alter table concatenate orc files can intermittently fail with > NPE > --- > > Key: HIVE-11221 > URL: https://issues.apache.org/jira/browse/HIVE-11221 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11221.1.patch > > > We are not waiting for input ready events which can trigger occasional NPE if > input is not actually ready. > Stacktrace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:478) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:471) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:648) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:146) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.(MRReaderMapred.java:73) > at > org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:483) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:108) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.getMRInput(MergeFileRecordProcessor.java:220) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.init(MergeFileRecordProcessor.java:72) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162) > ... 13 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207370#comment-15207370 ] Ashutosh Chauhan commented on HIVE-12049: - Compiler related changes look good to me. > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13331) Failures when concatenating ORC files using tez
[ https://issues.apache.org/jira/browse/HIVE-13331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Shenoy updated HIVE-13331: - Assignee: Prasanth Jayachandran > Failures when concatenating ORC files using tez > --- > > Key: HIVE-13331 > URL: https://issues.apache.org/jira/browse/HIVE-13331 > Project: Hive > Issue Type: Bug > Environment: HDP 2.2 > Hive 0.14 with Tez as execution engine >Reporter: Ashish Shenoy >Assignee: Prasanth Jayachandran > > I hit this issue consistently when I try to concatenate the ORC files in a > hive partition using 'ALTER TABLE ... PARTITION(...) CONCATENATE'. In an > email thread on the hive users mailing list > [http://mail-archives.apache.org/mod_mbox/hive-user/201504.mbox/%3c553a2a9e.70...@uib.no%3E], > I read that tez should be used as the execution engine for hive, so I > updated my hive configs to use tez as the exec engine. > Here's the stack trace when I use the Tez execution engine: > > VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED > > File Merge FAILED -1 0 0 -1 0 0 > > VERTICES: 00/01 [>>--] 0% ELAPSED TIME: 1458666880.00 > s > > Status: Failed > Vertex failed, vertexName=File Merge, > vertexId=vertex_1455906569416_0009_1_00, diagnostics=[Vertex > vertex_1455906569416_0009_1_00 [File Merge] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: [] initializer > failed, vertex=vertex_1455906569416_0009_1_00 [File Merge], > java.lang.NullPointerException > at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:452) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:441) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295) > at > org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:124) > at > 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > ] > DAG failed due to vertex failure. failedVertices:1 killedVertices:0 > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.DDLTask > Please let me know if this has been fixed? This seems like a very basic > thing for Hive to get wrong, so I am wondering if I am using the right > configs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207355#comment-15207355 ] Ashutosh Chauhan commented on HIVE-11424: - I see. Then shall we always execute the Hive rule, irrespective of whether the Calcite rule ran or not? > Rule to transform OR clauses into IN clauses in CBO > --- > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424 > Project: Hive > Issue Type: Bug >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, > HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, > HIVE-11424.05.patch, HIVE-11424.2.patch, HIVE-11424.patch > > > We create a rule that will transform OR clauses into IN clauses (when > possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
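The rewrite HIVE-11424 describes collapses a disjunction of equality predicates on one column into a single IN clause, e.g. {{WHERE a = 1 OR a = 2 OR a = 3}} becomes {{WHERE a IN (1, 2, 3)}}. A toy string-level sketch of the transformation (the real Hive/Calcite rules operate on expression trees; the class below is purely illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Minimal illustration of the OR-to-IN rewrite on rendered SQL fragments.
public class OrToInSketch {
    // Given the column and the constants collected from "col = v" disjuncts,
    // emit the equivalent IN clause.
    static String orToIn(String column, List<String> values) {
        return column + " IN (" + String.join(", ", values) + ")";
    }

    public static void main(String[] args) {
        List<String> vals = Arrays.asList("1", "2", "3");
        System.out.println(orToIn("a", vals));  // a IN (1, 2, 3)
    }
}
```

The point of the rule is not cosmetic: a single IN over N constants is cheaper to evaluate and easier for later optimizations (e.g. pruning) to reason about than N chained OR comparisons.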
[jira] [Commented] (HIVE-13316) Upgrade to Calcite 1.7
[ https://issues.apache.org/jira/browse/HIVE-13316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207353#comment-15207353 ] Jesus Camacho Rodriguez commented on HIVE-13316: There seem to be problems with the metadata providers reimplementation in Calcite, or with the current way of using them in Hive, as the right method is not being triggered. I will need to look further into it. > Upgrade to Calcite 1.7 > -- > > Key: HIVE-13316 > URL: https://issues.apache.org/jira/browse/HIVE-13316 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13316.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-13250) Compute predicate conversions on the client, instead of per row group
[ https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-13250. - Resolution: Invalid I take that back. It's not safe even for an equality predicate, since that can lead to HIVE-12749 scenarios. Resolving this as invalid. > Compute predicate conversions on the client, instead of per row group > - > > Key: HIVE-13250 > URL: https://issues.apache.org/jira/browse/HIVE-13250 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Ashutosh Chauhan > Attachments: HIVE-13250.2.patch, HIVE-13250.2.patch, HIVE-13250.patch > > > When running a query of the form > select count from table where ts_field = "2016-01-23 00:00:00"; > or > select count from table where ts_field = 1453507200 > ts_field is of type TIMESTAMP > The predicate is converted to whatever format is appropriate for TIMESTAMP > processing on each and every row group. > It would be far more efficient to process this once on the client - or even > once per task. > The same applies to ORC split elimination as well - this is applied for each > stripe. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13250) Compute predicate conversions on the client, instead of per row group
[ https://issues.apache.org/jira/browse/HIVE-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-13250: Status: Open (was: Patch Available) > Compute predicate conversions on the client, instead of per row group > - > > Key: HIVE-13250 > URL: https://issues.apache.org/jira/browse/HIVE-13250 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Ashutosh Chauhan > Attachments: HIVE-13250.2.patch, HIVE-13250.2.patch, HIVE-13250.patch > > > When running a query of the form > select count from table where ts_field = "2016-01-23 00:00:00"; > or > select count from table where ts_field = 1453507200 > ts_field is of type TIMESTAMP > The predicate is converted to whatever format is appropriate for TIMESTAMP > processing on each and every row group. > It would be far more efficient to process this once on the client - or even > once per task. > The same applies to ORC split elimination as well - this is applied for each > stripe. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
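The convert-once idea proposed in the description (before the issue was resolved as invalid) can be sketched like this; the names (`PredicateCache`, `equalsLiteral`) are illustrative stand-ins, not Hive's actual evaluator code.

```java
import java.sql.Timestamp;
import java.util.function.Predicate;

// Illustrative sketch of the proposal: parse the predicate literal into the
// column's type once, then reuse the compiled predicate for every row group,
// instead of re-converting the literal per row group / per stripe.
public class PredicateCache {
    public static Predicate<Timestamp> equalsLiteral(String literal) {
        final Timestamp target = Timestamp.valueOf(literal); // parsed exactly once
        return ts -> ts != null && ts.equals(target);        // cheap per-row-group check
    }
}
```

The resolution comment above explains why even this equality case is unsafe in practice (HIVE-12749-style scenarios), so the sketch shows only what was proposed, not what was shipped.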
[jira] [Commented] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207341#comment-15207341 ] Jesus Camacho Rodriguez commented on HIVE-11424: Exactly, once we migrate partition condition remover, we could remove the new flag... But till then, it seems better to leave it as optional, so we do not regress in some cases. > Rule to transform OR clauses into IN clauses in CBO > --- > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424 > Project: Hive > Issue Type: Bug >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, > HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, > HIVE-11424.05.patch, HIVE-11424.2.patch, HIVE-11424.patch > > > We create a rule that will transform OR clauses into IN clauses (when > possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207340#comment-15207340 ] Ashutosh Chauhan commented on HIVE-11424: - We are trying to migrate rules to Calcite. So, if we implement the partition condition remover in Calcite then we don't need to rely on Hive's rule. > Rule to transform OR clauses into IN clauses in CBO > --- > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424 > Project: Hive > Issue Type: Bug >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, > HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, > HIVE-11424.05.patch, HIVE-11424.2.patch, HIVE-11424.patch > > > We create a rule that will transform OR clauses into IN clauses (when > possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13294) AvroSerde leaks the connection in a case when reading schema from a url
[ https://issues.apache.org/jira/browse/HIVE-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207337#comment-15207337 ] Chaoyu Tang commented on HIVE-13294: Thanks [~leftylev] for reminding me of this! > AvroSerde leaks the connection in a case when reading schema from a url > --- > > Key: HIVE-13294 > URL: https://issues.apache.org/jira/browse/HIVE-13294 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-13294.1.patch, HIVE-13294.patch > > > AvroSerde leaks the connection in a case when reading schema from url: > In > public static Schema determineSchemaOrThrowException { > ... > return AvroSerdeUtils.getSchemaFor(new URL(schemaString).openStream()); > ... > } > The opened inputStream is never closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
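The leak quoted in the description can be closed with try-with-resources. A minimal sketch, assuming simplified names: `readAll` stands in for `AvroSerdeUtils.getSchemaFor(InputStream)`, and `determineSchema` for the method quoted above; neither is the actual Hive code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class SchemaFetch {
    // Stand-in for AvroSerdeUtils.getSchemaFor(InputStream): just reads the bytes.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        return out.toString("UTF-8");
    }

    // try-with-resources closes the URL stream even if reading throws,
    // which is exactly the leak the issue describes: the original code
    // opened the stream inline and never closed it.
    public static String determineSchema(String schemaUrl) throws IOException {
        try (InputStream in = new URL(schemaUrl).openStream()) {
            return readAll(in);
        } // stream closed here on all paths
    }
}
```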
[jira] [Updated] (HIVE-13327) SessionID added to HS2 threadname does not trim spaces
[ https://issues.apache.org/jira/browse/HIVE-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13327: -- Fix Version/s: 2.0.1 > SessionID added to HS2 threadname does not trim spaces > -- > > Key: HIVE-13327 > URL: https://issues.apache.org/jira/browse/HIVE-13327 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Carter Shanklin >Assignee: Prasanth Jayachandran > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-13327.1.patch > > > HIVE-13153 introduced off-by-one in appending spaces to thread names. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-13230) ORC Vectorized String reader doesn't handle NULLs correctly
[ https://issues.apache.org/jira/browse/HIVE-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-13230. -- Resolution: Duplicate Didn't realize this existed when I created HIVE-13330. Looks like the same issue. Closing this as a duplicate. > ORC Vectorized String reader doesn't handle NULLs correctly > --- > > Key: HIVE-13230 > URL: https://issues.apache.org/jira/browse/HIVE-13230 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > > Wrong results produced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207312#comment-15207312 ] Vikram Dixit K commented on HIVE-13286: --- Committed to both master and branch-2.0. Thanks [~aihuaxu]! > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > Attachments: HIVE-13286.1.patch, HIVE-13286.2.patch, > HIVE-13286.3.patch, HIVE-13286.4.patch > > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13286: -- Resolution: Fixed Target Version/s: 2.1.0, 2.0.1 Status: Resolved (was: Patch Available) > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > Attachments: HIVE-13286.1.patch, HIVE-13286.2.patch, > HIVE-13286.3.patch, HIVE-13286.4.patch > > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13153) SessionID is appended to thread name twice
[ https://issues.apache.org/jira/browse/HIVE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13153: -- Fix Version/s: 2.0.1 > SessionID is appended to thread name twice > -- > > Key: HIVE-13153 > URL: https://issues.apache.org/jira/browse/HIVE-13153 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-13153.1.patch, HIVE-13153.2.patch > > > HIVE-12249 added sessionId to thread name. In some cases the sessionId could > be appended twice. Example log line > {code} > DEBUG [6432ec22-9f66-4fa5-8770-488a9d3f0b61 > 6432ec22-9f66-4fa5-8770-488a9d3f0b61 main] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
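The duplicated-prefix log line above can be avoided by stripping any stale copy before prepending. A minimal sketch under assumed names (`ThreadNames`, `withSessionId`); this is not the actual HS2 thread-naming code, only the dedup pattern.

```java
// Illustrative only: prepend a session id to a thread name exactly once.
// If the name already starts with the id (a stale copy from a reused
// thread), drop it first so the id never appears twice.
public class ThreadNames {
    public static String withSessionId(String sessionId, String threadName) {
        String prefix = sessionId + " ";
        String base = threadName.startsWith(prefix)
                ? threadName.substring(prefix.length()) // drop stale copy
                : threadName;
        return prefix + base;
    }
}
```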
[jira] [Commented] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary
[ https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207299#comment-15207299 ] Prasanth Jayachandran commented on HIVE-13330: -- Addressed [~gopalv]'s comment to replace "".getBytes with a static final empty byte array. > ORC vectorized string dictionary reader does not differentiate null vs empty > string dictionary > -- > > Key: HIVE-13330 > URL: https://issues.apache.org/jira/browse/HIVE-13330 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.0.0, 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-13330.1.patch, HIVE-13330.2.patch > > > Vectorized string dictionary reader cannot differentiate between the case > where all dictionary entries are null vs a single entry with an empty string. This > causes wrong results when reading data out of such files. > {code:title=Vectorization On} > SET hive.vectorized.execution.enabled=true; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > NULL > {code} > {code:title=Vectorization Off} > SET hive.vectorized.execution.enabled=false; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > {code} > The input table testnullorc3 contains a varchar column vcol with a few empty > strings and a few nulls. For this table, the non-vectorized reader returns empty as > the first row but the vectorized reader returns NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
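The review comment above — replacing "".getBytes with a static final empty byte array — boils down to the following pattern. The class and method names (`DictEntry`, `entryBytes`) are illustrative, not the actual ORC TreeReader code; the point is that SQL NULL and the empty string stay distinct without per-row allocation.

```java
import java.nio.charset.StandardCharsets;

public class DictEntry {
    // One shared instance instead of allocating "".getBytes() per row.
    private static final byte[] EMPTY_BYTES = new byte[0];

    // null entry -> SQL NULL; zero-length entry -> empty string "".
    // Returning the shared EMPTY_BYTES keeps the two cases distinguishable
    // (null reference vs. non-null, zero-length array) at zero allocation cost.
    public static byte[] entryBytes(String entry) {
        if (entry == null) return null;           // SQL NULL
        if (entry.isEmpty()) return EMPTY_BYTES;  // "", no allocation
        return entry.getBytes(StandardCharsets.UTF_8);
    }
}
```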
[jira] [Updated] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary
[ https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13330: - Attachment: HIVE-13330.2.patch > ORC vectorized string dictionary reader does not differentiate null vs empty > string dictionary > -- > > Key: HIVE-13330 > URL: https://issues.apache.org/jira/browse/HIVE-13330 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.0.0, 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-13330.1.patch, HIVE-13330.2.patch > > > Vectorized string dictionary reader cannot differentiate between the case > where all dictionary entries are null vs a single entry with an empty string. This > causes wrong results when reading data out of such files. > {code:title=Vectorization On} > SET hive.vectorized.execution.enabled=true; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > NULL > {code} > {code:title=Vectorization Off} > SET hive.vectorized.execution.enabled=false; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > {code} > The input table testnullorc3 contains a varchar column vcol with a few empty > strings and a few nulls. For this table, the non-vectorized reader returns empty as > the first row but the vectorized reader returns NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary
[ https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13330: - Attachment: HIVE-13330.1.patch > ORC vectorized string dictionary reader does not differentiate null vs empty > string dictionary > -- > > Key: HIVE-13330 > URL: https://issues.apache.org/jira/browse/HIVE-13330 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.0.0, 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-13330.1.patch > > > Vectorized string dictionary reader cannot differentiate between the case > where all dictionary entries are null vs a single entry with an empty string. This > causes wrong results when reading data out of such files. > {code:title=Vectorization On} > SET hive.vectorized.execution.enabled=true; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > NULL > {code} > {code:title=Vectorization Off} > SET hive.vectorized.execution.enabled=false; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > {code} > The input table testnullorc3 contains a varchar column vcol with a few empty > strings and a few nulls. For this table, the non-vectorized reader returns empty as > the first row but the vectorized reader returns NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13330) ORC vectorized string dictionary reader does not differentiate null vs empty string dictionary
[ https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13330: - Status: Patch Available (was: Open) > ORC vectorized string dictionary reader does not differentiate null vs empty > string dictionary > -- > > Key: HIVE-13330 > URL: https://issues.apache.org/jira/browse/HIVE-13330 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-13330.1.patch > > > Vectorized string dictionary reader cannot differentiate between the case > where all dictionary entries are null vs a single entry with an empty string. This > causes wrong results when reading data out of such files. > {code:title=Vectorization On} > SET hive.vectorized.execution.enabled=true; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > NULL > {code} > {code:title=Vectorization Off} > SET hive.vectorized.execution.enabled=false; > SET hive.fetch.task.conversion=none; > select vcol from testnullorc3 limit 1; > OK > {code} > The input table testnullorc3 contains a varchar column vcol with a few empty > strings and a few nulls. For this table, the non-vectorized reader returns empty as > the first row but the vectorized reader returns NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13294) AvroSerde leaks the connection in a case when reading schema from a url
[ https://issues.apache.org/jira/browse/HIVE-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207268#comment-15207268 ] Lefty Leverenz commented on HIVE-13294: --- Okay, now it's committed to master. Thanks. > AvroSerde leaks the connection in a case when reading schema from a url > --- > > Key: HIVE-13294 > URL: https://issues.apache.org/jira/browse/HIVE-13294 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-13294.1.patch, HIVE-13294.patch > > > AvroSerde leaks the connection in a case when reading schema from url: > In > public static Schema determineSchemaOrThrowException { > ... > return AvroSerdeUtils.getSchemaFor(new URL(schemaString).openStream()); > ... > } > The opened inputStream is never closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13322) LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a log4j logger
[ https://issues.apache.org/jira/browse/HIVE-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13322: --- Resolution: Fixed Fix Version/s: 2.1.0 Release Note: LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a log4j logger (Gopal V, reviewed by Prasanth Jayachandran) Status: Resolved (was: Patch Available) > LLAP: ZK registry throws at shutdown due to slf4j trying to initialize a > log4j logger > - > > Key: HIVE-13322 > URL: https://issues.apache.org/jira/browse/HIVE-13322 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Gopal V >Priority: Minor > Fix For: 2.1.0 > > Attachments: HIVE-13322.1.patch > > > {noformat} > 2016-03-08 23:56:34,883 Thread-5 FATAL Unable to register shutdown hook > because JVM is shutting down. java.lang.IllegalStateException: Cannot add new > shutdown hook as this is not started. Current state: STOPPED > at > org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113) > at > org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:271) > at > org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256) > at > org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216) > at > org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146) > at > org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41) > at org.apache.logging.log4j.LogManager.getContext(LogManager.java:185) > at > org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103) > at > org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43) > at > org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42) > at > 
org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29) > at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285) > at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:305) > at > org.apache.curator.utils.CloseableUtils.(CloseableUtils.java:33) > at > org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.stop(LlapZookeeperRegistryImpl.java:584) > at > org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.serviceStop(LlapRegistryService.java:105) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceStop(LlapDaemon.java:294) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65) > at > org.apache.hadoop.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:183) > at > org.apache.hive.common.util.ShutdownHookManager$1.run(ShutdownHookManager.java:63) > {noformat} > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13107) LLAP: Rotate GC logs periodically to prevent full disks
[ https://issues.apache.org/jira/browse/HIVE-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207237#comment-15207237 ] Lefty Leverenz commented on HIVE-13107: --- Does this need to be documented in the wiki? (If so, please add a TODOC2.1 label.) It could go in a new subsection of Hive Logging: * [Getting Started -- Hive Logging | https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-HiveLogging] > LLAP: Rotate GC logs periodically to prevent full disks > --- > > Key: HIVE-13107 > URL: https://issues.apache.org/jira/browse/HIVE-13107 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Trivial > Fix For: 2.1.0 > > Attachments: HIVE-13107.1.patch, HIVE-13107.2.patch > > > STDOUT cannot be rotated easily, so log GC logs to a different file and > rotate periodically with -XX:+UseGCLogFileRotation > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13300) Hive on spark throws exception for multi-insert with join
[ https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207228#comment-15207228 ] Szehon Ho commented on HIVE-13300: -- Thanks, I'll take a look and file if there are not. > Hive on spark throws exception for multi-insert with join > - > > Key: HIVE-13300 > URL: https://issues.apache.org/jira/browse/HIVE-13300 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-13300.2.patch, HIVE-13300.3.patch, HIVE-13300.patch > > > For certain multi-insert queries, Hive on Spark throws a deserialization > error. > {noformat} > create table status_updates(userid int,status string,ds string); > create table profiles(userid int,school string,gender int); > drop table school_summary; create table school_summary(school string,cnt int) > partitioned by (ds string); > drop table gender_summary; create table gender_summary(gender int,cnt int) > partitioned by (ds string); > insert into status_updates values (1, "status_1", "2016-03-16"); > insert into profiles values (1, "school_1", 0); > set hive.auto.convert.join=false; > set hive.execution.engine=spark; > FROM (SELECT a.status, b.school, b.gender > FROM status_updates a JOIN profiles b > ON (a.userid = b.userid and > a.ds='2009-03-20' ) > ) subq1 > INSERT OVERWRITE TABLE gender_summary > PARTITION(ds='2009-03-20') > SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender > INSERT OVERWRITE TABLE school_summary > PARTITION(ds='2009-03-20') > SELECT subq1.school, COUNT(1) GROUP BY subq1.school > {noformat} > Error: > {noformat} > 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost > task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable > to deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error: Unable to deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251) > ... 
12 more > Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249) > ... 12 more > Caused by: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:237) > ... 13 more >
[jira] [Comment Edited] (HIVE-13300) Hive on spark throws exception for multi-insert with join
[ https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207176#comment-15207176 ] Xuefu Zhang edited comment on HIVE-13300 at 3/22/16 8:24 PM: - +1. Just wondering whether these test failures are related or tracked in other jiras. was (Author: xuefuz): +1 > Hive on spark throws exception for multi-insert with join > - > > Key: HIVE-13300 > URL: https://issues.apache.org/jira/browse/HIVE-13300 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-13300.2.patch, HIVE-13300.3.patch, HIVE-13300.patch > > > For certain multi-insert queries, Hive on Spark throws a deserialization > error. > {noformat} > create table status_updates(userid int,status string,ds string); > create table profiles(userid int,school string,gender int); > drop table school_summary; create table school_summary(school string,cnt int) > partitioned by (ds string); > drop table gender_summary; create table gender_summary(gender int,cnt int) > partitioned by (ds string); > insert into status_updates values (1, "status_1", "2016-03-16"); > insert into profiles values (1, "school_1", 0); > set hive.auto.convert.join=false; > set hive.execution.engine=spark; > FROM (SELECT a.status, b.school, b.gender > FROM status_updates a JOIN profiles b > ON (a.userid = b.userid and > a.ds='2009-03-20' ) > ) subq1 > INSERT OVERWRITE TABLE gender_summary > PARTITION(ds='2009-03-20') > SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender > INSERT OVERWRITE TABLE school_summary > PARTITION(ds='2009-03-20') > SELECT subq1.school, COUNT(1) GROUP BY subq1.school > {noformat} > Error: > {noformat} > 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost > task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable > to deserialize reduce input key from x1x128x0x0 
with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error: Unable to deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251) > ... 
12 more > Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249) > ... 12 more > Caused by: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288) > at >
[jira] [Commented] (HIVE-13307) LLAP: Slider package should contain permanent functions
[ https://issues.apache.org/jira/browse/HIVE-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207192#comment-15207192 ] Sergey Shelukhin commented on HIVE-13307: - +1, can you remove the getMSC code on commit? > LLAP: Slider package should contain permanent functions > --- > > Key: HIVE-13307 > URL: https://issues.apache.org/jira/browse/HIVE-13307 > Project: Hive > Issue Type: New Feature > Components: llap >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > Labels: TODOC2.1 > Attachments: HIVE-13307.1.patch > > > This renames a previous configuration option > hive.llap.daemon.allow.permanent.fns -> > hive.llap.daemon.download.permanent.fns > and adds a new parameter for LlapDecider > hive.llap.allow.permanent.fns > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
[ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-9660 started by Sergey Shelukhin. -- > store end offset of compressed data for RG in RowIndex in ORC > - > > Key: HIVE-9660 > URL: https://issues.apache.org/jira/browse/HIVE-9660 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch > > > Right now the end offset is estimated, which in some cases results in tons of > extra data being read. > We can add a separate array to RowIndex (positions_v2?) that stores number of > compressed buffers for each RG, or end offset, or something, to remove this > estimation magic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work stopped] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
[ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-9660 stopped by Sergey Shelukhin. -- > store end offset of compressed data for RG in RowIndex in ORC > - > > Key: HIVE-9660 > URL: https://issues.apache.org/jira/browse/HIVE-9660 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch > > > Right now the end offset is estimated, which in some cases results in tons of > extra data being read. > We can add a separate array to RowIndex (positions_v2?) that stores number of > compressed buffers for each RG, or end offset, or something, to remove this > estimation magic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9660) store end offset of compressed data for RG in RowIndex in ORC
[ https://issues.apache.org/jira/browse/HIVE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9660: --- Status: Patch Available (was: Open) > store end offset of compressed data for RG in RowIndex in ORC > - > > Key: HIVE-9660 > URL: https://issues.apache.org/jira/browse/HIVE-9660 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-9660.WIP2.patch, HIVE-9660.patch > > > Right now the end offset is estimated, which in some cases results in tons of > extra data being read. > We can add a separate array to RowIndex (positions_v2?) that stores number of > compressed buffers for each RG, or end offset, or something, to remove this > estimation magic -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13300) Hive on spark throws exception for multi-insert with join
[ https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207176#comment-15207176 ] Xuefu Zhang commented on HIVE-13300: +1 > Hive on spark throws exception for multi-insert with join > - > > Key: HIVE-13300 > URL: https://issues.apache.org/jira/browse/HIVE-13300 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-13300.2.patch, HIVE-13300.3.patch, HIVE-13300.patch > > > For certain multi-insert queries, Hive on Spark throws a deserialization > error. > {noformat} > create table status_updates(userid int,status string,ds string); > create table profiles(userid int,school string,gender int); > drop table school_summary; create table school_summary(school string,cnt int) > partitioned by (ds string); > drop table gender_summary; create table gender_summary(gender int,cnt int) > partitioned by (ds string); > insert into status_updates values (1, "status_1", "2016-03-16"); > insert into profiles values (1, "school_1", 0); > set hive.auto.convert.join=false; > set hive.execution.engine=spark; > FROM (SELECT a.status, b.school, b.gender > FROM status_updates a JOIN profiles b > ON (a.userid = b.userid and > a.ds='2009-03-20' ) > ) subq1 > INSERT OVERWRITE TABLE gender_summary > PARTITION(ds='2009-03-20') > SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender > INSERT OVERWRITE TABLE school_summary > PARTITION(ds='2009-03-20') > SELECT subq1.school, COUNT(1) GROUP BY subq1.school > {noformat} > Error: > {noformat} > 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost > task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable > to deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error: Unable to deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251) > ... 
12 more > Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249) > ... 12 more > Caused by: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:237) > ... 13 more > {noformat} -- This message was sent by
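[Editor's note] The HIVE-13300 trace above bottoms out in BinarySortableSerDe.deserializeInt running out of bytes mid-key. As a rough illustration only (a simplified Python sketch of the general sort-order-preserving int encoding, ignoring Hive's descending-sort and null-ordering options, and not Hive's actual code), an int key is one null-marker byte followed by the value big-endian with its sign bit flipped so that unsigned byte-wise comparison matches signed numeric order. Reading "x1x128x0x0" as the byte sequence 1, 128, 0, 0, the key is one byte short of a complete encoding of 0, which is exactly the EOF failure mode:

```python
import struct

def serialize_int(value):
    # 0x01 marker = non-null, then big-endian int with the sign bit
    # flipped so unsigned byte comparison matches signed numeric order.
    return b"\x01" + struct.pack(">I", (value ^ (1 << 31)) & 0xFFFFFFFF)

def deserialize_int(buf):
    if len(buf) >= 1 and buf[0] == 0:  # null marker
        return None
    if len(buf) < 5:                   # marker + 4 data bytes required
        raise EOFError("truncated reduce key: %r" % (buf,))
    (raw,) = struct.unpack(">I", buf[1:5])
    u = raw ^ (1 << 31)                # undo the sign-bit flip
    return u - (1 << 32) if u >= (1 << 31) else u

# encoded keys sort byte-wise in numeric order
assert serialize_int(-1) < serialize_int(0) < serialize_int(1)

# the key from the trace, 0x01 0x80 0x00 0x00, is one byte short of
# the 5-byte encoding of 0 and fails just like deserializeInt does
try:
    deserialize_int(b"\x01\x80\x00\x00")
except EOFError:
    pass
```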
[jira] [Updated] (HIVE-13107) LLAP: Rotate GC logs periodically to prevent full disks
[ https://issues.apache.org/jira/browse/HIVE-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13107: --- Resolution: Fixed Fix Version/s: 2.1.0 Release Note: LLAP: Rotate GC logs periodically to prevent full disks (Gopal V, reviewed by Prasanth Jayachandran) Status: Resolved (was: Patch Available) > LLAP: Rotate GC logs periodically to prevent full disks > --- > > Key: HIVE-13107 > URL: https://issues.apache.org/jira/browse/HIVE-13107 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Trivial > Fix For: 2.1.0 > > Attachments: HIVE-13107.1.patch, HIVE-13107.2.patch > > > STDOUT cannot be rotated easily, so log GC logs to a different file and > rotate periodically with -XX:+UseGCLogFileRotation > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13107) LLAP: Rotate GC logs periodically to prevent full disks
[ https://issues.apache.org/jira/browse/HIVE-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13107: --- Attachment: HIVE-13107.2.patch Removing the extra verbose logging with the PrintGCApplicationConcurrentTime. > LLAP: Rotate GC logs periodically to prevent full disks > --- > > Key: HIVE-13107 > URL: https://issues.apache.org/jira/browse/HIVE-13107 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Trivial > Attachments: HIVE-13107.1.patch, HIVE-13107.2.patch > > > STDOUT cannot be rotated easily, so log GC logs to a different file and > rotate periodically with -XX:+UseGCLogFileRotation > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
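[Editor's note] The rotation in HIVE-13107 relies on standard HotSpot flags. A hypothetical helper (the function name and default values are illustrative, not taken from the patch) that a launcher script might use to assemble them:

```python
def gc_log_opts(log_path, num_files=4, file_size_mb=20):
    """Build HotSpot GC-logging options that rotate the GC log
    instead of letting one file (or stdout) grow without bound."""
    return [
        "-Xloggc:%s" % log_path,                  # separate file, not stdout
        "-XX:+UseGCLogFileRotation",              # enable rotation
        "-XX:NumberOfGCLogFiles=%d" % num_files,  # keep a bounded set
        "-XX:GCLogFileSize=%dM" % file_size_mb,   # rotate at this size
    ]

print(" ".join(gc_log_opts("/var/log/llap/gc.log")))
# -> -Xloggc:/var/log/llap/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=20M
```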
[jira] [Updated] (HIVE-12606) HCatalog ORC Null values in fields results in NullPointer exception
[ https://issues.apache.org/jira/browse/HIVE-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-12606: Component/s: Hive > HCatalog ORC Null values in fields results in NullPointer exception > --- > > Key: HIVE-12606 > URL: https://issues.apache.org/jira/browse/HIVE-12606 > Project: Hive > Issue Type: Bug > Components: HCatalog, Hive >Affects Versions: 0.13.1 > Environment: Linux >Reporter: Z. S. > > When reading via HCatalog an ORC table that has null values in fields it > fails with the following exception: > 15/12/07 19:47:42 INFO mapred.Task: Using ResourceCalculatorProcessTree : > null > 15/12/07 19:47:42 INFO mapred.MapTask: Processing split: > org.apache.hive.hcatalog.mapreduce.HCatSplit@4c8c30bc > 15/12/07 19:47:42 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) > 15/12/07 19:47:42 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 > 15/12/07 19:47:42 INFO mapred.MapTask: soft limit at 83886080 > 15/12/07 19:47:42 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 > 15/12/07 19:47:42 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 > 15/12/07 19:47:42 INFO mapred.MapTask: Map output collector class = > org.apache.hadoop.mapred.MapTask$MapOutputBuffer > 15/12/07 19:47:42 INFO orc.ReaderImpl: Reading ORC rows from > hdfs://[REDACTED]/00_0 with {include: null, offset: 0, length: 1628} > 15/12/07 19:47:42 INFO mapred.MapTask: Ignoring exception during close for > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader@5096bee4 > java.lang.NullPointerException > at > org.apache.hive.hcatalog.mapreduce.HCatRecordReader.close(HCatRecordReader.java:223) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:520) > at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1999) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at > 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 15/12/07 19:47:42 INFO mapred.MapTask: Starting flush of map output > 15/12/07 19:47:42 INFO mapred.LocalJobRunner: map task executor complete. > 15/12/07 19:47:42 INFO mapreduce.FileOutputCommitterContainer: Job failed. > Try cleaning up temporary directory > [hdfs://bd/user/hive/warehouse/test.db/billing_aolon_revenue_output_stream/_DYN0.44164173619220104]. > 15/12/07 19:47:42 INFO mapreduce.FileOutputCommitterContainer: Cancelling > delegation token for the job. > 15/12/07 19:47:42 WARN conf.Configuration: > file:/tmp/hadoop-/mapred/local/localRunner/job_local413328602_0001/job_local413328602_0001.xml:an > attempt to override final parameter: > mapreduce.job.end-notification.max.retry.interval; Ignoring. > 15/12/07 19:47:42 WARN conf.Configuration: > file:/tmp/hadoop-/mapred/local/localRunner/job_local413328602_0001/job_local413328602_0001.xml:an > attempt to override final parameter: > mapreduce.job.end-notification.max.attempts; Ignoring. > 15/12/07 19:47:42 INFO hive.metastore: Trying to connect to metastore with > URI thrift://bd:9083 > 15/12/07 19:47:42 INFO hive.metastore: Connected to metastore. 
> 15/12/07 19:47:42 WARN mapred.LocalJobRunner: job_local413328602_0001 > java.lang.Exception: java.lang.NullPointerException > at > org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.startStripe(RecordReaderImpl.java:1545) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.startStripe(RecordReaderImpl.java:1337) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.startStripe(RecordReaderImpl.java:1825) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:2537) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:2950) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:2992) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:284) > at >
[jira] [Updated] (HIVE-13107) LLAP: Rotate GC logs periodically to prevent full disks
[ https://issues.apache.org/jira/browse/HIVE-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13107: --- Attachment: (was: HIVE-13107.2.patch) > LLAP: Rotate GC logs periodically to prevent full disks > --- > > Key: HIVE-13107 > URL: https://issues.apache.org/jira/browse/HIVE-13107 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Trivial > Attachments: HIVE-13107.1.patch > > > STDOUT cannot be rotated easily, so log GC logs to a different file and > rotate periodically with -XX:+UseGCLogFileRotation > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13107) LLAP: Rotate GC logs periodically to prevent full disks
[ https://issues.apache.org/jira/browse/HIVE-13107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-13107: --- Attachment: HIVE-13107.2.patch > LLAP: Rotate GC logs periodically to prevent full disks > --- > > Key: HIVE-13107 > URL: https://issues.apache.org/jira/browse/HIVE-13107 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Trivial > Attachments: HIVE-13107.1.patch, HIVE-13107.2.patch > > > STDOUT cannot be rotated easily, so log GC logs to a different file and > rotate periodically with -XX:+UseGCLogFileRotation > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13300) Hive on spark throws exception for multi-insert with join
[ https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207100#comment-15207100 ] Szehon Ho commented on HIVE-13300: -- Test failures do not look related (SparkCliDriver tests have been timing out a lot lately, need to investigate). [~xuefuz] [~csun] can you take another look at latest patch? > Hive on spark throws exception for multi-insert with join > - > > Key: HIVE-13300 > URL: https://issues.apache.org/jira/browse/HIVE-13300 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-13300.2.patch, HIVE-13300.3.patch, HIVE-13300.patch > > > For certain multi-insert queries, Hive on Spark throws a deserialization > error. > {noformat} > create table status_updates(userid int,status string,ds string); > create table profiles(userid int,school string,gender int); > drop table school_summary; create table school_summary(school string,cnt int) > partitioned by (ds string); > drop table gender_summary; create table gender_summary(gender int,cnt int) > partitioned by (ds string); > insert into status_updates values (1, "status_1", "2016-03-16"); > insert into profiles values (1, "school_1", 0); > set hive.auto.convert.join=false; > set hive.execution.engine=spark; > FROM (SELECT a.status, b.school, b.gender > FROM status_updates a JOIN profiles b > ON (a.userid = b.userid and > a.ds='2009-03-20' ) > ) subq1 > INSERT OVERWRITE TABLE gender_summary > PARTITION(ds='2009-03-20') > SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender > INSERT OVERWRITE TABLE school_summary > PARTITION(ds='2009-03-20') > SELECT subq1.school, COUNT(1) GROUP BY subq1.school > {noformat} > Error: > {noformat} > 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost > task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable > to 
deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error: Unable to deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:251) > ... 
12 more > Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:241) > at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:249) > ... 12 more > Caused by: java.io.EOFException > at > org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:597) > at > org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:288) > at >
[jira] [Commented] (HIVE-13300) Hive on spark throws exception for multi-insert with join
[ https://issues.apache.org/jira/browse/HIVE-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207093#comment-15207093 ] Hive QA commented on HIVE-13300: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12794698/HIVE-13300.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9852 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7337/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7337/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7337/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12794698 - PreCommit-HIVE-TRUNK-Build > Hive on spark throws exception for multi-insert with join > - > > Key: HIVE-13300 > URL: https://issues.apache.org/jira/browse/HIVE-13300 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.0.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-13300.2.patch, HIVE-13300.3.patch, HIVE-13300.patch > > > For certain multi-insert queries, Hive on Spark throws a deserialization > error. > {noformat} > create table status_updates(userid int,status string,ds string); > create table profiles(userid int,school string,gender int); > drop table school_summary; create table school_summary(school string,cnt int) > partitioned by (ds string); > drop table gender_summary; create table gender_summary(gender int,cnt int) > partitioned by (ds string); > insert into status_updates values (1, "status_1", "2016-03-16"); > insert into profiles values (1, "school_1", 0); > set hive.auto.convert.join=false; > set hive.execution.engine=spark; > FROM (SELECT a.status, b.school, b.gender > FROM status_updates a JOIN profiles b > ON (a.userid = b.userid and > a.ds='2009-03-20' ) > ) subq1 > INSERT OVERWRITE TABLE gender_summary > PARTITION(ds='2009-03-20') > SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender > INSERT OVERWRITE TABLE school_summary > PARTITION(ds='2009-03-20') > SELECT subq1.school, COUNT(1) GROUP BY subq1.school > {noformat} > Error: > {noformat} > 16/03/17 13:29:00 [task-result-getter-3]: WARN scheduler.TaskSetManager: Lost > task 0.0 in stage 2.0 (TID 3, localhost): java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable > to deserialize reduce input key from x1x128x0x0 with properties > {serialization.sort.order.null=a, columns=reducesinkkey0, > serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, > serialization.sort.order=+, columns.types=int} > at > 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:279) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49) > at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at >
[jira] [Commented] (HIVE-13115) MetaStore Direct SQL getPartitions call fail when the columns schemas for a partition are null
[ https://issues.apache.org/jira/browse/HIVE-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207080#comment-15207080 ] Carl Steinbach commented on HIVE-13115: --- +1 > MetaStore Direct SQL getPartitions call fail when the columns schemas for a > partition are null > -- > > Key: HIVE-13115 > URL: https://issues.apache.org/jira/browse/HIVE-13115 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: DirectSql, MetaStore, ORM > Attachments: HIVE-13115.patch, HIVE-13115.reproduce.issue.patch > > > We are seeing the following exception in our MetaStore logs > {noformat} > 2016-02-11 00:00:19,002 DEBUG metastore.MetaStoreDirectSql > (MetaStoreDirectSql.java:timingTrace(602)) - Direct SQL query in 5.842372ms + > 1.066728ms, the query is [select "PARTITIONS"."PART_ID" from "PARTITIONS" > inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join > "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? 
order by > "PART_NAME" asc] > 2016-02-11 00:00:19,021 ERROR metastore.ObjectStore > (ObjectStore.java:handleDirectSqlError(2243)) - Direct SQL failed, falling > back to ORM > MetaException(message:Unexpected null for one of the IDs, SD 6437, column > null, serde 6437 for a non-view) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:360) > at > org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224) > at > org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563) > at > org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559) > at > org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1570) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553) > at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108) > at com.sun.proxy.$Proxy5.getPartitions(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2526) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8747) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions.getResult(ThriftHiveMetastore.java:8731) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617) > at > 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1591) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This direct SQL call fails for every {{getPartitions}} call and then falls > back to ORM. > The query which fails is > {code} > select > PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, > SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, > PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, > SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, > SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB > from PARTITIONS > left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID > left outer join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID > where PART_ID in ( ? ) order by PART_NAME asc; > {code} > By looking at the source
[jira] [Updated] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-12616: - Attachment: HIVE-12616.3.patch Looks like the patch is reviewed, but no longer applies. I rebased and will checkin if test still pass. > NullPointerException when spark session is reused to run a mapjoin > -- > > Key: HIVE-12616 > URL: https://issues.apache.org/jira/browse/HIVE-12616 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.3.0 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-12616.1.patch, HIVE-12616.2.patch, > HIVE-12616.3.patch, HIVE-12616.patch > > > The way to reproduce: > {noformat} > set hive.execution.engine=spark; > create table if not exists test(id int); > create table if not exists test1(id int); > insert into test values(1); > insert into test1 values(1); > select max(a.id) from test a ,test1 b > where a.id = b.id; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13283) LLAP: make sure IO elevator is enabled by default in the daemons
[ https://issues.apache.org/jira/browse/HIVE-13283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13283: Attachment: HIVE-13283.03.patch Addressing the comment about getBoolVar > LLAP: make sure IO elevator is enabled by default in the daemons > > > Key: HIVE-13283 > URL: https://issues.apache.org/jira/browse/HIVE-13283 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13283.01.patch, HIVE-13283.02.patch, > HIVE-13283.03.patch, HIVE-13283.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13283) LLAP: make sure IO elevator is enabled by default in the daemons
[ https://issues.apache.org/jira/browse/HIVE-13283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207040#comment-15207040 ] Sergey Shelukhin commented on HIVE-13283: - Ah. nm, I see jobconf works correctly. Yeah the intent is to change the default, but only in the daemon > LLAP: make sure IO elevator is enabled by default in the daemons > > > Key: HIVE-13283 > URL: https://issues.apache.org/jira/browse/HIVE-13283 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13283.01.patch, HIVE-13283.02.patch, > HIVE-13283.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207015#comment-15207015 ] Aihua Xu commented on HIVE-13286: - [~vikram.dixit] Those tests are not related. Sorry. Forgot to mention that. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > Attachments: HIVE-13286.1.patch, HIVE-13286.2.patch, > HIVE-13286.3.patch, HIVE-13286.4.patch > > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13226) Improve tez print summary to print query execution breakdown
[ https://issues.apache.org/jira/browse/HIVE-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207016#comment-15207016 ] Sergey Shelukhin commented on HIVE-13226: - Is it possible to rename "Start" and "Finish" to something less confusing? DAG startup, DAG runtime? > Improve tez print summary to print query execution breakdown > > > Key: HIVE-13226 > URL: https://issues.apache.org/jira/browse/HIVE-13226 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.1.0 > > Attachments: HIVE-13226.1.patch, HIVE-13226.2.patch, > HIVE-13226.3.patch, sampleoutput.png > > > When tez print summary is enabled, a methods summary is printed which is > difficult to correlate with the actual execution time. We can improve that to > print the execution times in the sequence of operations that happen behind > the scenes. > Instead of printing the method names it will be useful to print something > like below > 1) Query Compilation time > 2) Query Submit to DAG Submit time > 3) DAG Submit to DAG Accept time > 4) DAG Accept to DAG Start time > 5) DAG Start to DAG End time > With this it will be easier to find out where the actual time is spent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
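[Editor's note] The breakdown proposed in HIVE-13226 is simply the set of deltas between consecutive lifecycle timestamps. As an illustration only (the event names below are hypothetical, not Hive's actual PerfLogger keys), the computation amounts to:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Computes the consecutive elapsed-time breakdown proposed in HIVE-13226
// from a sequence of lifecycle timestamps (milliseconds). Event names are
// illustrative only; Hive's real keys live elsewhere (e.g. PerfLogger).
public class DagTimingBreakdown {
    static LinkedHashMap<String, Long> breakdown(LinkedHashMap<String, Long> events) {
        LinkedHashMap<String, Long> out = new LinkedHashMap<>();
        Map.Entry<String, Long> prev = null;
        for (Map.Entry<String, Long> e : events.entrySet()) {
            if (prev != null) {
                // Interval name, e.g. "querySubmit -> dagSubmit".
                out.put(prev.getKey() + " -> " + e.getKey(),
                        e.getValue() - prev.getValue());
            }
            prev = e;
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Long> ev = new LinkedHashMap<>();
        ev.put("compileStart", 0L);
        ev.put("querySubmit", 120L);
        ev.put("dagSubmit", 150L);
        ev.put("dagAccept", 200L);
        ev.put("dagStart", 400L);
        ev.put("dagEnd", 5400L);
        breakdown(ev).forEach((k, v) -> System.out.println(k + ": " + v + " ms"));
    }
}
```

With this shape, the five intervals from the description fall out directly once the six timestamps are recorded in order.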
[jira] [Updated] (HIVE-13297) Set default field separator instead of ^A
[ https://issues.apache.org/jira/browse/HIVE-13297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-13297: Assignee: (was: Yongzhi Chen) > Set default field separator instead of ^A > - > > Key: HIVE-13297 > URL: https://issues.apache.org/jira/browse/HIVE-13297 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Cristian > > By default, Hive tables are created with ^A as the field delimiter. It can be > changed by users defining the correct value in tblproperties or > serdeproperties. > The default field separator should be configurable, and perhaps other > defaults such as the line separator as well, in order to avoid specifying it > for each table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13283) LLAP: make sure IO elevator is enabled by default in the daemons
[ https://issues.apache.org/jira/browse/HIVE-13283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207012#comment-15207012 ] Sergey Shelukhin commented on HIVE-13283: - Hmm.. actually yeah, this doesn't work either, the client-side setting is now ignored. It would need to be propagated with the plan. > LLAP: make sure IO elevator is enabled by default in the daemons > > > Key: HIVE-13283 > URL: https://issues.apache.org/jira/browse/HIVE-13283 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13283.01.patch, HIVE-13283.02.patch, > HIVE-13283.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions
[ https://issues.apache.org/jira/browse/HIVE-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13151: - Attachment: HIVE-13151.4.patch Upload patch 4 for test > Clean up UGI objects in FileSystem cache for transactions > - > > Key: HIVE-13151 > URL: https://issues.apache.org/jira/browse/HIVE-13151 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13151.1.patch, HIVE-13151.2.patch, > HIVE-13151.3.patch, HIVE-13151.4.patch > > > One issue with FileSystem.CACHE is that it does not clean itself. The key in > that cache includes UGI object. When new UGI objects are created and used > with the FileSystem api, new entries get added to the cache. > We need to manually clean up those UGI objects once they are no longer in use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
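[Editor's note] The leak pattern described in HIVE-13151 is generic: a cache whose key embeds an identity-compared user object can only shrink when entries are removed explicitly. The sketch below is a self-contained analogue, not Hadoop's actual FileSystem.CACHE code; the explicit removal step plays the role that a cleanup call such as FileSystem.closeAllForUGI() plays in the real fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Minimal analogue of a FileSystem-style cache whose key embeds a user
// identity object. Hadoop's real cache key is (scheme, authority, UGI);
// here we model only the identity part to show why it leaks.
public class UgiCacheSketch {
    // Stand-in for a UGI: uses default (identity) equals, so two instances
    // for the same user are distinct cache keys.
    static final class Ugi {
        final String user;
        Ugi(String user) { this.user = user; }
    }

    static final class Key {
        final String scheme;
        final Ugi ugi;
        Key(String scheme, Ugi ugi) { this.scheme = scheme; this.ugi = ugi; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme) && ugi == k.ugi; // identity compare
        }
        @Override public int hashCode() {
            return Objects.hash(scheme, System.identityHashCode(ugi));
        }
    }

    final Map<Key, Object> cache = new HashMap<>();

    Object get(String scheme, Ugi ugi) {
        // Every never-before-seen UGI instance adds a new entry.
        return cache.computeIfAbsent(new Key(scheme, ugi), k -> new Object());
    }

    // Explicit cleanup, analogous in spirit to FileSystem.closeAllForUGI(ugi).
    void closeAllFor(Ugi ugi) {
        cache.keySet().removeIf(k -> k.ugi == ugi);
    }

    public static void main(String[] args) {
        UgiCacheSketch c = new UgiCacheSketch();
        // Two UGI objects for the same user still create two entries: a leak.
        c.get("hdfs", new Ugi("hive"));
        c.get("hdfs", new Ugi("hive"));
        System.out.println("entries=" + c.cache.size());
        Ugi u = new Ugi("hive");
        c.get("hdfs", u);
        c.closeAllFor(u);
        System.out.println("after cleanup=" + c.cache.size());
    }
}
```

This is why the patch must track which UGI objects it created and remove their entries once each transaction is done with them.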
[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206934#comment-15206934 ] Eugene Koifman commented on HIVE-11388: --- This patch also includes a 1-character fix to an issue introduced in HIVE-13013 (SQL stmt in TxnHandler.lockTransactionRecord()) > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, > HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch > > > (this description is no longer accurate; see further comments) > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206928#comment-15206928 ] Eugene Koifman commented on HIVE-11388: --- Fix for HIVE-12725 is included here > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, > HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch > > > (this description is no longer accurate; see further comments) > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions
[ https://issues.apache.org/jira/browse/HIVE-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206923#comment-15206923 ] Eugene Koifman commented on HIVE-13151: --- That is +1 modulo my comments above > Clean up UGI objects in FileSystem cache for transactions > - > > Key: HIVE-13151 > URL: https://issues.apache.org/jira/browse/HIVE-13151 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13151.1.patch, HIVE-13151.2.patch, > HIVE-13151.3.patch > > > One issue with FileSystem.CACHE is that it does not clean itself. The key in > that cache includes UGI object. When new UGI objects are created and used > with the FileSystem api, new entries get added to the cache. > We need to manually clean up those UGI objects once they are no longer in use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13151) Clean up UGI objects in FileSystem cache for transactions
[ https://issues.apache.org/jira/browse/HIVE-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206920#comment-15206920 ] Eugene Koifman commented on HIVE-13151: --- I talked to Thejas and now I understand this better. [~wzheng] +1 on the patch > Clean up UGI objects in FileSystem cache for transactions > - > > Key: HIVE-13151 > URL: https://issues.apache.org/jira/browse/HIVE-13151 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13151.1.patch, HIVE-13151.2.patch, > HIVE-13151.3.patch > > > One issue with FileSystem.CACHE is that it does not clean itself. The key in > that cache includes UGI object. When new UGI objects are created and used > with the FileSystem api, new entries get added to the cache. > We need to manually clean up those UGI objects once they are no longer in use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12612) beeline always exits with 0 status when reading query from standard input
[ https://issues.apache.org/jira/browse/HIVE-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-12612: -- Attachment: HIVE-12612.01.patch > beeline always exits with 0 status when reading query from standard input > - > > Key: HIVE-12612 > URL: https://issues.apache.org/jira/browse/HIVE-12612 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 1.1.0 > Environment: CDH5.5.0 >Reporter: Paulo Sequeira >Assignee: Reuben Kuhnert >Priority: Minor > Attachments: HIVE-12612.01.patch > > > Similar to what was reported on HIVE-6978, but now it only happens when the > query is read from the standard input. For example, the following fails as > expected: > {code} > bash$ if beeline -u "jdbc:hive2://..." -e "boo;" ; then echo "Ok?!" ; else > echo "Failed!" ; fi > Connecting to jdbc:hive2://... > Connected to: Apache Hive (version 1.1.0-cdh5.5.0) > Driver: Hive JDBC (version 1.1.0-cdh5.5.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:0 > cannot recognize input near 'boo' '' '' (state=42000,code=4) > Closing: 0: jdbc:hive2://... > Failed! > {code} > But the following does not: > {code} > bash$ if echo "boo;"|beeline -u "jdbc:hive2://..." ; then echo "Ok?!" ; else > echo "Failed!" ; fi > Connecting to jdbc:hive2://... > Connected to: Apache Hive (version 1.1.0-cdh5.5.0) > Driver: Hive JDBC (version 1.1.0-cdh5.5.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 1.1.0-cdh5.5.0 by Apache Hive > 0: jdbc:hive2://...:8> Error: Error while compiling statement: FAILED: > ParseException line 1:0 cannot recognize input near 'boo' '' '' > (state=42000,code=4) > 0: jdbc:hive2://...:8> Closing: 0: jdbc:hive2://... > Ok?! > {code} > This was misleading our batch scripts to always believe that the execution of > the queries succeeded, when sometimes that was not the case. > h2. 
Workaround > We found we can work around the issue by always using the -e or the -f > parameters, and even reading the standard input through the /dev/stdin device > (this was useful because a lot of the scripts fed the queries from here > documents), like this: > {code:title=some-script.sh} > #!/bin/sh > set -o nounset -o errexit -o pipefail > # As beeline is failing to report an error status if reading the query > # to be executed from STDIN, check whether no -f or -e option is used > # and, in that case, pretend it has to read the query from a regular > # file using -f to read from /dev/stdin > function beeline_workaround_exit_status () { > for arg in "$@" > do if [ "$arg" = "-f" -o "$arg" = "-e" ] >then beeline -u "..." "$@" > return >fi > done > beeline -u "..." "$@" -f /dev/stdin > } > beeline_workaround_exit_status <<EOF > boo; > EOF > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12612) beeline always exits with 0 status when reading query from standard input
[ https://issues.apache.org/jira/browse/HIVE-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-12612: -- Status: Patch Available (was: Open) > beeline always exits with 0 status when reading query from standard input > - > > Key: HIVE-12612 > URL: https://issues.apache.org/jira/browse/HIVE-12612 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 1.1.0 > Environment: CDH5.5.0 >Reporter: Paulo Sequeira >Assignee: Reuben Kuhnert >Priority: Minor > Attachments: HIVE-12612.01.patch > > > Similar to what was reported on HIVE-6978, but now it only happens when the > query is read from the standard input. For example, the following fails as > expected: > {code} > bash$ if beeline -u "jdbc:hive2://..." -e "boo;" ; then echo "Ok?!" ; else > echo "Failed!" ; fi > Connecting to jdbc:hive2://... > Connected to: Apache Hive (version 1.1.0-cdh5.5.0) > Driver: Hive JDBC (version 1.1.0-cdh5.5.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:0 > cannot recognize input near 'boo' '' '' (state=42000,code=4) > Closing: 0: jdbc:hive2://... > Failed! > {code} > But the following does not: > {code} > bash$ if echo "boo;"|beeline -u "jdbc:hive2://..." ; then echo "Ok?!" ; else > echo "Failed!" ; fi > Connecting to jdbc:hive2://... > Connected to: Apache Hive (version 1.1.0-cdh5.5.0) > Driver: Hive JDBC (version 1.1.0-cdh5.5.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 1.1.0-cdh5.5.0 by Apache Hive > 0: jdbc:hive2://...:8> Error: Error while compiling statement: FAILED: > ParseException line 1:0 cannot recognize input near 'boo' '' '' > (state=42000,code=4) > 0: jdbc:hive2://...:8> Closing: 0: jdbc:hive2://... > Ok?! > {code} > This was misleading our batch scripts to always believe that the execution of > the queries succeeded, when sometimes that was not the case. > h2. 
Workaround > We found we can work around the issue by always using the -e or the -f > parameters, and even reading the standard input through the /dev/stdin device > (this was useful because a lot of the scripts fed the queries from here > documents), like this: > {code:title=some-script.sh} > #!/bin/sh > set -o nounset -o errexit -o pipefail > # As beeline is failing to report an error status if reading the query > # to be executed from STDIN, check whether no -f or -e option is used > # and, in that case, pretend it has to read the query from a regular > # file using -f to read from /dev/stdin > function beeline_workaround_exit_status () { > for arg in "$@" > do if [ "$arg" = "-f" -o "$arg" = "-e" ] >then beeline -u "..." "$@" > return >fi > done > beeline -u "..." "$@" -f /dev/stdin > } > beeline_workaround_exit_status <<EOF > boo; > EOF > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13303) spill to YARN directories, not tmp, when available
[ https://issues.apache.org/jira/browse/HIVE-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206881#comment-15206881 ] Sergey Shelukhin commented on HIVE-13303: - [~gopalv] [~sseth] can you please review? > spill to YARN directories, not tmp, when available > -- > > Key: HIVE-13303 > URL: https://issues.apache.org/jira/browse/HIVE-13303 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13303.patch > > > RowContainer::setupWriter, HybridHashTableContainer::spillPartition, > (KeyValueContainer|ObjectContainer)::setupOutput, > VectorMapJoinRowBytesContainer::setupOutputFileStreams create files in tmp. > Maybe some other code does it too, those are the ones I see on the execution > path. When there are multiple YARN output directories and multiple tasks > running on a machine, it's better to use the YARN directories. The only > question is cleanup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13329) Hive query id should not be allowed to be modified by users.
[ https://issues.apache.org/jira/browse/HIVE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-13329: -- External issue ID: HIVE-13286 (was: HIVE-13296) > Hive query id should not be allowed to be modified by users. > > > Key: HIVE-13329 > URL: https://issues.apache.org/jira/browse/HIVE-13329 > Project: Hive > Issue Type: Bug >Reporter: Vikram Dixit K >Assignee: Vikram Dixit K > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206870#comment-15206870 ] Vikram Dixit K commented on HIVE-13286: --- [~aihuaxu] Are the test failures related? Otherwise let me know and I can commit the patch to master and branch-2. I will raise a follow-on jira for disallowing the user to set this configuration. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > Attachments: HIVE-13286.1.patch, HIVE-13286.2.patch, > HIVE-13286.3.patch, HIVE-13286.4.patch > > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
[ https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206852#comment-15206852 ] Eugene Koifman commented on HIVE-12439: --- [~leftylev] The new props only apply to direct SQL from Metastore to Metastore DB. > CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements > -- > > Key: HIVE-12439 > URL: https://issues.apache.org/jira/browse/HIVE-12439 > Project: Hive > Issue Type: Improvement > Components: Metastore, Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Labels: TODOC1.3, TODOC2.1 > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-12439.1.patch, HIVE-12439.2.patch, > HIVE-12439.3.patch > > > # add a safeguard to make sure IN clause is not too large; break up by txn id > to delete from TXN_COMPONENTS where tc_txnid in ... > # TxnHandler.openTxns() - use 1 insert with many rows in values() clause, > rather than 1 DB roundtrip per row -- This message was sent by Atlassian JIRA (v6.3.4#6332)
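[Editor's note] The first item in HIVE-12439, bounding the size of the IN clause, amounts to chunking the txn-id list into fixed-size DELETE statements. A hedged sketch follows; the table and column names are taken from the issue description, the batch size is arbitrary, and this is not the actual TxnHandler code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Splits a potentially huge txn-id list into bounded
// "delete ... where TC_TXNID in (...)" statements, as suggested for
// CompactionTxnHandler.markCleaned(). The batch size here is arbitrary;
// the real limit would come from configuration or DB constraints.
public class InClauseBatcher {
    static List<String> deleteStatements(List<Long> txnIds, int batchSize) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < txnIds.size(); i += batchSize) {
            StringJoiner in = new StringJoiner(",", "(", ")");
            for (Long id : txnIds.subList(i, Math.min(i + batchSize, txnIds.size()))) {
                in.add(id.toString());
            }
            stmts.add("delete from TXN_COMPONENTS where TC_TXNID in " + in);
        }
        return stmts;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 7; i++) ids.add(i);
        // Seven ids with a batch size of three yield three DELETE statements.
        deleteStatements(ids, 3).forEach(System.out::println);
    }
}
```

The second item (one multi-row VALUES insert instead of one round trip per row) is the same idea in reverse: accumulate rows client-side and emit a single bounded statement.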
[jira] [Commented] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206848#comment-15206848 ] Eugene Koifman commented on HIVE-11388: --- This change makes use of JDBC Connections, and thus the connection pool may need to be larger. Pool size is currently hardcoded. Should fix HIVE-12592. > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.2.patch, HIVE-11388.4.patch, > HIVE-11388.5.patch, HIVE-11388.6.patch, HIVE-11388.7.patch, HIVE-11388.patch > > > (this description is no longer accurate; see further comments) > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12592) Expose connection pool tuning props in TxnHandler
[ https://issues.apache.org/jira/browse/HIVE-12592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206844#comment-15206844 ] Eugene Koifman commented on HIVE-12592: --- I don't think it's sufficient. If you look at TxnHandler.setupJdbcConnectionPool() - it explicitly sets some parameters for BoneCP which I imagine will override whatever is in bonecp-config.xml. So to make this work properly we likely need to add a "base" bonecp-config.xml to hive JAR that contains TxnHandler or make it available in some other way > Expose connection pool tuning props in TxnHandler > - > > Key: HIVE-12592 > URL: https://issues.apache.org/jira/browse/HIVE-12592 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Chetna Chaudhari > > BoneCP allows various pool tuning options like connection timeout, num > connections, etc > There should be a config based way to set these -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11221) In Tez mode, alter table concatenate orc files can intermittently fail with NPE
[ https://issues.apache.org/jira/browse/HIVE-11221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206833#comment-15206833 ] ashish shenoy commented on HIVE-11221: -- I hit this issue consistently as well; here's the stack trace when I use the Tez execution engine: VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED File Merge FAILED -1 0 0 -1 0 0 VERTICES: 00/01 [>>--] 0% ELAPSED TIME: 1458666880.00 s Status: Failed Vertex failed, vertexName=File Merge, vertexId=vertex_1455906569416_0009_1_00, diagnostics=[Vertex vertex_1455906569416_0009_1_00 [File Merge] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: [] initializer failed, vertex=vertex_1455906569416_0009_1_00 [File Merge], java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:452) at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:441) at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295) at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:124) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239) at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226) at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ] DAG failed due to vertex failure. failedVertices:1 killedVertices:0 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.DDLTask We are still on Hive 0.14, and are planning to move to HDP 2.4 since we have observed hive to be very unstable, unpredictable and hence unreliable for merging ORC files as well as many other basic sql queries that presto successfully completes. Since 1.3.0 is not in HDP 2.4, is installing a custom hive jar the only option at this point to mitigate this issue ? How will ambari behave with a custom installation of hive ? > In Tez mode, alter table concatenate orc files can intermittently fail with > NPE > --- > > Key: HIVE-11221 > URL: https://issues.apache.org/jira/browse/HIVE-11221 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 2.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11221.1.patch > > > We are not waiting for input ready events which can trigger occasional NPE if > input is not actually ready. 
> Stacktrace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) >
[jira] [Commented] (HIVE-13310) Vectorized Projection Comparison Number Column to Scalar broken for !noNulls and selectedInUse
[ https://issues.apache.org/jira/browse/HIVE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206809#comment-15206809 ] Matt McCline commented on HIVE-13310: - Committed to master. Classes are generated by GenVectorCode on branch-1 -- investigating. > Vectorized Projection Comparison Number Column to Scalar broken for !noNulls > and selectedInUse > -- > > Key: HIVE-13310 > URL: https://issues.apache.org/jira/browse/HIVE-13310 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 2.1.0 > > Attachments: HIVE-13310.01.patch, HIVE-13310.02.patch > > > LongColEqualLongScalar.java > LongColGreaterEqualLongScalar.java > LongColGreaterLongScalar.java > LongColLessEqualLongScalar.java > LongColLessLongScalar.java > LongColNotEqualLongScalar.java > LongScalarEqualLongColumn.java > LongScalarGreaterEqualLongColumn.java > LongScalarGreaterLongColumn.java > LongScalarLessEqualLongColumn.java > LongScalarLessLongColumn.java > LongScalarNotEqualLongColumn.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13310) Vectorized Projection Comparison Number Column to Scalar broken for !noNulls and selectedInUse
[ https://issues.apache.org/jira/browse/HIVE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13310: Fix Version/s: (was: 1.3.0) > Vectorized Projection Comparison Number Column to Scalar broken for !noNulls > and selectedInUse > -- > > Key: HIVE-13310 > URL: https://issues.apache.org/jira/browse/HIVE-13310 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 2.1.0 > > Attachments: HIVE-13310.01.patch, HIVE-13310.02.patch > > > LongColEqualLongScalar.java > LongColGreaterEqualLongScalar.java > LongColGreaterLongScalar.java > LongColLessEqualLongScalar.java > LongColLessLongScalar.java > LongColNotEqualLongScalar.java > LongScalarEqualLongColumn.java > LongScalarGreaterEqualLongColumn.java > LongScalarGreaterLongColumn.java > LongScalarLessEqualLongColumn.java > LongScalarLessLongColumn.java > LongScalarNotEqualLongColumn.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13310) Vectorized Projection Comparison Number Column to Scalar broken for !noNulls and selectedInUse
[ https://issues.apache.org/jira/browse/HIVE-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206780#comment-15206780 ] Matt McCline commented on HIVE-13310: - Failures are unrelated. > Vectorized Projection Comparison Number Column to Scalar broken for !noNulls > and selectedInUse > -- > > Key: HIVE-13310 > URL: https://issues.apache.org/jira/browse/HIVE-13310 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-13310.01.patch, HIVE-13310.02.patch > > > LongColEqualLongScalar.java > LongColGreaterEqualLongScalar.java > LongColGreaterLongScalar.java > LongColLessEqualLongScalar.java > LongColLessLongScalar.java > LongColNotEqualLongScalar.java > LongScalarEqualLongColumn.java > LongScalarGreaterEqualLongColumn.java > LongScalarGreaterLongColumn.java > LongScalarLessEqualLongColumn.java > LongScalarLessLongColumn.java > LongScalarNotEqualLongColumn.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)