[jira] [Commented] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens
[ https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989775#comment-15989775 ] Hive QA commented on HIVE-16487: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865547/HIVE-16487.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_2] (batchId=234) org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=223) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4933/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12865547 - PreCommit-HIVE-Build > Serious Zookeeper exception is logged when a race condition happens > --- > > Key: HIVE-16487 > URL: https://issues.apache.org/jira/browse/HIVE-16487 > Project: Hive > Issue Type: Bug > Components: Locking >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-16487.02.patch, HIVE-16487.patch > > > A customer started to see this in the logs, but fortunately everything was > working as intended: > {code} > 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: > [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hive_zookeeper_namespace//LOCK-SHARED- > {code} > This was happening because of a race condition between lock release and > lock acquisition. The thread releasing the lock removes the parent ZK node just > after the thread acquiring the lock made sure that the parent node exists. > Since this can happen without any real problem, I plan to add NODEEXISTS and > NONODE as transient ZooKeeper exceptions, so users are not confused. > Also, the original author of ZooKeeperHiveLockManager may have planned to handle > different ZooKeeperExceptions differently, and the code is hard to > understand. See the {{continue}} and the {{break}}. The {{break}} only breaks > the switch, not the loop, which IMHO is not intuitive: > {code} > do { > try { > [..] > ret = lockPrimitive(key, mode, keepAlive, parentCreated, > } catch (Exception e1) { > if (e1 instanceof KeeperException) { > KeeperException e = (KeeperException) e1; > switch (e.code()) { > case CONNECTIONLOSS: > case OPERATIONTIMEOUT: > LOG.debug("Possibly transient ZooKeeper exception: ", e); > continue; > default: > LOG.error("Serious Zookeeper exception: ", e); > break; > } > } > [..]
> } > } while (tryNum < numRetriesForLock); > {code} > If we do not want to try again in case of a "Serious Zookeeper exception:", > then we should add a label to the do loop and break it in the switch. > If we do want to retry regardless of the type of the ZK exception, then we > should just change the {{continue;}} to {{break;}} and move the code that did not > run in the {{continue}} case into the {{default}} branch of the switch, > so the code is easier to understand. > Any suggestions or ideas [~ctang.ma] or [~szehon]? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
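The labeled-break option proposed in HIVE-16487 can be sketched as follows. This is a hedged, self-contained illustration, not the actual ZooKeeperHiveLockManager code: the enum, `classify`-free structure, and `attempts` method are made-up stand-ins that only demonstrate the control-flow difference between a plain `break` (ends the switch) and `break retryLoop;` (ends the retry loop).

```java
// Minimal sketch of the labeled-break idea (illustrative names, not the
// real ZooKeeperHiveLockManager API): a plain break inside the switch only
// ends the switch; "break retryLoop;" exits the whole retry loop.
public class RetryLoopSketch {
    enum Kind { TRANSIENT, SERIOUS }

    // Returns how many attempts were made before giving up.
    static int attempts(Kind errorKind, int numRetriesForLock) {
        int tryNum = 0;
        retryLoop:
        do {
            tryNum++;
            switch (errorKind) {
                case TRANSIENT:
                    continue;            // possibly transient: retry
                default:
                    break retryLoop;     // serious: stop retrying entirely
            }
        } while (tryNum < numRetriesForLock);
        return tryNum;
    }

    public static void main(String[] args) {
        System.out.println(attempts(Kind.TRANSIENT, 3)); // retries until the limit: 3
        System.out.println(attempts(Kind.SERIOUS, 3));   // gives up after the first try: 1
    }
}
```

With the label, the "serious" branch leaves the loop immediately instead of silently falling through to another retry, which is the ambiguity the comment describes.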
[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989760#comment-15989760 ] Hive QA commented on HIVE-15795: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864833/HIVE-15795.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_16] (batchId=234) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4932/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4932/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4932/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864833 - PreCommit-HIVE-Build > Support Accumulo Index Tables in Hive Accumulo Connector > > > Key: HIVE-15795 > URL: https://issues.apache.org/jira/browse/HIVE-15795 > Project: Hive > Issue Type: Improvement > Components: Accumulo Storage Handler >Reporter: Mike Fagan >Assignee: Mike Fagan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, > HIVE-15795.3.patch > > > Ability to specify an accumulo index table for an accumulo-hive table. > This would greatly improve performance for non-rowid query predicates -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989743#comment-15989743 ] Hive QA commented on HIVE-15795: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864833/HIVE-15795.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=155) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4931/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4931/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4931/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864833 - PreCommit-HIVE-Build -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15160) Can't order by an unselected column
[ https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15160: --- Status: Patch Available (was: Open) > Can't order by an unselected column > --- > > Key: HIVE-15160 > URL: https://issues.apache.org/jira/browse/HIVE-15160 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, > HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch, > HIVE-15160.07.patch, HIVE-15160.08.patch, HIVE-15160.09.patch, > HIVE-15160.09.patch, HIVE-15160.10.patch, HIVE-15160.11.patch, > HIVE-15160.12.patch > > > If a grouping key hasn't been selected, Hive complains. For comparison, > Postgres does not. > Example. Notice i_item_id is not selected: > {code} > select i_item_desc >,i_category >,i_class >,i_current_price >,sum(cs_ext_sales_price) as itemrevenue >,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over >(partition by i_class) as revenueratio > from catalog_sales > ,item > ,date_dim > where cs_item_sk = i_item_sk >and i_category in ('Jewelry', 'Sports', 'Books') >and cs_sold_date_sk = d_date_sk > and d_date between cast('2001-01-12' as date) > and (cast('2001-01-12' as date) + 30 days) > group by i_item_id > ,i_item_desc > ,i_category > ,i_class > ,i_current_price > order by i_category > ,i_class > ,i_item_id > ,i_item_desc > ,revenueratio > limit 100; > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15160) Can't order by an unselected column
[ https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15160: --- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15160) Can't order by an unselected column
[ https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15160: --- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15160) Can't order by an unselected column
[ https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15160: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15160) Can't order by an unselected column
[ https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15160: --- Attachment: HIVE-15160.12.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989681#comment-15989681 ] Thejas M Nair commented on HIVE-16520: -- +1 to updated patch in pull request. > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, > HIVE-16520-proto.patch > > > During the Hive 2 benchmark, we found that Hive metastore operations take a lot of time > and thus slow down Hive compilation. In some extreme cases, they take much > longer than the actual query run time. In particular, we found the latency of the > cloud db is very high, and 90% of total query runtime is spent waiting for metastore > SQL database operations. Based on this observation, metastore operation > performance would be greatly enhanced if we had a memory structure that > caches the database query results. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
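The caching idea described in HIVE-16520 can be illustrated with a minimal lookaside cache. This is a hedged sketch, not the actual metastore cache implementation: the class name, key type, and loader function are made up purely to show the pattern of consulting memory before the SQL-backed store.

```java
// Hedged illustration of the caching idea (not the real metastore cache):
// a lookaside map in front of the SQL-backed store, so repeated metadata
// lookups skip the database round trip.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MetastoreCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Consult the cache first; only invoke the (slow) loader on a miss.
    public String get(String key, Function<String, String> loadFromDb) {
        return cache.computeIfAbsent(key, loadFromDb);
    }
}
```

With this shape, the expensive SQL query runs once per key; subsequent compilations of queries touching the same table are served from memory.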
[jira] [Commented] (HIVE-16399) create an index for tc_txnid in TXN_COMPONENTS
[ https://issues.apache.org/jira/browse/HIVE-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989552#comment-15989552 ] Wei Zheng commented on HIVE-16399: -- At this moment I'm confused by Hive's release plan. It seems we're going to have a 2.3 release before 2.2, and I'm not sure what the upgrade scripts should look like. > create an index for tc_txnid in TXN_COMPONENTS > -- > > Key: HIVE-16399 > URL: https://issues.apache.org/jira/browse/HIVE-16399 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-16399.branch-2.3.patch, HIVE-16399.branch-2.patch, > HIVE-16399.master.patch > > > Without this index, TxnStore.cleanEmptyAbortedTxns() can be very slow -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction
[ https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989539#comment-15989539 ] Wei Zheng commented on HIVE-12636: -- It looks good in general. A few minor comments: Missed "break;" in Driver.startImplicitTxn() after case COMMIT and ROLLBACK? Comment for "return 10;" in Driver.compile() ? Same block, "long txnid = txnManager.openTxn(ctx, userFromUGI);", txnid is not used. Not too sure about this in Driver.recordValidTxns() {code} if(oldList != null) { throw new IllegalStateException("calling recordValidTxn() more than once in the same " + JavaUtils.txnIdToString(txnMgr.getCurrentTxnId())); } {code} "userFromUGI" in Driver.getUserFromUGI() is no longer used. In SemanticAnalyzerFactory, should they stay? //commandType.put(HiveParser.TOK_UPDATE_TABLE, HiveOperation.SQLUPDATE);//HIVE-16443 //commandType.put(HiveParser.TOK_DELETE_FROM, HiveOperation.SQLDELETE); //commandType.put(HiveParser.TOK_MERGE, HiveOperation.SQLMERGE); // INSERT, INSERT OVERWRITE, Why would some stats estimate in q.out files change? > Ensure that all queries (with DbTxnManager) run in a transaction > > > Key: HIVE-12636 > URL: https://issues.apache.org/jira/browse/HIVE-12636 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, > HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, > HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, > HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, > HIVE-12636.17.patch > > > Assuming Hive is using DbTxnManager > Currently (as of this writing only auto commit mode is supported), only > queries that write to an Acid table start a transaction. > Read-only queries don't open a txn but still acquire locks. > This makes internal structures confusing/odd. 
> There are always two code paths to deal with, which is inconvenient and error > prone. > Also, a txn id is a convenient "handle" for all locks/resources within a txn. > Doing this would mean the client no longer needs to track locks that it > acquired. This enables further improvements to the metastore side of Acid. > # add a metastore call that does openTxn() and acquireLocks() in a single call; this > is to make sure perf doesn't degrade for read-only queries. (Would also be > useful for auto commit write queries) > # Should RO queries generate txn ids from the same sequence? (They could, for > example, use negative values of a different sequence.) The txnid is part of the > delta/base file name. Currently it's 7 digits. If we use the same sequence, > we'll exceed 7 digits faster. (Possible upgrade issue.) On the other hand, > there is value in being able to pick the txn id and commit timestamp out of the > same logical sequence. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989530#comment-15989530 ] Hive QA commented on HIVE-16488: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865591/HIVE-16488.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4930/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4930/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4930/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865591 - PreCommit-HIVE-Build > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential use case where a user may want to manually create a db on > the destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled
[ https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989469#comment-15989469 ] Hive QA commented on HIVE-16558: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865510/HIVE-16558.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_handler_snapshot] (batchId=93) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4929/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4929/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4929/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865510 - PreCommit-HIVE-Build > In the hiveserver2.jsp Closed Queries table under the data click Drilldown > Link view details, the Chinese show garbled > -- > > Key: HIVE-16558 > URL: https://issues.apache.org/jira/browse/HIVE-16558 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Fix For: 3.0.0 > > Attachments: HIVE-16558.1.patch > > > In QueryProfileImpl.jamon, we see the following settings: > HiveServer2 > So we should set the response charset to UTF-8, which avoids garbled output for > Chinese and other languages. Please check it!
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
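The garbling described in HIVE-16558 is the usual symptom of a response decoded with the wrong charset. The sketch below is a hedged, stand-alone illustration using plain `java.nio` (it is not the Jamon template or HiveServer2 code): non-Latin-1 text survives a UTF-8 round trip but is destroyed when the default Latin-1 encoding is applied.

```java
// Hedged sketch: why the declared charset matters for Chinese text
// (plain java.nio demonstration, not the actual HiveServer2 web UI code).
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Encode then decode with the same charset, as a browser would.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("查询", StandardCharsets.UTF_8));      // intact
        System.out.println(roundTrip("查询", StandardCharsets.ISO_8859_1)); // "??"
    }
}
```

Declaring `charset=UTF-8` on the response ensures both sides of this round trip agree.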
[jira] [Commented] (HIVE-16456) Kill spark job when InterruptedException happens or driverContext.isShutdown is true.
[ https://issues.apache.org/jira/browse/HIVE-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989448#comment-15989448 ] zhihai xu commented on HIVE-16456: -- Thanks [~xuefuz]! I created a Review Request for my patch at the following RB link: https://reviews.apache.org/r/58856/ > Kill spark job when InterruptedException happens or driverContext.isShutdown > is true. > - > > Key: HIVE-16456 > URL: https://issues.apache.org/jira/browse/HIVE-16456 > Project: Hive > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Attachments: HIVE-16456.000.patch > > > Kill the Spark job when an InterruptedException happens or driverContext.isShutdown > is true. If the InterruptedException happens in RemoteSparkJobMonitor or > LocalSparkJobMonitor, it is better to kill the job. Also, there is a race > condition between submitting the Spark job and query/operation cancellation; it > is better to check driverContext.isShutdown right after submitting the Spark > job. This guarantees the job is killed no matter when shutdown is > called. It is similar to HIVE-15997. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
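The submit/cancel race described in HIVE-16456 can be sketched as follows. This is a hedged illustration with made-up names (`submitThenCheck`, `killJob`), not the actual Hive-on-Spark client API: the point is only that re-checking the shutdown flag immediately after submission covers a cancellation that arrives while the submit call is in flight.

```java
// Hedged sketch of the ordering fix (illustrative names, not the real
// Hive-on-Spark code): re-check the shutdown flag right after submit so a
// cancellation that raced with the submission still kills the job.
import java.util.concurrent.atomic.AtomicBoolean;

public class SubmitRaceSketch {
    // Returns true if the freshly submitted job had to be killed.
    static boolean submitThenCheck(AtomicBoolean isShutdown, Runnable killJob) {
        // ... job submission would happen here ...
        if (isShutdown.get()) {   // cancellation may have landed mid-submit
            killJob.run();        // so kill the job we just submitted
            return true;
        }
        return false;
    }
}
```

Without the post-submit check, a cancel that arrives between the caller's earlier shutdown test and the actual submission leaves an orphaned job running.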
[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain
[ https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989416#comment-15989416 ] Chao Sun commented on HIVE-16552: - [~xuefuz] Could you open a RB for this? Thanks. > Limit the number of tasks a Spark job may contain > - > > Key: HIVE-16552 > URL: https://issues.apache.org/jira/browse/HIVE-16552 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 1.0.0, 2.0.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-16552.patch > > > It's commonly desirable to block bad and big queries that take a lot of YARN > resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to > stop a query that invokes a Spark job containing too many tasks. The > proposal here is to introduce hive.spark.job.max.tasks with a default value > of -1 (no limit), which an admin can set to block queries that trigger too > many Spark tasks. > Please note that this control knob applies to a single Spark job, though it's > possible for one query to trigger multiple Spark jobs (such as in the case of a > map-join). Nevertheless, the proposed approach is still helpful. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
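The guard proposed in HIVE-16552 reduces to a single comparison. The config name `hive.spark.job.max.tasks` comes from the issue itself; the class and method below are illustrative stand-ins, not the actual monitor code in the patch.

```java
// Hedged sketch of the proposed hive.spark.job.max.tasks check
// (guard method is illustrative, not the actual patch code).
public class TaskLimitSketch {
    // True if the job should be blocked; -1 keeps the no-limit default.
    static boolean exceedsTaskLimit(int numTasks, int maxTasks) {
        return maxTasks >= 0 && numTasks > maxTasks;
    }
}
```

An admin setting the knob to, say, 1000 would cause any Spark job with more than 1000 tasks to be stopped, while the -1 default changes nothing.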
[jira] [Commented] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989404#comment-15989404 ] Hive QA commented on HIVE-15642: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865589/HIVE-15642.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=217) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4928/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4928/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4928/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865589 - PreCommit-HIVE-Build > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. 
Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989358#comment-15989358 ] Josh Elser commented on HIVE-15795: --- +1 on the addendum from me. [~sershe], would you be able to commit this addendum after the 24hr period, please? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
[ https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989319#comment-15989319 ] Hive QA commented on HIVE-16524: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864874/HIVE-16524.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct] (batchId=109) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4927/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4927/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4927/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864874 - PreCommit-HIVE-Build > Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon > > > Key: HIVE-16524 > URL: https://issues.apache.org/jira/browse/HIVE-16524 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16524.1.patch > > > The Id attribute is defined in w3c as follows: > 1.The id attribute specifies the unique id of the HTML element. > 2.Id must be unique in the HTML document. 
> 3.The id attribute can be used as a link anchor, by JavaScript (HTML DOM) or > by CSS to change or add a style to an element with the specified id. > But, the "id='attributes_table'" in hiveserver2.jsp and > QueryProfileTmpl.jamon: > 1. Is not referenced by any CSS or JS > 2. Uses the same id attribute value more than once on the same page > So I suggest removing this id attribute definition. Please check it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList
[ https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989305#comment-15989305 ] Wei Zheng commented on HIVE-16534: -- We can drop this: {code} while (txnId <= maxTxnId) { firstAbortedTxnIndex = Arrays.binarySearch(exceptions, txnId); if (firstAbortedTxnIndex >= 0) { break; } txnId++; } {code} The main purpose of the above code is to locate the index of the first aborted txn in the range so that we can save some unnecessary iterations when scanning the BitSet. But in your example, which is very likely to be a common situation, this is not acceptable. Considering the BitSet is not big (compared to the gap between 5 and 100), we can just start from index 0 and scan through the BitSet. I think this should be ok. > Add capability to tell aborted transactions apart from open transactions in > ValidTxnList > > > Key: HIVE-16534 > URL: https://issues.apache.org/jira/browse/HIVE-16534 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch > > > Currently in ValidReadTxnList, open transactions and aborted transactions are > stored together in one array. That makes it impossible to extract just > aborted transactions or open transactions. > For ValidCompactorTxnList this is fine, since we only store aborted > transactions but no open transactions. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
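The suggestion above — drop the per-txnid binary search and just walk the whole BitSet from index 0 — can be sketched as follows. This is a minimal illustration with hypothetical names (`exceptions`, `anyAbortedInRange`), not the actual ValidReadTxnList code:

```java
import java.util.BitSet;

public class AbortedScan {
    // exceptions: sorted array of open/aborted txn ids;
    // abortedBits: bit i is on iff exceptions[i] is aborted (hypothetical layout).
    static boolean anyAbortedInRange(long[] exceptions, BitSet abortedBits,
                                     long minTxnId, long maxTxnId) {
        // Scan every set bit from index 0 instead of binary-searching for the
        // first txn id in range; the BitSet is small relative to the id gap.
        for (int i = abortedBits.nextSetBit(0); i >= 0; i = abortedBits.nextSetBit(i + 1)) {
            long txn = exceptions[i];
            if (txn >= minTxnId && txn <= maxTxnId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] exceptions = {5, 7, 100};
        BitSet aborted = new BitSet();
        aborted.set(0); // txn 5 aborted
        aborted.set(2); // txn 100 aborted
        System.out.println(anyAbortedInRange(exceptions, aborted, 6, 99)); // false: txn 7 is open
        System.out.println(anyAbortedInRange(exceptions, aborted, 5, 99)); // true: txn 5 aborted
    }
}
```

The cost is one pass over the set bits, which is acceptable when the exceptions list is short compared to the txn-id gap.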
[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList
[ https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989294#comment-15989294 ] Eugene Koifman commented on HIVE-16534: --- another thought: I think the implementation of isTxnRangeAborted() is problematic. Suppose we do an insert into Table1/part1 with txnid=5. Then there is no activity on this table for a month. Then there is another insert into Table1/part1 with txnid=100. After compaction we get a delta_5_100, so now this method is going to do 1M binary searches. If (isAborted(minTxnId) && isAborted(maxTxnId) && the number of on bits in the BitSet between the index of minTxnId and the index of maxTxnId is max - min + 1) then all txns in the range in question are aborted - this gives ALL. I'm not sure how to do NONE/SOME efficiently > Add capability to tell aborted transactions apart from open transactions in > ValidTxnList > > > Key: HIVE-16534 > URL: https://issues.apache.org/jira/browse/HIVE-16534 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch > > > Currently in ValidReadTxnList, open transactions and aborted transactions are > stored together in one array. That makes it impossible to extract just > aborted transactions or open transactions. > For ValidCompactorTxnList this is fine, since we only store aborted > transactions but no open transactions. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
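The "gives ALL" test proposed above can be sketched like this — a hedged illustration assuming a sorted `exceptions` array with a parallel aborted-flag BitSet (the names and layout are hypothetical, not the HIVE-16534 patch code):

```java
import java.util.Arrays;
import java.util.BitSet;

public class RangeAborted {
    // True iff every txn id in [minTxnId, maxTxnId] is present in
    // exceptions and marked aborted — the ALL case from the comment.
    static boolean isRangeAllAborted(long[] exceptions, BitSet aborted,
                                     long minTxnId, long maxTxnId) {
        int lo = Arrays.binarySearch(exceptions, minTxnId);
        int hi = Arrays.binarySearch(exceptions, maxTxnId);
        // Both endpoints must be known exceptions and both must be aborted.
        if (lo < 0 || hi < 0 || !aborted.get(lo) || !aborted.get(hi)) {
            return false;
        }
        // No gaps: the index span must cover exactly max - min + 1 ids,
        // and every bit between the two indices must be on.
        return hi - lo == maxTxnId - minTxnId
            && aborted.get(lo, hi + 1).cardinality() == hi - lo + 1;
    }

    public static void main(String[] args) {
        long[] exceptions = {5, 6, 7};
        BitSet aborted = new BitSet();
        aborted.set(0, 3); // txns 5..7 all aborted
        System.out.println(isRangeAllAborted(exceptions, aborted, 5, 7)); // true
        aborted.clear(1);  // txn 6 now open
        System.out.println(isRangeAllAborted(exceptions, aborted, 5, 7)); // false
    }
}
```

This needs only two binary searches plus one bit-count, instead of one binary search per txn id in the delta range; distinguishing NONE from SOME would still need more work, as the comment notes.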
[jira] [Updated] (HIVE-16560) Avoid hive UDF jars to be dependent on HIVEServer2 auxiliary path deployment
[ https://issues.apache.org/jira/browse/HIVE-16560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krish Dey updated HIVE-16560: - Affects Version/s: 1.1.1 1.2.2 2.1.0 > Avoid hive UDF jars to be dependent on HIVEServer2 auxiliary path deployment > - > > Key: HIVE-16560 > URL: https://issues.apache.org/jira/browse/HIVE-16560 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 1.1.1, 1.2.2, 2.1.0 >Reporter: Krish Dey >Priority: Minor > > Hive UDFs need deployment in the HiveServer2 auxiliary path; even with the > reloadable jars feature, if the same class has already been loaded it won't load > the class again. > One improvement could be to remove the dependency on deploying this in > HiveServer2 and let it load from the HDFS path itself. > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction
[ https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-16553: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > Change default value for hive.tez.bigtable.minsize.semijoin.reduction > - > > Key: HIVE-16553 > URL: https://issues.apache.org/jira/browse/HIVE-16553 > Project: Hive > Issue Type: Bug > Components: Configuration >Reporter: Jason Dere >Assignee: Jason Dere > Fix For: 3.0.0 > > Attachments: HIVE-16553.1.patch > > > The current value is 1M rows; we would like to bump this up to make sure we are not > creating semijoin optimizations on dimension tables, since having too many > semijoin optimizations can cause serialized execution of tasks if lots of > tasks are waiting for semijoin optimizations to be computed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList
[ https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989271#comment-15989271 ] Eugene Koifman commented on HIVE-16534: --- bq. I do serialize the BitSet into a byte array before sending it over the Thrift interface. After receiving it I convert it back to BitSet since the bit manipulation is convenient. I meant in writeToString() - seems like that would make reading from the string much simpler/more efficient. You are right about the other points > Add capability to tell aborted transactions apart from open transactions in > ValidTxnList > > > Key: HIVE-16534 > URL: https://issues.apache.org/jira/browse/HIVE-16534 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch > > > Currently in ValidReadTxnList, open transactions and aborted transactions are > stored together in one array. That makes it impossible to extract just > aborted transactions or open transactions. > For ValidCompactorTxnList this is fine, since we only store aborted > transactions but no open transactions. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989269#comment-15989269 ] Chaoyu Tang commented on HIVE-16147: [~pxiong] Thanks for looking into this. Yeah, I made some changes to fix the test failures and also optimized the code a little. I have uploaded the 2nd patch to RB requesting for the review. > Rename a partitioned table should not drop its partition columns stats > -- > > Key: HIVE-16147 > URL: https://issues.apache.org/jira/browse/HIVE-16147 > Project: Hive > Issue Type: Bug >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch > > > When a partitioned table (e.g. sample_pt) is renamed (e.g to > sample_pt_rename), describing its partition shows that the partition column > stats are still accurate, but actually they all have been dropped. > It could be reproduce as following: > 1. analyze table sample_pt compute statistics for columns; > 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS > for all columns are true > {code} > ... > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > ... > {code} > 3: describe formatted default.sample_pt partition (dummy = 3) salary: column > stats exists > {code} > # col_namedata_type min > max num_nulls distinct_count > avg_col_len max_col_len num_trues > num_falses comment > > > salaryint 1 151370 > 0 94 > > from deserializer > {code} > 4. 
alter table sample_pt rename to sample_pt_rename; > 5. describe formatted default.sample_pt_rename partition (dummy = 3): > describe the rename table partition (dummy =3) shows that COLUMN_STATS for > columns are still true. > {code} > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt_rename > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: > file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > {code} > describe formatted default.sample_pt_rename partition (dummy = 3) salary: the > column stats have been dropped. > {code} > # col_namedata_type comment > > > > salaryint from deserializer > > Time taken: 0.131 seconds, Fetched: 3 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989265#comment-15989265 ] Hive QA commented on HIVE-16488: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865530/HIVE-16488.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4926/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4926/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4926/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865530 - PreCommit-HIVE-Build > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16143) Improve msck repair batching
[ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-16143: --- Attachment: HIVE-16143.03.patch > Improve msck repair batching > > > Key: HIVE-16143 > URL: https://issues.apache.org/jira/browse/HIVE-16143 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, > HIVE-16143.03.patch > > > Currently, the {{msck repair table}} command batches the number of partitions > created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. > Following snippet shows the batching logic. There can be couple of > improvements to this batching logic: > {noformat} > int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE); > if (batch_size > 0 && partsNotInMs.size() > batch_size) { > int counter = 0; > for (CheckResult.PartitionResult part : partsNotInMs) { > counter++; > > apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null); > repairOutput.add("Repair: Added partition to metastore " + > msckDesc.getTableName() > + ':' + part.getPartitionName()); > if (counter % batch_size == 0 || counter == > partsNotInMs.size()) { > db.createPartitions(apd); > apd = new AddPartitionDesc(table.getDbName(), > table.getTableName(), false); > } > } > } else { > for (CheckResult.PartitionResult part : partsNotInMs) { > > apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null); > repairOutput.add("Repair: Added partition to metastore " + > msckDesc.getTableName() > + ':' + part.getPartitionName()); > } > db.createPartitions(apd); > } > } catch (Exception e) { > LOG.info("Could not bulk-add partitions to metastore; trying one by > one", e); > repairOutput.clear(); > msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput); > } > {noformat} > 1. 
If the batch size is too aggressive, the code falls back to adding > partitions one by one, which is almost always very slow. It is easily possible > that users increase the batch size to a higher value to make the command run > faster but end up with worse performance because the code falls back to adding > one by one. Users are then expected to determine the tuned value of batch > size which works well for their environment. I think the code could handle > this situation better by exponentially decaying the batch size instead of > falling back to one by one. > 2. The other issue with this implementation is that if, say, the first batch > succeeds and the second one fails, the code tries to add all the partitions > one by one irrespective of whether some of them were successfully added or > not. If we need to fall back to one by one, we should at least remove the ones > which we know for sure were already added successfully. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
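Both suggestions above — decay the batch size instead of jumping straight to one-by-one, and never re-add partitions that already succeeded — could be outlined as follows. This is an illustrative sketch, not the HIVE-16143 patch; `BatchAdder` and `createPartitions` are stand-ins for the real metastore calls:

```java
import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;

public class DecayingBatcher {
    // Stand-in for the metastore bulk-add; throws to simulate a failed batch.
    interface BatchAdder {
        void createPartitions(List<String> batch) throws Exception;
    }

    // Add all partitions, halving the batch size on failure instead of
    // immediately falling back to one-by-one inserts. The cursor only
    // advances past batches that succeeded, so nothing is added twice.
    static void addWithDecay(List<String> parts, int batchSize, BatchAdder adder)
            throws Exception {
        int i = 0;
        while (i < parts.size()) {
            int end = Math.min(i + batchSize, parts.size());
            try {
                adder.createPartitions(parts.subList(i, end));
                i = end; // batch succeeded; move past what was added
            } catch (Exception e) {
                if (batchSize == 1) {
                    throw e; // even a single add failed; give up here
                }
                batchSize = Math.max(1, batchSize / 2); // decay and retry same range
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> parts = Arrays.asList("p1", "p2", "p3", "p4", "p5");
        List<String> added = new ArrayList<>();
        // Simulated metastore that rejects any batch larger than 2.
        addWithDecay(parts, 4, batch -> {
            if (batch.size() > 2) throw new Exception("simulated metastore failure");
            added.addAll(batch);
        });
        System.out.println(added); // [p1, p2, p3, p4, p5]
    }
}
```

With an oversized initial batch, the loop retries the same range at half the size rather than degrading the whole operation to single-partition adds.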
[jira] [Updated] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit
[ https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16484: Attachment: HIVE-16484.7.patch > Investigate SparkLauncher for HoS as alternative to bin/spark-submit > > > Key: HIVE-16484 > URL: https://issues.apache.org/jira/browse/HIVE-16484 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, > HIVE-16484.3.patch, HIVE-16484.4.patch, HIVE-16484.5.patch, > HIVE-16484.6.patch, HIVE-16484.7.patch > > > The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} > directory and invokes the {{bin/spark-submit}} script, which spawns a > separate process to run the Spark application. > {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch > Spark applications. > I see a few advantages: > * No need to spawn a separate process to launch a HoS application --> lower startup time > * Simplifies the code in {{SparkClientImpl}} --> easier to debug > * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which > contains some useful utilities for querying the state of the Spark job > ** It also allows the launcher to specify a list of job listeners -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions
[ https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated HIVE-16527: Attachment: HIVE-16527.03.patch patch .03 added values file and non-explain selects to .q > Support outer and mixed reference aggregates in windowed functions > -- > > Key: HIVE-16527 > URL: https://issues.apache.org/jira/browse/HIVE-16527 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch, > HIVE-16527.03.patch > > > {noformat} > select sum(sum(c1)) over() from e011_01; > select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by > e011_01.c1, e011_01.c2; > select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) > from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, > e011_01.c2; > select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) > from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, > e011_03.c2; > select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order > by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by > e011_03.c2, e011_01.c2; > {noformat} > We fail to generate a plan for any of the above. The issue is that in > {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we > ignore all children except the last (the window spec child). Additionally the > typecheck processor is not prepared to encounter UDAF expressions > ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, > {{getXpathOrFuncExprNodeDesc}}). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989246#comment-15989246 ] ASF GitHub Bot commented on HIVE-16520: --- GitHub user daijyc opened a pull request: https://github.com/apache/hive/pull/173 HIVE-16520: Cache hive metadata in metastore You can merge this pull request into a Git repository by running: $ git pull https://github.com/daijyc/hive master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/173.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #173 commit 24fed179e00b1f323e218b2eba2c07ab5124a9e3 Author: Daniel Dai Date: 2017-04-28T00:08:11Z HIVE-16520: Cache hive metadata in metastore > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, > HIVE-16520-proto.patch > > > During Hive 2 benchmark, we find Hive metastore operation take a lot of time > and thus slow down Hive compilation. In some extreme case, it takes much > longer than the actual query run time. Especially, we find the latency of > cloud db is very high and 90% of total query runtime is waiting for metastore > SQL database operations. Based on this observation, the metastore operation > performance will be greatly enhanced if we have a memory structure which > cache the database query result. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Status: Patch Available (was: Open) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Attachment: HIVE-16488.02.patch > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-16520: -- Attachment: HIVE-16520-1.patch > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, > HIVE-16520-proto.patch > > > During Hive 2 benchmark, we find Hive metastore operation take a lot of time > and thus slow down Hive compilation. In some extreme case, it takes much > longer than the actual query run time. Especially, we find the latency of > cloud db is very high and 90% of total query runtime is waiting for metastore > SQL database operations. Based on this observation, the metastore operation > performance will be greatly enhanced if we have a memory structure which > cache the database query result. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList
[ https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989239#comment-15989239 ] Wei Zheng commented on HIVE-16534: -- The sorting of exceptions in ValidReadTxnList is troublesome for the accompanying BitSet, as we have to sort the BitSet in the same manner. So I removed the sorting logic in the ctor and added "order by txn_id" to TxnHandler.getOpenTxns so we don't need to worry about sorting later on. It's true that we always have 3 ':'. But if some fields are missing, e.g. "1:2::", then String.split() will only return an array of size 2. I do serialize the BitSet into a byte array before sending it over the Thrift interface. After receiving it I convert it back to BitSet since the bit manipulation is convenient. I need to binary search in isTxnAborted() to get the index for the txnid, then look up in the bitset using that index. bitSet.set(0, bitSet.length()) does turn all the bits on, right? > Add capability to tell aborted transactions apart from open transactions in > ValidTxnList > > > Key: HIVE-16534 > URL: https://issues.apache.org/jira/browse/HIVE-16534 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch > > > Currently in ValidReadTxnList, open transactions and aborted transactions are > stored together in one array. That makes it impossible to extract just > aborted transactions or open transactions. > For ValidCompactorTxnList this is fine, since we only store aborted > transactions but no open transactions. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
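The stdlib mechanics discussed in this exchange can be demonstrated directly — the BitSet byte-array round trip, String.split() dropping trailing empty fields, and the set(0, length()) question. This is an illustrative sketch, not the HIVE-16534 patch code:

```java
import java.util.BitSet;

public class TxnListNotes {
    public static void main(String[] args) {
        // A BitSet can be shipped as a byte array (e.g. over Thrift)
        // and rebuilt on the other side.
        BitSet aborted = new BitSet();
        aborted.set(0);
        aborted.set(2);
        byte[] wire = aborted.toByteArray();
        BitSet restored = BitSet.valueOf(wire);
        System.out.println(restored.equals(aborted)); // true

        // String.split drops trailing empty strings, so "1:2::" yields only
        // 2 fields — missing trailing fields need special handling.
        String[] parts = "1:2::".split(":");
        System.out.println(parts.length); // 2

        // set(0, length()) turns on all bits up to the current highest set
        // bit, since length() is highestSetBit + 1; on an empty BitSet,
        // length() is 0 and nothing is set.
        BitSet b = new BitSet();
        b.set(5);
        b.set(0, b.length());
        System.out.println(b.cardinality()); // 6 (bits 0..5 on)
    }
}
```

Note the caveat on the last question: set(0, length()) fills only up to the highest bit already set, so it is a no-op on an empty BitSet.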
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Status: Open (was: Patch Available) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-16520: -- Attachment: (was: HIVE-16520-1.patch) > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16520-proto-2.patch, HIVE-16520-proto.patch > > > During Hive 2 benchmark, we find Hive metastore operation take a lot of time > and thus slow down Hive compilation. In some extreme case, it takes much > longer than the actual query run time. Especially, we find the latency of > cloud db is very high and 90% of total query runtime is waiting for metastore > SQL database operations. Based on this observation, the metastore operation > performance will be greatly enhanced if we have a memory structure which > cache the database query result. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Attachment: (was: HIVE-16488.02.patch) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-15642: Status: Open (was: Patch Available) > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-15642: Status: Patch Available (was: Open) > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-15642: Attachment: (was: HIVE-15642.02.patch) > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-15642: Attachment: HIVE-15642.02.patch > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good
[ https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-16523: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) Committed to some branches. Thanks for the update/review! > VectorHashKeyWrapper hash code for strings is not so good > - > > Key: HIVE-16523 > URL: https://issues.apache.org/jira/browse/HIVE-16523 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-16523.01.patch, HIVE-16523.02.patch, > HIVE-16523.patch > > > Perf issues in vectorized gby on some string keys -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16143) Improve msck repair batching
[ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-16143:
---
Attachment: HIVE-16143.02.patch

Fixed the msck q.out files

> Improve msck repair batching
>
> Key: HIVE-16143
> URL: https://issues.apache.org/jira/browse/HIVE-16143
> Project: Hive
> Issue Type: Improvement
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch
>
> Currently, the {{msck repair table}} command batches the number of partitions
> created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}.
> The following snippet shows the batching logic. There are a couple of
> possible improvements to this batching logic:
> {noformat}
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
> if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>   int counter = 0;
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     counter++;
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>     if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>       db.createPartitions(apd);
>       apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>     }
>   }
> } else {
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>   }
>   db.createPartitions(apd);
> }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1.
> If the batch size is too aggressive, the code falls back to adding
> partitions one by one, which is almost always very slow. Users can easily
> increase the batch size to a higher value to make the command run faster,
> but end up with worse performance because the code falls back to adding
> partitions one by one. Users are then expected to determine a tuned batch
> size which works well for their environment. I think the code could handle
> this situation better by exponentially decaying the batch size instead of
> falling back to one by one.
> 2. The other issue with this implementation: if, say, the first batch
> succeeds and the second one fails, the code tries to add all the partitions
> one by one irrespective of whether some of them were successfully added or
> not. If we need to fall back to one by one, we should at least skip the ones
> which we know for sure were already added successfully.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
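The exponential-decay fallback proposed in point 1, combined with point 2's requirement to never retry batches that already committed, can be sketched as follows. This is a minimal illustration, not Hive code: `DecayingBatchAdder`, `BatchSink`, and `addWithDecay` are hypothetical names standing in for the msck repair path.

```java
import java.util.ArrayList;
import java.util.List;

public class DecayingBatchAdder {

    /** Stand-in for the metastore bulk-add call (hypothetical interface). */
    interface BatchSink {
        void addBatch(List<String> parts) throws Exception;
    }

    /**
     * Adds all partitions in batches. On a batch failure the batch size is
     * halved (exponential decay) rather than immediately falling back to
     * one-by-one, and partitions from batches that already committed are
     * never retried. Returns the partitions that could not be added at all.
     */
    static List<String> addWithDecay(List<String> parts, int initialBatch, BatchSink sink) {
        List<String> failed = new ArrayList<>();
        int batch = Math.max(1, initialBatch);
        int i = 0;
        while (i < parts.size()) {
            List<String> slice = parts.subList(i, Math.min(i + batch, parts.size()));
            try {
                sink.addBatch(slice);
                i += slice.size();                   // committed; never retried
            } catch (Exception e) {
                if (batch == 1) {                    // cannot shrink further
                    failed.add(parts.get(i));        // record and move on
                    i++;
                } else {
                    batch = Math.max(1, batch / 2);  // exponential decay
                }
            }
        }
        return failed;
    }
}
```

With this shape, a too-aggressive user-chosen batch size degrades gracefully instead of collapsing straight to one-by-one inserts.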
[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-16520:
--
Attachment: HIVE-16520-1.patch

> Cache hive metadata in metastore
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, HIVE-16520-proto.patch
>
> During Hive 2 benchmarks, we found that Hive metastore operations take a lot
> of time and thus slow down Hive compilation. In some extreme cases, they take
> much longer than the actual query run time. In particular, we found the
> latency of a cloud DB to be very high, with 90% of total query runtime spent
> waiting for metastore SQL database operations. Based on this observation,
> metastore operation performance would be greatly enhanced by an in-memory
> structure that caches database query results.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16346: Resolution: Fixed Status: Resolved (was: Patch Available) > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16346: Status: Patch Available (was: Reopened) There are some files not renamed properly during applying the patch. Resubmitted the patch. > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989211#comment-15989211 ] Pengcheng Xiong commented on HIVE-16147:

[~ctang.ma], may I ask what you changed from the 1st patch? Thanks.

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
> Issue Type: Bug
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to
> sample_pt_rename), describing its partition shows that the partition column
> stats are still accurate, but actually they all have been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS
> for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value: [3]
> Database: default
> Table: sample_pt
> CreateTime: Fri Jan 20 15:42:30 EST 2017
> LastAccessTime: UNKNOWN
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> last_modified_by ctang
> last_modified_time 1485217063
> numFiles 1
> numRows 100
> rawDataSize 5143
> totalSize 5243
> transient_lastDdlTime 1488842358
> ...
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column
> stats exist
> {code}
> # col_name data_type min max num_nulls distinct_count avg_col_len max_col_len num_trues num_falses comment
>
> salary int 1 151370 0 94 from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3):
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS
> for columns are still true.
> {code}
> # Detailed Partition Information
> Partition Value: [3]
> Database: default
> Table: sample_pt_rename
> CreateTime: Fri Jan 20 15:42:30 EST 2017
> LastAccessTime: UNKNOWN
> Location: file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> last_modified_by ctang
> last_modified_time 1485217063
> numFiles 1
> numRows 100
> rawDataSize 5143
> totalSize 5243
> transient_lastDdlTime 1488842358
> {code}
> describe formatted default.sample_pt_rename partition (dummy = 3) salary: the
> column stats have been dropped.
> {code}
> # col_name data_type comment
>
> salary int from deserializer
>
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain
[ https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989201#comment-15989201 ] Xuefu Zhang commented on HIVE-16552:

Hi [~csun] and [~lirui], could you please review the changes? Thanks.

> Limit the number of tasks a Spark job may contain
> -
>
> Key: HIVE-16552
> URL: https://issues.apache.org/jira/browse/HIVE-16552
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-16552.patch
>
> It's commonly desirable to block bad and big queries that take a lot of YARN
> resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to
> stop a query that invokes a Spark job that contains too many tasks. The
> proposal here is to introduce hive.spark.job.max.tasks with a default value
> of -1 (no limit), which an admin can set to block queries that trigger too
> many spark tasks.
> Please note that this control knob applies to a spark job, though it's
> possible that one query can trigger multiple Spark jobs (such as in the case
> of map-join). Nevertheless, the proposed approach is still helpful.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
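The proposed knob reduces to a simple pre-launch guard: sum the task counts of the Spark job's stages and refuse to run when a non-negative limit is exceeded. A minimal sketch with hypothetical names (`SparkJobTaskGuard` is not in the patch; the real change would read hive.spark.job.max.tasks from HiveConf):

```java
public class SparkJobTaskGuard {

    /** Sums per-stage task counts to get the job's total task count. */
    static int totalTasks(int[] stageTaskCounts) {
        int total = 0;
        for (int c : stageTaskCounts) {
            total += c;
        }
        return total;
    }

    /** Mirrors the proposed semantics: -1 (any negative value) means no limit. */
    static boolean exceedsLimit(int totalTasks, int maxTasks) {
        return maxTasks >= 0 && totalTasks > maxTasks;
    }
}
```

As the comment notes, this check would apply per Spark job, so a single query triggering several jobs (e.g. a map-join) would be checked once per job.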
[jira] [Updated] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception
[ https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-16213: --- Attachment: HIVE-16213.08.patch > ObjectStore can leak Queries when rollbackTransaction throws an exception > - > > Key: HIVE-16213 > URL: https://issues.apache.org/jira/browse/HIVE-16213 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Alexander Kolbasov >Assignee: Vihang Karajgaonkar > Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, > HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch, > HIVE-16213.06.patch, HIVE-16213.07.patch, HIVE-16213.08.patch > > > In ObjectStore.java there are a few places with the code similar to: > {code} > Query query = null; > try { > openTransaction(); > query = pm.newQuery(Something.class); > ... > commited = commitTransaction(); > } finally { > if (!commited) { > rollbackTransaction(); > } > if (query != null) { > query.closeAll(); > } > } > {code} > The problem is that rollbackTransaction() may throw an exception in which > case query.closeAll() wouldn't be executed. > The fix would be to wrap rollbackTransaction in its own try-catch block. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception
[ https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989199#comment-15989199 ] Vihang Karajgaonkar commented on HIVE-16213:

org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)

are known flaky tests which I think are related to some PigServer setup issues during the run. I ran them locally and they were successful. Submitted them again to confirm.

org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteDate (batchId=178)
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteVarchar (batchId=178)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testStoreFuncAllSimpleTypes (batchId=178)

> ObjectStore can leak Queries when rollbackTransaction throws an exception
> -
>
> Key: HIVE-16213
> URL: https://issues.apache.org/jira/browse/HIVE-16213
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Alexander Kolbasov
> Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch, HIVE-16213.06.patch, HIVE-16213.07.patch
>
> In ObjectStore.java there are a few places with code similar to:
> {code}
> Query query = null;
> try {
>   openTransaction();
>   query = pm.newQuery(Something.class);
>   ...
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
>     rollbackTransaction();
>   }
>   if (query != null) {
>     query.closeAll();
>   }
> }
> {code}
> The problem is that rollbackTransaction() may throw an exception, in which
> case query.closeAll() wouldn't be executed.
> The fix would be to wrap rollbackTransaction in its own try-catch block.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
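The fix described in the issue, wrapping rollbackTransaction() in its own try/catch so that query.closeAll() always runs, can be sketched with stand-in types. `Txn`, `Query`, and `cleanup` here are hypothetical; the real code lives in ObjectStore.java and uses the JDO transaction and query objects.

```java
import java.util.logging.Logger;

public class TxnCleanupSketch {
    private static final Logger LOG = Logger.getLogger("TxnCleanupSketch");

    /** Minimal stand-ins for the transaction/query pair in the snippet. */
    interface Txn { void rollback(); }
    interface Query { void closeAll(); }

    /**
     * If the transaction did not commit, roll it back inside its own
     * try/catch; the finally block then guarantees closeAll() runs even
     * when rollback() throws, which is exactly the leak being fixed.
     */
    static void cleanup(boolean committed, Txn txn, Query query) {
        try {
            if (!committed) {
                try {
                    txn.rollback();
                } catch (RuntimeException e) {
                    LOG.warning("rollback failed, still closing query: " + e.getMessage());
                }
            }
        } finally {
            if (query != null) {
                query.closeAll();
            }
        }
    }
}
```

In the original shape, a throwing rollback() inside finally would abort the block before closeAll(), leaking the query; here the inner catch contains the failure.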
[jira] [Commented] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
[ https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989195#comment-15989195 ] Xuefu Zhang commented on HIVE-16524:

+1

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: ZhangBing Lin
> Assignee: ZhangBing Lin
> Priority: Minor
> Attachments: HIVE-16524.1.patch
>
> The id attribute is defined by the W3C as follows:
> 1. The id attribute specifies the unique id of the HTML element.
> 2. An id must be unique in the HTML document.
> 3. The id attribute can be used as a link anchor, by JavaScript (HTML DOM),
> or by CSS to change or add a style to an element with the specified id.
> But the "id='attributes_table'" in hiveserver2.jsp and QueryProfileTmpl.jamon:
> 1. Is not referenced by any CSS or JS
> 2. Appears with the same id attribute name more than once on the same page
> So I suggest removing this id attribute definition. Please check it.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit
[ https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989180#comment-15989180 ] Hive QA commented on HIVE-16484: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865465/HIVE-16484.6.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=155) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles (batchId=280) org.apache.hive.spark.client.TestSparkClient.testCounters (batchId=280) org.apache.hive.spark.client.TestSparkClient.testErrorJob (batchId=280) org.apache.hive.spark.client.TestSparkClient.testJobSubmission (batchId=280) org.apache.hive.spark.client.TestSparkClient.testMetricsCollection (batchId=280) org.apache.hive.spark.client.TestSparkClient.testRemoteClient (batchId=280) org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob (batchId=280) org.apache.hive.spark.client.TestSparkClient.testSyncRpc (batchId=280) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4925/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4925/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4925/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12865465 - PreCommit-HIVE-Build

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, HIVE-16484.5.patch, HIVE-16484.6.patch
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}}
> directory and invokes the {{bin/spark-submit}} script, which spawns a
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList
[ https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989172#comment-15989172 ] Eugene Koifman commented on HIVE-16534:
---

You refactored the ValidReadTxnList() c'tor and removed the sorting of exceptions - why?

writeToString() always creates 3 ':' - why does the deserializer need cases like _if (values.length < 3) {_?

Wouldn't it be simpler to just serialize the BitSet as "0010110"? It's very compact and the deserializer wouldn't have to sort and do multiple binary searches.

Why does _isTxnAborted()_ need a binary search? Why not just look it up in the bitset?

_bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in exceptio_ - shouldn't this turn all the bits ON?

Nit: seems like the ValidCompactorTxnList() c'tor could do this since it's always the case for compactor.

> Add capability to tell aborted transactions apart from open transactions in
> ValidTxnList
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Wei Zheng
> Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are
> stored together in one array. That makes it impossible to extract just
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted
> transactions but no open transactions.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
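The reviewer's suggestion, serializing the BitSet as a '0'/'1' string such as "0010110" so that an aborted-txn check is a direct bit lookup rather than a binary search, can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual ValidReadTxnList code.

```java
import java.util.BitSet;

public class TxnBitSetCodec {

    /**
     * Serializes a BitSet as a '0'/'1' string of the given logical length,
     * the compact form suggested in the review comment. The logical length
     * must be passed in because BitSet.length() ignores trailing zeros.
     */
    static String toBitString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    /**
     * Rebuilds the BitSet from the string. A per-transaction check is then
     * simply bits.get(i): no sorting, no binary search on deserialization.
     */
    static BitSet fromBitString(String s) {
        BitSet bits = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') {
                bits.set(i);
            }
        }
        return bits;
    }
}
```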
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989168#comment-15989168 ] Chaoyu Tang commented on HIVE-16147: The only one test failure is not related to this patch. [~pxiong] could you review the patch? Thanks > Rename a partitioned table should not drop its partition columns stats > -- > > Key: HIVE-16147 > URL: https://issues.apache.org/jira/browse/HIVE-16147 > Project: Hive > Issue Type: Bug >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch > > > When a partitioned table (e.g. sample_pt) is renamed (e.g to > sample_pt_rename), describing its partition shows that the partition column > stats are still accurate, but actually they all have been dropped. > It could be reproduce as following: > 1. analyze table sample_pt compute statistics for columns; > 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS > for all columns are true > {code} > ... > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > ... > {code} > 3: describe formatted default.sample_pt partition (dummy = 3) salary: column > stats exists > {code} > # col_namedata_type min > max num_nulls distinct_count > avg_col_len max_col_len num_trues > num_falses comment > > > salaryint 1 151370 > 0 94 > > from deserializer > {code} > 4. alter table sample_pt rename to sample_pt_rename; > 5. 
describe formatted default.sample_pt_rename partition (dummy = 3): > describe the rename table partition (dummy =3) shows that COLUMN_STATS for > columns are still true. > {code} > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt_rename > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: > file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > {code} > describe formatted default.sample_pt_rename partition (dummy = 3) salary: the > column stats have been dropped. > {code} > # col_namedata_type comment > > > > salaryint from deserializer > > Time taken: 0.131 seconds, Fetched: 3 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15571) Support Insert into for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989154#comment-15989154 ] Nishant Bangarwa commented on HIVE-15571:
-

[~jcamachorodriguez] your understanding is correct; it is not safe to execute multiple insert intos in parallel. When we create a druid segment we need to allocate a version and a shardSpec to it based on existing segments. In druid itself this is easier to manage, as the overlord manages segment allocation and uses interval-based locks, which can handle multiple tasks because allocation and locking are done in a central place. In the case of hive insert into, I was planning to lock the complete datasource for the first version, and later extend that to lock only the intervals for which data is being ingested in a subsequent PR. For interval-based locking we would need to figure out the interval for the data being ingested in StorageHandler preInsert, which we don't know currently.

> Support Insert into for druid storage handler
> -
>
> Key: HIVE-15571
> URL: https://issues.apache.org/jira/browse/HIVE-15571
> Project: Hive
> Issue Type: New Feature
> Components: Druid integration
> Reporter: slim bouguerra
> Assignee: Nishant Bangarwa
> Attachments: HIVE-15571.01.patch
>
> Add support for the insert into operator for the druid storage handler.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16513) width_bucket issues
[ https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989103#comment-15989103 ] Hive QA commented on HIVE-16513: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865462/HIVE-16513.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4924/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4924/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4924/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865462 - PreCommit-HIVE-Build > width_bucket issues > --- > > Key: HIVE-16513 > URL: https://issues.apache.org/jira/browse/HIVE-16513 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin >Assignee: Sahil Takiar > Attachments: HIVE-16513.1.patch, HIVE-16513.2.patch > > > width_bucket was recently added with HIVE-15982. This ticket notes a few > issues. > Usability issue: > Currently only accepts integral numeric types. Decimals, floats and doubles > are not supported. > Runtime failures: This query will cause a runtime divide-by-zero in the > reduce stage. > select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1; > The divide-by-zero seems to trigger any time I use a group-by. 
Here's another > example (that actually requires the group-by): > select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1; > Advanced Usage Issues: > Suppose you have a table e011_01 as follows: > create table e011_01 (c1 integer, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > Compile-time problems: > You cannot use simple case expressions, searched case expressions or grouping > sets. These queries fail: > select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) > from e011_01; > select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) > from e011_01; > select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as > integer)) from e011_02 group by cube(c1, c2); > I'll admit the grouping one is pretty contrived but the case ones seem > straightforward, valid, and it's strange that they don't work. Similar > queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe > [~ashutoshc] can lend some perspective on that? > Interestingly, you can use window functions in width_bucket, example: > select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01; > works just fine. Hopefully we can get to a place where people implementing > functions like this don't need to think about value expression support but we > don't seem to be there yet. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
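To make the reported divide-by-zero concrete, here is a minimal sketch of SQL width_bucket semantics for the ascending case. This is illustrative only, not Hive's actual UDF implementation; the class and guard are assumptions. The failure mode arises when a row makes the min and max arguments equal, so the bucket width is zero and the division fails unless guarded:

```java
// Minimal sketch of SQL width_bucket semantics (ascending bounds only).
// Names and the explicit guard are illustrative, not Hive's implementation.
public class WidthBucketSketch {
    // Returns the 1-based bucket of expr in [minValue, maxValue) split into
    // numBuckets equal-width buckets; 0 for underflow, numBuckets + 1 for overflow.
    static int widthBucket(long expr, long minValue, long maxValue, int numBuckets) {
        long range = maxValue - minValue;
        if (range == 0) {
            // Without this guard, a row where the min and max arguments evaluate
            // to the same value (e.g. via a GROUP BY expression) divides by zero.
            throw new IllegalArgumentException("minValue and maxValue must differ");
        }
        if (expr < minValue) {
            return 0;
        }
        if (expr >= maxValue) {
            return numBuckets + 1;
        }
        return (int) ((expr - minValue) * numBuckets / range) + 1;
    }
}
```

For example, widthBucket(5, 0, 10, 10) falls in bucket 6, while calling it with equal bounds raises the guard instead of an ArithmeticException.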
[jira] [Updated] (HIVE-16542) make merge that targets acid 2.0 table fail-fast
[ https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16542: -- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 2.3.0 Status: Resolved (was: Patch Available) Committed to branch-2.3, branch-2, and master. Thanks Wei for the review. > make merge that targets acid 2.0 table fail-fast > - > > Key: HIVE-16542 > URL: https://issues.apache.org/jira/browse/HIVE-16542 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16542.01-branch-2.3.patch, > HIVE-16542.01-branch-2.patch, HIVE-16542.01.patch, HIVE-16542.02.patch > > > Until HIVE-14947 is fixed, we need to add a check so that acid 2.0 tables are > not written to by a Merge stmt that has both Insert and Update clauses -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16546) LLAP: Fail map join tasks if hash table memory exceeds threshold
[ https://issues.apache.org/jira/browse/HIVE-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989009#comment-15989009 ] Hive QA commented on HIVE-16546: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865459/HIVE-16546.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4923/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4923/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4923/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865459 - PreCommit-HIVE-Build > LLAP: Fail map join tasks if hash table memory exceeds threshold > > > Key: HIVE-16546 > URL: https://issues.apache.org/jira/browse/HIVE-16546 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16546.1.patch, HIVE-16546.2.patch, > HIVE-16546.WIP.patch > > > When a map join task is running in llap, it can potentially use a lot more memory > than its limit, which could be the memory per executor or the noconditional task > size. If it uses more memory, it can adversely affect other queries' performance > or it can even bring down the daemon. In such cases, it is better to fail the > query than to bring down the daemon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
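The fail-fast idea in the description above can be pictured as a running-total check during hash table load. The class, method, and exception names below are hypothetical illustrations, not the patch's actual API:

```java
// Hypothetical sketch of the fail-fast check: track estimated hash table bytes
// during load and abort the query once the per-executor budget is exceeded,
// rather than letting the whole shared LLAP daemon run out of memory.
public class HashTableMemoryGuard {
    static class MapJoinMemoryExhaustionException extends RuntimeException {
        MapJoinMemoryExhaustionException(String msg) {
            super(msg);
        }
    }

    private final long maxBytes; // e.g. memory per executor or noconditional task size
    private long usedBytes;

    HashTableMemoryGuard(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Called periodically during hash table load (e.g. every N rows) with the
    // estimated growth since the last check.
    void addAndCheck(long deltaBytes) {
        usedBytes += deltaBytes;
        if (usedBytes > maxBytes) {
            // Failing a single query is preferable to bringing down the daemon.
            throw new MapJoinMemoryExhaustionException(
                "Hash table size " + usedBytes + " exceeded limit " + maxBytes);
        }
    }
}
```

The check is cheap enough to run periodically, and throwing an unchecked exception lets the task fail and surface the error to the query without affecting other executors.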
[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989006#comment-15989006 ] Aihua Xu commented on HIVE-16346: - Thanks [~ekoifman] . I just reverted the change. [~stakiar_impala_496e] Can you take a look? > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
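One way to picture the "conditional on the target filesystem" proposal from the description is a check on the target URI's scheme before attempting any permission calls. The scheme list and helper name here are assumptions for illustration, not Hive's actual code:

```java
import java.net.URI;
import java.util.Set;

// Illustrative sketch: inherit permissions only when the global flag is on AND
// the target filesystem actually supports file permissions. The scheme list
// and method name are assumptions, not Hive's real API.
public class InheritPermsCheck {
    // Blobstore schemes with no real file-permission concept, where chmod-style
    // calls are wasted (and often slow) round trips.
    private static final Set<String> NO_PERMISSION_SCHEMES =
        Set.of("s3", "s3a", "s3n", "wasb", "adl");

    static boolean shouldInheritPerms(boolean inheritPermsConf, URI target) {
        if (!inheritPermsConf) {
            // hive.warehouse.subdir.inherit.perms is false: never inherit.
            return false;
        }
        String scheme = target.getScheme();
        // Default to inheriting (e.g. HDFS, local FS) unless the scheme is a
        // known permission-less blobstore.
        return scheme == null || !NO_PERMISSION_SCHEMES.contains(scheme.toLowerCase());
    }
}
```

With such a check, HDFS tables keep permission inheritance while writes to S3-backed tables skip the unnecessary calls, without flipping the global flag for the whole HS2 instance.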
[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988990#comment-15988990 ] Aihua Xu commented on HIVE-16346: - [~ekoifman] Sorry about that. It seems I may have forgotten to include a new file when committing the change. > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Issue Comment Deleted] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16346: Comment: was deleted (was: [~ekoifman] Sorry about that. Seems I may forget to include a new file when committing the change. ) > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16542) make merge that targets acid 2.0 table fail-fast
[ https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16542: -- Attachment: HIVE-16542.01-branch-2.3.patch > make merge that targets acid 2.0 table fail-fast > - > > Key: HIVE-16542 > URL: https://issues.apache.org/jira/browse/HIVE-16542 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16542.01-branch-2.3.patch, > HIVE-16542.01-branch-2.patch, HIVE-16542.01.patch, HIVE-16542.02.patch > > > Until HIVE-14947 is fixed, need to add a check so that acid 2.0 tables are > not written to by Merge stmt that has both Insert and Update clauses -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reopened HIVE-16346: --- > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988981#comment-15988981 ] Eugene Koifman commented on HIVE-16346: --- [~aihuaxu], [~stakiar] - it looks like this broke branch-2 compilation. For example, https://builds.apache.org/job/PreCommit-HIVE-Build/4916/ (HIVE-16542). I (and others, e.g. [~wei.zheng]) am getting the same error compiling the branch without any changes. This seems to be because HdfsUtils.java in shims-common refers to a class in common before the common module is compiled. > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens
[ https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988954#comment-15988954 ] Chaoyu Tang commented on HIVE-16487: LGTM, +1 pending tests. > Serious Zookeeper exception is logged when a race condition happens > --- > > Key: HIVE-16487 > URL: https://issues.apache.org/jira/browse/HIVE-16487 > Project: Hive > Issue Type: Bug > Components: Locking >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-16487.02.patch, HIVE-16487.patch > > > A customer started to see this in the logs, but fortunately everything was > working as intended: > {code} > 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: > [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hive_zookeeper_namespace//LOCK-SHARED- > {code} > This was happening because of a race condition between lock release and lock > acquisition: the thread releasing the lock removes the parent ZK node just > after the thread acquiring the lock has made sure that the parent node exists. > Since this can happen without any real problem, I plan to add NODEEXISTS and > NONODE as transient ZooKeeper exceptions, so the users are not confused. > Also, the original author of ZooKeeperHiveLockManager may have planned to handle > different ZooKeeperExceptions differently, and the code is hard to > understand. See the {{continue}} and the {{break}}. The {{break}} only breaks > the switch, and not the loop, which IMHO is not intuitive: > {code} > do { > try { > [..]
> ret = lockPrimitive(key, mode, keepAlive, parentCreated, > } catch (Exception e1) { > if (e1 instanceof KeeperException) { > KeeperException e = (KeeperException) e1; > switch (e.code()) { > case CONNECTIONLOSS: > case OPERATIONTIMEOUT: > LOG.debug("Possibly transient ZooKeeper exception: ", e); > continue; > default: > LOG.error("Serious Zookeeper exception: ", e); > break; > } > } > [..] > } > } while (tryNum < numRetriesForLock); > {code} > If we do not want to try again in case of a "Serious Zookeeper exception:", > then we should add a label to the do loop, and break it in the switch. > If we do want to try regardless of the type of the ZK exception, then we > should just change the {{continue;}} to {{break;}} and move the lines part of > the code which did not run in case of {{continue}} to the {{default}} switch, > so it is easier to understand the code. > Any suggestions or ideas [~ctang.ma] or [~szehon]? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
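The break-versus-labeled-break distinction raised above can be shown in a minimal, self-contained form. The retry counts and method names are illustrative, not the ZooKeeperHiveLockManager code; the point is that inside a switch, a plain break exits only the switch, so the surrounding do-while retries even the "serious" case, while a labeled break exits the loop itself:

```java
// Demonstrates plain break (exits only the switch) vs. labeled break
// (exits the enclosing do-while) for a retry loop dispatching on an error code.
public class LabeledBreakDemo {
    // Labeled break: a "serious" code (anything but 1) stops retrying at once.
    static int attemptsWithLabel(int failCode, int maxRetries) {
        int tries = 0;
        retryLoop:
        do {
            tries++;
            switch (failCode) {
                case 1: // transient error: retry
                    continue;
                default: // serious error: stop retrying
                    break retryLoop;
            }
        } while (tries < maxRetries);
        return tries;
    }

    // Plain break: the switch exits, but the do-while keeps retrying regardless.
    static int attemptsWithoutLabel(int failCode, int maxRetries) {
        int tries = 0;
        do {
            tries++;
            switch (failCode) {
                case 1:
                    continue;
                default:
                    break; // only breaks the switch; the loop condition still runs
            }
        } while (tries < maxRetries);
        return tries;
    }
}
```

With a serious code, attemptsWithLabel stops after one attempt while attemptsWithoutLabel runs to maxRetries, which is exactly the unintuitive behavior the description points out.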
[jira] [Commented] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction
[ https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988912#comment-15988912 ] Hive QA commented on HIVE-16553: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865455/HIVE-16553.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4922/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4922/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4922/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865455 - PreCommit-HIVE-Build > Change default value for hive.tez.bigtable.minsize.semijoin.reduction > - > > Key: HIVE-16553 > URL: https://issues.apache.org/jira/browse/HIVE-16553 > Project: Hive > Issue Type: Bug > Components: Configuration >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16553.1.patch > > > Current value is 1M rows, would like to bump this up to make sure we are not > creating semjoin optimizations on dimension tables, since having too many > semijoin optimizations can cause serialized execution of tasks if lots of > tasks are waiting for semijoin optimizations to be computed. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-16559: --- Attachment: HIVE-16559.01.patch First draft containing a check to prevent the dropping of columns if the table is: - partitioned - stored in parquet - cascade option is missing > Parquet schema evolution for partitioned tables may break if table and > partition serdes differ > -- > > Key: HIVE-16559 > URL: https://issues.apache.org/jira/browse/HIVE-16559 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Fix For: 3.0.0 > > Attachments: HIVE-16559.01.patch > > > Parquet schema evolution should make it possible to have partitions/tables > backed by files with different schemas. Hive should match the table columns > with file columns based on the column name if possible. > However if the serde for a table is missing columns from the serde of a > partition Hive fails to match the columns together. > Steps to reproduce: > {code} > CREATE TABLE myparquettable_parted > ( > name string, > favnumber int, > favcolor string, > age int, > favpet string > ) > PARTITIONED BY (day string) > STORED AS PARQUET; > INSERT OVERWRITE TABLE myparquettable_parted > PARTITION(day='2017-04-04') > SELECT >'mary' as name, >5 AS favnumber, >'blue' AS favcolor, >35 AS age, >'dog' AS favpet; > alter table myparquettable_parted > REPLACE COLUMNS > ( > favnumber int, > age int > );
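The guard described in the draft patch note can be sketched roughly as follows. The method signature and error text are assumptions for illustration; the real check would live in Hive's DDL validation:

```java
import java.util.List;

// Rough sketch of the proposed guard: reject a REPLACE COLUMNS that drops
// columns from a partitioned Parquet table unless CASCADE is specified, since
// existing partitions keep their own serde schema and column matching between
// the table and partition schemas would otherwise break.
public class ReplaceColumnsCheck {
    static void validate(boolean partitioned, String storageFormat,
                         boolean cascade, List<String> oldCols, List<String> newCols) {
        boolean dropsColumns = !newCols.containsAll(oldCols);
        if (partitioned && "PARQUET".equalsIgnoreCase(storageFormat)
                && dropsColumns && !cascade) {
            throw new IllegalStateException(
                "Replacing columns cannot drop columns on a partitioned Parquet table; use CASCADE");
        }
    }
}
```

Under this sketch, the REPLACE COLUMNS in the reproduction steps (dropping name, favcolor, and favpet without CASCADE) would be rejected at compile time instead of failing later at read time.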
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-16559: --- Fix Version/s: 3.0.0 Status: Patch Available (was: Open) > Parquet schema evolution for partitioned tables may break if table and > partition serdes differ > -- > > Key: HIVE-16559 > URL: https://issues.apache.org/jira/browse/HIVE-16559 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Fix For: 3.0.0 > > Attachments: HIVE-16559.01.patch > > > Parquet schema evolution should make it possible to have partitions/tables > backed by files with different schemas. Hive should match the table columns > with file columns based on the column name if possible. > However if the serde for a table is missing columns from the serde of a > partition Hive fails to match the columns together. > Steps to reproduce: > {code} > CREATE TABLE myparquettable_parted > ( > name string, > favnumber int, > favcolor string, > age int, > favpet string > ) > PARTITIONED BY (day string) > STORED AS PARQUET; > INSERT OVERWRITE TABLE myparquettable_parted > PARTITION(day='2017-04-04') > SELECT >'mary' as name, >5 AS favnumber, >'blue' AS favcolor, >35 AS age, >'dog' AS favpet; > alter table myparquettable_parted > REPLACE COLUMNS > ( > favnumber int, > age int > );
[jira] [Commented] (HIVE-16546) LLAP: Fail map join tasks if hash table memory exceeds threshold
[ https://issues.apache.org/jira/browse/HIVE-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988810#comment-15988810 ] Hive QA commented on HIVE-16546: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865459/HIVE-16546.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10631 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbasestats] (batchId=91) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=236) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4921/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4921/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4921/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865459 - PreCommit-HIVE-Build > LLAP: Fail map join tasks if hash table memory exceeds threshold > > > Key: HIVE-16546 > URL: https://issues.apache.org/jira/browse/HIVE-16546 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16546.1.patch, HIVE-16546.2.patch, > HIVE-16546.WIP.patch > > > When a map join task is running in llap, it can potentially use a lot more memory > than its limit, which could be the memory per executor or the noconditional task > size. If it uses more memory, it can adversely affect other queries' performance > or it can even bring down the daemon. In such cases, it is better to fail the > query than to bring down the daemon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens
[ https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-16487: -- Attachment: HIVE-16487.02.patch Cleaned up the code a little. I think it is more readable this way. > Serious Zookeeper exception is logged when a race condition happens > --- > > Key: HIVE-16487 > URL: https://issues.apache.org/jira/browse/HIVE-16487 > Project: Hive > Issue Type: Bug > Components: Locking >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-16487.02.patch, HIVE-16487.patch > > > A customer started to see this in the logs, but fortunately everything was > working as intended: > {code} > 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: > [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hive_zookeeper_namespace//LOCK-SHARED- > {code} > This was happening because of a race condition between lock release and lock > acquisition: the thread releasing the lock removes the parent ZK node just > after the thread acquiring the lock has made sure that the parent node exists. > Since this can happen without any real problem, I plan to add NODEEXISTS and > NONODE as transient ZooKeeper exceptions, so the users are not confused. > Also, the original author of ZooKeeperHiveLockManager may have planned to handle > different ZooKeeperExceptions differently, and the code is hard to > understand. See the {{continue}} and the {{break}}. The {{break}} only breaks > the switch, and not the loop, which IMHO is not intuitive: > {code} > do { > try { > [..]
> ret = lockPrimitive(key, mode, keepAlive, parentCreated, > } catch (Exception e1) { > if (e1 instanceof KeeperException) { > KeeperException e = (KeeperException) e1; > switch (e.code()) { > case CONNECTIONLOSS: > case OPERATIONTIMEOUT: > LOG.debug("Possibly transient ZooKeeper exception: ", e); > continue; > default: > LOG.error("Serious Zookeeper exception: ", e); > break; > } > } > [..] > } > } while (tryNum < numRetriesForLock); > {code} > If we do not want to try again in case of a "Serious Zookeeper exception:", > then we should add a label to the do loop, and break it in the switch. > If we do want to try regardless of the type of the ZK exception, then we > should just change the {{continue;}} to {{break;}} and move the lines part of > the code which did not run in case of {{continue}} to the {{default}} switch, > so it is easier to understand the code. > Any suggestions or ideas [~ctang.ma] or [~szehon]? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16485) Enable outputName for RS operator in explain formatted
[ https://issues.apache.org/jira/browse/HIVE-16485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988752#comment-15988752 ] Hive QA commented on HIVE-16485: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865443/HIVE-16485.03.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4920/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4920/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4920/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-04-28 12:43:31.851 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-4920/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-04-28 12:43:31.853 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at bbf5ecc HIVE-16171 : Support replication of truncate table (Sankar Hariappan, reviewed by Sushanth Sowmyan) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at bbf5ecc HIVE-16171 : Support replication of truncate table (Sankar Hariappan, reviewed by Sushanth Sowmyan) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-04-28 12:43:32.575 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p1 patching file ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java patching file ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/AnnotateReduceSinkOutputOperator.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java patching file ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java patching file ql/src/test/queries/clientpositive/explain_formatted_oid.q patching file ql/src/test/results/clientpositive/explain_formatted_oid.q.out patching file ql/src/test/results/clientpositive/input4.q.out patching file ql/src/test/results/clientpositive/join0.q.out patching file ql/src/test/results/clientpositive/parallel_join0.q.out patching file 
ql/src/test/results/clientpositive/plan_json.q.out + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven ANTLR Parser Generator Version 3.5.2 Output file /data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java does not exist: must build /data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g org/apache/hadoop/hive/metastore/parser/Filter.g DataNucleus Enhancer (version 4.1.17) for API "JDO" DataNucleus Enhancer : Classpath >> /usr/share/maven/boot/plexus-classworlds-2.x.jar ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo ENHANCED
[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988750#comment-15988750 ] Hive QA commented on HIVE-15795: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864833/HIVE-15795.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testFloatCast2DoubleThriftSerializeInTasks (batchId=223) org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=223) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4919/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4919/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4919/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864833 - PreCommit-HIVE-Build > Support Accumulo Index Tables in Hive Accumulo Connector > > > Key: HIVE-15795 > URL: https://issues.apache.org/jira/browse/HIVE-15795 > Project: Hive > Issue Type: Improvement > Components: Accumulo Storage Handler >Reporter: Mike Fagan >Assignee: Mike Fagan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, > HIVE-15795.3.patch > > > Ability to specify an accumulo index table for an accumulo-hive table. 
> This would greatly improve performance for non-rowid query predicates -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16171) Support replication of truncate table
[ https://issues.apache.org/jira/browse/HIVE-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988726#comment-15988726 ] ASF GitHub Bot commented on HIVE-16171: --- Github user sankarh closed the pull request at: https://github.com/apache/hive/pull/166 > Support replication of truncate table > - > > Key: HIVE-16171 > URL: https://issues.apache.org/jira/browse/HIVE-16171 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.2.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR > Fix For: 3.0.0 > > Attachments: HIVE-16171.01.patch, HIVE-16171.02.patch, > HIVE-16171.03.patch, HIVE-16171.04.patch, HIVE-16171.05.patch, > HIVE-16171.06.patch, HIVE-16171.07.patch > > > Need to support truncate table for replication. Key points to note. > 1. For non-partitioned table, truncate table will remove all the rows from > the table. > 2. For partitioned tables, need to consider how truncate behaves if truncate > a partition or the whole table. > 3. Bootstrap load with truncate table must work as it is just > loadTable/loadPartition with empty dataset. > 4. It is suggested to re-use the alter table/alter partition events to handle > truncate. > 5. Need to consider the case where insert event happens before truncate table > which needs to see their data files through change management. The data files > should be recycled to the cmroot path before trashing it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Status: Patch Available (was: Open) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Attachment: HIVE-16488.02.patch > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Attachment: (was: HIVE-16488.02.patch) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Status: Open (was: Patch Available) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception
[ https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988689#comment-15988689 ] Hive QA commented on HIVE-16213: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865442/HIVE-16213.07.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10636 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteDate (batchId=178) org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteVarchar (batchId=178) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testStoreFuncAllSimpleTypes (batchId=178) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4918/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4918/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4918/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12865442 - PreCommit-HIVE-Build > ObjectStore can leak Queries when rollbackTransaction throws an exception > - > > Key: HIVE-16213 > URL: https://issues.apache.org/jira/browse/HIVE-16213 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Alexander Kolbasov >Assignee: Vihang Karajgaonkar > Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, > HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch, > HIVE-16213.06.patch, HIVE-16213.07.patch > > > In ObjectStore.java there are a few places with the code similar to: > {code} > Query query = null; > try { > openTransaction(); > query = pm.newQuery(Something.class); > ... > commited = commitTransaction(); > } finally { > if (!commited) { > rollbackTransaction(); > } > if (query != null) { > query.closeAll(); > } > } > {code} > The problem is that rollbackTransaction() may throw an exception in which > case query.closeAll() wouldn't be executed. > The fix would be to wrap rollbackTransaction in its own try-catch block. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
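The leak and the proposed fix can be sketched as follows. This is an illustrative stand-alone sketch, not Hive's actual ObjectStore code: the `Query` interface is a minimal stand-in for `javax.jdo.Query`, and `commitTransaction`/`rollbackTransaction` are stubs that simulate the failure path.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RollbackLeakSketch {
    /** Minimal stand-in for javax.jdo.Query (assumption, not Hive's code). */
    interface Query { void closeAll(); }

    static boolean commitTransaction() { return false; }  // simulate a failed commit
    static void rollbackTransaction() { throw new IllegalStateException("rollback failed"); }

    /** Buggy shape from the ticket: a throwing rollback skips closeAll(). */
    static void buggy(Query query) {
        boolean committed = false;
        try {
            committed = commitTransaction();
        } finally {
            if (!committed) {
                rollbackTransaction();   // throws -> the lines below never run
            }
            if (query != null) {
                query.closeAll();        // skipped: the Query leaks
            }
        }
    }

    /** Proposed fix: rollbackTransaction() wrapped in its own try/catch. */
    static void fixed(Query query) {
        boolean committed = false;
        try {
            committed = commitTransaction();
        } finally {
            if (!committed) {
                try {
                    rollbackTransaction();
                } catch (RuntimeException e) {
                    // log and move on so the cleanup below still happens
                }
            }
            if (query != null) {
                query.closeAll();        // now always reached
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean closed = new AtomicBoolean(false);
        try { buggy(() -> closed.set(true)); } catch (RuntimeException ignored) { }
        System.out.println("closed after buggy: " + closed.get());  // false
        closed.set(false);
        fixed(() -> closed.set(true));
        System.out.println("closed after fixed: " + closed.get());  // true
    }
}
```

Running the sketch shows the query is only closed on the fixed path, which is exactly the behavior difference the patch targets.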
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-16559: --- Description: Parquet schema evolution should make it possible to have partitions/tables backed by files with different schemas. Hive should match the table columns with file columns based on the column name if possible. However if the serde for a table is missing columns from the serde of a partition Hive fails to match the columns together. Steps to reproduce: {code} CREATE TABLE myparquettable_parted ( name string, favnumber int, favcolor string, age int, favpet string ) PARTITIONED BY (day string) STORED AS PARQUET; INSERT OVERWRITE TABLE myparquettable_parted PARTITION(day='2017-04-04') SELECT 'mary' as name, 5 AS favnumber, 'blue' AS favcolor, 35 AS age, 'dog' AS favpet; alter table myparquettable_parted REPLACE COLUMNS ( favnumber int, age int ); {code} > Key: HIVE-16559 > URL: https://issues.apache.org/jira/browse/HIVE-16559 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > Parquet schema evolution should make it possible to have partitions/tables > backed by files with different schemas. Hive should match the table columns > with file columns based on the column name if possible. > However if the serde for a table is missing columns from the serde of a > partition Hive fails to match the columns together. > Steps to reproduce: > {code} > CREATE TABLE myparquettable_parted > ( > name string, > favnumber int, > favcolor string, > age int, > favpet string > ) > PARTITIONED BY (day string) > STORED AS PARQUET; > INSERT OVERWRITE TABLE myparquettable_parted > PARTITION(day='2017-04-04') > SELECT >'mary' as name, >5 AS favnumber, >'blue' AS favcolor, >35 AS age, >'dog' AS favpet; > alter table myparquettable_parted > REPLACE COLUMNS > ( > favnumber int, > age int > ); > {code}
[jira] [Commented] (HIVE-16143) Improve msck repair batching
[ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988629#comment-15988629 ] Hive QA commented on HIVE-16143: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865438/HIVE-16143.01.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10647 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_0] (batchId=75) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_1] (batchId=76) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_2] (batchId=56) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_3] (batchId=39) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_batchsize] (batchId=64) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repair] (batchId=32) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4917/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4917/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4917/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12865438 - PreCommit-HIVE-Build > Improve msck repair batching > > > Key: HIVE-16143 > URL: https://issues.apache.org/jira/browse/HIVE-16143 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > Attachments: HIVE-16143.01.patch > > > Currently, the {{msck repair table}} command batches the number of partitions > created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. > Following snippet shows the batching logic. There can be couple of > improvements to this batching logic: > {noformat} > int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE); > if (batch_size > 0 && partsNotInMs.size() > batch_size) { > int counter = 0; > for (CheckResult.PartitionResult part : partsNotInMs) { > counter++; > > apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null); > repairOutput.add("Repair: Added partition to metastore " + > msckDesc.getTableName() > + ':' + part.getPartitionName()); > if (counter % batch_size == 0 || counter == > partsNotInMs.size()) { > db.createPartitions(apd); > apd = new AddPartitionDesc(table.getDbName(), > table.getTableName(), false); > } > } > } else { > for (CheckResult.PartitionResult part : partsNotInMs) { > > apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null); > repairOutput.add("Repair: Added partition to metastore " + > msckDesc.getTableName() > + ':' + part.getPartitionName()); > } > db.createPartitions(apd); > } > } catch (Exception e) { > LOG.info("Could not bulk-add partitions to metastore; trying one by > one", e); > repairOutput.clear(); > msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput); > } > {noformat} > 1. If the batch size is too aggressive the code falls back to adding > partitions one by one which is almost always very slow. 
It is easily possible > that users increase the batch size to a higher value to make the command run > faster but end up with worse performance because the code falls back to adding > one by one. Users are then expected to determine the tuned value of batch > size which works well for their environment. I think the code could handle > this situation better by exponentially decaying the batch size instead of > falling back to one by one. > 2. The other issue with this implementation is if, let's say, the first batch > succeeds and the second one fails, the code tries to add all the partitions > one by one irrespective of whether some of them were successfully added or > not. If we need to fall back to one by one we should at least remove the ones > which we know for sure are
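The exponential-decay idea suggested above can be sketched like this. All names here are illustrative stand-ins, not the actual msck repair code: on a failed batch the size is halved and retried, and already-added partitions are never re-added.

```java
// Sketch of the suggested batching improvement: halve the batch size on
// failure instead of falling back to one-by-one, and only advance past
// partitions whose batch actually succeeded. BatchSink stands in for the
// metastore createPartitions call (an assumption, not Hive's API).
import java.util.ArrayList;
import java.util.List;

public class DecayingBatchAdd {
    /** Simulates the metastore call; may throw if the batch is too big. */
    interface BatchSink { void createPartitions(List<String> batch) throws Exception; }

    /** Returns the partitions actually added; batchSize decays on failure. */
    static List<String> addAll(List<String> parts, int batchSize, BatchSink sink) {
        List<String> added = new ArrayList<>();
        int pos = 0;
        while (pos < parts.size() && batchSize > 0) {
            int end = Math.min(pos + batchSize, parts.size());
            List<String> batch = parts.subList(pos, end);
            try {
                sink.createPartitions(batch);
                added.addAll(batch);   // only successful batches are recorded
                pos = end;             // never re-add work already done
            } catch (Exception e) {
                batchSize /= 2;        // decay instead of one-by-one fallback
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < 10; i++) parts.add("p=" + i);
        // Fails for batches larger than 3, so batchSize decays 8 -> 4 -> 2.
        BatchSink sink = batch -> {
            if (batch.size() > 3) throw new Exception("batch too large");
        };
        List<String> added = addAll(parts, 8, sink);
        System.out.println(added.size());  // 10: everything added, nothing twice
    }
}
```

With an initial batch size of 8 and a sink that rejects batches over 3, the size decays to 2 and all ten partitions are added exactly once.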
[jira] [Commented] (HIVE-15726) Reenable indentation checks to checkstyle
[ https://issues.apache.org/jira/browse/HIVE-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988602#comment-15988602 ] Peter Vary commented on HIVE-15726: --- Test failures are not related > Reenable indentation checks to checkstyle > - > > Key: HIVE-15726 > URL: https://issues.apache.org/jira/browse/HIVE-15726 > Project: Hive > Issue Type: Improvement >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-15726.patch > > > The Indentation check is commented out because at that time there were no > possibility to check the throws indentation. > There is a possibility now, so we can reenable it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara reassigned HIVE-16559: -- > Parquet schema evolution for partitioned tables may break if table and > partition serdes differ > -- > > Key: HIVE-16559 > URL: https://issues.apache.org/jira/browse/HIVE-16559 > Project: Hive > Issue Type: Bug >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > Parquet schema evolution should make it possible to have partitions/tables > backed by files with different schemas. Hive should match the table columns > with file columns based on the column name if possible. > However if the serde for a table is missing columns from the serde of a > partition Hive fails to match the columns together. > Steps to reproduce: > {code} > CREATE TABLE myparquettable_parted > ( > name string, > favnumber int, > favcolor string, > age int, > favpet string > ) > PARTITIONED BY (day string) > STORED AS PARQUET; > INSERT OVERWRITE TABLE myparquettable_parted > PARTITION(day='2017-04-04') > SELECT >'mary' as name, >5 AS favnumber, >'blue' AS favcolor, >35 AS age, >'dog' AS favpet; > alter table myparquettable_parted > REPLACE COLUMNS > ( > favnumber int, > age int > ); > {code}
[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
[ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-16559: --- Component/s: Serializers/Deserializers > Parquet schema evolution for partitioned tables may break if table and > partition serdes differ > -- > > Key: HIVE-16559 > URL: https://issues.apache.org/jira/browse/HIVE-16559 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > > Parquet schema evolution should make it possible to have partitions/tables > backed by files with different schemas. Hive should match the table columns > with file columns based on the column name if possible. > However if the serde for a table is missing columns from the serde of a > partition Hive fails to match the columns together. > Steps to reproduce: > {code} > CREATE TABLE myparquettable_parted > ( > name string, > favnumber int, > favcolor string, > age int, > favpet string > ) > PARTITIONED BY (day string) > STORED AS PARQUET; > INSERT OVERWRITE TABLE myparquettable_parted > PARTITION(day='2017-04-04') > SELECT >'mary' as name, >5 AS favnumber, >'blue' AS favcolor, >35 AS age, >'dog' AS favpet; > alter table myparquettable_parted > REPLACE COLUMNS > ( > favnumber int, > age int > ); > {code}
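The name-based column matching the report describes can be sketched as below. This is an illustrative stand-alone sketch, not Hive's actual Parquet read path: each table column is looked up in the file's schema by name, and columns absent from the file map to -1 (and should be read as null rather than failing).

```java
// Illustrative name-based schema matching: map each table column to the
// index of the same-named file column, or -1 when the file predates the
// column. Method and class names here are assumptions for the sketch.
import java.util.Arrays;
import java.util.List;

public class ColumnMatcher {
    /** For each table column, the index of the same-named file column, or -1. */
    static int[] matchByName(List<String> tableCols, List<String> fileCols) {
        int[] mapping = new int[tableCols.size()];
        for (int i = 0; i < tableCols.size(); i++) {
            mapping[i] = fileCols.indexOf(tableCols.get(i));  // -1 if absent
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Table schema after REPLACE COLUMNS vs. a partition file written earlier.
        List<String> table = Arrays.asList("favnumber", "age");
        List<String> file = Arrays.asList("name", "favnumber", "favcolor", "age", "favpet");
        System.out.println(Arrays.toString(matchByName(table, file)));  // [1, 3]
    }
}
```

In the reproduction above, the narrowed table schema still matches cleanly against the wider partition file by name; the bug is that this matching breaks when the table and partition serdes disagree.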
[jira] [Commented] (HIVE-16542) make merge that targets acid 2.0 table fail-fast
[ https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988557#comment-15988557 ] Hive QA commented on HIVE-16542: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865435/HIVE-16542.01-branch-2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4916/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4916/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4916/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-04-28 09:56:20.467 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-4916/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z branch-2 ]] + [[ -d apache-github-branch-2-source ]] + [[ ! -d apache-github-branch-2-source/.git ]] + [[ ! 
-d apache-github-branch-2-source ]] + date '+%Y-%m-%d %T.%3N' 2017-04-28 09:56:20.470 + cd apache-github-branch-2-source + git fetch origin + git reset --hard HEAD HEAD is now at ab3a24b update RELEASE_NOTES.txt for 2.3 (HIVE-16545,HIVE-16547) + git clean -f -d + git checkout branch-2 Already on 'branch-2' Your branch is up-to-date with 'origin/branch-2'. + git reset --hard origin/branch-2 HEAD is now at ab3a24b update RELEASE_NOTES.txt for 2.3 (HIVE-16545,HIVE-16547) + git merge --ff-only origin/branch-2 Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-04-28 09:56:23.161 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p0 patching file ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2WithSplitUpdate.java + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven [ERROR] COMPILATION ERROR : [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37] package org.apache.hadoop.hive.common does not exist [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[79,9] cannot find symbol symbol: variable StorageUtils location: class org.apache.hadoop.hive.io.HdfsUtils [ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[98,9] cannot find symbol symbol: variable StorageUtils location: class org.apache.hadoop.hive.io.HdfsUtils [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-shims-common: Compilation failure: Compilation failure: [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37] package org.apache.hadoop.hive.common does not exist [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[79,9] cannot find symbol [ERROR] symbol: variable StorageUtils [ERROR] location: class org.apache.hadoop.hive.io.HdfsUtils [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[98,9] cannot find symbol [ERROR] symbol: variable StorageUtils [ERROR] location: class org.apache.hadoop.hive.io.HdfsUtils [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR]
[jira] [Commented] (HIVE-16366) Hive 2.3 release planning
[ https://issues.apache.org/jira/browse/HIVE-16366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988554#comment-15988554 ] Hive QA commented on HIVE-16366: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865426/HIVE-16366-branch-2.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10571 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=142) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=174) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4915/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4915/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4915/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865426 - PreCommit-HIVE-Build > Hive 2.3 release planning > - > > Key: HIVE-16366 > URL: https://issues.apache.org/jira/browse/HIVE-16366 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong >Priority: Blocker > Labels: 2.3.0 > Fix For: 2.3.0 > > Attachments: HIVE-16366-branch-2.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-15642: Status: Patch Available (was: In Progress) > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled
[ https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16558: - Status: Patch Available (was: Open) > In the hiveserver2.jsp Closed Queries table under the data click Drilldown > Link view details, the Chinese show garbled > -- > > Key: HIVE-16558 > URL: https://issues.apache.org/jira/browse/HIVE-16558 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Fix For: 3.0.0 > > Attachments: HIVE-16558.1.patch > > > In QueryProfileImpl.jamon, we see the following settings (the template's HTML markup is stripped by JIRA; only the page title "HiveServer2" survives). So we should set the response character encoding to utf-8, which avoids garbled Chinese or other languages. Please check it! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-15642: Attachment: HIVE-15642.02.patch Added 02.patch with following updates, - The new files listed for insert overwrite for non-partitioned table (loadTable method) as well. - The new files listing should consider the sub-directories in destination path which should recursively traverse. - The new files listing is done on the physical destination path after moveFile is successful, instead of logically building the new files path using source file names. - Added new test cases to verify insert overwrites, dynamic partition and loads. > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled
[ https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16558: - Description: In QueryProfileImpl.jamon, we see the following settings (HTML markup stripped from this notification): HiveServer2 So we should set the response character set to UTF-8, which avoids garbled Chinese (and other non-Latin) text. Please check it! > In the hiveserver2.jsp Closed Queries table under the data click Drilldown > Link view details, the Chinese show garbled > -- > > Key: HIVE-16558 > URL: https://issues.apache.org/jira/browse/HIVE-16558 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Fix For: 3.0.0 > > Attachments: HIVE-16558.1.patch > > > In QueryProfileImpl.jamon, we see the following settings (HTML markup stripped from this notification): > > HiveServer2 > > So we should set the response character set to UTF-8, which avoids garbled Chinese (and other non-Latin) text. Please check it! -- This message was sent by Atlassian JIRA (v6.3.15#6346)
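HIVE-16558 above boils down to a character-set mismatch: the HiveServer2 web UI emitted UTF-8 bytes without declaring the charset, so browsers fell back to a single-byte decoding and Chinese text rendered as garbage. A minimal, self-contained sketch of that failure mode in plain Java (this is not Hive's actual Jamon template, whose markup is elided above):

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode UTF-8 encoded bytes with ISO-8859-1, roughly what a browser
    // does when a response declares no charset and falls back to Latin-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String chinese = "查询"; // "query" in Chinese
        String garbled = misdecode(chinese);
        System.out.println(garbled); // mojibake: each character became 3 Latin-1 chars

        // ISO-8859-1 maps bytes 0-255 one-to-one, so the original bytes
        // survive and the text round-trips once both sides agree on UTF-8:
        String restored = new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                                     StandardCharsets.UTF_8);
        System.out.println(restored.equals(chinese)); // true
    }
}
```

Declaring `charset=utf-8` in the response content type, as the patch proposes, removes the browser's guesswork entirely.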
[jira] [Updated] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled
[ https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin updated HIVE-16558: - Attachment: HIVE-16558.1.patch > In the hiveserver2.jsp Closed Queries table under the data click Drilldown > Link view details, the Chinese show garbled > -- > > Key: HIVE-16558 > URL: https://issues.apache.org/jira/browse/HIVE-16558 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Fix For: 3.0.0 > > Attachments: HIVE-16558.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled
[ https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangBing Lin reassigned HIVE-16558: > In the hiveserver2.jsp Closed Queries table under the data click Drilldown > Link view details, the Chinese show garbled > -- > > Key: HIVE-16558 > URL: https://issues.apache.org/jira/browse/HIVE-16558 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988517#comment-15988517 ] ASF GitHub Bot commented on HIVE-15642: --- GitHub user sankarh opened a pull request: https://github.com/apache/hive/pull/172 HIVE-15642: Replicate Insert Overwrites, Dynamic Partition Inserts and Loads Replicate Insert Overwrites, Dynamic Partition Inserts and Loads You can merge this pull request into a Git repository by running: $ git pull https://github.com/sankarh/hive HIVE-15642 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/172.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #172 commit ddf04ae11c800be8b762a44fedcc16393396745d Author: Sankar Hariappan Date: 2017-04-28T07:49:04Z HIVE-15642: Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain
[ https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988450#comment-15988450 ] Hive QA commented on HIVE-16552: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865423/HIVE-16552.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10631 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestHBaseCliDriver.org.apache.hadoop.hive.cli.TestHBaseCliDriver (batchId=94) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4914/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4914/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4914/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865423 - PreCommit-HIVE-Build > Limit the number of tasks a Spark job may contain > - > > Key: HIVE-16552 > URL: https://issues.apache.org/jira/browse/HIVE-16552 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 1.0.0, 2.0.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-16552.patch > > > It's commonly desirable to block bad and big queries that take a lot of YARN > resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to > stop a query that invokes a Spark job that contains too many tasks.
The > proposal here is to introduce hive.spark.job.max.tasks with a default value > of -1 (no limit), which an admin can set to block queries that trigger too > many Spark tasks. > Please note that this control knob applies to a single Spark job, though it's > possible that one query can trigger multiple Spark jobs (such as in the case > of a map-join). Nevertheless, the proposed approach is still helpful. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
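The hive.spark.job.max.tasks knob proposed above can be pictured as a pre-submit guard: sum the task counts across the job's stages and reject the job when a non-negative limit is exceeded. A hypothetical sketch of that check (class and method names are illustrative, not Hive's actual implementation):

```java
public class TaskLimitGuard {
    /**
     * Returns true when the job should be rejected: a non-negative limit
     * is configured and the job's total task count exceeds it.
     * A limit of -1 (the proposed default) disables the check entirely.
     */
    static boolean exceedsLimit(int[] tasksPerStage, int maxTasks) {
        if (maxTasks < 0) {
            return false; // -1 means "no limit"
        }
        long total = 0;
        for (int tasks : tasksPerStage) {
            total += tasks; // sum across all stages of the single Spark job
        }
        return total > maxTasks;
    }

    public static void main(String[] args) {
        int[] stages = {500, 800, 2000};
        System.out.println(exceedsLimit(stages, 1000)); // true: 3300 > 1000
        System.out.println(exceedsLimit(stages, -1));   // false: limit disabled
    }
}
```

Because the limit is per Spark job, a query that spawns several jobs (e.g. via map-join) would have each job checked independently, which is consistent with the caveat in the proposal.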
[jira] [Commented] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
[ https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988444#comment-15988444 ] ZhangBing Lin commented on HIVE-16524: -- [~xuefuz] Can you help me commit it? > Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon > > > Key: HIVE-16524 > URL: https://issues.apache.org/jira/browse/HIVE-16524 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Attachments: HIVE-16524.1.patch > > > The id attribute is defined by the W3C as follows: > 1. The id attribute specifies the unique id of the HTML element. > 2. An id must be unique in the HTML document. > 3. The id attribute can be used as a link anchor, by JavaScript (HTML DOM) or > by CSS to change or add a style to an element with the specified id. > But the "id='attributes_table'" in hiveserver2.jsp and > QueryProfileTmpl.jamon: > 1. Is not referenced by any CSS or JS > 2. Appears more than once on the same page > So I suggest removing this id attribute definition. Please check it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16557) Vectorization: Specialize ReduceSink empty key case
[ https://issues.apache.org/jira/browse/HIVE-16557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-16557: --- > Vectorization: Specialize ReduceSink empty key case > --- > > Key: HIVE-16557 > URL: https://issues.apache.org/jira/browse/HIVE-16557 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > > Gopal pointed out that native Vectorization of ReduceSink is missing the > empty key case. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16542) make merge that targets acid 2.0 table fail-fast
[ https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988393#comment-15988393 ] Hive QA commented on HIVE-16542: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865435/HIVE-16542.01-branch-2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4913/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4913/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4913/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-04-28 08:05:18.209 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-4913/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z branch-2 ]] + [[ -d apache-github-branch-2-source ]] + [[ ! -d apache-github-branch-2-source/.git ]] + [[ ! 
-d apache-github-branch-2-source ]] + date '+%Y-%m-%d %T.%3N' 2017-04-28 08:05:18.238 + cd apache-github-branch-2-source + git fetch origin From https://github.com/apache/hive 0ecdfcd..ab3a24b branch-2 -> origin/branch-2 03941e3..3403535 branch-2.2 -> origin/branch-2.2 ee57fa1..9194cae branch-2.3 -> origin/branch-2.3 6566065..bbf5ecc master -> origin/master * [new branch] storage-branch-2.3 -> origin/storage-branch-2.3 * [new tag] release-2.3.0-rc0 -> release-2.3.0-rc0 * [new tag] storage-release-2.3.0rc0 -> storage-release-2.3.0rc0 + git reset --hard HEAD HEAD is now at 0ecdfcd HIVE-15761: ObjectStore.getNextNotification could return an empty NotificationEventResponse causing TProtocolException (Sergio Pena, reviewed by Aihua Xu) + git clean -f -d + git checkout branch-2 Already on 'branch-2' Your branch is behind 'origin/branch-2' by 10 commits, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/branch-2 HEAD is now at ab3a24b update RELEASE_NOTES.txt for 2.3 (HIVE-16545,HIVE-16547) + git merge --ff-only origin/branch-2 Already up-to-date.
+ date '+%Y-%m-%d %T.%3N' 2017-04-28 08:05:23.784 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p0 patching file ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2WithSplitUpdate.java + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven [ERROR] COMPILATION ERROR : [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37] package org.apache.hadoop.hive.common does not exist [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[79,9] cannot find symbol symbol: variable StorageUtils location: class org.apache.hadoop.hive.io.HdfsUtils [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[98,9] cannot find symbol symbol: variable StorageUtils location: class org.apache.hadoop.hive.io.HdfsUtils [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-shims-common: Compilation failure: Compilation failure: [ERROR] /data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37] package org.apache.hadoop.hive.common does not 
exist [ERROR]
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988389#comment-15988389 ] Hive QA commented on HIVE-16147: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865390/HIVE-16147.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4912/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4912/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4912/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865390 - PreCommit-HIVE-Build > Rename a partitioned table should not drop its partition columns stats > -- > > Key: HIVE-16147 > URL: https://issues.apache.org/jira/browse/HIVE-16147 > Project: Hive > Issue Type: Bug >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch > > > When a partitioned table (e.g. sample_pt) is renamed (e.g. to > sample_pt_rename), describing its partition shows that the partition column > stats are still accurate, but actually they all have been dropped. > It can be reproduced as follows: > 1. analyze table sample_pt compute statistics for columns; > 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS > for all columns are true > {code} > ...
> # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > ... > {code} > 3: describe formatted default.sample_pt partition (dummy = 3) salary: column > stats exists > {code} > # col_namedata_type min > max num_nulls distinct_count > avg_col_len max_col_len num_trues > num_falses comment > > > salaryint 1 151370 > 0 94 > > from deserializer > {code} > 4. alter table sample_pt rename to sample_pt_rename; > 5. describe formatted default.sample_pt_rename partition (dummy = 3): > describe the rename table partition (dummy =3) shows that COLUMN_STATS for > columns are still true. > {code} > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt_rename > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: > file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 >
[jira] [Work started] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
[ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-15642 started by Sankar Hariappan. --- > Replicate Insert Overwrites, Dynamic Partition Inserts and Loads > > > Key: HIVE-15642 > URL: https://issues.apache.org/jira/browse/HIVE-15642 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Vaibhav Gumashta >Assignee: Sankar Hariappan > Attachments: HIVE-15642.1.patch > > > 1. Insert Overwrites to a new partition should not capture new files as part > of insert event but instead use the subsequent add partition event to capture > the files + checksums. > 2. Insert Overwrites to an existing partition should capture new files as > part of the insert event. > Similar behaviour for DP inserts and loads. > This will need changes from HIVE-15478 -- This message was sent by Atlassian JIRA (v6.3.15#6346)