[jira] [Commented] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989775#comment-15989775
 ] 

Hive QA commented on HIVE-16487:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865547/HIVE-16487.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_2] 
(batchId=234)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4933/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4933/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865547 - PreCommit-HIVE-Build

> Serious Zookeeper exception is logged when a race condition happens
> ---
>
> Key: HIVE-16487
> URL: https://issues.apache.org/jira/browse/HIVE-16487
> Project: Hive
> Issue Type: Bug
> Components: Locking
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
> Attachments: HIVE-16487.02.patch, HIVE-16487.patch
>
>
> A customer started to see this in the logs, but happily everything was 
> working as intended:
> {code}
> 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: 
> [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /hive_zookeeper_namespace//LOCK-SHARED-
> {code}
> This happened because of a race condition between releasing and acquiring 
> the lock. The thread releasing the lock removes the parent ZK node just 
> after the thread acquiring the lock has made sure that the parent node exists.
> Since this can happen without any real problem, I plan to treat NODEEXISTS and 
> NONODE as transient ZooKeeper exceptions, so that users are not confused.
> Also, the original author of ZooKeeperHiveLockManager may have planned to handle 
> different ZooKeeper exceptions differently, and the code is hard to 
> understand. See the {{continue}} and the {{break}}. The {{break}} only breaks 
> out of the switch, not the loop, which IMHO is not intuitive:
> {code}
> do {
>   try {
>     [..]
>     ret = lockPrimitive(key, mode, keepAlive, parentCreated, 
>   } catch (Exception e1) {
>     if (e1 instanceof KeeperException) {
>       KeeperException e = (KeeperException) e1;
>       switch (e.code()) {
>       case CONNECTIONLOSS:
>       case OPERATIONTIMEOUT:
>         LOG.debug("Possibly transient ZooKeeper exception: ", e);
>         continue;
>       default:
>         LOG.error("Serious Zookeeper exception: ", e);
>         break;
>       }
>     }
>     [..]
>   }
> } while (tryNum < numRetriesForLock);
> {code}
> If we do not want to try again in case of a "Serious Zookeeper exception:", 
> then we should add a label to the do loop and break out of it from the switch.
> If we do want to retry regardless of the type of the ZK exception, then we 
> should just change the {{continue;}} to {{break;}} and move the lines of 
> code that do not run in the {{continue}} case into the {{default}} branch of 
> the switch, so that the code is easier to understand.
> Any suggestions or ideas [~ctang.ma] or [~szehon]?
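
The two options discussed in the description can be sketched in a self-contained form. This is not the ZooKeeperHiveLockManager code: the class RetryLoopSketch, the Code enum, and both method names are hypothetical stand-ins, and each "attempt" is simply read from a list of prepared outcomes. It only illustrates how a plain break inside a switch differs from a labeled break on the enclosing do loop.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a retry loop; not the ZooKeeperHiveLockManager code. */
final class RetryLoopSketch {
    /** Made-up result codes for one lock attempt. */
    enum Code { OK, CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE }

    /**
     * Plain break: leaves only the switch, so a "serious" code is retried too.
     * The outcomes list must hold at least maxTries entries.
     */
    static int attemptsWithPlainBreak(List<Code> outcomes, int maxTries) {
        int tryNum = 0;
        do {
            Code code = outcomes.get(tryNum++);
            if (code == Code.OK) {
                return tryNum;
            }
            switch (code) {
                case CONNECTIONLOSS:
                case OPERATIONTIMEOUT:
                    continue;      // skips to the loop condition and retries
                default:
                    break;         // exits the switch only; the loop goes on
            }
        } while (tryNum < maxTries);
        return tryNum;
    }

    /** Labeled break: a "serious" code ends the whole retry loop. */
    static int attemptsWithLabeledBreak(List<Code> outcomes, int maxTries) {
        int tryNum = 0;
        retry:
        do {
            Code code = outcomes.get(tryNum++);
            if (code == Code.OK) {
                return tryNum;
            }
            switch (code) {
                case CONNECTIONLOSS:
                case OPERATIONTIMEOUT:
                    continue retry; // transient: try again
                default:
                    break retry;    // serious: leave the do loop
            }
        } while (tryNum < maxTries);
        return tryNum;
    }

    public static void main(String[] args) {
        List<Code> outcomes = Arrays.asList(Code.NONODE, Code.CONNECTIONLOSS, Code.OK);
        System.out.println(attemptsWithPlainBreak(outcomes, 3));   // 3 attempts
        System.out.println(attemptsWithLabeledBreak(outcomes, 3)); // 1 attempt
    }
}
```

With the same sequence of outcomes, the plain-break variant keeps retrying after the first NONODE, while the labeled variant stops immediately.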



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989760#comment-15989760
 ] 

Hive QA commented on HIVE-15795:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864833/HIVE-15795.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_16] 
(batchId=234)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4932/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4932/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4932/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864833 - PreCommit-HIVE-Build

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
> Issue Type: Improvement
> Components: Accumulo Storage Handler
> Reporter: Mike Fagan
> Assignee: Mike Fagan
> Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, 
> HIVE-15795.3.patch
>
>
> Ability to specify an Accumulo index table for an Accumulo-Hive table.
> This would greatly improve performance for non-rowid query predicates.





[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989743#comment-15989743
 ] 

Hive QA commented on HIVE-15795:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864833/HIVE-15795.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4931/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4931/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4931/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864833 - PreCommit-HIVE-Build



[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-04-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Patch Available  (was: Open)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch, 
> HIVE-15160.07.patch, HIVE-15160.08.patch, HIVE-15160.09.patch, 
> HIVE-15160.09.patch, HIVE-15160.10.patch, HIVE-15160.11.patch, 
> HIVE-15160.12.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>        ,i_category
>        ,i_class
>        ,i_current_price
>        ,sum(cs_ext_sales_price) as itemrevenue
>        ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>            (partition by i_class) as revenueratio
>   from catalog_sales
>       ,item
>       ,date_dim
>  where cs_item_sk = i_item_sk
>    and i_category in ('Jewelry', 'Sports', 'Books')
>    and cs_sold_date_sk = d_date_sk
>    and d_date between cast('2001-01-12' as date)
>                   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>          ,i_item_desc
>          ,i_category
>          ,i_class
>          ,i_current_price
>  order by i_category
>          ,i_class
>          ,i_item_id
>          ,i_item_desc
>          ,revenueratio
>  limit 100;
> {code}





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-04-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-04-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-04-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-04-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Attachment: HIVE-15160.12.patch



[jira] [Commented] (HIVE-16520) Cache hive metadata in metastore

2017-04-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989681#comment-15989681
 ] 

Thejas M Nair commented on HIVE-16520:
--

+1 to updated patch in pull request.

> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
> Issue Type: New Feature
> Components: Metastore
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, 
> HIVE-16520-proto.patch
>
>
> During Hive 2 benchmarking, we found that Hive metastore operations take a lot 
> of time and thus slow down Hive compilation. In some extreme cases, compilation 
> takes much longer than the actual query run time. In particular, we found that 
> the latency of a cloud db is very high and that 90% of total query runtime is 
> spent waiting for metastore SQL database operations. Based on this observation, 
> metastore operation performance would be greatly enhanced if we had an 
> in-memory structure that caches the database query results.
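
As a rough illustration of the idea (not the design in the attached patches), an in-memory structure that caches query results can be sketched as a read-through cache. The class name ReadThroughCacheSketch, the loader function, and the invalidate method are all assumptions made for this example; the real metastore cache would also have to deal with consistency across metastore instances, which this sketch ignores.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache in front of a slow lookup (e.g. a database query). */
final class ReadThroughCacheSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    /** The loader stands in for the expensive backing-store call. */
    ReadThroughCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on the first request for a key. */
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Drops a cached entry, for example after the underlying object was altered. */
    void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        ReadThroughCacheSketch<String, String> tables =
            new ReadThroughCacheSketch<>(name -> "metadata-of-" + name);
        System.out.println(tables.get("t1")); // loaded from the "database"
        System.out.println(tables.get("t1")); // served from memory
    }
}
```

Repeated lookups of the same key then cost one map access instead of one database round trip, which is the effect the description is after.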





[jira] [Commented] (HIVE-16399) create an index for tc_txnid in TXN_COMPONENTS

2017-04-28 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989552#comment-15989552
 ] 

Wei Zheng commented on HIVE-16399:
--

At this moment I'm confused by Hive's release plan. It seems we're going to have a 
2.3 release before 2.2, so I'm not sure what the upgrade scripts should look like.

> create an index for tc_txnid in TXN_COMPONENTS
> --
>
> Key: HIVE-16399
> URL: https://issues.apache.org/jira/browse/HIVE-16399
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Wei Zheng
> Attachments: HIVE-16399.branch-2.3.patch, HIVE-16399.branch-2.patch, 
> HIVE-16399.master.patch
>
>
> Without this index, TxnStore.cleanEmptyAbortedTxns() can be very slow.





[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-28 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989539#comment-15989539
 ] 

Wei Zheng commented on HIVE-12636:
--

It looks good in general. A few minor comments:

Is a "break;" missing in Driver.startImplicitTxn() after case COMMIT and ROLLBACK?
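
For readers who do not have the patch at hand, the concern can be sketched generically. The snippet below is not the Driver.startImplicitTxn() code; the Op enum, the method names, and the trace strings are hypothetical and only show what a missing break does in a Java switch.

```java
/** Hypothetical illustration of switch fall-through; not the Driver code. */
final class FallThroughSketch {
    /** Made-up operation types. */
    enum Op { COMMIT, ROLLBACK, QUERY }

    /** Without a break, COMMIT and ROLLBACK also run the default branch. */
    static String withoutBreak(Op op) {
        StringBuilder trace = new StringBuilder();
        switch (op) {
            case COMMIT:
            case ROLLBACK:
                trace.append("end-txn;");
                // no break here: control falls through into default
            default:
                trace.append("open-implicit-txn;");
        }
        return trace.toString();
    }

    /** With a break, COMMIT and ROLLBACK stop after their own branch. */
    static String withBreak(Op op) {
        StringBuilder trace = new StringBuilder();
        switch (op) {
            case COMMIT:
            case ROLLBACK:
                trace.append("end-txn;");
                break;
            default:
                trace.append("open-implicit-txn;");
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(withoutBreak(Op.COMMIT)); // end-txn;open-implicit-txn;
        System.out.println(withBreak(Op.COMMIT));    // end-txn;
    }
}
```

Whether the fall-through in the patch is intentional is exactly the question raised above; if it is, a comment saying so would help.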

Could you add a comment for "return 10;" in Driver.compile()? In the same block, "long txnid = 
txnManager.openTxn(ctx, userFromUGI);" assigns txnid, which is never used.

Not too sure about this in Driver.recordValidTxns()
{code}
if(oldList != null) {
  throw new IllegalStateException("calling recordValidTxn() more than once in the same " +
    JavaUtils.txnIdToString(txnMgr.getCurrentTxnId()));
}
{code}

"userFromUGI" in Driver.getUserFromUGI() is no longer used.

In SemanticAnalyzerFactory, should these commented-out lines stay?
//commandType.put(HiveParser.TOK_UPDATE_TABLE, HiveOperation.SQLUPDATE);//HIVE-16443
//commandType.put(HiveParser.TOK_DELETE_FROM, HiveOperation.SQLDELETE);
//commandType.put(HiveParser.TOK_MERGE, HiveOperation.SQLMERGE);
//   INSERT, INSERT OVERWRITE, 

Why would some stats estimate in q.out files change?

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, 
> HIVE-12636.17.patch
>
>
> Assuming Hive is using DbTxnManager.
> Currently (as of this writing, only auto-commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and 
> error-prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # Add a metastore call that does openTxn() and acquireLocks() in a single call.  
> This is to make sure perf doesn't degrade for read-only queries.  (It would 
> also be useful for auto-commit write queries.)
> # Should RO queries generate txn ids from the same sequence?  (They could, for 
> example, use negative values of a different sequence.)  Txnid is part of the 
> delta/base file name.  Currently it's 7 digits.  If we use the same sequence, 
> we'll exceed 7 digits faster (possible upgrade issue).  On the other hand, 
> there is value in being able to pick txn id and commit timestamp out of the 
> same logical sequence.
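
The remark about 7 digits can be made concrete with a small sketch. It assumes, purely for illustration, that a delta name is built by zero-padding two transaction ids to 7 digits in the form delta_<min>_<max>; the class and method names are hypothetical, and the exact naming scheme should be checked against the Hive source. The point is only that padded names stop sorting in numeric order once an id needs an eighth digit, which is one plausible reading of the "possible upgrade issue".

```java
import java.util.Locale;

/** Hypothetical sketch of a zero-padded, txn-id-based file name. */
final class TxnIdWidthSketch {
    /** Assumed pattern: both ids padded to at least 7 digits. */
    static String deltaName(long minTxn, long maxTxn) {
        return String.format(Locale.ROOT, "delta_%07d_%07d", minTxn, maxTxn);
    }

    public static void main(String[] args) {
        System.out.println(deltaName(1, 1));                  // delta_0000001_0000001
        System.out.println(deltaName(9_999_999, 10_000_000)); // delta_9999999_10000000
    }
}
```

Up to 9,999,999 the names have a fixed width and sort lexicographically in id order; from 10,000,000 on they grow by one character and a plain string comparison no longer matches numeric order.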





[jira] [Commented] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989530#comment-15989530
 ] 

Hive QA commented on HIVE-16488:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865591/HIVE-16488.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4930/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4930/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4930/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865591 - PreCommit-HIVE-Build

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> the destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.





[jira] [Commented] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989469#comment-15989469
 ] 

Hive QA commented on HIVE-16558:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865510/HIVE-16558.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_handler_snapshot]
 (batchId=93)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4929/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4929/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4929/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865510 - PreCommit-HIVE-Build

> In the hiveserver2.jsp Closed Queries table under the data click Drilldown 
> Link view details, the Chinese show garbled
> --
>
> Key: HIVE-16558
> URL: https://issues.apache.org/jira/browse/HIVE-16558
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: ZhangBing Lin
> Assignee: ZhangBing Lin
> Fix For: 3.0.0
>
> Attachments: HIVE-16558.1.patch
>
>
> In QueryProfileImpl.jamon, we see the following settings:
> 
> 
>   
> 
> HiveServer2
> 
> 
> 
> 
> 
>   
> So we should set the response encoding to UTF-8, which avoids garbled Chinese 
> (or other non-ASCII) text. Please check it!
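
Why a missing or wrong charset produces garbled Chinese can be shown without any web server. The sketch below is an assumption-laden illustration, not the fix in the attached patch: CharsetSketch and roundTrip are hypothetical names, and the sample text is two arbitrary Chinese characters written as Unicode escapes. It encodes text as UTF-8 bytes, as a server might, and then decodes those bytes with a given charset, as a browser would.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of a charset mismatch between writer and reader. */
final class CharsetSketch {
    /** Encodes text as UTF-8, then decodes the bytes with the reader's charset. */
    static String roundTrip(String text, Charset readerCharset) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, readerCharset);
    }

    public static void main(String[] args) {
        String query = "\u67e5\u8be2"; // two Chinese characters
        // Reader agrees on UTF-8: text survives.
        System.out.println(query.equals(roundTrip(query, StandardCharsets.UTF_8)));
        // Reader assumes ISO-8859-1: every UTF-8 byte becomes its own character.
        System.out.println(roundTrip(query, StandardCharsets.ISO_8859_1).length());
    }
}
```

In a servlet-based page, the reader's charset is normally announced through the response content type (for example via ServletResponse.setContentType with a charset parameter) or an HTML meta tag; which of these the patch touches is not shown in this message.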





[jira] [Commented] (HIVE-16456) Kill spark job when InterruptedException happens or driverContext.isShutdown is true.

2017-04-28 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989448#comment-15989448
 ] 

zhihai xu commented on HIVE-16456:
--

Thanks [~xuefuz]! I created a Review Request for my patch at the following RB 
link:
https://reviews.apache.org/r/58856/

> Kill spark job when InterruptedException happens or driverContext.isShutdown 
> is true.
> -
>
> Key: HIVE-16456
> URL: https://issues.apache.org/jira/browse/HIVE-16456
> Project: Hive
> Issue Type: Improvement
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Minor
> Attachments: HIVE-16456.000.patch
>
>
> Kill the Spark job when an InterruptedException happens or 
> driverContext.isShutdown is true. If the InterruptedException happens in 
> RemoteSparkJobMonitor or LocalSparkJobMonitor, it is better to kill the job. 
> There is also a race condition between submitting the Spark job and 
> query/operation cancellation, so it is better to check 
> driverContext.isShutdown right after submitting the Spark job. This 
> guarantees that the job is killed no matter when shutdown is called. It is 
> similar to HIVE-15997.
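
The "check right after submit" pattern described above can be sketched as follows. None of these names come from Hive: SubmitThenCheckSketch, JobHandle, submit, and shutdown are hypothetical, and job submission is reduced to creating an object. The sketch only shows why re-checking the shutdown flag after publishing the job handle closes the window in which a cancellation could miss a freshly submitted job.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of re-checking a shutdown flag right after submitting a job. */
final class SubmitThenCheckSketch {
    /** Made-up handle for a submitted job; kill() is idempotent. */
    static final class JobHandle {
        private final AtomicBoolean killed = new AtomicBoolean(false);
        void kill() { killed.set(true); }
        boolean isKilled() { return killed.get(); }
    }

    private final AtomicBoolean shutdown = new AtomicBoolean(false);
    private volatile JobHandle current;

    /** Called by the cancelling thread: set the flag first, then kill any visible job. */
    void shutdown() {
        shutdown.set(true);
        JobHandle job = current;
        if (job != null) {
            job.kill();
        }
    }

    /** Called by the submitting thread: publish the job first, then re-check the flag. */
    JobHandle submit() {
        JobHandle job = new JobHandle(); // stands in for the real submission
        current = job;
        if (shutdown.get()) {            // shutdown may have run before the job was visible
            job.kill();
        }
        return job;
    }

    public static void main(String[] args) {
        SubmitThenCheckSketch driver = new SubmitThenCheckSketch();
        driver.shutdown();                                // cancellation arrives first
        System.out.println(driver.submit().isKilled());   // true
    }
}
```

Because the canceller writes the flag before reading the handle and the submitter writes the handle before reading the flag, at least one of the two sees the other's write, so the job is killed whichever side runs first.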





[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain

2017-04-28 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989416#comment-15989416
 ] 

Chao Sun commented on HIVE-16552:
-

[~xuefuz] Could you open a RB for this? Thanks.

> Limit the number of tasks a Spark job may contain
> -
>
> Key: HIVE-16552
> URL: https://issues.apache.org/jira/browse/HIVE-16552
> Project: Hive
> Issue Type: Improvement
> Components: Spark
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-16552.patch
>
>
> It's commonly desirable to block bad and big queries that take a lot of YARN 
> resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to 
> stop a query that invokes a Spark job that contains too many tasks. The 
> proposal here is to introduce hive.spark.job.max.tasks with a default value 
> of -1 (no limit), which an admin can set to block queries that trigger too 
> many Spark tasks.
> Please note that this control knob applies to a single Spark job, though it's 
> possible that one query can trigger multiple Spark jobs (such as in the case of 
> a map-join). Nevertheless, the proposed approach is still helpful.
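
The proposed check might look roughly like the following. This is a sketch under assumptions, not the attached patch: MaxTasksSketch and exceedsLimit are hypothetical names, the per-stage task counts are passed in as a plain array, and any negative limit (including the proposed default of -1) is treated as "no limit".

```java
/** Hypothetical sketch of a per-job task limit in the spirit of hive.spark.job.max.tasks. */
final class MaxTasksSketch {
    /**
     * Returns true when the job should be stopped.
     * A negative maxTasks (the proposed default is -1) means no limit.
     */
    static boolean exceedsLimit(int[] tasksPerStage, int maxTasks) {
        if (maxTasks < 0) {
            return false;
        }
        long total = 0; // long avoids overflow when summing many large stages
        for (int tasks : tasksPerStage) {
            total += tasks;
        }
        return total > maxTasks;
    }

    public static void main(String[] args) {
        int[] stages = {400, 700};
        System.out.println(exceedsLimit(stages, 1000)); // true: 1100 tasks > 1000
        System.out.println(exceedsLimit(stages, -1));   // false: no limit configured
    }
}
```

As the description notes, such a check is applied per Spark job, so a query that fans out into several jobs is only bounded job by job.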





[jira] [Commented] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989404#comment-15989404
 ] 

Hive QA commented on HIVE-15642:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865589/HIVE-15642.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=217)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4928/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4928/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4928/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865589 - PreCommit-HIVE-Build

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Reporter: Vaibhav Gumashta
> Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-28 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989358#comment-15989358
 ] 

Josh Elser commented on HIVE-15795:
---

+1 on the addendum from me.

[~sershe], would you be able to commit this addendum after the 24hr period, 
please?



[jira] [Commented] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989319#comment-15989319
 ] 

Hive QA commented on HIVE-16524:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864874/HIVE-16524.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct]
 (batchId=109)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4927/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4927/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4927/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864874 - PreCommit-HIVE-Build

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> 
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16524.1.patch
>
>
> The W3C defines the id attribute as follows:
> 1. The id attribute specifies the unique id of the HTML element.
> 2. The id must be unique in the HTML document.
> 3. The id attribute can be used as a link anchor, by JavaScript (HTML DOM) or 
> by CSS to change or add a style to an element with the specified id.
> But the "id='attributes_table'" in hiveserver2.jsp and 
> QueryProfileTmpl.jamon:
> 1. Is not referenced by any CSS or JS
> 2. Appears more than once on the same page
> So I suggest removing this id attribute definition. Please check it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-28 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989305#comment-15989305
 ] 

Wei Zheng commented on HIVE-16534:
--

We can drop this:
{code}
while (txnId <= maxTxnId) {
  firstAbortedTxnIndex = Arrays.binarySearch(exceptions, txnId);
  if (firstAbortedTxnIndex >= 0) {
    break;
  }
  txnId++;
}
{code}
The main purpose of the above code is to locate the index of the first aborted 
txn in the range, so that we can skip some unnecessary iterations when scanning 
the BitSet. But in your example, which is very likely to be a common situation, 
the cost is not acceptable.

Considering the BitSet is not big (compared to the gap between 5 and 100), 
we can just start from index 0 and scan through the BitSet. I think this should 
be ok.

> Add capability to tell aborted transactions apart from open transactions in 
> ValidTxnList
> 
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are 
> stored together in one array. That makes it impossible to extract just 
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted 
> transactions but no open transactions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989294#comment-15989294
 ] 

Eugene Koifman commented on HIVE-16534:
---

Another thought: I think the implementation of isTxnRangeAborted() is problematic.
Suppose we do an insert in Table1/part1 with txnid=5.  Then there is no 
activity on this table for a month.
Then there is another insert into Table1/part1 with txnid=100.
After compaction we get a delta_5_100.

So now this method is going to do 1M binary searches.

If isAborted(minTxnId) && isAborted(maxTxnId) && (the number of on bits in the 
BitSet between the indexes of minTxnId and maxTxnId is max - min + 1), then all 
txns in the range in question are aborted. This gives ALL.

I'm not sure how to do NONE/SOME efficiently
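
One possible way to get NONE/SOME/ALL without a search per txn id, offered as a minimal sketch and not as the patch's actual code: binary-search only the two ends of the range in the sorted exceptions array, then count the set bits between those two indexes. The class name, the lowerBound helper and the layout (bit i set means exceptions[i] is aborted) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

final class AbortedRangeSketch {
  enum RangeResponse { NONE, SOME, ALL }

  /**
   * exceptions: sorted, distinct txn ids that are either open or aborted.
   * abortedBits: bit i is set when exceptions[i] is aborted (assumed layout).
   * Assumes minTxnId <= maxTxnId and maxTxnId < Long.MAX_VALUE.
   */
  static RangeResponse isTxnRangeAborted(long[] exceptions, BitSet abortedBits,
                                         long minTxnId, long maxTxnId) {
    int from = lowerBound(exceptions, minTxnId);     // first index with id >= minTxnId
    int to = lowerBound(exceptions, maxTxnId + 1);   // first index with id > maxTxnId
    int aborted = abortedBits.get(from, to).cardinality();
    long rangeSize = maxTxnId - minTxnId + 1;
    if (aborted == 0) {
      return RangeResponse.NONE;
    }
    // Every id in [minTxnId, maxTxnId] is listed and flagged as aborted.
    if (aborted == rangeSize) {
      return RangeResponse.ALL;
    }
    return RangeResponse.SOME;
  }

  /** Index of the first element >= key in a sorted array of distinct values. */
  static int lowerBound(long[] sorted, long key) {
    int idx = Arrays.binarySearch(sorted, key);
    return idx >= 0 ? idx : -idx - 1;
  }
}
```

With this shape the cost is two binary searches plus one bit count over the slice, regardless of how wide the gap between minTxnId and maxTxnId is.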

> Add capability to tell aborted transactions apart from open transactions in 
> ValidTxnList
> 
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are 
> stored together in one array. That makes it impossible to extract just 
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted 
> transactions but no open transactions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16560) Avoid hive UDF jars to be dependent on HIVEServer2 auxilliary path deployment

2017-04-28 Thread Krish Dey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krish Dey updated HIVE-16560:
-
Affects Version/s: 1.1.1
   1.2.2
   2.1.0

> Avoid hive UDF jars to be dependent on HIVEServer2 auxilliary path deployment
> -
>
> Key: HIVE-16560
> URL: https://issues.apache.org/jira/browse/HIVE-16560
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 1.1.1, 1.2.2, 2.1.0
>Reporter: Krish Dey
>Priority: Minor
>
> Hive UDFs need to be deployed in the HiveServer2 auxiliary path. Even with the 
> reloadable jars feature, if the same class has already been loaded, it won't be 
> loaded again.
> One improvement could be to remove the dependency on deploying the jars in 
> HiveServer2 and let them load from the HDFS path itself.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16553:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> -
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16553.1.patch
>
>
> The current value is 1M rows. I would like to bump this up to make sure we are 
> not creating semijoin optimizations on dimension tables, since having too many 
> semijoin optimizations can cause serialized execution of tasks if lots of 
> tasks are waiting for semijoin optimizations to be computed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989271#comment-15989271
 ] 

Eugene Koifman commented on HIVE-16534:
---


bq. I do serialize the BitSet into a byte array before sending it over Thrift 
interface. After receiving it I convert it back to BitSet since the bit 
manipulation is convenient.

I meant in writeToString() - seems like that would make reading from string 
much simpler/more efficient

You are right about the other points

> Add capability to tell aborted transactions apart from open transactions in 
> ValidTxnList
> 
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are 
> stored together in one array. That makes it impossible to extract just 
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted 
> transactions but no open transactions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-28 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989269#comment-15989269
 ] 

Chaoyu Tang commented on HIVE-16147:


[~pxiong] Thanks for looking into this. Yeah, I made some changes to fix the 
test failures and also optimized the code a little. I have uploaded the 2nd 
patch to RB for review.

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partition shows that the partition column 
> stats are still accurate, but actually they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> ...
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exist
> {code}
> # col_name  data_type  min  max     num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
>
> salary      int        1    151370  0          94                                                               from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS 
> for all columns are still true.
> {code}
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt_rename
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> {code}
> 6. describe formatted default.sample_pt_rename partition (dummy = 3) salary: 
> the column stats have been dropped.
> {code}
> # col_name  data_type  comment
>
> salary      int        from deserializer
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989265#comment-15989265
 ] 

Hive QA commented on HIVE-16488:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865530/HIVE-16488.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4926/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4926/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4926/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865530 - PreCommit-HIVE-Build

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16143) Improve msck repair batching

2017-04-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16143:
---
Attachment: HIVE-16143.03.patch

> Improve msck repair batching
> 
>
> Key: HIVE-16143
> URL: https://issues.apache.org/jira/browse/HIVE-16143
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, 
> HIVE-16143.03.patch
>
>
> Currently, the {{msck repair table}} command batches the number of partitions 
> created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. 
> The following snippet shows the batching logic. There can be a couple of 
> improvements to this batching logic:
> {noformat} 
>   int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
>   if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>     int counter = 0;
>     for (CheckResult.PartitionResult part : partsNotInMs) {
>       counter++;
>       apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>       repairOutput.add("Repair: Added partition to metastore "
>           + msckDesc.getTableName() + ':' + part.getPartitionName());
>       if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>         db.createPartitions(apd);
>         apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>       }
>     }
>   } else {
>     for (CheckResult.PartitionResult part : partsNotInMs) {
>       apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>       repairOutput.add("Repair: Added partition to metastore "
>           + msckDesc.getTableName() + ':' + part.getPartitionName());
>     }
>     db.createPartitions(apd);
>   }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding 
> partitions one by one, which is almost always very slow. Users can easily 
> increase the batch size to a higher value to make the command run 
> faster but end up with worse performance because the code falls back to adding 
> one by one. Users are then expected to determine the tuned value of batch 
> size which works well for their environment. I think the code could handle 
> this situation better by exponentially decaying the batch size instead of 
> falling back to one by one.
> 2. The other issue with this implementation is that if, let's say, the first 
> batch succeeds and the second one fails, the code tries to add all the partitions 
> one by one irrespective of whether some of them were successfully added or 
> not. If we need to fall back to one by one, we should at least remove the ones 
> which we know for sure were already added successfully.
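
Both points in the description can be sketched together. This is a minimal illustration under stated assumptions, not the attached patch: on failure the batch size is halved, and only the partitions not yet added are retried. The class name, the addInBatches method and the Consumer standing in for db.createPartitions are hypothetical, and a non-positive batch size is simply treated as 1 here.

```java
import java.util.List;
import java.util.function.Consumer;

final class DecayingBatchSketch {
  /**
   * Adds items in batches. When a batch fails, the batch size is halved and
   * the same position is retried; batches that already succeeded are never
   * retried. Returns the number of items added.
   */
  static <T> int addInBatches(List<T> items, int initialBatchSize,
                              Consumer<List<T>> addBatch) {
    int batchSize = Math.max(1, initialBatchSize);
    int done = 0; // items[0..done) are known to be added
    while (done < items.size()) {
      int end = Math.min(done + batchSize, items.size());
      try {
        addBatch.accept(items.subList(done, end));
        done = end;
      } catch (RuntimeException e) {
        if (batchSize == 1) {
          throw e; // even a single item fails, so give up
        }
        batchSize = batchSize / 2; // exponential decay instead of one-by-one
      }
    }
    return done;
  }
}
```

For example, with 10 items, an initial batch size of 8 and a backend that rejects batches larger than 3, the sketch tries 8, then 4, then proceeds in batches of 2 without repeating any successful batch.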



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2017-04-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16484:

Attachment: HIVE-16484.7.patch

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, 
> HIVE-16484.3.patch, HIVE-16484.4.patch, HIVE-16484.5.patch, 
> HIVE-16484.6.patch, HIVE-16484.7.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Attachment: HIVE-16527.03.patch

patch .03 added values file and non-explain selects to .q

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch, 
> HIVE-16527.03.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExprProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16520) Cache hive metadata in metastore

2017-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989246#comment-15989246
 ] 

ASF GitHub Bot commented on HIVE-16520:
---

GitHub user daijyc opened a pull request:

https://github.com/apache/hive/pull/173

HIVE-16520: Cache hive metadata in metastore



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/daijyc/hive master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #173


commit 24fed179e00b1f323e218b2eba2c07ab5124a9e3
Author: Daniel Dai 
Date:   2017-04-28T00:08:11Z

HIVE-16520: Cache hive metadata in metastore




> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, 
> HIVE-16520-proto.patch
>
>
> During the Hive 2 benchmark, we found that Hive metastore operations take a lot 
> of time and thus slow down Hive compilation. In some extreme cases, they take 
> much longer than the actual query run time. In particular, we found that the 
> latency of cloud db is very high and 90% of total query runtime is spent 
> waiting for metastore SQL database operations. Based on this observation, 
> metastore operation performance will be greatly enhanced if we have a memory 
> structure which caches the database query results.
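
The general idea of an in-memory structure in front of the database can be sketched as a read-through cache. This is only an illustration of the pattern, not the design in the attached patches; the class name, the loader function and the invalidate method are assumptions, and a real metastore cache would also need a strategy for keeping entries consistent with the backing database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ReadThroughCacheSketch<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> loader; // stands in for the slow database query

  ReadThroughCacheSketch(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, running the loader only on a miss. */
  V get(K key) {
    return cache.computeIfAbsent(key, loader);
  }

  /** Drops an entry, e.g. after the underlying object was altered. */
  void invalidate(K key) {
    cache.remove(key);
  }
}
```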



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Status: Patch Available  (was: Open)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Attachment: HIVE-16488.02.patch

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore

2017-04-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16520:
--
Attachment: HIVE-16520-1.patch

> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, 
> HIVE-16520-proto.patch
>
>
> During the Hive 2 benchmark, we found that Hive metastore operations take a lot 
> of time and thus slow down Hive compilation. In some extreme cases, they take 
> much longer than the actual query run time. In particular, we found that the 
> latency of cloud db is very high and 90% of total query runtime is spent 
> waiting for metastore SQL database operations. Based on this observation, 
> metastore operation performance will be greatly enhanced if we have a memory 
> structure which caches the database query results.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-28 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989239#comment-15989239
 ] 

Wei Zheng commented on HIVE-16534:
--

The sorting of exceptions in ValidReadTxnList is troublesome for the 
accompanying BitSet, as we have to sort the BitSet in the same manner. So I 
removed the sorting logic in the ctor and added "order by txn_id" to 
TxnHandler.getOpenTxns so we don't need to worry about sorting later on.

It's true that we always have 3 ':'. But if some fields are missing, e.g. 
"1:2::", then String.split() will only return an array of size 2.
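
The String.split() behaviour mentioned above can be shown in a few lines. The class and method names are made up for illustration; the point is that the one-argument split drops trailing empty strings, while a negative limit keeps them.

```java
final class SplitSketch {
  /** One-argument split: trailing empty fields are removed. */
  static String[] splitDropTrailing(String s) {
    return s.split(":");
  }

  /** A negative limit keeps trailing empty fields. */
  static String[] splitKeepTrailing(String s) {
    return s.split(":", -1);
  }
}
```

For "1:2::", the first form yields two elements ("1", "2") and the second yields four ("1", "2", "", ""), so a parser that expects a fixed number of fields could use the second form.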

I do serialize the BitSet into a byte array before sending it over Thrift 
interface. After receiving it I convert it back to BitSet since the bit 
manipulation is convenient.

I need to binary search in isTxnAborted() to get the index for the txnid, then 
look up in the bitset using that index.

bitSet.set(0, bitSet.length()) does turn all the bits on, right?
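
One caveat on that last question, as a small sketch with hypothetical names: java.util.BitSet.length() returns the index of the highest set bit plus one, so on a BitSet with no bits set it returns 0 and set(0, length()) changes nothing. Passing the intended bit count explicitly avoids this.

```java
import java.util.BitSet;

final class BitSetLengthSketch {
  /** Uses length() as the upper bound: a no-op when no bit is set yet. */
  static BitSet allOnViaLength(int n) {
    BitSet bits = new BitSet(n);
    bits.set(0, bits.length()); // length() is 0 here, so nothing is set
    return bits;
  }

  /** Uses the intended number of bits as the upper bound. */
  static BitSet allOnViaCount(int n) {
    BitSet bits = new BitSet(n);
    bits.set(0, n); // sets bits 0..n-1
    return bits;
  }
}
```

If the BitSet already has its highest intended bit set, both forms behave the same; they differ only when the top bits (or all bits) are still off.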

> Add capability to tell aborted transactions apart from open transactions in 
> ValidTxnList
> 
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are 
> stored together in one array. That makes it impossible to extract just 
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted 
> transactions but no open transactions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Status: Open  (was: Patch Available)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore

2017-04-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16520:
--
Attachment: (was: HIVE-16520-1.patch)

> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16520-proto-2.patch, HIVE-16520-proto.patch
>
>
> During the Hive 2 benchmark, we found that Hive metastore operations take a lot 
> of time and thus slow down Hive compilation. In some extreme cases, they take 
> much longer than the actual query run time. In particular, we found that the 
> latency of cloud db is very high and 90% of total query runtime is spent 
> waiting for metastore SQL database operations. Based on this observation, 
> metastore operation performance will be greatly enhanced if we have a memory 
> structure which caches the database query results.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Attachment: (was: HIVE-16488.02.patch)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.





[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15642:

Status: Open  (was: Patch Available)

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15642:

Status: Patch Available  (was: Open)

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15642:

Attachment: (was: HIVE-15642.02.patch)

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15642:

Attachment: HIVE-15642.02.patch

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Updated] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good

2017-04-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16523:

   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to some branches. Thanks for the update/review!

> VectorHashKeyWrapper hash code for strings is not so good
> -
>
> Key: HIVE-16523
> URL: https://issues.apache.org/jira/browse/HIVE-16523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16523.01.patch, HIVE-16523.02.patch, 
> HIVE-16523.patch
>
>
> Perf issues in vectorized gby on some string keys





[jira] [Updated] (HIVE-16143) Improve msck repair batching

2017-04-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16143:
---
Attachment: HIVE-16143.02.patch

Fixed the msck q.out files

> Improve msck repair batching
> 
>
> Key: HIVE-16143
> URL: https://issues.apache.org/jira/browse/HIVE-16143
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch
>
>
> Currently, the {{msck repair table}} command batches the number of partitions 
> created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. 
> The following snippet shows the batching logic. There are a couple of 
> possible improvements to this batching logic:
> {noformat} 
> try {
>   int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
>   if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>     int counter = 0;
>     for (CheckResult.PartitionResult part : partsNotInMs) {
>       counter++;
>       apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>       repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>           + ':' + part.getPartitionName());
>       // Flush once per full batch, and once more for the final partial batch.
>       if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>         db.createPartitions(apd);
>         apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>       }
>     }
>   } else {
>     for (CheckResult.PartitionResult part : partsNotInMs) {
>       apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>       repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>           + ':' + part.getPartitionName());
>     }
>     db.createPartitions(apd);
>   }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding 
> partitions one by one, which is almost always very slow. Users can easily 
> increase the batch size to make the command run faster but end up with worse 
> performance because the code falls back to adding one by one. Users are then 
> expected to determine a tuned batch size that works well for their 
> environment. I think the code could handle this situation better by 
> exponentially decaying the batch size instead of falling back to one by one.
> 2. The other issue with this implementation is that if, say, the first batch 
> succeeds and the second one fails, the code tries to add all the partitions 
> one by one irrespective of whether some of them were successfully added or 
> not. If we need to fall back to one by one, we should at least remove the ones 
> which we know for sure were already added successfully.
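
The two suggestions in the description can be sketched together as below. This 
is a rough illustration, not the code from the attached patches: the class 
BatchDecaySketch, the method addInBatches() and the createBatch callback 
(standing in for db.createPartitions) are hypothetical names, and halving is 
just one possible decay factor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: halve the batch size on failure and never retry
// items that an earlier batch already added successfully.
class BatchDecaySketch {

  // Returns the number of createBatch calls that succeeded.
  static <T> int addInBatches(List<T> items, int initialBatchSize,
      Consumer<List<T>> createBatch) {
    int batchSize = Math.max(1, initialBatchSize);
    int next = 0; // index of the first item not yet added
    int successfulCalls = 0;
    while (next < items.size()) {
      int end = Math.min(next + batchSize, items.size());
      try {
        createBatch.accept(new ArrayList<>(items.subList(next, end)));
        next = end; // everything before 'end' is now known to be added
        successfulCalls++;
      } catch (RuntimeException e) {
        if (batchSize == 1) {
          throw e; // even a single item fails; nothing smaller to try
        }
        batchSize = batchSize / 2; // decay, then retry from the same position
      }
    }
    return successfulCalls;
  }
}
```

Because 'next' only advances after a successful call, a failure in a later 
batch does not cause earlier partitions to be submitted again. The sketch 
assumes a failed batch adds nothing; if a batch can partially succeed, the 
caller would need extra bookkeeping.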





[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore

2017-04-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16520:
--
Attachment: HIVE-16520-1.patch

> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16520-1.patch, HIVE-16520-proto-2.patch, 
> HIVE-16520-proto.patch
>
>
> During Hive 2 benchmarking, we found that Hive metastore operations take a 
> lot of time and thus slow down Hive compilation. In some extreme cases, they 
> take much longer than the actual query run time. In particular, we found that 
> the latency of a cloud db is very high and 90% of total query runtime is spent 
> waiting for metastore SQL database operations. Based on this observation, 
> metastore operation performance would be greatly enhanced if we had an 
> in-memory structure that caches the database query results.





[jira] [Updated] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16346:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.
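
As a rough sketch of what "conditional on the target filesystem" could mean, 
the helper below decides from the target URI scheme. Everything here is an 
assumption made for illustration: the class InheritPermsSketch, the method 
shouldInheritPerms() and the hard-coded scheme list are hypothetical and are 
not taken from the HIVE-16346 patches.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the scheme list is an assumption for illustration.
class InheritPermsSketch {
  private static final Set<String> BLOBSTORE_SCHEMES =
      new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

  // inheritPermsEnabled mirrors hive.warehouse.subdir.inherit.perms.
  static boolean shouldInheritPerms(boolean inheritPermsEnabled, URI target) {
    if (!inheritPermsEnabled) {
      return false; // the global switch still wins
    }
    String scheme = target.getScheme();
    if (scheme == null) {
      return true; // assumption: no scheme means the default filesystem
    }
    // Skip permission calls on stores that have no notion of file permissions.
    return !BLOBSTORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
  }
}
```

With this shape the global setting can stay true for an HS2 instance while 
tables on a blobstore skip the permission calls.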





[jira] [Updated] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16346:

Status: Patch Available  (was: Reopened)

Some files were not renamed properly when the patch was applied. 
Resubmitted the patch.

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.





[jira] [Comment Edited] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989219#comment-15989219
 ] 

Aihua Xu edited comment on HIVE-16346 at 4/28/17 5:46 PM:
--

Some files were not renamed properly when the patch was applied. Patch 
recommitted.


was (Author: aihuaxu):
There are some files not renamed properly during applying the patch. 
Resubmitted the patch.

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.





[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-28 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989211#comment-15989211
 ] 

Pengcheng Xiong commented on HIVE-16147:


[~ctang.ma], may I ask what you changed from the 1st patch? Thanks.

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partition shows that the partition column 
> stats are still accurate, but they have actually all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> ...
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exist
> {code}
> # col_name  data_type  min  max     num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
>
> salary      int        1    151370  0          94                                                               from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> the renamed table's partition (dummy = 3) still shows COLUMN_STATS as true 
> for all columns.
> {code}
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt_rename
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> {code}
> 6. describe formatted default.sample_pt_rename partition (dummy = 3) salary: 
> the column stats have been dropped.
> {code}
> # col_name  data_type  comment
>
> salary      int        from deserializer
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}





[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain

2017-04-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989201#comment-15989201
 ] 

Xuefu Zhang commented on HIVE-16552:


hi [~csun] and [~lirui], could you please review the changes? Thanks.

> Limit the number of tasks a Spark job may contain
> -
>
> Key: HIVE-16552
> URL: https://issues.apache.org/jira/browse/HIVE-16552
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16552.patch
>
>
> It's commonly desirable to block bad and big queries that take a lot of YARN 
> resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to 
> stop a query that invokes a Spark job that contains too many tasks. The 
> proposal here is to introduce hive.spark.job.max.tasks with a default value 
> of -1 (no limit), which an admin can set to block queries that trigger too 
> many spark tasks.
> Please note that this control knob applies to a spark job, though it's 
> possible that one query can trigger multiple Spark jobs (such as in case of 
> map-join). Nevertheless, the proposed approach is still helpful.
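
The proposed knob boils down to a simple per-job check, sketched below. This is 
only an illustration: SparkTaskLimitSketch, exceedsLimit() and checkJob() are 
hypothetical names, and treating every negative value (not just -1) as "no 
limit" is an assumption, not something the description states.

```java
// Hypothetical helper; the actual patch may perform this check differently.
class SparkTaskLimitSketch {

  // maxTasks mirrors hive.spark.job.max.tasks; negative means "no limit".
  static boolean exceedsLimit(int totalTasks, int maxTasks) {
    return maxTasks >= 0 && totalTasks > maxTasks;
  }

  // Rejects a single Spark job that would run more tasks than allowed.
  static void checkJob(int totalTasks, int maxTasks) {
    if (exceedsLimit(totalTasks, maxTasks)) {
      throw new IllegalStateException("Spark job has " + totalTasks
          + " tasks, exceeding the limit of " + maxTasks);
    }
  }
}
```

Since the check is applied per Spark job, a query that triggers several jobs 
can still run more tasks in total than the limit, as the description notes.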





[jira] [Updated] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception

2017-04-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16213:
---
Attachment: HIVE-16213.08.patch

> ObjectStore can leak Queries when rollbackTransaction throws an exception
> -
>
> Key: HIVE-16213
> URL: https://issues.apache.org/jira/browse/HIVE-16213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, 
> HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch, 
> HIVE-16213.06.patch, HIVE-16213.07.patch, HIVE-16213.08.patch
>
>
> In ObjectStore.java there are a few places with the code similar to:
> {code}
> Query query = null;
> try {
>   openTransaction();
>   query = pm.newQuery(Something.class);
>   ...
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
>     rollbackTransaction();
>   }
>   if (query != null) {
>     query.closeAll();
>   }
> }
> {code}
> The problem is that rollbackTransaction() may throw an exception in which 
> case query.closeAll() wouldn't be executed. 
> The fix would be to wrap rollbackTransaction in its own try-catch block.
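
The suggested fix can be sketched in isolation as follows. This is a 
self-contained illustration rather than the code in the attached patches: 
FakeQuery is a hypothetical stand-in for the JDO Query (only closeAll() is 
modelled), and cleanup() is a made-up helper representing the finally block.

```java
// Hypothetical sketch of a finally block that closes the query even when
// the rollback itself throws.
class RollbackSafeCloseSketch {

  // Minimal stand-in for the JDO Query; only closeAll() is modelled.
  static class FakeQuery {
    boolean closed = false;

    void closeAll() {
      closed = true;
    }
  }

  // Rolls back if not committed, then always closes the query.
  // Returns the exception thrown by the rollback, or null if none.
  static RuntimeException cleanup(boolean committed, Runnable rollbackTransaction,
      FakeQuery query) {
    RuntimeException rollbackFailure = null;
    try {
      if (!committed) {
        rollbackTransaction.run();
      }
    } catch (RuntimeException e) {
      rollbackFailure = e; // real code would presumably log or rethrow later
    } finally {
      if (query != null) {
        query.closeAll(); // runs whether or not the rollback failed
      }
    }
    return rollbackFailure;
  }
}
```

Whether the rollback exception should be swallowed, logged or rethrown after 
the query is closed is a separate design decision that this sketch leaves open.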





[jira] [Commented] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception

2017-04-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989199#comment-15989199
 ] 

Vihang Karajgaonkar commented on HIVE-16213:


org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)

are known flaky tests.

I think the following failures are related to some PigServer setup issues 
during the run. I ran them locally and they were successful. Submitted them 
again to confirm.

org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteDate (batchId=178)
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteVarchar (batchId=178)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testStoreFuncAllSimpleTypes (batchId=178)


> ObjectStore can leak Queries when rollbackTransaction throws an exception
> -
>
> Key: HIVE-16213
> URL: https://issues.apache.org/jira/browse/HIVE-16213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, 
> HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch, 
> HIVE-16213.06.patch, HIVE-16213.07.patch
>
>
> In ObjectStore.java there are a few places with the code similar to:
> {code}
> Query query = null;
> try {
>   openTransaction();
>   query = pm.newQuery(Something.class);
>   ...
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
>     rollbackTransaction();
>   }
>   if (query != null) {
>     query.closeAll();
>   }
> }
> {code}
> The problem is that rollbackTransaction() may throw an exception in which 
> case query.closeAll() wouldn't be executed. 
> The fix would be to wrap rollbackTransaction in its own try-catch block.





[jira] [Commented] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon

2017-04-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989195#comment-15989195
 ] 

Xuefu Zhang commented on HIVE-16524:


+1

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> 
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16524.1.patch
>
>
> The id attribute is defined by the W3C as follows:
> 1. The id attribute specifies the unique id of the HTML element.
> 2. The id must be unique in the HTML document.
> 3. The id attribute can be used as a link anchor, by JavaScript (HTML DOM) or 
> by CSS to change or add a style to an element with the specified id.
> However, the "id='attributes_table'" in hiveserver2.jsp and 
> QueryProfileTmpl.jamon:
> 1. Is not referenced by any CSS or JS
> 2. Appears more than once on the same page
> So I suggest removing this id attribute definition. Please check it.





[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989180#comment-15989180
 ] 

Hive QA commented on HIVE-16484:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865465/HIVE-16484.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles (batchId=280)
org.apache.hive.spark.client.TestSparkClient.testCounters (batchId=280)
org.apache.hive.spark.client.TestSparkClient.testErrorJob (batchId=280)
org.apache.hive.spark.client.TestSparkClient.testJobSubmission (batchId=280)
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection (batchId=280)
org.apache.hive.spark.client.TestSparkClient.testRemoteClient (batchId=280)
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob (batchId=280)
org.apache.hive.spark.client.TestSparkClient.testSyncRpc (batchId=280)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4925/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4925/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4925/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865465 - PreCommit-HIVE-Build

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, 
> HIVE-16484.3.patch, HIVE-16484.4.patch, HIVE-16484.5.patch, HIVE-16484.6.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners





[jira] [Comment Edited] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989172#comment-15989172
 ] 

Eugene Koifman edited comment on HIVE-16534 at 4/28/17 5:16 PM:


you refactored ValidReadTxnList() c'tor and removed the sorting of exceptions - 
why?
writeToString() always creates 3 ':' - why does the deserializer need cases 
like _if (values.length < 3) {_

wouldn't be simpler to just serialize the BitSet as "0010110" - it's very 
compact and the deserializer wouldn't have to sort and do multiple binary 
searches 

why does _isTxnAborted()_ need a binary search?  why not just look up the in 
the bitset?

_bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in 
exceptio_ - shouldn't this turn all the bits ON?
Nit: seems like ValidCompactorTxnList() c'tor could do this  since it's always 
the case for compactor




was (Author: ekoifman):
you refactored ValidReadTxnList() c'tor and removed the sorting of exceptions - 
why?
writeToString() always creates 3 ':' - why does the deserializer need cases 
like _if (values.length < 3) {_

wouldn't be simpler to just serialize the BitSet as "0010110" - it's very 
compact and the deserializer wouldn't have to sort and do multiple binary 
searches 

why does _ isTxnAborted()_ need a binary search?  why not just look up the in 
the bitset?

_bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in 
exceptio_ - shouldn't this turn all the bits ON?
Nit: seems like ValidCompactorTxnList() c'tor could do this  since it's always 
the case for compactor



> Add capability to tell aborted transactions apart from open transactions in 
> ValidTxnList
> 
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are 
> stored together in one array. That makes it impossible to extract just 
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted 
> transactions but no open transactions.





[jira] [Comment Edited] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989172#comment-15989172
 ] 

Eugene Koifman edited comment on HIVE-16534 at 4/28/17 5:15 PM:


you refactored ValidReadTxnList() c'tor and removed the sorting of exceptions - 
why?
writeToString() always creates 3 ':' - why does the deserializer need cases 
like _if (values.length < 3) {_

wouldn't it be simpler to just serialize the BitSet as "0010110" - it's very 
compact and the deserializer wouldn't have to sort and do multiple binary 
searches 

why does _ isTxnAborted()_ need a binary search?  why not just look it up in 
the bitset?

_bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in 
exceptio_ - shouldn't this turn all the bits ON?
Nit: seems like ValidCompactorTxnList() c'tor could do this  since it's always 
the case for compactor




was (Author: ekoifman):
you refactored ValidReadTxnList() c'tor and removed the sorting of exceptions - 
why?
writeToString() always creates 3 ':' - why does the deserializer need cases 
like_if (values.length < 3) {_

wouldn't be simpler to just serialize the BitSet as "0010110" - it's very 
compact and the deserializer wouldn't have to sort and do multiple binary 
searches 

why does _ isTxnAborted()_ need a binary search?  why not just look up the in 
the bitset?

_bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in 
exceptio_ - shouldn't this turn all the bits ON?
Nit: seems like ValidCompactorTxnList() c'tor could do this  since it's always 
the case for compactor



> Add capability to tell aborted transactions apart from open transactions in 
> ValidTxnList
> 
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are 
> stored together in one array. That makes it impossible to extract just 
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted 
> transactions but no open transactions.





[jira] [Commented] (HIVE-16534) Add capability to tell aborted transactions apart from open transactions in ValidTxnList

2017-04-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989172#comment-15989172
 ] 

Eugene Koifman commented on HIVE-16534:
---

You refactored the ValidReadTxnList() c'tor and removed the sorting of exceptions - 
why?
writeToString() always creates 3 ':' - why does the deserializer need cases 
like _if (values.length < 3) {_

Wouldn't it be simpler to just serialize the BitSet as "0010110"? It's very 
compact and the deserializer wouldn't have to sort and do multiple binary 
searches.

Why does _isTxnAborted()_ need a binary search? Why not just look it up in 
the bitset?

_bitSet.set(0, bitSet.length()); // for ValidCompactorTxnList, everything in 
exceptio_ - shouldn't this turn all the bits ON?
Nit: seems like the ValidCompactorTxnList() c'tor could do this, since it's always 
the case for the compactor
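
A minimal Java sketch of the "0010110" idea raised in the comment above. The
names AbortedBitsCodec, encode, decode and allOn are hypothetical and are not
taken from the patch; the sketch only assumes that bit i describes the i-th
entry of the exceptions array. It also shows why the quoted
bitSet.set(0, bitSet.length()) call may be a no-op: java.util.BitSet.length()
returns the index of the highest set bit plus one, so it is 0 for a freshly
created BitSet.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Hive: round-trips a BitSet through a
// compact string of '0'/'1' characters, one character per exception entry.
final class AbortedBitsCodec {
    private AbortedBitsCodec() {}

    // Encodes the first n bits; n must be passed explicitly because
    // BitSet.length() ignores trailing zero bits.
    static String encode(BitSet bits, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    // Decodes a string produced by encode(); no sorting or searching needed.
    static BitSet decode(String s) {
        BitSet bits = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '1') {
                bits.set(i);
            } else if (c != '0') {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return bits;
    }

    // Turns the first n bits on. Using an explicit count avoids relying on
    // BitSet.length(), which is 0 while no bit is set.
    static BitSet allOn(int n) {
        BitSet bits = new BitSet(n);
        bits.set(0, n);
        return bits;
    }
}
```

With this layout, a compactor-style list in which every exception is aborted
would be built with something like allOn(exceptions.length) rather than
set(0, bitSet.length()).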



> Add capability to tell aborted transactions apart from open transactions in 
> ValidTxnList
> 
>
> Key: HIVE-16534
> URL: https://issues.apache.org/jira/browse/HIVE-16534
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-16534.1.patch, HIVE-16534.2.patch
>
>
> Currently in ValidReadTxnList, open transactions and aborted transactions are 
> stored together in one array. That makes it impossible to extract just 
> aborted transactions or open transactions.
> For ValidCompactorTxnList this is fine, since we only store aborted 
> transactions but no open transactions.





[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-28 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989168#comment-15989168
 ] 

Chaoyu Tang commented on HIVE-16147:


The only test failure is not related to this patch. [~pxiong], could you 
review the patch? Thanks

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partition shows that the partition column 
> stats are still accurate, but actually they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> ...
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exist
> {code}
> # col_name   data_type   min   max      num_nulls   distinct_count   avg_col_len   max_col_len   num_trues   num_falses   comment
>
> salary       int         1     151370   0           94                                                                      from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS 
> for the columns are still true.
> {code}
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt_rename
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by        ctang
>   last_modified_time      1485217063
>   numFiles                1
>   numRows                 100
>   rawDataSize             5143
>   totalSize               5243
>   transient_lastDdlTime   1488842358
> {code}
> describe formatted default.sample_pt_rename partition (dummy = 3) salary: the 
> column stats have been dropped.
> {code}
> # col_name   data_type   comment
>
> salary       int         from deserializer
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}





[jira] [Commented] (HIVE-15571) Support Insert into for druid storage handler

2017-04-28 Thread Nishant Bangarwa (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989154#comment-15989154
 ] 

Nishant Bangarwa commented on HIVE-15571:
-

[~jcamachorodriguez] your understanding is correct: it is not safe to 
execute multiple INSERT INTO statements in parallel. 
When we create a Druid segment, we need to allocate a version and a shardSpec to 
it based on the existing segments. In Druid itself this is easier to manage, 
because the overlord manages segment allocation and uses interval-based locks, 
which can handle multiple tasks since allocation and locking are done in a 
central place. 
For Hive INSERT INTO, I was planning to lock the complete datasource in the 
first version and, in a subsequent PR, extend that to lock only the intervals 
for which data is being ingested. For interval-based locking we would need to 
figure out the interval of the data being ingested in StorageHandler preInsert, 
which we don't know currently.

> Support Insert into for druid storage handler
> -
>
> Key: HIVE-15571
> URL: https://issues.apache.org/jira/browse/HIVE-15571
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
> Attachments: HIVE-15571.01.patch
>
>
> Add support for the INSERT INTO operator in the Druid storage handler.





[jira] [Commented] (HIVE-16513) width_bucket issues

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989103#comment-15989103
 ] 

Hive QA commented on HIVE-16513:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865462/HIVE-16513.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4924/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4924/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4924/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865462 - PreCommit-HIVE-Build

> width_bucket issues
> ---
>
> Key: HIVE-16513
> URL: https://issues.apache.org/jira/browse/HIVE-16513
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>Assignee: Sahil Takiar
> Attachments: HIVE-16513.1.patch, HIVE-16513.2.patch
>
>
> width_bucket was recently added with HIVE-15982. This ticket notes a few 
> issues.
> Usability issue:
> Currently only accepts integral numeric types. Decimals, floats and doubles 
> are not supported.
> Runtime failures: This query will cause a runtime divide-by-zero in the 
> reduce stage.
> select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;
> The divide-by-zero seems to trigger any time I use a group-by. Here's another 
> example (that actually requires the group-by):
> select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;
> Advanced Usage Issues:
> Suppose you have a table e011_01 as follows:
> create table e011_01 (c1 integer, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> Compile-time problems:
> You cannot use simple case expressions, searched case expressions or grouping 
> sets. These queries fail:
> select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) 
> from e011_01;
> select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) 
> from e011_01;
> select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as 
> integer)) from e011_02 group by cube(c1, c2);
> I'll admit the grouping one is pretty contrived but the case ones seem 
> straightforward, valid, and it's strange that they don't work. Similar 
> queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe 
> [~ashutoshc] can lend some perspective on that?
> Interestingly, you can use window functions in width_bucket, example:
> select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
> works just fine. Hopefully we can get to a place where people implementing 
> functions like this don't need to think about value expression support but we 
> don't seem to be there yet.
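
For reference, a rough Java sketch of the bucket arithmetic, assuming the
commonly documented width_bucket semantics (0 below the range, num_buckets + 1
at or above it, equal-width buckets in between). WidthBucketSketch and its
method are hypothetical and are not the Hive UDF; the sketch accepts doubles,
which the ticket notes Hive does not yet do, and only handles min < max. It
shows one place where a division by zero can come from (min equal to max),
which is not necessarily the cause of the reduce-stage failure reported above.

```java
// Hypothetical sketch, not the Hive implementation of width_bucket.
final class WidthBucketSketch {
    private WidthBucketSketch() {}

    // Returns the 1-based bucket of expr in [min, max) split into numBuckets
    // equal-width buckets; 0 if expr < min, numBuckets + 1 if expr >= max.
    static int widthBucket(double expr, double min, double max, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        if (!(min < max)) {
            // Guard: max - min would be zero or negative below.
            // Descending ranges are left out of this sketch.
            throw new IllegalArgumentException("min must be less than max");
        }
        if (expr < min) {
            return 0;
        }
        if (expr >= max) {
            return numBuckets + 1;
        }
        return (int) Math.floor(numBuckets * (expr - min) / (max - min)) + 1;
    }
}
```

Rejecting min == max up front turns the arithmetic error into an explicit
argument error, which is one possible design choice; returning NULL would be
another.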





[jira] [Updated] (HIVE-16542) make merge that targets acid 2.0 table fail-fast

2017-04-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16542:
--
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   2.3.0
   Status: Resolved  (was: Patch Available)

committed to branch-2.3, branch-2, master
thanks Wei for the review

> make merge that targets acid 2.0 table fail-fast 
> -
>
> Key: HIVE-16542
> URL: https://issues.apache.org/jira/browse/HIVE-16542
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 2.3.0, 3.0.0, 2.4.0
>
> Attachments: HIVE-16542.01-branch-2.3.patch, 
> HIVE-16542.01-branch-2.patch, HIVE-16542.01.patch, HIVE-16542.02.patch
>
>
> Until HIVE-14947 is fixed, need to add a check so that acid 2.0 tables are 
> not written to by Merge stmt that has both Insert and Update clauses





[jira] [Commented] (HIVE-16546) LLAP: Fail map join tasks if hash table memory exceeds threshold

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989009#comment-15989009
 ] 

Hive QA commented on HIVE-16546:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865459/HIVE-16546.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4923/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4923/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4923/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865459 - PreCommit-HIVE-Build

> LLAP: Fail map join tasks if hash table memory exceeds threshold
> 
>
> Key: HIVE-16546
> URL: https://issues.apache.org/jira/browse/HIVE-16546
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16546.1.patch, HIVE-16546.2.patch, 
> HIVE-16546.WIP.patch
>
>
> When a map join task is running in LLAP, it can potentially use a lot more 
> memory than its limit, which could be the memory per executor or the no 
> conditional task size. If it uses more memory, it can adversely affect the 
> performance of other queries or even bring down the daemon. In such cases, it 
> is better to fail the query than to bring down the daemon. 





[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989006#comment-15989006
 ] 

Aihua Xu commented on HIVE-16346:
-

Thanks [~ekoifman] . I just reverted the change. 

[~stakiar_impala_496e] Can you take a look?

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.
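
A minimal Java sketch of what "conditional on the target filesystem" could
mean. The class InheritPermsSketch, the method shouldInheritPerms and the
hard-coded scheme list are assumptions made for illustration; they are not
taken from the attached patches, and a real change would presumably read the
list of blobstore schemes from configuration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Hive.
final class InheritPermsSketch {
    // Assumed set of schemes that have no file permission concept.
    private static final Set<String> BLOBSTORE_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    private InheritPermsSketch() {}

    // inheritPermsEnabled stands for hive.warehouse.subdir.inherit.perms.
    // Permissions are inherited only if the flag is on AND the target
    // filesystem is not a blobstore.
    static boolean shouldInheritPerms(boolean inheritPermsEnabled, URI target) {
        if (!inheritPermsEnabled) {
            return false;
        }
        String scheme = target.getScheme();
        if (scheme == null) {
            // No scheme: assume the default filesystem supports permissions.
            return true;
        }
        return !BLOBSTORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

This keeps the global setting as it is for HDFS tables while skipping the
permission calls for paths such as s3a://bucket/warehouse/t.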





[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988990#comment-15988990
 ] 

Aihua Xu commented on HIVE-16346:
-

[~ekoifman] Sorry about that. It seems I forgot to include a new file when 
committing the change. 

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.





[jira] [Issue Comment Deleted] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16346:

Comment: was deleted

(was: [~ekoifman] Sorry about that. It seems I forgot to include a new file 
when committing the change. )

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.





[jira] [Updated] (HIVE-16542) make merge that targets acid 2.0 table fail-fast

2017-04-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16542:
--
Attachment: HIVE-16542.01-branch-2.3.patch

> make merge that targets acid 2.0 table fail-fast 
> -
>
> Key: HIVE-16542
> URL: https://issues.apache.org/jira/browse/HIVE-16542
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16542.01-branch-2.3.patch, 
> HIVE-16542.01-branch-2.patch, HIVE-16542.01.patch, HIVE-16542.02.patch
>
>
> Until HIVE-14947 is fixed, need to add a check so that acid 2.0 tables are 
> not written to by Merge stmt that has both Insert and Update clauses





[jira] [Reopened] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reopened HIVE-16346:
---

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.





[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988981#comment-15988981
 ] 

Eugene Koifman commented on HIVE-16346:
---

[~aihuaxu], [~stakiar] - it looks like this broke branch-2 compilation.

For example, https://builds.apache.org/job/PreCommit-HIVE-Build/4916/   
(HIVE-16542).
I (and others, e.g. [~wei.zheng]) am getting the same error compiling the 
branch w/o any changes.

It seems to be because HdfsUtils.java in shims-common refers to some class in 
common before the common module is compiled

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.





[jira] [Commented] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens

2017-04-28 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988954#comment-15988954
 ] 

Chaoyu Tang commented on HIVE-16487:


LGTM, +1 pending tests.

> Serious Zookeeper exception is logged when a race condition happens
> ---
>
> Key: HIVE-16487
> URL: https://issues.apache.org/jira/browse/HIVE-16487
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16487.02.patch, HIVE-16487.patch
>
>
> A customer started to see this in the logs, but happily everything was 
> working as intended:
> {code}
> 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: 
> [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /hive_zookeeper_namespace//LOCK-SHARED-
> {code}
> This was happening because of a race condition between lock releasing and 
> lock acquiring. The thread releasing the lock removes the parent ZK node just 
> after the thread acquiring the lock has made sure that the parent node exists.
> Since this can happen without any real problem, I plan to treat NODEEXISTS and 
> NONODE as transient ZooKeeper exceptions, so the users are not confused.
> Also, the original author of ZooKeeperHiveLockManager may have planned to 
> handle different ZooKeeperExceptions differently, and the code is hard to 
> understand. See the {{continue}} and the {{break}}. The {{break}} only breaks 
> out of the switch, and not the loop, which IMHO is not intuitive:
> {code}
> do {
>   try {
>     [..]
>     ret = lockPrimitive(key, mode, keepAlive, parentCreated, 
>   } catch (Exception e1) {
>     if (e1 instanceof KeeperException) {
>       KeeperException e = (KeeperException) e1;
>       switch (e.code()) {
>       case CONNECTIONLOSS:
>       case OPERATIONTIMEOUT:
>         LOG.debug("Possibly transient ZooKeeper exception: ", e);
>         continue;
>       default:
>         LOG.error("Serious Zookeeper exception: ", e);
>         break;
>       }
>     }
>     [..]
>   }
> } while (tryNum < numRetriesForLock);
> {code}
> If we do not want to try again in case of a "Serious Zookeeper exception:", 
> then we should add a label to the do loop, and break out of it in the switch.
> If we do want to retry regardless of the type of the ZK exception, then we 
> should just change the {{continue;}} to {{break;}} and move the lines that do 
> not run in the {{continue}} case into the {{default}} branch of the switch, 
> so the code is easier to understand.
> Any suggestions or ideas [~ctang.ma] or [~szehon]?
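
The first option in the description (a labeled do loop that the switch can
break out of) could look roughly like the sketch below. It is self-contained
and does not use the real ZooKeeper classes: Code, SketchKeeperException,
LockAttempt and lockWithRetries are hypothetical stand-ins, and treating
NONODE and NODEEXISTS as transient follows the plan stated in the description,
not necessarily the committed patch.

```java
// Hypothetical sketch of the retry loop; not the ZooKeeperHiveLockManager code.
final class LockRetrySketch {
    // Stand-in for the ZooKeeper error codes mentioned in the ticket.
    enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE, NODEEXISTS, OTHER }

    // Stand-in for KeeperException carrying one of the codes above.
    static final class SketchKeeperException extends Exception {
        private static final long serialVersionUID = 1L;
        final Code code;
        SketchKeeperException(Code code) {
            super(code.name());
            this.code = code;
        }
    }

    // One attempt to take the lock; returns true when the lock was acquired.
    interface LockAttempt {
        boolean tryLock() throws SketchKeeperException;
    }

    private LockRetrySketch() {}

    // Returns the attempt number on which the lock was acquired, or -1 if it
    // was not acquired (retries exhausted or a serious error occurred).
    static int lockWithRetries(LockAttempt attempt, int numRetriesForLock) {
        int tryNum = 0;
        retry:
        do {
            tryNum++;
            try {
                if (attempt.tryLock()) {
                    return tryNum;
                }
            } catch (SketchKeeperException e) {
                switch (e.code) {
                case CONNECTIONLOSS:
                case OPERATIONTIMEOUT:
                case NONODE:
                case NODEEXISTS:
                    // Possibly transient: go to the loop condition and retry.
                    continue;
                default:
                    // Serious: leave the do-while loop, not just the switch.
                    break retry;
                }
            }
        } while (tryNum < numRetriesForLock);
        return -1;
    }
}
```

The label makes the intent explicit: an unlabeled break in the default branch
would only leave the switch and the loop would still retry.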





[jira] [Commented] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988912#comment-15988912
 ] 

Hive QA commented on HIVE-16553:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865455/HIVE-16553.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4922/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4922/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4922/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865455 - PreCommit-HIVE-Build

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> -
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16553.1.patch
>
>
> The current value is 1M rows. We would like to bump this up to make sure we 
> are not creating semijoin optimizations on dimension tables, since having too 
> many semijoin optimizations can cause serialized execution of tasks if lots of 
> tasks are waiting for semijoin optimizations to be computed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Attachment: HIVE-16559.01.patch

First draft containing a check to prevent the dropping of columns if:
- the table is partitioned
- the table is stored in Parquet
- the cascade option is missing

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
>  backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns from the serde of a 
> partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>   'mary' AS name,
>   5 AS favnumber,
>   'blue' AS favcolor,
>   35 AS age,
>   'dog' AS favpet;
> ALTER TABLE myparquettable_parted
> REPLACE COLUMNS
> (
>   favnumber int,
>   age int
> );   

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Fix Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
>  backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns from the serde of a 
> partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>   'mary' AS name,
>   5 AS favnumber,
>   'blue' AS favcolor,
>   35 AS age,
>   'dog' AS favpet;
> ALTER TABLE myparquettable_parted
> REPLACE COLUMNS
> (
>   favnumber int,
>   age int
> );   

[jira] [Commented] (HIVE-16546) LLAP: Fail map join tasks if hash table memory exceeds threshold

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988810#comment-15988810
 ] 

Hive QA commented on HIVE-16546:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865459/HIVE-16546.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10631 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbasestats] 
(batchId=91)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=236)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4921/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4921/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4921/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865459 - PreCommit-HIVE-Build

> LLAP: Fail map join tasks if hash table memory exceeds threshold
> 
>
> Key: HIVE-16546
> URL: https://issues.apache.org/jira/browse/HIVE-16546
> Project: Hive
> Issue Type: Bug
> Components: llap
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-16546.1.patch, HIVE-16546.2.patch, 
> HIVE-16546.WIP.patch
>
>
> When a map join task is running in LLAP, it can potentially use a lot more
> memory than its limit, which could be the memory per executor or the
> no-conditional-task size. If it uses more memory, it can adversely affect the
> performance of other queries or even bring down the daemon. In such cases,
> it is better to fail the query than to bring down the daemon.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens

2017-04-28 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16487:
--
Attachment: HIVE-16487.02.patch

Cleaned up the code a little.
I think it is more readable this way.

> Serious Zookeeper exception is logged when a race condition happens
> ---
>
> Key: HIVE-16487
> URL: https://issues.apache.org/jira/browse/HIVE-16487
> Project: Hive
> Issue Type: Bug
> Components: Locking
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
> Attachments: HIVE-16487.02.patch, HIVE-16487.patch
>
>
> A customer started to see this in the logs, but happily everything was 
> working as intended:
> {code}
> 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: 
> [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /hive_zookeeper_namespace//LOCK-SHARED-
> {code}
> This was happening because of a race condition between lock releasing and
> lock acquiring. The thread releasing the lock removes the parent ZK node just
> after the thread acquiring the lock has made sure that the parent node exists.
> Since this can happen without any real problem, I plan to add NODEEXISTS and
> NONODE as transient ZooKeeper exceptions, so that users are not confused.
> Also, the original author of ZooKeeperHiveLockManager may have planned to
> handle different ZooKeeperExceptions differently, and the code is hard to
> understand. See the {{continue}} and the {{break}}. The {{break}} only breaks
> out of the switch, not the loop, which IMHO is not intuitive:
> {code}
> do {
>   try {
>     [..]
>     ret = lockPrimitive(key, mode, keepAlive, parentCreated, 
>   } catch (Exception e1) {
>     if (e1 instanceof KeeperException) {
>       KeeperException e = (KeeperException) e1;
>       switch (e.code()) {
>       case CONNECTIONLOSS:
>       case OPERATIONTIMEOUT:
>         LOG.debug("Possibly transient ZooKeeper exception: ", e);
>         continue;
>       default:
>         LOG.error("Serious Zookeeper exception: ", e);
>         break;
>       }
>     }
>     [..]
>   }
> } while (tryNum < numRetriesForLock);
> {code}
> If we do not want to try again in case of a "Serious Zookeeper exception:",
> then we should add a label to the do loop and break out of it from the switch.
> If we do want to try again regardless of the type of the ZK exception, then we
> should just change the {{continue;}} to {{break;}} and move the lines of code
> that are skipped in the {{continue}} case into the {{default}} branch of the
> switch, so that the code is easier to understand.
> Any suggestions or ideas [~ctang.ma] or [~szehon]?
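
The first option above (a label on the do loop, with the break targeting that
label) could look roughly like the sketch below. It is only an illustration:
the Code enum, FakeKeeperException and tryLock are hypothetical stand-ins, not
the real ZooKeeper or ZooKeeperHiveLockManager code, and treating NONODE and
NODEEXISTS as transient follows the plan in the description rather than any
committed patch.

```java
/** Sketch only: stand-in types, not the real ZooKeeper or Hive classes. */
final class LockRetrySketch {

  /** Hypothetical subset of error codes, named after those in the description. */
  enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE, NODEEXISTS, SESSIONEXPIRED }

  /** Hypothetical stand-in for KeeperException. */
  static final class FakeKeeperException extends Exception {
    final Code code;
    FakeKeeperException(Code code) { super(code.name()); this.code = code; }
  }

  /** Number of attempts made by the most recent call to tryLock. */
  static int attempts;

  /**
   * Replays scripted failures: attempt i fails with outcomes[i]; once the
   * script is exhausted the attempt succeeds. Returns true if the lock was
   * obtained within numRetriesForLock attempts.
   */
  static boolean tryLock(int numRetriesForLock, Code... outcomes) {
    boolean locked = false;
    int tryNum = 0;
    retry:
    do {
      Code outcome = tryNum < outcomes.length ? outcomes[tryNum] : null;
      tryNum++;
      attempts = tryNum;
      try {
        if (outcome != null) {
          throw new FakeKeeperException(outcome);
        }
        locked = true;
        break retry;
      } catch (FakeKeeperException e) {
        switch (e.code) {
          case CONNECTIONLOSS:
          case OPERATIONTIMEOUT:
          case NONODE:
          case NODEEXISTS:
            // Possibly transient: go around the loop again.
            continue retry;
          default:
            // Serious: leave the loop itself, not just the switch.
            break retry;
        }
      }
    } while (tryNum < numRetriesForLock);
    return locked;
  }

  public static void main(String[] args) {
    System.out.println(tryLock(3, Code.NONODE));         // retried, then succeeds
    System.out.println(tryLock(3, Code.SESSIONEXPIRED)); // gives up immediately
  }
}
```

With the label in place, both the continue and the break name the loop
explicitly, so a reader does not have to work out which enclosing statement
each one targets.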





[jira] [Commented] (HIVE-16485) Enable outputName for RS operator in explain formatted

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988752#comment-15988752
 ] 

Hive QA commented on HIVE-16485:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865443/HIVE-16485.03.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4920/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4920/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4920/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-28 12:43:31.851
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4920/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-28 12:43:31.853
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at bbf5ecc HIVE-16171 : Support replication of truncate table 
(Sankar Hariappan, reviewed by Sushanth Sowmyan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at bbf5ecc HIVE-16171 : Support replication of truncate table 
(Sankar Hariappan, reviewed by Sushanth Sowmyan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-28 12:43:32.575
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AnnotateReduceSinkOutputOperator.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
patching file ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
patching file ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
patching file ql/src/test/queries/clientpositive/explain_formatted_oid.q
patching file ql/src/test/results/clientpositive/explain_formatted_oid.q.out
patching file ql/src/test/results/clientpositive/input4.q.out
patching file ql/src/test/results/clientpositive/join0.q.out
patching file ql/src/test/results/clientpositive/parallel_join0.q.out
patching file ql/src/test/results/clientpositive/plan_json.q.out
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.17) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED 

[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988750#comment-15988750
 ] 

Hive QA commented on HIVE-15795:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864833/HIVE-15795.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testFloatCast2DoubleThriftSerializeInTasks
 (batchId=223)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4919/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4919/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4919/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864833 - PreCommit-HIVE-Build

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
> Issue Type: Improvement
> Components: Accumulo Storage Handler
> Reporter: Mike Fagan
> Assignee: Mike Fagan
> Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, 
> HIVE-15795.3.patch
>
>
> Add the ability to specify an Accumulo index table for an Accumulo-Hive table.
> This would greatly improve performance for non-rowid query predicates.





[jira] [Commented] (HIVE-16171) Support replication of truncate table

2017-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988726#comment-15988726
 ] 

ASF GitHub Bot commented on HIVE-16171:
---

Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/166


> Support replication of truncate table
> -
>
> Key: HIVE-16171
> URL: https://issues.apache.org/jira/browse/HIVE-16171
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.2.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Labels: DR
> Fix For: 3.0.0
>
> Attachments: HIVE-16171.01.patch, HIVE-16171.02.patch, 
> HIVE-16171.03.patch, HIVE-16171.04.patch, HIVE-16171.05.patch, 
> HIVE-16171.06.patch, HIVE-16171.07.patch
>
>
> Need to support truncate table for replication. Key points to note:
> 1. For a non-partitioned table, truncate table removes all the rows from
> the table.
> 2. For partitioned tables, need to consider how truncate behaves when
> truncating a single partition or the whole table.
> 3. Bootstrap load with truncate table must work, as it is just
> loadTable/loadPartition with an empty dataset.
> 4. It is suggested to re-use the alter table/alter partition events to handle
> truncate.
> 5. Need to consider the case where an insert event happens before truncate
> table and needs to see its data files through change management. The data
> files should be recycled to the cmroot path before they are trashed.





[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Status: Patch Available  (was: Open)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on
> the destination to make sure it goes to a certain dir root, or they may have
> cases where the db (default, for instance) was automatically created. We
> should still allow replicating into this db without failing if it is empty.





[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Attachment: HIVE-16488.02.patch

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on
> the destination to make sure it goes to a certain dir root, or they may have
> cases where the db (default, for instance) was automatically created. We
> should still allow replicating into this db without failing if it is empty.





[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Attachment: (was: HIVE-16488.02.patch)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on
> the destination to make sure it goes to a certain dir root, or they may have
> cases where the db (default, for instance) was automatically created. We
> should still allow replicating into this db without failing if it is empty.





[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Status: Open  (was: Patch Available)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sankar Hariappan
> Assignee: Sankar Hariappan
> Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on
> the destination to make sure it goes to a certain dir root, or they may have
> cases where the db (default, for instance) was automatically created. We
> should still allow replicating into this db without failing if it is empty.





[jira] [Commented] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988689#comment-15988689
 ] 

Hive QA commented on HIVE-16213:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865442/HIVE-16213.07.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10636 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteDate (batchId=178)
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testWriteVarchar (batchId=178)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testStoreFuncAllSimpleTypes 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4918/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4918/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4918/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865442 - PreCommit-HIVE-Build

> ObjectStore can leak Queries when rollbackTransaction throws an exception
> -
>
> Key: HIVE-16213
> URL: https://issues.apache.org/jira/browse/HIVE-16213
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Alexander Kolbasov
> Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, 
> HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch, 
> HIVE-16213.06.patch, HIVE-16213.07.patch
>
>
> In ObjectStore.java there are a few places with the code similar to:
> {code}
> Query query = null;
> try {
>   openTransaction();
>   query = pm.newQuery(Something.class);
>   ...
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
>     rollbackTransaction();
>   }
>   if (query != null) {
>     query.closeAll();
>   }
> }
> {code}
> The problem is that rollbackTransaction() may throw an exception, in which
> case query.closeAll() wouldn't be executed.
> The fix would be to wrap rollbackTransaction() in its own try-catch block.
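
A minimal sketch of the suggested fix follows, assuming stand-in types:
FakeQuery and the Runnable passed as rollback are hypothetical placeholders
for the JDO query handle and rollbackTransaction(), not the real ObjectStore
code. The point is only that a failure during rollback no longer prevents the
query from being closed.

```java
/** Sketch only: stand-in types, not the real ObjectStore or JDO Query. */
final class RollbackCleanupSketch {

  /** Hypothetical stand-in for the JDO query handle. */
  static final class FakeQuery {
    boolean closed;
    void closeAll() { closed = true; }
  }

  /**
   * Mirrors the body of the finally block. The rollback argument stands in
   * for rollbackTransaction() and may throw.
   */
  static void cleanup(boolean committed, Runnable rollback, FakeQuery query) {
    try {
      if (!committed) {
        rollback.run();
      }
    } catch (RuntimeException e) {
      // Report the failure and carry on so the query is still closed below.
      System.err.println("rollback failed: " + e.getMessage());
    } finally {
      if (query != null) {
        query.closeAll();
      }
    }
  }

  public static void main(String[] args) {
    FakeQuery query = new FakeQuery();
    cleanup(false, () -> { throw new IllegalStateException("boom"); }, query);
    System.out.println("query closed: " + query.closed);
  }
}
```

Swallowing the rollback exception is a design choice made here for brevity. If
callers need to see it, dropping the catch clause and keeping only try/finally
would still guarantee that closeAll() runs.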





[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Description: 
Parquet schema evolution should make it possible to have partitions/tables
backed by files with different schemas. Hive should match the table columns
with file columns based on the column name if possible.
However, if the serde for a table is missing columns from the serde of a
partition, Hive fails to match the columns together.
Steps to reproduce:
{code}
CREATE TABLE myparquettable_parted
(
  name string,
  favnumber int,
  favcolor string,
  age int,
  favpet string
)
PARTITIONED BY (day string)
STORED AS PARQUET;

INSERT OVERWRITE TABLE myparquettable_parted
PARTITION(day='2017-04-04')
SELECT
   'mary' as name,
   5 AS favnumber,
   'blue' AS favcolor,
   35 AS age,
   'dog' AS favpet;

alter table myparquettable_parted
REPLACE COLUMNS
(
favnumber int,
age int
);   
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
>
> Parquet schema evolution should make it possible to have partitions/tables
> backed by files with different schemas. Hive should match the table columns
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns from the serde of a
> partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   

[jira] [Commented] (HIVE-16143) Improve msck repair batching

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988629#comment-15988629
 ] 

Hive QA commented on HIVE-16143:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865438/HIVE-16143.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10647 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] 
(batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_0] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_1] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_2] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_3] 
(batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_batchsize] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repair] (batchId=32)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4917/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4917/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4917/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865438 - PreCommit-HIVE-Build

> Improve msck repair batching
> 
>
> Key: HIVE-16143
> URL: https://issues.apache.org/jira/browse/HIVE-16143
> Project: Hive
> Issue Type: Improvement
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16143.01.patch
>
>
> Currently, the {{msck repair table}} command batches the number of partitions
> created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}.
> The following snippet shows the batching logic. There are a couple of
> possible improvements to this batching logic:
> {noformat}
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
>   if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>     int counter = 0;
>     for (CheckResult.PartitionResult part : partsNotInMs) {
>       counter++;
>       apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>       repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>           + ':' + part.getPartitionName());
>       if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>         db.createPartitions(apd);
>         apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>       }
>     }
>   } else {
>     for (CheckResult.PartitionResult part : partsNotInMs) {
>       apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>       repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>           + ':' + part.getPartitionName());
>     }
>     db.createPartitions(apd);
>   }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding
> partitions one by one, which is almost always very slow. Users can easily
> increase the batch size to make the command run faster but end up with worse
> performance because the code falls back to adding partitions one by one.
> Users are then expected to determine the tuned value of the batch size that
> works well for their environment. I think the code could handle this
> situation better by exponentially decaying the batch size instead of falling
> back to one by one.
> 2. The other issue with this implementation is that if, say, the first batch
> succeeds and the second one fails, the code tries to add all the partitions
> one by one irrespective of whether some of them were successfully added or
> not. If we need to fall back to one by one, we should at least remove the ones
> which we know for sure are 
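
Both points above (halve the batch size on failure, and retry only the
partitions that have not been added yet) can be sketched as follows. This is a
sketch under stated assumptions, not the proposed patch: addAll and the
addBatch callback are hypothetical names, and the callback stands in for a
call such as db.createPartitions(apd) that either adds the whole batch or
throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Sketch only: generic batching helper, not the real msck repair code. */
final class DecayingBatchSketch {

  /**
   * Adds all items in batches. When a batch fails, the batch size is halved
   * and only the items not yet added are retried. Returns the number of calls
   * made to addBatch. A failure at batch size 1 is rethrown.
   */
  static <T> int addAll(List<T> items, int initialBatchSize, Consumer<List<T>> addBatch) {
    int batchSize = Math.max(1, initialBatchSize);
    int done = 0;   // items [0, done) are known to have been added
    int calls = 0;
    while (done < items.size()) {
      int end = Math.min(items.size(), done + batchSize);
      calls++;
      try {
        addBatch.accept(items.subList(done, end));
        done = end;
      } catch (RuntimeException e) {
        if (batchSize == 1) {
          throw e; // nothing smaller left to try
        }
        batchSize = batchSize / 2; // exponential decay
      }
    }
    return calls;
  }

  public static void main(String[] args) {
    List<Integer> parts = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    List<Integer> added = new ArrayList<>();
    // Pretend the backend rejects any batch larger than 2 items.
    int calls = addAll(parts, 8, batch -> {
      if (batch.size() > 2) {
        throw new IllegalStateException("batch too large");
      }
      added.addAll(batch);
    });
    System.out.println(calls + " calls, " + added.size() + " items added");
  }
}
```

In this sketch the batch size never grows back after a failure, which keeps the
logic simple at the cost of some throughput if the failure was transient.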

[jira] [Commented] (HIVE-15726) Reenable indentation checks to checkstyle

2017-04-28 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988602#comment-15988602
 ] 

Peter Vary commented on HIVE-15726:
---

Test failures are not related

> Reenable indentation checks to checkstyle
> -
>
> Key: HIVE-15726
> URL: https://issues.apache.org/jira/browse/HIVE-15726
> Project: Hive
> Issue Type: Improvement
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15726.patch
>
>
> The Indentation check is commented out because at that time there was no
> way to check the throws indentation.
> There is a way now, so we can reenable it.





[jira] [Assigned] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara reassigned HIVE-16559:
--


> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
> Issue Type: Bug
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
>
> Parquet schema evolution should make it possible to have partitions/tables
> backed by files with different schemas. Hive should match the table columns
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns from the serde of a
> partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   

[jira] [Updated] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-16559:
---
Component/s: Serializers/Deserializers

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
>
> Parquet schema evolution should make it possible to have partitions/tables
> backed by files with different schemas. Hive should match the table columns
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns from the serde of a
> partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> REPLACE COLUMNS
> (
> favnumber int,
> age int
> );   

[jira] [Commented] (HIVE-16542) make merge that targets acid 2.0 table fail-fast

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988557#comment-15988557
 ] 

Hive QA commented on HIVE-16542:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865435/HIVE-16542.01-branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4916/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4916/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4916/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-28 09:56:20.467
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4916/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z branch-2 ]]
+ [[ -d apache-github-branch-2-source ]]
+ [[ ! -d apache-github-branch-2-source/.git ]]
+ [[ ! -d apache-github-branch-2-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-28 09:56:20.470
+ cd apache-github-branch-2-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at ab3a24b update RELEASE_NOTES.txt for 2.3 (HIVE-16545,HIVE-16547)
+ git clean -f -d
+ git checkout branch-2
Already on 'branch-2'
Your branch is up-to-date with 'origin/branch-2'.
+ git reset --hard origin/branch-2
HEAD is now at ab3a24b update RELEASE_NOTES.txt for 2.3 (HIVE-16545,HIVE-16547)
+ git merge --ff-only origin/branch-2
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-28 09:56:23.161
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file 
ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java
patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java
patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
patching file 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2WithSplitUpdate.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
[ERROR] COMPILATION ERROR : 
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37]
 package org.apache.hadoop.hive.common does not exist
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[79,9]
 cannot find symbol
  symbol:   variable StorageUtils
  location: class org.apache.hadoop.hive.io.HdfsUtils
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[98,9]
 cannot find symbol
  symbol:   variable StorageUtils
  location: class org.apache.hadoop.hive.io.HdfsUtils
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-shims-common: Compilation failure: Compilation failure:
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37]
 package org.apache.hadoop.hive.common does not exist
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[79,9]
 cannot find symbol
[ERROR] symbol:   variable StorageUtils
[ERROR] location: class org.apache.hadoop.hive.io.HdfsUtils
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[98,9]
 cannot find symbol
[ERROR] symbol:   variable StorageUtils
[ERROR] location: class org.apache.hadoop.hive.io.HdfsUtils
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] 

[jira] [Commented] (HIVE-16366) Hive 2.3 release planning

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988554#comment-15988554
 ] 

Hive QA commented on HIVE-16366:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865426/HIVE-16366-branch-2.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10571 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=142)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=174)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4915/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4915/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4915/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865426 - PreCommit-HIVE-Build

> Hive 2.3 release planning
> -
>
> Key: HIVE-16366
> URL: https://issues.apache.org/jira/browse/HIVE-16366
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Blocker
>  Labels: 2.3.0
> Fix For: 2.3.0
>
> Attachments: HIVE-16366-branch-2.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15642:

Status: Patch Available  (was: In Progress)

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478
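The two rules above can be read as a small decision function. The following is only an illustrative sketch, not Hive code: the function name and the event labels "INSERT" and "ADD_PARTITION" are hypothetical, and checksums are left out.

```python
def events_for_insert_overwrite(partition_exists, new_files):
    """Return (event_type, files) pairs for one insert overwrite.

    Hypothetical model of the two rules in the issue description:
    - existing partition: the insert event carries the new files;
    - new partition: the insert event carries no files, and the
      subsequent add-partition event carries them instead.
    """
    if partition_exists:
        return [("INSERT", list(new_files))]
    return [("INSERT", []), ("ADD_PARTITION", list(new_files))]
```

Under this reading, a new file is never reported by both events for the same operation.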





[jira] [Updated] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled

2017-04-28 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16558:
-
Status: Patch Available  (was: Open)

> In the hiveserver2.jsp Closed Queries table under the data click Drilldown 
> Link view details, the Chinese show garbled
> --
>
> Key: HIVE-16558
> URL: https://issues.apache.org/jira/browse/HIVE-16558
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Fix For: 3.0.0
>
> Attachments: HIVE-16558.1.patch
>
>
> In QueryProfileImpl.jamon, we see the following settings:
> 
> 
>   
> 
> HiveServer2
> 
> 
> 
> 
> 
>   
> So we should set the response encoding to UTF-8, which avoids garbled Chinese 
> (and other non-ASCII) text. Please check it!





[jira] [Updated] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-15642:

Attachment: HIVE-15642.02.patch

Added 02.patch with the following updates:
- New files are now also listed for insert overwrite on a non-partitioned table 
(loadTable method).
- The new-files listing recursively traverses the sub-directories of the 
destination path.
- The new-files listing is done on the physical destination path after moveFile 
succeeds, instead of logically building the new file paths from the source 
file names.
- Added new test cases to verify insert overwrites, dynamic partition inserts 
and loads.
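As a rough illustration of the recursive listing described in the second and third bullets, here is a minimal sketch on a local filesystem. It is not the patch itself: the patch works against Hadoop's file system API, while this example uses Python's `os.walk`, and the function name `list_new_files` is hypothetical.

```python
import os


def list_new_files(dest_path):
    """List every file under dest_path, descending into sub-directories.

    The listing is taken from what physically exists at the destination
    (i.e. after the move has completed), not derived from source file names.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(dest_path):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    # Sort so the result does not depend on directory iteration order.
    return sorted(found)
```

Listing the destination after the move means files that were renamed or nested during the move are still reported under their final paths.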

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Updated] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled

2017-04-28 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16558:
-
Description: 
In QueryProfileImpl.jamon, we see the following settings:


  

HiveServer2






  
So we should set the response encoding to UTF-8, which avoids garbled Chinese 
(and other non-ASCII) text. Please check it!


> In the hiveserver2.jsp Closed Queries table under the data click Drilldown 
> Link view details, the Chinese show garbled
> --
>
> Key: HIVE-16558
> URL: https://issues.apache.org/jira/browse/HIVE-16558
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Fix For: 3.0.0
>
> Attachments: HIVE-16558.1.patch
>
>
> In QueryProfileImpl.jamon, we see the following settings:
> 
> 
>   
> 
> HiveServer2
> 
> 
> 
> 
> 
>   
> So we should set the response encoding to UTF-8, which avoids garbled Chinese 
> (and other non-ASCII) text. Please check it!





[jira] [Updated] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled

2017-04-28 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16558:
-
Attachment: HIVE-16558.1.patch

> In the hiveserver2.jsp Closed Queries table under the data click Drilldown 
> Link view details, the Chinese show garbled
> --
>
> Key: HIVE-16558
> URL: https://issues.apache.org/jira/browse/HIVE-16558
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Fix For: 3.0.0
>
> Attachments: HIVE-16558.1.patch
>
>






[jira] [Assigned] (HIVE-16558) In the hiveserver2.jsp Closed Queries table under the data click Drilldown Link view details, the Chinese show garbled

2017-04-28 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin reassigned HIVE-16558:



> In the hiveserver2.jsp Closed Queries table under the data click Drilldown 
> Link view details, the Chinese show garbled
> --
>
> Key: HIVE-16558
> URL: https://issues.apache.org/jira/browse/HIVE-16558
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
> Fix For: 3.0.0
>
>






[jira] [Commented] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988517#comment-15988517
 ] 

ASF GitHub Bot commented on HIVE-15642:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/172

HIVE-15642: Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-15642

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #172


commit ddf04ae11c800be8b762a44fedcc16393396745d
Author: Sankar Hariappan 
Date:   2017-04-28T07:49:04Z

HIVE-15642: Replicate Insert Overwrites, Dynamic Partition Inserts and Loads




> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478





[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988450#comment-15988450
 ] 

Hive QA commented on HIVE-16552:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865423/HIVE-16552.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10631 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.org.apache.hadoop.hive.cli.TestHBaseCliDriver
 (batchId=94)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4914/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4914/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4914/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865423 - PreCommit-HIVE-Build

> Limit the number of tasks a Spark job may contain
> -
>
> Key: HIVE-16552
> URL: https://issues.apache.org/jira/browse/HIVE-16552
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.0.0, 2.0.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16552.patch
>
>
> It's commonly desirable to block bad and big queries that take a lot of YARN 
> resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to 
> stop a query that invokes a Spark job that contains too many tasks. The 
> proposal here is to introduce hive.spark.job.max.tasks with a default value 
> of -1 (no limit), which an admin can set to block queries that trigger too 
> many Spark tasks.
> Please note that this control knob applies to a spark job, though it's 
> possible that one query can trigger multiple Spark jobs (such as in case of 
> map-join). Nevertheless, the proposed approach is still helpful.
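The proposed check can be sketched as below. This is an assumption-laden illustration, not the patch: only the property name hive.spark.job.max.tasks and its default of -1 come from the description; the function name, the exception type, and treating any negative value as "no limit" are hypothetical.

```python
class TooManyTasksError(Exception):
    """Hypothetical error raised when a Spark job exceeds the task limit."""


def check_task_limit(total_tasks, max_tasks=-1):
    """Reject a single Spark job whose task count exceeds max_tasks.

    max_tasks models hive.spark.job.max.tasks; the default of -1 means
    no limit (this sketch assumes any negative value disables the check).
    """
    if max_tasks < 0:
        return
    if total_tasks > max_tasks:
        raise TooManyTasksError(
            "job has %d tasks, limit is %d" % (total_tasks, max_tasks)
        )
```

Because the limit applies per Spark job, a query that triggers several jobs would call the check once per job, so the query's total task count could still exceed the limit.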





[jira] [Commented] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon

2017-04-28 Thread ZhangBing Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988444#comment-15988444
 ] 

ZhangBing Lin commented on HIVE-16524:
--

[~xuefuz] Can you help me commit it?

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> 
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16524.1.patch
>
>
> The id attribute is defined by the W3C as follows:
> 1. The id attribute specifies the unique id of the HTML element.
> 2. The id must be unique in the HTML document.
> 3. The id attribute can be used as a link anchor, by JavaScript (HTML DOM) or 
> by CSS to change or add a style to an element with the specified id.
> However, the "id='attributes_table'" in hiveserver2.jsp and 
> QueryProfileTmpl.jamon:
> 1. Is not referenced by any CSS or JS
> 2. Appears more than once on the same page
> So I suggest removing this id attribute definition. Please check it.
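One way to confirm the duplicate-id claim on a rendered page is a small checker built on Python's standard `html.parser`. This is only a sketch for illustration; the class and function names are hypothetical and are not part of Hive.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Count how often each id attribute value occurs in a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return the sorted id values that appear more than once in html."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(i for i, count in collector.ids.items() if count > 1)
```

For example, feeding it a page with two tables that both carry id='attributes_table' would report that id as a duplicate.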





[jira] [Assigned] (HIVE-16557) Vectorization: Specialize ReduceSink empty key case

2017-04-28 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-16557:
---


> Vectorization: Specialize ReduceSink empty key case
> ---
>
> Key: HIVE-16557
> URL: https://issues.apache.org/jira/browse/HIVE-16557
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>
> Gopal pointed out that native Vectorization of ReduceSink is missing the 
> empty key case.





[jira] [Commented] (HIVE-16542) make merge that targets acid 2.0 table fail-fast

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988393#comment-15988393
 ] 

Hive QA commented on HIVE-16542:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865435/HIVE-16542.01-branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4913/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4913/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4913/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-28 08:05:18.209
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4913/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z branch-2 ]]
+ [[ -d apache-github-branch-2-source ]]
+ [[ ! -d apache-github-branch-2-source/.git ]]
+ [[ ! -d apache-github-branch-2-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-28 08:05:18.238
+ cd apache-github-branch-2-source
+ git fetch origin
From https://github.com/apache/hive
   0ecdfcd..ab3a24b  branch-2   -> origin/branch-2
   03941e3..3403535  branch-2.2 -> origin/branch-2.2
   ee57fa1..9194cae  branch-2.3 -> origin/branch-2.3
   6566065..bbf5ecc  master -> origin/master
 * [new branch]  storage-branch-2.3 -> origin/storage-branch-2.3
 * [new tag] release-2.3.0-rc0 -> release-2.3.0-rc0
 * [new tag] storage-release-2.3.0rc0 -> storage-release-2.3.0rc0
+ git reset --hard HEAD
HEAD is now at 0ecdfcd HIVE-15761: ObjectStore.getNextNotification could return 
an empty NotificationEventResponse causing TProtocolException (Sergio Pena, 
reviewed by Aihua Xu)
+ git clean -f -d
+ git checkout branch-2
Already on 'branch-2'
Your branch is behind 'origin/branch-2' by 10 commits, and can be 
fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/branch-2
HEAD is now at ab3a24b update RELEASE_NOTES.txt for 2.3 (HIVE-16545,HIVE-16547)
+ git merge --ff-only origin/branch-2
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-28 08:05:23.784
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file 
ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java
patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java
patching file ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
patching file 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2WithSplitUpdate.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
[ERROR] COMPILATION ERROR : 
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37]
 package org.apache.hadoop.hive.common does not exist
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[79,9]
 cannot find symbol
  symbol:   variable StorageUtils
  location: class org.apache.hadoop.hive.io.HdfsUtils
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[98,9]
 cannot find symbol
  symbol:   variable StorageUtils
  location: class org.apache.hadoop.hive.io.HdfsUtils
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-shims-common: Compilation failure: Compilation failure:
[ERROR] 
/data/hiveptest/working/apache-github-branch-2-source/shims/common/src/main/java/org/apache/hadoop/hive/io/HdfsUtils.java:[43,37]
 package org.apache.hadoop.hive.common does not exist
[ERROR] 

[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988389#comment-15988389
 ] 

Hive QA commented on HIVE-16147:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865390/HIVE-16147.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4912/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4912/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4912/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865390 - PreCommit-HIVE-Build

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.1.patch, HIVE-16147.patch, HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partition shows that the partition column 
> stats are still accurate, but in fact they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by      ctang
>   last_modified_time    1485217063
>   numFiles              1
>   numRows               100
>   rawDataSize           5143
>   totalSize             5243
>   transient_lastDdlTime 1488842358
> ... 
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exist
> {code}
> # col_name  data_type  min  max     num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
> salary      int        1    151370  0          94                                                                 from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS 
> for all columns are still true.
> {code}
> # Detailed Partition Information
> Partition Value:        [3]
> Database:               default
> Table:                  sample_pt_rename
> CreateTime:             Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:         UNKNOWN
> Location:               
> file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
>   COLUMN_STATS_ACCURATE 
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_by      ctang
>   last_modified_time    1485217063
>   numFiles              1
>   numRows               100
>   rawDataSize           5143
>  

[jira] [Work started] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15642 started by Sankar Hariappan.
---
> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part 
> of insert event but instead use the subsequent add partition event to capture 
> the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as 
> part of the insert event. 
> Similar behaviour for DP inserts and loads.
> This will need changes from HIVE-15478




