[jira] [Commented] (HIVE-17267) Make HMS Notification Listeners typesafe
[ https://issues.apache.org/jira/browse/HIVE-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124491#comment-16124491 ]

Barna Zsombor Klara commented on HIVE-17267:
--------------------------------------------

Unit test failures should not be related.

> Make HMS Notification Listeners typesafe
> ----------------------------------------
>
> Key: HIVE-17267
> URL: https://issues.apache.org/jira/browse/HIVE-17267
> Project: Hive
> Issue Type: Bug
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
> Attachments: HIVE-17267.01.patch, HIVE-17267.02.patch, HIVE-17267.03.patch
>
>
> Currently the HMS supports two types of notification listeners: transactional and non-transactional. Transactional listeners are invoked only if the JDBC transaction finishes successfully, while non-transactional listeners are expected to be resilient and are invoked in every case, even on failure.
> Having the same type for these two is a source of confusion and opens the door for misconfigurations. We should fix this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
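[Editor's note] The type separation the issue describes can be sketched as follows. This is an illustrative sketch only, not Hive's actual listener API: the interface and class names below are invented for the example. The point is that a distinct subtype for transactional listeners lets the compiler reject the misconfiguration the issue worries about.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative names; Hive's real listener classes differ.
interface MetaStoreListener {
    void onEvent(String event);
}

// Marker subtype: implementations are invoked only after the JDBC
// transaction commits successfully.
interface TransactionalMetaStoreListener extends MetaStoreListener {
}

class ListenerRegistry {
    private final List<TransactionalMetaStoreListener> transactional = new ArrayList<>();
    private final List<MetaStoreListener> resilient = new ArrayList<>();

    // The compiler now rejects registering a plain listener as transactional,
    // which is exactly the misconfiguration the issue wants to prevent.
    void addTransactional(TransactionalMetaStoreListener l) { transactional.add(l); }
    void addResilient(MetaStoreListener l) { resilient.add(l); }

    int transactionalCount() { return transactional.size(); }
    int resilientCount() { return resilient.size(); }
}

public class TypedListeners {
    public static void main(String[] args) {
        ListenerRegistry r = new ListenerRegistry();
        r.addTransactional(e -> {});
        r.addResilient(e -> {});
        System.out.println(r.transactionalCount() + " transactional, "
            + r.resilientCount() + " resilient");
    }
}
```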
[jira] [Assigned] (HIVE-17305) New insert overwrite dynamic partitions qtests need to have the golden file regenerated
[ https://issues.apache.org/jira/browse/HIVE-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Barna Zsombor Klara reassigned HIVE-17305:
------------------------------------------

> New insert overwrite dynamic partitions qtests need to have the golden file regenerated
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-17305
> URL: https://issues.apache.org/jira/browse/HIVE-17305
> Project: Hive
> Issue Type: Bug
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
> Priority: Trivial
>
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124480#comment-16124480 ]

Hive QA commented on HIVE-17089:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881561/HIVE-17089.11.patch

{color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10968 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_cp] (batchId=82)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=291)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderIncompleteDelta (batchId=264)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithNoneMode (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6364/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6364/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6364/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881561 - PreCommit-HIVE-Build

> make acid 2.0 the default
> -------------------------
>
> Key: HIVE-17089
> URL: https://issues.apache.org/jira/browse/HIVE-17089
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch
>
>
> acid 2.0 was introduced in HIVE-14035. It replaces Update events with a combination of Delete + Insert events. This makes U=D+I the default (and only) supported acid table type in Hive 3.0.
> The expectation for the upgrade is that major compaction has to be run on all acid tables in the existing Hive cluster, and that no new writes to these tables take place after the start of compaction (we need to add a mechanism to put a table in read-only mode, so it can still be read while it's being compacted). Then the upgrade to Hive 3.0 can take place.
[jira] [Commented] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124479#comment-16124479 ]

Tao Li commented on HIVE-17301:
-------------------------------

Test failures are tracked in HIVE-15058 and don't seem related to this change.

> Make JSONMessageFactory.getTObj method thread safe
> --------------------------------------------------
>
> Key: HIVE-17301
> URL: https://issues.apache.org/jira/browse/HIVE-17301
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Tao Li
> Assignee: Tao Li
> Attachments: HIVE-17301.1.patch
>
>
> This static method uses a singleton instance of TDeserializer, which is not thread safe. Instead we want to create a new instance per method call. The class is lightweight, so this should be fine from a performance perspective.
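[Editor's note] The fix pattern described above (replace a shared stateful singleton with a cheap per-call instance) can be sketched as follows. Thrift's TDeserializer is the real class involved in the issue; the `StatefulDeserializer` stand-in below is invented for the example, to show why a shared instance with internal mutable state races under concurrent calls.

```java
// Stand-in for a deserializer that keeps per-instance mutable state,
// like Thrift's TDeserializer does internally.
class StatefulDeserializer {
    private final StringBuilder scratch = new StringBuilder(); // per-instance state

    String deserialize(String payload) {
        scratch.setLength(0);   // two threads sharing one instance would race here
        scratch.append(payload);
        return scratch.toString();
    }
}

public class PerCallDeserialize {
    // BAD (what the issue describes): one instance shared by all callers.
    // private static final StatefulDeserializer SHARED = new StatefulDeserializer();

    // GOOD (the proposed fix): construct per call; the object is lightweight,
    // so the extra allocation is negligible.
    static String getTObjSketch(String payload) {
        return new StatefulDeserializer().deserialize(payload);
    }

    public static void main(String[] args) {
        System.out.println(getTObjSketch("{\"db\":\"default\"}"));
    }
}
```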
[jira] [Commented] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins
[ https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124477#comment-16124477 ]

Lefty Leverenz commented on HIVE-17283:
---------------------------------------

Doc note: This adds *hive.tez.dynamic.semijoin.reduction.for.mapjoin* to HiveConf.java, so it needs to be documented in the wiki.

* [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]

Added a TODOC3.0 label.

> Enable parallel edges of semijoin along with mapjoins
> -----------------------------------------------------
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
> Issue Type: Bug
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17283.1.patch, HIVE-17283.2.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of semijoin with mapjoin. However, in some cases it may be beneficial to have them, so we need a config which can enable that.
> The default should be false, which maintains the existing behavior.
[jira] [Updated] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins
[ https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-17283:
----------------------------------
    Labels: TODOC3.0  (was: )

> Enable parallel edges of semijoin along with mapjoins
> -----------------------------------------------------
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
> Issue Type: Bug
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17283.1.patch, HIVE-17283.2.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of semijoin with mapjoin. However, in some cases it may be beneficial to have them, so we need a config which can enable that.
> The default should be false, which maintains the existing behavior.
[jira] [Commented] (HIVE-17240) Function ACOS(2) and ASIN(2) should be null
[ https://issues.apache.org/jira/browse/HIVE-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124472#comment-16124472 ]

Yuming Wang commented on HIVE-17240:
------------------------------------

[~sershe], is it ready to be committed? Thanks.

> Function ACOS(2) and ASIN(2) should be null
> -------------------------------------------
>
> Key: HIVE-17240
> URL: https://issues.apache.org/jira/browse/HIVE-17240
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 1.1.1, 1.2.2, 2.2.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Attachments: HIVE-17240.1.patch, HIVE-17240.2.patch, HIVE-17240.3.patch, HIVE-17240.4.patch, HIVE-17240.5.patch, HIVE-17240.6.patch
>
>
> {{acos(2)}} should be null, same as MySQL:
> {code:sql}
> hive> desc function extended acos;
> OK
> acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise
> Example:
>   > SELECT acos(1) FROM src LIMIT 1;
>   0
>   > SELECT acos(2) FROM src LIMIT 1;
>   NULL
> Time taken: 0.009 seconds, Fetched: 6 row(s)
> hive> select acos(2);
> OK
> NaN
> Time taken: 0.437 seconds, Fetched: 1 row(s)
> {code}
> {code:sql}
> mysql> select acos(2);
> +---------+
> | acos(2) |
> +---------+
> |    NULL |
> +---------+
> 1 row in set (0.00 sec)
> {code}
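[Editor's note] The mismatch above comes from Java: `Math.acos` returns `NaN` for inputs outside [-1, 1], and Hive surfaces that `NaN` instead of SQL NULL. The sketch below shows the gist of the fix with a hypothetical helper (`acosOrNull` is not Hive's actual UDF code): map out-of-domain inputs to Java `null`, which Hive renders as NULL.

```java
public class AcosNullSketch {
    // Hypothetical helper mirroring the fix: an out-of-domain input yields
    // SQL NULL (Java null) instead of NaN.
    static Double acosOrNull(double x) {
        return (x < -1.0 || x > 1.0) ? null : Math.acos(x);
    }

    public static void main(String[] args) {
        System.out.println(Math.acos(2));    // NaN: the behavior being fixed
        System.out.println(acosOrNull(2.0)); // null: the desired result
        System.out.println(acosOrNull(1.0)); // 0.0: in-domain values unchanged
    }
}
```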
[jira] [Commented] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124473#comment-16124473 ]

Yuming Wang commented on HIVE-15794:
------------------------------------

[~sershe], is it ready to be committed? Thanks.

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --------------------------------------------------------------
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 1.2.0, 1.1.0, 2.2.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch, HIVE-15794.3.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] session.SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
> 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] session.SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
> {noformat}
> We can't get an hdfsEncryptionShim if the {{FileSystem}} is a [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html]; we should support it.
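[Editor's note] The log above suggests a scheme gate: the encryption shim is only built when the filesystem scheme is `hdfs`, so `viewfs://` paths are skipped even when the mount point ultimately resolves to an encrypted HDFS directory. A minimal sketch of that gate (the method name is illustrative, not Hive's actual code):

```java
public class EncryptionShimScheme {
    // Illustrative sketch of the check implied by the log message: the shim
    // is only applicable to "hdfs", which is why viewfs tables fall through.
    // The fix would resolve the viewfs mount to its underlying filesystem
    // before applying this check.
    static boolean supportsEncryptionShim(String scheme) {
        return "hdfs".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(supportsEncryptionShim("hdfs"));   // shim is created
        System.out.println(supportsEncryptionShim("viewfs")); // the gap this issue fixes
    }
}
```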
[jira] [Commented] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124450#comment-16124450 ]

Hive QA commented on HIVE-17302:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881552/HIVE-17302.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11004 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6363/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6363/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6363/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881552 - PreCommit-HIVE-Build

> ReduceRecordSource should not add batch string to Exception message
> -------------------------------------------------------------------
>
> Key: HIVE-17302
> URL: https://issues.apache.org/jira/browse/HIVE-17302
> Project: Hive
> Issue Type: Bug
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Attachments: HIVE-17302.patch, stack.txt
>
>
> ReduceRecordSource is adding the batch data as a string to the exception stack; this can lead to an OOM of the query AM when the query fails due to another issue.
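[Editor's note] The failure mode described above can be sketched as follows. This is an illustrative pattern, not Hive's actual code: instead of concatenating `batch.toString()` (which can be enormous and keeps the whole rendered batch alive in the AM heap while the exception propagates), the wrapper reports only small, cheap metadata about the failing row.

```java
public class ExceptionMessageSketch {
    // Illustrative fix pattern: report cheap identifiers, never the batch body.
    static RuntimeException wrapFailure(int batchSize, long rowNumber, Throwable cause) {
        return new RuntimeException(
            "Hive Runtime Error while processing row (batchSize=" + batchSize
                + ", row=" + rowNumber + ")", cause);
    }

    public static void main(String[] args) {
        RuntimeException e = wrapFailure(1024, 42, new IllegalStateException("boom"));
        // The message stays a few dozen bytes regardless of batch contents.
        System.out.println(e.getMessage());
    }
}
```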
[jira] [Commented] (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124447#comment-16124447 ]

Lefty Leverenz commented on HIVE-1941:
--------------------------------------

Quite right, this is only documented as a design doc and the DDL doc doesn't even have a link to it. There's also a Views design doc that has information not covered in the DDL section.

* [DDL -- Create/Drop/Alter View | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView]
* [Design Docs -- Partitioned Views | https://cwiki.apache.org/confluence/display/Hive/PartitionedViews]
* [Design Docs -- Views | https://cwiki.apache.org/confluence/display/Hive/ViewDev]

By tracking, do you mean a TODOC label? We'd have to create a new one -- either a generic "TODOC" label or a version-specific one for 0.8.0, which would be TODOC8 for consistency with the other pre-1.0.0 labels. (I'm inclined to avoid label proliferation with a generic label.)

> support explicit view partitioning
> ----------------------------------
>
> Key: HIVE-1941
> URL: https://issues.apache.org/jira/browse/HIVE-1941
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor, Views
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.8.0
>
> Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch, HIVE-1941.5.patch
>
>
> Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
> For more information, see
> (obsolete: http://wiki.apache.org/hadoop/Hive/PartitionedViews)
> https://cwiki.apache.org/confluence/display/Hive/PartitionedViews
[jira] [Updated] (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-1941:
---------------------------------
    Description:
        Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
        For more information, see
        (obsolete: http://wiki.apache.org/hadoop/Hive/PartitionedViews)
        https://cwiki.apache.org/confluence/display/Hive/PartitionedViews

    was:
        Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
        For more information, see
        http://wiki.apache.org/hadoop/Hive/PartitionedViews

> support explicit view partitioning
> ----------------------------------
>
> Key: HIVE-1941
> URL: https://issues.apache.org/jira/browse/HIVE-1941
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor, Views
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.8.0
>
> Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch, HIVE-1941.5.patch
>
>
> Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
> For more information, see
> (obsolete: http://wiki.apache.org/hadoop/Hive/PartitionedViews)
> https://cwiki.apache.org/confluence/display/Hive/PartitionedViews
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124425#comment-16124425 ]

Hive QA commented on HIVE-14731:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881548/HIVE-14731.19.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 46 failed/errored test(s), 11008 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join0] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join29] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join30] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_filters] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_nulls] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_12] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_1] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_2] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_4] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_product_check_2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[empty_join] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[jdbc_handler] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[leftsemijoin] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_null_agg] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_between_columns] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_all] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_mapjoin] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_include_no_sel] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join30] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_filters] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_nulls] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_multi_output_select] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_1] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.t
[jira] [Comment Edited] (HIVE-17265) Cache merged column stats from retrieved partitions
[ https://issues.apache.org/jira/browse/HIVE-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124404#comment-16124404 ]

Jesus Camacho Rodriguez edited comment on HIVE-17265 at 8/12/17 2:17 AM:
-------------------------------------------------------------------------

[~ashutoshc], sure, I have created it at https://reviews.apache.org/r/61604/

Thanks

was (Author: jcamachorodriguez):
Sure, I have created it at https://reviews.apache.org/r/61604/

Thanks

> Cache merged column stats from retrieved partitions
> ---------------------------------------------------
>
> Key: HIVE-17265
> URL: https://issues.apache.org/jira/browse/HIVE-17265
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 3.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17265.02.patch, HIVE-17265.patch
>
>
> Currently, when we retrieve stats from the metastore for a column in a partitioned table, we execute the logic to merge the column stats coming from each partition multiple times.
> Even though we avoid multiple calls to the metastore if the cache for the stats is enabled, merging the stats for a given column can take a large amount of time if there is a large number of partitions.
[jira] [Commented] (HIVE-17265) Cache merged column stats from retrieved partitions
[ https://issues.apache.org/jira/browse/HIVE-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124404#comment-16124404 ]

Jesus Camacho Rodriguez commented on HIVE-17265:
------------------------------------------------

Sure, I have created it at https://reviews.apache.org/r/61604/

Thanks

> Cache merged column stats from retrieved partitions
> ---------------------------------------------------
>
> Key: HIVE-17265
> URL: https://issues.apache.org/jira/browse/HIVE-17265
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 3.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17265.02.patch, HIVE-17265.patch
>
>
> Currently, when we retrieve stats from the metastore for a column in a partitioned table, we execute the logic to merge the column stats coming from each partition multiple times.
> Even though we avoid multiple calls to the metastore if the cache for the stats is enabled, merging the stats for a given column can take a large amount of time if there is a large number of partitions.
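[Editor's note] The caching idea in this issue can be sketched as a memoization of the merge result keyed by column and partition set. Everything below is illustrative (class, key format, and the toy NDV merge are invented for the example, not Hive's actual stats-merging code); the point is that the expensive merge runs once per key and subsequent lookups reuse it.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MergedStatsCache {
    // Hypothetical cache: key = column + partition set, value = merged stat.
    private final Map<String, Long> mergedNdv = new ConcurrentHashMap<>();

    long mergedNdvFor(String column, List<String> partitions,
                      Map<String, Long> ndvPerPartition) {
        String key = column + "|" + String.join(",", partitions);
        // computeIfAbsent runs the (potentially expensive) merge only once
        // per key; later calls with the same partition set hit the cache.
        return mergedNdv.computeIfAbsent(key, k ->
            // Toy merge: take the max NDV across partitions as an upper bound.
            partitions.stream()
                .mapToLong(p -> ndvPerPartition.getOrDefault(p, 0L))
                .max().orElse(0L));
    }

    public static void main(String[] args) {
        MergedStatsCache cache = new MergedStatsCache();
        Map<String, Long> ndv = new java.util.HashMap<>();
        ndv.put("ds=2017-08-11", 120L);
        ndv.put("ds=2017-08-12", 150L);
        long merged = cache.mergedNdvFor("user_id",
            java.util.Arrays.asList("ds=2017-08-11", "ds=2017-08-12"), ndv);
        System.out.println("merged NDV upper bound: " + merged);
    }
}
```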
[jira] [Commented] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124401#comment-16124401 ]

Hive QA commented on HIVE-17301:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881518/HIVE-17301.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11004 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6361/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6361/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6361/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881518 - PreCommit-HIVE-Build

> Make JSONMessageFactory.getTObj method thread safe
> --------------------------------------------------
>
> Key: HIVE-17301
> URL: https://issues.apache.org/jira/browse/HIVE-17301
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Tao Li
> Assignee: Tao Li
> Attachments: HIVE-17301.1.patch
>
>
> This static method uses a singleton instance of TDeserializer, which is not thread safe. Instead we want to create a new instance per method call. The class is lightweight, so this should be fine from a performance perspective.
[jira] [Commented] (HIVE-17265) Cache merged column stats from retrieved partitions
[ https://issues.apache.org/jira/browse/HIVE-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124383#comment-16124383 ]

Ashutosh Chauhan commented on HIVE-17265:
-----------------------------------------

Can you please create a RB for this? Got some minor comments.

> Cache merged column stats from retrieved partitions
> ---------------------------------------------------
>
> Key: HIVE-17265
> URL: https://issues.apache.org/jira/browse/HIVE-17265
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 3.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17265.02.patch, HIVE-17265.patch
>
>
> Currently, when we retrieve stats from the metastore for a column in a partitioned table, we execute the logic to merge the column stats coming from each partition multiple times.
> Even though we avoid multiple calls to the metastore if the cache for the stats is enabled, merging the stats for a given column can take a large amount of time if there is a large number of partitions.
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitor for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124382#comment-16124382 ]

Prasanth Jayachandran commented on HIVE-17304:
----------------------------------------------

This provides better estimates, and the estimates are pretty close to the actual object size (observed from heap dumps), at least for the vectorized case. Also bringing down the inflation factor from 2.0 to 1.5 as a result. Still testing this patch on a larger dataset.

> ThreadMXBean based memory allocation monitor for hash table loader
> ------------------------------------------------------------------
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on the Java data model, which can be unreliable for various reasons (wrong object size estimation, adding new variables to a class without accounting for their size in memory monitoring, etc.). We can instead use the per-thread allocation size provided by ThreadMXBean, and fall back to the data model in case the JDK doesn't support thread-based allocation tracking.
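[Editor's note] The ThreadMXBean approach described above can be sketched as follows. The `com.sun.management.ThreadMXBean` methods (`isThreadAllocatedMemorySupported`, `isThreadAllocatedMemoryEnabled`, `getThreadAllocatedBytes`) are the real JDK API; the surrounding structure and the negative-return fallback convention are illustrative, not Hive's actual patch.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {
    // Prefer per-thread allocation counters when the JDK exposes them;
    // return a negative value to signal the caller to fall back to
    // data-model-based size estimation.
    static long threadAllocatedBytesOrNegative() {
        java.lang.management.ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean sun = (com.sun.management.ThreadMXBean) mx;
            if (sun.isThreadAllocatedMemorySupported()
                && sun.isThreadAllocatedMemoryEnabled()) {
                return sun.getThreadAllocatedBytes(Thread.currentThread().getId());
            }
        }
        return -1L; // caller falls back to the Java data model estimate
    }

    public static void main(String[] args) {
        long before = threadAllocatedBytesOrNegative();
        byte[] chunk = new byte[1 << 20]; // allocate ~1 MiB on this thread
        long after = threadAllocatedBytesOrNegative();
        if (before < 0) {
            System.out.println("thread allocation tracking not supported; falling back");
        } else {
            // delta reflects this thread's allocations, including `chunk`
            System.out.println("allocated since probe: " + (after - before)
                + " bytes (chunk=" + chunk.length + ")");
        }
    }
}
```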
[jira] [Updated] (HIVE-17304) ThreadMXBean based memory allocation monitor for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-17304:
-----------------------------------------
    Attachment: HIVE-17304.1.patch

> ThreadMXBean based memory allocation monitor for hash table loader
> ------------------------------------------------------------------
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on the Java data model, which can be unreliable for various reasons (wrong object size estimation, adding new variables to a class without accounting for their size in memory monitoring, etc.). We can instead use the per-thread allocation size provided by ThreadMXBean, and fall back to the data model in case the JDK doesn't support thread-based allocation tracking.
[jira] [Updated] (HIVE-17304) ThreadMXBean based memory allocation monitoring for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17304: - Status: Patch Available (was: Open) > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17304.1.patch > > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17268) WebUI / QueryPlan: query plan is sometimes null when explain output conf is on
[ https://issues.apache.org/jira/browse/HIVE-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124375#comment-16124375 ] Hive QA commented on HIVE-17268: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881481/HIVE-17268.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11004 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6360/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6360/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6360/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881481 - PreCommit-HIVE-Build > WebUI / QueryPlan: query plan is sometimes null when explain output conf is on > -- > > Key: HIVE-17268 > URL: https://issues.apache.org/jira/browse/HIVE-17268 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Attachments: HIVE-17268.2.patch, HIVE-17268.3.patch, HIVE-17268.patch > > > The Hive WebUI's Query Plan tab displays "SET hive.log.explain.output TO true > TO VIEW PLAN" even when hive.log.explain.output is set to true, when the > query cannot be compiled, because the plan is null in this case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors
[ https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17286: --- Attachment: HIVE-17286.03.patch > Avoid expensive String serialization/deserialization for bitvectors > --- > > Key: HIVE-17286 > URL: https://issues.apache.org/jira/browse/HIVE-17286 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17286.01.patch, HIVE-17286.02.patch, > HIVE-17286.03.patch, HIVE-17286.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors
[ https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17286: --- Attachment: HIVE-17286.02.patch > Avoid expensive String serialization/deserialization for bitvectors > --- > > Key: HIVE-17286 > URL: https://issues.apache.org/jira/browse/HIVE-17286 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17286.01.patch, HIVE-17286.02.patch, > HIVE-17286.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124310#comment-16124310 ] Hive QA commented on HIVE-16873: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881476/HIVE-16873.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11003 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=228) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6359/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6359/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6359/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881476 - PreCommit-HIVE-Build > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let let the logging framework itself handle these kind of details and ditch > the buffer per thread. > {code} > public static final String AUDIT_FORMAT = > "ugi=%s\t" + // ugi > "ip=%s\t" + // remote IP > "cmd=%s\t"; // command > public static final Logger auditLog = LoggerFactory.getLogger( > HiveMetaStore.class.getName() + ".audit"); > private static final ThreadLocal auditFormatter = > new ThreadLocal() { > @Override > protected Formatter initialValue() { > return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * > 4)); > } > }; > ... > private static final void logAuditEvent(String cmd) { > final Formatter fmt = auditFormatter.get(); > ((StringBuilder) fmt.out()).setLength(0); > String address = getIPAddress(); > if (address == null) { > address = "unknown-ip-addr"; > } > auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(), > address, cmd).toString()); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
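The suggestion in HIVE-16873 — drop the `ThreadLocal<Formatter>` and format per call — can be sketched as follows. This is a hypothetical standalone class, not the actual HiveMetaStore patch; the point is that a plain per-call format has no shared mutable state to cache per thread:

```java
public class AuditFormat {
    // Same tab-separated layout as the original AUDIT_FORMAT constant.
    static final String AUDIT_FORMAT = "ugi=%s\tip=%s\tcmd=%s\t";

    // One String.format call per audit event; the logging framework (or an
    // SLF4J parameterized message) can take it from here. No ThreadLocal,
    // no buffer reset, nothing tied to thread lifetime.
    static String formatAuditEvent(String ugi, String ip, String cmd) {
        return String.format(AUDIT_FORMAT,
                ugi,
                ip == null ? "unknown-ip-addr" : ip,  // same fallback as the original
                cmd);
    }

    public static void main(String[] args) {
        System.out.println(formatAuditEvent("hive", null, "get_table"));
    }
}
```

The per-call allocation is tiny relative to the rest of a metastore operation, which is the argument for letting the logging framework own this detail.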
[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
[ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-17148: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks, Vlad! > Incorrect result for Hive join query with COALESCE in WHERE condition > - > > Key: HIVE-17148 > URL: https://issues.apache.org/jira/browse/HIVE-17148 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.1 >Reporter: Vlad Gudikov >Assignee: Vlad Gudikov > Fix For: 3.0.0 > > Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, > HIVE-17148.3.patch, HIVE-17148.patch > > > The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo > enabled: > STEPS TO REPRODUCE: > {code} > Step 1: Create a table ct1 > create table ct1 (a1 string,b1 string); > Step 2: Create a table ct2 > create table ct2 (a2 string); > Step 3 : Insert following data into table ct1 > insert into table ct1 (a1) values ('1'); > Step 4 : Insert following data into table ct2 > insert into table ct2 (a2) values ('1'); > Step 5 : Execute the following query > select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; > {code} > ACTUAL RESULT: > {code} > The query returns nothing; > {code} > EXPECTED RESULT: > {code} > 1 NULL1 > {code} > The issue seems to be because of the incorrect query plan. In the plan we can > see: > predicate:(a1 is not null and b1 is not null) > which does not look correct. As a result, it is filtering out all the rows is > any column mentioned in the COALESCE has null value. > Please find the query plan below: > {code} > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Map 2 (BROADCAST_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 > File Output Operator [FS_10] > Map Join Operator [MAPJOIN_15] (rows=1 width=4) > > Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] > <-Map 2 [BROADCAST_EDGE] > BROADCAST [RS_7] > PartitionCols:_col0 > Select Operator [SEL_5] (rows=1 width=1) > Output:["_col0"] > Filter Operator [FIL_14] (rows=1 width=1) > predicate:a2 is not null > TableScan [TS_3] (rows=1 width=1) > default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"] > <-Select Operator [SEL_2] (rows=1 width=4) > Output:["_col0","_col1"] > Filter Operator [FIL_13] (rows=1 width=4) > predicate:(a1 is not null and b1 is not null) > TableScan [TS_0] (rows=1 width=4) > default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"] > {code} > This happens only if join is inner type, otherwise HiveJoinAddNotRule which > creates this problem is skipped. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
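The null-semantics mistake in the plan above can be demonstrated outside Hive. This sketch (hypothetical names) contrasts the over-strict inferred predicate with the sound one, using the repro row `(a1='1', b1=NULL)` joined against `a2='1'`:

```java
public class CoalescePredicate {
    // SQL COALESCE over two values: first non-null argument.
    static String coalesce(String a, String b) {
        return a != null ? a : b;
    }

    public static void main(String[] args) {
        // Row from ct1: a1='1', b1=NULL; row from ct2: a2='1'.
        String a1 = "1", b1 = null, a2 = "1";

        // The plan's inferred predicate requires BOTH columns non-null,
        // so this row is filtered out before the join ever runs.
        boolean inferredPredicate = (a1 != null && b1 != null);

        // The sound inference from COALESCE(a1,b1)=a2 is only that the
        // coalesced value itself is non-null.
        boolean soundPredicate = coalesce(a1, b1) != null;
        boolean rowJoins = soundPredicate && coalesce(a1, b1).equals(a2);

        System.out.println(inferredPredicate + " " + rowJoins); // false true
    }
}
```

This matches the expected result in the description: the row survives under the correct semantics and is lost under the per-column not-null rewrite.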
[jira] [Commented] (HIVE-17241) Change metastore classes to not use the shims
[ https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124271#comment-16124271 ] ASF GitHub Bot commented on HIVE-17241: --- GitHub user alanfgates opened a pull request: https://github.com/apache/hive/pull/228 HIVE-17241 Removed shims from metastore. For HDFS and getPassword I just access… … those operations directly. I copied all of the HadoopThriftAuthBridge stuff over from Hive common. You can merge this pull request into a Git repository by running: $ git pull https://github.com/alanfgates/hive hive17241 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #228 > Change metastore classes to not use the shims > - > > Key: HIVE-17241 > URL: https://issues.apache.org/jira/browse/HIVE-17241 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > As part of moving the metastore into a standalone package, it will no longer > have access to the shims. This means we need to either copy them or access > the underlying Hadoop operations directly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17304) ThreadMXBean based memory allocation monitoring for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-17304: > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by
[ https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124263#comment-16124263 ] Xuefu Zhang commented on HIVE-17287: [~kellyzly], just curious, what error did you get for the failed tasks? Memory related? > HoS can not deal with skewed data group by > -- > > Key: HIVE-17287 > URL: https://issues.apache.org/jira/browse/HIVE-17287 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: query67-fail-at-groupby.png, > query67-groupby_shuffle_metric.png > > > In > [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql], > fact table {{store_sales}} joins with small tables {{date_dim}}, > {{item}},{{store}}. After join, groupby the intermediate data. > Here the data of {{store_sales}} on 3TB tpcds is skewed: there are 1824 > partitions. The biggest partition is 25.7G and others are 715M. > {code} > hadoop fs -du -h > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales > > 715.0 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639 > 713.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640 > 714.1 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641 > 712.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642 > 25.7 G > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__ > {code} > The skewed table {{store_sales}} caused the failed job. Is there any way to > solve the groupby problem of skewed table? I tried to enable > {{hive.groupby.skewindata}} to first divide the data more evenly then start > do group by. But the job still hangs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124224#comment-16124224 ] Hive QA commented on HIVE-17289: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881462/HIVE-17289.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_join_partition_key] (batchId=13) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/6358/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6358/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6358/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881462 - PreCommit-HIVE-Build > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
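Until the default described above changes, the doAs user can be set explicitly in hive-site.xml. This fragment simply restates the property named in the description with the proposed "hive" value — shown for illustration, not as the committed default:

```xml
<property>
  <name>hive.distcp.privileged.doAs</name>
  <value>hive</value>
  <description>User under which privileged distcp runs for REPL DUMP/LOAD.</description>
</property>
```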
[jira] [Updated] (HIVE-17274) RowContainer spills for timestamp column throws exception
[ https://issues.apache.org/jira/browse/HIVE-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17274: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) Test failures are not related to this patch. Committed patch to master and branch-2. Thanks for the review! > RowContainer spills for timestamp column throws exception > - > > Key: HIVE-17274 > URL: https://issues.apache.org/jira/browse/HIVE-17274 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 3.0.0, 2.4.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17274.1.patch > > > Path names cannot contain ":" (HADOOP-3257) > Join key toString() is used as part of filename. > https://github.com/apache/hive/blob/16bfb9c9405b68a24c7e6c1b13bec00e38bbe213/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java#L523 > If join key is timestamp column then this will throw following exception. 
> {code} > 2017-08-05 23:51:33,631 ERROR [main] > org.apache.hadoop.hive.ql.exec.persistence.RowContainer: > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: .RowContainer7551143976922371245.[1792453531, > 2016-09-02 01:17:43,%202016-09-02%5D.tmp.crc > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: .RowContainer7551143976922371245.[1792453531, > 2016-09-02 01:17:43,%202016-09-02%5D.tmp.crc > at org.apache.hadoop.fs.Path.initialize(Path.java:205) > at org.apache.hadoop.fs.Path.(Path.java:171) > at org.apache.hadoop.fs.Path.(Path.java:93) > at > org.apache.hadoop.fs.ChecksumFileSystem.getChecksumFile(ChecksumFileSystem.java:94) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:404) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:463) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:926) > at > org.apache.hadoop.io.SequenceFile$Writer.(SequenceFile.java:1137) > at > org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273) > at > org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530) > at > org.apache.hadoop.hive.ql.exec.Utilities.createSequenceWriter(Utilities.java:1643) > at > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat.getHiveRecordWriter(HiveSequenceFileOutputFormat.java:64) > at > org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:243) > at > org.apache.hadoop.hive.ql.exec.persistence.RowContainer.setupWriter(RowContainer.java:538) > at > org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:299) > at > org.apache.hadoop.hive.ql.exec.persistence.RowContainer.copyToDFSDirecory(RowContainer.java:407) > at > org.apache.hadoop.hive.ql.exec.SkewJoinHandler.endGroup(SkewJoinHandler.java:185) > at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:249) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:195) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > .RowContainer7551143976922371245.[1792453531, 2016-09-02 > 01:17:43,%202016-09-02%5D.tmp.crc > at java.net.URI.checkPath(URI.java:1823) > at java.net.URI.(URI.java:745) > at org.apache.hadoop.fs.Path.initialize(Path.java:202) > ... 26 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
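Because `org.apache.hadoop.fs.Path` rejects ':' in path components (HADOOP-3257), any key-derived string must be cleaned before being embedded in a spill-file name. A hedged sketch — the replacement scheme here is illustrative, not necessarily the fix the committed patch uses:

```java
public class SpillFileName {
    // Replace every character outside a conservative filename-safe set with
    // '_'. Timestamp join keys like "2016-09-02 01:17:43" contain ':' and
    // ' ', both of which would otherwise end up in the spill-file name.
    static String sanitize(String joinKey) {
        return joinKey.replaceAll("[^a-zA-Z0-9_.-]", "_");
    }

    public static void main(String[] args) {
        // The key shape from the stack trace above (timestamp column in the key).
        String key = "[1792453531, 2016-09-02 01:17:43, 2016-09-02]";
        System.out.println("RowContainer" + sanitize(key) + ".tmp");
    }
}
```

Alternatives with the same effect include hashing the key instead of embedding it, which also bounds the filename length.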
[jira] [Commented] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124206#comment-16124206 ] Gopal V commented on HIVE-17303: [~bslim]: do you know the versions mismatched between Druid & Tez? Tez will upgrade eventually, but it would be good to know the versions. > Missmatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17303.patch > > > {code} > > Caused by: java.util.concurrent.ExecutionException: > java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at > org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165) > ... 
25 more > Caused by: java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65) > at > org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88) > at > org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348) > at > org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385) > at > org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44) > ... 4 more > ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] > killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to > VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2) > Options > Attachments > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124192#comment-16124192 ] Eugene Koifman commented on HIVE-17089: --- patch 11 - fix remaining 3 failures in TestInputOutputFormat > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > table take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17089: -- Attachment: HIVE-17089.11.patch > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > table take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17303: -- Attachment: HIVE-17303.patch > Missmatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17303.patch > > > {code} > > Caused by: java.util.concurrent.ExecutionException: > java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at > org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165) > ... 
25 more > Caused by: java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65) > at > org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88) > at > org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348) > at > org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385) > at > org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44) > ... 4 more > ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] > killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to > VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2) > Options > Attachments > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17303: -- Status: Patch Available (was: Open) > Mismatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > > (description elided; identical to the NoSuchMethodError stack trace quoted in the first HIVE-17303 notification above) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra reassigned HIVE-17303: - > Mismatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > > (description elided; identical to the NoSuchMethodError stack trace quoted in the first HIVE-17303 notification above) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
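A `NoSuchMethodError` on a method that exists in the sources you compiled against is the classic symptom of two versions of one library on the classpath, with the older jar (here, the roaring bitmap pulled in by tez) shadowing the one druid expects. A small reflection probe can confirm which jar a class was loaded from and whether the expected method is present. This is a generic diagnostic sketch, not part of the patch; `java.util.BitSet` stands in for the bitmap class so the demo is self-contained.

```java
import java.security.CodeSource;

public class BitmapProbe {
    // Report where a class was loaded from and whether it has the expected
    // zero-arg method; a missing method usually means an older jar shadowed
    // the version you built against.
    static String probe(String className, String methodName) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                    ? "bootstrap classpath" : src.getLocation().toString();
            c.getMethod(methodName); // throws if the loaded version lacks it
            return methodName + " found; class loaded from " + where;
        } catch (ClassNotFoundException e) {
            return "class not on classpath: " + className;
        } catch (NoSuchMethodException e) {
            return "method missing (version mismatch?): " + methodName;
        }
    }

    public static void main(String[] args) {
        // BitSet stands in for MutableRoaringBitmap; on a real cluster you
        // would probe "org.roaringbitmap.buffer.MutableRoaringBitmap" and
        // "runOptimize" from within the Tez task classpath.
        System.out.println(probe("java.util.BitSet", "cardinality"));
        System.out.println(probe("java.util.BitSet", "runOptimize"));
    }
}
```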
[jira] [Commented] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124183#comment-16124183 ] Prasanth Jayachandran commented on HIVE-17302: -- +1, pending tests > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17302.patch, stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17194) JDBC: Implement Gzip servlet filter
[ https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124172#comment-16124172 ] Thejas M Nair commented on HIVE-17194: -- Patch looks good. +1 More information from offline discussion with [~gopalv] - tests show +4% CPU for a ~3x reduction in size with TPC-DS customer_demographics. Overall time remained the same. > JDBC: Implement Gzip servlet filter > --- > > Key: HIVE-17194 > URL: https://issues.apache.org/jira/browse/HIVE-17194 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, JDBC >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-17194.1.patch, HIVE-17194.2.patch, > HIVE-17194.3.patch > > > {code} > POST /cliservice HTTP/1.1 > Content-Type: application/x-thrift > Accept: application/x-thrift > User-Agent: Java/THttpClient/HC > Authorization: Basic YW5vbnltb3VzOmFub255bW91cw== > Content-Length: 71 > Host: localhost:10007 > Connection: Keep-Alive > Accept-Encoding: gzip,deflate > X-XSRF-HEADER: true > {code} > The Beeline client clearly sends out HTTP compression headers which are > ignored by the HTTP service layer in HS2. > After the patch, the response looks like > {code} > HTTP/1.1 200 OK > Date: Tue, 01 Aug 2017 01:47:23 GMT > Content-Type: application/x-thrift > Vary: Accept-Encoding, User-Agent > Content-Encoding: gzip > Transfer-Encoding: chunked > Server: Jetty(9.3.8.v20160314) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
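The trade-off quoted above (a few percent of CPU for roughly 3x smaller responses) reflects how well repetitive thrift row data compresses. It is easy to reproduce with the JDK's own gzip streams; this is a standalone sketch of the codec cost, not the HS2 servlet filter itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compress a byte array with gzip, the same codec Jetty's filter applies.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Decompress, as the JDBC client side would.
    static byte[] gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive rows, like a result set of low-cardinality columns
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("customer_demographics,row,").append(i % 10).append('\n');
        }
        byte[] plain = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(plain);
        System.out.println("plain=" + plain.length + " bytes, gzip=" + packed.length + " bytes");
    }
}
```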
[jira] [Commented] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124164#comment-16124164 ] slim bouguerra commented on HIVE-17302: --- [~ashutoshc] can you check this out? > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17302.patch, stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17302: -- Attachment: HIVE-17302.patch > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17302.patch, stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17302: -- Assignee: slim bouguerra Status: Patch Available (was: Open) > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
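The direction of the fix is to bound how much row context ever reaches an exception message, so a failing query cannot balloon the AM heap with a giant batch string. The helper below is a hypothetical sketch of that idea; `SafeMessages` and the `MAX_DETAIL` limit are assumptions for illustration, not code from the actual patch.

```java
public class SafeMessages {
    // Assumption for this sketch: 1 KB of batch context is enough to debug
    // with, while an unbounded toString() of a whole row batch is what can
    // OOM the query AM.
    static final int MAX_DETAIL = 1024;

    // Return the detail string unchanged if small, otherwise cut it off and
    // note how much was dropped.
    static String truncated(String detail) {
        if (detail == null || detail.length() <= MAX_DETAIL) {
            return detail;
        }
        int dropped = detail.length() - MAX_DETAIL;
        return detail.substring(0, MAX_DETAIL) + "... (" + dropped + " chars dropped)";
    }

    public static void main(String[] args) {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            batch.append("col").append(i).append(',');
        }
        // The message stays small no matter how large the batch grew.
        String msg = "error processing row: " + truncated(batch.toString());
        System.out.println("message length: " + msg.length());
    }
}
```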
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124139#comment-16124139 ] Zhiyuan Yang commented on HIVE-14731: - [~hagleitn] Non deterministic behavior should come from multiple cross product reducers > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.
[ https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124135#comment-16124135 ] Thejas M Nair commented on HIVE-17218: -- +1 pending test verification > Canonical-ize hostnames for Hive metastore, and HS2 servers. > > > Key: HIVE-17218 > URL: https://issues.apache.org/jira/browse/HIVE-17218 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore, Security >Affects Versions: 1.2.2, 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17218.1.patch > > > Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not > canonical-ize the hostnames of the metastore/HS2 servers. In deployments > where there are multiple such servers behind a VIP, this causes a number of > inconveniences: > # The client-side configuration (e.g. {{hive.metastore.uris}} in > {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a > simplified CNAME, in the thrift URL. If the > {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees > GSS failures as follows: > {noformat} > hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net > --hiveconf > hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789" > ... > Exception in thread "main" java.lang.RuntimeException: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > ... > {noformat} > This is because {{_HOST}} is filled in with the CNAME, and not the > canonicalized name. > # Oozie workflows that use HCat {{}} have to always use the VIP > hostname, and can't use {{_HOST}}-based service principals, if the CNAME > differs from the VIP name. 
> If the client-code simply canonical-ized the hostnames, it would enable the > use of both simplified CNAMEs, and _HOST in service principals. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
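The client-side change described above amounts to resolving the configured name to its canonical form before it is substituted for `_HOST` in the Kerberos principal. A minimal sketch of that resolution step (an illustrative helper, not the actual patch code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCanonicalizer {
    // Resolve a possibly-simplified CNAME (or VIP alias) to the canonical
    // hostname; fall back to the input if resolution fails, so misconfigured
    // names still produce the old behavior rather than an exception.
    static String canonicalize(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host;
        }
    }

    public static void main(String[] args) {
        // e.g. "simplified-hcat-cname.grid.myth.net" would resolve to the
        // VIP's canonical name, which then fills in _HOST correctly.
        System.out.println(canonicalize("localhost"));
    }
}
```

Note that `getCanonicalHostName()` performs a reverse lookup and never returns null; on failure it falls back to the textual IP address, which is why the principal ends up matching the server's keytab entry only when forward and reverse DNS agree.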
[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-14731: -- Attachment: HIVE-14731.19.patch [~aplusplus] i've rebased the patch. i'm getting non-deterministic results in cross_prod_1-4 btw. Have you ever seen this? > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Gunther Hagleitner > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reassigned HIVE-14731: - Assignee: Zhiyuan Yang (was: Gunther Hagleitner) > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reassigned HIVE-14731: - Assignee: Gunther Hagleitner (was: Zhiyuan Yang) > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Gunther Hagleitner > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.1.patch, HIVE-14731.2.patch, > HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, > HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17224) Move JDO classes to standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124060#comment-16124060 ] ASF GitHub Bot commented on HIVE-17224: --- Github user alanfgates closed the pull request at: https://github.com/apache/hive/pull/220 > Move JDO classes to standalone metastore > > > Key: HIVE-17224 > URL: https://issues.apache.org/jira/browse/HIVE-17224 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 3.0.0 > > Attachments: HIVE-17224.patch > > > The JDO model classes (MDatabase, MTable, etc.) and the package.jdo file that > defines the DB mapping need to be moved to the standalone metastore. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124052#comment-16124052 ] Daniel Dai commented on HIVE-17301: --- +1 pending test. > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17301.1.patch > > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from a perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
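The pattern behind the fix is general: a stateful helper that is cheap to construct but not thread safe should be created per call rather than shared as a static singleton. In the sketch below, `SimpleDateFormat` (another classically non-thread-safe JDK class) stands in for `TDeserializer`; the shape of the change, not the actual Hive code, is what is shown.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PerCallParser {
    // BAD (the pre-fix shape): one shared instance whose internal state is
    // mutated by every concurrent caller.
    // private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");

    // GOOD: a fresh instance per call. Construction is cheap, so the perf
    // cost is negligible and concurrent callers can no longer corrupt
    // each other's parse state.
    static Date parse(String s) throws ParseException {
        return new SimpleDateFormat("yyyy-MM-dd").parse(s);
    }

    static String format(Date d) {
        return new SimpleDateFormat("yyyy-MM-dd").format(d);
    }
}
```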
[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
[ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124042#comment-16124042 ] Hive QA commented on HIVE-17148: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881456/HIVE-17148.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 11003 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) 
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6357/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6357/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6357/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881456 - PreCommit-HIVE-Build > Incorrect result for Hive join query with COALESCE in WHERE condition > - > > Key: HIVE-17148 > URL: https://issues.apache.org/jira/browse/HIVE-17148 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.1 >Reporter: Vlad Gudikov >Assignee: Vlad Gudikov > Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, > HIVE-17148.3.patch, HIVE-17148.patch > > > The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo > enabled: > STEPS TO REPRODUCE: > {code} > Step 1: Create a table ct1 > create table ct1 (a1 string,b1 string); > Step 2: Create a table ct2 > create table ct2 (a2 string); > Step 3 : Insert following data into table ct1 > insert into table ct1 (a1) values ('1'); > Step 4 : Insert following data into table ct2 > insert into table ct2 (a2) values ('1'); > Step 5 : Execute the following query > select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; > {code} > ACTUAL RESULT: > {code} > The query returns nothing; > {code} > EXPECTED RESULT: > {code} > 1 NULL1 > {code} > The issue seems to be because of the incorrect query plan. In the plan we can > see: > predicate:(a1 is not null and b1 is not null) > which does not look correct. 
As a result, it is filtering out all the rows if > any column mentioned in the COALESCE has a null value. > Please find the query plan below: > {code} > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Map 2 (BROADCAST_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 > File Output Operator [FS_10] > Map Join Operator [MAPJOIN_15] (rows=1 width=4) > > Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] > <-Map 2 [BROADCAST_EDGE] > BROADCAST [RS_7] > PartitionCols:_col0 > Select Operator [SEL_5] (rows=1 width=1) > Output:["_col0"] > Filter Operator [FIL_14] (rows=1 width=1) > predicate:a2 is not null > TableScan [TS_3] (rows=1 width=1) >
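The wrongly derived predicate is easy to falsify by evaluating the COALESCE semantics directly: with a1='1', b1=NULL, a2='1' the join key is '1', so the row should survive, yet `b1 is not null` filters it out. Below is a plain-Java model of the SQL three-valued semantics; the class and method names are illustrative only.

```java
public class CoalesceJoinCheck {
    // SQL COALESCE: the first non-null argument, else null.
    static String coalesce(String... vals) {
        for (String v : vals) {
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    // The join condition COALESCE(a1, b1) = a2 under SQL three-valued logic:
    // a null on either side compares as unknown, i.e. the row does not match.
    static boolean joinMatches(String a1, String b1, String a2) {
        String key = coalesce(a1, b1);
        return key != null && a2 != null && key.equals(a2);
    }

    public static void main(String[] args) {
        // The repro row from the issue: a1='1', b1=NULL, a2='1'.
        // It matches, so the derived predicate "b1 is not null" is too strict.
        System.out.println(joinMatches("1", null, "1"));
    }
}
```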
[jira] [Updated] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins
[ https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17283: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > Enable parallel edges of semijoin along with mapjoins > - > > Key: HIVE-17283 > URL: https://issues.apache.org/jira/browse/HIVE-17283 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Fix For: 3.0.0 > > Attachments: HIVE-17283.1.patch, HIVE-17283.2.patch > > > https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of > semijoin with mapjoin. However, in some cases it maybe beneficial to have it. > We need a config which can enable it. > The default should be false which maintains the existing behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected
[ https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17281: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > LLAP external client not properly handling KILLED notification that occurs > when a fragment is rejected > -- > > Key: HIVE-17281 > URL: https://issues.apache.org/jira/browse/HIVE-17281 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Fix For: 3.0.0 > > Attachments: HIVE-17281.1.patch > > > When LLAP fragment submission is rejected, the external client receives both > REJECTED and KILLED notifications for the fragment. The KILLED notification > is being treated as an error, which prevents the retry logic from > resubmitting the fragment. This needs to be fixed in the client logic. > {noformat} > 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: > attempt_2519876382789748565_0005_0_00_21_0 > 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: > attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy. > 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - > attempt_2519876382789748565_0005_0_00_21_0 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17224) Move JDO classes to standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17224: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Patch committed. Thank you Vihang for the review. > Move JDO classes to standalone metastore > > > Key: HIVE-17224 > URL: https://issues.apache.org/jira/browse/HIVE-17224 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 3.0.0 > > Attachments: HIVE-17224.patch > > > The JDO model classes (MDatabase, MTable, etc.) and the package.jdo file that > defines the DB mapping need to be moved to the standalone metastore. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17302: -- Attachment: stack.txt > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra > Attachments: stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123943#comment-16123943 ] Alan Gates commented on HIVE-8472: -- +1 > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17256: Attachment: HIVE-17256.01.patch > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.01.patch, HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as no one uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority, and > preempts a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123907#comment-16123907 ] Sergey Shelukhin commented on HIVE-17256: - Will remove the wtf comment; it was just surprising. There's a test for queue ordering (the first one); not sure if scheduling tests apply here, that would be in the AM patch. If you mean LLAP scheduling tests, that seems to be covered by the added tests. Deadlock is possible with incorrect usage, similar to task priority inversions if they were to happen in the AM... > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as no one uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority, and > preempts a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
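The two-level ordering described in the issue, where any guaranteed task outranks any speculative one and tasks within a level keep their existing order, can be sketched as a comparator. The names and the FIFO tiebreak below are illustrative assumptions, not Hive's actual scheduler code.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TwoLevelQueueSketch {
    static final class Task {
        final String id;
        final boolean guaranteed;
        final long seq; // submission order, used as the within-level tiebreak

        Task(String id, boolean guaranteed, long seq) {
            this.id = id;
            this.guaranteed = guaranteed;
            this.seq = seq;
        }
    }

    // Guaranteed tasks sort strictly ahead of speculative ones; within a
    // level, earlier submissions run first.
    static final Comparator<Task> ORDER =
            Comparator.<Task>comparingInt(t -> t.guaranteed ? 0 : 1)
                      .thenComparingLong(t -> t.seq);

    public static void main(String[] args) {
        PriorityQueue<Task> q = new PriorityQueue<>(ORDER);
        q.add(new Task("spec-1", false, 1));
        q.add(new Task("spec-2", false, 2));
        q.add(new Task("guar-1", true, 3)); // submitted last, still runs first
        while (!q.isEmpty()) {
            System.out.println(q.poll().id);
        }
    }
}
```

Preemption would pick a victim by the same ordering in reverse: a newly arriving guaranteed task evicts the lowest-ranked running speculative task, which is also where the mentioned inversion-style deadlocks could arise if the flag were misused.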
[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123901#comment-16123901 ] Sergey Shelukhin commented on HIVE-12631: - Sorry, a couple more comments on RB. > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, > HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, > HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.24.patch, > HIVE-12631.25.patch, HIVE-12631.26.patch, HIVE-12631.2.patch, > HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, > HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, > HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123863#comment-16123863 ] Hive QA commented on HIVE-14746: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881451/HIVE-14746.03.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6356/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6356/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6356/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/PreCommit-HIVE-Build-6356/succeeded/33-TestCliDriver-ql_rewrite_gbtoidx.q-json_serde1.q-constantPropWhen.q-and-27-more, remoteFile=/home/hiveptest/35.192.6.159-hiveptest-1/logs/, getExitCode()=255, getException()=null, getUser()=hiveptest, getHost()=35.192.6.159, getInstance()=1]: 'Warning: Permanently added '35.192.6.159' (ECDSA) to the list of known hosts. 
receiving incremental file list ./ TEST-33-TestCliDriver-ql_rewrite_gbtoidx.q-json_serde1.q-constantPropWhen.q-and-27-more-TEST-org.apache.hadoop.hive.cli.TestCliDriver.xml 0 0%0.00kB/s0:00:00 8,630 100%8.23MB/s0:00:00 (xfr#1, to-chk=5/7) maven-test.txt 0 0%0.00kB/s0:00:00 47,461 100%1.33MB/s0:00:00 (xfr#2, to-chk=4/7) logs/ logs/derby.log 0 0%0.00kB/s0:00:00 996 100% 28.61kB/s0:00:00 (xfr#3, to-chk=1/7) logs/hive.log 0 0%0.00kB/s0:00:00 30,212,096 1% 28.81MB/s0:00:55 85,327,872 5% 40.71MB/s0:00:38 142,082,048 8% 45.20MB/s0:00:33 198,770,688 11% 47.43MB/s0:00:30 255,459,328 15% 53.74MB/s0:00:25 312,868,864 18% 54.29MB/s0:00:24 369,655,808 22% 54.30MB/s0:00:23 427,261,952 25% 54.48MB/s0:00:22 Timeout, server 35.192.6.159 not responding. rsync: connection unexpectedly closed (484318874 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.1] rsync: connection unexpectedly closed (441 bytes received so far) [generator] rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12881451 - PreCommit-HIVE-Build > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123835#comment-16123835 ] Xuefu Zhang commented on HIVE-17291: You're right. Thanks for the explanation. +1 to the patch. > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores and the client does not provide > information we should try to use the one provided by default. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17301: -- Status: Patch Available (was: Open) > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17301.1.patch > > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
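The fix pattern in HIVE-17301 — replace a shared singleton of a non-thread-safe class with a fresh instance per call — can be sketched without Thrift on the classpath. `java.text.SimpleDateFormat` (also not thread safe) stands in for `TDeserializer` here; the class and method names below are invented for illustration and are not the actual `JSONMessageFactory` code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of the fix pattern only: SimpleDateFormat stands in for Thrift's
// TDeserializer, since both carry internal mutable state and are unsafe to
// share across threads.
public class PerCallInstance {
    // BAD: a static singleton with internal mutable state, shared by every
    // thread that calls the static method — the shape HIVE-17301 removes.
    private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");

    // GOOD: a cheap, short-lived instance per call; no state is shared, so
    // no synchronization is needed and correctness no longer depends on callers.
    static Date parse(String s) throws ParseException {
        return new SimpleDateFormat("yyyy-MM-dd").parse(s);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("2017-08-11"));
    }
}
```

As the issue notes, this trade-off only works because the object is lightweight; for an expensive-to-construct helper, a `ThreadLocal` would be the usual alternative.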
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123826#comment-16123826 ] Peter Vary commented on HIVE-17291: --- Hi [~xuefuz], I might not see the whole picture here, but my intention was to modify only the case when the dynamic allocation is not enabled. The patch modifies only the {{SparkSessionImpl.getMemoryAndCores()}} method which is only used by {{SetSparkReducerParallelism.getSparkMemoryAndCores()}} method which looks like this: {code:title=SetSparkReducerParallelism} private void getSparkMemoryAndCores(OptimizeSparkProcContext context) throws SemanticException { if (sparkMemoryAndCores != null) { return; } if (context.getConf().getBoolean(SPARK_DYNAMIC_ALLOCATION_ENABLED, false)) { // If dynamic allocation is enabled, numbers for memory and cores are meaningless. So, we don't // try to get it. sparkMemoryAndCores = null; return; } [..] try { [..] sparkMemoryAndCores = sparkSession.getMemoryAndCores(); } catch (HiveException e) { [..] } } {code} If the above statements are true, then in case of dynamic allocation we do not use this data, and the number of reducers based only on the size of the data: {code:title=SetSparkReducerParallelism} @Override public Object process(Node nd, Stack stack, NodeProcessorCtx procContext, Object... nodeOutputs) throws SemanticException { [..] 
// Divide it by 2 so that we can have more reducers long bytesPerReducer = context.getConf().getLongVar(HiveConf.ConfVars.BYTESPERREDUCER) / 2; int numReducers = Utilities.estimateReducers(numberOfBytes, bytesPerReducer, maxReducers, false); getSparkMemoryAndCores(context); <-- In case of dynamic allocation this sets sparkMemoryAndCores to null if (sparkMemoryAndCores != null && sparkMemoryAndCores.getFirst() > 0 && sparkMemoryAndCores.getSecond() > 0) { // warn the user if bytes per reducer is much larger than memory per task if ((double) sparkMemoryAndCores.getFirst() / bytesPerReducer < 0.5) { LOG.warn("Average load of a reducer is much larger than its available memory. " + "Consider decreasing hive.exec.reducers.bytes.per.reducer"); } // If there are more cores, use the number of cores numReducers = Math.max(numReducers, sparkMemoryAndCores.getSecond()); } numReducers = Math.min(numReducers, maxReducers); LOG.info("Set parallelism for reduce sink " + sink + " to: " + numReducers + " (calculated)"); desc.setNumReducers(numReducers); [..] } {code} I might have missed something, since I am quite a newbie in this part of the code. Thanks for taking the time and looking at this! Peter > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores and the client does not provide > information we should try to use the one provided by default. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
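The reducer-count arithmetic quoted in the comment above boils down to a few clamped operations. This is a hedged sketch under the assumption that `Utilities.estimateReducers` amounts to a ceiling division of input size by bytes-per-reducer; the real method handles more cases, and the method below is invented for illustration.

```java
// Hedged sketch of the reducer-count arithmetic from the snippet above;
// the real logic lives in Utilities.estimateReducers and is more involved.
public class ReducerEstimate {
    static int estimate(long totalBytes, long configuredBytesPerReducer,
                        int maxReducers, int availableCores) {
        // "Divide it by 2 so that we can have more reducers", as in the snippet.
        long bytesPerReducer = configuredBytesPerReducer / 2;
        // Assumed shape of estimateReducers: ceiling division, at least 1.
        int numReducers = (int) Math.max(1,
            (totalBytes + bytesPerReducer - 1) / bytesPerReducer);
        // "If there are more cores, use the number of cores" — only applies
        // when memory/core info is available, i.e. no dynamic allocation.
        numReducers = Math.max(numReducers, availableCores);
        // Never exceed the configured maximum.
        return Math.min(numReducers, maxReducers);
    }

    public static void main(String[] args) {
        // 10 GiB with an effective 256 MiB per reducer -> 40, capped at 32.
        System.out.println(estimate(10L << 30, 512L << 20, 32, 8));
    }
}
```

This also shows why the patch only matters when dynamic allocation is off: with it on, `sparkMemoryAndCores` is null, the core-based `max` is skipped, and the count depends purely on data size.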
[jira] [Updated] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17301: -- Attachment: HIVE-17301.1.patch > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17301.1.patch > > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li reassigned HIVE-17301: - Assignee: Tao Li > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17301: -- Component/s: Metastore > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17267) Make HMS Notification Listeners typesafe
[ https://issues.apache.org/jira/browse/HIVE-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123798#comment-16123798 ] Hive QA commented on HIVE-17267: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881449/HIVE-17267.03.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6355/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6355/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-6355/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881449 - PreCommit-HIVE-Build > Make HMS Notification Listeners typesafe > > > Key: HIVE-17267 > URL: https://issues.apache.org/jira/browse/HIVE-17267 > Project: Hive > Issue Type: Bug >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-17267.01.patch, HIVE-17267.02.patch, > HIVE-17267.03.patch > > > Currently in the HMS we support two types of notification listeners, > transactional and non-transactional ones. Transactional listeners will only > be invoked if the jdbc transaction finished successfully while > non-transactional ones are supposed to be resilient and will be invoked in > any case, even for failures. > Having the same type for these two is a source of confusion and opens the > door for misconfigurations. We should try to fix this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17294) LLAP: switch task heartbeats to protobuf
[ https://issues.apache.org/jira/browse/HIVE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123787#comment-16123787 ] Siddharth Seth edited comment on HIVE-17294 at 8/11/17 6:21 PM: The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). When I say part of the work is done - it's the representation of various pieces of information in protobuf. was (Author: sseth): The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). > LLAP: switch task heartbeats to protobuf > > > Key: HIVE-17294 > URL: https://issues.apache.org/jira/browse/HIVE-17294 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17294) LLAP: switch task heartbeats to protobuf
[ https://issues.apache.org/jira/browse/HIVE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123787#comment-16123787 ] Siddharth Seth commented on HIVE-17294: --- The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). > LLAP: switch task heartbeats to protobuf > > > Key: HIVE-17294 > URL: https://issues.apache.org/jira/browse/HIVE-17294 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123761#comment-16123761 ] Sankar Hariappan commented on HIVE-17289: - The test failures are irrelevant to this patch! > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123749#comment-16123749 ] Xuefu Zhang commented on HIVE-17291: Thanks for working on this, [~pvary]. The patch looks good. However, I was a little confused. The description suggests that we are fixing the case when dynamic allocation is not enabled. However, the code seemingly will get executed in either case. I'm not sure if it's proper to use {{spark.executor.instances}} when dynamic allocation is enabled. Any thoughts? > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores and the client does not provide > information we should try to use the one provided by default. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123732#comment-16123732 ] Aihua Xu commented on HIVE-16873: - Yeah. That seems unnecessary. The change looks good to me. +1. > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let the logging framework itself handle these kinds of details and ditch > the buffer per thread. > {code} > public static final String AUDIT_FORMAT = > "ugi=%s\t" + // ugi > "ip=%s\t" + // remote IP > "cmd=%s\t"; // command > public static final Logger auditLog = LoggerFactory.getLogger( > HiveMetaStore.class.getName() + ".audit"); > private static final ThreadLocal<Formatter> auditFormatter = > new ThreadLocal<Formatter>() { > @Override > protected Formatter initialValue() { > return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * > 4)); > } > }; > ... > private static final void logAuditEvent(String cmd) { > final Formatter fmt = auditFormatter.get(); > ((StringBuilder) fmt.out()).setLength(0); > String address = getIPAddress(); > if (address == null) { > address = "unknown-ip-addr"; > } > auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(), > address, cmd).toString()); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
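The per-thread `Formatter` quoted in the issue description can be contrasted with a plain stateless call. The sketch below reproduces the thread-local shape from the snippet and shows that a simple `String.format` yields the same output; class and method names are invented for the comparison, and the claim that the two are interchangeable holds for this format string (the real `logAuditEvent` also pulls in the UGI and IP address).

```java
import java.util.Formatter;

// Sketch contrasting the snippet above with the simpler alternative the
// issue proposes: the per-thread Formatter buys nothing here that a plain
// String.format call doesn't, and the logging I/O dominates either way.
public class AuditFormat {
    static final String AUDIT_FORMAT = "ugi=%s\tip=%s\tcmd=%s\t";

    // Before: a thread-local Formatter whose StringBuilder is reset per call.
    private static final ThreadLocal<Formatter> FMT = ThreadLocal.withInitial(
        () -> new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4)));

    static String formatOld(String ugi, String ip, String cmd) {
        Formatter fmt = FMT.get();
        ((StringBuilder) fmt.out()).setLength(0); // reuse the buffer
        return fmt.format(AUDIT_FORMAT, ugi, ip, cmd).toString();
    }

    // After: stateless formatting, no per-thread cache to maintain.
    static String formatNew(String ugi, String ip, String cmd) {
        return String.format(AUDIT_FORMAT, ugi, ip, cmd);
    }

    public static void main(String[] args) {
        String a = formatOld("hive", "10.0.0.1", "get_table");
        String b = formatNew("hive", "10.0.0.1", "get_table");
        System.out.println(a.equals(b));
    }
}
```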
[jira] [Commented] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123668#comment-16123668 ] Hive QA commented on HIVE-17289: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881462/HIVE-17289.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=241) 
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=241) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6354/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6354/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6354/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881462 - PreCommit-HIVE-Build > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17300) WebUI query plan graphs
[ https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-17300: -- Assignee: Karen Coppage > WebUI query plan graphs > --- > > Key: HIVE-17300 > URL: https://issues.apache.org/jira/browse/HIVE-17300 > Project: Hive > Issue Type: Improvement > Components: Web UI >Reporter: Karen Coppage >Assignee: Karen Coppage > Attachments: complete_success.png, full_mapred_stats.png, > graph_with_mapred_stats.png, last_stage_error.png, last_stage_running.png, > non_mapred_task_selected.png > > > Hi all, > I’m working on a feature of the Hive WebUI Query Plan tab that would provide > the option to display the query plan as a nice graph (scroll down for > screenshots). If you click on one of the graph’s stages, the plan for that > stage appears as text below. > Stages are color-coded if they have a status (Success, Error, Running), and > the rest are grayed out. Coloring is based on status already available in the > WebUI, under the Stages tab. > There is an additional option to display stats for MapReduce tasks. This > includes the job’s ID, tracking URL (where the logs are found), and mapper > and reducer numbers/progress, among other info. > The library I’m using for the graph is called vis.js (http://visjs.org/). It > has an Apache license, and the only necessary file to be included from this > library is about 700 KB. > I tried to keep server-side changes minimal, and graph generation is taken > care of by the client. Plans with more than a given number of stages > (default: 25) won't be displayed in order to preserve resources. > I’d love to hear any and all input from the community about this feature: do > you think it’s useful, and is there anything important I’m missing? > Thanks, > Karen Coppage -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-15767: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks [~gezapeti] for your contribution! > Hive On Spark is not working on secure clusters from Oozie > -- > > Key: HIVE-15767 > URL: https://issues.apache.org/jira/browse/HIVE-15767 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1, 2.1.1 >Reporter: Peter Cseh >Assignee: Peter Cseh > Fix For: 3.0.0 > > Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch, > HIVE-15767.1.patch > > > When a HiveAction is launched form Oozie with Hive On Spark enabled, we're > getting errors: > {noformat} > Caused by: java.io.IOException: Exception reading > file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188) > at > org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155) > {noformat} > This is caused by passing the {{mapreduce.job.credentials.binary}} property > to the Spark configuration in RemoteHiveSparkClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123582#comment-16123582 ] Peter Vary commented on HIVE-17292: --- Sorry, I made a mistake when looking up the stuff again and copied the wrong config name. We have to set {{yarn.scheduler.increment-allocation-mb}} - Notice *increment* :) This is a FairScheduler-only configuration: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Properties_that_can_be_placed_in_yarn-site.xml ??The fairscheduler grants memory in increments of this value. If you submit a task with resource request that is not a multiple of increment-allocation-mb, the request will be rounded up to the nearest increment. Defaults to 1024 MB.?? Also see why it is not in the YARN documentation (YARN-5902). Sorry for the confusion :( Peter > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores, and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers would > like to use only 512MB. We should change the fairscheduler configuration to > use only the requested 512MB -- This message was sent by Atlassian JIRA (v6.4.14#64029)
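The FairScheduler rounding quoted above is just a round-up to the nearest multiple of the increment. A minimal sketch of that arithmetic (the method name is invented; the 1024 MB default and the round-up rule come from the quoted documentation):

```java
// Sketch of the FairScheduler behavior described above: a resource request
// that is not a multiple of yarn.scheduler.increment-allocation-mb is
// rounded up to the nearest increment (default 1024 MB).
public class IncrementAllocation {
    static int roundUpMb(int requestedMb, int incrementMb) {
        // Integer ceiling division, then scale back up to the increment.
        return ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
    }

    public static void main(String[] args) {
        // With the default 1024 MB increment, a 512 MB container costs a full
        // 1024 MB — which is why the mini cluster could not fit the extra
        // executor the test configuration asked for.
        System.out.println(roundUpMb(512, 1024));
        // Lowering the increment to 512 MB lets the request through as-is.
        System.out.println(roundUpMb(512, 512));
    }
}
```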
[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter
[ https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123550#comment-16123550 ] Hive QA commented on HIVE-17261: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881406/HIVE-17261.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=228) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6353/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6353/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6353/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881406 - PreCommit-HIVE-Build > Hive uses deprecated ParquetInputSplit constructor which blocks parquet > dictionary filter > - > > Key: HIVE-17261 > URL: https://issues.apache.org/jira/browse/HIVE-17261 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Junjie Chen >Assignee: Junjie Chen > Attachments: HIVE-17261.2.patch, HIVE-17261.3.patch, HIVE-17261.diff, > HIVE-17261.patch > > > Hive uses the deprecated ParquetInputSplit constructor in > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128] > Please see the interface definition in > [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80] > The old interface sets rowGroupOffsets values, which leads to the dictionary > filter being skipped in Parquet. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123544#comment-16123544 ] Rui Li commented on HIVE-17292: --- I mean we did set {{RM_SCHEDULER_MINIMUM_ALLOCATION_MB}} to 512 in the code:
{code}
public MiniSparkShim(Configuration conf, int numberOfTaskTrackers, String nameNode, int numDir) throws IOException {
  mr = new MiniSparkOnYARNCluster("sparkOnYarn");
  conf.set("fs.defaultFS", nameNode);
  conf.set("yarn.resourcemanager.scheduler.class",
      "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
  // disable resource monitoring, although it should be off by default
  conf.setBoolean(YarnConfiguration.YARN_MINICLUSTER_CONTROL_RESOURCE_MONITORING, false);
  conf.setInt(YarnConfiguration.YARN_MINICLUSTER_NM_PMEM_MB, 2048);
  conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 512);
  conf.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, 2048);
  configureImpersonation(conf);
  mr.init(conf);
  mr.start();
  this.conf = mr.getConfig();
}
{code}
Do you mean {{RM_SCHEDULER_MINIMUM_ALLOCATION_MB}} is different from {{yarn.scheduler.minimum-allocation-mb}}, or that we set it in the wrong place? > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores, and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers would > like to use only 512MB. We should change the FairScheduler configuration to > use only the requested 512MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17293) ETL split strategy not accounting for empty base and non-empty delta buckets
[ https://issues.apache.org/jira/browse/HIVE-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17293: -- Component/s: Transactions > ETL split strategy not accounting for empty base and non-empty delta buckets > > > Key: HIVE-17293 > URL: https://issues.apache.org/jira/browse/HIVE-17293 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 3.0.0, 2.4.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > > Observed an issue with a customer case where there are 2 buckets (bucket_0 > and bucket_1). > Base bucket 0 had some rows whereas bucket 1 was empty. > Delta buckets 0 and 1 had some rows. > The ETL split strategy did not generate an OrcSplit for bucket 1 even though it had > some rows in the delta directories. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123512#comment-16123512 ] Hive QA commented on HIVE-14746: {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/Hive-Build/10/testReport Console output: https://builds.apache.org/job/Hive-Build/10/console Test logs: http://104.199.114.197/logs/Hive-Build-10/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/Hive-Build-10/succeeded/54-TestCliDriver-push_or.q-encryption_move_tbl.q-vectorization_5.q-and-27-more, remoteFile=/home/hiveptest/35.184.192.137-hiveptest-0/logs/, getExitCode()=255, getException()=null, getUser()=hiveptest, getHost()=35.184.192.137, getInstance()=0]: 'ssh: connect to host 35.184.192.137 port 22: Connection refused rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection 
timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ' {noformat} This message is automatically generated. ATTACHMENT ID: - Hive-Build > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17268) WebUI / QueryPlan: query plan is sometimes null when explain output conf is on
[ https://issues.apache.org/jira/browse/HIVE-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-17268: - Attachment: HIVE-17268.3.patch > WebUI / QueryPlan: query plan is sometimes null when explain output conf is on > -- > > Key: HIVE-17268 > URL: https://issues.apache.org/jira/browse/HIVE-17268 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Attachments: HIVE-17268.2.patch, HIVE-17268.3.patch, HIVE-17268.patch > > > The Hive WebUI's Query Plan tab displays "SET hive.log.explain.output TO true > TO VIEW PLAN" even when hive.log.explain.output is set to true, if the > query cannot be compiled, because the plan is null in that case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123420#comment-16123420 ] BELUGA BEHR commented on HIVE-16873: [~aihuaxu] > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let the logging framework itself handle these kinds of details and ditch > the buffer per thread.
> {code}
> public static final String AUDIT_FORMAT =
>     "ugi=%s\t" + // ugi
>     "ip=%s\t" +  // remote IP
>     "cmd=%s\t";  // command
> public static final Logger auditLog = LoggerFactory.getLogger(
>     HiveMetaStore.class.getName() + ".audit");
> private static final ThreadLocal<Formatter> auditFormatter =
>     new ThreadLocal<Formatter>() {
>       @Override
>       protected Formatter initialValue() {
>         return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4));
>       }
>     };
> ...
> private static final void logAuditEvent(String cmd) {
>   final Formatter fmt = auditFormatter.get();
>   ((StringBuilder) fmt.out()).setLength(0);
>   String address = getIPAddress();
>   if (address == null) {
>     address = "unknown-ip-addr";
>   }
>   auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(),
>       address, cmd).toString());
> }
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
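The trade-off discussed in HIVE-16873 can be sketched with plain stdlib Java: the current per-thread `Formatter` buffer versus simply formatting per call. This is an illustrative comparison only, not the actual Hive patch; the class and method names below are invented for the demo.

```java
import java.util.Formatter;

public class AuditFormatDemo {
    static final String AUDIT_FORMAT = "ugi=%s\tip=%s\tcmd=%s\t";

    // Current approach: one Formatter (and its StringBuilder buffer) per thread.
    static final ThreadLocal<Formatter> FMT = ThreadLocal.withInitial(
            () -> new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4)));

    static String withThreadLocal(String ugi, String ip, String cmd) {
        Formatter fmt = FMT.get();
        ((StringBuilder) fmt.out()).setLength(0); // reset the reused buffer
        return fmt.format(AUDIT_FORMAT, ugi, ip, cmd).toString();
    }

    // Proposed simplification: allocate per call and let the logging
    // framework / JIT worry about short-lived garbage.
    static String withStringFormat(String ugi, String ip, String cmd) {
        return String.format(AUDIT_FORMAT, ugi, ip, cmd);
    }

    public static void main(String[] args) {
        String a = withThreadLocal("hive", "10.0.0.1", "get_table");
        String b = withStringFormat("hive", "10.0.0.1", "get_table");
        // Both produce identical audit lines; only the buffering differs.
        System.out.println(a.equals(b));
    }
}
```

Both paths emit the same audit line, which is why the ThreadLocal machinery buys little beyond avoiding a short-lived allocation.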
[jira] [Commented] (HIVE-17288) LlapOutputFormatService: Increase netty event loop threads
[ https://issues.apache.org/jira/browse/HIVE-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123421#comment-16123421 ] Hive QA commented on HIVE-17288: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881150/HIVE-17288.1.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6352/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6352/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6352/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/PreCommit-HIVE-Build-6352/failed/225_TestSSL, remoteFile=/home/hiveptest/35.202.179.168-hiveptest-1/logs/, getExitCode()=23, getException()=null, getUser()=hiveptest, getHost()=35.202.179.168, getInstance()=1]: 'Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 262.00 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. 
receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 87.33 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 87.33 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 262.00 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 87.33 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12881150 - PreCommit-HIVE-Build > LlapOutputFormatService: Increase netty event loop threads > -- > > Key: HIVE-17288 > URL: https://issues.apache.org/jira/browse/HIVE-17288 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-17288.1.patch > > > Currently it is set to 1, which would be used for both the parent (acceptor) and > client groups. It would be good to leave it at the default, which sets the number > of threads to "number of processors * 2". It can be modified later via > {{-Dio.netty.eventLoopThreads}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
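The default sizing mentioned in the description ("number of processors * 2", overridable via {{-Dio.netty.eventLoopThreads}}) can be approximated in plain Java. This is a sketch of the convention as described above, not Netty's actual implementation:

```java
public class EventLoopSizing {
    // Approximation of the default event-loop-group sizing convention:
    // honor the io.netty.eventLoopThreads system property when set,
    // otherwise use 2 * available processors, but never fewer than 1 thread.
    static int defaultEventLoopThreads() {
        int fallback = Runtime.getRuntime().availableProcessors() * 2;
        return Math.max(1, Integer.getInteger("io.netty.eventLoopThreads", fallback));
    }

    public static void main(String[] args) {
        System.out.println("default event loop threads: " + defaultEventLoopThreads());
    }
}
```

So leaving the group size unspecified scales the service with the host, while still letting operators pin it down with a single system property.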
[jira] [Updated] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16873: --- Attachment: HIVE-16873.3.patch > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let the logging framework itself handle these kinds of details and ditch > the buffer per thread.
> {code}
> public static final String AUDIT_FORMAT =
>     "ugi=%s\t" + // ugi
>     "ip=%s\t" +  // remote IP
>     "cmd=%s\t";  // command
> public static final Logger auditLog = LoggerFactory.getLogger(
>     HiveMetaStore.class.getName() + ".audit");
> private static final ThreadLocal<Formatter> auditFormatter =
>     new ThreadLocal<Formatter>() {
>       @Override
>       protected Formatter initialValue() {
>         return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4));
>       }
>     };
> ...
> private static final void logAuditEvent(String cmd) {
>   final Formatter fmt = auditFormatter.get();
>   ((StringBuilder) fmt.out()).setLength(0);
>   String address = getIPAddress();
>   if (address == null) {
>     address = "unknown-ip-addr";
>   }
>   auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(),
>       address, cmd).toString());
> }
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17300) WebUI query plan graphs
[ https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123413#comment-16123413 ] Karen Coppage commented on HIVE-17300: -- [~xuefuz] Thanks very much for the suggestion to create a JIRA on this topic. > WebUI query plan graphs > --- > > Key: HIVE-17300 > URL: https://issues.apache.org/jira/browse/HIVE-17300 > Project: Hive > Issue Type: Improvement > Components: Web UI >Reporter: Karen Coppage > Attachments: complete_success.png, full_mapred_stats.png, > graph_with_mapred_stats.png, last_stage_error.png, last_stage_running.png, > non_mapred_task_selected.png > > > Hi all, > I’m working on a feature of the Hive WebUI Query Plan tab that would provide > the option to display the query plan as a nice graph (scroll down for > screenshots). If you click on one of the graph’s stages, the plan for that > stage appears as text below. > Stages are color-coded if they have a status (Success, Error, Running), and > the rest are grayed out. Coloring is based on status already available in the > WebUI, under the Stages tab. > There is an additional option to display stats for MapReduce tasks. This > includes the job’s ID, tracking URL (where the logs are found), and mapper > and reducer numbers/progress, among other info. > The library I’m using for the graph is called vis.js (http://visjs.org/). It > has an Apache license, and the only necessary file to be included from this > library is about 700 KB. > I tried to keep server-side changes minimal, and graph generation is taken > care of by the client. Plans with more than a given number of stages > (default: 25) won't be displayed in order to preserve resources. > I’d love to hear any and all input from the community about this feature: do > you think it’s useful, and is there anything important I’m missing? > Thanks, > Karen Coppage -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123383#comment-16123383 ] Peter Vary commented on HIVE-17292: --- Yeah, I was able to identify the root cause of the problem. When a scheduler is used, there is an additional configuration value for the minimum allocation. See: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java#L161 This is defined by {{yarn.scheduler.minimum-allocation-mb}}. By default this is set to {{1024}}, so the minimum memory allocation is 1GB. This is also documented here: https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ??The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this will throw a InvalidResourceRequestException.?? When I set {{yarn.scheduler.minimum-allocation-mb}} to {{512}} in hive-site.xml, I get 4 reducers as expected. As for the test results - we have to wait for HIVE-17291 to get in to have consistent outputs. Thanks, Peter > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores, and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers would > like to use only 512MB. We should change the FairScheduler configuration to > use only the requested 512MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
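The container arithmetic behind this root cause can be illustrated with a small, simplified model (hypothetical helper names; real schedulers also round to increments, which is ignored here):

```java
public class AllocationRounding {
    // Simplified model: a request below the scheduler minimum is rounded up,
    // so each container effectively consumes max(requested, minimum) MB.
    static int effectiveAllocationMb(int requestedMb, int minimumMb) {
        return Math.max(requestedMb, minimumMb);
    }

    // How many such containers fit on a NodeManager with the given memory.
    static int containersThatFit(int nodeMemoryMb, int requestedMb, int minimumMb) {
        return nodeMemoryMb / effectiveAllocationMb(requestedMb, minimumMb);
    }

    public static void main(String[] args) {
        // 2048MB NodeManager, 512MB container requests:
        System.out.println(containersThatFit(2048, 512, 1024)); // default 1024MB minimum
        System.out.println(containersThatFit(2048, 512, 512));  // minimum lowered to 512MB
    }
}
```

With the default 1024MB minimum only two containers fit on a 2048MB node; lowering the minimum to 512MB doubles that, which matches the "4 reducers as expected" observation above.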
[jira] [Comment Edited] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123051#comment-16123051 ] Sankar Hariappan edited comment on HIVE-17289 at 8/11/17 1:59 PM: -- Added 01.patch with the changes below.
- Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD)
- Set the distcp doAs input to null in the EXPORT and IMPORT flows. The user config hive.distcp.privileged.doAs will be used only for REPL LOAD.
- Assumed lazy copy is set only for REPL LOAD and hence set the doAs user input to hive.distcp.privileged.doAs if lazy copy is true and null if false. This avoids passing this argument from multiple flows; also, the incremental REPL LOAD shares common code with IMPORT.
- Enabled distcp for copies within the same file system in case of a large number of files or large files.
- Removed redundant code in ReplCopyTask/ReplCopyWork as it re-uses the CopyUtils implementation, which does the same.
- Refactored ReplCopyTask.execute to properly distinguish the code paths for _files reads and actual data files.
- Set the default value of hive.distcp.privileged.doAs to "hive".
- Moved CopyUtils from parse.repl.dump.io to parse.repl as it is common for dump/load.
- No tests added, as the existing tests cover the changes except the distcp flow (due to hive.in.test), which needs to be tested manually.
Request [~thejas]/[~daijy] to please review it! > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
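The doAs selection rule described in the comment above (privileged user only on REPL LOAD's lazy-copy path, null for EXPORT/IMPORT so distcp runs as the session user) reduces to a small decision function. The names below are hypothetical for illustration, not the patch's actual API:

```java
public class DistcpDoAs {
    // Sketch of the rule: only the lazy-copy (REPL LOAD) path runs distcp as
    // the privileged user from hive.distcp.privileged.doAs; EXPORT/IMPORT
    // return null so distcp falls back to the session user.
    static String distcpDoAsUser(boolean lazyCopy, String privilegedUser) {
        return lazyCopy ? privilegedUser : null;
    }

    public static void main(String[] args) {
        System.out.println(distcpDoAsUser(true, "hive"));   // REPL LOAD path
        System.out.println(distcpDoAsUser(false, "hive"));  // EXPORT/IMPORT path
    }
}
```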
[jira] [Updated] (HIVE-17300) WebUI query plan graphs
[ https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-17300: - Attachment: complete_success.png full_mapred_stats.png graph_with_mapred_stats.png last_stage_error.png last_stage_running.png non_mapred_task_selected.png > WebUI query plan graphs > --- > > Key: HIVE-17300 > URL: https://issues.apache.org/jira/browse/HIVE-17300 > Project: Hive > Issue Type: Improvement > Components: Web UI >Reporter: Karen Coppage > Attachments: complete_success.png, full_mapred_stats.png, > graph_with_mapred_stats.png, last_stage_error.png, last_stage_running.png, > non_mapred_task_selected.png > > > Hi all, > I’m working on a feature of the Hive WebUI Query Plan tab that would provide > the option to display the query plan as a nice graph (scroll down for > screenshots). If you click on one of the graph’s stages, the plan for that > stage appears as text below. > Stages are color-coded if they have a status (Success, Error, Running), and > the rest are grayed out. Coloring is based on status already available in the > WebUI, under the Stages tab. > There is an additional option to display stats for MapReduce tasks. This > includes the job’s ID, tracking URL (where the logs are found), and mapper > and reducer numbers/progress, among other info. > The library I’m using for the graph is called vis.js (http://visjs.org/). It > has an Apache license, and the only necessary file to be included from this > library is about 700 KB. > I tried to keep server-side changes minimal, and graph generation is taken > care of by the client. Plans with more than a given number of stages > (default: 25) won't be displayed in order to preserve resources. > I’d love to hear any and all input from the community about this feature: do > you think it’s useful, and is there anything important I’m missing? > Thanks, > Karen Coppage -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123370#comment-16123370 ] Peter Vary commented on HIVE-17291: --- Failures are not related. [~lirui], [~xuefuz], could you please review? Thanks, Peter > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores, if the client does not provide the > information, we should try to use the configured defaults. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
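The fallback described in the issue (use the configured value when the client reports nothing, e.g. at startup) is essentially this; the names are illustrative, not the patch's API:

```java
public class ExecutorDefaults {
    // Prefer the executor count reported by the running client; when it is
    // missing or non-positive (e.g. at startup, before executors register),
    // fall back to the configured default.
    static int executorCount(Integer reportedByClient, int configuredDefault) {
        return (reportedByClient != null && reportedByClient > 0)
                ? reportedByClient
                : configuredDefault;
    }

    public static void main(String[] args) {
        System.out.println(executorCount(null, 2)); // startup: nothing reported yet
        System.out.println(executorCount(3, 2));    // client reported 3 executors
    }
}
```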
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Status: Patch Available (was: Open) > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Attachment: HIVE-17289.01.patch > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Attachment: (was: HIVE-17289.01.patch) > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Status: Open (was: Patch Available) > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123349#comment-16123349 ] Hive QA commented on HIVE-14746: {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/Hive-Build/9/testReport Console output: https://builds.apache.org/job/Hive-Build/9/console Test logs: http://104.199.114.197/logs/Hive-Build-9/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/branch-2.3-working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-08-11 13:40:40.425 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hiveptest/branch-2.3-working/ + tee /data/hiveptest/logs/Hive-Build-9/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z branch-2.3 ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source ]] + git clone https://github.com/apache/hive.git apache-github-source-source Cloning into 'apache-github-source-source'... 
+ date '+%Y-%m-%d %T.%3N' 2017-08-11 13:41:08.117 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at efa5b54 HIVE-17008: Fix boolean flag switchup in DropTableEvent (Dan Burkert, reviewed by Mohit Sabharwal and Peter Vary) + git clean -f -d + git checkout branch-2.3 Switched to a new branch 'branch-2.3' Branch branch-2.3 set up to track remote branch branch-2.3 from origin. + git reset --hard origin/branch-2.3 HEAD is now at 6f4c35c Release Notes + git merge --ff-only origin/branch-2.3 Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-08-11 13:41:10.272 + patchCommandPath=/data/hiveptest/branch-2.3-working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/branch-2.3-working/scratch/build.patch + [[ -f /data/hiveptest/branch-2.3-working/scratch/build.patch ]] + [[ maven == \m\a\v\e\n ]] + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/branch-2.3-working/maven [ERROR] Failed to execute goal on project hive-hcatalog: Could not resolve dependencies for project org.apache.hive.hcatalog:hive-hcatalog:pom:2.3.0: Failure to transfer javax.xml.bind:jaxb-api:jar:2.2.2 from http://www.datanucleus.org/downloads/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of datanucleus has elapsed or updates are forced. Original error: Could not transfer artifact javax.xml.bind:jaxb-api:jar:2.2.2 from/to datanucleus (http://www.datanucleus.org/downloads/maven2): Connect to localhost:3128 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hive-hcatalog + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: - Hive-Build > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlass
[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
[ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vlad Gudikov updated HIVE-17148: Attachment: HIVE-17148.3.patch > Incorrect result for Hive join query with COALESCE in WHERE condition > - > > Key: HIVE-17148 > URL: https://issues.apache.org/jira/browse/HIVE-17148 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.1 >Reporter: Vlad Gudikov >Assignee: Vlad Gudikov > Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, > HIVE-17148.3.patch, HIVE-17148.patch > > > The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo > enabled: > STEPS TO REPRODUCE: > {code} > Step 1: Create a table ct1 > create table ct1 (a1 string,b1 string); > Step 2: Create a table ct2 > create table ct2 (a2 string); > Step 3 : Insert following data into table ct1 > insert into table ct1 (a1) values ('1'); > Step 4 : Insert following data into table ct2 > insert into table ct2 (a2) values ('1'); > Step 5 : Execute the following query > select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; > {code} > ACTUAL RESULT: > {code} > The query returns nothing; > {code} > EXPECTED RESULT: > {code} > 1 NULL1 > {code} > The issue seems to be caused by an incorrect query plan. In the plan we can > see: > predicate:(a1 is not null and b1 is not null) > which does not look correct. As a result, it filters out all the rows if > any column mentioned in the COALESCE has a null value. > Please find the query plan below: > {code} > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Map 2 (BROADCAST_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 > File Output Operator [FS_10] > Map Join Operator [MAPJOIN_15] (rows=1 width=4) > > Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] > <-Map 2 [BROADCAST_EDGE] > BROADCAST [RS_7] > PartitionCols:_col0 > Select Operator [SEL_5] (rows=1 width=1) > Output:["_col0"] > Filter Operator [FIL_14] (rows=1 width=1) > predicate:a2 is not null > TableScan [TS_3] (rows=1 width=1) > default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"] > <-Select Operator [SEL_2] (rows=1 width=4) > Output:["_col0","_col1"] > Filter Operator [FIL_13] (rows=1 width=4) > predicate:(a1 is not null and b1 is not null) > TableScan [TS_0] (rows=1 width=4) > default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"] > {code} > This happens only if join is inner type, otherwise HiveJoinAddNotRule which > creates this problem is skipped. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
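The derived predicate in the plan above is stricter than the join condition it came from: COALESCE(a1, b1) = a2 can hold even when b1 is NULL, so requiring both columns to be non-null wrongly rejects matching rows. A minimal Python sketch of that semantics (None stands in for SQL NULL; this is illustrative only, not Hive code):

```python
def coalesce(*args):
    # Return the first non-None argument, mimicking SQL COALESCE
    # (None stands in for SQL NULL).
    for a in args:
        if a is not None:
            return a
    return None

# The row from the reproduction: ct1 holds (a1='1', b1=NULL), ct2 holds (a2='1').
a1, b1 = '1', None
a2 = '1'

# The join condition COALESCE(a1, b1) = a2 is satisfied for this row...
assert coalesce(a1, b1) == a2

# ...but the predicate the planner derives, (a1 IS NOT NULL AND b1 IS NOT NULL),
# rejects it, which is why the query returns nothing.
derived_predicate = a1 is not None and b1 is not None
assert derived_predicate is False
```

This matches the expected result in the description: the row (1, NULL, 1) should survive the join, so the added not-null filter on b1 is incorrect for COALESCE.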
[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter
[ https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123320#comment-16123320 ] Hive QA commented on HIVE-17261: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881376/HIVE-17261.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hadoop.hive.ql.io.parquet.TestParquetRowGroupFilter.testRowGroupFilterTakeEffect (batchId=263) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema 
(batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=183) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY (batchId=183) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTimestamp (batchId=183) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6351/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6351/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6351/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881376 - PreCommit-HIVE-Build > Hive use deprecated ParquetInputSplit constructor which blocked parquet > dictionary filter > - > > Key: HIVE-17261 > URL: https://issues.apache.org/jira/browse/HIVE-17261 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Junjie Chen >Assignee: Junjie Chen > Attachments: HIVE-17261.2.patch, HIVE-17261.3.patch, HIVE-17261.diff, > HIVE-17261.patch > > > Hive use deprecated ParquetInputSplit in > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128] > Please see interface definition in > [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80] > Old interface set rowgroupoffset values which will lead to skip dictionary > filter in parquet. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-14746: --- Attachment: HIVE-14746.03.patch > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123051#comment-16123051 ] Sankar Hariappan edited comment on HIVE-17289 at 8/11/17 11:49 AM: --- Added 01.patch with below changes. - Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD) - Set distcp doAs input as null in case of EXPORT and IMPORT flow. Will use the user config hive.distcp.privileged.doAs in case of REPL LOAD. - Assumed lazy copy is set only for REPL LOAD and hence set doAs user input to hive.distcp.privileged.doAs if lazy copy is true and null if false. This is just to avoid passing this argument from multiple flows and also, the incremental REPL LOAD shares common code with IMPORT. - Enabled distcp for copy within same file systems in case of large number of files or large size files. - Removed redundant code in ReplCopyTask/ReplCopyWork as it re-uses the CopyUtils implementation which does the same. - Refactored ReplCopyTask.execute to properly distinguish code path for _files read and actual data files. - Set the default value of hive.distcp.privileged.doAs to "hive". - Moved CopyUtils from parse.repl.dump.io to parse.repl package as it is common for dump/load. - No tests added as the existing tests itself will cover the changes except distcp flow (due to hive.in.test) which needs to be tested manually. was (Author: sankarh): Added 01.patch with below changes. - Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD) - Set distcp doAs input as null in case of EXPORT and IMPORT flow. Will use the user config hive.distcp.privileged.doAs in case of REPL LOAD. - Assumed lazy copy is set only for REPL LOAD and hence set doAs user input to hive.distcp.privileged.doAs if lazy copy is true and null if false. This is just to avoid passing this argument from multiple flows and also, the incremental REPL LOAD shares common code with IMPORT. 
- Removed redundant code in ReplCopyTask/ReplCopyWork as it re-uses the CopyUtils implementation which does the same. - Refactored ReplCopyTask.execute to properly distinguish code path for _files read and actual data files. - Set the default value of hive.distcp.privileged.doAs to "hive". - Moved CopyUtils from parse.repl.dump.io to parse.repl package as it is common for dump/load. - No tests added as the existing tests itself will cover the changes except distcp flow (due to hive.in.test) which needs to be tested manually. > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
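The doAs selection described in the comment above (privileged user only for REPL LOAD, signalled by lazy copy; null for EXPORT/IMPORT) can be sketched as follows. This is an illustration of the described decision logic, not Hive code; the function name and the dict-based config lookup are assumptions:

```python
def distcp_doas_user(conf, lazy_copy):
    """Pick the user distcp should run as, per the approach in the comment:
    the privileged doAs user applies only to REPL LOAD (signalled here by
    lazy_copy); EXPORT/IMPORT pass None so distcp runs as the session user.
    Function name and config shape are illustrative, not Hive's API."""
    if lazy_copy:
        # REPL LOAD path: use the configured privileged user (default "hive").
        return conf.get("hive.distcp.privileged.doAs", "hive")
    # EXPORT/IMPORT path: no privileged user.
    return None
```

For example, with an empty config a REPL LOAD copy would run as "hive" (the proposed default), while an EXPORT/IMPORT copy would run as the session user.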
[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123226#comment-16123226 ] Barna Zsombor Klara commented on HIVE-14747: [~spena] the last comment was generated by an instance of PTest which contained this patch. I ran it to validate my change. Do you have any other comments or suggestions? > Remove JAVA paths from profiles by sending them from ptest-client > - > > Key: HIVE-14747 > URL: https://issues.apache.org/jira/browse/HIVE-14747 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch, > HIVE-14747.03.patch, HIVE-14747.04.patch, HIVE-14747.05.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes JAVA paths to build and execute the tests. We should > get rid of these by passing such information from Jenkins to the > ptest-server. In case a profile needs a different java version, then we can > create a specific Jenkins job for that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17267) Make HMS Notification Listeners typesafe
[ https://issues.apache.org/jira/browse/HIVE-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-17267: --- Attachment: HIVE-17267.03.patch > Make HMS Notification Listeners typesafe > > > Key: HIVE-17267 > URL: https://issues.apache.org/jira/browse/HIVE-17267 > Project: Hive > Issue Type: Bug >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-17267.01.patch, HIVE-17267.02.patch, > HIVE-17267.03.patch > > > Currently in the HMS we support two types of notification listeners, > transactional and non-transactional ones. Transactional listeners will only > be invoked if the jdbc transaction finished successfully while > non-transactional ones are supposed to be resilient and will be invoked in > any case, even for failures. > Having the same type for these two is a source of confusion and opens the > door for misconfigurations. We should try to fix this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17251) Remove usage of org.apache.pig.ResourceStatistics#setmBytes method in HCatLoader
[ https://issues.apache.org/jira/browse/HIVE-17251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123219#comment-16123219 ] Nandor Kollar commented on HIVE-17251: -- Thank you all for taking care of my ticket! > Remove usage of org.apache.pig.ResourceStatistics#setmBytes method in > HCatLoader > > > Key: HIVE-17251 > URL: https://issues.apache.org/jira/browse/HIVE-17251 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Reporter: Nandor Kollar >Assignee: Adam Szita >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17251.0.patch > > > org.apache.pig.ResourceStatistics#setmBytes is marked as deprecated, and is > going to be removed from Pig. Is it possible to use the proper > replacement method (ResourceStatistics#setSizeInBytes) instead? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by
[ https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123204#comment-16123204 ] Rui Li commented on HIVE-17287: --- [~kellyzly], I mean you can check each of the group keys to see how they are skewed. > HoS can not deal with skewed data group by > -- > > Key: HIVE-17287 > URL: https://issues.apache.org/jira/browse/HIVE-17287 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: query67-fail-at-groupby.png, > query67-groupby_shuffle_metric.png > > > In > [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql], > fact table {{store_sales}} joins with small tables {{date_dim}}, > {{item}},{{store}}. After join, groupby the intermediate data. > Here the data of {{store_sales}} on 3TB tpcds is skewed: there are 1824 > partitions. The biggest partition is 25.7G and others are 715M. > {code} > hadoop fs -du -h > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales > > 715.0 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639 > 713.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640 > 714.1 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641 > 712.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642 > 25.7 G > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__ > {code} > The skewed table {{store_sales}} caused the failed job. Is there any way to > solve the groupby problem of skewed table? I tried to enable > {{hive.groupby.skewindata}} to first divide the data more evenly then start > do group by. But the job still hangs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by
[ https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123123#comment-16123123 ] liyunzhang_intel commented on HIVE-17287: - [~lirui] : bq. You can run some statistics on the group key to confirm I don't quite understand; do you mean "add select count(i_category), i_category, to see the number of every key"? bq. what will the metrics look like if you enable hive.groupby.skewindata? I enabled {{hive.groupby.skewindata}} before; the job still hangs on the group by stage after sending the data randomly. > HoS can not deal with skewed data group by > -- > > Key: HIVE-17287 > URL: https://issues.apache.org/jira/browse/HIVE-17287 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: query67-fail-at-groupby.png, > query67-groupby_shuffle_metric.png > > > In > [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql], > fact table {{store_sales}} joins with small tables {{date_dim}}, > {{item}},{{store}}. After join, groupby the intermediate data. > Here the data of {{store_sales}} on 3TB tpcds is skewed: there are 1824 > partitions. The biggest partition is 25.7G and others are 715M. > {code} > hadoop fs -du -h > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales > > 715.0 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639 > 713.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640 > 714.1 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641 > 712.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642 > 25.7 G > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__ > {code} > The skewed table {{store_sales}} caused the failed job. 
Is there any way to > solve the group-by problem of a skewed table? I tried to enable > {{hive.groupby.skewindata}} to first divide the data more evenly and then > do the group by. But the job still hangs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
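For reference, the two-stage aggregation strategy being discussed here (spray rows to reducers at random, partially aggregate on each, then merge the partials by key, which is the general idea behind {{hive.groupby.skewindata}}) can be sketched in Python. This is an illustrative model of the strategy, not Hive's implementation:

```python
import random
from collections import Counter

def two_stage_group_count(keys, num_reducers=4):
    """Sketch of skew-tolerant two-stage GROUP BY ... COUNT(*).
    Stage 1 sends each row to a random reducer and partially aggregates,
    so no single reducer receives all rows of a hot key; stage 2 merges
    the partial counts by key. Illustrative only."""
    # Stage 1: random spray + partial aggregation per reducer.
    partials = [Counter() for _ in range(num_reducers)]
    for key in keys:
        partials[random.randrange(num_reducers)][key] += 1
    # Stage 2: merge partial counts by key to get the final result.
    final = Counter()
    for partial in partials:
        final.update(partial)
    return dict(final)

# A skewed key distribution, loosely mirroring the oversized
# __HIVE_DEFAULT_PARTITION__ partition described above:
rows = ["hot"] * 1000 + ["a", "b", "c"]
counts = two_stage_group_count(rows)
assert counts["hot"] == 1000
```

The random spray balances stage 1 regardless of skew, which is why this helps for algebraic aggregates; as the comments in this thread note, it does not automatically fix every skewed workload.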