[jira] [Updated] (HIVE-17174) LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge

2017-07-28 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-17174:

Attachment: HIVE-17174.2.patch

Thanks [~gopalv]. Changed to {{llap.shuffle.os.cache.always.evict}}, which 
defaults to false. By default, it would only evict partitions greater than 0.

> LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge
> ---
>
> Key: HIVE-17174
> URL: https://issues.apache.org/jira/browse/HIVE-17174
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-17174.1.patch, HIVE-17174.2.patch
>
>
> Currently, once the data is transferred, an `fadvise` call is invoked to throw 
> away the pages. This may not be very helpful for broadcast, which tends 
> to transfer the same data to multiple downstream tasks. 
> e.g., Q50 at 1 TB scale
> {noformat}
>   Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE)
> Map 6 <- Reducer 2 (BROADCAST_EDGE), Reducer 3 (BROADCAST_EDGE), 
> Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 10 (BROADCAST_EDGE), Map 
> 11 (BROADCAST_EDGE), Map 6 (CUSTOM_SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
> Reducer 9 <- Reducer 8 (SIMPLE_EDGE)
> Status: Running (Executing on YARN cluster with App id 
> application_1490656001509_6084)
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 5 ..........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 1 ..........      llap     SUCCEEDED     11         11        0        0       0       0
> Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ......      llap     SUCCEEDED      1          1        0        0       0       0
> Map 6 ..........      llap     SUCCEEDED    139        139        0        0       0       0
> Map 10 .........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 11 .........      llap     SUCCEEDED      1          1        0        0       0       0
> Reducer 7 ......      llap     SUCCEEDED    834        834        0        0       0       0
> Reducer 8 ......      llap     SUCCEEDED     24         24        0        0       0       0
> Reducer 9 ......      llap     SUCCEEDED      1          1        0        0       0       0
> ----------------------------------------------------------------------------------------------
> e.g., eviction counts per file
> 139 
> /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_05_00_0_18387/file.out
> 834 
> /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_00_0_18420_1/file.out
> 834 
> /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_6084/1/output/attempt_1490656001509_6084_1_07_00_0_18420_2/file.out
>
> {noformat}
> It would be good to issue the fadvise only for cases where "partition != 0". 
> This would help retain the pages for the broadcast edge.
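
For illustration, a rough sketch of the gating described above, assuming partition 0 
carries the broadcast output and that the new flag is plumbed through as a boolean 
(this is illustrative, not the attached patch):

{code}
// Sketch: evict transferred pages only for non-broadcast partitions, unless
// the always-evict flag is set. Assumes partition 0 is the broadcast output.
import java.io.FileDescriptor;
import org.apache.hadoop.io.nativeio.NativeIO;

public class ShuffleEvictionSketch {
  // llap.shuffle.os.cache.always.evict, assumed to be read from configuration
  private final boolean alwaysEvict;

  ShuffleEvictionSketch(boolean alwaysEvict) { this.alwaysEvict = alwaysEvict; }

  void afterTransfer(String path, FileDescriptor fd, long offset, long len,
                     int partition) {
    // Keep the broadcast partition's pages cached so repeated fetches by
    // downstream tasks are served from the OS cache instead of disk.
    if (alwaysEvict || partition != 0) {
      NativeIO.POSIX.getCacheManipulator().posixFadviseIfPossible(
          path, fd, offset, len, NativeIO.POSIX.POSIX_FADV_DONTNEED);
    }
  }
}
{code}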





[jira] [Commented] (HIVE-17057) Flaky test: TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema

2017-07-28 Thread PRASHANT GOLASH (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104609#comment-16104609
 ] 

PRASHANT GOLASH commented on HIVE-17057:


It looks like [https://issues.apache.org/jira/browse/HIVE-16844] caused the 
issue, and the committer is tracking the fix in 
[https://issues.apache.org/jira/browse/HIVE-16908]. Would you like to resolve 
this as a duplicate, or just link this JIRA to the other one?

> Flaky test: 
> TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema
> -
>
> Key: HIVE-17057
> URL: https://issues.apache.org/jira/browse/HIVE-17057
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janaki Lahorani
>Assignee: PRASHANT GOLASH
>






[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104630#comment-16104630
 ] 

Hive QA commented on HIVE-16965:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879285/HIVE-16965.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.spark.client.TestSparkClient.testJobSubmission (batchId=288)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6165/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6165/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6165/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879285 - PreCommit-HIVE-Build

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-07-28 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104637#comment-16104637
 ] 

liyunzhang_intel commented on HIVE-16948:
-

[~lirui]: thanks for the catch. The whole map work needs to be removed if no 
branch in the map work contains a spark pruning sink operator. Updated in 
HIVE-16948.2.patch.
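
For illustration, a sketch of the kind of check this implies (names are assumed; 
the predicate's target class would be the spark partition pruning sink operator, 
and this is not the attached patch):

{code}
// Sketch: a map work can be dropped if no operator branch reachable from its
// roots contains a pruning sink. Uses an explicit stack over the operator tree.
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.function.Predicate;

import org.apache.hadoop.hive.ql.exec.Operator;

public final class PruningSinkCheck {
  // Returns true if any operator reachable from the given roots matches the
  // predicate (e.g. op -> op instanceof SparkPartitionPruningSinkOperator).
  public static boolean anyBranchMatches(Collection<Operator<?>> roots,
                                         Predicate<Operator<?>> isPruningSink) {
    Deque<Operator<?>> stack = new ArrayDeque<>(roots);
    while (!stack.isEmpty()) {
      Operator<?> op = stack.pop();
      if (isPruningSink.test(op)) {
        return true;
      }
      if (op.getChildOperators() != null) {
        stack.addAll(op.getChildOperators());
      }
    }
    // No branch contains a pruning sink: the map work itself is dead after
    // pruning-sink removal and should be removed as a whole.
    return false;
  }
}
{code}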

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: N

[jira] [Commented] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-28 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104648#comment-16104648
 ] 

Peter Vary commented on HIVE-17184:
---

+1 LGTM

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}





[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-07-28 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-16948:

Attachment: HIVE-16948.2.patch

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.2.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: string)
>   outputColumnNames: _col0
>   Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNa

[jira] [Updated] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17194:
---
Attachment: HIVE-17194.1.patch

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.
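
One plausible wiring, assuming Jetty 9.3+'s {{GzipHandler}}; a sketch of the 
general approach, not necessarily how the attached patch does it:

{code}
// Sketch: wrap the HS2 HTTP-mode servlet context in Jetty's GzipHandler so
// responses to clients that send Accept-Encoding: gzip are compressed.
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.gzip.GzipHandler;
import org.eclipse.jetty.servlet.ServletContextHandler;

public class GzipWiringSketch {
  static void wireGzip(Server server, ServletContextHandler cliServiceContext) {
    GzipHandler gzip = new GzipHandler();
    // Compress the thrift-over-HTTP payloads that Beeline sends and receives.
    gzip.addIncludedMimeTypes("application/x-thrift");
    gzip.addIncludedMethods("POST");    // thrift calls are POSTs; GET is Jetty's default
    gzip.setMinGzipSize(1024);          // skip tiny responses
    gzip.setHandler(cliServiceContext); // delegate to the /cliservice servlet
    server.setHandler(gzip);
  }
}
{code}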





[jira] [Updated] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17194:
---
Status: Patch Available  (was: Open)

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.





[jira] [Updated] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17194:
---
Status: Open  (was: Patch Available)

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.





[jira] [Assigned] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-07-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-17195:
---


> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Fix For: 3.0.0
>
>
> Currently, long chains of REPL LOAD tasks lead to deeply recursive calls when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks methods 
> run recursively to traverse the DAG.
> This traversal logic needs to be modified to reduce stack usage.





[jira] [Updated] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17194:
---
Attachment: (was: HIVE-17194.1.patch)

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.





[jira] [Updated] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17194:
---
Attachment: HIVE-17194.1.patch

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.





[jira] [Updated] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17194:
---
Issue Type: Improvement  (was: Bug)

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.





[jira] [Updated] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-17194:
---
Status: Patch Available  (was: Open)

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.





[jira] [Work stopped] (HIVE-17100) Improve HS2 operation logs for REPL commands.

2017-07-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17100 stopped by Sankar Hariappan.
---
> Improve HS2 operation logs for REPL commands.
> -
>
> Key: HIVE-17100
> URL: https://issues.apache.org/jira/browse/HIVE-17100
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> It is necessary to log the progress of the replication tasks in a structured 
> manner as follows.
> *+Bootstrap Dump:+*
> * At the start of bootstrap dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (BOOTSTRAP)
> * (Estimated) Total number of tables/views to dump
> * (Estimated) Total number of functions to dump.
> * Dump Start Time{color}
> * After each table dump, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table dump end time
> * Table dump progress. Format is Table sequence no/(Estimated) Total number 
> of tables and views.{color}
> * After each function dump, will add a log as follows
> {color:#59afe1}* Function Name
> * Function dump end time
> * Function dump progress. Format is Function sequence no/(Estimated) Total 
> number of functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> dump.
> {color:#59afe1}* Database Name.
> * Dump Type (BOOTSTRAP).
> * Dump End Time.
> * (Actual) Total number of tables/views dumped.
> * (Actual) Total number of functions dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The actual and estimated number of tables/functions may not match if 
> any table/function is dropped while the dump is in progress.
> *+Bootstrap Load:+*
> * At the start of bootstrap load, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump directory
> * Load Type (BOOTSTRAP)
> * Total number of tables/views to load
> * Total number of functions to load.
> * Load Start Time{color}
> * After each table load, will add a log as follows
> {color:#59afe1}* Table/View Name
> * Type (TABLE/VIEW/MATERIALIZED_VIEW)
> * Table load completion time
> * Table load progress. Format is Table sequence no/Total number of tables and 
> views.{color}
> * After each function load, will add a log as follows
> {color:#59afe1}* Function Name
> * Function load completion time
> * Function load progress. Format is Function sequence no/Total number of 
> functions.{color}
> * After completion of all dumps, will add a log as follows to consolidate the 
> load.
> {color:#59afe1}* Database Name.
> * Load Type (BOOTSTRAP).
> * Load End Time.
> * Total number of tables/views loaded.
> * Total number of functions loaded.
> * Last Repl ID of the loaded database.{color}
> *+Incremental Dump:+*
> * At the start of database dump, will add one log with below details.
> {color:#59afe1}* Database Name
> * Dump Type (INCREMENTAL)
> * (Estimated) Total number of events to dump.
> * Dump Start Time{color}
> * After each event dump, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Event dump end time
> * Event dump progress. Format is Event sequence no/ (Estimated) Total number 
> of events.{color}
> * After completion of all event dumps, will add a log as follows.
> {color:#59afe1}* Database Name.
> * Dump Type (INCREMENTAL).
> * Dump End Time.
> * (Actual) Total number of events dumped.
> * Dump Directory.
> * Last Repl ID of the dump.{color}
> *Note:* The estimated number of events can be terribly inaccurate compared to 
> the actual number, as we don't know the number of events upfront until we read 
> from the metastore NotificationEvents table.
> *+Incremental Load:+*
> * At the start of incremental load, will add one log with below details.
> {color:#59afe1}* Target Database Name 
> * Dump directory
> * Load Type (INCREMENTAL)
> * Total number of events to load
> * Load Start Time{color}
> * After each event load, will add a log as follows
> {color:#59afe1}* Event ID
> * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc)
> * Target Table/View/Function/Name
> * Target Partition Name (in case of partition operations such as 
> ADD_PARTITION, DROP_PARTITION, ALTER_PARTITION etc. For other operations, it 
> will be “null")
> * Event load end time
> * Event load progress. Format is Event sequence no/ Total number of 
> events.{color}
> * After completion of all event loads, will add a log as follows to 
> consolidate the load.
> {color:#59afe1}* Target Database Name.
> * Load Type (INCREMENTAL).
> * Load End Time.
> * Total number of events loaded.
> * Last Repl ID of the loaded da

[jira] [Assigned] (HIVE-17196) CM: ReplCopyTask should retain the original file names even if copied from CM path.

2017-07-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan reassigned HIVE-17196:
---


> CM: ReplCopyTask should retain the original file names even if copied from CM 
> path.
> ---
>
> Key: HIVE-17196
> URL: https://issues.apache.org/jira/browse/HIVE-17196
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Fix For: 3.0.0
>
>
> Consider the below scenario:
> 1. Insert into table T1 with value(X).
> 2. Insert into table T1 with value(X).
> 3. Truncate the table T1. 
> – This step backs up 2 files with the same content to cmroot, which ends up as 
> one file in cmroot since the checksums match.
> 4. Incremental repl with the above 3 operations.
> – In this step, both insert event files will be read from cmroot, where the 
> copy of one overwrites the other because the file name (the checksum) is the 
> same in the cm path.
> So, this leads to data loss, and hence it is necessary to retain the original 
> file names even when we copy from the cm path.
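
For illustration, a sketch of the fix direction (names are assumed; this is not 
the actual ReplCopyTask change):

{code}
// Sketch: the cmroot copy is named after its checksum, so restore the file's
// original name on the way out; two same-content files then cannot collide
// at the destination.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CmCopySketch {
  static void copyFromCm(FileSystem cmFs, Path cmPath, String originalName,
                         FileSystem dstFs, Path dstDir, Configuration conf)
      throws IOException {
    // cmPath carries the checksum as its name; originalName is the name the
    // file had when the event was generated.
    Path target = new Path(dstDir, originalName);
    FileUtil.copy(cmFs, cmPath, dstFs, target, /*deleteSource=*/false, conf);
  }
}
{code}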





[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"

2017-07-28 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17050:
--
  Resolution: Fixed
Target Version/s: 3.0.0
  Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for the patch [~Yibing]!

> Multiline queries that have comment in middle fail when executed via "beeline 
> -e"
> -
>
> Key: HIVE-17050
> URL: https://issues.apache.org/jira/browse/HIVE-17050
> Project: Hive
>  Issue Type: Bug
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, 
> HIVE-17050.3.PATCH, HIVE-17050.4.patch
>
>
> After applying HIVE-13864, multi-line queries that have a comment at the end 
> of one of the middle lines fail when executed via beeline -e
> {noformat}
> $ beeline -u "" -e "select 1, --test
> > 2"
> scan complete in 3ms
> ..
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Error: Error while compiling statement: FAILED: ParseException line 1:9 
> cannot recognize input near '' '' '' in selection target 
> (state=42000,code=4)
> Closing: 0: 
> jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera
> {noformat}
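
A simplified sketch of one way to strip a trailing {{--}} comment from each line 
before the lines are joined (quote handling is simplified, and this is 
illustrative only, not the actual HIVE-17050 patch):

{code}
// Sketch: remove a trailing "--" line comment per line, so the comment cannot
// swallow the rest of the statement once lines are concatenated.
public final class CommentStripSketch {
  static String stripTrailingComment(String line) {
    boolean inSingle = false, inDouble = false;
    for (int i = 0; i < line.length() - 1; i++) {
      char c = line.charAt(i);
      if (c == '\'' && !inDouble) {
        inSingle = !inSingle;
      } else if (c == '"' && !inSingle) {
        inDouble = !inDouble;
      } else if (c == '-' && line.charAt(i + 1) == '-' && !inSingle && !inDouble) {
        return line.substring(0, i); // comment starts here, outside any quotes
      }
    }
    return line;
  }

  public static void main(String[] args) {
    // "select 1, --test" joined with "2" would otherwise comment out the "2".
    System.out.println(stripTrailingComment("select 1, --test")); // "select 1, "
  }
}
{code}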





[jira] [Updated] (HIVE-17050) Multiline queries that have comment in middle fail when executed via "beeline -e"

2017-07-28 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-17050:
--
Fix Version/s: 3.0.0

> Multiline queries that have comment in middle fail when executed via "beeline 
> -e"
> -
>
> Key: HIVE-17050
> URL: https://issues.apache.org/jira/browse/HIVE-17050
> Project: Hive
>  Issue Type: Bug
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Fix For: 3.0.0
>
> Attachments: HIVE-17050.1.patch, HIVE-17050.2.patch, 
> HIVE-17050.3.PATCH, HIVE-17050.4.patch
>
>
> After applying HIVE-13864, multi-line queries that have a comment at the end 
> of one of the middle lines fail when executed via beeline -e
> {noformat}
> $ beeline -u "" -e "select 1, --test
> > 2"
> scan complete in 3ms
> ..
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Error: Error while compiling statement: FAILED: ParseException line 1:9 
> cannot recognize input near '' '' '' in selection target 
> (state=42000,code=4)
> Closing: 0: 
> jdbc:hive2://host-10-17-80-194.coe.cloudera.com:1/default;principal=hive/host-10-17-80-194.coe.cloudera@yshi.com;ssl=true;sslTrustStore=/certs/hive/hive-keystore.jks;trustStorePassword=cloudera
> {noformat}





[jira] [Updated] (HIVE-16982) WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration option

2017-07-28 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16982:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.
Thanks for the patch [~klcopp]!

> WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration 
> option
> --
>
> Key: HIVE-16982
> URL: https://issues.apache.org/jira/browse/HIVE-16982
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: newbie, patch
> Fix For: 3.0.0
>
> Attachments: HIVE-16982.3.patch
>
>
> In the Hive WebUI / Drilldown: the Show Query tab always displays "UNKNOWN."
> If the user wants to see the query plan here, they should set configuration 
> hive.log.explain.output to true. The user should be made aware of this option:
> 1) in WebUI / Drilldown / Show Query and
> 2) in HiveConf.java, line 2232.
> This configuration's description reads:
> "Whether to log explain output for every query
> When enabled, will log EXPLAIN EXTENDED output for the query at INFO log4j 
> log level."
> this should be added:
> "...and in the WebUI / Show Query tab."





[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-07-28 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104700#comment-16104700
 ] 

liyunzhang_intel commented on HIVE-16948:
-

{quote}

Thinking more about this, I find a bug in combining equivalent works. If 2 map 
works contain the same operators, but will be pruned by different DPP sinks, then 
they can't be combined. E.g.,
{quote}

CombineEquivalentWorkResolver.EquivalentWorkMatcher#compareCurrentOperator only 
compares the properties of the operator itself; it does not compare its 
relationship with other operators. I suggest adding a configuration to 
enable/disable combining equivalent works, so that users can disable it to work 
around the above issue.

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.2.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: min(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Reducer 11 
> Reduce Operator Tree:
>   Group By Operator
> aggregations: max(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: _col0 is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputC

[jira] [Commented] (HIVE-17188) ObjectStore runs out of memory for large batches of addPartitions().

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104729#comment-16104729
 ] 

Hive QA commented on HIVE-17188:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879241/HIVE-17188.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11007 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=242)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6166/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6166/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6166/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879241 - PreCommit-HIVE-Build

> ObjectStore runs out of memory for large batches of addPartitions().
> 
>
> Key: HIVE-17188
> URL: https://issues.apache.org/jira/browse/HIVE-17188
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17188.1.patch
>
>
> For large batches (e.g. hundreds) of {{addPartitions()}}, the {{ObjectStore}} 
> runs out of memory. Flushing the {{PersistenceManager}} alleviates the 
> problem.
> Note: The problem being addressed here isn't so much the size of the 
> hundreds of Partition objects, but the cruft that builds up in the 
> PersistenceManager, in the JDO layer, as confirmed through memory profiling.
> (Raising this on behalf of [~cdrome] and [~thiruvel].)
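
A minimal sketch of the mitigation described above; the {{FLUSH_INTERVAL}} and 
method names are assumptions, not the attached patch:

{code}
// Sketch: flush the JDO PersistenceManager every N objects during a large
// batch so pending state does not accumulate in the JDO layer.
import java.util.List;
import javax.jdo.PersistenceManager;

public class BatchAddSketch {
  private static final int FLUSH_INTERVAL = 100; // assumed tuning knob

  static void persistAll(PersistenceManager pm, List<?> partitionObjects) {
    int n = 0;
    for (Object mPart : partitionObjects) {
      pm.makePersistent(mPart);
      if (++n % FLUSH_INTERVAL == 0) {
        pm.flush(); // push pending state to the DB, releasing JDO-layer cruft
      }
    }
  }
}
{code}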





[jira] [Updated] (HIVE-16945) Add method to compare Operators

2017-07-28 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-16945:
--
Attachment: HIVE-16945.2.patch

Fix NPE

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Rui Li
> Attachments: HIVE-16945.1.patch, HIVE-16945.2.patch
>
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems, as any change to the fields of 
> operators will break the comparators. It would be better to do this via 
> inheritance from the Operator base class, by adding a 
> {{logicalEquals(Operator other)}} method.
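
A minimal sketch of the inheritance-based design (class and field names are 
stand-ins, not Hive's actual Operator hierarchy):

{code}
// Sketch: the base class compares what is common to all operators, and each
// subclass refines the check for its own fields, so new fields cannot silently
// fall out of sync with a central comparator factory.
abstract class OperatorSketch<T> {
  protected T conf; // the operator's descriptor (e.g. FilterDesc, SelectDesc)

  public boolean logicalEquals(OperatorSketch<?> other) {
    return other != null
        && getClass() == other.getClass()
        && (conf == null ? other.conf == null : conf.equals(other.conf));
  }
}

class FilterOperatorSketch extends OperatorSketch<String /* stands in for FilterDesc */> {
  private boolean sortedFilter; // an example subclass-specific field

  @Override
  public boolean logicalEquals(OperatorSketch<?> other) {
    // Refine rather than re-implement the base comparison.
    return super.logicalEquals(other)
        && sortedFilter == ((FilterOperatorSketch) other).sortedFilter;
  }
}
{code}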





[jira] [Updated] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-07-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17195:

Attachment: HIVE-17195.01.patch

Added 01.patch with changes to replace the recursive calls with while loops when 
traversing the DAGs.
Request [~daijy]/[~anishek]/[~thejas] to please review it!
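
For illustration, a minimal sketch of the recursion-to-iteration rewrite (names 
like {{collectTasks}} are assumed, not the exact patch):

{code}
// Sketch: collect tasks of a given type by walking the task DAG with an
// explicit work list instead of the call stack, so stack depth stays bounded
// no matter how long the REPL LOAD task chain is.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.Task;

public final class DagWalkSketch {
  static <T extends Task<?>> List<T> collectTasks(List<Task<?>> roots, Class<T> type) {
    List<T> found = new ArrayList<>();
    Set<Task<?>> visited = new HashSet<>();
    Deque<Task<?>> work = new ArrayDeque<>(roots);
    while (!work.isEmpty()) {
      Task<?> task = work.pop();
      if (!visited.add(task)) {
        continue; // it's a DAG: a task can be reachable via several parents
      }
      if (type.isInstance(task)) {
        found.add(type.cast(task));
      }
      if (task.getChildTasks() != null) {
        work.addAll(task.getChildTasks());
      }
    }
    return found;
  }
}
{code}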

> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch
>
>
> Currently, long chains of REPL LOAD tasks lead to deeply recursive calls when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks methods 
> run recursively to traverse the DAG.
> This traversal logic needs to be modified to reduce stack usage.





[jira] [Updated] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-07-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17195:

Status: Patch Available  (was: Open)

> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch
>
>
> Currently, long chains of REPL LOAD tasks lead to deeply recursive calls when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks methods 
> run recursively to traverse the DAG.
> This traversal logic needs to be modified to reduce stack usage.





[jira] [Commented] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-07-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104800#comment-16104800
 ] 

ASF GitHub Bot commented on HIVE-17195:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/212

HIVE-17195: Long chain of tasks created by REPL LOAD shouldn't cause stack 
corruption.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-17195

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #212


commit 64cc709e8591eaee1b22aaf0bb6144c33259e058
Author: Sankar Hariappan 
Date:   2017-07-28T10:53:43Z

HIVE-17195: Long chain of tasks created by REPL LOAD shouldn't cause stack 
corruption.




> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch
>
>
> Currently, long chains of REPL LOAD tasks lead to deeply recursive calls when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks methods 
> run recursively to traverse the DAG.
> This traversal logic needs to be modified to reduce stack usage.





[jira] [Updated] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-07-28 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17195:

Labels: DAG DR Executor replication  (was: )

> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DAG, DR, Executor, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch
>
>
> Currently, long chains of REPL LOAD tasks lead to deeply recursive calls when 
> traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks methods 
> run recursively to traverse the DAG.
> This traversal logic needs to be modified to reduce stack usage.





[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104801#comment-16104801
 ] 

Hive QA commented on HIVE-16965:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879285/HIVE-16965.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6167/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6167/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6167/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879285 - PreCommit-HIVE-Build

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]





[jira] [Commented] (HIVE-16750) Support change management for rename table/partition.

2017-07-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104831#comment-16104831
 ] 

ASF GitHub Bot commented on HIVE-16750:
---

Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/199


> Support change management for rename table/partition.
> -
>
> Key: HIVE-16750
> URL: https://issues.apache.org/jira/browse/HIVE-16750
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16750.01.patch, HIVE-16750.02.patch, 
> HIVE-16750.03.patch
>
>
> Currently, rename table/partition updates the data location by renaming the 
> directory, which is equivalent to moving the files to a new path and deleting 
> the old path. So, this should trigger a move of the files into $CMROOT.
> Scenario:
> 1. Create a table (T1)
> 2. Insert a record
> 3. Rename the table (T1 -> T2)
> 4. Repl Dump till Insert.
> 5. Repl Load from the dump.
> 6. Target DB should have table T1 with the record.
> A similar scenario applies to rename partition as well.





[jira] [Commented] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask

2017-07-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104832#comment-16104832
 ] 

ASF GitHub Bot commented on HIVE-16901:
---

Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/200


> Distcp optimization - One distcp per ReplCopyTask 
> --
>
> Key: HIVE-16901
> URL: https://issues.apache.org/jira/browse/HIVE-16901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16901.01.patch, HIVE-16901.02.patch, 
> HIVE-16901.03.patch, HIVE-16901.04.patch
>
>
> Currently, if a ReplCopyTask is created to copy a list of files, then distcp 
> is invoked for each and every file. Instead, the whole list of source files 
> should be passed to the distcp tool in a single invocation, which copies the 
> files in parallel and hence gains a lot of performance.
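For illustration, a minimal sketch of the batched invocation, assuming the 
Hadoop 2.x {{DistCp}} Java API ({{DistCpOptions(List<Path>, Path)}}); the paths 
are placeholders, not from the patch:

{code:java}
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

public class BatchedReplCopySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Illustrative placeholder paths.
    List<Path> sources = Arrays.asList(
        new Path("hdfs://src/warehouse/t/f1"),
        new Path("hdfs://src/warehouse/t/f2"));
    Path target = new Path("hdfs://dst/warehouse/t");
    // One DistCp job for the whole list: files are copied in parallel and the
    // job-submission cost is paid once, not once per file.
    DistCpOptions options = new DistCpOptions(sources, target);
    new DistCp(conf, options).execute();
  }
}
{code}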



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-28 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.25.patch

Fixed a NullPointerException bug and bugs with null partition values.

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, 
> HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, 
> HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.24.patch, 
> HIVE-12631.25.patch, HIVE-12631.2.patch, HIVE-12631.3.patch, 
> HIVE-12631.4.patch, HIVE-12631.5.patch, HIVE-12631.6.patch, 
> HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, ACID logic is embedded inside the ORC format; we need to 
> refactor it to sit on top of some interface, if practical, or just port it to 
> the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104890#comment-16104890
 ] 

Hive QA commented on HIVE-17006:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879254/HIVE-17006.01.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 11014 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6168/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6168/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6168/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879254 - PreCommit-HIVE-Build

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.patch, 
> HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> it is for ORC, but still a good idea. I messaged the dev list about it but 
> didn't get a response; we may follow up later.
> For now, do (3).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104919#comment-16104919
 ] 

Vlad Gudikov commented on HIVE-17148:
-

ROOT-CAUSE:
The problem is with the predicates created by HiveJoinAddNotNullRule. This rule 
creates not-null predicates from the fields that take part in the join filter, 
regardless of whether those fields are used as parameters of functions.

SOLUTION:
Create the predicate based on the functions that take part in the filter, not 
just on the fields. The point is to check that the left part and the right 
part of the filter as a whole are not null, not just the fields that are part 
of the join filter. E.g. we have two tables *ct1(a1, b1)* and *ct2(a2)*. When 
we execute the query *select * from ct1 c1 inner join ct2 c2 on 
(COALESCE(a1,b1)=a2);* we get these predicates in the filter operators:
a2 is not null --- right part
a1 is not null and b1 is not null --- left part

Applying the left-part predicate to the join will result in data loss, as it 
excludes rows where either column is null. COALESCE is a good example of this 
case, since the main purpose of COALESCE is to produce non-null values from 
possibly-null columns. To fix the data loss we need to check that the COALESCE 
itself won't produce null values, as we can't join nulls. With my fix the two 
parts will look like:

a2 is not null -- right part (still checking fields for the null condition)
COALESCE(a1,b1) is not null -- left part (checking that the whole function 
won't produce null values)

In the next patch I'm going to update the related failing tests with the fixed 
stage plans.
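
For concreteness, a hedged sketch of the shape of this approach (not the 
actual patch) against Calcite's rex API, which the rule operates on: the IS 
NOT NULL predicate is built over each whole join-key expression rather than 
over every column referenced inside it.

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;

public class NotNullOnJoinKeysSketch {
  // For COALESCE(a1,b1) this yields "COALESCE(a1,b1) IS NOT NULL", which keeps
  // rows where either column is set; per-column IS NOT NULL predicates would
  // wrongly drop them.
  static RexNode notNullFilter(RexBuilder rexBuilder, List<RexNode> joinKeys) {
    List<RexNode> predicates = new ArrayList<>();
    for (RexNode key : joinKeys) {
      predicates.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_NULL, key));
    }
    return RexUtil.composeConjunction(rexBuilder, predicates, false);
  }
}
{code}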


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if the join is an inner join; otherwise 
> HiveJoinAddNotNullRule, which creates this problem, is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104919#comment-16104919
 ] 

Vlad Gudikov edited comment on HIVE-17148 at 7/28/17 1:07 PM:
--

ROOT-CAUSE:
The problem is with the predicates created by HiveJoinAddNotNullRule. This rule 
creates not-null predicates from the fields that take part in the join filter, 
regardless of whether those fields are used as parameters of functions.

SOLUTION:
Create the predicate based on the functions that take part in the filter, not 
just on the fields. The point is to check that the left part and the right 
part of the filter as a whole are not null, not just the fields that are part 
of the join filter. E.g. we have two tables *ct1(a1, b1)* and *ct2(a2)*. When 
we execute the query *select * from ct1 c1 inner join ct2 c2 on 
(COALESCE(a1,b1)=a2);* we get these predicates in the filter operators:
a2 is not null --- right part
a1 is not null and b1 is not null --- left part

Applying the left-part predicate to the join will result in data loss, as it 
excludes rows where either column is null. COALESCE is a good example of this 
case, since the main purpose of COALESCE is to produce non-null values from 
possibly-null columns. To fix the data loss we need to check that the COALESCE 
itself won't produce null values, as we can't join nulls. With my fix the two 
parts will look like:

a2 is not null -- right part (still checking fields for the null condition)
COALESCE(a1,b1) is not null -- left part (checking that the whole function 
won't produce null values)

In the next patch I'm going to update the related failing tests with the fixed 
stage plans.



was (Author: allgoodok):
ROOT-CAUSE:
The problem was with the predicates that were created according to 
HiveJoinAddNotNullRule. This rule is creating predicates from fields that take 
part in join filter, no matter if this fields are used as parameters of 
functions or not.

SOLUTION:
Create predicate based on functions that take part in filters as well as 
fields. The point is to check if left part and right part of the filter is not 
null, not just fields that are part of the join filter. I.e we have to tables 
test1(a1 int, a2 int) and test2(b1). When we execute following query strong 
text*select * from ct1 c1 inner join ct2 c2 on (COALESCE(a1,b1)=a2);*strong 
text* we get to predicates for filter operator:
b1 is not null --- right part 
a1 is not null and a2 is not null -- left part

Applying predicate for left part of join will result in data loss as we exclude 
rows with null fields. COALESCE is a good example for this case as the main 
purpose of COALESCE function is to get not null values from tables. To fix the 
data loss we need to check that coalesce won't bring us null values as we can't 
join nulls. My fix will check that left part and right part will look like:

b1 is not null -- right part (still checking fields on null condition)
COALESCE(a1,a2) is not null (checking that whole function won't bring us null 
values)

In next patch I'm going to change related failed tests with the fixed stage 
plans.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BRO

[jira] [Comment Edited] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104919#comment-16104919
 ] 

Vlad Gudikov edited comment on HIVE-17148 at 7/28/17 1:08 PM:
--

ROOT-CAUSE:
The problem is with the predicates created by HiveJoinAddNotNullRule. This rule 
creates not-null predicates from the fields that take part in the join filter, 
regardless of whether those fields are used as parameters of functions.

SOLUTION:
Create the predicate based on the functions that take part in the filter, not 
just on the fields. The point is to check that the left part and the right 
part of the filter as a whole are not null, not just the fields that are part 
of the join filter. E.g. we have two tables *ct1(a1, b1)* and *ct2(a2)*. When 
we execute the query *select * from ct1 c1 inner join ct2 c2 on 
(COALESCE(a1,b1)=a2);* we get these predicates in the filter operators:
a2 is not null --- right part
a1 is not null and b1 is not null --- left part

Applying the left-part predicate to the join will result in data loss, as it 
excludes rows where either column is null. COALESCE is a good example of this 
case, since the main purpose of COALESCE is to produce non-null values from 
possibly-null columns. To fix the data loss we need to check that the COALESCE 
itself won't produce null values, as we can't join nulls. With my fix the two 
parts will look like:

a2 is not null -- right part (still checking fields for the null condition)
COALESCE(a1,b1) is not null -- left part (checking that the whole function 
won't produce null values)

In the next patch I'm going to update the related failing tests with the fixed 
stage plans.



was (Author: allgoodok):
ROOT-CAUSE:
The problem was with the predicates that were created according to 
HiveJoinAddNotNullRule. This rule is creating predicates from fields that take 
part in join filter, no matter if this fields are used as parameters of 
functions or not.

SOLUTION:
Create predicate based on functions that take part in filters as well as 
fields. The point is to check if left part and right part of the filter is not 
null, not just fields that are part of the join filter. I.e we have to tables 
test1(a1 int, a2 int) and test2(b1). When we execute following query *select * 
from ct1 c1 inner join ct2 c2 on (COALESCE(a1,b1)=a2);* we get to predicates 
for filter operator:
b1 is not null --- right part 
a1 is not null and a2 is not null -- left part

Applying predicate for left part of join will result in data loss as we exclude 
rows with null fields. COALESCE is a good example for this case as the main 
purpose of COALESCE function is to get not null values from tables. To fix the 
data loss we need to check that coalesce won't bring us null values as we can't 
join nulls. My fix will check that left part and right part will look like:

b1 is not null -- right part (still checking fields on null condition)
COALESCE(a1,a2) is not null (checking that whole function won't bring us null 
values)

In next patch I'm going to change related failed tests with the fixed stage 
plans.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
> 

[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105001#comment-16105001
 ] 

Hive QA commented on HIVE-17169:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878839/HIVE-17169.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6169/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6169/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6169/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878839 - PreCommit-HIVE-Build

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.
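
For illustration, a minimal sketch of the suggested shortcut, in the same 
context as the snippet above. Exactly which accessor on {{EncryptionZone}} 
exposes the stored bit-length is an assumption here; {{getSuite()}} is used 
only as a stand-in:

{code:java}
@Override
public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
  EncryptionZone zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
  EncryptionZone zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
  if (zone1 == null && zone2 == null) {
    return 0;
  } else if (zone1 == null) {
    return -1;
  } else if (zone2 == null) {
    return 1;
  }
  // Compare on metadata the NameNode already returned; no KeyProvider RPC.
  return Integer.compare(strengthOf(zone1), strengthOf(zone2));
}

// Stand-in for whatever member stores the cipher's bit-length; named only for
// illustration, not a confirmed Hadoop API detail.
private int strengthOf(EncryptionZone zone) {
  return zone.getSuite().getAlgorithmBlockSize();
}
{code}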



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-07-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara resolved HIVE-17001.

Resolution: Won't Fix

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean the folder of any existing partition on 
> HDFS; however, if the partition folder exists only on HDFS and the partition 
> definition is missing from HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT OVERWRITE TABLE test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17057) Flaky test: TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema

2017-07-28 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani resolved HIVE-17057.

Resolution: Duplicate

> Flaky test: 
> TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema
> -
>
> Key: HIVE-17057
> URL: https://issues.apache.org/jira/browse/HIVE-17057
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janaki Lahorani
>Assignee: PRASHANT GOLASH
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17057) Flaky test: TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema

2017-07-28 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105090#comment-16105090
 ] 

Janaki Lahorani commented on HIVE-17057:


Thanks [~pgolash].

> Flaky test: 
> TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema
> -
>
> Key: HIVE-17057
> URL: https://issues.apache.org/jira/browse/HIVE-17057
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janaki Lahorani
>Assignee: PRASHANT GOLASH
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17008) DbNotificationListener should skip failed events

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105105#comment-16105105
 ] 

Hive QA commented on HIVE-17008:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879261/HIVE-17008.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6170/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6170/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6170/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879261 - PreCommit-HIVE-Build

> DbNotificationListener should skip failed events
> 
>
> Key: HIVE-17008
> URL: https://issues.apache.org/jira/browse/HIVE-17008
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-17008.0.patch, HIVE-17008.1.patch, 
> HIVE-17008.2.patch
>
>
> When dropping a non-existent database, the HMS will still fire registered 
> {{DROP_DATABASE}} event listeners.  This results in an NPE when the listeners 
> attempt to deref the {{null}} database parameter.
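
A fragment-level sketch of the kind of guard this implies; the surrounding 
method and variable names are assumptions for illustration, not the actual 
patch:

{code:java}
// Inside the metastore's drop_database path (context assumed):
Database db = get_database_core(name);
if (db == null) {
  // No Database object to put in the event; firing DROP_DATABASE here would
  // make listeners NPE when they dereference it.
  throw new NoSuchObjectException("Database " + name + " does not exist");
}
for (MetaStoreEventListener listener : listeners) {
  listener.onDropDatabase(new DropDatabaseEvent(db, true, this));
}
{code}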



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105115#comment-16105115
 ] 

Janaki Lahorani commented on HIVE-16759:


The test failures are not related to this patch.  The following are tracked as 
part of HIVE-15058.
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)

The following are tracked as part of HIVE-16908.
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105271#comment-16105271
 ] 

Sergio Peña commented on HIVE-16886:


[~anishek] [~thejas] While running some tests with duplicated event IDs in HMS 
HA mode, I see that the NL_ID is never duplicated and is always consecutive and 
in order. Do you know why we're not using this ID instead? It seems more 
consistent and better to use.

[~akolb] FYI

{noformat}
[hive1]> select NL_ID, EVENT_ID, EVENT_TIME, EVENT_TYPE, DB_NAME from NOTIFICATION_LOG where NL_ID >= 5431 and NL_ID <= 5440;
+-------+----------+------------+-----------------+----------------------------------------+
| NL_ID | EVENT_ID | EVENT_TIME | EVENT_TYPE      | DB_NAME                                |
+-------+----------+------------+-----------------+----------------------------------------+
|  5431 |     5094 | 1501109698 | CREATE_DATABASE | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5432 |     5097 | 1501109698 | CREATE_TABLE    | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5433 |     5098 | 1501109699 | ADD_PARTITION   | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5434 |     5101 | 1501109791 | DROP_TABLE      | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5435 |     5104 | 1501109792 | DROP_DATABASE   | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5436 |     5096 | 1501109698 | CREATE_DATABASE | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5437 |     5097 | 1501109698 | CREATE_TABLE    | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5438 |     5100 | 1501109699 | ADD_PARTITION   | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5439 |     5102 | 1501109791 | DROP_TABLE      | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5440 |     5105 | 1501109792 | DROP_DATABASE   | metastore_test_db_HIVE_HIVEMETASTORE_1 |
+-------+----------+------------+-----------------+----------------------------------------+
{noformat}

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i=0; i<NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert

[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105274#comment-16105274
 ] 

Sergio Peña commented on HIVE-16886:


Btw, the getNextNotification() method fetches all notifications with an 
EVENT_ID > X, so if we already fetched EVENT_ID = 5098, getNextNotification 
won't return events written after 5098 whose EVENT_ID is less than 5098. Seems 
we can handle this better with NL_ID.

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i=0; i<NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails because the second notification also has event ID 1 
> instead of the expected 2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17153:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
> -
>
> Key: HIVE-17153
> URL: https://issues.apache.org/jira/browse/HIVE-17153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch
>
>
> {code}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing spark_dynamic_partition_pruning.q 
> 3703c3703
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3717c3717
> <   target work: Map 1
> ---
> >   target work: Map 4
> 3746c3746
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3760c3760
> <   target work: Map 1
> ---
> >   target work: Map 4
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105290#comment-16105290
 ] 

Sahil Takiar commented on HIVE-17153:
-

Thanks for the review [~lirui]. Merged this into master.

> Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
> -
>
> Key: HIVE-17153
> URL: https://issues.apache.org/jira/browse/HIVE-17153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch
>
>
> {code}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing spark_dynamic_partition_pruning.q 
> 3703c3703
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3717c3717
> <   target work: Map 1
> ---
> >   target work: Map 4
> 3746c3746
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3760c3760
> <   target work: Map 1
> ---
> >   target work: Map 4
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17153:

Fix Version/s: 3.0.0

> Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
> -
>
> Key: HIVE-17153
> URL: https://issues.apache.org/jira/browse/HIVE-17153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch
>
>
> {code}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing spark_dynamic_partition_pruning.q 
> 3703c3703
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3717c3717
> <   target work: Map 1
> ---
> >   target work: Map 4
> 3746c3746
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3760c3760
> <   target work: Map 1
> ---
> >   target work: Map 4
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17131:

Fix Version/s: 2.4.0

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-17131.1.branch-2.patch, HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Fix Version/s: 3.0.0

> Remove unnecessary HoS DPP trees during map-join conversion
> ---
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, 
> HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1, partitioned_table2 where 
> partitioned_table1.part_col = partitioned_table2.part_col;
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-3 depends on stages: Stage-2
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 2
>   Stage: Stage-3
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table2
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark HashTable Sink Operator
>   keys:
> 0 _col1 (type: int)
> 1 _col1 (type: int)
> Local Work:
>   Map Reduce Local Work
>   Stage: Stage-1
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   keys:
> 0 _col1 (type: int)
> 1 _co

[jira] [Commented] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105293#comment-16105293
 ] 

Hive QA commented on HIVE-17189:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879263/HIVE-17189.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11007 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=242)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6171/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6171/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6171/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879263 - PreCommit-HIVE-Build

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of the @public 
> interface of MetastoreClient and removes some methods, which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into the newly added method 
> {{alter_table_with_environment_context}}.
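
A hedged sketch of what one such re-introduced method could look like; the 
exact signatures are assumptions, and the delegate name follows the 
description above:

{code:java}
// Restored old-signature method, delegating to the newer API so existing
// callers keep working; passing null for the environment context preserves
// the old behavior.
@Override
public void alter_table(String dbname, String tblName, Table newTbl)
    throws InvalidOperationException, MetaException, TException {
  alter_table_with_environment_context(dbname, tblName, newTbl, null);
}
{code}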



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105294#comment-16105294
 ] 

Vihang Karajgaonkar commented on HIVE-16759:


Pushed to master. Thanks for your contribution [~janulatha]

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16759:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105304#comment-16105304
 ] 

Sergio Peña commented on HIVE-16759:


[~vihangk1] could you commit this to branch-2 as well?

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105311#comment-16105311
 ] 

Sergio Peña commented on HIVE-17129:


{{MetaStoreEventListener}} is used by other components, so this must be public. 
Regarding {{ListenerEvent}}, I don't know. It is not used directly, but it is 
extended by the event classes that {{MetaStoreEventListener}} uses, such as 
{{CreateTableEvent}}.

How does this work? If you mark {{ListenerEvent}} as private, can users still 
use {{CreateTableEvent}}, for instance?

> Increase usage of InterfaceAudience and InterfaceStability annotations 
> ---
>
> Key: HIVE-17129
> URL: https://issues.apache.org/jira/browse/HIVE-17129
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a 
> while ago to mark certain classes as available for public use. However, they 
> were only added to a few classes. The annotations are largely missing for 
> major APIs such as the SerDe and UDF APIs. We should update these interfaces 
> to use these annotations.
> When done in conjunction with HIVE-17130, we should have an automated way to 
> prevent backwards incompatible changes to Hive APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17184:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master. Thanks for the review [~pvary]

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105319#comment-16105319
 ] 

Vihang Karajgaonkar commented on HIVE-16759:


The patch doesn't apply cleanly on branch-2; there are some conflicts. Hi 
[~janulatha], can you please provide a patch for branch-2 as well? Thanks!

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16974) Change the sort key for the schema tool validator to be

2017-07-28 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105332#comment-16105332
 ] 

Naveen Gangam commented on HIVE-16974:
--

Thanks for the suggestion [~aihuaxu]. I have tested out the fix with {{order by 
NAME, ID}}.
We are back to the problem we started with, which is having NULLs first on some 
databases vs. NULLs last on others.
On MySQL:
{code}
SD_ID in TBLS should not be NULL for Table Name=null, Table ID=101, Table 
Type=EXTERNAL_TABLE
SD_ID in TBLS should not be NULL for Table Name=table1, Table ID=100, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table2, Table ID=106, Table 
Type=EXTERNAL_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=102, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=107, Table 
Type=MANAGED_TABLE
{code}
On others:
{code}
SD_ID in TBLS should not be NULL for Table Name=table1, Table ID=100, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table2, Table ID=106, Table 
Type=EXTERNAL_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=102, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=107, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=null, Table ID=101, Table 
Type=EXTERNAL_TABLE
{code}

The other option is to change the ordering to {{order by ID, NAME}}, which is 
pretty similar to the output with just {{order by ID}} for search purposes.

In both cases, we still print out the NAME value of the entity, so I do not 
think adding a second column to the ordering is much of a value add.

Hope this helps. Thanks
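
To see the divergence in plain terms: the two backend behaviors differ exactly 
the way these two comparators do, which is why sorting on a non-null key gives 
one portable order. A minimal self-contained illustration (not schematool code):

{code:java}
import java.util.Arrays;
import java.util.Comparator;

// Same data, same sort key (NAME); the only difference is where the
// NULL-named row lands -- the exact cross-database discrepancy above.
public class NullOrderSketch {
  public static void main(String[] args) {
    String[] names = {"table1", null, "table3", "table2"};
    String[] nullsFirst = names.clone();
    String[] nullsLast = names.clone();
    Arrays.sort(nullsFirst, Comparator.nullsFirst(Comparator.<String>naturalOrder()));
    Arrays.sort(nullsLast, Comparator.nullsLast(Comparator.<String>naturalOrder()));
    System.out.println(Arrays.toString(nullsFirst)); // [null, table1, table2, table3]
    System.out.println(Arrays.toString(nullsLast));  // [table1, table2, table3, null]
  }
}
{code}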

> Change the sort key for the schema tool validator to be 
> 
>
> Key: HIVE-16974
> URL: https://issues.apache.org/jira/browse/HIVE-16974
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-16974.patch, HIVE-16974.patch
>
>
> In HIVE-16729, we introduced ordering of results/failures returned by 
> schematool's validators. This allows fault injection testing to expect 
> results that can be verified. However, they were sorted on NAME values which 
> in the HMS schema can be NULL. So if the introduced fault has a NULL/BLANK 
> name column value, the result could be different depending on the backend 
> database(if they sort NULLs first or last).
> So I think it is better to sort on a non-null column value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17201:
---


> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17184:
---
Fix Version/s: 2.4.0
   3.0.0

pushed to branch-2 as well.

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105355#comment-16105355
 ] 

Vihang Karajgaonkar commented on HIVE-17189:


[~ashutoshc] [~pxiong] Can you please take a look? Thanks!

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of @public 
> interface of MetastoreClient and removes some methods which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into newly added method 
> {{alter_table_with_environment_context}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16974) Change the sort key for the schema tool validator to be

2017-07-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105354#comment-16105354
 ] 

Aihua Xu commented on HIVE-16974:
-

Thanks for explanation. I think it's already a good improvement now.

+1.

> Change the sort key for the schema tool validator to be 
> 
>
> Key: HIVE-16974
> URL: https://issues.apache.org/jira/browse/HIVE-16974
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-16974.patch, HIVE-16974.patch
>
>
> In HIVE-16729, we introduced ordering of results/failures returned by 
> schematool's validators. This allows fault injection testing to expect 
> results that can be verified. However, they were sorted on NAME values which 
> in the HMS schema can be NULL. So if the introduced fault has a NULL/BLANK 
> name column value, the result could be different depending on the backend 
> database(if they sort NULLs first or last).
> So I think it is better to sort on a non-null column value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105359#comment-16105359
 ] 

Sahil Takiar commented on HIVE-17129:
-

Technically users can use whatever Hive classes they want, as long as the class 
declaration is public, e.g. {{public class CreateTableEvent}}. The annotations 
don't actually stop users from using a class; they are just there to inform 
users which classes they *should* (or shouldn't) be using.

If {{MetaStoreEventListener}} should be public, then I suggest we make 
{{ListenerEvent}} and all classes used by {{MetaStoreEventListener}} public 
too. Generally, if an interface should be marked as Public, then all classes 
used by the interface should also be Public. For example, if we make 
{{MetaStoreEventListener}} Public, then {{ConfigChangeEvent}}, 
{{CreateTableEvent}}, {{DropTableEvent}}, etc. should be Public too.
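
For instance, the markings might look roughly like this (a sketch only; the 
import paths assume the Hadoop-style classification annotations in hive-common, 
so treat them as an assumption):

{code:java}
import org.apache.hadoop.hive.common.classification.InterfaceAudience;
import org.apache.hadoop.hive.common.classification.InterfaceStability;

// Sketch: the listener interface and every event type it exposes carry the
// same Public marking, so the API surface is consistent end to end.
@InterfaceAudience.Public
@InterfaceStability.Stable
abstract class MetaStoreEventListenerSketch {
  public abstract void onCreateTable(ListenerEventSketch event);
}

@InterfaceAudience.Public
@InterfaceStability.Evolving
class ListenerEventSketch { /* status, parameters, ... */ }
{code}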

> Increase usage of InterfaceAudience and InterfaceStability annotations 
> ---
>
> Key: HIVE-17129
> URL: https://issues.apache.org/jira/browse/HIVE-17129
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a 
> while ago to mark certain classes as available for public use. However, they 
> were only added to a few classes. The annotations are largely missing for 
> major APIs such as the SerDe and UDF APIs. We should update these interfaces 
> to use these annotations.
> When done in conjunction with HIVE-17130, we should have an automated way to 
> prevent backwards incompatible changes to Hive APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17202) Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-17202:
---


> Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs
> --
>
> Key: HIVE-17202
> URL: https://issues.apache.org/jira/browse/HIVE-17202
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17203) Add InterfaceAudience and InterfaceStability annotations for HCat APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-17203:
---


> Add InterfaceAudience and InterfaceStability annotations for HCat APIs
> --
>
> Key: HIVE-17203
> URL: https://issues.apache.org/jira/browse/HIVE-17203
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17201:

Attachment: HIVE-17201.1.patch

The following should clean up the build, for now.

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17201:

Status: Patch Available  (was: Open)

Submitting, to check the tests.

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105373#comment-16105373
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

[~sbeeram]: I have raised HIVE-17201 to temporarily disable these three tests, 
to clean up the tests for others. This is only temporary, until we fix the 
tests properly.

bq. If the target metastore instance were accessed through a different 
classloader...
I made an initial pass at doing this. I don't have a proper solution yet. Will 
update.
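
For what it's worth, the generic shape of that idea is a detached classloader, 
so the second metastore gets its own copy of any static state. A minimal sketch 
with placeholder paths and class names, not the actual test wiring:

{code:java}
import java.net.URL;
import java.net.URLClassLoader;

// Load the server class through a loader with no application parent, so its
// statics (e.g. a cached PersistenceManagerFactory) are separate copies.
public class IsolatedLoaderSketch {
  public static void main(String[] args) throws Exception {
    URL[] classpath = { new URL("file:///path/to/metastore-classes/") }; // placeholder
    try (URLClassLoader isolated = new URLClassLoader(classpath, null)) {
      Class<?> server = Class.forName("com.example.MetastoreServer", true, isolated);
      // java.lang.Runnable is bootstrap-loaded, so the cast crosses loaders safely
      Runnable instance = (Runnable) server.getDeclaredConstructor().newInstance();
      instance.run(); // starts the second instance with its own statics
    }
  }
}
{code}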

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second instance of metastore thread with a different conf object that results 
> in the PersistenceManagerFactory closure and hence the tests fail. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-28 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16998:
---
Attachment: HIVE16998.5.patch

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.
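
(For context, a hedged sketch of the kind of gate such a key would add to the 
split decision; the config name and helper are assumptions drawn from the 
description above, not the actual patch:)

{code:java}
import org.apache.hadoop.conf.Configuration;

// With the new key on, only map-join branches are split for DPP; everything
// else keeps a single operator tree and avoids re-running the subquery.
public final class DppGateSketch {
  static boolean shouldSplitForDpp(Configuration conf, boolean isMapJoin) {
    boolean mapJoinOnly = conf.getBoolean(
        "hive.spark.dynamic.partition.pruning.map.join.only", false); // assumed name
    return !mapJoinOnly || isMapJoin;
  }
}
{code}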



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-28 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105383#comment-16105383
 ] 

Janaki Lahorani commented on HIVE-16998:


Resolved merge conflicts with HIVE-17087.  Addressed comments from [~stakiar]

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-28 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch
>
>
> Currently Join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> e.g. the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and 
> help it come up with a join order at least better than a cross join.
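
For intuition, the fallback can be as simple as deriving a row count from raw 
data size, so the cost model has something to order joins with. Constants and 
names here are illustrative assumptions, not the patch's actual logic:

{code:java}
// Coarse size-based estimate, used only when column stats are absent.
public final class StatsEstimateSketch {
  static long estimateRowCount(long rawDataSizeBytes, long avgRowSizeBytes) {
    long rowWidth = avgRowSizeBytes > 0 ? avgRowSizeBytes : 100; // assumed default width
    return Math.max(1L, rawDataSizeBytes / rowWidth);
  }

  public static void main(String[] args) {
    // e.g. a 1 GiB unanalyzed table with an assumed 100-byte average row
    System.out.println(estimateRowCount(1L << 30, 100)); // ~10.7M rows
  }
}
{code}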



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-28 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Attachment: HIVE-16811.3.patch

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch
>
>
> Currently Join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> e.g. the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and 
> help it come up with a join order at least better than a cross join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-28 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

Re-uploading same patch after renaming. Hopefully ptests will run this time.

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch
>
>
> Currently Join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> e.g. the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and 
> help it come up with a join order at least better than a cross join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105399#comment-16105399
 ] 

Zoltan Haindrich commented on HIVE-17201:
-

I feel that disabling unit tests is a bad idea... it's not the tests that have 
been broken - I perfectly understand that the code has used these things 
incorrectly... but in the earlier state it was working...
Because these failures mark that the current code does not perform as expected, 
I think we should instead consider reverting the original change (HIVE-16844).

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105400#comment-16105400
 ] 

Sahil Takiar commented on HIVE-16998:
-

Final patch LGTM pending results from Hive QA.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers

2017-07-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105429#comment-16105429
 ] 

Prasanth Jayachandran commented on HIVE-16077:
--

+1. Looks good to me. My only comment is the one you had already mentioned in 
the patch, related to closing on abort. It would be good to add that too.

> UPDATE/DELETE fails with numBuckets > numReducers
> -
>
> Key: HIVE-16077
> URL: https://issues.apache.org/jira/browse/HIVE-16077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, 
> HIVE-16077.03.patch, HIVE-16077.08.patch
>
>
> don't think we have such tests for Acid path
> check if they exist for non-acid path
> way to record expected files on disk in ptest/qfile
> https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105428#comment-16105428
 ] 

Hive QA commented on HIVE-16965:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879285/HIVE-16965.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6172/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6172/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6172/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879285 - PreCommit-HIVE-Build

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17138:
--
Description: 
For bucketed tables, FileSinkOperator is expected (in some cases)  to produce a 
specific number of files even if they are empty.
FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

This doesn't properly work for the Acid path.  For Insert, the 
OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
actual bucketN file (as of HIVE-14007, it does this regardless of whether the 
RecordUpdater sees any rows).  This causes empty (i.e. ORC metadata only) bucket 
files to be created for multiFileSpray=true if a particular 
FileSinkOperator.process() sees at least 1 row.  For example,
{noformat}
create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored 
as orc TBLPROPERTIES ('transactional'='true');
insert into fourbuckets values(0,1),(1,1);
with mapreduce.job.reduces = 1 or 2 
{noformat}

For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
row that needs to land there is seen.  Thus it never creates empty buckets, no 
matter what the value of _skipFiles_ in closeOp(boolean) is.

Once Split Update does the split early (in the operator pipeline), only the 
Insert path will matter, since base and delta are the only files that split 
computation, etc., looks at.  delete_delta is only for Acid internals, so there 
is never any reason to create empty files there.


Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()

  was:
For bucketed tables, FileSinkOperator is expected (in some cases)  to produce a 
specific number of files even if they are empty.
FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

This doesn't properly work for the Acid path.  For Insert, the 
OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
actual bucketN file (as of HIVE-14007, it does this regardless of whether the 
RecordUpdater sees any rows).  This causes empty (i.e. ORC metadata only) bucket 
files to be created for multiFileSpray=true if a particular 
FileSinkOperator.process() sees at least 1 row.  For example,
{noformat}
create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored 
as orc TBLPROPERTIES ('transactional'='true');
insert into fourbuckets values(0,1),(1,1);
with mapreduce.job.reduces = 1 or 2 
{noformat}

For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
row that needs to land there is seen.  Thus it never creates empty buckets, no 
matter what the value of _skipFiles_ in closeOp(boolean) is.

Once Split Update does the split early (in the operator pipeline), only the 
Insert path will matter, since base and delta are the only files that split 
computation, etc., looks at.  delete_delta is only for Acid internals, so there 
is never any reason to create empty files there.



> FileSinkOperator doesn't create empty files for acid path
> -
>
> Key: HIVE-17138
> URL: https://issues.apache.org/jira/browse/HIVE-17138
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> For bucketed tables, FileSinkOperator is expected (in some cases)  to produce 
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if 
> empty.
> This doesn't properly work for the Acid path.  For Insert, the 
> OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
> actual bucketN file (as of HIVE-14007, it does this regardless of whether the 
> RecordUpdater sees any rows).  This causes empty (i.e. ORC metadata only) 
> bucket files to be created for multiFileSpray=true if a particular 
> FileSinkOperator.process() sees at least 1 row.  For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> with mapreduce.job.reduces = 1 or 2 
> {noformat}
> For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
> row that needs to land there is seen.  Thus it never creates empty buckets, no 
> matter what the value of _skipFiles_ in closeOp(boolean) is.
> Once Split Update does the split early (in the operator pipeline), only the 
> Insert path will matter, since base and delta are the only files that split 
> computation, etc., looks at.  delete_delta is only for Acid internals, so 
> there is never any reason to create empty files there.
> Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()
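
(A minimal sketch of that abort-time cleanup, with illustrative names rather 
than the actual FileSinkOperator fields:)

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Best-effort close of every updater on abort, so no ORC writer / file handle
// leaks; a failure on one updater must not stop the rest from closing.
final class AbortWritersSketch {
  static void abortWriters(List<? extends Closeable> recordUpdaters) {
    for (Closeable updater : recordUpdaters) {
      if (updater == null) {
        continue;
      }
      try {
        updater.close();
      } catch (IOException e) {
        // swallow during abort; the operation is already failing
      }
    }
  }
}
{code}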



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17179:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review [~aihuaxu]. Merged to master.

> Add InterfaceAudience and InterfaceStability annotations for Hook APIs
> --
>
> Key: HIVE-17179
> URL: https://issues.apache.org/jira/browse/HIVE-17179
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hooks
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17179.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17203) Add InterfaceAudience and InterfaceStability annotations for HCat APIs

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105476#comment-16105476
 ] 

Sahil Takiar commented on HIVE-17203:
-

Right now I'm thinking of marking the following classes as Public:

* Most of the classes under {{org.apache.hive.hcatalog.api}}
* Maybe a few more classes under the hive-hcatalog-core package

> Add InterfaceAudience and InterfaceStability annotations for HCat APIs
> --
>
> Key: HIVE-17203
> URL: https://issues.apache.org/jira/browse/HIVE-17203
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17190) Don't store bitvectors for unpartitioned table

2017-07-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105492#comment-16105492
 ] 

Ashutosh Chauhan commented on HIVE-17190:
-

Yes, it will help with auto-gather column stats, where we collect stats for 
newly inserted data. We can update the bit vectors for those scenarios to get a 
better NDV estimate.
So, yes, we shall store bitvectors for unpartitioned tables too. Currently, some 
of the upgrade scripts don't cover that. I will use this JIRA for that and 
change the description accordingly.
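
For intuition, mergeability is what makes the vectors worth storing: an 
HLL-style union is just a register-wise max, so the stored vector can absorb 
the vector computed over newly inserted data. A generic sketch, not Hive's 
actual estimator classes:

{code:java}
// Register-wise union of two HLL-style sketches; re-estimating NDV from the
// merged registers avoids rescanning the old data after an insert.
public final class HllMergeSketch {
  static byte[] merge(byte[] stored, byte[] incoming) {
    byte[] merged = new byte[stored.length];
    for (int i = 0; i < stored.length; i++) {
      merged[i] = (byte) Math.max(stored[i], incoming[i]);
    }
    return merged;
  }
}
{code}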

> Don't store bitvectors for unpartitioned table
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Since current ones can't be intersected, there is no advantage of storing 
> them for unpartitioned tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105495#comment-16105495
 ] 

Vihang Karajgaonkar commented on HIVE-17167:


Hi [~alangates] Thanks for the patch. Quick question regarding the patch: it 
looks like it is possible to store two different values for 
MetaConf.Confvars.varname and MetaConf.Confvars.hivename. For example, a user 
could do the following:

{noformat}
MetaConf metaConf = new MetaConf();
metaConf.set("metastore.myconfig.name", "X");
metaConf.set("hive.metastore.myconfig.name", "Y");
{noformat}

In this case, will MetaConf.get(metaConf, "metastore.myconfig.name") and 
MetaConf.get(metaConf, "hive.metastore.myconfig.name") return two different 
values? Shouldn't the set call check whether the corresponding equivalent key is 
set as well and, if yes, overwrite it too?
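
i.e., something along these lines in the setter, so a get on either spelling 
sees the same value (a hypothetical helper, just to illustrate the question):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical: write both key spellings on every set, keeping them in sync.
final class MetaConfSyncSketch {
  static void set(Configuration conf, String metastoreKey, String hiveKey, String value) {
    conf.set(metastoreKey, value); // e.g. "metastore.myconfig.name"
    conf.set(hiveKey, value);      // e.g. "hive.metastore.myconfig.name"
  }
}
{code}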


> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17167.patch
>
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105523#comment-16105523
 ] 

Sahil Takiar commented on HIVE-16357:
-

I think [~pvary] brings up a valid question. Why do we fire events even if the 
operation failed? [~spena] any ideas?

It seems {{NotificationListener}} always checks the status of a given event 
before firing a notification for it.
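
i.e., a guard of roughly this shape (a sketch only -- it assumes a boolean 
success flag on the event object, which is what the listener-event API exposes):

{code:java}
// Skip notifications when the metastore operation itself failed; there is no
// directory to scan and nothing downstream should be told about. Illustrative
// types, not DbNotificationListener's actual code.
abstract class GuardedListenerSketch {
  static final class EventSketch {
    final boolean status; // true iff the operation succeeded
    EventSketch(boolean status) { this.status = status; }
  }

  void onCreateTable(EventSketch event) {
    if (!event.status) {
      return; // failed create: do not fire a notification
    }
    fireNotification(event);
  }

  abstract void fireNotification(EventSketch event);
}
{code}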

> Failed folder creation when creating a new table is reported incorrectly
> 
>
> Key: HIVE-16357
> URL: https://issues.apache.org/jira/browse/HIVE-16357
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16357.01.patch, HIVE-16357.02.patch
>
>
> If the directory for a Hive table could not be created, then the HMS will 
> throw a MetaException:
> {code}
>  if (tblPath != null) {
>   if (!wh.isDir(tblPath)) {
> if (!wh.mkdirs(tblPath, true)) {
>   throw new MetaException(tblPath
>   + " is not a directory or unable to create one");
> }
> madeDir = true;
>   }
> }
> {code}
> However in the finally block we always try to call the 
> DbNotificationListener, which in turn will also throw an exception because 
> the directory is missing, overwriting the initial exception with a 
> FileNotFoundException.
> Actual stacktrace seen by the caller:
> {code}
> 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: 
> MetaException(message:java.lang.RuntimeException: 
> java.io.FileNotFoundException: File file:/.../0 does not exist)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File 
> file:/.../0 does not exist
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.(DbNotificationListener.java:203)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482)
>   ... 20 more
> Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
> 

[jira] [Commented] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105539#comment-16105539
 ] 

Sergio Peña commented on HIVE-16357:


I'm not sure why either. I don't think we should trigger events for failed 
operations, but the code was already there. Perhaps we could fix this and 
trigger the events only when the operation succeeds?

> Failed folder creation when creating a new table is reported incorrectly
> 
>
> Key: HIVE-16357
> URL: https://issues.apache.org/jira/browse/HIVE-16357
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16357.01.patch, HIVE-16357.02.patch
>
>
> If the directory for a Hive table could not be created, then the HMS will 
> throw a MetaException:
> {code}
>  if (tblPath != null) {
>   if (!wh.isDir(tblPath)) {
> if (!wh.mkdirs(tblPath, true)) {
>   throw new MetaException(tblPath
>   + " is not a directory or unable to create one");
> }
> madeDir = true;
>   }
> }
> {code}
> However in the finally block we always try to call the 
> DbNotificationListener, which in turn will also throw an exception because 
> the directory is missing, overwriting the initial exception with a 
> FileNotFoundException.
> Actual stacktrace seen by the caller:
> {code}
> 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: 
> MetaException(message:java.lang.RuntimeException: 
> java.io.FileNotFoundException: File file:/.../0 does not exist)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File 
> file:/.../0 does not exist
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.(DbNotificationListener.java:203)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482)
>   ... 20 more
> Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
>   at org.apache.hadoop.fs.FileSystem.lis

[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-28 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105538#comment-16105538
 ] 

slim bouguerra commented on HIVE-17160:
---

[~sseth] The old path was not working, so I guess there is no need to add any 
new method.

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.patch
>
>
> This goal of this feature is to allow hive querying a secured druid cluster 
> using kerberos credentials.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105541#comment-16105541
 ] 

Sergio Peña commented on HIVE-17129:


Got it.
+1

> Increase usage of InterfaceAudience and InterfaceStability annotations 
> ---
>
> Key: HIVE-17129
> URL: https://issues.apache.org/jira/browse/HIVE-17129
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a 
> while ago to mark certain classes as available for public use. However, they 
> were only added to a few classes. The annotations are largely missing for 
> major APIs such as the SerDe and UDF APIs. We should update these interfaces 
> to use these annotations.
> When done in conjunction with HIVE-17130, we should have an automated way to 
> prevent backwards incompatible changes to Hive APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105554#comment-16105554
 ] 

Hive QA commented on HIVE-17139:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879301/HIVE-17139.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce_3]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_grouping]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6173/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6173/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6173/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879301 - PreCommit-HIVE-Build

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch
>
>
> The CASE WHEN and IF statement execution in Hive vectorization is not 
> optimal: in the current implementation, all of the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, 
> so that the else expression is evaluated only over the selected rows instead 
> of all rows.
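> As an illustration of the selected-array idea (a hand-written sketch, not 
> code from the attached patches; names are hypothetical and null handling is 
> omitted):
> {code}
> import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
> import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
>
> // After evaluating the condition, collect only the rows where it is false,
> // so the else expression is evaluated over those rows instead of the batch.
> static int selectRowsWhereConditionFalse(VectorizedRowBatch batch,
>     LongColumnVector cond, int[] tmpSelected) {
>   int newSize = 0;
>   for (int j = 0; j < batch.size; j++) {
>     int i = batch.selectedInUse ? batch.selected[j] : j;
>     if (cond.vector[i] == 0) {      // condition not satisfied for row i
>       tmpSelected[newSize++] = i;
>     }
>   }
>   return newSize;                   // caller swaps this into batch.selected
> }
> {code}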



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17192) Add InterfaceAudience and InterfaceStability annotations for Stats Collection APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17192:

Attachment: HIVE-17192.1.patch

> Add InterfaceAudience and InterfaceStability annotations for Stats Collection 
> APIs
> --
>
> Key: HIVE-17192
> URL: https://issues.apache.org/jira/browse/HIVE-17192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17192.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17192) Add InterfaceAudience and InterfaceStability annotations for Stats Collection APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17192:

Status: Patch Available  (was: Open)

> Add InterfaceAudience and InterfaceStability annotations for Stats Collection 
> APIs
> --
>
> Key: HIVE-17192
> URL: https://issues.apache.org/jira/browse/HIVE-17192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17192.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105597#comment-16105597
 ] 

Alan Gates commented on HIVE-17189:
---

Adding a test that exercises the "new" code paths would be good, especially for 
the alter_table call.

Other than that, +1.

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of the @public 
> interface of MetastoreClient and removes some methods, which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into the newly added method 
> {{alter_table_with_environment_context}}.
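> A minimal sketch of such a compatibility shim (hypothetical code; the 
> attached patch may differ in detail):
> {code}
> // Re-introduced for backwards compatibility: delegate to the new method,
> // passing a null EnvironmentContext so existing callers behave as before.
> public void alter_table(String dbname, String tblName, Table newTbl)
>     throws InvalidOperationException, MetaException, TException {
>   alter_table_with_environment_context(dbname, tblName, newTbl, null);
> }
> {code}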



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-4362) Allow Hive unit tests to run against fully-distributed cluster

2017-07-28 Thread Mark Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Grover reassigned HIVE-4362:
-

Assignee: (was: Mark Grover)

> Allow Hive unit tests to run against fully-distributed cluster
> --
>
> Key: HIVE-4362
> URL: https://issues.apache.org/jira/browse/HIVE-4362
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 0.10.0
>Reporter: Mark Grover
>
> It seems like Hive unit tests can run in (Hadoop) local mode or miniMR mode. 
> It would be nice (especially for projects like Apache Bigtop) to be able to 
> run Hive tests in fully distributed mode.
> This JIRA tracks the introduction of such functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Summary: Schema changes for bitvectors for unpartitioned tables  (was: 
Don't store bitvectors for unpartitioned table)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Since current ones can't be intersected, there is no advantage of storing 
> them for unpartitioned tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Description: Missed in HIVE-16997  (was: Since current ones can't be 
intersected, there is no advantage of storing them for unpartitioned tables.)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Attachment: HIVE-17190.2.patch

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.2.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Status: Patch Available  (was: Open)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.2.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17202) Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17202:

Attachment: HIVE-17202.1.patch

> Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs
> --
>
> Key: HIVE-17202
> URL: https://issues.apache.org/jira/browse/HIVE-17202
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17202.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Attachment: (was: HIVE-17190.patch)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.2.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17202) Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17202:

Status: Patch Available  (was: Open)

> Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs
> --
>
> Key: HIVE-17202
> URL: https://issues.apache.org/jira/browse/HIVE-17202
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17202.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool

2017-07-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105613#comment-16105613
 ] 

Alan Gates commented on HIVE-17167:
---

Sort of, but not quite like that.  The constructor of MetastoreConf is private 
so that the class can never be instantiated.  The intent is to use 
Configuration instead, since this allows passing in a HiveConf and it will 
"just work".  So doing:
{code}
Configuration conf = MetastoreConf.newMetastoreConf();
MetastoreConf.setVar(conf, ConfVars.X, "val");
{code}
will work and avoid the issue you are concerned about.

The loophole is that a user could do:
{code}
Configuration conf = MetastoreConf.newMetastoreConf();
conf.set("metastore.myconfig.name", "x");
conf.set("hive.metastore.myconfig.name", "y");
{code}

So the rule is, as long as the users use the MetastoreConf methods, all is 
good.  If not, things can go sideways on them.  For clients this should be ok 
as they should only be interacting with the config by calling the setMetaConf 
methods in IMetaStoreClient, which will do the right thing.  For hook writers 
and Hive developers, they will have to be aware of this subtlety if they want 
to set metastore configuration variables.  That is unfortunate.

If MetastoreConf subclasses Configuration (as HiveConf does) then it will not 
be able to operate on a HiveConf object as is.  It would have to construct a 
new instance of Configuration from HiveConf, which is expensive.  It seems to 
me better to optimize for interoperability.

> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17167.patch
>
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.
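> A minimal sketch of that pattern (illustrative names only, not from the 
> attached patch):
> {code}
> import org.apache.hadoop.conf.Configuration;
>
> public class MetastoreConf {
>   private MetastoreConf() {}   // static methods only, never instantiated
>
>   public enum ConfVars {
>     // each variable carries its metastore key, its legacy hive key, and a default
>     EXAMPLE_URIS("metastore.example.uris", "hive.metastore.example.uris", "");
>
>     final String varname;
>     final String hiveName;
>     final String defaultVal;
>
>     ConfVars(String varname, String hiveName, String defaultVal) {
>       this.varname = varname;
>       this.hiveName = hiveName;
>       this.defaultVal = defaultVal;
>     }
>   }
>
>   // prefer the metastore key, fall back to the hive key, then the default
>   public static String getVar(Configuration conf, ConfVars var) {
>     String val = conf.get(var.varname);
>     return val != null ? val : conf.get(var.hiveName, var.defaultVal);
>   }
> }
> {code}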



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105676#comment-16105676
 ] 

Hive QA commented on HIVE-17191:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879304/HIVE-17191.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6174/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6174/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6174/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879304 - PreCommit-HIVE-Build

> Add InterfaceAudience and InterfaceStability annotations for StorageHandler 
> APIs
> 
>
> Key: HIVE-17191
> URL: https://issues.apache.org/jira/browse/HIVE-17191
> Project: Hive
>  Issue Type: Sub-task
>  Components: StorageHandler
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17191.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16965:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17174) LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105803#comment-16105803
 ] 

Hive QA commented on HIVE-17174:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879309/HIVE-17174.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6175/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6175/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6175/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879309 - PreCommit-HIVE-Build

> LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge
> ---
>
> Key: HIVE-17174
> URL: https://issues.apache.org/jira/browse/HIVE-17174
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-17174.1.patch, HIVE-17174.2.patch
>
>
> Currently, once the data is transferred, an `fadvise` call is invoked to 
> throw away the pages. This may not be very helpful for broadcast, as the 
> same data tends to be transferred to multiple downstream tasks; a sketch of 
> the idea follows.
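> A sketch of the intended guard (hypothetical names; not the actual patch):
> {code}
> // Keep pages in the OS cache for broadcast outputs, since the same bytes
> // will be served to many downstream tasks; evict for everything else.
> if (!isBroadcastOutput) {
>   NativeIO.POSIX.getCacheManipulator().posixFadviseIfPossible(
>       identifier, fd, offset, len, NativeIO.POSIX.POSIX_FADV_DONTNEED);
> }
> {code}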
> e.g., Q50 at 1 TB scale:
> {noformat}
>   Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE)
> Map 6 <- Reducer 2 (BROADCAST_EDGE), Reducer 3 (BROADCAST_EDGE), 
> Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 10 (BROADCAST_EDGE), Map 
> 11 (BROADCAST_EDGE), Map 6 (CUSTOM_SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
> Reducer 9 <- Reducer 8 (SIMPLE_EDGE)
> Status: Running (Executing on YARN cluster with App id 
> application_1490656001509_6084)
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 5 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED 11 1100  
>  0   0
> Reducer 4 ..  llap SUCCEEDED  1  100  
>  0   0
> Reducer 2 ..  llap SUCCEEDED  1  100  
>  0   0
> Reducer 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 6 ..  llap SUCCEEDED13913900  
>  0   0
> Map 10 .  llap SUCCEEDED  1  100  
>  0   0
> Map 11 .  llap SUCCEEDED  1  100  
>  0   0
> Reducer 7 ..  llap SUCCEEDED83483400  
>  0   0
> Reducer 8 ..  llap SUCCEEDED 24 2400  
>  0   0
> Reducer 9 ..  llap SUCCEEDED  1  100  
>  0   0
> --
> e.g count of evictions on files
> 139 
> /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/appl

[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification

2017-07-28 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105861#comment-16105861
 ] 

Vaibhav Gumashta commented on HIVE-13989:
-

[~cdrome] Thanks for the patch. I have a couple of questions on the overall 
approach (doc I'm using for reference: 
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#permissions-on-new-files-and-folders).
1. It appears that for child directories, HDFS should correctly transfer the 
default ACLs. However, I understand that in Hive we want to avoid HDFS 
permissions umasking (of the traditional file permissions, not the ACLs). 
Would it make sense to first let HDFS create the child directory (so that it 
transfers the default/access ACLs) and then set the desired permissions? A 
sketch of that sequence follows below.
2. This comment is relevant if we decide to manage the ACL transfer from 
parent to child ourselves: referring to the above doc, it seems that when 
transferring access ACLs, the rwx on "other" should be removed if it exists. 
We might need to account for that in the code.
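
A minimal sketch of the create-then-set sequence from point 1 (hypothetical 
helper; error handling omitted):
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Let HDFS create the child so it inherits the parent's default ACLs,
// then set the desired permissions explicitly to bypass umasking.
static void mkdirWithInheritedAcls(FileSystem fs, Path child,
    FsPermission desired) throws IOException {
  fs.mkdirs(child);                   // default ACLs are applied here
  fs.setPermission(child, desired);   // then override the umasked permissions
}
{code}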



> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, 
> HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, 
> HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch
>
>
> Hive takes two approaches to working with extended ACLs depending on whether 
> data is being produced via a Hive query or HCatalog APIs. A Hive query will 
> run an FsShell command to recursively set the extended ACLs for a directory 
> sub-tree. HCatalog APIs will attempt to build up the directory sub-tree 
> programmatically and runs some code to set the ACLs to match the parent 
> directory.
> Some incorrect assumptions were made when implementing the extended ACLs 
> support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the 
> design documents of extended ACLs in HDFS. These documents model the 
> implementation after the POSIX implementation on Linux, which can be found at 
> http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html.
> The code for setting extended ACLs via HCatalog APIs is found in 
> HdfsUtils.java:
> {code}
> if (aclEnabled) {
>   aclStatus =  sourceStatus.getAclStatus();
>   if (aclStatus != null) {
> LOG.trace(aclStatus.toString());
> aclEntries = aclStatus.getEntries();
> removeBaseAclEntries(aclEntries);
> //the ACL api's also expect the tradition user/group/other permission 
> in the form of ACL
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, 
> sourcePerm.getUserAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, 
> sourcePerm.getGroupAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, 
> sourcePerm.getOtherAction()));
>   }
> }
> {code}
> We found that DEFAULT extended ACL rules were not being inherited properly by 
> the directory sub-tree, so the above code is incomplete because it 
> effectively drops the DEFAULT rules. The second problem is with the call to 
> {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended 
> ACLs. When extended ACLs are used the GROUP permission is replaced with the 
> extended ACL mask. So the above code will apply the wrong permissions to the 
> GROUP. Instead the correct GROUP permissions now need to be pulled from the 
> AclEntry as returned by {{getAclStatus().getEntries()}}. See the 
> implementation of the new method {{getDefaultAclEntries}} for details.
> Similar issues exist with the HCatalog API. None of the APIs account for 
> setting extended ACLs on the directory sub-tree. The changes to the HCatalog 
> API allow the extended ACLs to be passed into the required methods similar to 
> how basic permissions are passed in. When building the directory sub-tree the 
> extended ACLs of the table directory are inherited by all sub-directories, 
> including the DEFAULT rules.
> Replicating the problem:
> Create a table to write data into (I will use acl_test as the destination and 
> words_text as the source) and set the ACLs as follows:
> {noformat}
> $ hdfs dfs -setfacl -m 
> default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx
>  /user/cdrome/hive/acl_test
> $ hdfs dfs -ls -d /user/cdrome/hive/acl_test
> drwxrwx---+  - cdrome hdfs  0 2016-07-13 20:36 
> /user/cdrome/hive/acl_test
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: c
