[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747588#comment-15747588
 ] 

Hive QA commented on HIVE-13278:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843161/HIVE-13278.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10784 tests executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2570/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2570/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2570/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843161 - PreCommit-HIVE-Build

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during
> query execution with Hive on Spark.
> They don't prevent the query from running successfully, so the issue is marked
> Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlo

[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15192:
---
Status: Patch Available  (was: Open)

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15192.2.patch, HIVE-15192.3.patch, 
> HIVE-15192.4.patch, HIVE-15192.5.patch, HIVE-15192.6.patch, 
> HIVE-15192.7.patch, HIVE-15192.8.patch, HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into a SEMI-JOIN or LEFT OUTER JOIN. This
> transformation happens on the query AST, before the logical plan is generated.
> These transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> They cannot handle many kinds of subqueries; as a result, Hive imposes various
> restrictions on the queries it can handle, e.g. Hive disallows nested
> subqueries. All current restrictions are detailed in the linked document above.
> This patch is the first phase of removing these transformations and leveraging
> Calcite's functionality to plan such queries.
> Subsequent phases will lift the restrictions one by one.
> Note that this patch already lifts one restriction, *Restriction.6.m* (the LHS
> in a SubQuery must have all its Column References qualified).
> Known issues with this patch:
>  * Return path tests fail for various reasons and are currently disabled. We
> plan to fix and re-enable them later.
>  * Semi-join optimization (HIVE-15227) is disabled by default because it doesn't
> work with this patch. We plan to fix this and re-enable it by default.
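As a conceptual illustration of the SEMI-JOIN rewrite mentioned above (not Hive's or Calcite's actual code), the sketch below shows why `WHERE key IN (subquery)` can be evaluated as a left semi join: each outer row is kept at most once, however many times its key appears in the subquery result. `SemiJoinDemo` and `leftSemiJoin` are hypothetical names, and SQL's three-valued NULL semantics for `IN` are only approximated by never matching NULL keys.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class SemiJoinDemo {
  /**
   * Left semi join: keeps each left row whose key occurs in the right input.
   * Duplicate right keys never multiply left rows, which is what makes this
   * operator a valid way to evaluate "WHERE key IN (subquery)".
   * NULL keys never match, mirroring SQL equality (full IN/NULL semantics are not modeled).
   */
  static <L, R, K> List<L> leftSemiJoin(List<L> left, List<R> right,
                                        Function<L, K> leftKey, Function<R, K> rightKey) {
    // Build side: the distinct non-null keys produced by the subquery.
    Set<K> rightKeys = new HashSet<>();
    for (R r : right) {
      K k = rightKey.apply(r);
      if (k != null) {
        rightKeys.add(k);
      }
    }
    // Probe side: emit each outer row at most once, preserving input order.
    List<L> out = new ArrayList<>();
    for (L l : left) {
      K k = leftKey.apply(l);
      if (k != null && rightKeys.contains(k)) {
        out.add(l);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Outer rows: customer ids on orders; subquery result: VIP ids, with a duplicate.
    List<Integer> orderCustomerIds = List.of(1, 2, 3, 2);
    List<Integer> vipIds = List.of(2, 2, 3);
    System.out.println(leftSemiJoin(orderCustomerIds, vipIds, id -> id, id -> id)); // prints [2, 3, 2]
  }
}
```

An inner join on the same inputs would emit customer 2 twice per matching order (once for each duplicate VIP row), which is why the rewrite targets a semi join rather than a plain join.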



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15192:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15192:
---
Attachment: HIVE-15192.9.patch

Addressed review comments



[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747467#comment-15747467
 ] 

Hive QA commented on HIVE-13278:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843156/HIVE-13278.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10814 tests executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite (batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2569/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2569/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2569/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843156 - PreCommit-HIVE-Build


[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: HIVE-13278.2.patch

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch
>
>
> Many redundant 'File not found' messages appeared in the container log during
> query execution with Hive on Spark.
> They don't prevent the query from running successfully, so the issue is marked
> Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
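The trace above comes from an INFO-level `exec.Utilities` message logged when a plan file such as `reduce.xml` does not exist, which per the description is harmless. As a hedged sketch of one way to keep such an expected miss quiet (not necessarily what the attached patches do), the code below treats a missing plan file as an empty result and logs a single FINE-level line with no stack trace. It uses `java.nio.file` in place of Hadoop's `FileSystem` so it runs standalone; `OptionalPlanLoader` and `loadOptionalPlan` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

class OptionalPlanLoader {
  private static final Logger LOG = Logger.getLogger(OptionalPlanLoader.class.getName());

  /**
   * Reads a serialized plan file (e.g. reduce.xml) if it exists.
   * A missing file is treated as an expected outcome: the method returns
   * Optional.empty() and logs one FINE-level line without a stack trace.
   * Any other I/O failure still propagates to the caller.
   */
  static Optional<String> loadOptionalPlan(Path planFile) throws IOException {
    try {
      byte[] bytes = Files.readAllBytes(planFile);
      return Optional.of(new String(bytes, StandardCharsets.UTF_8));
    } catch (NoSuchFileException e) {
      // Expected when the stage has no such work: no stack trace, low log level.
      LOG.log(Level.FINE, "No plan file found: {0}", planFile);
      return Optional.empty();
    }
  }
}
```

A caller would then branch on `isPresent()` instead of relying on an exception, for example skipping reduce-side setup when no reduce plan exists.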





[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: (was: HIVE-13278.2.patch)



[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747436#comment-15747436
 ] 

Rui Li commented on HIVE-13278:
---

[~csun], I think the error shows up in the container logs. SparkPlanGenerator runs
in the AM in yarn-cluster mode. If we see the error in any container other than
the AM, it means we somehow call the method on the task side. I think Spark may
call it too, e.g. in {{PairRDDFunctions::saveAsHadoopDataset}}.
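If, as suggested above, the lookup is reached from more than one call site on the task side, the same missing path can be logged many times per executor. A hedged sketch of one mitigation, assuming it is acceptable to report each missing path only once: a thread-safe set records which paths have already been logged. `MissingPlanReporter` is a hypothetical class; the set is per instance, so suppressing repeats process-wide would require sharing one instance per JVM, and the set grows with the number of distinct paths (assumed small here).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

class MissingPlanReporter {
  private static final Logger LOG = Logger.getLogger(MissingPlanReporter.class.getName());

  // Concurrent set so executor threads can report without external locking.
  private final Set<String> reported = ConcurrentHashMap.newKeySet();

  /** Logs and returns true only the first time a given path is reported to this instance. */
  boolean reportMissing(String path) {
    // Set.add returns false if the path was already present, so repeats are silent.
    if (reported.add(path)) {
      LOG.info("Plan file not found (further reports for this path suppressed): " + path);
      return true;
    }
    return false;
  }
}
```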



[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747409#comment-15747409
 ] 

Chao Sun commented on HIVE-13278:
-

Thanks [~stakiar] :)



[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: HIVE-13278.2.patch



[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: (was: HIVE-13278.2.patch)

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch
>
>
> Many redundant 'File not found' messages appear in the container log during 
> query execution with Hive on Spark.
> They do not prevent the query from completing successfully, so the issue is 
> marked Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
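One way to keep an expected missing file from producing an INFO-level stack trace is to treat its absence as a normal result rather than an error. The sketch below is hypothetical and is not Hive's `Utilities` code: it uses `java.nio.file` as a local stand-in for the Hadoop `FileSystem` API so it runs without a cluster, the names `PlanLoader` and `loadPlanIfPresent` are invented for illustration, and it assumes the missing plan file is genuinely optional for the stage being run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not Hive's implementation. java.nio.file stands in
// for the Hadoop FileSystem API so the sketch runs locally.
final class PlanLoader {
    private static final Logger LOG = Logger.getLogger(PlanLoader.class.getName());

    private PlanLoader() {}

    // Returns the plan file's contents, or Optional.empty() if the file does
    // not exist. Absence is logged at FINE (debug) level without a stack
    // trace; any other I/O failure is still propagated to the caller.
    static Optional<String> loadPlanIfPresent(Path planFile) {
        try {
            byte[] bytes = Files.readAllBytes(planFile);
            return Optional.of(new String(bytes, StandardCharsets.UTF_8));
        } catch (NoSuchFileException e) {
            // Reading directly and catching the specific exception avoids a
            // race between a separate exists() check and the read.
            LOG.log(Level.FINE, "No plan file at {0}; skipping", planFile);
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With HDFS, a missing path is typically reported as `java.io.FileNotFoundException`, so a version adapted to the Hadoop client would catch that type instead of `NoSuchFileException`.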





[jira] [Commented] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-13 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747394#comment-15747394
 ] 

Pengcheng Xiong commented on HIVE-15297:


[~ashutoshc], the query string is set at line 532 of Driver.java:
{code}
conf.setQueryString(queryStr);
{code}
Later, PreExecutePrinter prints out the query. The original code in master 
changes every ";" to "\;" in comments. It seems we cannot find a better 
approach than the one proposed in the patch.

> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch
>
>
> String literals in a query cannot contain reserved symbols such as ';'. The same 
> set of queries works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}
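As the issue title states, the problem is that a `;` inside a quoted literal is treated as a statement terminator. A minimal sketch of a quote-aware splitter follows. It is not the code in the attached patches or in Hive's `CliDriver`, and the `StatementSplitter` name is hypothetical. It assumes `'` and `"` delimit string literals and that a backslash escapes the next character inside a literal; SQL comments and backtick-quoted identifiers are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting a script on ';' while ignoring
// semicolons that appear inside quoted string literals.
final class StatementSplitter {
    private StatementSplitter() {}

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 outside a literal, else the opening quote char
        boolean escaped = false; // previous char was a backslash inside a literal

        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (quote != 0) {
                // Inside a literal: keep every character, track escapes and the closing quote.
                current.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == quote) {
                    quote = 0;
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
                current.append(c);
            } else if (c == ';') {
                // Statement terminator outside any literal.
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

Applied to the two statements in the session above, `split` returns two statements, and the second keeps the literal `'Mozilla/5.0 (iPhone; CPU iPhone OS 5_0'` intact.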





[jira] [Commented] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747380#comment-15747380
 ] 

Hive QA commented on HIVE-15425:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843152/HIVE-15425.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10814 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2568/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2568/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2568/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843152 - PreCommit-HIVE-Build

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch
>
>
> Running the schemaTool's validate command against Derby DB yields the 
> following output.
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above creates confusion when the database contains extra tables (not part 
> of the Hive schema). The intent was to report the total number of tables found, on the 
> assumption that the schema namespace would not contain additional tables. 
> Even though validation succeeds, the output is confusing.
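The confusion comes from comparing two counts. A sketch of a comparison that reports missing and extra tables separately, so validation can pass while still naming unexpected tables, is shown below. `TableValidation` and its members are hypothetical and are not schematool's API; the case-insensitive name comparison and the rule that only missing tables fail validation are assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not schematool's implementation.
final class TableValidation {

    // Outcome of comparing expected schema tables with tables found in the database.
    static final class Result {
        final List<String> missing; // defined in the schema but absent from the database
        final List<String> extra;   // present in the database but not in the schema definition

        Result(List<String> missing, List<String> extra) {
            this.missing = missing;
            this.extra = extra;
        }

        // Assumption: only missing tables fail validation; extra tables are just reported.
        boolean isValid() {
            return missing.isEmpty();
        }

        String summary() {
            StringBuilder sb = new StringBuilder(
                isValid() ? "Schema table validation successful" : "Schema table validation failed");
            if (!missing.isEmpty()) {
                sb.append("; missing tables: ").append(missing);
            }
            if (!extra.isEmpty()) {
                sb.append("; tables not in schema definition (ignored): ").append(extra);
            }
            return sb.toString();
        }
    }

    private TableValidation() {}

    static Result compare(Collection<String> expected, Collection<String> found) {
        Set<String> expectedSet = normalize(expected);
        Set<String> foundSet = normalize(found);

        Set<String> missing = new TreeSet<>(expectedSet);
        missing.removeAll(foundSet);
        Set<String> extra = new TreeSet<>(foundSet);
        extra.removeAll(expectedSet);
        return new Result(new ArrayList<>(missing), new ArrayList<>(extra));
    }

    // Lower-cases names so that, for example, "TBLS" and "tbls" match
    // (an assumption about how the backing database reports identifiers).
    private static Set<String> normalize(Collection<String> names) {
        Set<String> out = new TreeSet<>();
        for (String name : names) {
            out.add(name.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```

With this approach, the Derby example above would report success and name the one extra table, instead of printing 57 expected tables against 58 found.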





[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747376#comment-15747376
 ] 

Sahil Takiar commented on HIVE-13278:
-

[~csun], [~xuefuz]: yes, feel free to move this patch forward. Thanks.

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch





[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: HIVE-13278.2.patch

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch





[jira] [Assigned] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-13278:
---

Assignee: Chao Sun  (was: Sahil Takiar)

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch





[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747344#comment-15747344
 ] 

Xuefu Zhang commented on HIVE-13278:


[~csun], please feel free to address the MR case first if we need more time for 
the HoS case. Thanks. 

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-13278.1.patch





[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747342#comment-15747342
 ] 

Xuefu Zhang commented on HIVE-13278:


[~csun], please feel free to address the MR case first if we need more time for 
the HoS case. Thanks. 

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-13278.1.patch





[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747301#comment-15747301
 ] 

Chao Sun commented on HIVE-13278:
-

Actually, for HoS, besides {{checkOutputSpecs}} I can't think of any case where 
we'd get the FileNotFoundException. 
[~xhao1], do you still have the query that caused this issue? Did this happen 
with some non-native storage such as HBase?

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-13278.1.patch





[jira] [Comment Edited] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747241#comment-15747241
 ] 

Chao Sun edited comment on HIVE-13278 at 12/14/16 4:59 AM:
---

Thanks [~xuefuz] and [~lirui], I think you are right. Let me revise the patch.
[~lirui], to answer your question: I think for HoS, {{checkOutputSpecs}} is only 
called in {{SparkPlanGenerator}}, as discussed in 
https://issues.apache.org/jira/browse/HIVE-10073.
Also, this only triggers for non-native storage (e.g., HBase), so we don't 
normally see it.




was (Author: csun):
Thanks [~xuefuz] and [~lirui], I think you are right. Let me revise the patch.
[~lirui], to answer your question: I think for HoS, {{checkOutputSpecs}} is only 
called in {{SparkPlanGenerator}}, as discussed in 
https://issues.apache.org/jira/browse/HIVE-10073.

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-13278.1.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-13 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15425:
-
Status: Patch Available  (was: Open)

> Eliminate conflicting output from schematool's table validator.
> ---
>
> Key: HIVE-15425
> URL: https://issues.apache.org/jira/browse/HIVE-15425
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15425.patch
>
>
> Running the schemaTool's validate command against a Derby DB yields the 
> following output:
> {code}
> Validating tables in the schema for version 2.2.0
> Expected (from schema definition) 57 tables, Found (from HMS metastore) 58 
> tables
> Schema table validation successful
> {code}
> The output above is confusing when the database contains extra tables that 
> are not part of the Hive schema. The intention was to report the total number 
> of tables found, without anticipating that the schema namespace could contain 
> additional tables. Even though the validation succeeds, the output is confusing.





[jira] [Updated] (HIVE-15425) Eliminate conflicting output from schematool's table validator.

2016-12-13 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15425:
-
Attachment: HIVE-15425.patch

Attaching a patch that only prints information about missing tables when there 
is an issue. I removed the message that printed the total number of expected vs. 
found tables, because there could be extra tables in the schema that we 
shouldn't have to worry about.
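A minimal Java sketch of the reporting behavior described above, assuming the expected table names come from the schema definition and the found names from the metastore database. The class name, method names, and message text are hypothetical, not HiveSchemaTool's actual code; the case-insensitive comparison is an assumption about how Derby reports table names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not HiveSchemaTool's code: report only tables the schema
// definition expects but the database lacks; extra tables are silently ignored.
final class SchemaTableCheck {
  private SchemaTableCheck() {}

  /** Expected tables missing from the database, lower-cased and sorted. */
  static List<String> missingTables(Collection<String> expected, Collection<String> found) {
    Set<String> foundNames = new HashSet<>();
    for (String t : found) {
      foundNames.add(t.toLowerCase(Locale.ROOT)); // case-insensitive comparison (assumption)
    }
    Set<String> missing = new TreeSet<>();
    for (String t : expected) {
      String name = t.toLowerCase(Locale.ROOT);
      if (!foundNames.contains(name)) {
        missing.add(name);
      }
    }
    return new ArrayList<>(missing);
  }

  /** Prints details only on failure; returns true when no expected table is missing. */
  static boolean validate(Collection<String> expected, Collection<String> found) {
    List<String> missing = missingTables(expected, found);
    if (missing.isEmpty()) {
      System.out.println("Schema table validation successful");
      return true;
    }
    System.out.println("Missing tables: " + missing);
    return false;
  }

  public static void main(String[] args) {
    // An extra, non-Hive table no longer produces a confusing "expected vs. found" count line.
    validate(List.of("DBS", "TBLS", "SDS"), List.of("DBS", "TBLS", "SDS", "MY_APP_TABLE"));
  }
}
```

Comparing sets rather than counts is what makes extra tables harmless: only the expected-minus-found difference can fail the check.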



[jira] [Comment Edited] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747241#comment-15747241
 ] 

Chao Sun edited comment on HIVE-13278 at 12/14/16 4:39 AM:
---

Thanks [~xuefuz] and [~lirui], I think you are right. Let me revise the patch.
[~lirui], to answer your question: I think for HoS, {{checkOutputSpecs}} is only 
called in {{SparkPlanGenerator}}, as discussed in 
https://issues.apache.org/jira/browse/HIVE-10073.


was (Author: csun):
Thanks [~xuefuz] and [~ruili], I think you are right. Let me revise the patch.
[~lirui], to answer your question: I think for HoS, {{checkOutputSpecs}} is only 
called in {{SparkPlanGenerator}}, as discussed in 
https://issues.apache.org/jira/browse/HIVE-10073.



[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747241#comment-15747241
 ] 

Chao Sun commented on HIVE-13278:
-

Thanks [~xuefuz] and [~ruili], I think you are right. Let me revise the patch.
[~lirui], to answer your question: I think for HoS, {{checkOutputSpecs}} is only 
called in {{SparkPlanGenerator}}, as discussed in 
https://issues.apache.org/jira/browse/HIVE-10073.



[jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747102#comment-15747102
 ] 

Hive QA commented on HIVE-15277:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843141/HIVE-15277.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10814 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage2] 
(batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_external] 
(batchId=85)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_location] 
(batchId=85)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2567/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2567/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2567/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843141 - PreCommit-HIVE-Build

> Teach Hive how to create/delete Druid segments 
> ---
>
> Key: HIVE-15277
> URL: https://issues.apache.org/jira/browse/HIVE-15277
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15277.2.patch, HIVE-15277.patch, HIVE-15277.patch, 
> file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate Druid segment files and insert the 
> metadata to signal the handoff to Druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS  `metric2`>;
> {code}
> This statement stores the results of query  in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention as Druid: there needs to be a column named '__time' in the 
> result of the executed query, which will act as the time dimension column in 
> Druid. Currently, the time dimension column needs to be of type 'timestamp'.
> Metrics can be of type long, double, or float, while dimensions are strings. 
> Keep in mind that Druid has a clear separation between dimensions and 
> metrics; therefore, if you have a numeric column in Hive that needs to be 
> presented as a dimension, use the cast operator to cast it to string. 
> This initial implementation interacts with the Druid metadata storage to 
> add/remove the table in Druid; users need to supply the metadata config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid
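A minimal Java sketch of the column rules described above, written as a hypothetical pre-flight check over a map of column name to Hive type. It is not DruidStorageHandler code; treating Druid's "long" as Hive's "bigint" is an assumption, and the class and messages are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check, not DruidStorageHandler code. It encodes the rules
// from the description: a '__time' column of type timestamp, metrics of type
// long/double/float, and every other column a string dimension.
final class DruidCtasSchemaCheck {
  private DruidCtasSchemaCheck() {}

  // Hive type names; mapping Druid's "long" to Hive's "bigint" is an assumption.
  private static final Set<String> METRIC_TYPES = Set.of("bigint", "double", "float");

  /** Returns null if the column list is acceptable, else a description of the first problem. */
  static String check(Map<String, String> columnTypes) {
    String timeType = columnTypes.get("__time");
    if (timeType == null) {
      return "missing required time column '__time'";
    }
    if (!timeType.equalsIgnoreCase("timestamp")) {
      return "'__time' must be of type timestamp, found " + timeType;
    }
    for (Map.Entry<String, String> col : columnTypes.entrySet()) {
      String type = col.getValue().toLowerCase(Locale.ROOT);
      boolean ok = col.getKey().equals("__time")
          || type.equals("string")          // dimension
          || METRIC_TYPES.contains(type);   // metric
      if (!ok) {
        return "column '" + col.getKey() + "' has type " + type
            + "; cast it to string (dimension) or to bigint/double/float (metric)";
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> cols = new LinkedHashMap<>();
    cols.put("__time", "timestamp");
    cols.put("page", "string");
    cols.put("user_id", "int"); // numeric id meant as a dimension: needs CAST(... AS STRING)
    cols.put("added", "bigint");
    System.out.println(check(cols));
  }
}
```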





[jira] [Updated] (HIVE-15428) HoS DPP doesn't remove cyclic dependency

2016-12-13 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15428:
--
Description: More details in HIVE-15357

> HoS DPP doesn't remove cyclic dependency
> 
>
> Key: HIVE-15428
> URL: https://issues.apache.org/jira/browse/HIVE-15428
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
>
> More details in HIVE-15357





[jira] [Commented] (HIVE-15357) Fix and re-enable the spark-only tests

2016-12-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747039#comment-15747039
 ] 

Rui Li commented on HIVE-15357:
---

Hi [~csun], I created HIVE-15428 for it.

> Fix and re-enable the spark-only tests
> --
>
> Key: HIVE-15357
> URL: https://issues.apache.org/jira/browse/HIVE-15357
> Project: Hive
>  Issue Type: Test
>Reporter: Rui Li
>Assignee: Rui Li
>
> Defined by {{spark.only.query.files}}.





[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747031#comment-15747031
 ] 

Rui Li commented on HIVE-13278:
---

[~csun], thanks for the patch. I think Xuefu's concerns are valid. In 
SparkPlanGenerator, we clone the JobConf for each BaseWork, so we can set the 
flag separately for each of them.
Actually, I'm still not very clear on when 
{{HiveOutputFormatImpl.checkOutputSpecs}} will be called. My understanding is 
that, since HoS stores the plan file in a different path for each BaseWork, we 
can never get both MapWork and ReduceWork from a single JobConf. That means for 
HoS we should always hit the FNF error, or we should see the FNF error for 
map.xml as well. It would be better to figure these out first.
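A minimal Java sketch of the per-BaseWork idea from this comment: each work gets its own clone of the base configuration, so a flag set on one clone cannot leak into another. Here java.util.Properties stands in for Hadoop's JobConf, and the key name and the "probe both plan files when no hint is present" fallback are hypothetical simplifications, not the actual SparkPlanGenerator or Utilities logic.

```java
import java.util.List;
import java.util.Properties;

// Illustrative sketch only: java.util.Properties stands in for Hadoop's JobConf and the
// key below is made up; this is not SparkPlanGenerator's or Utilities' actual code.
final class PerWorkConfSketch {
  private PerWorkConfSketch() {}

  static final String WORK_KIND = "hypothetical.spark.work.kind";

  /** Mimics cloning the job conf once per BaseWork and recording which kind of work it is. */
  static Properties confForWork(Properties base, String kind) {
    if (!kind.equals("map") && !kind.equals("reduce")) {
      throw new IllegalArgumentException("unknown work kind: " + kind);
    }
    Properties clone = new Properties();
    clone.putAll(base); // the shared base conf is left untouched
    clone.setProperty(WORK_KIND, kind);
    return clone;
  }

  /** Plan files a task would try to load in this simplified model. */
  static List<String> planFilesToRead(Properties conf) {
    String kind = conf.getProperty(WORK_KIND);
    if (kind == null) {
      // No per-work hint: probe both files. In this model, the reduce.xml probe for a
      // map-only work is the lookup that would end in a 'File not found' message.
      return List.of("map.xml", "reduce.xml");
    }
    return List.of(kind + ".xml");
  }

  public static void main(String[] args) {
    Properties base = new Properties();
    System.out.println(planFilesToRead(base));
    System.out.println(planFilesToRead(confForWork(base, "map")));
    System.out.println(planFilesToRead(confForWork(base, "reduce")));
  }
}
```

Because the flag lives only on the clone, a MapWork that ends at a FileSink and a MapWork feeding a ReduceWork in the same SparkTask can carry different values.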



[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747021#comment-15747021
 ] 

Hive QA commented on HIVE-13278:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843132/HIVE-13278.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 39 failed/errored test(s), 10260 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=133)

[mapreduce2.q,orc_llap_counters1.q,bucket6.q,insert_into1.q,empty_dir_in_table.q,orc_merge1.q,script_env_var1.q,orc_merge_diff_fs.q,llapdecider.q,load_hdfs_file_with_space_in_the_name.q,llap_nullscan.q,orc_ppd_basic.q,transform_ppr1.q,rcfile_merge4.q,orc_merge3.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=134)
[acid_bucket_pruning.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=135)

[intersect_all.q,unionDistinct_1.q,orc_ppd_schema_evol_3a.q,table_nonprintable.q,tez_union_dynamic_partition.q,transform_ppr2.q,temp_table_external.q,global_limit.q,transform2.q,schemeAuthority.q,cte_2.q,rcfile_createas1.q,dynamic_partition_pruning_2.q,intersect_merge.q,transform1.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=136)

[script_pipe.q,import_exported_table.q,except_distinct.q,orc_merge10.q,mapreduce1.q,explainuser_2.q,orc_merge4.q,rcfile_merge2.q,bucket5.q,llap_udf.q,external_table_with_space_in_location_path.q,load_fs2.q,script_env_var2.q,intersect_distinct.q,remote_script.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=137)

[orc_merge2.q,insert_into2.q,reduce_deduplicate.q,orc_llap_counters.q,cte_4.q,schemeAuthority2.q,file_with_header_footer.q,rcfile_merge3.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=138)

[join1.q,schema_evol_orc_acidvec_table_update.q,vector_decimal_5.q,insert_values_tmp_table.q,join32_lessSize.q,escape1.q,orc_predicate_pushdown.q,tez_union2.q,cte_mat_5.q,cte_mat_4.q,groupby3.q,smb_mapjoin_19.q,join46.q,dynpart_sort_optimization2.q,tez_bmj_schema_evolution.q,bucketmapjoin4.q,vector_include_no_sel.q,uber_reduce.q,schema_evol_orc_nonvec_part_all_complex.q,vector_interval_arithmetic.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,auto_sortmerge_join_3.q,vectorization_9.q,merge2.q,join_nulls.q,bucketsortoptimize_insert_6.q,ctas.q,cbo_udf_udaf.q,bucketmapjoin2.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=139)

[skewjoinopt15.q,vector_coalesce.q,orc_ppd_decimal.q,cbo_rp_lineage2.q,insert_into_with_schema.q,join_emit_interval.q,load_dyn_part3.q,auto_sortmerge_join_14.q,vector_null_projection.q,vector_cast_constant.q,mapjoin2.q,bucket_map_join_tez2.q,correlationoptimizer4.q,vectorization_12.q,vector_number_compare_projection.q,orc_merge_incompat3.q,vector_leftsemi_mapjoin.q,update_all_non_partitioned.q,multi_column_in_single.q,schema_evol_orc_nonvec_table.q,cbo_rp_subq_in.q,cbo_rp_semijoin.q,tez_insert_overwrite_local_directory_1.q,schema_evol_text_vecrow_table.q,vector_count.q,auto_sortmerge_join_15.q,vector_if_expr.q,delete_whole_partition.q,vector_decimal_6.q,sample1.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=140)

[bucket3.q,schema_evol_text_nonvec_table.q,mrr.q,orc_ppd_schema_evol_2b.q,orc_analyze.q,schema_evol_orc_acidvec_part_update.q,cbo_simple_select.q,cbo_rp_udf_udaf_stats_opt.q,subquery_views.q,multi_column_in.q,vector_interval_1.q,tez_join_result_complex.q,groupby1.q,ptf_matchpath.q,cbo_rp_udf_udaf.q,vector_decimal_aggregate.q,constprog_dpp.q,leftsemijoin_mr.q,unionDistinct_2.q,vectorization_14.q,update_all_types.q,cbo_stats.q,auto_sortmerge_join_6.q,vector_decimal_3.q,vector_groupby4.q,ptf.q,update_where_non_partitioned.q,insert_dir_distcp.q,vectorized_nested_mapjoin.q,schema_evol_text_nonvec_part.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=141)

[insert_values_non_partitioned.q,union5.q,vectorized_math_funcs.q,vectorization_4.q,vectorization_2.q,vector_join_nulls.q,vector_decimal_math_funcs.q,vector_left_outer_join.q,tez_union_decimal.q,llap_partitioned.q,order_null.q,cbo_rp_views.q,smb_mapjoin_4.q,vector_date_1.q,lvj_mapjoin.q,partition_multilevels.q,varchar_udf1.q,select_dummy_source.q,limit_join_transpose.q,tez_multi_union.q,skewjoin.q,cte_mat_3.q,autoColumnStats_1.q,vector_decimal_round_2.q,semijoin.q,column_names_with_leading_and_trailing_spaces.q,update_two_cols.q,update_where_no_match.q,union_stats.q,authorization_2.q]
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
ou

[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

2016-12-13 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15277:
--
Attachment: HIVE-15277.patch



[jira] [Commented] (HIVE-15378) clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts

2016-12-13 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746995#comment-15746995
 ] 

Fei Hui commented on HIVE-15378:


Hi [~spena],
There are no errors, but HADOOP_USER_CLASSPATH_FIRST is useless in beeline and 
hplsql, because beeline and hplsql call hive --service, and 
HADOOP_USER_CLASSPATH_FIRST is already exported in the hive script.
Thanks

> clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts
> ---
>
> Key: HIVE-15378
> URL: https://issues.apache.org/jira/browse/HIVE-15378
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15378.1.patch
>
>
> beeline, hive, and hplsql all contain this statement:
> export HADOOP_USER_CLASSPATH_FIRST=true
> beeline and hplsql use 'hive --service' to start, so it is useless in beeline 
> and hplsql.
> Add export HADOOP_USER_CLASSPATH_FIRST=true to hive.cmd.





[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746850#comment-15746850
 ] 

Xuefu Zhang commented on HIVE-13278:


[~stakiar], let us know if you'd like to work on this. Otherwise, we will move 
this forward.

[~csun], thanks for working on this. I have two questions:
1. I saw that ConditionalTask.hasReduce() is removed, which would change the 
semantics, right? I'm not sure whether this breaks anything.
2. For SparkTask, I assume it's possible to have two MapTasks: one ends at FS, 
while the other connects to a reduce task. Would the first MapTask still hit 
HDFS for reduce.xml?

[~lirui], please also take a look at the approach. Thanks.



[jira] [Updated] (HIVE-15427) Hadoop3 support

2016-12-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15427:

Attachment: HIVE-15427.WIP.patch

Preliminary patch to make it build (somewhat; I didn't make the original on 
master). 
I suspect that we'd need two builds for binary compatibility, similar to how 
we used to handle Hadoop 1 vs. Hadoop 2. Consider, e.g., the netty API changes in 
ShuffleHandler: we take the netty library from Hadoop, and I suspect that even if 
those call sites are changed to use a reflection-based, single-build netty shim 
(like the JVM monitor changes), the version built against one netty won't work 
with the other.

 cc [~sseth] [~gopalv] [~jnp] [~hagleitn]
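A reflection-based, single-build shim of the kind mentioned above resolves a method by name at runtime, so one compiled artifact can call whichever variant the classpath provides. The sketch below is a hypothetical helper (`ReflectiveShim` is not a Hive or Netty class). As the comment suspects, this only papers over renamed methods; it does not help when types or signatures differ between library versions.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical single-build shim: invokes the first no-arg method that exists
// on the target's class, so one binary can run against two library versions
// that renamed a method. Not part of Hive or Netty.
final class ReflectiveShim {
    private ReflectiveShim() {}

    static Object invokeFirstAvailable(Object target, String... candidateNames) {
        for (String name : candidateNames) {
            Method m;
            try {
                m = target.getClass().getMethod(name);
            } catch (NoSuchMethodException e) {
                continue; // not present in this version; try the next name
            }
            try {
                return m.invoke(target);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Failed to invoke " + name, e);
            }
        }
        throw new UnsupportedOperationException(
            "None of the candidate methods exist on " + target.getClass().getName());
    }
}
```

The lookup cost is paid on every call here; a real shim would typically resolve the `Method` once and cache it.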

> Hadoop3 support
> ---
>
> Key: HIVE-15427
> URL: https://issues.apache.org/jira/browse/HIVE-15427
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15427.WIP.patch
>
>
> Need to start working on Hadoop 3 support at some point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15357) Fix and re-enable the spark-only tests

2016-12-13 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746826#comment-15746826
 ] 

Chao Sun commented on HIVE-15357:
-

[~ruili] I don't remember exactly why we didn't include the cycle detection in 
the code, but it seems this should be added. Can you create a separate JIRA for 
this? Thanks.
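Assuming the cycle detection discussed here means checking that a directed graph of work units (parent/child edges between work nodes) contains no cycle, a depth-first search that tracks which nodes are still on the current path is sufficient. The sketch uses a hypothetical adjacency map of node names rather than any Hive class.

```java
import java.util.*;

// Hypothetical graph check: nodes are names, edges map parent -> children.
final class CycleCheck {
    private enum State { UNVISITED, IN_PROGRESS, DONE }

    static boolean hasCycle(Map<String, List<String>> edges) {
        Map<String, State> state = new HashMap<>();
        for (String node : edges.keySet()) {
            if (visit(node, edges, state)) return true;
        }
        return false;
    }

    // Returns true if an edge leads back to a node still on the DFS path.
    private static boolean visit(String node, Map<String, List<String>> edges,
                                 Map<String, State> state) {
        State s = state.getOrDefault(node, State.UNVISITED);
        if (s == State.IN_PROGRESS) return true; // back edge => cycle
        if (s == State.DONE) return false;       // already fully explored
        state.put(node, State.IN_PROGRESS);
        for (String child : edges.getOrDefault(node, Collections.emptyList())) {
            if (visit(child, edges, state)) return true;
        }
        state.put(node, State.DONE);
        return false;
    }
}
```

Each node and edge is visited once, so the check is linear in the size of the graph.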

> Fix and re-enable the spark-only tests
> --
>
> Key: HIVE-15357
> URL: https://issues.apache.org/jira/browse/HIVE-15357
> Project: Hive
>  Issue Type: Test
>Reporter: Rui Li
>Assignee: Rui Li
>
> Defined by {{spark.only.query.files}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Assignee: Sahil Takiar  (was: Chao Sun)

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-13278.1.patch
>
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-13278:
---

Assignee: Chao Sun  (was: Sahil Takiar)

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch
>
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Status: Patch Available  (was: Open)

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch
>
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: HIVE-13278.1.patch

I'm also working on this issue, and I think it's better to avoid unnecessary 
NameNode (NN) calls, especially when the number of mappers is huge.
Attaching an initial patch, which adopts the idea of using a separate conf 
flag. [~stakiar], [~ruili], [~xuefuz], can you review this?
(Sorry, I had to take ownership of this JIRA in order to attach the patch.)
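The "separate conf flag" idea can be sketched like this: the planner records whether a reduce plan was written, and the task side checks that flag before asking the file system for `reduce.xml`, so map-only work never issues the NameNode lookup that produces the 'File not found' log line. The flag name `hypothetical.plan.has.reduce.work`, the `PlanLoader` class, and the `Function`-based file reader are illustrative assumptions, not the contents of HIVE-13278.1.patch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative only: none of these names come from Hive.
final class PlanLoader {
    static final String HAS_REDUCE_WORK = "hypothetical.plan.has.reduce.work";

    private final Map<String, String> conf;
    // Stands in for an HDFS read; returns empty when the file is absent.
    private final Function<String, Optional<String>> fileReader;
    private int fileSystemCalls = 0;

    PlanLoader(Map<String, String> conf, Function<String, Optional<String>> fileReader) {
        this.conf = conf;
        this.fileReader = fileReader;
    }

    Optional<String> loadReducePlan(String planDir) {
        // Default to "true" so behavior is unchanged when the flag is unset.
        boolean hasReduceWork =
            Boolean.parseBoolean(conf.getOrDefault(HAS_REDUCE_WORK, "true"));
        if (!hasReduceWork) {
            return Optional.empty(); // map-only work: skip the lookup entirely
        }
        fileSystemCalls++;
        return fileReader.apply(planDir + "/reduce.xml");
    }

    int fileSystemCalls() { return fileSystemCalls; }
}
```

Defaulting the flag to "true" keeps the old lookup path for any caller that never sets it, at the cost of still paying for the lookup there.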

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch
>
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Assignee: Sahil Takiar  (was: Chao Sun)

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-13278.1.patch
>
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-13278:
---

Assignee: Chao Sun  (was: Sahil Takiar)

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark.
> It doesn't prevent the query from running successfully, so it is marked as 
> Minor for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746753#comment-15746753
 ] 

Hive QA commented on HIVE-15422:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843119/HIVE-15422.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10815 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationReads.testReadTableFailure
 (batchId=209)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2565/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2565/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2565/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843119 - PreCommit-HIVE-Build

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, 
> Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed a large 
> number of objects created just in path comparisons in HiveInputFormat.  
> HIVE-15405 reduces the number of path comparisons in FileUtils, but the code 
> still ends up doing lots of comparisons in HiveInputFormat::pushProjectionsAndFilters.
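One common way to cut allocations like these is to normalize every partition path once into a string key, then match each split by walking up its parent directories with hash lookups, instead of re-creating and comparing path objects against every partition. The sketch assumes paths are URI strings that `java.net.URI` accepts (no unescaped spaces), and it deliberately drops scheme and authority. That is an illustrative assumption, not necessarily what HIVE-15422.2.patch does.

```java
import java.net.URI;
import java.util.*;

// Hypothetical helper; not Hive's HiveInputFormat code.
final class PathLookupSketch {
    // Normalize once: keep only the path component, without a trailing slash.
    static String key(String path) {
        String p = URI.create(path).getPath();
        if (p == null) p = ""; // opaque URIs have no path component
        if (p.length() > 1 && p.endsWith("/")) p = p.substring(0, p.length() - 1);
        return p;
    }

    // Build the index once per query rather than per comparison.
    static Map<String, List<String>> index(Map<String, List<String>> pathToAliases) {
        Map<String, List<String>> idx = new HashMap<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            idx.put(key(e.getKey()), e.getValue());
        }
        return idx;
    }

    // Match a split by walking up its parents: O(depth) hash lookups.
    static List<String> aliasesFor(Map<String, List<String>> idx, String splitPath) {
        String k = key(splitPath);
        while (true) {
            List<String> hit = idx.get(k);
            if (hit != null) return hit;
            int slash = k.lastIndexOf('/');
            if (slash <= 0) return Collections.emptyList();
            k = k.substring(0, slash);
        }
    }
}
```

Because scheme and authority are discarded, identical paths on two different file systems would collide in this index; a real implementation would have to decide whether that is acceptable.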



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746720#comment-15746720
 ] 

Eugene Koifman commented on HIVE-15376:
---

Patch 6:
Why do you need to expose the "delay" to the client?
Why is the delay always 0? Why is it not checking hive.txn.timeout as before?

Should releaseLocks() call isOpenTxn() before stopping the heartbeat? It seems 
like that would be more consistent.

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch
>
>
> HIVE-12366 improved the heartbeater logic by reducing the gap between 
> lock acquisition and the first heartbeat, but that's not enough; there may 
> still be an issue, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will time out and be 
> aborted, causing a failure.
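The timing problem above reduces to one inequality: the transaction is aborted when `hive.txn.timeout < C - A`. A minimal sketch of one way to close that gap is to start heartbeating when the transaction is opened (time A), with an initial delay of 0, rather than after `acquireLocks` returns (time C). The class, method names, and the choice of two beats per timeout window are assumptions for illustration; `ScheduledExecutorService` is the standard JDK scheduler, not Hive's heartbeater.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not Hive's transaction manager.
final class HeartbeatSketch {
    // The failure condition from the description: timeout < C - A.
    static boolean wouldTimeOut(long openedAtMs, long firstHeartbeatAtMs, long txnTimeoutMs) {
        return txnTimeoutMs < firstHeartbeatAtMs - openedAtMs;
    }

    // Start beating immediately at txn open (initial delay 0), so a slow,
    // blocking acquireLocks() cannot push the first heartbeat past the timeout.
    static ScheduledFuture<?> startAtOpen(ScheduledExecutorService pool,
                                          Runnable heartbeat, long txnTimeoutMs) {
        long periodMs = Math.max(1, txnTimeoutMs / 2); // assumed: two beats per window
        return pool.scheduleAtFixedRate(heartbeat, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        ScheduledFuture<?> beat = startAtOpen(pool, () -> System.out.println("heartbeat"), 200);
        Thread.sleep(500);  // stands in for a slow acquireLocks()
        beat.cancel(false); // stop heartbeating once the txn ends
        pool.shutdown();
    }
}
```

Deriving the period from the timeout, rather than hard-coding it, keeps the heartbeat interval safely inside the window whatever `hive.txn.timeout` is set to.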



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14948) properly handle special characters in identifiers

2016-12-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746710#comment-15746710
 ] 

Eugene Koifman commented on HIVE-14948:
---

They were moved to TestTxnCommands to balance the runtime of the test suites. 
This helps pTest runtime, since pTest parallelizes at test-suite granularity.

The TestTxnCommands2 tests are also run in Acid 2.0 mode; there is no need to do 
that for all MERGE tests.

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch, 
> HIVE-14948.03.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent. Need to 
> clean this up and, if possible, only quote those identifiers that need to be 
> quoted in the generated SQL statement.
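"Only quote those identifiers that need to be quoted" can be sketched as a conservative check: leave a name bare when it is plainly alphanumeric, otherwise wrap it in backticks and double any embedded backtick, which is how Hive's quoted-identifier syntax escapes one. The plain-name pattern is an assumption chosen to be safe (it quotes some names Hive would accept bare), and reserved keywords such as `select` would also need quoting; this sketch does not check for them.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not the code from the HIVE-14948 patches.
final class IdentQuoteSketch {
    // Conservative: letter first, then letters, digits, or underscores.
    private static final Pattern PLAIN = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    static String quoteIfNeeded(String ident) {
        if (PLAIN.matcher(ident).matches()) {
            return ident; // safe to emit unquoted
        }
        // Wrap in backticks; an embedded backtick is escaped by doubling it.
        return "`" + ident.replace("`", "``") + "`";
    }
}
```

Erring toward quoting keeps the generated SQL valid; the cost is only some unnecessary backticks.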



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14948) properly handle special characters in identifiers

2016-12-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746699#comment-15746699
 ] 

Alan Gates commented on HIVE-14948:
---

+1.

Were the tests removed from TestTxnCommands2 superseded by other tests or 
removed for another reason?

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch, 
> HIVE-14948.03.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent. Need to 
> clean this up and, if possible, only quote those identifiers that need to be 
> quoted in the generated SQL statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746687#comment-15746687
 ] 

Sushanth Sowmyan commented on HIVE-15426:
-

None of the test failures are related. [~vgumashta], could you please review?

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746685#comment-15746685
 ] 

Anthony Hsu commented on HIVE-15353:


Test failures look unrelated.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.
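One way to add the null checks described above is to normalize a missing column list at the metastore boundary before any code iterates over it. `StorageDescriptorLike` and `FieldSchemaLike` are hypothetical stand-ins for the Thrift-generated `StorageDescriptor` and `FieldSchema` classes. Treating null as an empty list, rather than rejecting the request with an error, is an assumption; the actual patch may choose differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the metastore's Thrift types.
final class FieldSchemaLike {
    final String name;
    final String type;
    FieldSchemaLike(String name, String type) { this.name = name; this.type = type; }
}

final class StorageDescriptorLike {
    List<FieldSchemaLike> cols; // may be null when the client sends none
}

final class NullSafeCols {
    // Never returns null, so callers can iterate without an NPE.
    static List<FieldSchemaLike> colsOf(StorageDescriptorLike sd) {
        if (sd == null || sd.cols == null) {
            return Collections.emptyList();
        }
        return sd.cols;
    }

    // Example caller: compare column names, as an alter operation might.
    static boolean sameColumnNames(StorageDescriptorLike oldSd, StorageDescriptorLike newSd) {
        List<String> a = new ArrayList<>();
        for (FieldSchemaLike f : colsOf(oldSd)) a.add(f.name);
        List<String> b = new ArrayList<>();
        for (FieldSchemaLike f : colsOf(newSd)) b.add(f.name);
        return a.equals(b);
    }
}
```

Normalizing in a single accessor means each call site (create, alter, add partition) gets the same behavior without repeating the null check.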



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-13 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746678#comment-15746678
 ] 

Wei Zheng commented on HIVE-15376:
--

[~ekoifman] Can you take another look please?

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, HIVE-15376.6.patch
>
>
> HIVE-12366 improved the heartbeater logic by reducing the gap between 
> lock acquisition and the first heartbeat, but that's not enough; there may 
> still be an issue, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
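The timing problem above reduces to one inequality. This sketch models only that condition; the timestamps and the "schedule the heartbeater from time A" alternative are hypothetical illustrations, not necessarily what the attached patches implement.

```python
def first_heartbeat_time(open_a: float, locks_returned_c: float,
                         start_at_open: bool, interval: float) -> float:
    """Time (seconds) of the first heartbeat under two scheduling choices.

    start_at_open=False models the described behavior: first heartbeat at C.
    start_at_open=True is a hypothetical alternative in which the heartbeater is
    scheduled from time A, so it can fire while acquireLocks is still blocked.
    """
    if start_at_open:
        return min(open_a + interval, locks_returned_c)
    return locks_returned_c


def is_timed_out(open_a: float, first_heartbeat: float, txn_timeout: float) -> bool:
    # The condition from the description: hive.txn.timeout < C - A.
    return txn_timeout < first_heartbeat - open_a
```

With hypothetical values A=0, C=400 and a 300-second timeout, the transaction times out when the first heartbeat waits for acquireLocks, but not when a heartbeat is scheduled 150 seconds after the transaction opens.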



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746614#comment-15746614
 ] 

Hive QA commented on HIVE-15426:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843101/HIVE-15426.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10814 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite 
(batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2564/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2564/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2564/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843101 - PreCommit-HIVE-Build

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15294) Capture additional metadata to replicate a simple insert at destination

2016-12-13 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746605#comment-15746605
 ] 

Vaibhav Gumashta commented on HIVE-15294:
-

[~thejas] I'll also need to create a copy of DbNotificationListener and 
TestDbNotificationListener in org.apache.hadoop.hive.metastore, since HS2 does 
not add the hcatalog libs to its classpath as of now. This is also consistent with the 
previous messaging-related move we did in HIVE-15180. Will update the relevant code and 
upload a patch soon. 

> Capture additional metadata to replicate a simple insert at destination
> ---
>
> Key: HIVE-15294
> URL: https://issues.apache.org/jira/browse/HIVE-15294
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-15294.1.patch
>
>
> For replicating inserts like {{INSERT INTO ... SELECT ... FROM}}, we will 
> need to capture the newly added files in the notification message to be able 
> to replicate the event at the destination. 
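One way to picture "capturing the newly added files" is as the difference between the file listings before and after the insert. This is a hypothetical sketch: the message fields (eventType, db, table, files) are illustrative and not the actual notification format, and a real implementation may record the files from the write path instead of diffing listings.

```python
from typing import Dict, Iterable, List


def new_files(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Files present after the insert but not before, in sorted order."""
    return sorted(set(after) - set(before))


def insert_event_message(db: str, table: str, files: List[str]) -> Dict[str, object]:
    # Hypothetical message shape; the real format is defined by the patch.
    return {"eventType": "INSERT", "db": db, "table": table, "files": files}
```

The destination can then copy exactly the listed files instead of re-running the query.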



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HIVE-15422:
---

Assignee: Rajesh Balamohan

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, 
> Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed that a large 
> number of objects are created just for path comparisons in HiveInputFormat. 
> HIVE-15405 reduces the number of path comparisons in FileUtils, but 
> HiveInputFormat::pushProjectionsAndFilters still ends up doing lots of comparisons.
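A common way to cut the cost of repeated path comparisons is to normalize each partition directory once and look splits up in a hash map, instead of constructing and comparing path objects for every pair. This is a generic sketch of that technique under simplified assumptions (plain strings instead of Hadoop Path objects, trailing-slash stripping as the only normalization); it is not the code in the attached patches.

```python
from typing import Dict, List, Optional


def normalize(path: str) -> str:
    # Simplified normalization (assumption): only trailing slashes are removed.
    return path.rstrip("/")


def build_index(partition_dirs: List[str]) -> Dict[str, str]:
    """Normalize every partition directory once and index it by its normalized form."""
    return {normalize(p): p for p in partition_dirs}


def find_partition(split_dir: str, index: Dict[str, str]) -> Optional[str]:
    # One normalization and one hash lookup per split, instead of comparing a
    # freshly built path object against every partition directory.
    return index.get(normalize(split_dir))
```

With 7000+ partitions, the index is built once, and each split then costs a single lookup rather than a scan over all partition paths.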



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15422:

Attachment: HIVE-15422.2.patch

Thanks [~sershe]. Addressed the review comments in the latest patch. 

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, 
> Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed that a large 
> number of objects are created just for path comparisons in HiveInputFormat. 
> HIVE-15405 reduces the number of path comparisons in FileUtils, but 
> HiveInputFormat::pushProjectionsAndFilters still ends up doing lots of comparisons.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-6365) Alter a partition to be of a different fileformat than the Table's fileformat. Use insert overwrite to write data to this partition. The partition fileformat is converted

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu reassigned HIVE-6365:
-

Assignee: Anthony Hsu

> Alter a partition to be of a different fileformat than the Table's 
> fileformat. Use insert overwrite to write data to this partition. The 
> partition fileformat is converted back to table's fileformat after the insert 
> operation. 
> --
>
> Key: HIVE-6365
> URL: https://issues.apache.org/jira/browse/HIVE-6365
> Project: Hive
>  Issue Type: Bug
> Environment: emr
>Reporter: Pavan Srinivas
>Assignee: Anthony Hsu
>
> Let's say there is a partitioned table like:
> Step1:
> >> CREATE TABLE srcpart (key STRING, value STRING)
> PARTITIONED BY (ds STRING, hr STRING)
> STORED AS TEXTFILE;
> Step2:
> Alter the fileformat for a specific available partition. 
> >> alter table srcpart partition(ds="2008-04-08", hr="12") set fileformat orc;
> Step3:
> Describe the partition.
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:        org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:          org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:           No
> Num Buckets:          -1
> Bucket Columns:       []
> Sort Columns:         []
> Storage Desc Params:
>   serialization.format  1
> Step4:
> Write the data to this partition using insert overwrite. 
> >> insert overwrite table srcpart partition(ds="2008-04-08",hr="12") select key, value from ... 
> Step5:
> Describe the partition again. 
> >> desc formatted srcpart partition(ds="2008-04-08", hr="12")
> .
> # Storage Information
> SerDe Library:        org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat:          org.apache.hadoop.mapred.TextInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Compressed:           No
> Num Buckets:          -1
> Bucket Columns:       []
> Sort Columns:         []
> Storage Desc Params:
>   serialization.format  1
> The fileformat of the partition is converted back to the table's original 
> fileformat. It should have retained and written the data in the modified 
> fileformat. 
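The expected behavior in this report is a simple precedence rule: an existing partition keeps its own storage format, and only a new partition inherits the table's format. The sketch below states that rule under a hypothetical, simplified model of the storage descriptor. It is not Hive's actual write-path code, and it does not claim to locate the bug.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageFormat:
    # Hypothetical, simplified view of the storage-related descriptor fields.
    serde: str
    input_format: str
    output_format: str


def target_format(table_fmt: StorageFormat,
                  existing_partition_fmt: Optional[StorageFormat]) -> StorageFormat:
    """Format to write with, per the report's expected behavior.

    An existing partition keeps its own format; only a brand-new partition
    (no existing descriptor) inherits the table's format.
    """
    return existing_partition_fmt if existing_partition_fmt is not None else table_fmt
```

Under this rule, the insert overwrite in Step 4 would keep the ORC SerDe and input/output formats shown in Step 3.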



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-15426:

Attachment: HIVE-15426.patch

Attached patch to fix order guarantee of event executions.

Also fixes a bug where REPL STATUS did not fetch the actual repl.last.id 
parameter, and a bug where batchSize was always hardcoded to 15.
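As a rough illustration of "order guarantee" plus a configurable batch size, the sketch below sorts events by ID and splits them into batches that would be applied in sequence. It assumes each event carries a monotonically increasing ID; the (event_id, description) tuple shape is hypothetical and not the REPL LOAD data model.

```python
from typing import Iterable, List, Tuple

# Hypothetical event shape: (event_id, description).
Event = Tuple[int, str]


def ordered_batches(events: Iterable[Event], batch_size: int) -> List[List[Event]]:
    """Sort events by ID and split them into consecutive batches.

    Applying the batches one after another preserves event order; batch_size
    is a parameter rather than a hardcoded constant.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ordered = sorted(events, key=lambda e: e[0])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Because the events are sorted before they are split, each batch starts where the previous one ended.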

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-15426:

Status: Patch Available  (was: Open)

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15410) WebHCat supports get/set table property with its name containing period and hyphen

2016-12-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746407#comment-15746407
 ] 

Thejas M Nair commented on HIVE-15410:
--

+1
Thanks for the update and explanation.

> WebHCat supports get/set table property with its name containing period and 
> hyphen
> --
>
> Key: HIVE-15410
> URL: https://issues.apache.org/jira/browse/HIVE-15410
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Attachments: HIVE-15410.1.patch, HIVE-15410.patch
>
>
> Hive table properties can have a period (.) or hyphen (-) in their names; 
> auto.purge is one example. But the WebHCat APIs support neither setting nor 
> getting these properties, and they fail with the error message "Invalid DDL 
> identifier :property". For example:
> {code}
> [root@ctang-1 ~]# curl -s 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key1?user.name=hiveuser'
> {"error":"Invalid DDL identifier :property"}
> [root@ctang-1 ~]# curl -s -X PUT -HContent-type:application/json -d '{ "value": "true" }' 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key2?user.name=hiveuser/'
> {"error":"Invalid DDL identifier :property"}
> {code}
> This patch adds support for property names containing a 
> period and/or hyphen.
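The change amounts to widening the set of characters accepted in a property name. A minimal sketch, assuming a hypothetical rule of letters, digits, underscore, period and hyphen; the actual pattern used by WebHCat (which is written in Java) may differ.

```python
import re

# Hypothetical rule: letters, digits, underscore, period and hyphen.
PROPERTY_NAME = re.compile(r"[A-Za-z0-9_.\-]+")


def is_valid_property_name(name: str) -> bool:
    """True if the whole name consists only of the allowed characters."""
    return PROPERTY_NAME.fullmatch(name) is not None
```

Under this rule, names such as auto.purge, prop.key1 and prop-key2 are accepted, while names containing spaces or slashes are still rejected.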



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15410) WebHCat supports get/set table property with its name containing period and hyphen

2016-12-13 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746380#comment-15746380
 ] 

Chaoyu Tang commented on HIVE-15410:


[~thejas] Could you take a look at the new patch to see if you have any other 
comments/suggestions? Thanks

> WebHCat supports get/set table property with its name containing period and 
> hyphen
> --
>
> Key: HIVE-15410
> URL: https://issues.apache.org/jira/browse/HIVE-15410
> Project: Hive
>  Issue Type: Improvement
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Attachments: HIVE-15410.1.patch, HIVE-15410.patch
>
>
> Hive table properties can have a period (.) or hyphen (-) in their names; 
> auto.purge is one example. But the WebHCat APIs support neither setting nor 
> getting these properties, and they fail with the error message "Invalid DDL 
> identifier :property". For example:
> {code}
> [root@ctang-1 ~]# curl -s 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key1?user.name=hiveuser'
> {"error":"Invalid DDL identifier :property"}
> [root@ctang-1 ~]# curl -s -X PUT -HContent-type:application/json -d '{ "value": "true" }' 'http://ctang-1.gce.cloudera.com:7272/templeton/v1/ddl/database/default/table/sample_07/property/prop.key2?user.name=hiveuser/'
> {"error":"Invalid DDL identifier :property"}
> {code}
> This patch adds support for property names containing a 
> period and/or hyphen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746363#comment-15746363
 ] 

Hive QA commented on HIVE-15353:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843060/HIVE-15353.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10814 tests 
executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2563/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2563/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2563/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843060 - PreCommit-HIVE-Build

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14870) OracleStore: RawStore implementation optimized for Oracle

2016-12-13 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746299#comment-15746299
 ] 

Chris Drome edited comment on HIVE-14870 at 12/13/16 9:17 PM:
--

[~alangates], let me answer from the bottom up.

Pages 2-3 explain what I did regarding deduplicating data. In short, I have 
removed LOCATION and CD_ID from the SDS table, because those columns make every 
entry unique per table/partition. I also collapsed the SDS and SERDES tables into a 
single table. These two changes reduce the number of records from 3.4M to 15.
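Why dropping LOCATION and CD_ID collapses so many rows can be seen by grouping storage descriptors on the fields that remain. The row layout below is a hypothetical simplification for illustration; the actual OracleStore schema is described in the attached design proposal.

```python
from typing import Dict, List, Tuple

# Hypothetical SDS row: (sd_id, cd_id, location, input_format, output_format, serde)
Row = Tuple[int, int, str, str, str, str]
Key = Tuple[str, str, str]


def dedup_storage_descriptors(rows: List[Row]) -> Dict[Key, List[int]]:
    """Group rows by the fields that remain once LOCATION and CD_ID are dropped.

    Each key corresponds to one deduplicated record; the list holds the
    original SD_IDs that would share it.
    """
    groups: Dict[Key, List[int]] = {}
    for sd_id, _cd_id, _location, in_fmt, out_fmt, serde in rows:
        groups.setdefault((in_fmt, out_fmt, serde), []).append(sd_id)
    return groups
```

Since per-partition values such as LOCATION differ for every partition, keeping them in the row prevents any sharing; once they are moved out, partitions with the same formats and SerDe map to a single record.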

I didn't measure the impact of each individual change, but in aggregate the 
changes result in a 3-4x speedup for getTable calls.

I haven't tested using array types to replace columns, etc., because some of our 
tables consist of hundreds of columns, and I felt that the tradeoff would not be 
worth it (I was concerned about needlessly bloating tables with array types). I plan to 
implement the same caching mechanism that you employ in HBaseStore, so the 
savings we would get would be minimal. Furthermore, getTable calls take a fraction 
of the time that getPartitions calls take, so the majority of the effort went into 
optimizing the getPartitions calls.

I'm currently working with our QE to hammer out the last couple of failures 
that we are hitting in regression/integration tests. I'd like to refactor and 
clean up some code around the getPartitions calls as well. I hope to have a 
cleaner version that I can post before the end of the year.


was (Author: cdrome):
[~alangates], let me answer from the bottom up.

Pages 2-3 explain what I did regarding deduplicating data. In short, I have 
removed LOCATION and CD_ID from the SDS table, because those columns make every 
entry unique per table/partition. I also collapsed the SDS and SERDES tables into a 
single table. These two changes reduce the number of records from 3.4M to 15.

I didn't measure the impact of each individual change, but in aggregate the 
changes result in a 3-4x speedup for getTable calls.

I haven't tested using array types to replace columns, etc., because some of our 
tables consist of hundreds of columns, and I felt that the tradeoff would not be 
worth it. I plan to implement the same caching mechanism that you employ in 
HBaseStore, so the savings we would get would be minimal. Furthermore, getTable 
calls take a fraction of the time that getPartitions calls take, so the majority 
of the effort went into optimizing the getPartitions calls.

I'm currently working with our QE to hammer out the last couple of failures 
that we are hitting in regression/integration tests. I'd like to refactor and 
clean up some code around the getPartitions calls as well. I hope to have a 
cleaner version that I can post before the end of the year.

> OracleStore: RawStore implementation optimized for Oracle
> -
>
> Key: HIVE-14870
> URL: https://issues.apache.org/jira/browse/HIVE-14870
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: OracleStoreDesignProposal.pdf
>
>
> The attached document is a proposal for a RawStore implementation which is 
> optimized for Oracle and replaces DataNucleus. The document outlines schema 
> changes, OracleStore implementation details, and performance tests against 
> ObjectStore, ObjectStore+DirectSQL, and OracleStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14870) OracleStore: RawStore implementation optimized for Oracle

2016-12-13 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746299#comment-15746299
 ] 

Chris Drome commented on HIVE-14870:


[~alangates], let me answer from the bottom up.

Pages 2-3 explain what I did regarding deduplicating data. In short, I have 
removed LOCATION and CD_ID from the SDS table, because those columns make every 
entry unique per table/partition. I also collapsed the SDS and SERDES tables into a 
single table. These two changes reduce the number of records from 3.4M to 15.

I didn't measure the impact of each individual change, but in aggregate the 
changes result in a 3-4x speedup for getTable calls.

I haven't tested using array types to replace columns, etc., because some of our 
tables consist of hundreds of columns, and I felt that the tradeoff would not be 
worth it. I plan to implement the same caching mechanism that you employ in 
HBaseStore, so the savings we would get would be minimal. Furthermore, getTable 
calls take a fraction of the time that getPartitions calls take, so the majority 
of the effort went into optimizing the getPartitions calls.

I'm currently working with our QE to hammer out the last couple of failures 
that we are hitting in regression/integration tests. I'd like to refactor and 
clean up some code around the getPartitions calls as well. I hope to have a 
cleaner version that I can post before the end of the year.

> OracleStore: RawStore implementation optimized for Oracle
> -
>
> Key: HIVE-14870
> URL: https://issues.apache.org/jira/browse/HIVE-14870
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: OracleStoreDesignProposal.pdf
>
>
> The attached document is a proposal for a RawStore implementation which is 
> optimized for Oracle and replaces DataNucleus. The document outlines schema 
> changes, OracleStore implementation details, and performance tests against 
> ObjectStore, ObjectStore+DirectSQL, and OracleStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-5921) Better heuristics for worst case statistics estimates for join, limit and filter operator

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-5921:
--

Assignee: Chao Sun  (was: Prasanth Jayachandran)

> Better heuristics for worst case statistics estimates for join, limit and 
> filter operator
> -
>
> Key: HIVE-5921
> URL: https://issues.apache.org/jira/browse/HIVE-5921
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor, Statistics
>Affects Versions: 0.13.0
>Reporter: Prasanth Jayachandran
>Assignee: Chao Sun
> Fix For: 0.13.0
>
> Attachments: HIVE-5921.1.patch, HIVE-5921.2.patch, HIVE-5921.3.patch, 
> HIVE-5921.4.patch
>
>
> This is a subtask of HIVE-5369. In the worst case (i.e., in the absence of column 
> statistics), HIVE-5849 improved the basic statistics with heuristics. But the 
> heuristics failed to provide better estimates in a few cases. For example, the 
> FILTER operator heuristics did not take into account the number of predicates 
> or whether a predicate contains a partition column. Also, JOIN estimates were too 
> aggressive and were not user-configurable.
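To make "take into account the number of predicates and partition columns" concrete, here is a hypothetical worst-case filter heuristic. It is not Hive's actual formula. Each predicate on a non-partition column scales the row estimate by a fixed selectivity, and predicates on partition columns are assumed to be handled already by partition pruning, so they do not reduce the estimate a second time.

```python
def estimate_filter_rows(input_rows: int, num_predicates: int,
                         num_partition_predicates: int = 0,
                         selectivity: float = 0.5) -> int:
    """Hypothetical worst-case row estimate for a FILTER without column stats.

    Every non-partition predicate multiplies the estimate by `selectivity`;
    partition-column predicates are assumed to be covered by pruning.
    """
    effective = max(num_predicates - num_partition_predicates, 0)
    # Never estimate fewer than one row.
    return max(1, int(input_rows * selectivity ** effective))
```

With 1000 input rows, two ordinary predicates give an estimate of 250 rows, while one ordinary and one partition-column predicate give 500.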



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5921) Better heuristics for worst case statistics estimates for join, limit and filter operator

2016-12-13 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-5921:
---
Assignee: Prasanth Jayachandran  (was: Chao Sun)

> Better heuristics for worst case statistics estimates for join, limit and 
> filter operator
> -
>
> Key: HIVE-5921
> URL: https://issues.apache.org/jira/browse/HIVE-5921
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor, Statistics
>Affects Versions: 0.13.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 0.13.0
>
> Attachments: HIVE-5921.1.patch, HIVE-5921.2.patch, HIVE-5921.3.patch, 
> HIVE-5921.4.patch
>
>
> This is a subtask of HIVE-5369. In the worst case (i.e., in the absence of column 
> statistics), HIVE-5849 improved the basic statistics with heuristics. But the 
> heuristics failed to provide better estimates in a few cases. For example, the 
> FILTER operator heuristics did not take into account the number of predicates 
> or whether a predicate contains a partition column. Also, JOIN estimates were too 
> aggressive and were not user-configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14948) properly handle special characters in identifiers

2016-12-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746272#comment-15746272
 ] 

Eugene Koifman commented on HIVE-14948:
---

Failures are not related.

[~alangates], could you please review?

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch, 
> HIVE-14948.03.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent. We need to 
> clean this up and, if possible, quote only those identifiers that need to be 
> quoted in the generated SQL statement.
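A minimal sketch of "quote only the identifiers that need quoting", assuming HiveQL-style backtick quoting in which an embedded backtick is escaped by doubling it. The plain-identifier rule below is a simplifying assumption; a complete version would also quote reserved keywords, which this sketch does not check.

```python
import re

# Assumed rule for identifiers that can be emitted unquoted.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_if_needed(identifier: str) -> str:
    """Return the identifier unchanged if it is plain, else backtick-quote it.

    Embedded backticks are escaped by doubling them.
    """
    if _PLAIN_IDENTIFIER.fullmatch(identifier):
        return identifier
    return "`" + identifier.replace("`", "``") + "`"
```

Generated SQL then stays readable for ordinary names while still handling names with spaces or other special characters.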



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14948) properly handle special characters in identifiers

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746248#comment-15746248
 ] 

Hive QA commented on HIVE-14948:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843059/HIVE-14948.03.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10766 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=108)

[union_remove_1.q,ppd_outer_join2.q,groupby1_noskew.q,join20.q,smb_mapjoin_13.q,multi_insert.q,groupby_rollup1.q,temp_table_gb1.q,vector_string_concat.q,smb_mapjoin_6.q,metadata_only_queries.q,auto_sortmerge_join_12.q,groupby_bigdata.q,groupby3_map_multi_distinct.q,innerjoin.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=92)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite 
(batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2562/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2562/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2562/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843059 - PreCommit-HIVE-Build

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch, 
> HIVE-14948.03.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent. We need to 
> clean this up and, if possible, quote only those identifiers that need to be 
> quoted in the generated SQL statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15397) metadata-only queries may return incorrect results with empty tables

2016-12-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15397:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review.

> metadata-only queries may return incorrect results with empty tables
> 
>
> Key: HIVE-15397
> URL: https://issues.apache.org/jira/browse/HIVE-15397
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-15397.01.patch, HIVE-15397.patch
>
>
> Queries like {{select 1=1 from t group by 1=1}} may return rows, based on 
> OneNullRowInputFormat, even if the source table is empty. For now, add some 
> basic detection of empty tables and turn this off by default (since the mere 
> presence of files does not tell us whether a table is empty unless we read 
> them).
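The description does not spell out the "basic detection of empty tables". Below is a minimal Java sketch of one conservative form of it. It assumes a table is a directory of data files, with `java.nio` standing in for HDFS. It treats a table as provably empty only when it has no data files or every file is zero bytes. `EmptyTableCheck` is a hypothetical name, not code from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class EmptyTableCheck {
    /**
     * Returns true only when the table directory provably holds no data: it is missing, it
     * contains no regular files, or every file is zero bytes. A false result means "unknown",
     * because a non-empty file may still hold zero rows (e.g. a columnar file with only a footer).
     */
    static boolean isDefinitelyEmpty(Path tableDir) throws IOException {
        if (!Files.exists(tableDir)) {
            return true;
        }
        try (Stream<Path> paths = Files.walk(tableDir)) {
            return paths.filter(Files::isRegularFile).allMatch(EmptyTableCheck::isZeroLength);
        } catch (UncheckedIOException e) {
            throw e.getCause(); // errors raised lazily while walking the tree
        }
    }

    private static boolean isZeroLength(Path p) {
        try {
            return Files.size(p) == 0;
        } catch (IOException e) {
            return false; // size unknown: treat the file as possibly non-empty
        }
    }
}
```

Anything other than a provable "empty" leaves the table in the unknown state. In that case, the metadata-only shortcut should not be used.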





[jira] [Commented] (HIVE-15399) Parser change for UniqueJoin

2016-12-13 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746221#comment-15746221
 ] 

Pengcheng Xiong commented on HIVE-15399:


The last two test case failures are related and we need to update the golden 
files accordingly. All the others are unrelated.

> Parser change for UniqueJoin
> 
>
> Key: HIVE-15399
> URL: https://issues.apache.org/jira/browse/HIVE-15399
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15399.01.patch
>
>
> UniqueJoin was introduced in HIVE-591 (Add Unique Join, Emil Ibrishimov via 
> namit). It seems there is only one q test for unique join, uniquejoin.q. In 
> that q test, the unique join source only comes from a table. However, in the 
> parser, its source can come not only from tableSource, but also from
> {code}
> partitionedTableFunction | tableSource | subQuerySource | virtualTableSource
> {code}
> I think it would be better to change the parser and restrict it to what users 
> actually need.





[jira] [Commented] (HIVE-15351) Disable vectorized VectorUDFAdaptor usage with non-column or constant parameters

2016-12-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746087#comment-15746087
 ] 

Thejas M Nair commented on HIVE-15351:
--

Setting fix version to 2.2.0 for master commit.


> Disable vectorized VectorUDFAdaptor usage with non-column or constant 
> parameters
> 
>
> Key: HIVE-15351
> URL: https://issues.apache.org/jira/browse/HIVE-15351
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15351.01.patch, HIVE-15351.02.patch
>
>
> Vectorization using VectorUDFAdaptor is broken and produces wrong results 
> when the parameter(s) have vectorized expressions that allocate scratch 
> columns.  So, for now, we restrict VectorUDFAdaptor usage to columns or 
> constant expressions.
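The restriction above can be pictured as a simple guard over the argument expressions. This Java sketch uses a hypothetical, simplified expression tree (`ColumnRef`, `Constant`, `Call`) rather than Hive's vectorizer classes. It assumes that only nested calls would need scratch columns.

```java
import java.util.Arrays;
import java.util.List;

final class AdaptorGuard {
    // Hypothetical, simplified expression tree; not Hive's VectorExpression hierarchy.
    interface Expr {}

    static final class ColumnRef implements Expr {
        final int column;
        ColumnRef(int column) { this.column = column; }
    }

    static final class Constant implements Expr {
        final Object value;
        Constant(Object value) { this.value = value; }
    }

    static final class Call implements Expr {
        final String fn;
        final List<Expr> args;
        Call(String fn, Expr... args) { this.fn = fn; this.args = Arrays.asList(args); }
    }

    /** Allows the adaptor only when every argument is a column reference or a constant. */
    static boolean canUseAdaptor(List<Expr> args) {
        for (Expr e : args) {
            if (!(e instanceof ColumnRef) && !(e instanceof Constant)) {
                return false; // a nested call would need a scratch column
            }
        }
        return true;
    }
}
```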





[jira] [Commented] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset

2016-12-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746089#comment-15746089
 ] 

Sergey Shelukhin commented on HIVE-15422:
-

Nit: is LinkedHashSet needed? The code doesn't seem to rely on ordering. Otherwise +1.

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge 
> number of objects for partitioned dataset
> 
>
> Key: HIVE-15422
> URL: https://issues.apache.org/jira/browse/HIVE-15422
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15422.1.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node 
> cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed a large 
> number of objects created just in path comparisons in HiveInputFormat. 
> HIVE-15405 reduces the number of path comparisons in FileUtils, but the code 
> still ends up doing lots of comparisons in HiveInputFormat::pushProjectionsAndFilters.
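One way to avoid pairwise path comparisons is to index the partition directories once and then look up each file's ancestors in that index. Below is a minimal Java sketch of this approach. It assumes paths are plain '/'-separated strings, whereas Hive itself works with Hadoop `Path` objects. `PartitionPathIndex` is a hypothetical name, and this is not the HIVE-15422 patch. A plain `HashSet` is enough here because, as the review comment above notes, the lookup does not depend on ordering.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class PartitionPathIndex {
    // A plain HashSet: membership checks don't depend on insertion order.
    private final Set<String> partitionDirs;

    PartitionPathIndex(Collection<String> dirs) {
        this.partitionDirs = new HashSet<>(dirs);
    }

    /** Returns the partition directory containing the file, or null; each ancestor costs one hash lookup. */
    String findPartition(String filePath) {
        String dir = filePath;
        int slash;
        while ((slash = dir.lastIndexOf('/')) > 0) {
            dir = dir.substring(0, slash);
            if (partitionDirs.contains(dir)) {
                return dir;
            }
        }
        return null;
    }
}
```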





[jira] [Updated] (HIVE-15351) Disable vectorized VectorUDFAdaptor usage with non-column or constant parameters

2016-12-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15351:
-
Fix Version/s: 2.2.0

> Disable vectorized VectorUDFAdaptor usage with non-column or constant 
> parameters
> 
>
> Key: HIVE-15351
> URL: https://issues.apache.org/jira/browse/HIVE-15351
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15351.01.patch, HIVE-15351.02.patch
>
>
> Vectorization using VectorUDFAdaptor is broken and produces wrong results 
> when the parameter(s) have vectorized expressions that allocate scratch 
> columns.  So, for now, we restrict VectorUDFAdaptor usage to columns or 
> constant expressions.





[jira] [Commented] (HIVE-15347) LLAP: Executor memory and Xmx should have some headroom for other services

2016-12-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746073#comment-15746073
 ] 

Sergey Shelukhin commented on HIVE-15347:
-

+1

> LLAP: Executor memory and Xmx should have some headroom for other services
> --
>
> Key: HIVE-15347
> URL: https://issues.apache.org/jira/browse/HIVE-15347
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15347.1.patch
>
>
> If executor memory + cache memory is configured close to or equal to Xmx, a 
> task attempt that causes an OOM can take down the LLAP daemon. Provide some 
> leeway for other services during a memory crunch.
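The headroom idea can be expressed as a sizing rule. This Java sketch follows the description's assumption that executor memory and cache memory both count against Xmx. `headroomFraction` is a hypothetical knob, not an actual LLAP configuration property, and `LlapMemoryPlanner` is an illustrative name.

```java
final class LlapMemoryPlanner {
    /**
     * Caps executor memory so that executors + cache leave a fixed fraction of Xmx free
     * for other services. headroomFraction is hypothetical (e.g. 0.25 keeps 25% of Xmx free).
     */
    static long maxExecutorMemory(long xmxBytes, long cacheBytes, double headroomFraction) {
        if (headroomFraction < 0 || headroomFraction >= 1) {
            throw new IllegalArgumentException("headroomFraction must be in [0, 1)");
        }
        long headroom = (long) (xmxBytes * headroomFraction);
        long available = xmxBytes - headroom - cacheBytes;
        if (available <= 0) {
            throw new IllegalArgumentException("cache + headroom leave no memory for executors");
        }
        return available;
    }
}
```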





[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746070#comment-15746070
 ] 

Hive QA commented on HIVE-15421:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843047/HIVE-15421.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10768 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=115)

[groupby_map_ppr_multi_distinct.q,bucketmapjoin10.q,vectorization_13.q,mapjoin_mapjoin.q,union2.q,join41.q,groupby8_map.q,cbo_subq_not_in.q,identity_project_remove_skip.q,groupby8_map_skew.q,nullgroup2.q,mapjoin_subquery.q,bucket2.q,smb_mapjoin_1.q,union_remove_8.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=92)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2560/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2560/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2560/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843047 - PreCommit-HIVE-Build

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15421.1.patch, HIVE-15421.2.patch, 
> HIVE-15421.3.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, it may still get an 
> IOException (RemoteException) due to a failure of copyFromLocalFile in a 
> specific environment, for example, in a kerberized HDFS encryption zone where 
> the TGT has expired.
> It is better to fail early with a different message to avoid confusion.
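Below is a minimal Java sketch of failing early while keeping the original exception as the cause. It uses `java.nio.file.Files.copy` as a local stand-in for Hadoop's `FileSystem.copyFromLocalFile`. It also assumes that a destination file of the same size as the source indicates a concurrent writer rather than a real failure. `ResourceLocalizer` is a hypothetical name. This is not the actual `DagUtils.localizeResource` code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ResourceLocalizer {
    /**
     * Copies src to dest. A failure is treated as a benign race only when dest already exists
     * with the same size as src (assumed to mean another thread wrote it); otherwise it fails
     * early with a specific message and keeps the original exception as the cause.
     */
    static Path localize(Path src, Path dest) throws IOException {
        try {
            Files.copy(src, dest); // stand-in for copyFromLocalFile; throws if dest already exists
            return dest;
        } catch (IOException e) {
            if (Files.exists(src) && Files.exists(dest) && Files.size(dest) == Files.size(src)) {
                return dest; // most likely written concurrently by another thread
            }
            throw new IOException("Failed to localize " + src + " to " + dest
                + "; not caused by a concurrent writer", e); // retain the exception chain
        }
    }
}
```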





[jira] [Commented] (HIVE-15386) Expose Spark task counts and stage Ids information in SparkTask from SparkJobMonitor

2016-12-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745988#comment-15745988
 ] 

zhihai xu commented on HIVE-15386:
--

Thanks [~xuefuz] for the review! Thanks [~lirui] for the review and committing 
the patch!

> Expose Spark task counts and stage Ids information in SparkTask from 
> SparkJobMonitor
> 
>
> Key: HIVE-15386
> URL: https://issues.apache.org/jira/browse/HIVE-15386
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.1.1
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.2.0
>
> Attachments: HIVE-15386.000.patch, HIVE-15386.001.patch, 
> HIVE-15386.002.patch
>
>
> Expose Spark task counts and stage ID information in SparkTask from 
> SparkJobMonitor, so that this information can be used by a Hive hook to 
> monitor Spark jobs.





[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-13 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15421:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~daijy] for the review.

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15421.1.patch, HIVE-15421.2.patch, 
> HIVE-15421.3.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, it may still get an 
> IOException (RemoteException) due to a failure of copyFromLocalFile in a 
> specific environment, for example, in a kerberized HDFS encryption zone where 
> the TGT has expired.
> It is better to fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Patch Available  (was: Open)

Reuploaded the same patch as HIVE-15353.4.patch to try to trigger the PreCommit 
tests again.

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.
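Below is a minimal Java sketch of the null checks. It uses a hypothetical stand-in class `Sd` holding a `List<String>` of column names instead of the Thrift `StorageDescriptor` and its `FieldSchema` list. Normalizing null to an empty list before storing is one possible choice and is assumed here. The patch may handle it differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NullSafeCols {
    // Hypothetical stand-in for a StorageDescriptor whose cols list may be null.
    static final class Sd {
        List<String> cols;
        Sd(List<String> cols) { this.cols = cols; }
    }

    /** Returns the column list, treating null as "no columns" instead of dereferencing it. */
    static List<String> colsOrEmpty(Sd sd) {
        return sd.cols == null ? Collections.<String>emptyList() : sd.cols;
    }

    /** Replaces a null column list before persisting, so later alter calls never read back null. */
    static void normalizeBeforeStore(Sd sd) {
        if (sd.cols == null) {
            sd.cols = new ArrayList<>();
        }
    }
}
```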





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Status: Open  (was: Patch Available)

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-15353) Metastore throws NPE if StorageDescriptor.cols is null

2016-12-13 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-15353:
---
Attachment: HIVE-15353.4.patch

> Metastore throws NPE if StorageDescriptor.cols is null
> --
>
> Key: HIVE-15353
> URL: https://issues.apache.org/jira/browse/HIVE-15353
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-15353.1.patch, HIVE-15353.2.patch, 
> HIVE-15353.3.patch, HIVE-15353.4.patch
>
>
> When using the HiveMetaStoreClient API directly to talk to the metastore, you 
> get NullPointerExceptions when StorageDescriptor.cols is null in the 
> Table/Partition object in the following calls:
> * create_table
> * alter_table
> * alter_partition
> Calling add_partition with StorageDescriptor.cols set to null causes null to 
> be stored in the metastore database and subsequent calls to alter_partition 
> for that partition to fail with an NPE.
> Null checks should be added to eliminate the NPEs in the metastore.





[jira] [Updated] (HIVE-14948) properly handle special characters in identifiers

2016-12-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14948:
--
Attachment: HIVE-14948.03.patch

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch, 
> HIVE-14948.03.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent. We need to 
> clean this up and, if possible, only quote those identifiers that need to be 
> quoted in the generated SQL statement.





[jira] [Updated] (HIVE-14948) properly handle special characters in identifiers

2016-12-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14948:
--
Status: Patch Available  (was: In Progress)

> properly handle special characters in identifiers
> -
>
> Key: HIVE-14948
> URL: https://issues.apache.org/jira/browse/HIVE-14948
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14948.01.patch, HIVE-14948.02.patch, 
> HIVE-14948.03.patch
>
>
> The treatment of quoted identifiers in HIVE-14943 is inconsistent. We need to 
> clean this up and, if possible, only quote those identifiers that need to be 
> quoted in the generated SQL statement.





[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745885#comment-15745885
 ] 

Daniel Dai commented on HIVE-15421:
---

+1

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch, HIVE-15421.2.patch, 
> HIVE-15421.3.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, it may still get an 
> IOException (RemoteException) due to a failure of copyFromLocalFile in a 
> specific environment, for example, in a kerberized HDFS encryption zone where 
> the TGT has expired.
> It is better to fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-13 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15421:
-
Attachment: HIVE-15421.3.patch

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch, HIVE-15421.2.patch, 
> HIVE-15421.3.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, it may still get an 
> IOException (RemoteException) due to a failure of copyFromLocalFile in a 
> specific environment, for example, in a kerberized HDFS encryption zone where 
> the TGT has expired.
> It is better to fail early with a different message to avoid confusion.





[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745863#comment-15745863
 ] 

Daniel Dai commented on HIVE-15421:
---

We should also retain the exception chain in the new IOException: "throw new 
IOException(msg, e);"

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch, HIVE-15421.2.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, it may still get an 
> IOException (RemoteException) due to a failure of copyFromLocalFile in a 
> specific environment, for example, in a kerberized HDFS encryption zone where 
> the TGT has expired.
> It is better to fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-13 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15421:
-
Attachment: HIVE-15421.2.patch

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch, HIVE-15421.2.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, it may still get an 
> IOException (RemoteException) due to a failure of copyFromLocalFile in a 
> specific environment, for example, in a kerberized HDFS encryption zone where 
> the TGT has expired.
> It is better to fail early with a different message to avoid confusion.





[jira] [Commented] (HIVE-15421) Assumption in exception handling can be wrong in DagUtils.localizeResource

2016-12-13 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745850#comment-15745850
 ] 

Wei Zheng commented on HIVE-15421:
--

[~daijy], patch 2 adds the HADOOP ticket number and fix version.

Test failures are not related.

> Assumption in exception handling can be wrong in DagUtils.localizeResource
> --
>
> Key: HIVE-15421
> URL: https://issues.apache.org/jira/browse/HIVE-15421
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15421.1.patch, HIVE-15421.2.patch
>
>
> In localizeResource, once we get an IOException, we always assume it is due 
> to another thread writing the same file. But that is not always the case. 
> Even without interference from other threads, it may still get an 
> IOException (RemoteException) due to a failure of copyFromLocalFile in a 
> specific environment, for example, in a kerberized HDFS encryption zone where 
> the TGT has expired.
> It is better to fail early with a different message to avoid confusion.





[jira] [Updated] (HIVE-15384) Compressor plugin

2016-12-13 Thread Kevin Liew (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liew updated HIVE-15384:
--
Description: Splitting compressor into separate JIRA from server framework  
(was: Splitting server framework into separate JIRA from compressor)

> Compressor plugin
> -
>
> Key: HIVE-15384
> URL: https://issues.apache.org/jira/browse/HIVE-15384
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ziyang Zhao
>Assignee: Kevin Liew
>
> Splitting compressor into separate JIRA from server framework





[jira] [Updated] (HIVE-15384) Snappy Compressor plugin

2016-12-13 Thread Kevin Liew (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liew updated HIVE-15384:
--
Summary: Snappy Compressor plugin  (was: Compressor plugin)

> Snappy Compressor plugin
> 
>
> Key: HIVE-15384
> URL: https://issues.apache.org/jira/browse/HIVE-15384
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ziyang Zhao
>Assignee: Kevin Liew
>
> Splitting compressor into separate JIRA from server framework





[jira] [Assigned] (HIVE-15384) Compressor plugin

2016-12-13 Thread Kevin Liew (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Liew reassigned HIVE-15384:
-

Assignee: Kevin Liew

> Compressor plugin
> -
>
> Key: HIVE-15384
> URL: https://issues.apache.org/jira/browse/HIVE-15384
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ziyang Zhao
>Assignee: Kevin Liew
>
> Splitting server framework into separate JIRA from compressor





[jira] [Commented] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

2016-12-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745794#comment-15745794
 ] 

Thejas M Nair commented on HIVE-15339:
--

Updated jira summary to clarify what the patch does as [~rajesh.balamohan] 
mentioned in earlier comment.


> Batch metastore calls to get column stats for fields needed in 
> FilterSelectivityEstimator
> -
>
> Key: HIVE-15339
> URL: https://issues.apache.org/jira/browse/HIVE-15339
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch
>
>
> Depending on the query pattern, {{FilterSelectivityEstimator}} gets column 
> statistics from the metastore in multiple calls. For instance, in the 
> following query, it ends up fetching individual column statistics for flights 
> multiple times.
> When the table has a large number of partitions, fetching column statistics 
> via multiple calls can be very expensive and adversely impacts the overall 
> compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = 
> `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = 
> `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may help to group all columns that need statistics and fetch their 
> statistics in a single remote call.
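Below is a minimal Java sketch of the batching idea. It collects the columns that need statistics, fetches the missing ones in a single round trip, and serves later lookups from a cache. The `remoteCall` function stands in for a metastore call, and the `Long` value stands in for whatever statistic is needed. Both are assumptions, not the `FilterSelectivityEstimator` API. `BatchedStatsFetcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BatchedStatsFetcher {
    // Hypothetical stand-in for one metastore round trip returning stats for a list of columns.
    private final Function<List<String>, Map<String, Long>> remoteCall;
    // Stats already fetched, so repeated estimator lookups do not go back to the metastore.
    private final Map<String, Long> cache = new HashMap<>();
    int remoteCalls = 0;

    BatchedStatsFetcher(Function<List<String>, Map<String, Long>> remoteCall) {
        this.remoteCall = remoteCall;
    }

    /** Fetches stats for all requested columns, issuing at most one remote call for the uncached ones. */
    Map<String, Long> prefetch(Collection<String> columns) {
        List<String> missing = new ArrayList<>();
        for (String c : new LinkedHashSet<>(columns)) { // de-duplicate while keeping request order
            if (!cache.containsKey(c)) {
                missing.add(c);
            }
        }
        if (!missing.isEmpty()) {
            remoteCalls++;
            cache.putAll(remoteCall.apply(missing));
        }
        Map<String, Long> result = new HashMap<>();
        for (String c : columns) {
            result.put(c, cache.get(c));
        }
        return result;
    }
}
```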





[jira] [Updated] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator

2016-12-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-15339:
-
Summary: Batch metastore calls to get column stats for fields needed in 
FilterSelectivityEstimator  (was: Prefetch column stats for fields needed in 
FilterSelectivityEstimator)

> Batch metastore calls to get column stats for fields needed in 
> FilterSelectivityEstimator
> -
>
> Key: HIVE-15339
> URL: https://issues.apache.org/jira/browse/HIVE-15339
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch
>
>
> Depending on the query pattern, {{FilterSelectivityEstimator}} gets column 
> statistics from the metastore in multiple calls. For instance, in the 
> following query, it ends up fetching individual column statistics for flights 
> multiple times.
> When the table has a large number of partitions, fetching column statistics 
> via multiple calls can be very expensive and adversely impacts the overall 
> compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = 
> `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = 
> `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may help to group all columns that need statistics and fetch their 
> statistics in a single remote call.





[jira] [Commented] (HIVE-15378) clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts

2016-12-13 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745590#comment-15745590
 ] 

Sergio Peña commented on HIVE-15378:


What's the problem with this variable in the scripts? Are there any errors when 
the variable is not cleaned up? Can you add this to the description, please?

> clean up HADOOP_USER_CLASSPATH_FIRST in bin scripts
> ---
>
> Key: HIVE-15378
> URL: https://issues.apache.org/jira/browse/HIVE-15378
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15378.1.patch
>
>
> beeline, hive, and hplsql all contain this statement:
> export HADOOP_USER_CLASSPATH_FIRST=true
> beeline and hplsql start via 'hive --service', so the statement is useless in beeline 
> and hplsql.
> Add export HADOOP_USER_CLASSPATH_FIRST=true to hive.cmd.





[jira] [Commented] (HIVE-14735) Build Infra: Spark artifacts download takes a long time

2016-12-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745569#comment-15745569
 ] 

Sergio Peña commented on HIVE-14735:


- Is it skipSparkTests or skipSparkAssemblyDeploy?
- Can we use Maven instead of Gradle? I want to avoid adding another build 
tool that contributors would need to learn for maintenance.
- Can you add to the README how to publish files manually? The current repo is 
not a Maven repo, so the publish function won't work.

> Build Infra: Spark artifacts download takes a long time
> ---
>
> Key: HIVE-14735
> URL: https://issues.apache.org/jira/browse/HIVE-14735
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Vaibhav Gumashta
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14735.1.patch, HIVE-14735.1.patch, 
> HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.2.patch, HIVE-14735.3.patch
>
>
> In particular this command:
> {{curl -Sso ./../thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz 
> http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.6.0-bin-hadoop2-without-hive.tgz}}





[jira] [Updated] (HIVE-15424) Hive dropped table during table creation if table already exists

2016-12-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15424:

Status: Patch Available  (was: Open)

+1

> Hive dropped table during table creation if table already exists
> 
>
> Key: HIVE-15424
> URL: https://issues.apache.org/jira/browse/HIVE-15424
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Suresh Bahuguna
>
> While creating a table, rollbackCreateTable() shouldn't be called if the table 
> already exists.
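A minimal sketch of the guard the description asks for, not the actual patch. Only the method name `rollbackCreateTable` comes from the issue; `InMemoryCatalog`, its `tables` and `storage` maps, and the unchecked `AlreadyExistsException` are hypothetical stand-ins for the metastore and a storage handler. The rollback runs only for work this call actually did, so a duplicate create cannot drop the existing table's data.

```java
import java.util.HashMap;
import java.util.Map;

class CreateTableRollbackSketch {

    /** Unchecked here to keep the sketch short. */
    static class AlreadyExistsException extends RuntimeException {
        AlreadyExistsException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical stand-in for the metastore (tables) plus a storage handler (storage). */
    static class InMemoryCatalog {
        final Map<String, String> tables = new HashMap<>();
        final Map<String, String> storage = new HashMap<>();

        /** Hook-style rollback: drops the storage backing the table. */
        void rollbackCreateTable(String name) {
            storage.remove(name);
        }

        void createTable(String name) {
            boolean createdByThisCall = false;
            boolean success = false;
            try {
                if (tables.containsKey(name)) {
                    // Nothing was created yet, so there is nothing to roll back.
                    throw new AlreadyExistsException("Table " + name + " already exists");
                }
                storage.put(name, "data for " + name);
                createdByThisCall = true;
                tables.put(name, "metadata for " + name);
                success = true;
            } finally {
                // Undo only work done by this call; an existing table keeps its storage.
                if (!success && createdByThisCall) {
                    rollbackCreateTable(name);
                }
            }
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.createTable("t");
        try {
            catalog.createTable("t"); // second create of the same table fails
        } catch (AlreadyExistsException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("storage still present: " + catalog.storage.containsKey("t"));
    }
}
```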





[jira] [Commented] (HIVE-15076) Improve scalability of LDAP authentication provider group filter

2016-12-13 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745376#comment-15745376
 ] 

Illya Yalovyy commented on HIVE-15076:
--

[~aihuaxu] I have addressed your concerns. Please see my latest patch.

> Improve scalability of LDAP authentication provider group filter
> 
>
> Key: HIVE-15076
> URL: https://issues.apache.org/jira/browse/HIVE-15076
> Project: Hive
>  Issue Type: Improvement
>  Components: Authentication
>Affects Versions: 2.1.0
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-15076.1.patch, HIVE-15076.2.patch, 
> HIVE-15076.3.patch, HIVE-15076.4.patch, HIVE-15076.5.patch
>
>
> The current implementation uses the following algorithm:
> # For a given user, find all groups the user is a member of. (A list of 
> LDAP groups is constructed as a result of that request.)
> # Match this list of groups against the provided group filter.
>  
> The time/memory complexity of this approach is O(N) on the client side, where N is 
> the number of groups the user is a member of. On a large directory (800+ 
> groups per user) we can observe up to 2x performance degradation and failures 
> because of the size of the LDAP response (LDAP: error code 4 - Sizelimit Exceeded).
>  
> Some directory services (Microsoft Active Directory, for instance) provide a 
> virtual attribute on the User object that contains the list of groups the user 
> belongs to. This attribute can be used to quickly determine whether the user 
> passes or fails the group filter.
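As a rough illustration of the alternative, the sketch below checks a user entry's `memberOf` values (the attribute Active Directory exposes) against the group filter using only the JDK's JNDI classes, so it runs without a directory server. Matching on the value of each group DN's leftmost RDN (e.g. `hive-users` in `CN=hive-users,...`) and the class and method names are assumptions for illustration, not the provider's actual behavior or configuration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;
import javax.naming.ldap.LdapName;

class MemberOfGroupFilterSketch {

    /**
     * Returns true if any DN in the user's memberOf attribute has a leftmost RDN
     * (e.g. CN=hive-users) whose value is in the allowed group set.
     */
    static boolean passesGroupFilter(Attributes userAttrs, Set<String> allowedGroups)
            throws NamingException {
        Attribute memberOf = userAttrs.get("memberOf");
        if (memberOf == null) {
            return false; // no group memberships exposed on the user entry
        }
        NamingEnumeration<?> values = memberOf.getAll();
        try {
            while (values.hasMore()) {
                LdapName dn = new LdapName(values.next().toString());
                // LdapName indexes RDNs right to left, so the leftmost RDN is the last one.
                String groupName = dn.getRdn(dn.size() - 1).getValue().toString();
                if (allowedGroups.contains(groupName)) {
                    return true;
                }
            }
        } finally {
            values.close();
        }
        return false;
    }

    public static void main(String[] args) throws NamingException {
        Attributes user = new BasicAttributes(true); // case-insensitive attribute IDs
        BasicAttribute memberOf = new BasicAttribute("memberOf");
        memberOf.add("CN=analysts,OU=Groups,DC=example,DC=com");
        memberOf.add("CN=hive-users,OU=Groups,DC=example,DC=com");
        user.put(memberOf);

        Set<String> filter = new HashSet<>(Arrays.asList("hive-users", "hive-admins"));
        System.out.println(passesGroupFilter(user, filter)); // true
    }
}
```

The check needs only the attributes of the user entry that is already looked up during authentication, instead of a separate search that returns every group the user belongs to.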





[jira] [Updated] (HIVE-15391) Location validation for table should ignore the values for view.

2016-12-13 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-15391:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master.
Thanks, Aihua, for reviewing the code.

> Location validation for table should ignore the values for view.
> 
>
> Key: HIVE-15391
> URL: https://issues.apache.org/jira/browse/HIVE-15391
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-15206.1.patch
>
>
> When using schematool to do location validation, we get error messages for 
> views, for example:
> {noformat}
> In DB with Name: viewa
> NULL Location for TABLE with Name: viewa
> In DB with Name: viewa
> NULL Location for TABLE with Name: viewb
> In DB with Name: viewa
> {noformat}
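A hypothetical sketch of the idea behind the fix: when collecting NULL-location errors, skip entries whose type is a view, since views have no storage location. `TableInfo`, `findNullLocations` and the local `TableType` enum (a subset of the constant names in Hive's `TableType`) are illustrative, not the schematool code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LocationValidationSketch {

    /** Local subset of Hive's table type names so the sketch is self-contained. */
    enum TableType { MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW }

    static final class TableInfo {
        final String db;
        final String name;
        final TableType type;
        final String location; // null for views, which have no storage location

        TableInfo(String db, String name, TableType type, String location) {
            this.db = db;
            this.name = name;
            this.type = type;
            this.location = location;
        }
    }

    /** Reports NULL locations, skipping views because a null location is expected for them. */
    static List<String> findNullLocations(List<TableInfo> tables) {
        List<String> errors = new ArrayList<>();
        for (TableInfo t : tables) {
            if (t.type == TableType.VIRTUAL_VIEW) {
                continue;
            }
            if (t.location == null) {
                errors.add("NULL Location for TABLE with Name: " + t.name + " in DB: " + t.db);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<TableInfo> tables = Arrays.asList(
            new TableInfo("viewa", "viewa", TableType.VIRTUAL_VIEW, null),
            new TableInfo("viewa", "viewb", TableType.VIRTUAL_VIEW, null),
            new TableInfo("default", "broken", TableType.MANAGED_TABLE, null),
            new TableInfo("default", "ok", TableType.MANAGED_TABLE, "hdfs://nn:8020/warehouse/ok"));
        // Only the managed table with a missing location is reported.
        findNullLocations(tables).forEach(System.out::println);
    }
}
```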





[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-12-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745219#comment-15745219
 ] 

Xuefu Zhang commented on HIVE-13278:


[~lirui], thanks for the summary. Following your idea, can we first check whether 
the MapWork ends with an RS (ReduceSink) and use that to determine whether reduce.xml is expected?
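The suggestion can be sketched as a walk over the map-side operator tree: if any leaf is a ReduceSink (RS), a reduce stage follows and reduce.xml should exist; otherwise the lookup, and the resulting 'File not found' log line, could be skipped. The `Op` class and its string kinds (TS, SEL, GBY, RS, FS) are hypothetical stand-ins for Hive's operator classes, and the assumption that an RS leaf always implies a reduce plan is taken from the comment, not verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReduceWorkExpectedSketch {

    /** Minimal stand-in for an operator node in a map-side operator tree. */
    static final class Op {
        final String kind; // e.g. "TS", "SEL", "GBY", "RS", "FS"
        final List<Op> children = new ArrayList<>();

        Op(String kind, Op... kids) {
            this.kind = kind;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    /** True if any leaf of the tree is a ReduceSink, i.e. a reduce stage follows this map work. */
    static boolean endsWithReduceSink(Op root) {
        if (root.children.isEmpty()) {
            return "RS".equals(root.kind);
        }
        for (Op child : root.children) {
            if (endsWithReduceSink(child)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Op mapOnly = new Op("TS", new Op("SEL", new Op("FS")));
        Op withShuffle = new Op("TS", new Op("SEL", new Op("GBY", new Op("RS"))));
        // Only look for reduce.xml when the map work actually feeds a reducer.
        System.out.println("map-only expects reduce.xml: " + endsWithReduceSink(mapOnly));
        System.out.println("shuffle expects reduce.xml: " + endsWithReduceSink(withShuffle));
    }
}
```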

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Sahil Takiar
>Priority: Minor
>
> Many redundant 'File not found' messages appear in the container log during 
> query execution with Hive on Spark.
> This does not prevent the query from running successfully, so it is marked 
> as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}





[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745172#comment-15745172
 ] 

Hive QA commented on HIVE-15335:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842981/HIVE-15335.08.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10896 tests executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=93)
org.apache.hadoop.hive.ql.io.parquet.TestVectorizedColumnReader.decimalRead (batchId=251)
org.apache.hadoop.hive.ql.io.parquet.TestVectorizedDictionaryEncodingColumnReader.decimalRead (batchId=250)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2559/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2559/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2559/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842981 - PreCommit-HIVE-Build

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
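To illustrate what a mutable* call buys, here is a toy decimal backed by a single `long` with an in-place `mutableAdd`. It is a hypothetical sketch, not Hive's fast decimal: Hive decimals allow up to 38 digits, which cannot fit in one `long`, so this version simply reports overflow via `Math.addExact`/`Math.multiplyExact`. `BigDecimal.add` returns a new object on every call, whereas this accumulator is updated in place.

```java
import java.math.BigDecimal;

class MutableDecimalSketch {
    private long unscaled; // value = unscaled / 10^scale
    private int scale;

    MutableDecimalSketch(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    /** Multiplies v by 10^n, throwing ArithmeticException on long overflow. */
    private static long scaleUp(long v, int n) {
        for (int i = 0; i < n; i++) {
            v = Math.multiplyExact(v, 10L);
        }
        return v;
    }

    /** Adds other into this object in place; no new decimal object is allocated. */
    void mutableAdd(MutableDecimalSketch other) {
        long otherUnscaled = other.unscaled;
        if (scale < other.scale) {
            unscaled = scaleUp(unscaled, other.scale - scale);
            scale = other.scale;
        } else if (other.scale < scale) {
            otherUnscaled = scaleUp(otherUnscaled, scale - other.scale);
        }
        unscaled = Math.addExact(unscaled, otherUnscaled);
    }

    /** Converts to BigDecimal only when a caller needs one (e.g. for display). */
    BigDecimal toBigDecimal() {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        MutableDecimalSketch acc = new MutableDecimalSketch(0, 0);
        MutableDecimalSketch item = new MutableDecimalSketch(125, 2); // 1.25
        for (int i = 0; i < 4; i++) {
            acc.mutableAdd(item); // reuses acc instead of allocating a result per add
        }
        System.out.println(acc.toBigDecimal()); // 5.00
    }
}
```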





[jira] [Commented] (HIVE-15357) Fix and re-enable the spark-only tests

2016-12-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745110#comment-15745110
 ] 

Rui Li commented on HIVE-15357:
---

When I tried to re-enable these tests, I found this issue with DPP.
Running the following query in our qtest:
{code}
EXPLAIN select count(*) from srcpart join (select ds as ds, ds as `date` from 
srcpart group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
{code}
produces the following plan:
{noformat}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      Edges:
        Reducer 7 <- Map 6 (GROUP, 1)
      DagName: lirui_20161213205144_16c810ef-52f6-40c9-a346-7dbd2c9ef99e:2
      Vertices:
        Map 6 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: '2008-04-08' (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: '2008-04-08' (type: string)
                        sort order: +
                        Map-reduce partition columns: '2008-04-08' (type: string)
                        Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE
        Map 8 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ds (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: _col0 (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                      Spark Partition Pruning Sink Operator
                        partition key expr: ds
                        Statistics: Num rows: 2000 Data size: 21248 Basic stats: COMPLETE Column stats: NONE
                        target column name: ds
                        target work: Map 1
        Reducer 7 
            Reduce Operator Tree:
              Group By Operator
                keys: '2008-04-08' (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: '2008-04-08' (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: _col0 (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                      Spark Partition Pruning Sink Operator
                        partition key expr: ds
                        Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                        target column name: ds
                        target work: Map 5

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 1)
        Reducer 3 <- Map 5 (PARTITION-LEVEL SORT, 1), Reducer 2 (PARTITION-LEVEL SORT, 1)
        Reducer 4 <- Reducer 3 (GROUP, 1)
      DagName: lirui_20161213205144_16c810ef-52f6-40c9-a346-7dbd2c9ef99e:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    Statistics: Num rows: 1000 Data size: 10624 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      keys: '2008-04-08' (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num r

[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Attachment: HIVE-15335.08.patch

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: Patch Available  (was: In Progress)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: In Progress  (was: Patch Available)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.





[jira] [Commented] (HIVE-15347) LLAP: Executor memory and Xmx should have some headroom for other services

2016-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744882#comment-15744882
 ] 

Hive QA commented on HIVE-15347:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12842964/HIVE-15347.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10813 tests executed
*Failed tests:*
{noformat}
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_partitioned] (batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2558/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2558/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2558/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12842964 - PreCommit-HIVE-Build

> LLAP: Executor memory and Xmx should have some headroom for other services
> --
>
> Key: HIVE-15347
> URL: https://issues.apache.org/jira/browse/HIVE-15347
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-15347.1.patch
>
>
> If executor memory + cache memory is configured close to or equal to Xmx, a 
> task attempt that causes an OOM can take down the LLAP daemon. Provide some 
> leeway for other services during a memory crunch.
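The headroom rule reduces to simple arithmetic, sketched below under stated assumptions: headroom is modelled as a fraction of Xmx, and `cappedExecutorMemory`, `headroomFraction` and the 5% value are hypothetical, not LLAP configuration names or defaults and not what the patch necessarily does.

```java
class LlapMemoryHeadroomSketch {

    /**
     * Returns the executor memory to use so that executors + cache leave
     * headroomFraction of the heap (Xmx) unallocated for other daemon services.
     */
    static long cappedExecutorMemory(long xmxBytes, long cacheBytes,
                                     long requestedExecutorBytes, double headroomFraction) {
        if (headroomFraction < 0.0 || headroomFraction >= 1.0) {
            throw new IllegalArgumentException("headroomFraction must be in [0, 1)");
        }
        long headroomBytes = (long) (xmxBytes * headroomFraction);
        long availableBytes = xmxBytes - headroomBytes - cacheBytes;
        if (availableBytes <= 0) {
            throw new IllegalArgumentException("cache + headroom leave no heap for executors");
        }
        // Never hand executors more than what is left after cache and headroom.
        return Math.min(requestedExecutorBytes, availableBytes);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        // Executors (24 GiB) + cache (8 GiB) == Xmx (32 GiB): no room for anything else.
        long capped = cappedExecutorMemory(32 * gib, 8 * gib, 24 * gib, 0.05);
        System.out.printf("executor memory capped to %.2f GiB%n", capped / (double) gib);
    }
}
```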





[jira] [Commented] (HIVE-14496) Enable Calcite rewriting with materialized views

2016-12-13 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744797#comment-15744797
 ] 

Jesus Camacho Rodriguez commented on HIVE-14496:


[~ashutoshc], the failures are unrelated. Could you take a look? Thanks.

> Enable Calcite rewriting with materialized views
> 
>
> Key: HIVE-14496
> URL: https://issues.apache.org/jira/browse/HIVE-14496
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14496.01.patch, HIVE-14496.02.patch, 
> HIVE-14496.03.patch, HIVE-14496.04.patch, HIVE-14496.05.patch, 
> HIVE-14496.07.patch, HIVE-14496.patch
>
>
> Calcite already supports query rewriting using materialized views. We will 
> use it to support this feature in Hive.
> To do that, we need to register the existing materialized views with the 
> Calcite view service and enable the materialized view rewriting rules. 
> We should include a HiveConf flag to completely disable query rewriting using 
> materialized views if necessary.




