[jira] [Commented] (HIVE-17000) Upgrade Hive to PARQUET 1.9.0

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069581#comment-16069581
 ] 

Hive QA commented on HIVE-17000:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875175/HIVE-17000.001.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_analyze] 
(batchId=22)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5846/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5846/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5846/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875175 - PreCommit-HIVE-Build

> Upgrade Hive to PARQUET 1.9.0
> -
>
> Key: HIVE-17000
> URL: https://issues.apache.org/jira/browse/HIVE-17000
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-17000.001.patch
>
>
> Parquet 1.9.0 has been released and adds many new features, such as 
> PARQUET-601 (add support in Parquet to configure the encoding used by 
> ValueWriters).
> We should upgrade the Parquet dependency to 1.9.0 and bring these 
> optimizations to Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16981) hive.optimize.bucketingsorting should compare the schema before removing RS

2017-06-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069564#comment-16069564
 ] 

Pengcheng Xiong commented on HIVE-16981:


[~ashutoshc], all the tests sound good to me. Could you take a look? Thanks.

> hive.optimize.bucketingsorting should compare the schema before removing RS
> ---
>
> Key: HIVE-16981
> URL: https://issues.apache.org/jira/browse/HIVE-16981
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16981.01.patch, HIVE-16981.02.patch
>
>
> On master, in smb_mapjoin_20.q, run
> {code}
> select * from test_table3;
> {code}
> and you will get
> {code}
> val_0   0   NULL    1
> ...
> {code}
> The correct result is
> {code}
> val_0   0   val_0   1
> ...
> {code}
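For illustration, a minimal sketch of the kind of check the summary asks for (hypothetical helper, not the attached patch): remove the ReduceSink only when the schemas on both sides of it agree.

{code:java}
import java.util.List;

import org.apache.hadoop.hive.ql.exec.ColumnInfo;
import org.apache.hadoop.hive.ql.exec.Operator;

public final class SchemaCheck {
  // If parent and child disagree on column count or types, removing the
  // ReduceSink can silently corrupt values (the NULL column above).
  public static boolean schemasMatch(Operator<?> parent, Operator<?> child) {
    List<ColumnInfo> p = parent.getSchema().getSignature();
    List<ColumnInfo> c = child.getSchema().getSignature();
    if (p.size() != c.size()) {
      return false;
    }
    for (int i = 0; i < p.size(); i++) {
      // compare types; internal column names may legitimately differ
      if (!p.get(i).getType().equals(c.get(i).getType())) {
        return false;
      }
    }
    return true;
  }
}
{code}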



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16972) FetchOperator: filter out inputSplits which length is zero

2017-06-29 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069525#comment-16069525
 ] 

Wei Zheng commented on HIVE-16972:
--

[~debugger87] Patch 3 caused many test failures. Can you take a look and run 
the tests again?

> FetchOperator: filter out inputSplits which length is zero
> --
>
> Key: HIVE-16972
> URL: https://issues.apache.org/jira/browse/HIVE-16972
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Chaozhong Yang
>Assignee: Chaozhong Yang
> Attachments: HIVE-16972.2.patch, HIVE-16972.3.patch, HIVE-16972.patch
>
>
> * Background
>    The basic workflow of a common HQL query is:
>   1. compile and execute
>   2. fetch results
>   In many cases we don't need to worry about fetching results from HDFS 
> (if MapReduce jobs were generated in the planning step). However, the number 
> of result files on HDFS and their data distribution affect the final status 
> of an HQL query, especially for HiveServer2. We have some map-only queries, 
> e.g.:
> {code:sql}
> select * from myTable where date > '20170101' and date <= '20170301' and id = 
> 88;
> {code}
> This query generates more than 20,000 files on HDFS (see the uploaded 
> screenshot), and most of those files are empty; the data is very sparse. If 
> we send a TFetchResultsRequest from a HiveServer2 client with parameters 
> (timeout: 90s, maxRows: 1024), FetchOperator cannot fetch 1024 rows within 
> 90 seconds, and our HiveServer2 client will mark the TFetchResultsRequest as 
> a timeout failure. Why? Fetching results from an empty file is expensive: in 
> our HDFS cluster (5000+ DataNodes), reading an empty file costs almost 
> 100 ms (100 ms * 1000 files ==> 100 s > the 90 s timeout). Obviously, we can 
> filter out those empty files or splits to speed up FetchResults.
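A minimal sketch of the proposed filtering (illustrative only; the real change is in the attached patches): drop zero-length splits before they are iterated.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapred.InputSplit;

public final class EmptySplitFilter {
  // Drop zero-length splits before FetchOperator iterates over them, so no
  // time is spent opening files that cannot contain rows.
  public static List<InputSplit> filter(InputSplit[] splits) throws IOException {
    List<InputSplit> nonEmpty = new ArrayList<>();
    for (InputSplit split : splits) {
      if (split.getLength() > 0) {
        nonEmpty.add(split);
      }
    }
    return nonEmpty;
  }
}
{code}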



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069521#comment-16069521
 ] 

Hive QA commented on HIVE-16993:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875161/HIVE-16993.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5845/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5845/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5845/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-30 05:36:44.079
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5845/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 05:36:44.081
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 05:36:44.648
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file metastore/if/hive_metastore.thrift
patching file metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp
patching file metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h
patching file 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java
patching file metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb
patching file 
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStorePartitionSpecs.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/TestObjectStore.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseStore.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.17) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCE

[jira] [Commented] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069517#comment-16069517
 ] 

Hive QA commented on HIVE-16993:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875161/HIVE-16993.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5844/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5844/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5844/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-30 05:34:37.641
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5844/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 05:34:37.644
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 05:34:39.901
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file metastore/if/hive_metastore.thrift
patching file metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp
patching file metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h
patching file 
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java
patching file metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb
patching file 
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStorePartitionSpecs.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/TestObjectStore.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
patching file 
metastore/src/test/org/apache/hadoop/hive/metastore/hbase/TestHBaseStore.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.17) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCE

[jira] [Commented] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069505#comment-16069505
 ] 

Hive QA commented on HIVE-6990:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875152/HIVE-6990.06.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel 
(batchId=221)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5843/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5843/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5843/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875152 - PreCommit-HIVE-Build

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.1.patch, 
> HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflec

[jira] [Commented] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069463#comment-16069463
 ] 

Hive QA commented on HIVE-6990:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875152/HIVE-6990.06.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5842/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5842/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5842/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875152 - PreCommit-HIVE-Build

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.1.patch, 
> HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>

[jira] [Commented] (HIVE-17000) Upgrade Hive to PARQUET 1.9.0

2017-06-29 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069426#comment-16069426
 ] 

Dapeng Sun commented on HIVE-17000:
---

Submitted a simple patch to upgrade the Parquet dependency.

> Upgrade Hive to PARQUET 1.9.0
> -
>
> Key: HIVE-17000
> URL: https://issues.apache.org/jira/browse/HIVE-17000
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-17000.001.patch
>
>
> Parquet 1.9.0 has been released and adds many new features, such as 
> PARQUET-601 (add support in Parquet to configure the encoding used by 
> ValueWriters).
> We should upgrade the Parquet dependency to 1.9.0 and bring these 
> optimizations to Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17000) Upgrade Hive to PARQUET 1.9.0

2017-06-29 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-17000:
--
Attachment: HIVE-17000.001.patch

> Upgrade Hive to PARQUET 1.9.0
> -
>
> Key: HIVE-17000
> URL: https://issues.apache.org/jira/browse/HIVE-17000
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-17000.001.patch
>
>
> Parquet 1.9.0 has been released and adds many new features, such as 
> PARQUET-601 (add support in Parquet to configure the encoding used by 
> ValueWriters).
> We should upgrade the Parquet dependency to 1.9.0 and bring these 
> optimizations to Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17000) Upgrade Hive to PARQUET 1.9.0

2017-06-29 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-17000:
--
Status: Patch Available  (was: Open)

> Upgrade Hive to PARQUET 1.9.0
> -
>
> Key: HIVE-17000
> URL: https://issues.apache.org/jira/browse/HIVE-17000
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>
> Parquet 1.9.0 has been released and adds many new features, such as 
> PARQUET-601 (add support in Parquet to configure the encoding used by 
> ValueWriters).
> We should upgrade the Parquet dependency to 1.9.0 and bring these 
> optimizations to Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16935) Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069421#comment-16069421
 ] 

Hive QA commented on HIVE-16935:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875147/HIVE-16935.2.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10834 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5841/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5841/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5841/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875147 - PreCommit-HIVE-Build

> Hive should strip comments from input before choosing which CommandProcessor 
> to run.
> 
>
> Key: HIVE-16935
> URL: https://issues.apache.org/jira/browse/HIVE-16935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16935.1.patch, HIVE-16935.2.patch
>
>
> While using Beeswax, Hue fails to execute a statement with the following 
> error:
> Error while compiling statement: FAILED: ParseException line 3:4 missing 
> KW_ROLE at 'a' near 'a' line 3:5 missing EOF at '=' near 'a'
> {quote}
> -- comment
> SET a=1;
> SELECT 1;
> {quote}
> The same code works in BeeLine and in Impala, but fails in CliDriver.
>  
> h2. Background
> Hive deals with SQL comments (“-- to end of line”) in different places.
> Some clients attempt to strip comments. For example, BeeLine was recently 
> enhanced in https://issues.apache.org/jira/browse/HIVE-13864 to strip 
> comments from multi-line commands before they are executed.
> Other clients, such as Hue or JDBC, do not strip comments before sending text.
> Some tests, such as TestCliDriver, strip comments before running.
> When Hive gets a command, the CommandProcessorFactory looks at the text to 
> determine which CommandProcessor should handle it. In the bug case the 
> correct CommandProcessor is SetProcessor, but the comments confuse the 
> CommandProcessorFactory, so the command is treated as SQL. Hive's SQL 
> parser understands and ignores comments, but it does not understand the SET 
> commands usually handled by SetProcessor, and so we get the ParseException 
> shown above.
>  
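A minimal illustration of the stripping idea described above (a naive sketch, not the attached patch; it only drops whole comment lines and ignores "--" inside string literals):

{code:java}
public final class CommandCleaner {
  // Drop "-- to end of line" comment lines so the first real token of the
  // command is what the CommandProcessorFactory inspects.
  public static String stripComments(String command) {
    StringBuilder sb = new StringBuilder();
    for (String line : command.split("\n")) {
      if (!line.trim().startsWith("--")) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString().trim();
  }
}
{code}

After stripping, the failing input above begins with {{SET a=1;}}, so the factory can route it to SetProcessor instead of handing it to the SQL parser.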



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16972) FetchOperator: filter out inputSplits which length is zero

2017-06-29 Thread Chaozhong Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069419#comment-16069419
 ] 

Chaozhong Yang commented on HIVE-16972:
---

[~wzheng] I just created a review request on RB. Can you review it, please?

> FetchOperator: filter out inputSplits which length is zero
> --
>
> Key: HIVE-16972
> URL: https://issues.apache.org/jira/browse/HIVE-16972
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Chaozhong Yang
>Assignee: Chaozhong Yang
> Attachments: HIVE-16972.2.patch, HIVE-16972.3.patch, HIVE-16972.patch
>
>
> * Background
>    The basic workflow of a common HQL query is:
>   1. compile and execute
>   2. fetch results
>   In many cases we don't need to worry about fetching results from HDFS 
> (if MapReduce jobs were generated in the planning step). However, the number 
> of result files on HDFS and their data distribution affect the final status 
> of an HQL query, especially for HiveServer2. We have some map-only queries, 
> e.g.:
> {code:sql}
> select * from myTable where date > '20170101' and date <= '20170301' and id = 
> 88;
> {code}
> This query generates more than 20,000 files on HDFS (see the uploaded 
> screenshot), and most of those files are empty; the data is very sparse. If 
> we send a TFetchResultsRequest from a HiveServer2 client with parameters 
> (timeout: 90s, maxRows: 1024), FetchOperator cannot fetch 1024 rows within 
> 90 seconds, and our HiveServer2 client will mark the TFetchResultsRequest as 
> a timeout failure. Why? Fetching results from an empty file is expensive: in 
> our HDFS cluster (5000+ DataNodes), reading an empty file costs almost 
> 100 ms (100 ms * 1000 files ==> 100 s > the 90 s timeout). Obviously, we can 
> filter out those empty files or splits to speed up FetchResults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17000) Upgrade Hive to PARQUET 1.9.0

2017-06-29 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun reassigned HIVE-17000:
-


> Upgrade Hive to PARQUET 1.9.0
> -
>
> Key: HIVE-17000
> URL: https://issues.apache.org/jira/browse/HIVE-17000
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>
> Parquet 1.9.0 has been released and adds many new features, such as 
> PARQUET-601 (add support in Parquet to configure the encoding used by 
> ValueWriters).
> We should upgrade the Parquet dependency to 1.9.0 and bring these 
> optimizations to Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16972) FetchOperator: filter out inputSplits which length is zero

2017-06-29 Thread Chaozhong Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaozhong Yang updated HIVE-16972:
--
External issue URL:   (was: https://reviews.apache.org/r/60554/)

> FetchOperator: filter out inputSplits which length is zero
> --
>
> Key: HIVE-16972
> URL: https://issues.apache.org/jira/browse/HIVE-16972
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Chaozhong Yang
>Assignee: Chaozhong Yang
> Attachments: HIVE-16972.2.patch, HIVE-16972.3.patch, HIVE-16972.patch
>
>
> * Background
>    The basic workflow of a common HQL query is:
>   1. compile and execute
>   2. fetch results
>   In many cases we don't need to worry about fetching results from HDFS 
> (if MapReduce jobs were generated in the planning step). However, the number 
> of result files on HDFS and their data distribution affect the final status 
> of an HQL query, especially for HiveServer2. We have some map-only queries, 
> e.g.:
> {code:sql}
> select * from myTable where date > '20170101' and date <= '20170301' and id = 
> 88;
> {code}
> This query generates more than 20,000 files on HDFS (see the uploaded 
> screenshot), and most of those files are empty; the data is very sparse. If 
> we send a TFetchResultsRequest from a HiveServer2 client with parameters 
> (timeout: 90s, maxRows: 1024), FetchOperator cannot fetch 1024 rows within 
> 90 seconds, and our HiveServer2 client will mark the TFetchResultsRequest as 
> a timeout failure. Why? Fetching results from an empty file is expensive: in 
> our HDFS cluster (5000+ DataNodes), reading an empty file costs almost 
> 100 ms (100 ms * 1000 files ==> 100 s > the 90 s timeout). Obviously, we can 
> filter out those empty files or splits to speed up FetchResults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16972) FetchOperator: filter out inputSplits which length is zero

2017-06-29 Thread Chaozhong Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaozhong Yang updated HIVE-16972:
--
External issue URL: https://reviews.apache.org/r/60554/

> FetchOperator: filter out inputSplits which length is zero
> --
>
> Key: HIVE-16972
> URL: https://issues.apache.org/jira/browse/HIVE-16972
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Chaozhong Yang
>Assignee: Chaozhong Yang
> Attachments: HIVE-16972.2.patch, HIVE-16972.3.patch, HIVE-16972.patch
>
>
> * Background
>    The basic workflow of a common HQL query is:
>   1. compile and execute
>   2. fetch results
>   In many cases we don't need to worry about fetching results from HDFS 
> (if MapReduce jobs were generated in the planning step). However, the number 
> of result files on HDFS and their data distribution affect the final status 
> of an HQL query, especially for HiveServer2. We have some map-only queries, 
> e.g.:
> {code:sql}
> select * from myTable where date > '20170101' and date <= '20170301' and id = 
> 88;
> {code}
> This query generates more than 20,000 files on HDFS (see the uploaded 
> screenshot), and most of those files are empty; the data is very sparse. If 
> we send a TFetchResultsRequest from a HiveServer2 client with parameters 
> (timeout: 90s, maxRows: 1024), FetchOperator cannot fetch 1024 rows within 
> 90 seconds, and our HiveServer2 client will mark the TFetchResultsRequest as 
> a timeout failure. Why? Fetching results from an empty file is expensive: in 
> our HDFS cluster (5000+ DataNodes), reading an empty file costs almost 
> 100 ms (100 ms * 1000 files ==> 100 s > the 90 s timeout). Obviously, we can 
> filter out those empty files or splits to speed up FetchResults.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16981) hive.optimize.bucketingsorting should compare the schema before removing RS

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069381#comment-16069381
 ] 

Hive QA commented on HIVE-16981:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875145/HIVE-16981.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5840/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5840/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5840/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875145 - PreCommit-HIVE-Build

> hive.optimize.bucketingsorting should compare the schema before removing RS
> ---
>
> Key: HIVE-16981
> URL: https://issues.apache.org/jira/browse/HIVE-16981
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16981.01.patch, HIVE-16981.02.patch
>
>
> On master, in smb_mapjoin_20.q, run
> {code}
> select * from test_table3;
> {code}
> and you will get
> {code}
> val_0   0   NULL    1
> ...
> {code}
> The correct result is
> {code}
> val_0   0   val_0   1
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16961) Hive on Spark leaks spark application in case user cancels query and closes session

2017-06-29 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069345#comment-16069345
 ] 

Rui Li commented on HIVE-16961:
---

+1

> Hive on Spark leaks spark application in case user cancels query and closes 
> session
> ---
>
> Key: HIVE-16961
> URL: https://issues.apache.org/jira/browse/HIVE-16961
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-16961.patch, HIVE-16961.patch
>
>
> It's found that a Spark application is leaked when the user cancels a query 
> and closes the session while Hive is waiting for the remote driver to 
> connect back. This was found for asynchronous query execution, but it 
> seemingly applies equally to synchronous submission when the session is 
> abruptly closed. The leaked Spark application that runs the Spark driver 
> connects back to Hive successfully and runs forever (until HS2 restarts), 
> but receives no job submissions because the session is already closed. 
> Ideally, Hive should reject the connection from the driver so the driver 
> will exit.
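A sketch of the behavior the last sentence asks for; all names here are invented for illustration and do not correspond to the attached patch:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public final class DriverRegistry {
  interface DriverConnection { void close(); }

  private final Set<String> openSessions = ConcurrentHashMap.newKeySet();

  void sessionOpened(String sessionId) { openSessions.add(sessionId); }
  void sessionClosed(String sessionId) { openSessions.remove(sessionId); }

  // Called when the Spark driver for sessionId finally connects back: accept
  // it only if the session is still open; otherwise close the channel so the
  // driver exits instead of idling until HS2 restarts.
  void onDriverConnectedBack(String sessionId, DriverConnection conn) {
    if (!openSessions.contains(sessionId)) {
      conn.close();
    }
    // else: hand the connection to the session's Spark client as usual
  }
}
{code}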



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069329#comment-16069329
 ] 

Hive QA commented on HIVE-6990:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875152/HIVE-6990.06.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorPartitionWithBase 
(batchId=258)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5839/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5839/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5839/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875152 - PreCommit-HIVE-Build

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.1.patch, 
> HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodA

[jira] [Updated] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Summary: Performance bottleneck in the ADD FILE/ARCHIVE commands for an 
HDFS resource  (was: Performance bottleneck in the ADD FILE/ARCHIVE commands 
when resource is on HDFS)

> Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource
> 
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>
> A performance bottleneck was found in adding a resource that lives on HDFS to 
> the distributed cache. 
> The commands used are:
> {code:java}
> 1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
> 2. ADD FILE "hdfs://some_dir/file.txt"
> {code}
> Here is the log corresponding to the archive-adding operation:
> {noformat}
>  converting to local hdfs://some_dir/archive.tar
>  Added resources: [hdfs://some_dir/archive.tar
> {noformat}
> Hive is downloading the resource to the local filesystem (shown in the log by 
> "converting to local"). 
> {color:#d04437}Ideally there is no need to bring the file to the local 
> filesystem when this operation is all about copying the file from one 
> location on HDFS to another location on HDFS (the distributed cache).{color}
> This is a significant performance bottleneck when the resource is a big file 
> and all commands need the same resource.
> After some debugging, the impacted piece of code was found to be:
> {code:java}
> public List<String> add_resources(ResourceType t, Collection<String> values, 
> boolean convertToUnix)
>   throws RuntimeException {
>     Set<String> resourceSet = resourceMaps.getResourceSet(t);
>     Map<String, List<String>> resourcePathMap = 
> resourceMaps.getResourcePathMap(t);
>     Map<String, List<String>> reverseResourcePathMap = 
> resourceMaps.getReverseResourcePathMap(t);
>     List<String> localized = new ArrayList<String>();
>     try {
>       for (String value : values) {
>         String key;
>         {color:#d04437}// get the local path of downloaded jars{color}
>         List<URI> downloadedURLs = resolveAndDownload(t, value, 
> convertToUnix);
>         ...
> {code}
> {code:java}
>   List<URI> resolveAndDownload(ResourceType t, String value, boolean 
> convertToUnix) throws URISyntaxException,
>       IOException {
>     URI uri = createURI(value);
>     if (getURLType(value).equals("file")) {
>       return Arrays.asList(uri);
>     } else if (getURLType(value).equals("ivy")) {
>       return dependencyResolver.downloadDependencies(uri);
>     } else { // goes here for HDFS
>       return Arrays.asList(createURI(downloadResource(value, 
> convertToUnix))); // When the resource is not local, it is downloaded 
> to the local machine.
>     }
>   }
> {code}
> Here, resolveAndDownload() always calls the downloadResource() API for an 
> external filesystem. It should take into account that when the resource is 
> on the same HDFS, bringing it to the local machine is an unnecessary step 
> that can be skipped for better performance.
> Thanks,
> Sailee
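A minimal sketch of the suggested skip, assuming "same HDFS" can be detected by comparing the resource URI with the session's default FileSystem URI (class and method names are hypothetical, not a committed Hive change):

{code:java}
import java.net.URI;
import java.util.Objects;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public final class ResourceLocality {
  // True when the resource already lives on the default HDFS instance,
  // meaning localization can be skipped and the URI used directly.
  public static boolean isOnDefaultHdfs(URI uri, Configuration conf) {
    URI defaultUri = FileSystem.getDefaultUri(conf);
    return "hdfs".equalsIgnoreCase(uri.getScheme())
        && Objects.equals(uri.getAuthority(), defaultUri.getAuthority());
  }
}
{code}

With such a check, the final branch of resolveAndDownload() could return Arrays.asList(uri) directly instead of calling downloadResource().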



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands when resource is on HDFS

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Summary: Performance bottleneck in the ADD FILE/ARCHIVE commands when 
resource is on HDFS  (was: Performance bottleneck in the ADD FILE/ARCHIVE 
commands)

> Performance bottleneck in the ADD FILE/ARCHIVE commands when resource is on 
> HDFS
> 
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>
> A performance bottleneck was found in adding a resource that lives on HDFS to 
> the distributed cache. 
> The commands used are:
> {code:java}
> 1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
> 2. ADD FILE "hdfs://some_dir/file.txt"
> {code}
> Here is the log corresponding to the archive-adding operation:
> {noformat}
>  converting to local hdfs://some_dir/archive.tar
>  Added resources: [hdfs://some_dir/archive.tar
> {noformat}
> Hive is downloading the resource to the local filesystem (shown in the log by 
> "converting to local"). 
> {color:#d04437}Ideally there is no need to bring the file to the local 
> filesystem when this operation is all about copying the file from one 
> location on HDFS to another location on HDFS (the distributed cache).{color}
> This is a significant performance bottleneck when the resource is a big file 
> and all commands need the same resource.
> After some debugging, the impacted piece of code was found to be:
> {code:java}
> public List<String> add_resources(ResourceType t, Collection<String> values, 
> boolean convertToUnix)
>   throws RuntimeException {
>     Set<String> resourceSet = resourceMaps.getResourceSet(t);
>     Map<String, List<String>> resourcePathMap = 
> resourceMaps.getResourcePathMap(t);
>     Map<String, List<String>> reverseResourcePathMap = 
> resourceMaps.getReverseResourcePathMap(t);
>     List<String> localized = new ArrayList<String>();
>     try {
>       for (String value : values) {
>         String key;
>         {color:#d04437}// get the local path of downloaded jars{color}
>         List<URI> downloadedURLs = resolveAndDownload(t, value, 
> convertToUnix);
>         ...
> {code}
> {code:java}
>   List<URI> resolveAndDownload(ResourceType t, String value, boolean 
> convertToUnix) throws URISyntaxException,
>       IOException {
>     URI uri = createURI(value);
>     if (getURLType(value).equals("file")) {
>       return Arrays.asList(uri);
>     } else if (getURLType(value).equals("ivy")) {
>       return dependencyResolver.downloadDependencies(uri);
>     } else { // goes here for HDFS
>       return Arrays.asList(createURI(downloadResource(value, 
> convertToUnix))); // When the resource is not local, it is downloaded 
> to the local machine.
>     }
>   }
> {code}
> Here, resolveAndDownload() always calls the downloadResource() API for an 
> external filesystem. It should take into account that when the resource is 
> on the same HDFS, bringing it to the local machine is an unnecessary step 
> that can be skipped for better performance.
> Thanks,
> Sailee



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Summary: Performance bottleneck in the ADD FILE/ARCHIVE commands  (was: 
Performance bottleneck in the add_resource() api)

> Performance bottleneck in the ADD FILE/ARCHIVE commands
> ---
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>
> A performance bottleneck was found in adding a resource that lives on HDFS to 
> the distributed cache. 
> The commands used are:
> {code:java}
> 1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
> 2. ADD FILE "hdfs://some_dir/file.txt"
> {code}
> Here is the log corresponding to the archive-adding operation:
> {noformat}
>  converting to local hdfs://some_dir/archive.tar
>  Added resources: [hdfs://some_dir/archive.tar
> {noformat}
> Hive is downloading the resource to the local filesystem (shown in the log by 
> "converting to local"). 
> {color:#d04437}Ideally there is no need to bring the file to the local 
> filesystem when this operation is all about copying the file from one 
> location on HDFS to another location on HDFS (the distributed cache).{color}
> This is a significant performance bottleneck when the resource is a big file 
> and all commands need the same resource.
> After some debugging, the impacted piece of code was found to be:
> {code:java}
> public List<String> add_resources(ResourceType t, Collection<String> values, 
> boolean convertToUnix)
>   throws RuntimeException {
>     Set<String> resourceSet = resourceMaps.getResourceSet(t);
>     Map<String, List<String>> resourcePathMap = 
> resourceMaps.getResourcePathMap(t);
>     Map<String, List<String>> reverseResourcePathMap = 
> resourceMaps.getReverseResourcePathMap(t);
>     List<String> localized = new ArrayList<String>();
>     try {
>       for (String value : values) {
>         String key;
>         {color:#d04437}// get the local path of downloaded jars{color}
>         List<URI> downloadedURLs = resolveAndDownload(t, value, 
> convertToUnix);
>         ...
> {code}
> {code:java}
>   List<URI> resolveAndDownload(ResourceType t, String value, boolean 
> convertToUnix) throws URISyntaxException,
>       IOException {
>     URI uri = createURI(value);
>     if (getURLType(value).equals("file")) {
>       return Arrays.asList(uri);
>     } else if (getURLType(value).equals("ivy")) {
>       return dependencyResolver.downloadDependencies(uri);
>     } else { // goes here for HDFS
>       return Arrays.asList(createURI(downloadResource(value, 
> convertToUnix))); // When the resource is not local, it is downloaded 
> to the local machine.
>     }
>   }
> {code}
> Here, resolveAndDownload() always calls the downloadResource() API for an 
> external filesystem. It should take into account that when the resource is 
> on the same HDFS, bringing it to the local machine is an unnecessary step 
> that can be skipped for better performance.
> Thanks,
> Sailee



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource() api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Summary: Performance bottleneck in the add_resource() api  (was: 
Performance bottleneck in the add_resource api)

> Performance bottleneck in the add_resource() api
> 
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>
> A performance bottleneck is found in adding a resource [which is lying on 
> HDFS] to the distributed cache.
> Commands used are:
> {code:java}
> 1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
> 2. ADD FILE "hdfs://some_dir/file.txt"
> {code}
> Here is the log corresponding to the archive-adding operation:
> {noformat}
>  converting to local hdfs://some_dir/archive.tar
>  Added resources: [hdfs://some_dir/archive.tar
> {noformat}
> Hive is downloading the resource to the local filesystem [shown in the log by 
> "converting to local"].
> {color:#d04437}Ideally there is no need to bring the file to the local 
> filesystem when this operation is all about copying the file from one 
> location on HDFS to another location on HDFS [the distributed cache].{color}
> This becomes a serious performance bottleneck when the resource is a big file 
> and all commands need the same resource.
> After debugging, the impacted piece of code is found to be:
> {code:java}
> public List<String> add_resources(ResourceType t, Collection<String> values,
>     boolean convertToUnix) throws RuntimeException {
>   Set<String> resourceSet = resourceMaps.getResourceSet(t);
>   Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
>   Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
>   List<String> localized = new ArrayList<String>();
>   try {
>     for (String value : values) {
>       String key;
>       {color:#d04437}// get the local path of downloaded jars{color}
>       List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
>       ...
> {code}
> {code:java}
>   List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
>       throws URISyntaxException, IOException {
>     URI uri = createURI(value);
>     if (getURLType(value).equals("file")) {
>       return Arrays.asList(uri);
>     } else if (getURLType(value).equals("ivy")) {
>       return dependencyResolver.downloadDependencies(uri);
>     } else { // goes here for HDFS
>       // When the resource is not local, it will be downloaded to the local machine.
>       return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
>     }
>   }
> {code}
> Here, the function resolveAndDownload() always calls the downloadResource() 
> API for an external filesystem. It should take into account that when the 
> resource is already on the same HDFS, bringing it to the local machine is an 
> unnecessary step that can be skipped for better performance.
> Thanks,
> Sailee



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [which is lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
{color:#d04437}Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].{color}
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{code:java}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{code}

{code:java}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{code}

Here, the function resolveAndDownload() always calls the downloadResource() API for an external filesystem. It should take into account that when the resource is already on the same HDFS, bringing it to the local machine is an unnecessary step that can be skipped for better performance.

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [which is lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
{color:#d04437}Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].{color}
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{code:java}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{code}

{code:java}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{code}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
> 

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [which is lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
{color:#d04437}Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].{color}
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{code:java}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{code}

{code:java}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{code}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
{color:#d04437}Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].{color}
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{code:java}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{code}

{code:java}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{code}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
{color:#d04437}Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].{color}
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{code:java}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{code}

{code:java}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{code}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{code:java}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{code}

{code:java}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{code}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
2. ADD FILE "hdfs://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{code:java}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{code}

{code:java}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{code}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:

{code:java}
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
{code}

Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:

{noformat}
 converting to local hdfs://some_dir/archive.tar
 Added resources: [hdfs://some_dir/archive.tar
{noformat}

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> {color:#d04437}add_resources{color}(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> {color:#d04437}add_resources{color}(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...

bq.   List<URI> {color:#d04437}resolveAndDownload{color}(ResourceType t, String value, boolean convertToUnix)
bq.       throws URISyntaxException, IOException {
bq.     URI uri = createURI(value);
bq.     if (getURLType(value).equals("file")) {
bq.       return Arrays.asList(uri);
bq.     } else if (getURLType(value).equals("ivy")) {
bq.       return dependencyResolver.downloadDependencies(uri);
bq.     } else {{color:#d04437} // goes here for HDFS{color}
bq.       {color:#d04437}return Arrays.asList(createURI(downloadResource(value, convertToUnix)));{color}
bq.     }
bq.   }

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{noformat}
public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...
{noformat}

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...

{noformat}
  List<URI> resolveAndDownload(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else { // goes here for HDFS
      // When the resource is not local, it will be downloaded to the local machine.
      return Arrays.asList(createURI(downloadResource(value, convertToUnix)));
    }
  }
{noformat}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>

[jira] [Updated] (HIVE-16999) Performance bottleneck in the add_resource api

2017-06-29 Thread Sailee Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailee Jain updated HIVE-16999:
---
Description: 
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...

bq.   List<URI> {color:#d04437}resolveAndDownload{color}(ResourceType t, String value, boolean convertToUnix)
bq.       throws URISyntaxException, IOException {
bq.     URI uri = createURI(value);
bq.     if (getURLType(value).equals("file")) {
bq.       return Arrays.asList(uri);
bq.     } else if (getURLType(value).equals("ivy")) {
bq.       return dependencyResolver.downloadDependencies(uri);
bq.     } else {{color:#d04437} // goes here for HDFS{color}
bq.       {color:#d04437}return Arrays.asList(createURI(downloadResource(value, convertToUnix)));{color}
bq.     }
bq.   }

Thanks,
Sailee

  was:
A performance bottleneck is found in adding a resource [lying on HDFS] to the distributed cache.
Commands used are:
{{1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"}}
Here is the log corresponding to the archive-adding operation:
=> converting to local hdfs://some_dir/archive.tar
=> Added resources: [hdfs://some_dir/archive.tar]

Hive is downloading the resource to the local filesystem [shown in the log by "converting to local"].
Ideally there is no need to bring the file to the local filesystem when this operation is all about copying the file from one location on HDFS to another location on HDFS [the distributed cache].
This becomes a serious performance bottleneck when the resource is a big file and all commands need the same resource.
After debugging, the impacted piece of code is found to be:

{{public List<String> add_resources(ResourceType t, Collection<String> values,
    boolean convertToUnix) throws RuntimeException {
  Set<String> resourceSet = resourceMaps.getResourceSet(t);
  Map<String, Set<String>> resourcePathMap = resourceMaps.getResourcePathMap(t);
  Map<String, Set<String>> reverseResourcePathMap = resourceMaps.getReverseResourcePathMap(t);
  List<String> localized = new ArrayList<String>();
  try {
    for (String value : values) {
      String key;
      {color:#d04437}// get the local path of downloaded jars{color}
      List<URI> downloadedURLs = resolveAndDownload(t, value, convertToUnix);
      ...}}
{{  List<URI> {color:#d04437}resolveAndDownload{color}(ResourceType t, String value, boolean convertToUnix)
      throws URISyntaxException, IOException {
    URI uri = createURI(value);
    if (getURLType(value).equals("file")) {
      return Arrays.asList(uri);
    } else if (getURLType(value).equals("ivy")) {
      return dependencyResolver.downloadDependencies(uri);
    } else {{color:#d04437} // goes here for HDFS{color}
      {color:#d04437}return Arrays.asList(createURI(downloadResource(value, convertToUnix)));{color}
    }
  }}}

Thanks,
Sailee


> Performance bottleneck in the add_resource api
> --
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Priority: Critical
>
> Performance bottleneck is found in adding resource[lying on hdfs] to the 
> distributed cache. 
> Commands used are :-
> 1. ADD ARCHIVE "{color:#d04437}hdfs{color}://some_dir/archive.tar"
> 2. ADD FILE "{color:#d04437}hdfs{color}://some_dir/file.txt"
> Here is the log corresponding to the archive adding operation:

[jira] [Assigned] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-06-29 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16998:
---


> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.
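> A minimal sketch of the proposed gate follows; the second property name and 
> the map-join flag are illustrative assumptions, not the final key or patch:
> {code}
> import org.apache.hadoop.conf.Configuration;
>
> // Sketch: leave the operator tree unsplit unless DPP targets a map-join.
> public class DppGate {
>   static boolean shouldApplyDpp(Configuration conf, boolean targetIsMapJoin) {
>     boolean dpp = conf.getBoolean("hive.spark.dynamic.partition.pruning", false);
>     boolean mapJoinOnly =
>         conf.getBoolean("hive.spark.dynamic.partition.pruning.map.join.only", false);
>     return dpp && (!mapJoinOnly || targetIsMapJoin);
>   }
> }
> {code}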



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069276#comment-16069276
 ] 

Hive QA commented on HIVE-16973:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875119/HIVE-16973.001.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5838/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5838/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5838/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-30 00:46:41.249
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5838/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 00:46:41.252
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 00:46:41.798
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloConnectionParameters.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/HiveAccumuloHelper.java:
 No such file or directory
error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java: 
No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/CompositeAccumuloRowIdFactory.java:
 No such file or directory
error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/DefaultAccumuloRowIdFactory.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloStorageHandler.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestHiveAccumuloHelper.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableInputFormat.java:
 No such file or directory
error: 
a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableOutputFormat.java:
 No such file or directory
error: a/pom.xml: No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875119 - PreCommit-HIVE-Build

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>

[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069271#comment-16069271
 ] 

Hive QA commented on HIVE-15665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875115/HIVE-15665.05.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5837/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5837/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5837/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-30 00:35:35.832
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5837/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 00:35:35.834
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 3fa4834 HIVE-16761 : LLAP IO: SMB joins fail elevator  (Sergey 
Shelukhin, reviewed by Jason Dere, Deepak Jaiswal)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-30 00:35:42.465
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/EvictionDispatcher.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCacheImpl.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileMetadata.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcMetadataCache.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java
patching file 
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestIncrementalObjectSizeEstimator.java
patching file 
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestOrcMetadataCache.java
patching file ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java
patching file ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/Reader.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/ReaderImpl.java
patching file ql/src/test/results/clientpositive/llap/orc_llap_counters.q.out
patching file ql/src/test/results/clientpositive/llap/orc_llap_counters1.q.out
patching file ql/src/test/results/clientpositive/llap/orc_ppd_basic.q.out
patching file 
ql/src/test/results/clientpositive/llap/orc_ppd_schema_evol_3a.q.out
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/pars

[jira] [Commented] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069256#comment-16069256
 ] 

Hive QA commented on HIVE-16993:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875114/HIVE-16993.0-master.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5836/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5836/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5836/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875114 - PreCommit-HIVE-Build

> ThriftHiveMetastore.create_database can fail if the locationUri is not set
> --
>
> Key: HIVE-16993
> URL: https://issues.apache.org/jira/browse/HIVE-16993
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-16993.0-master.patch, HIVE-16993.1-master.patch, 
> HIVE-16993.2.patch, HIVE-16993.3.patch
>
>
> Calling 
> [{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
>  with a database with an unset {{locationUri}} field through the C++ 
> implementation fails with:
> {code}
> MetaException(message=java.lang.IllegalArgumentException: Can not create a 
> Path from an empty string)
> {code}
> The 
> [{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
>  Thrift field is 'default requiredness (implicit)', and Thrift [does not 
> specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
> whether unset default requiredness fields are encoded.  Empirically, the Java 
> generated code [does not write the 
> {{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
>  when the field is unset, while the C++ generated code 
> [does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].
> The MetaStore treats the field as optional, and [fills in a default 
> value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
>  if the field is unset.
> The end result is that when the C++ implementation sends a {{Database}} 
> without the field set, it actually writes an empty string, and the MetaStore 
> treats it as a set field (non-null), and then calls a {{Path}} API which 
> rejects the empty string.  The fix is simple: make the {{locationUri}} field 
> optional in metastore.thrift.
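> For illustration only, a sketch of the kind of server-side guard that would 
> also mask the problem (the fix this issue actually proposes is the one-line 
> thrift change above; the helper below is hypothetical):
> {code}
> import org.apache.hadoop.hive.metastore.Warehouse;
> import org.apache.hadoop.hive.metastore.api.Database;
> import org.apache.hadoop.hive.metastore.api.MetaException;
>
> // Treat an empty locationUri the same as an unset one before any Path is
> // built from it.
> public class DbLocationDefaulter {
>   static void applyDefaultLocation(Database db, Warehouse wh) throws MetaException {
>     String loc = db.getLocationUri();
>     if (loc == null || loc.isEmpty()) {
>       db.setLocationUri(wh.getDefaultDatabasePath(db.getName()).toString());
>     }
>   }
> }
> {code}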



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16997) Extend object store to store bit vectors

2017-06-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16997:
--


> Extend object store to store bit vectors
> 
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-16993:
---
Attachment: HIVE-16993.3.patch

R3: fix a small typo in the test fixup.

> ThriftHiveMetastore.create_database can fail if the locationUri is not set
> --
>
> Key: HIVE-16993
> URL: https://issues.apache.org/jira/browse/HIVE-16993
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-16993.0-master.patch, HIVE-16993.1-master.patch, 
> HIVE-16993.2.patch, HIVE-16993.3.patch
>
>
> Calling 
> [{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
>  with a database with an unset {{locationUri}} field through the C++ 
> implementation fails with:
> {code}
> MetaException(message=java.lang.IllegalArgumentException: Can not create a 
> Path from an empty string)
> {code}
> The 
> [{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
>  Thrift field is 'default requiredness (implicit)', and Thrift [does not 
> specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
> whether unset default requiredness fields are encoded.  Empirically, the Java 
> generated code [does not write the 
> {{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
>  when the field is unset, while the C++ generated code 
> [does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].
> The MetaStore treats the field as optional, and [fills in a default 
> value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
>  if the field is unset.
> The end result is that when the C++ implementation sends a {{Database}} 
> without the field set, it actually writes an empty string, and the MetaStore 
> treats it as a set field (non-null), and then calls a {{Path}} API which 
> rejects the empty string.  The fix is simple: make the {{locationUri}} field 
> optional in metastore.thrift.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16995) Merge NDV across partitions using bit vectors

2017-06-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16995:
--


> Merge NDV across partitions using bit vectors
> -
>
> Key: HIVE-16995
> URL: https://issues.apache.org/jira/browse/HIVE-16995
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-06-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16996:
--


> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14988) Support INSERT OVERWRITE into a partition on transactional tables

2017-06-29 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069233#comment-16069233
 ] 

Wei Zheng commented on HIVE-14988:
--

The 4 failures with age==1 are not relevant:
 org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23]
 org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14]
 
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]

[~ekoifman] Can you take another look please?

> Support INSERT OVERWRITE into a partition on transactional tables
> -
>
> Key: HIVE-14988
> URL: https://issues.apache.org/jira/browse/HIVE-14988
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-14988.01.patch, HIVE-14988.02.patch, 
> HIVE-14988.03.patch, HIVE-14988.04.patch, HIVE-14988.05.patch, 
> HIVE-14988.06.patch
>
>
> Insert overwrite operation on transactional table will currently raise an 
> error.
> This can/should be supported



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-16993:
---
Attachment: HIVE-16993.2.patch

Same as patch rev. 1, but with a fixed name.

> ThriftHiveMetastore.create_database can fail if the locationUri is not set
> --
>
> Key: HIVE-16993
> URL: https://issues.apache.org/jira/browse/HIVE-16993
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-16993.0-master.patch, HIVE-16993.1-master.patch, 
> HIVE-16993.2.patch
>
>
> Calling 
> [{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
>  with a database with an unset {{locationUri}} field through the C++ 
> implementation fails with:
> {code}
> MetaException(message=java.lang.IllegalArgumentException: Can not create a 
> Path from an empty string)
> {code}
> The 
> [{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
>  Thrift field is 'default requiredness (implicit)', and Thrift [does not 
> specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
> whether unset default requiredness fields are encoded.  Empirically, the Java 
> generated code [does not write the 
> {{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
>  when the field is unset, while the C++ generated code 
> [does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].
> The MetaStore treats the field as optional, and [fills in a default 
> value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
>  if the field is unset.
> The end result is that when the C++ implementation sends a {{Database}} 
> without the field set, it actually writes an empty string, and the MetaStore 
> treats it as a set field (non-null), and then calls a {{Path}} API which 
> rejects the empty string.  The fix is simple: make the {{locationUri}} field 
> optional in metastore.thrift.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Dan Burkert (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069221#comment-16069221
 ] 

Dan Burkert commented on HIVE-16993:


Added patch revision 1 with refreshed generated thrift sources, and a few 
updates to the tests to make them compatible with the newly generated ctor 
signature for {{Database}}.

> ThriftHiveMetastore.create_database can fail if the locationUri is not set
> --
>
> Key: HIVE-16993
> URL: https://issues.apache.org/jira/browse/HIVE-16993
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-16993.0-master.patch, HIVE-16993.1-master.patch
>
>
> Calling 
> [{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
>  with a database with an unset {{locationUri}} field through the C++ 
> implementation fails with:
> {code}
> MetaException(message=java.lang.IllegalArgumentException: Can not create a 
> Path from an empty string)
> {code}
> The 
> [{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
>  Thrift field is 'default requiredness (implicit)', and Thrift [does not 
> specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
> whether unset default requiredness fields are encoded.  Empirically, the Java 
> generated code [does not write the 
> {{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
>  when the field is unset, while the C++ generated code 
> [does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].
> The MetaStore treats the field as optional, and [fills in a default 
> value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
>  if the field is unset.
> The end result is that when the C++ implementation sends a {{Database}} 
> without the field set, it actually writes an empty string, and the MetaStore 
> treats it as a set field (non-null), and then calls a {{Path}} API which 
> rejects the empty string.  The fix is simple: make the {{locationUri}} field 
> optional in metastore.thrift.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-16993:
---
Attachment: HIVE-16993.1-master.patch

> ThriftHiveMetastore.create_database can fail if the locationUri is not set
> --
>
> Key: HIVE-16993
> URL: https://issues.apache.org/jira/browse/HIVE-16993
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-16993.0-master.patch, HIVE-16993.1-master.patch
>
>
> Calling 
> [{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
>  with a database with an unset {{locationUri}} field through the C++ 
> implementation fails with:
> {code}
> MetaException(message=java.lang.IllegalArgumentException: Can not create a 
> Path from an empty string)
> {code}
> The 
> [{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
>  Thrift field is 'default requiredness (implicit)', and Thrift [does not 
> specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
> whether unset default requiredness fields are encoded.  Empirically, the Java 
> generated code [does not write the 
> {{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
>  when the field is unset, while the C++ generated code 
> [does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].
> The MetaStore treats the field as optional, and [fills in a default 
> value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
>  if the field is unset.
> The end result is that when the C++ implementation sends a {{Database}} 
> without the field set, it actually writes an empty string, and the MetaStore 
> treats it as a set field (non-null), and then calls a {{Path}} API which 
> rejects the empty string.  The fix is simple: make the {{locationUri}} field 
> optional in metastore.thrift.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6990:
---
Attachment: (was: HIVE-6990.06.patch)

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.1.patch, 
> HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
>  <property>
>    <name>javax.jdo.mapping.Schema</name>
>    <value>HIVE</value>
>  </property>
>  <property>
>    <name>javax.jdo.option.ConnectionUserName</name>
>    <value>user1</value>
>  </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6990:
---
Attachment: HIVE-6990.06.patch

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.1.patch, 
> HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
>  <property>
>    <name>javax.jdo.mapping.Schema</name>
>    <value>HIVE</value>
>  </property>
>  <property>
>    <name>javax.jdo.option.ConnectionUserName</name>
>    <value>user1</value>
>  </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6990:
---
Attachment: (was: HIVE-6990.06.patch)

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.1.patch, 
> HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
>  <property>
>    <name>javax.jdo.mapping.Schema</name>
>    <value>HIVE</value>
>  </property>
>  <property>
>    <name>javax.jdo.option.ConnectionUserName</name>
>    <value>user1</value>
>  </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069180#comment-16069180
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875110/HIVE-16832.16.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 10833 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=74)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=100)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning 
(batchId=290)
org.apache.hadoop.hive.ql.io.orc.TestVectorizedOrcAcidRowBatchReader.testVectorizedOrcAcidRowBatchReader
 (batchId=261)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5835/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5835/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5835/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875110 - PreCommit-HIVE-Build

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6990:
---
Attachment: HIVE-6990.06.patch

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.06.patch, 
> HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, 
> HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
>  <property>
>    <name>javax.jdo.mapping.Schema</name>
>    <value>HIVE</value>
>  </property>
>  <property>
>    <name>javax.jdo.option.ConnectionUserName</name>
>    <value>user1</value>
>  </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16984) HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver died

2017-06-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069173#comment-16069173
 ] 

Xuefu Zhang commented on HIVE-16984:


The patch looks good. However, I think we'd better keep the previous logic 
intact (before HIVE-15171). For this, we can change the current API, 
SparkJobStatus.getAppId(), to a pure getter and add 
SparkJobStatus.fetchAppId(), which invokes the remote RPC call and caches the 
app id in the SparkJobStatus class. This way, getAppId() can be called at any 
time w/o any network trips. Existing calls to getAppId() are changed to 
fetchAppId().
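
A minimal, self-contained sketch of that split (the RPC is abstracted behind a 
{{Callable}}; the class name and wiring are illustrative only):

{code}
import java.util.concurrent.Callable;

// getAppId() becomes a pure getter over a cached value; fetchAppId() is the
// only method that goes over the network.
public class CachedAppId {
  private final Callable<String> remoteFetch; // stands in for the RPC call
  private volatile String appId;              // cached application id

  public CachedAppId(Callable<String> remoteFetch) {
    this.remoteFetch = remoteFetch;
  }

  /** Pure getter: returns the cached id (possibly null), no network trip. */
  public String getAppId() {
    return appId;
  }

  /** Invokes the remote call and caches the result for later getAppId() calls. */
  public String fetchAppId() throws Exception {
    appId = remoteFetch.call();
    return appId;
  }
}
{code}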

> HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver 
> died
> ---
>
> Key: HIVE-16984
> URL: https://issues.apache.org/jira/browse/HIVE-16984
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-16984.1.patch
>
>
> In HoS, after a RemoteDriver is launched, it may fail to initialize a Spark 
> context and thus the ApplicationMaster will die eventually. In this case, 
> there are two issues related to RemoteSparkJobStatus::getAppID():
> 1. Currently we call {{getAppID()}} before starting the monitoring job. The 
> former waits for {{hive.spark.client.future.timeout}} and the latter waits 
> for {{hive.spark.job.monitor.timeout}}. The error message for the latter 
> treats {{hive.spark.job.monitor.timeout}} as the time waiting for the job 
> submission. However, this is inaccurate as it doesn't include 
> {{hive.spark.client.future.timeout}}.
> 2. In case the RemoteDriver suddenly dies, we may still wait hopelessly for 
> the timeouts. This could be avoided if we know that the channel between the 
> client and the remote driver has closed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16935) Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-29 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-16935:
--
Attachment: HIVE-16935.2.patch

> Hive should strip comments from input before choosing which CommandProcessor 
> to run.
> 
>
> Key: HIVE-16935
> URL: https://issues.apache.org/jira/browse/HIVE-16935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16935.1.patch, HIVE-16935.2.patch
>
>
> While using Beeswax, Hue fails to execute a statement with the following error:
> Error while compiling statement: FAILED: ParseException line 3:4 missing 
> KW_ROLE at 'a' near 'a' line 3:5 missing EOF at '=' near 'a'
> {quote}
> -- comment
> SET a=1;
> SELECT 1;
> {quote}
> The same code works in Beeline and in Impala.
> The same code fails in CliDriver 
>  
> h2. Background
> Hive deals with sql comments (“-- to end of line”) in different places.
> Some clients attempt to strip comments. For example, BeeLine was recently 
> enhanced in https://issues.apache.org/jira/browse/HIVE-13864 to strip 
> comments from multi-line commands before they are executed.
> Other clients such as Hue or Jdbc do not strip comments before sending text.
> Some tests such as TestCliDriver strip comments before running tests.
> When Hive gets a command the CommandProcessorFactory looks at the text to 
> determine which CommandProcessor should handle the command. In the bug case 
> the correct CommandProcessor is SetProcessor, but the comments confuse the 
> CommandProcessorFactory and so the command is treated as sql. Hive’s sql 
> parser understands and ignores comments, but it does not understand the set 
> commands usually handled by SetProcessor and so we get the ParseException 
> shown above.
>  
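> For illustration, a minimal sketch of the kind of stripping described above 
> (not the attached patch; it deliberately ignores "--" inside string 
> literals, which a real implementation must handle):
> {code}
> import java.util.regex.Pattern;
>
> // Remove "-- to end of line" comments before the command is classified.
> public class CommentStripper {
>   private static final Pattern LINE_COMMENT = Pattern.compile("(?m)--.*$");
>
>   static String strip(String command) {
>     return LINE_COMMENT.matcher(command).replaceAll("").trim();
>   }
>
>   public static void main(String[] args) {
>     String cmd = "-- comment\nSET a=1;\nSELECT 1;";
>     // After stripping, the text starts with "SET", so a factory inspecting
>     // it would route it to SetProcessor instead of the sql parser.
>     System.out.println(strip(cmd));
>   }
> }
> {code}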



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16981) hive.optimize.bucketingsorting should compare the schema before removing RS

2017-06-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16981:
---
Status: Patch Available  (was: Open)

> hive.optimize.bucketingsorting should compare the schema before removing RS
> ---
>
> Key: HIVE-16981
> URL: https://issues.apache.org/jira/browse/HIVE-16981
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16981.01.patch, HIVE-16981.02.patch
>
>
> on master, smb_mapjoin_20.q, run
> {code}
> select * from test_table3;
> {code}
> you will get
> {code}
> val_0  0   NULL1
> ...
> {code}
> The correct result is
> {code}
> val_0  0   val_01
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16981) hive.optimize.bucketingsorting should compare the schema before removing RS

2017-06-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16981:
---
Status: Open  (was: Patch Available)

> hive.optimize.bucketingsorting should compare the schema before removing RS
> ---
>
> Key: HIVE-16981
> URL: https://issues.apache.org/jira/browse/HIVE-16981
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16981.01.patch, HIVE-16981.02.patch
>
>
> on master, smb_mapjoin_20.q, run
> {code}
> select * from test_table3;
> {code}
> you will get
> {code}
> val_0  0   NULL1
> ...
> {code}
> The correct result is
> {code}
> val_0  0   val_01
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16981) hive.optimize.bucketingsorting should compare the schema before removing RS

2017-06-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069169#comment-16069169
 ] 

Pengcheng Xiong commented on HIVE-16981:


The only failing test, tez_smb_main, is resolved by HIVE-16761.

> hive.optimize.bucketingsorting should compare the schema before removing RS
> ---
>
> Key: HIVE-16981
> URL: https://issues.apache.org/jira/browse/HIVE-16981
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16981.01.patch, HIVE-16981.02.patch
>
>
> on master, smb_mapjoin_20.q, run
> {code}
> select * from test_table3;
> {code}
> you will get
> {code}
> val_0  0   NULL1
> ...
> {code}
> The correct result is
> {code}
> val_0  0   val_01
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16981) hive.optimize.bucketingsorting should compare the schema before removing RS

2017-06-29 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16981:
---
Attachment: HIVE-16981.02.patch

> hive.optimize.bucketingsorting should compare the schema before removing RS
> ---
>
> Key: HIVE-16981
> URL: https://issues.apache.org/jira/browse/HIVE-16981
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16981.01.patch, HIVE-16981.02.patch
>
>
> on master, smb_mapjoin_20.q, run
> {code}
> select * from test_table3;
> {code}
> you will get
> {code}
> val_0  0  NULL  1
> ...
> {code}
> The correct result is
> {code}
> val_0  0  val_0  1
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069167#comment-16069167
 ] 

Pengcheng Xiong commented on HIVE-16761:


This helps resolve one failing test in HIVE-16981.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.03.patch, HIVE-16761.04.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6990:
---
Attachment: HIVE-6990.06.patch

Rebased, removed tabs, and changed the code to use pre-set strings; also updated 
the queries newly added since the last patch, plus a few places that I think were 
missed before. [~ashutoshc] can you please take a look?

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.06.patch, HIVE-6990.1.patch, 
> HIVE-6990.2.patch, HIVE-6990.3.patch, HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
> <property>
>   <name>javax.jdo.mapping.Schema</name>
>   <value>HIVE</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>user1</value>
> </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-29 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069167#comment-16069167
 ] 

Pengcheng Xiong edited comment on HIVE-16761 at 6/29/17 11:11 PM:
--

This helps resolve one failing test in HIVE-16981. Just in time!


was (Author: pxiong):
This helps resolve one failing test in HIVE-16981.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.03.patch, HIVE-16761.04.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-6990) Direct SQL fails when the explicit schema setting is different from the default one

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-6990:
--

Assignee: Sergey Shelukhin  (was: Bing Li)

> Direct SQL fails when the explicit schema setting is different from the 
> default one
> ---
>
> Key: HIVE-6990
> URL: https://issues.apache.org/jira/browse/HIVE-6990
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0, 0.14.0, 1.2.1
> Environment: hive + derby
>Reporter: Bing Li
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6990.1.patch, HIVE-6990.2.patch, HIVE-6990.3.patch, 
> HIVE-6990.4.patch, HIVE-6990.5.patch
>
>
> I got the following ERROR in hive.log
> 2014-04-23 17:30:23,331 ERROR metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(1756)) - Direct SQL failed, falling 
> back to ORM
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = 
> TBLS.TBL_ID   inner join DBS on TBLS.DB_ID = DBS.DB_ID inner join 
> PARTITION_KEY_VALS as FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and 
> FILTER0.INTEGER_IDX = 0 where TBLS.TBL_NAME = ? and DBS.NAME = ? and 
> ((FILTER0.PART_KEY_VAL = ?))".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:181)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:1833)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1806)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
> at com.sun.proxy.$Proxy11.getPartitionsByFilter(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:3310)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> at java.lang.reflect.Method.invoke(Method.java:619)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> at com.sun.proxy.$Proxy12.get_partitions_by_filter(Unknown Source)
> Reproduce steps:
> 1. set the following properties in hive-site.xml
> <property>
>   <name>javax.jdo.mapping.Schema</name>
>   <value>HIVE</value>
> </property>
> <property>
>   <name>javax.jdo.option.ConnectionUserName</name>
>   <value>user1</value>
> </property>
> 2. execute hive queries
> hive> create table mytbl ( key int, value string);
> hive> load data local inpath 'examples/files/kv1.txt' overwrite into table 
> mytbl;
> hive> select * from mytbl;
> hive> create view myview partitioned on (value) as select key, value from 
> mytbl where key=98;
> hive> alter view myview add partition (value='val_98') partition 
> (value='val_xyz');
> hive> alter view myview drop partition (value='val_xyz');



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16914) Change HiveMetaStoreClient to AutoCloseable

2017-06-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069154#comment-16069154
 ] 

Yongzhi Chen commented on HIVE-16914:
-

The change looks good.  +1

> Change HiveMetaStoreClient to AutoCloseable
> ---
>
> Key: HIVE-16914
> URL: https://issues.apache.org/jira/browse/HIVE-16914
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Xiaomeng Zhang
>Assignee: Xiaomeng Zhang
>Priority: Trivial
> Attachments: HIVE-16914.01.patch
>
>
> While using HiveMetaStoreClient for testing, I found that this class 
> is not AutoCloseable. This change adds support for 
> try (HiveMetaStoreClient hmsClient = HiveEnv.createHMSClient()) {
> ...
> }
> so we don't have to close it manually in a finally block.
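For illustration, a minimal, self-contained sketch of the intended try-with-resources usage once the class implements AutoCloseable; it uses the public HiveMetaStoreClient(HiveConf) constructor rather than the reporter's HiveEnv helper:

{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class HmsAutoCloseExample {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    // With this patch, close() is invoked automatically at the end of the block:
    try (HiveMetaStoreClient client = new HiveMetaStoreClient(conf)) {
      System.out.println(client.getAllDatabases());
    }
  }
}
{code}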



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069111#comment-16069111
 ] 

Hive QA commented on HIVE-16960:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875108/HIVE16960.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10819 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=239)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_windowing2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=146)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=102)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5834/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5834/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5834/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875108 - PreCommit-HIVE-Build

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
>Priority: Critical
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HIVE16960.1.patch, HIVE16960.2.patch
>
>
> When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user 
> other than the HDFS file owner and the HDFS sticky bit is set, Hive throws an 
> error saying that the file cannot be moved due to permission issues.
> Caused by: org.apache.hadoop.security.AccessControlException: Permission 
> denied by sticky bit setting: user=hive, 
> inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense 
> to users, and the stack trace displayed is huge. We should display a better 
> error message and maybe provide help information about how to fix it.
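As an illustration only, a hedged sketch of the kind of message wrapping the fix could do; the method and its parameters (fs, srcPath, destPath, user) are assumed names, not the actual patch:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.security.AccessControlException;

// Sketch: surface a readable message instead of the raw sticky-bit stack trace.
static void moveWithFriendlyError(FileSystem fs, Path srcPath, Path destPath,
    String user) throws HiveException, IOException {
  try {
    if (!fs.rename(srcPath, destPath)) {
      throw new HiveException("Failed to move " + srcPath + " to " + destPath);
    }
  } catch (AccessControlException e) {
    throw new HiveException("Unable to move " + srcPath + " to " + destPath
        + ": the HDFS sticky bit on the target directory prevents user '" + user
        + "' from replacing files owned by another user. Re-run as the file "
        + "owner, or ask an HDFS admin to clear the sticky bit.", e);
  }
}
{code}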



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-06-29 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-16973:
--
Status: Patch Available  (was: Open)

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code paths changed since I first did my testing 
> (or I just did poor testing), and the delegation token was never being 
> fetched/serialized. There are also some issues with properly fetching the 
> delegation token from Accumulo, which were addressed in ACCUMULO-4665.
> I believe it would also be best to update the dependency to Accumulo 1.7 
> (dropping 1.6 support), as 1.6 is lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2

2017-06-29 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-16973:
--
Attachment: HIVE-16973.001.patch

.001: first stab, with successful local test runs covering:

* Kerberos
* Local or HiveServer2 execution
* Execution engines: local (simple table scan), MapReduce, and Tez.

Need to clean this patch up some more and add some additional tests, but I 
wanted to get it off my machine.

> Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in 
> HS2
> 
>
> Key: HIVE-16973
> URL: https://issues.apache.org/jira/browse/HIVE-16973
> Project: Hive
>  Issue Type: Bug
>  Components: Accumulo Storage Handler
>Reporter: Josh Elser
>Assignee: Josh Elser
> Attachments: HIVE-16973.001.patch
>
>
> Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. 
> Looking into it, it seems like the bit-rot got pretty bad. You'll see 
> something like the following:
> {noformat}
> Caused by: java.io.IOException: Failed to unwrap AuthenticationToken 
> at 
> org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312)
>  
> at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122)
>  
> {noformat}
> It appears that some of the code paths changed since I first did my testing 
> (or I just did poor testing), and the delegation token was never being 
> fetched/serialized. There are also some issues with properly fetching the 
> delegation token from Accumulo, which were addressed in ACCUMULO-4665.
> I believe it would also be best to update the dependency to Accumulo 1.7 
> (dropping 1.6 support), as 1.6 is lacking in this regard. These changes would 
> otherwise get much more complicated with reflection -- Accumulo has moved on 
> past 1.6, so let's do the same in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set

2017-06-29 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069019#comment-16069019
 ] 

Janaki Lahorani commented on HIVE-16960:


Review Board created: https://reviews.apache.org/r/60549/

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
>Priority: Critical
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HIVE16960.1.patch, HIVE16960.2.patch
>
>
> When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user 
> other than the HDFS file owner and the HDFS sticky bit is set, Hive throws an 
> error saying that the file cannot be moved due to permission issues.
> Caused by: org.apache.hadoop.security.AccessControlException: Permission 
> denied by sticky bit setting: user=hive, 
> inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense 
> to users, and the stack trace displayed is huge. We should display a better 
> error message and maybe provide help information about how to fix it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069008#comment-16069008
 ] 

Hive QA commented on HIVE-16759:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875102/HIVE16759.1.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10832 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=141)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=225)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5833/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5833/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5833/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875102 - PreCommit-HIVE-Build

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type in 
> all notifications that involve table events, such as create, drop, and alter 
> table.
> This would help consumers distinguish views from tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.05.patch

Updated the patch

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally holds file stats, stripe stats, etc., which are 
> allocated on the heap. On large data sets, this can impact heap usage and the 
> memory used by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-16993:
---
Attachment: HIVE-16993.0-master.patch

> ThriftHiveMetastore.create_database can fail if the locationUri is not set
> --
>
> Key: HIVE-16993
> URL: https://issues.apache.org/jira/browse/HIVE-16993
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-16993.0-master.patch
>
>
> Calling 
> [{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
>  with a database with an unset {{locationUri}} field through the C++ 
> implementation fails with:
> {code}
> MetaException(message=java.lang.IllegalArgumentException: Can not create a 
> Path from an empty string)
> {code}
> The 
> [{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
>  Thrift field is 'default requiredness (implicit)', and Thrift [does not 
> specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
> whether unset default requiredness fields are encoded.  Empirically, the Java 
> generated code [does not write the 
> {{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
>  when the field is unset, while the C++ generated code 
> [does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].
> The MetaStore treats the field as optional, and [fills in a default 
> value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
>  if the field is unset.
> The end result is that when the C++ implementation sends a {{Database}} 
> without the field set, it actually writes an empty string, and the MetaStore 
> treats it as a set field (non-null), and then calls a {{Path}} API which 
> rejects the empty string.  The fix is simple: make the {{locationUri}} field 
> optional in metastore.thrift.
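For illustration, a hedged sketch of a complementary server-side guard that would also tolerate the C++ client's empty string; isSetLocationUri follows the Thrift-generated bean convention and getDefaultDatabasePath is the Warehouse helper, but treat the exact signatures as assumptions:

{code}
// Sketch only: treat an unset OR empty locationUri as "use the default path"
// inside create_database handling.
if (!db.isSetLocationUri() || db.getLocationUri().isEmpty()) {
  db.setLocationUri(wh.getDefaultDatabasePath(db.getName()).toString());
}
{code}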



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16993) ThriftHiveMetastore.create_database can fail if the locationUri is not set

2017-06-29 Thread Dan Burkert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Burkert updated HIVE-16993:
---
Assignee: Dan Burkert
  Status: Patch Available  (was: Open)

> ThriftHiveMetastore.create_database can fail if the locationUri is not set
> --
>
> Key: HIVE-16993
> URL: https://issues.apache.org/jira/browse/HIVE-16993
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-16993.0-master.patch
>
>
> Calling 
> [{{ThriftHiveMetastore.create_database}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L1078]
>  with a database with an unset {{locationUri}} field through the C++ 
> implementation fails with:
> {code}
> MetaException(message=java.lang.IllegalArgumentException: Can not create a 
> Path from an empty string)
> {code}
> The 
> [{{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/if/hive_metastore.thrift#L270]
>  Thrift field is 'default requiredness (implicit)', and Thrift [does not 
> specify|https://thrift.apache.org/docs/idl#default-requiredness-implicit] 
> whether unset default requiredness fields are encoded.  Empirically, the Java 
> generated code [does not write the 
> {{locationUri}}|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java#L938-L942]
>  when the field is unset, while the C++ generated code 
> [does|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp#L3888-L3890].
> The MetaStore treats the field as optional, and [fills in a default 
> value|https://github.com/apache/hive/blob/3fa48346d509813977cd3c7622d581c0ccd51e99/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L867-L871]
>  if the field is unset.
> The end result is that when the C++ implementation sends a {{Database}} 
> without the field set, it actually writes an empty string, and the MetaStore 
> treats it as a set field (non-null), and then calls a {{Path}} API which 
> rejects the empty string.  The fix is simple: make the {{locationUri}} field 
> optional in metastore.thrift.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-06-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.16.patch

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set

2017-06-29 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16960:
---
Attachment: HIVE16960.2.patch

Changed dyn_part_max.q.out to match the output.

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
>Priority: Critical
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HIVE16960.1.patch, HIVE16960.2.patch
>
>
> When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user 
> other than the HDFS file owner and the HDFS sticky bit is set, Hive throws an 
> error saying that the file cannot be moved due to permission issues.
> Caused by: org.apache.hadoop.security.AccessControlException: Permission 
> denied by sticky bit setting: user=hive, 
> inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense 
> to users, and the stack trace displayed is huge. We should display a better 
> error message and maybe provide help information about how to fix it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set

2017-06-29 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068958#comment-16068958
 ] 

Janaki Lahorani commented on HIVE-16960:


org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[dyn_part_max] 
(batchId=88) looks to be a fix to the .out file. The other test errors look 
unrelated to this patch.

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
>Priority: Critical
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HIVE16960.1.patch
>
>
> When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user 
> other than the HDFS file owner and the HDFS sticky bit is set, Hive throws an 
> error saying that the file cannot be moved due to permission issues.
> Caused by: org.apache.hadoop.security.AccessControlException: Permission 
> denied by sticky bit setting: user=hive, 
> inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense 
> to users, and the stack trace displayed is huge. We should display a better 
> error message and maybe provide help information about how to fix it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications

2017-06-29 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16759:
---
Attachment: HIVE16759.1.patch

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type in 
> all notifications that involve table events, such as create, drop, and alter 
> table.
> This would help consumers distinguish views from tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications

2017-06-29 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16759:
---
Status: Patch Available  (was: Open)

Notification events (MNNotificationLog, NotificationEvent, 
HCatNotificationEvent) will record the table type along with the table name for 
these events: Create Table, Drop Table, Alter Table, Add Partition, Drop 
Partition, Alter Partition, and Insert.


> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
>
> The DB notifications used by HiveMetaStore should include the table type in 
> all notifications that involve table events, such as create, drop, and alter 
> table.
> This would help consumers distinguish views from tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16992) LLAP: better default lambda for LRFU policy

2017-06-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068902#comment-16068902
 ] 

Sergey Shelukhin commented on HIVE-16992:
-

I was going to do this in late July, but thanks!

> LLAP: better default lambda for LRFU policy
> ---
>
> Key: HIVE-16992
> URL: https://issues.apache.org/jira/browse/HIVE-16992
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gopal V
>
> LRFU is currently skewed heavily towards LRU; 10Ks or 100Ks of buffers are 
> tracked during a typical workload, but the policy's heap only holds around 700 
> of them. We should see if making it closer to LFU (by tweaking the lambda) 
> will improve the hit rate when small queries are infrequently interleaved with 
> large scans, and whether it will have negative effects due to perf overhead.
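For context, a hedged sketch of the standard LRFU formulation (not necessarily Hive's exact code): each cached buffer b carries a combined recency/frequency value over its access times t_i,

{noformat}
CRF_t(b) = \sum_i (1/2)^{\lambda (t - t_i)}
{noformat}

With \lambda = 0 every access counts equally (pure LFU); as \lambda grows, recent accesses dominate (approaching LRU). So moving the default closer to LFU means lowering \lambda.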



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16992) LLAP: better default lambda for LRFU policy

2017-06-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068893#comment-16068893
 ] 

Gopal V commented on HIVE-16992:


I'll run a matrix of tests and see if we can have a better default. 

> LLAP: better default lambda for LRFU policy
> ---
>
> Key: HIVE-16992
> URL: https://issues.apache.org/jira/browse/HIVE-16992
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gopal V
>
> LRFU is currently skewed heavily towards LRU; 10Ks or 100Ks of buffers are 
> tracked during a typical workload, but the policy's heap only holds around 700 
> of them. We should see if making it closer to LFU (by tweaking the lambda) 
> will improve the hit rate when small queries are infrequently interleaved with 
> large scans, and whether it will have negative effects due to perf overhead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068894#comment-16068894
 ] 

Jason Dere commented on HIVE-16926:
---

[~sseth] [~sershe] can you review?

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.
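A hedged sketch of the sharing pattern under assumed names (the LlapTaskUmbilicalServer type and constructor here are illustrative, not the actual API):

{code}
// Sketch: lazily create one umbilical server and reuse it for every fragment
// request, instead of starting a fresh server per request.
private static volatile LlapTaskUmbilicalServer sharedServer;  // assumed type

static LlapTaskUmbilicalServer getSharedServer(Configuration conf) throws IOException {
  LlapTaskUmbilicalServer s = sharedServer;
  if (s == null) {
    synchronized (LlapTaskUmbilicalExternalClient.class) {
      s = sharedServer;
      if (s == null) {
        sharedServer = s = new LlapTaskUmbilicalServer(conf);  // assumed ctor
      }
    }
  }
  return s;
}
{code}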



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16992) LLAP: better default lambda for LRFU policy

2017-06-29 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-16992:
--

Assignee: Gopal V

> LLAP: better default lambda for LRFU policy
> ---
>
> Key: HIVE-16992
> URL: https://issues.apache.org/jira/browse/HIVE-16992
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gopal V
>
> LRFU is currently skewed heavily towards LRU; 10Ks or 100Ks of buffers are 
> tracked during a typical workload, but the policy's heap only holds around 700 
> of them. We should see if making it closer to LFU (by tweaking the lambda) 
> will improve the hit rate when small queries are infrequently interleaved with 
> large scans, and whether it will have negative effects due to perf overhead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16761:

Fix Version/s: 2.3.0

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.03.patch, HIVE-16761.04.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16761:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews!

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.03.patch, HIVE-16761.04.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16979) Cache UGI for metastore

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068871#comment-16068871
 ] 

Hive QA commented on HIVE-16979:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874981/HIVE-16979.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5832/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5832/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5832/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874981 - PreCommit-HIVE-Build

> Cache UGI for metastore
> ---
>
> Key: HIVE-16979
> URL: https://issues.apache.org/jira/browse/HIVE-16979
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch
>
>
> FileSystem.closeAllForUGI is called per request against the metastore to 
> dispose of the UGI, which involves talking to the HDFS name node and is 
> time-consuming. So the perf improvement would be to cache and reuse the UGI.
> Each FileSystem.closeAllForUGI call can take up to 20 ms of E2E latency 
> against HDFS. A Hive query usually results in several calls against the 
> metastore, so we can save up to 50-100 ms per Hive query.
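A minimal sketch of the caching idea, assuming one UGI per end user in a proxy-user setup; this is illustrative, not the attached patch:

{code}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.UserGroupInformation;

/** Sketch: cache one UGI per user instead of disposing it after every request. */
public final class UgiCache {
  private static final ConcurrentHashMap<String, UserGroupInformation> CACHE =
      new ConcurrentHashMap<>();

  public static UserGroupInformation get(String user) {
    return CACHE.computeIfAbsent(user, u -> {
      try {
        // Created once and reused; skipping FileSystem.closeAllForUGI per
        // request avoids the ~20 ms round trip to the HDFS name node.
        return UserGroupInformation.createProxyUser(
            u, UserGroupInformation.getLoginUser());
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
  }
}
{code}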



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068837#comment-16068837
 ] 

Andrew Sherman commented on HIVE-16991:
---

The test failures have nothing to do with my change.
[~spena], please commit if this looks OK to you.

> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16991.1.patch
>
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}
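For example, an existing caller of the old two-argument form would then compile unchanged (a hedged sketch; passing null for the hook loader is assumed to be acceptable here):

{code}
HiveConf conf = new HiveConf();
HiveMetaStoreClient client = new HiveMetaStoreClient(conf, /* hookLoader */ null);
{code}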



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068824#comment-16068824
 ] 

Andrew Sherman edited comment on HIVE-16991 at 6/29/17 7:30 PM:


The only interesting failure is

{noformat}
Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) 
~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1471) 
~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1469) 
~[?:1.8.0_102]
at java.util.AbstractCollection.toArray(AbstractCollection.java:196) 
~[?:1.8.0_102]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:290) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:216) 
~[hive-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1404) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1360) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:201)
 ~[hive-service-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

which looks like [HIVE-15936]


was (Author: asherman):
The only interesting failure is

{formatted}
Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) 
~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1471) 
~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1469) 
~[?:1.8.0_102]
at java.util.AbstractCollection.toArray(AbstractCollection.java:196) 
~[?:1.8.0_102]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:290) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:216) 
~[hive-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1404) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1360) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:201)
 ~[hive-service-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{formatted}

which looks like [HIVE-15936]

> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16991.1.patch
>
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068824#comment-16068824
 ] 

Andrew Sherman commented on HIVE-16991:
---

The only interesting failure is

{formatted}
Caused by: java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) 
~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1471) 
~[?:1.8.0_102]
at java.util.HashMap$EntryIterator.next(HashMap.java:1469) 
~[?:1.8.0_102]
at java.util.AbstractCollection.toArray(AbstractCollection.java:196) 
~[?:1.8.0_102]
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:290) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.log.PerfLogger.getEndTimes(PerfLogger.java:216) 
~[hive-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1404) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1360) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:201)
 ~[hive-service-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{formatted}

which looks like [HIVE-15936]

> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16991.1.patch
>
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068786#comment-16068786
 ] 

Hive QA commented on HIVE-16991:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875087/HIVE-16991.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testParallelCompilation (batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5831/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5831/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5831/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875087 - PreCommit-HIVE-Build

> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16991.1.patch
>
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16962) Better error msg for Hive on Spark in case user cancels query and closes session

2017-06-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-16962:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Patch committed to master. Thanks to Chao for the review.

> Better error msg for Hive on Spark in case user cancels query and closes 
> session
> 
>
> Key: HIVE-16962
> URL: https://issues.apache.org/jira/browse/HIVE-16962
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 2.2.0
>
> Attachments: HIVE-16962.2.patch, HIVE-16962.patch, HIVE-16962.patch
>
>
> In case the user cancels a query and closes the session, Hive marks the query as 
> failed. However, the error message is a little confusing. It still says:
> {quote}
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark 
> client. This is likely because the queue you assigned to does not have free 
> resource at the moment to start the job. Please check your queue usage and 
> try the query again later.
> {quote}
> followed by some InterruptedException.
> Ideally, the error should clearly indicate that the user canceled the 
> execution.
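A hypothetical sketch of the improvement (method name, flag, and message texts invented for illustration, not the actual SparkTask code): detect the cancellation case explicitly instead of falling through to the generic queue-resource explanation.

{noformat}
public class CancelMessageSketch {
    // `sessionClosed` is an invented flag standing in for whatever state
    // the real code consults.
    static String failureMessage(boolean sessionClosed, Throwable cause) {
        if (sessionClosed || cause instanceof InterruptedException) {
            return "Query was cancelled and the session was closed by the user.";
        }
        return "Failed to create Spark client. This is likely because the "
            + "queue you assigned to does not have free resources right now.";
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(true, null));
        System.out.println(failureMessage(false, new RuntimeException("queue full")));
    }
}
{noformat}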



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16979) Cache UGI for metastore

2017-06-29 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-16979:
--
Status: Patch Available  (was: Open)

> Cache UGI for metastore
> ---
>
> Key: HIVE-16979
> URL: https://issues.apache.org/jira/browse/HIVE-16979
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch
>
>
> FileSystem.closeAllForUGI is called per request against the metastore to dispose 
> of the UGI, which involves talking to the HDFS name node and is time consuming. 
> So the perf improvement would come from caching and reusing the UGI.
> Each FileSystem.closeAllForUGI call can take up to 20 ms of end-to-end latency 
> against HDFS. A Hive query usually results in several calls against the 
> metastore, so we can save up to 50-100 ms per Hive query.
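A minimal sketch of the caching idea, assuming a per-user key and Hadoop's UserGroupInformation API; the actual patch's cache structure and invalidation policy may differ:

{noformat}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiCacheSketch {
    private final ConcurrentHashMap<String, UserGroupInformation> cache =
        new ConcurrentHashMap<>();

    public UserGroupInformation getOrCreate(String user) {
        // One proxy UGI per user, created once and reused, so the
        // per-request closeAllForUGI teardown (and its name-node round
        // trips) can be skipped.
        return cache.computeIfAbsent(user, u -> {
            try {
                return UserGroupInformation.createProxyUser(
                    u, UserGroupInformation.getLoginUser());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
    }
}
{noformat}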



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16974) Change the sort key for the schema tool validator to be

2017-06-29 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-16974:
-
Attachment: HIVE-16974.patch

> Change the sort key for the schema tool validator to be 
> 
>
> Key: HIVE-16974
> URL: https://issues.apache.org/jira/browse/HIVE-16974
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-16974.patch
>
>
> In HIVE-16729, we introduced ordering of the results/failures returned by 
> schematool's validators. This allows fault injection testing to expect 
> results that can be verified. However, they were sorted on NAME values, which 
> in the HMS schema can be NULL. So if the introduced fault has a NULL/BLANK 
> name column value, the result could differ depending on whether the backend 
> database sorts NULLs first or last.
> So I think it is better to sort on a non-null column value.
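To see the nondeterminism concretely, a standalone Java sketch (not schematool code) of how the same nullable values order differently under NULLs-first vs. NULLs-last collation:

{noformat}
import java.util.Arrays;
import java.util.Comparator;

public class NullSortSketch {
    public static void main(String[] args) {
        String[] names = {"b", null, "a"};

        String[] nullsFirst = names.clone();
        Arrays.sort(nullsFirst, Comparator.nullsFirst(Comparator.naturalOrder()));
        String[] nullsLast = names.clone();
        Arrays.sort(nullsLast, Comparator.nullsLast(Comparator.naturalOrder()));

        // A backend that sorts NULLs first yields [null, a, b]; one that
        // sorts them last yields [a, b, null], so expected results on a
        // nullable NAME column are not portable across databases.
        System.out.println(Arrays.toString(nullsFirst));
        System.out.println(Arrays.toString(nullsLast));
    }
}
{noformat}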



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16982) WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration option

2017-06-29 Thread Karen Coppage (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-16982:
-
Status: Open  (was: Patch Available)

> WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration 
> option
> --
>
> Key: HIVE-16982
> URL: https://issues.apache.org/jira/browse/HIVE-16982
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: newbie, patch
> Attachments: HIVE-16982.patch
>
>
> In the Hive WebUI / Drilldown, the Show Query tab always displays "UNKNOWN."
> If the user wants to see the query plan here, they should set the configuration 
> option hive.log.explain.output to true. The user should be made aware of this option:
> 1) in WebUI / Drilldown / Show Query and
> 2) in HiveConf.java, line 2232.
> This configuration's description reads:
> "Whether to log explain output for every query
> When enabled, will log EXPLAIN EXTENDED output for the query at INFO log4j 
> log level."
> The following should be added:
> "...and in the WebUI / Show Query tab."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16982) WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration option

2017-06-29 Thread Karen Coppage (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068589#comment-16068589
 ] 

Karen Coppage edited comment on HIVE-16982 at 6/29/17 6:00 PM:
---

[~pvary], would you mind reviewing this please?


was (Author: klcopp):
[~pvary] and [~leftylev], would you mind reviewing this please?

> WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration 
> option
> --
>
> Key: HIVE-16982
> URL: https://issues.apache.org/jira/browse/HIVE-16982
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: newbie, patch
> Attachments: HIVE-16982.patch
>
>
> In the Hive WebUI / Drilldown, the Show Query tab always displays "UNKNOWN."
> If the user wants to see the query plan here, they should set the configuration 
> option hive.log.explain.output to true. The user should be made aware of this option:
> 1) in WebUI / Drilldown / Show Query and
> 2) in HiveConf.java, line 2232.
> This configuration's description reads:
> "Whether to log explain output for every query
> When enabled, will log EXPLAIN EXTENDED output for the query at INFO log4j 
> log level."
> The following should be added:
> "...and in the WebUI / Show Query tab."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16982) WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration option

2017-06-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068711#comment-16068711
 ] 

Hive QA commented on HIVE-16982:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12875082/HIVE-16982.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_smb_main]
 (batchId=150)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testBootstrapFunctionReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionIncrementalReplication
 (batchId=217)
org.apache.hadoop.hive.ql.parse.TestReplicationScenariosAcrossInstances.testCreateFunctionWithFunctionBinaryJarsOnHDFS
 (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5830/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5830/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5830/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12875082 - PreCommit-HIVE-Build

> WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration 
> option
> --
>
> Key: HIVE-16982
> URL: https://issues.apache.org/jira/browse/HIVE-16982
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: newbie, patch
> Attachments: HIVE-16982.patch
>
>
> In the Hive WebUI / Drilldown, the Show Query tab always displays "UNKNOWN."
> If the user wants to see the query plan here, they should set the configuration 
> option hive.log.explain.output to true. The user should be made aware of this option:
> 1) in WebUI / Drilldown / Show Query and
> 2) in HiveConf.java, line 2232.
> This configuration's description reads:
> "Whether to log explain output for every query
> When enabled, will log EXPLAIN EXTENDED output for the query at INFO log4j 
> log level."
> The following should be added:
> "...and in the WebUI / Show Query tab."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068703#comment-16068703
 ] 

Sergio Peña commented on HIVE-16991:


+1

Waiting for tests before the commit.

> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16991.1.patch
>
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16982) WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration option

2017-06-29 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068699#comment-16068699
 ] 

Peter Vary commented on HIVE-16982:
---

Hi [~klcopp],

Thanks for the patch. Just a quick question: are we sure that if 
{{explainPlan}} is null, it is because {{hive.log.explain.output}} is 
false, or could there be another reason too?

Thanks,
Peter

> WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration 
> option
> --
>
> Key: HIVE-16982
> URL: https://issues.apache.org/jira/browse/HIVE-16982
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: newbie, patch
> Attachments: HIVE-16982.patch
>
>
> In the Hive WebUI / Drilldown, the Show Query tab always displays "UNKNOWN."
> If the user wants to see the query plan here, they should set the configuration 
> option hive.log.explain.output to true. The user should be made aware of this option:
> 1) in WebUI / Drilldown / Show Query and
> 2) in HiveConf.java, line 2232.
> This configuration's description reads:
> "Whether to log explain output for every query
> When enabled, will log EXPLAIN EXTENDED output for the query at INFO log4j 
> log level."
> The following should be added:
> "...and in the WebUI / Show Query tab."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-16991:
--
Status: Patch Available  (was: Open)

> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16991.1.patch
>
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-16991:
--
Attachment: HIVE-16991.1.patch

> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-16991.1.patch
>
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16992) LLAP: better default lambda for LRFU policy

2017-06-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068671#comment-16068671
 ] 

Sergey Shelukhin commented on HIVE-16992:
-

cc [~gopalv]

> LLAP: better default lambda for LRFU policy
> ---
>
> Key: HIVE-16992
> URL: https://issues.apache.org/jira/browse/HIVE-16992
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> LRFU is currently skewed heavily towards LRU; tens or hundreds of thousands of 
> buffers are tracked during a typical workload, but the heap size is around 700. 
> We should see whether making it closer to LFU (by tweaking the lambda) will 
> improve the hit rate when small queries are infrequently interleaved with large 
> scans, and whether it will have negative effects due to perf overhead.
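For intuition, a standalone sketch of the textbook LRFU weighting (the standard formulation from the literature, not the LLAP implementation): each past access contributes F(age) = 0.5^(lambda * age), so a lambda near 0 counts all accesses (LFU-like) while a lambda near 1 lets only the most recent access matter (LRU-like).

{noformat}
public class LrfuSketch {
    // Combined recency-frequency value of a buffer accessed at the given
    // times, evaluated at time `now`.
    static double crf(double lambda, double now, double... accessTimes) {
        double value = 0.0;
        for (double t : accessTimes) {
            value += Math.pow(0.5, lambda * (now - t)); // F(age)
        }
        return value;
    }

    public static void main(String[] args) {
        double[] frequentButOld = {1, 2, 3, 4};
        double[] onceButRecent = {9};
        for (double lambda : new double[] {0.01, 1.0}) {
            System.out.printf("lambda=%.2f  A=%.3f  B=%.3f%n",
                lambda,
                crf(lambda, 10, frequentButOld),
                crf(lambda, 10, onceButRecent));
        }
        // Small lambda ranks the frequently-used buffer A higher (LFU-like);
        // lambda = 1.0 ranks the recently-used buffer B higher (LRU-like).
    }
}
{noformat}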



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16987) Llap: StackOverFlow in EvictionDispatcher:notifyEvicted

2017-06-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16987:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master

> Llap: StackOverFlow in EvictionDispatcher:notifyEvicted
> ---
>
> Key: HIVE-16987
> URL: https://issues.apache.org/jira/browse/HIVE-16987
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16987.patch
>
>
> Env: hive master (commit:6de5d1d4ba40269a0bf5ad6ee42797021541fe97) + tez 
> master
> Query: insert overwrite with dynamic partitions
> {noformat}
>  errorMessage=Cannot recover from this 
> error:org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: java.lang.StackOverflowError
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
> ... 17 more
> Caused by: java.io.IOException: java.lang.StackOverflowError
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:307)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:262)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:202)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:65)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
> ... 23 more
> Caused by: java.lang.StackOverflowError
> at 
> org.apache.hadoop.hive.llap.cache.EvictionDispatcher.notifyEvicted(EvictionDispatcher.java:44)
> at 
> org.apache.hadoop.hive.llap.cache.SerDeLowLevelCacheImpl$LlapSerDeDataBuffer.notifyEvicted(SerDeLowLevelCacheImpl.java:60)
> at 
> org.apache.hadoop.hive.llap.cache.EvictionDispatcher.notifyEvicted(EvictionDispatcher.java:44)
> at 
> org.apache.hadoop.hive.llap.cache.Ser
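The shape of such an overflow, as a hypothetical model (invented classes, not the LLAP eviction code): the dispatcher notifies a buffer, which notifies the dispatcher again, so stack depth grows with the length of the eviction chain; an explicit work list keeps the traversal flat.

{noformat}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class EvictionSketch {
    static class Buffer {
        final List<Buffer> dependents = new ArrayList<>();
    }

    // Recursive shape: stack depth grows with the chain length.
    static void notifyRecursive(Buffer b) {
        for (Buffer d : b.dependents) {
            notifyRecursive(d);
        }
    }

    // Iterative alternative: same traversal with an explicit work list.
    static void notifyIterative(Buffer root) {
        Deque<Buffer> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            for (Buffer d : work.pop().dependents) {
                work.push(d);
            }
        }
    }

    public static void main(String[] args) {
        Buffer head = new Buffer();
        Buffer cur = head;
        for (int i = 0; i < 1_000_000; i++) { // a very long chain
            Buffer next = new Buffer();
            cur.dependents.add(next);
            cur = next;
        }
        notifyIterative(head); // completes fine
        notifyRecursive(head); // throws StackOverflowError
    }
}
{noformat}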

[jira] [Commented] (HIVE-16989) Fix some issues identified by lgtm.com

2017-06-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068647#comment-16068647
 ] 

Sergey Shelukhin commented on HIVE-16989:
-

Details? As it is, this JIRA makes me apprehensive about clicking the links. 
"Your fedex package is waiting, click here" :)

> Fix some issues identified by lgtm.com
> --
>
> Key: HIVE-16989
> URL: https://issues.apache.org/jira/browse/HIVE-16989
> Project: Hive
>  Issue Type: Improvement
>Reporter: Malcolm Taylor
>Assignee: Malcolm Taylor
>
> [lgtm.com|https://lgtm.com] has identified a number of issues where there may 
> be scope for improvement. The plan is to address some of the alerts found at 
> [https://lgtm.com/projects/g/apache/hive/].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-16991:
--
Description: 
Some client code that is not easy to change uses a 2-arg constructor on 
HiveMetaStoreClient.
It is trivial and safe to add this constructor:

{noformat}
public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) throws 
MetaException {
this(conf, hookLoader, true);
}
{noformat}

  was:
Some client code that is not easy to change uses a 2-arg constructor on 
HiveMetaStoreClient.
It is trivial and safe to add this constructor:

{quote}
public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) throws 
MetaException {
this(conf, hookLoader, true);
}
{quote}


> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {noformat}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-16991:
--
Description: 
Some client code that is not easy to change uses a 2-arg constructor on 
HiveMetaStoreClient.
It is trivial and safe to add this constructor:

{quote}
public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) throws 
MetaException {
this(conf, hookLoader, true);
}
{quote}

  was:
Some client code that is not easy to change uses a 2-arg constructor on 
HiveMetaStoreClient.
It is trivial and safe to add this constructor:

public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) throws 
MetaException {
this(conf, hookLoader, true);
}


> HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility
> -
>
> Key: HIVE-16991
> URL: https://issues.apache.org/jira/browse/HIVE-16991
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>
> Some client code that is not easy to change uses a 2-arg constructor on 
> HiveMetaStoreClient.
> It is trivial and safe to add this constructor:
> {quote}
> public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader) 
> throws MetaException {
> this(conf, hookLoader, true);
> }
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

