[jira] [Commented] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr

2017-07-12 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085276#comment-16085276
 ] 

Rui Li commented on HIVE-15104:
---

I also ran another round of TPC-DS. The overall shuffle data is reduced by 12%. 
The query time improvement, however, is negligible - about 1.5%.
[~xuefuz] do you think it's worth the effort?

> Hive on Spark generate more shuffle data than hive on mr
> 
>
> Key: HIVE-15104
> URL: https://issues.apache.org/jira/browse/HIVE-15104
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Rui Li
> Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, 
> HIVE-15104.3.patch, HIVE-15104.4.patch, TPC-H 100G.xlsx
>
>
> The same SQL, run on the Spark and MR engines, generates different amounts 
> of shuffle data.
> I think this is because Hive on MR serializes only part of HiveKey, while 
> Hive on Spark, which uses Kryo, serializes the full HiveKey object.
> What is your opinion?
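The serialization difference described above can be illustrated with a stdlib-only sketch. The `Key` class below is a hypothetical stand-in for HiveKey (a byte payload plus a cached hash code), not Hive's actual class, and `ObjectOutputStream` stands in for a generic serializer such as Kryo: writing only the key's payload, MR-style, emits far fewer bytes than handing the whole object to a generic serializer, which also writes class metadata and every field.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class KeySizeDemo {
    // Hypothetical stand-in for HiveKey: a byte payload plus a cached hash code.
    public static class Key implements Serializable {
        final byte[] bytes;
        final int cachedHashCode;
        public Key(byte[] bytes, int cachedHashCode) {
            this.bytes = bytes;
            this.cachedHashCode = cachedHashCode;
        }
    }

    // MR-style serialization: write only the payload, as a Writable would.
    public static int partialSize(Key k) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(k.bytes.length);
            out.write(k.bytes);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bos.size();
    }

    // Generic object serialization: class metadata and every field go along.
    public static int fullSize(Key k) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(k);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        Key k = new Key("some-key".getBytes(), 42);
        System.out.println("partial=" + partialSize(k)
                + " bytes, full=" + fullSize(k) + " bytes");
    }
}
```

The partial form here is 12 bytes (a 4-byte length plus the 8-byte payload), while the full object form carries the class descriptor and the cached hash code as well.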



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085275#comment-16085275
 ] 

Hive QA commented on HIVE-15898:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877014/HIVE-15898.08.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10890 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd]
 (batchId=144)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6001/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6001/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6001/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877014 - PreCommit-HIVE-Build

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, 
> HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, 
> HIVE-15898.06.patch, HIVE-15898.07.patch, HIVE-15898.08.patch
>
>






[jira] [Updated] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr

2017-07-12 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-15104:
--
Attachment: HIVE-15104.4.patch

Update patch v4:
1. Moved the registrator code to a resource file. Hopefully the patch is more 
readable.
2. To be safe, we still have to store the hash code. But that's still better 
than the generic serializer.

> Hive on Spark generate more shuffle data than hive on mr
> 
>
> Key: HIVE-15104
> URL: https://issues.apache.org/jira/browse/HIVE-15104
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Rui Li
> Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, 
> HIVE-15104.3.patch, HIVE-15104.4.patch, TPC-H 100G.xlsx
>
>
> The same SQL, run on the Spark and MR engines, generates different amounts 
> of shuffle data.
> I think this is because Hive on MR serializes only part of HiveKey, while 
> Hive on Spark, which uses Kryo, serializes the full HiveKey object.
> What is your opinion?





[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-17078:
--
Attachment: HIVE-17078.4.PATCH

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, 
> HIVE-17078.3.patch, HIVE-17078.4.PATCH
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, so 
> that a local task that consumes too many resources does not affect Hive 
> itself. Currently, the stdout and stderr output of the child process is 
> printed to Hive's stdout/stderr log, which carries no timestamp information 
> and is separate from the Hive service logs. This makes it hard to 
> troubleshoot problems in MapredLocalTasks.
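The missing-timestamp problem described above can be sketched with the stdlib alone (this is an illustration, not the actual patch): funnel the child process's combined stdout/stderr through the parent, prefixing each line with a timestamp and a tag so it can be told apart inside the parent's own log stream. The `[local-task]` tag and the use of a POSIX `sh` in `main` are assumptions for the demo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.time.LocalDateTime;

public class ChildProcessLogger {
    // Run a command in a child process and echo its combined stdout/stderr
    // into the parent's output, one line at a time, prefixed with a timestamp
    // and a tag. Returns the child's exit code.
    public static int runAndLog(String... command) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);   // fold stderr into stdout
            Process p = pb.start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(LocalDateTime.now() + " [local-task] " + line);
                }
            }
            return p.waitFor();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int exit = runAndLog("sh", "-c", "echo hello; echo oops 1>&2");
        System.out.println("child exited with " + exit);
    }
}
```

In a real service one would write to the logging framework instead of stdout, which supplies the timestamp pattern for free.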





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085252#comment-16085252
 ] 

Yibing Shi commented on HIVE-17078:
---

Checked the failed tests.

# org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
fails for reasons irrelevant to this patch.
# org.apache.hive.hcatalog.api.TestHCatClient. The failure also has nothing to 
do with our patch.
# org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14/23] fails 
because the ordering of the output has changed. Nothing serious. We need to 
somehow update the .out files, but maybe in a separate JIRA.
# The other tests fail because we now have more logs in local tasks. Will 
update the .out files.

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, 
> HIVE-17078.3.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, so 
> that a local task that consumes too many resources does not affect Hive 
> itself. Currently, the stdout and stderr output of the child process is 
> printed to Hive's stdout/stderr log, which carries no timestamp information 
> and is separate from the Hive service logs. This makes it hard to 
> troubleshoot problems in MapredLocalTasks.





[jira] [Commented] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085218#comment-16085218
 ] 

Hive QA commented on HIVE-16960:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877008/HIVE16960.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10890 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=233)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6000/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6000/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6000/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877008 - PreCommit-HIVE-Build

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
>Priority: Critical
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HIVE16960.1.patch, HIVE16960.2.patch, HIVE16960.2.patch, 
> HIVE16960.3.patch
>
>
> When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user 
> other than the HDFS file owner, and the HDFS sticky bit is set, Hive throws 
> an error saying the file cannot be moved due to permission issues:
> Caused by: org.apache.hadoop.security.AccessControlException: Permission 
> denied by sticky bit setting: user=hive, 
> inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense 
> to users, and the stack trace displayed is huge. We should display a better 
> error message, and maybe provide help information about how to fix it.
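One way to meet the "better error message" ask above can be sketched as follows. This is an illustration only: it matches against a plain `Exception` rather than Hadoop's real `AccessControlException`, and the helper name and hint wording are hypothetical, not Hive's actual fix.

```java
public class StickyBitErrorHint {
    // Turn a raw sticky-bit permission error into a short, actionable message
    // instead of surfacing a huge stack trace. The hint text is illustrative.
    public static String friendlyMessage(Exception cause) {
        String msg = cause.getMessage() == null ? "" : cause.getMessage();
        if (msg.contains("sticky bit")) {
            return "Load failed: the target directory has the HDFS sticky bit set, "
                 + "so only the file owner can move or overwrite files there. "
                 + "Run the load as the file owner, or have an administrator "
                 + "clear the sticky bit on the directory. (" + msg + ")";
        }
        return msg;   // anything else passes through unchanged
    }

    public static void main(String[] args) {
        Exception e = new Exception(
            "Permission denied by sticky bit setting: user=hive, inode=...");
        System.out.println(friendlyMessage(e));
    }
}
```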





[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Open  (was: Patch Available)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch
>
>






[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085203#comment-16085203
 ] 

Vaibhav Gumashta commented on HIVE-4577:


[~libing] Looks like patch v7 didn't apply on master

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, 
> HIVE-4577.6.patch, HIVE-4577.7.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
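The behavior quoted above suggests the CLI splits the dfs command on whitespace without honoring quotes, so `"bei jing"` becomes two literal arguments. A minimal shell-style tokenizer (a sketch of the idea, not Hive's actual fix; the class name is hypothetical) would keep quoted runs together and strip the enclosing quotes:

```java
import java.util.ArrayList;
import java.util.List;

public class DfsCommandSplitter {
    // Split a command line on whitespace, but treat single- or double-quoted
    // runs as part of one token and drop the quote characters, as a shell would.
    public static List<String> split(String command) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;                  // 0 = not inside a quoted run
        boolean inToken = false;
        for (char c : command.toCharArray()) {
            if (quote != 0) {            // inside quotes: copy until the close
                if (c == quote) {
                    quote = 0;
                } else {
                    cur.append(c);
                }
            } else if (c == '"' || c == '\'') {
                quote = c;               // open a quoted run (quote not emitted)
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {           // whitespace ends the current token
                    tokens.add(cur.toString());
                    cur.setLength(0);
                    inToken = false;
                }
            } else {
                cur.append(c);
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(cur.toString());
        }
        return tokens;
    }
}
```

With this splitting, `dfs -mkdir "bei jing"` yields the three arguments `dfs`, `-mkdir`, and `bei jing`, matching what the hadoop shell would create.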





[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Patch Available  (was: Open)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch
>
>






[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Attachment: HIVE-16966.05.patch

rebase to master

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch, HIVE-16966.05.patch
>
>






[jira] [Commented] (HIVE-17069) Refactor OrcRawRecordMerger.ReaderPair

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085178#comment-16085178
 ] 

Hive QA commented on HIVE-17069:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877003/HIVE-17069.02.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5999/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5999/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5999/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-13 04:50:59.904
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5999/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-13 04:50:59.906
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as 
TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as 
TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-13 04:51:00.740
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java:290
error: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java: 
patch does not apply
error: patch failed: 
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java:35
error: 
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcRawRecordMerger.java: patch 
does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877003 - PreCommit-HIVE-Build

> Refactor OrcRawRecordMerger.ReaderPair
> --
>
> Key: HIVE-17069
> URL: https://issues.apache.org/jira/browse/HIVE-17069
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17069.01.patch, HIVE-17069.02.patch
>
>
> This should be done after HIVE-16177 so that it does not completely obscure 
> the functional changes.
> Make ReaderPair an interface.
> ReaderPairImpl - will do what ReaderPair currently does, i.e. handle the 
> "normal" code path.
> OriginalReaderPair - same as now but without the incomprehensible 
> override/variable shadowing logic. Perhaps split it into two - one for 
> compaction and one for "normal" reads - with a common base class.
> Push discoverKeyBounds() into the appropriate implementation.
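The refactoring plan above can be sketched as an interface with two implementations. The names follow the description; the method bodies are placeholders, not the real ORC merging logic.

```java
public class ReaderPairSketch {
    // Interface extracted from today's ReaderPair class, per the plan above.
    public interface ReaderPair {
        String codePath();
        void discoverKeyBounds();   // pushed down into each implementation
    }

    // Handles the "normal" read path, i.e. what ReaderPair currently does.
    public static class ReaderPairImpl implements ReaderPair {
        private boolean boundsKnown = false;
        public String codePath() { return "normal"; }
        public void discoverKeyBounds() { boundsKnown = true; }  // placeholder
        public boolean boundsKnown() { return boundsKnown; }
    }

    // Same role as today's OriginalReaderPair, without override/shadowing
    // tricks. Per the comment, this could be split further into a compaction
    // variant and a normal-read variant over a common base class.
    public static class OriginalReaderPair implements ReaderPair {
        public String codePath() { return "original"; }
        public void discoverKeyBounds() { /* original-file bound discovery */ }
    }
}
```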





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085176#comment-16085176
 ] 

Hive QA commented on HIVE-4577:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877002/HIVE-4577.7.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5998/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5998/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5998/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-13 04:50:25.104
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5998/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-13 04:50:25.107
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as 
TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline)
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java.orig
Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderAdaptor.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java.orig
Removing ql/src/test/queries/clientpositive/llap_acid_fast.q
Removing ql/src/test/results/clientpositive/llap/llap_acid.q.out
Removing ql/src/test/results/clientpositive/llap/llap_acid_fast.q.out
Removing ql/src/test/results/clientpositive/llap_acid_fast.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as 
TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-13 04:50:30.983
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java: No 
such file or directory
error: a/ql/src/test/results/clientpositive/perf/query14.q.out: No such file or 
directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877002 - PreCommit-HIVE-Build

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, 
> HIVE-4577.6.patch, HIVE-4577.7.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"

[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085173#comment-16085173
 ] 

Hive QA commented on HIVE-12631:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876993/HIVE-12631.18.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10892 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_windowing2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=151)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5997/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5997/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5997/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876993 - PreCommit-HIVE-Build

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.1.patch, 
> HIVE-12631.2.patch, HIVE-12631.3.patch, HIVE-12631.4.patch, 
> HIVE-12631.5.patch, HIVE-12631.6.patch, HIVE-12631.7.patch, 
> HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, the ACID logic is embedded inside the ORC format; we need 
> to refactor it to sit on top of some interface, if practical, or just port 
> it to the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache a merged representation in the future.





[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15898:
--
Attachment: HIVE-15898.08.patch

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, 
> HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, 
> HIVE-15898.06.patch, HIVE-15898.07.patch, HIVE-15898.08.patch
>
>






[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085146#comment-16085146
 ] 

Hive QA commented on HIVE-16996:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876982/HIVE-16966.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5996/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5996/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5996/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-13 03:48:51.176
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5996/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-13 03:48:51.178
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as 
TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 31a7987 HIVE-16975: Vectorization: Fully vectorize CAST date as 
TIMESTAMP so VectorUDFAdaptor is now used (Teddy Choi, reviewed by Matt McCline)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-13 03:48:57.351
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
fatal: git apply: bad git-diff - inconsistent old filename on line 33
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876982 - PreCommit-HIVE-Build

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085144#comment-16085144
 ] 

Hive QA commented on HIVE-16926:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876981/HIVE-16926.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10889 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5995/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5995/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5995/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876981 - PreCommit-HIVE-Build

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch, HIVE-16926.5.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.





[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085090#comment-16085090
 ] 

Rui Li commented on HIVE-16922:
---

Hi [~libing], I guess we also need metastore upgrade scripts to update the 
values that are already stored in the DB, so that users can continue to use 
their data when they upgrade to the new Hive version.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085078#comment-16085078
 ] 

Hive QA commented on HIVE-17078:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876980/HIVE-17078.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10888 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_convert_join]
 (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5994/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5994/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5994/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876980 - PreCommit-HIVE-Build

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, 
> HIVE-17078.3.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, so 
> that a local task that consumes too many resources cannot affect the Hive 
> process itself. Currently, the stdout and stderr output of the child process 
> is printed to Hive's own stdout/stderr, which carries no timestamp 
> information and is separate from the Hive service logs. This makes it hard 
> to troubleshoot problems in MapredLocalTasks.
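One way to add timestamps is to pump the child's stdout through the parent's logging instead of inheriting it. The following is a minimal sketch, not the actual patch; the class and method names are invented for illustration:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class ChildOutputLogger {
    // Reads the child's stdout line by line and re-emits each line with a
    // timestamp prefix, instead of letting the child write directly to the
    // raw stdout/stderr of the parent process.
    public static List<String> relay(Process child) throws Exception {
        List<String> logged = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String stamped = LocalDateTime.now() + " [local-task] " + line;
                System.out.println(stamped); // in Hive this would go to the service log
                logged.add(stamped);
            }
        }
        child.waitFor();
        return logged;
    }

    public static void main(String[] args) throws Exception {
        // Merge stderr into stdout so both streams get timestamped.
        Process p = new ProcessBuilder("echo", "hello from child")
                .redirectErrorStream(true).start();
        relay(p);
    }
}
```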





[jira] [Updated] (HIVE-16960) Hive throws an ugly error exception when HDFS sticky bit is set

2017-07-12 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16960:
---
Attachment: HIVE16960.3.patch

> Hive throws an ugly error exception when HDFS sticky bit is set
> ---
>
> Key: HIVE-16960
> URL: https://issues.apache.org/jira/browse/HIVE-16960
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
>Priority: Critical
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HIVE16960.1.patch, HIVE16960.2.patch, HIVE16960.2.patch, 
> HIVE16960.3.patch
>
>
> When LOAD DATA INPATH ... OVERWRITE INTO TABLE ... is called by a Hive user 
> other than the HDFS file owner, and the HDFS sticky bit is set, Hive throws 
> an exception saying the file cannot be moved due to permission issues.
> Caused by: org.apache.hadoop.security.AccessControlException: Permission 
> denied by sticky bit setting: user=hive, 
> inode=sasdata-2016-04-20-17-13-43-630-e-1.dlv.bk
> The permission denial is expected, but the error message does not make sense 
> to users, and the stack trace displayed is huge. We should display a better 
> error message to users, and perhaps provide help information about how to 
> fix it.
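A friendlier failure could be produced by catching the permission error and emitting a short, actionable message. The following is a minimal sketch of that idea; the exception class is a stand-in for org.apache.hadoop.security.AccessControlException, and the message wording is illustrative, not the actual fix:

```java
// Stand-in for org.apache.hadoop.security.AccessControlException, so the
// sketch is self-contained.
class AccessControlLikeException extends Exception {
    AccessControlLikeException(String msg) { super(msg); }
}

public class MoveErrorDemo {
    // Translates a low-level permission failure into a short, actionable
    // message instead of surfacing the full stack trace to the user.
    static String friendlyMoveError(AccessControlLikeException e,
                                    String user, String path) {
        if (String.valueOf(e.getMessage()).contains("sticky bit")) {
            return "Load failed: user '" + user + "' cannot overwrite '" + path
                 + "' because the HDFS sticky bit is set on the parent directory."
                 + " Ask the file owner to move it, or clear the bit on the"
                 + " directory (e.g. hdfs dfs -chmod -t <dir>).";
        }
        return "Load failed: permission denied moving '" + path + "': "
             + e.getMessage();
    }

    public static void main(String[] args) {
        AccessControlLikeException e = new AccessControlLikeException(
            "Permission denied by sticky bit setting: user=hive, inode=sample.bk");
        System.out.println(friendlyMoveError(e, "hive", "/user/x/sample.bk"));
    }
}
```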





[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085069#comment-16085069
 ] 

Bing Li commented on HIVE-16922:


[~lirui], I can't reproduce TestMiniLlapLocalCliDriver[vector_if_expr] in my 
env; I don't think it's caused by this patch.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)





[jira] [Updated] (HIVE-17069) Refactor OrcRawRecrodMerger.ReaderPair

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17069:
--
Status: Patch Available  (was: Open)

> Refactor OrcRawRecrodMerger.ReaderPair
> --
>
> Key: HIVE-17069
> URL: https://issues.apache.org/jira/browse/HIVE-17069
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17069.01.patch, HIVE-17069.02.patch
>
>
> This should be done after HIVE-16177 so as not to completely obscure the 
> functional changes.
> Make ReaderPair an interface.
> ReaderPairImpl - will do what ReaderPair currently does, i.e. handle the 
> "normal" code path.
> OriginalReaderPair - same as now, but without the incomprehensible 
> override/variable-shadowing logic.
> Perhaps split it into two - one for compaction and one for "normal" reads, 
> with a common base class.
> Push discoverKeyBounds() into the appropriate implementation.
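The proposed shape can be sketched as an interface with one implementation per code path. This only illustrates the structure described above; the real method signatures in OrcRawRecordMerger differ:

```java
// Sketch of the refactor: ReaderPair becomes an interface with separate
// implementations instead of one class with override/variable shadowing.
interface ReaderPair {
    Object nextRecord();        // placeholder signature
    void discoverKeyBounds();   // each implementation owns its own bounds logic
}

// Handles the "normal" code path (what ReaderPair currently does).
class ReaderPairImpl implements ReaderPair {
    public Object nextRecord() { return null; }
    public void discoverKeyBounds() { /* normal-read key bounds */ }
}

// Handles pre-ACID "original" files; could later be split into a compaction
// variant and a normal-read variant sharing a common base class.
class OriginalReaderPair implements ReaderPair {
    public Object nextRecord() { return null; }
    public void discoverKeyBounds() { /* original-file key bounds */ }
}

public class ReaderPairSketch {
    public static void main(String[] args) {
        ReaderPair normal = new ReaderPairImpl();
        ReaderPair original = new OriginalReaderPair();
        normal.discoverKeyBounds();
        original.discoverKeyBounds();
        System.out.println("both paths behind one interface");
    }
}
```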





[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4577:
--
Attachment: HIVE-4577.7.patch

Add the golden file for dfscmd.q

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, 
> HIVE-4577.6.patch, HIVE-4577.7.patch
>
>
> By design, Hive supports hadoop dfs commands in the Hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from Hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
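A quote-aware tokenizer along these lines would give the Hadoop-like behavior described above. This is a minimal sketch, not Hive's actual CLI code; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class DfsCmdTokenizer {
    // Splits a dfs command line on whitespace while honoring single and
    // double quotes, so `dfs -mkdir "bei jing"` yields the single argument
    // `bei jing` instead of two arguments `"bei` and `jing"`.
    public static List<String> tokenize(String cmd) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0; // active quote character, or 0 when outside quotes
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {
                if (c == quote) { quote = 0; } else { cur.append(c); }
            } else if (c == '"' || c == '\'') {
                quote = c; // enter quoted region; the quote itself is dropped
            } else if (Character.isWhitespace(c)) {
                if (cur.length() > 0) { tokens.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-mkdir \"bei jing\"")); // prints [-mkdir, bei jing]
    }
}
```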





[jira] [Updated] (HIVE-17069) Refactor OrcRawRecrodMerger.ReaderPair

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17069:
--
Attachment: HIVE-17069.02.patch

> Refactor OrcRawRecrodMerger.ReaderPair
> --
>
> Key: HIVE-17069
> URL: https://issues.apache.org/jira/browse/HIVE-17069
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17069.01.patch, HIVE-17069.02.patch
>
>
> This should be done after HIVE-16177 so as not to completely obscure the 
> functional changes.
> Make ReaderPair an interface.
> ReaderPairImpl - will do what ReaderPair currently does, i.e. handle the 
> "normal" code path.
> OriginalReaderPair - same as now, but without the incomprehensible 
> override/variable-shadowing logic.
> Perhaps split it into two - one for compaction and one for "normal" reads, 
> with a common base class.
> Push discoverKeyBounds() into the appropriate implementation.





[jira] [Commented] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used

2017-07-12 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085040#comment-16085040
 ] 

Matt McCline commented on HIVE-16975:
-

I don't see test failures related to this change.

> Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is 
> now used
> -
>
> Key: HIVE-16975
> URL: https://issues.apache.org/jira/browse/HIVE-16975
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16975.1.patch
>
>
> Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native.





[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085036#comment-16085036
 ] 

Hive QA commented on HIVE-16793:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876977/HIVE-16793.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10876 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.minikdc.TestJdbcNonKrbSASLWithMiniKdc.org.apache.hive.minikdc.TestJdbcNonKrbSASLWithMiniKdc
 (batchId=238)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5993/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5993/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5993/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876977 - PreCommit-HIVE-Build

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch, HIVE-16793.6.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60]

[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-12 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.18.patch

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.1.patch, 
> HIVE-12631.2.patch, HIVE-12631.3.patch, HIVE-12631.4.patch, 
> HIVE-12631.5.patch, HIVE-12631.6.patch, HIVE-12631.7.patch, 
> HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.





[jira] [Commented] (HIVE-16541) PTF: Avoid shuffling constant keys for empty OVER()

2017-07-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085004#comment-16085004
 ] 

Gopal V commented on HIVE-16541:


Thanks [~ashutoshc]. The window non-streaming codepath seems to be the 
problem; I'll debug a bit more.

> PTF: Avoid shuffling constant keys for empty OVER()
> ---
>
> Key: HIVE-16541
> URL: https://issues.apache.org/jira/browse/HIVE-16541
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16541.1.patch, HIVE-16541.2.patch
>
>
> Generating surrogate keys with 
> {code}
> select row_number() over() as p_key, * from table; 
> {code}
> uses a sorted edge with "0 ASC NULLS FIRST" as the sort order.





[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084986#comment-16084986
 ] 

Hive QA commented on HIVE-16100:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876961/HIVE-16100.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10888 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby_empty] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_move_tasks_share_dependencies]
 (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_windowing2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reducesink_dedup] 
(batchId=22)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_lineage2]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_windowing_2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5992/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5992/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5992/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876961 - PreCommit-HIVE-Build

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch, HIVE-16100.5.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 
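The problem with clear() can be shown with a toy operator graph: detaching only the FS branch preserves siblings, while clear() drops them. This is a minimal sketch using stand-in classes, not Hive's actual Operator API or the committed fix:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for Hive's Operator parent/child links.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }
}

public class UnlinkDemo {
    // Buggy unlink: discards every child of fsParent, including sibling
    // operators that are not part of the FS branch being rewritten.
    static void unlinkAll(Op fsParent) {
        fsParent.children.clear();
    }

    // Sibling-preserving unlink: detach only the operator being rewritten.
    static void unlinkOne(Op fsParent, Op fsOp) {
        fsParent.children.remove(fsOp);
    }

    public static void main(String[] args) {
        Op fsParent = new Op("SEL");
        Op fsOp = new Op("FS");
        Op sibling = new Op("SEL_sibling");
        fsParent.children.add(fsOp);
        fsParent.children.add(sibling);

        unlinkOne(fsParent, fsOp);
        System.out.println(fsParent.children.size()); // prints 1: sibling survives
    }
}
```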





[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.5.patch

Patch v5, with changes per feedback.

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch, HIVE-16926.5.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.
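Sharing the umbilical across fragment requests amounts to replacing per-request construction with a lazily created shared instance. The sketch below shows only that idea; the class name and the double-checked-locking choice are illustrative, not Hive's actual design:

```java
public class UmbilicalServer {
    private static volatile UmbilicalServer instance;

    private UmbilicalServer() {
        // In the real client this would bind the umbilical RPC server once;
        // here construction is a no-op for illustration.
    }

    // Lazily creates the server on the first fragment request and returns
    // the same shared instance for every subsequent request.
    public static UmbilicalServer getInstance() {
        if (instance == null) {
            synchronized (UmbilicalServer.class) {
                if (instance == null) {
                    instance = new UmbilicalServer();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        // Two "fragment requests" see the same server instead of two servers.
        System.out.println(getInstance() == getInstance()); // prints true
    }
}
```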





[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Patch Available  (was: Open)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>






[jira] [Commented] (HIVE-17021) Support replication of concatenate operation.

2017-07-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084950#comment-16084950
 ] 

Daniel Dai commented on HIVE-17021:
---

+1. Will commit shortly.

> Support replication of concatenate operation.
> -
>
> Key: HIVE-17021
> URL: https://issues.apache.org/jira/browse/HIVE-17021
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17021.01.patch
>
>
> We need to handle cases like ALTER TABLE ... CONCATENATE that also change the 
> files on disk, and potentially treat them similar to INSERT OVERWRITE, as it 
> does something equivalent to a compaction.
> Note that a ConditionalTask might also be fired at the end of inserts in a 
> Tez task (or another execution engine), if the appropriate HiveConf settings 
> are set, to perform this operation automatically - these cases also need to 
> be taken care of for replication.





[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Attachment: (was: HIVE-16966.04.patch)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch
>
>






[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Status: Open  (was: Patch Available)

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>






[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats

2017-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16996:
---
Attachment: HIVE-16966.04.patch

> Add HLL as an alternative to FM sketch to compute stats
> ---
>
> Key: HIVE-16996
> URL: https://issues.apache.org/jira/browse/HIVE-16996
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: Accuracy and performance comparison between HyperLogLog 
> and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, 
> HIVE-16966.03.patch, HIVE-16966.04.patch
>
>






[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084918#comment-16084918
 ] 

Hive QA commented on HIVE-16793:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876952/HIVE-16793.5.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5991/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5991/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5991/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-12 23:46:50.471
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5991/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-12 23:46:50.474
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   353781c..6af30bf  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 353781c HIVE-17079: LLAP: Use FQDN by default for work 
submission (Prasanth Jayachandran reviewed by Gopal V)
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderAdaptor.java
Removing ql/src/test/queries/clientpositive/llap_acid_fast.q
Removing ql/src/test/results/clientpositive/llap/llap_acid.q.out
Removing ql/src/test/results/clientpositive/llap/llap_acid_fast.q.out
Removing ql/src/test/results/clientpositive/llap_acid_fast.q.out
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 6af30bf HIVE-16832 duplicate ROW__ID possible in multi insert 
into transactional table (Eugene Koifman, reviewed by Gopal V)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-12 23:46:56.393
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: No such 
file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java:
 No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: No 
such file or directory
error: a/ql/src/test/queries/clientpositive/subquery_scalar.q: No such file or 
directory
error: a/ql/src/test/results/clientpositive/llap/subquery_scalar.q.out: No such 
file or directory
error: a/ql/src/test/results/clientpositive/perf/query14.q.out: No such file or 
directory
error: a/ql/src/test/results/clientpositive/perf/query23.q.out: No such file or 
directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876952 - PreCommit-HIVE-Build

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 

[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084914#comment-16084914
 ] 

Hive QA commented on HIVE-12631:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876915/HIVE-12631.17.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10871 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_reader] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] 
(batchId=56)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=239)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5990/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5990/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5990/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876915 - PreCommit-HIVE-Build

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, 
> HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.





[jira] [Commented] (HIVE-16541) PTF: Avoid shuffling constant keys for empty OVER()

2017-07-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084941#comment-16084941
 ] 

Ashutosh Chauhan commented on HIVE-16541:
-

In some of the golden files the result set has changed, which looks incorrect.

> PTF: Avoid shuffling constant keys for empty OVER()
> ---
>
> Key: HIVE-16541
> URL: https://issues.apache.org/jira/browse/HIVE-16541
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16541.1.patch, HIVE-16541.2.patch
>
>
> Generating surrogate keys with 
> {code}
> select row_number() over() as p_key, * from table; 
> {code}
> uses a sorted edge with "0 ASC NULLS FIRST" as the sort order.





[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-17078:
--
Attachment: HIVE-17078.3.patch

Added a bit more logging.

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch, 
> HIVE-17078.3.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses too many resources, which may affect Hive. Currently, 
> the stdout and stderr information of the child process is printed in Hive's 
> stdout/stderr log, which doesn't have timestamp information, and is 
> separated from the Hive service logs. This makes it hard to troubleshoot 
> problems in MapredLocalTasks.
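The timestamping improvement described above can be sketched roughly as follows. This is an illustrative sketch, not Hive's actual MapredLocalTask code: `ChildOutputPump`, its `stamp` helper, and the `echo` child command are all hypothetical stand-ins.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.time.LocalDateTime;

// Minimal sketch (not Hive's implementation): pump a child process's
// stdout/stderr into the parent's log, prefixing each line with the
// timestamp that raw stdout/stderr redirection lacks.
public class ChildOutputPump {

    // Pure helper so the formatting is easy to test: prefix a child line
    // with a timestamp and a marker identifying its origin.
    static String stamp(String line) {
        return LocalDateTime.now() + " [child] " + line;
    }

    public static void main(String[] args) throws Exception {
        // "echo" stands in for the spawned local-task child process.
        Process child = new ProcessBuilder("echo", "hello from child")
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                // In Hive this would go through the service logger instead.
                System.out.println(stamp(line));
            }
        }
        child.waitFor();
    }
}
```

A real implementation would pump stdout and stderr on separate threads to avoid blocking the child when either buffer fills.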





[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Attachment: HIVE-16793.6.patch

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch, HIVE-16793.6.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Open  (was: Patch Available)

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Patch Available  (was: Open)

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch, HIVE-16793.6.patch
>
>
> This query has an sq_count_check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_

[jira] [Updated] (HIVE-17013) Delete request with a subquery based on select over a view

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17013:
--
Component/s: Transactions

> Delete request with a subquery based on select over a view
> --
>
> Key: HIVE-17013
> URL: https://issues.apache.org/jira/browse/HIVE-17013
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Frédéric ESCANDELL
>Priority: Blocker
>
> Hi, 
> I based my DDL on this example: 
> https://fr.hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/.
> In a delete request, the use of a view in a subquery throws an exception: 
> FAILED: IllegalStateException Expected 'insert into table default.mydim 
> select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set 
> operation.
> {code:sql}
> drop table if exists mydim;
> create table mydim (key int, name string, zip string, is_current boolean)
> clustered by(key) into 3 buckets
> stored as orc tblproperties ('transactional'='true');
> insert into mydim values
>   (1, 'bob',   '95136', true),
>   (2, 'joe',   '70068', true),
>   (3, 'steve', '22150', true);
> drop table if exists updates_staging_table;
> create table updates_staging_table (key int, newzip string);
> insert into updates_staging_table values (1, 87102), (3, 45220);
> drop view if exists updates_staging_view;
> create view updates_staging_view (key, newzip) as select key, newzip from 
> updates_staging_table;
> delete from mydim
> where mydim.key in (select key from updates_staging_view);
> FAILED: IllegalStateException Expected 'insert into table default.mydim 
> select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set 
> operation.
> {code}





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Yibing Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084891#comment-16084891
 ] 

Yibing Shi commented on HIVE-17078:
---

I am trying to keep the current behaviour. With Hive CLI, by default Hive logs 
are not printed. Some users may rely on the stdout/stderr information. I don't 
want to surprise them.
If you still think it is unnecessary to print child stdout/stderr to Hive 
stdout/stderr, I can remove the corresponding code.

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses too many resources, which may affect Hive. Currently, 
> the stdout and stderr information of the child process is printed in Hive's 
> stdout/stderr log, which doesn't have timestamp information, and is 
> separated from the Hive service logs. This makes it hard to troubleshoot 
> problems in MapredLocalTasks.





[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084878#comment-16084878
 ] 

Eugene Koifman commented on HIVE-16832:
---

no related failures (see builds 5985,5984 for same failures)
HIVE-16832.22.patch committed to master (3.0)
thanks Gopal for the review

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses and the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId.)
> This ensures that there are no collisions and retains the desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.  
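The bit-packed triplet can be sketched as follows. The field widths here are assumptions for illustration only, and `BucketPack` is a hypothetical name; Hive's actual encoding is defined elsewhere in its codebase.

```java
// Illustrative sketch of packing ROW__ID.bucketId as a
// (formatVersion, bucketId, statementId) triplet. Field widths are
// assumptions, not Hive's real layout.
public final class BucketPack {
    static final int BUCKET_BITS = 12;
    static final int STMT_BITS = 12;

    static int pack(int version, int bucketId, int statementId) {
        return (version << (BUCKET_BITS + STMT_BITS))
                | (bucketId << STMT_BITS)
                | statementId;
    }

    static int version(int packed) {
        return packed >>> (BUCKET_BITS + STMT_BITS);
    }

    static int bucketId(int packed) {
        return (packed >>> STMT_BITS) & ((1 << BUCKET_BITS) - 1);
    }

    static int statementId(int packed) {
        return packed & ((1 << STMT_BITS) - 1);
    }

    public static void main(String[] args) {
        // Two Insert clauses of the same multi-insert writing to the same
        // bucket no longer collide, because the statement id differs.
        int first  = pack(1, 0, 0);
        int second = pack(1, 0, 1);
        System.out.println(first != second);  // prints "true"
    }
}
```

Because the version and bucket id occupy the high-order bits, packed values still sort by bucket before statement, which is how the desired sort properties of ROW__ID can be retained.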





[jira] [Updated] (HIVE-16953) OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and end are in the same stripe

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16953:
--
Summary: OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split 
start and end are in the same stripe  (was: 
OrcRawRecordMerger.discoverOriginalKeyBounds)

> OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and 
> end are in the same stripe
> -
>
> Key: HIVE-16953
> URL: https://issues.apache.org/jira/browse/HIVE-16953
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>
> If getOffset() and getMaxOffset() are inside
> * the same stripe - in this case we have minKey & isTail=false but 
> rowLength is never set.
> I don't know if we can ever have a split like that.
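The failure mode can be illustrated with a deliberately simplified walk over stripe offsets. The `discover` helper below is a hypothetical reconstruction for illustration only, not the real OrcRawRecordMerger.discoverOriginalKeyBounds.

```java
import java.util.Arrays;

// Hypothetical simplification of the bounds-discovery walk: stripes that
// begin before the split start advance the min key, stripes that begin
// inside the split accumulate rowLength, and the first stripe at or past
// the split end clears isTail. When the split's start and end both fall
// inside one stripe, the middle branch never runs and rowLength stays 0.
public class KeyBoundsDemo {

    // Returns {stripesBeforeStart, rowLength, isTail ? 1 : 0}.
    static long[] discover(long[] stripeOffsets, long[] stripeRows,
                           long offset, long maxOffset) {
        long before = 0, rowLength = 0;
        boolean isTail = true;
        for (int i = 0; i < stripeOffsets.length; i++) {
            if (offset > stripeOffsets[i]) {
                before++;                   // stripe starts before the split
            } else if (maxOffset > stripeOffsets[i]) {
                rowLength += stripeRows[i]; // stripe starts inside the split
            } else {
                isTail = false;             // stripe starts at/after the end
                break;
            }
        }
        return new long[]{before, rowLength, isTail ? 1 : 0};
    }

    public static void main(String[] args) {
        long[] offsets = {0, 100, 200};
        long[] rows = {10, 10, 10};
        // Both split bounds fall inside the stripe starting at 100:
        // minKey advanced, isTail=false, but rowLength is never set.
        System.out.println(Arrays.toString(discover(offsets, rows, 110, 150)));
        // prints "[2, 0, 0]"
    }
}
```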





[jira] [Resolved] (HIVE-14947) Add support for Acid 2 in Merge

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-14947.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add support for Acid 2 in Merge
> ---
>
> Key: HIVE-14947
> URL: https://issues.apache.org/jira/browse/HIVE-14947
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 3.0.0
>
>
> HIVE-14035 etc. introduced a more efficient data layout for acid tables.
> Additional work is needed to support Merge for these tables.
> We need to make sure we generate unique ROW__IDs in each branch of the 
> multi-insert statement.  StatementId was introduced in HIVE-11030 but it's 
> not surfaced from the storage layer.  It needs to be made part of ROW__ID to 
> ensure unique ROW__IDs from concurrent writes within the same query.





[jira] [Commented] (HIVE-14947) Add support for Acid 2 in Merge

2017-07-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084884#comment-16084884
 ] 

Eugene Koifman commented on HIVE-14947:
---

fixed via HIVE-16832 

> Add support for Acid 2 in Merge
> ---
>
> Key: HIVE-14947
> URL: https://issues.apache.org/jira/browse/HIVE-14947
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 3.0.0
>
>
> HIVE-14035 etc. introduced a more efficient data layout for acid tables.
> Additional work is needed to support Merge for these tables.
> We need to make sure we generate unique ROW__IDs in each branch of the 
> multi-insert statement.  StatementId was introduced in HIVE-11030 but it's 
> not surfaced from the storage layer.  It needs to be made part of ROW__ID to 
> ensure unique ROW__IDs from concurrent writes within the same query.





[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
  Resolution: Fixed
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0  (was: 3.0.0, 2.4.0)
  Status: Resolved  (was: Patch Available)

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses and the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16979) Cache UGI for metastore

2017-07-12 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084860#comment-16084860
 ] 

Tao Li commented on HIVE-16979:
---

[~gopalv] Can you please take a look at the patch? Thanks!

> Cache UGI for metastore
> ---
>
> Key: HIVE-16979
> URL: https://issues.apache.org/jira/browse/HIVE-16979
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch, 
> HIVE-16979.3.patch
>
>
> FileSystem.closeAllForUGI is called per request against the metastore to 
> dispose of the UGI, which involves talking to the HDFS name node and is time 
> consuming. So the perf improvement would come from caching and reusing the UGI.
> Each FileSystem.closeAllForUGI call could take up to 20 ms of E2E latency 
> against HDFS. Usually a Hive query results in several calls against the 
> metastore, so we can save up to 50-100 ms per Hive query.
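The caching idea can be sketched generically: keep one expensive object per user and reuse it instead of creating and disposing it on every request. The class below is a hypothetical stand-in, not the actual patch or Hadoop's UserGroupInformation API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-user cache: the expensive create/dispose cycle (analogous
// to building a UGI and calling FileSystem.closeAllForUGI per request) is
// replaced by computeIfAbsent, so repeated requests for the same user reuse
// one cached instance.
public class PerUserCache<V> {
  private final Map<String, V> cache = new ConcurrentHashMap<>();
  private final Function<String, V> factory;

  public PerUserCache(Function<String, V> factory) { this.factory = factory; }

  public V get(String user) {
    return cache.computeIfAbsent(user, factory);  // create once, reuse after
  }

  public int size() { return cache.size(); }

  public static void main(String[] args) {
    PerUserCache<long[]> ugis = new PerUserCache<>(user -> new long[]{user.hashCode()});
    // The second lookup returns the cached instance, skipping the dispose cost.
    System.out.println(ugis.get("hive") == ugis.get("hive"));
  }
}
```

With roughly 20 ms saved per avoided dispose and several metastore calls per query, this is consistent with the 50-100 ms per-query estimate above. A real cache would also need an eviction/expiry policy so stale credentials do not live forever.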





[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084858#comment-16084858
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876914/HIVE-16832.22.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10888 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] 
(batchId=94)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5989/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876914 - PreCommit-HIVE-Build

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.  





[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17079:
-
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 3.0.0
>
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 





[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-12 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084817#comment-16084817
 ] 

Siddharth Seth commented on HIVE-16926:
---

bq.  Is there any action needed on this part?
I don't think there is, unless you see this as a problem for the running spark 
task. The number of threads created etc. is quite small afaik.

bq. Maybe I can just replace pendingClients/registeredClients with a single 
list and the RequestInfo can keep a state to show if the request is 
pending/running/etc.
That'll work as well. I think there are still two places which have similar code 
related to heartbeats - heartbeat / nodePinged.

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.
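Sharing the umbilical amounts to creating the server lazily once and reusing it across fragment requests. A hypothetical sketch of that pattern, with made-up names rather than the actual LlapTaskUmbilicalExternalClient code:

```java
import java.util.concurrent.atomic.AtomicInteger;

// One shared server instance for all fragment requests instead of one per
// request. Hypothetical class; it only illustrates the lazy shared-instance
// pattern the issue describes.
public class SharedUmbilical {
  private static SharedUmbilical instance;
  private final AtomicInteger fragments = new AtomicInteger();

  private SharedUmbilical() { /* bind the server socket once here */ }

  static synchronized SharedUmbilical get() {
    if (instance == null) {
      instance = new SharedUmbilical();  // created on first fragment only
    }
    return instance;
  }

  void registerFragment() { fragments.incrementAndGet(); }

  int fragmentCount() { return fragments.get(); }

  public static void main(String[] args) {
    SharedUmbilical first = SharedUmbilical.get();
    first.registerFragment();
    SharedUmbilical second = SharedUmbilical.get();
    second.registerFragment();
    // Both fragments went through the same server instance.
    System.out.println(first == second);
  }
}
```

A real implementation would also need shutdown/reference-counting so the shared server is closed once the last client goes away.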





[jira] [Updated] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-07-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16100:
---
Attachment: HIVE-16100.5.patch

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch, HIVE-16100.5.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 
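The fix implied by the description is to unlink only the FS operator rather than clearing every child of fsParent. A minimal sketch with stand-in operator classes (not Hive's real Operator API):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an operator DAG node; Hive's real Operator class is
// far richer, this only illustrates the unlink logic.
class Op {
  final String name;
  final List<Op> children = new ArrayList<>();
  Op(String name) { this.name = name; }
}

public class UnlinkDemo {
  // Buggy version (as in the quoted snippet): drops every child,
  // including any sibling SEL operators of fsOp.
  static void unlinkAll(Op fsParent) { fsParent.children.clear(); }

  // Fixed version: removes only the FS operator being re-parented,
  // leaving siblings attached.
  static void unlinkOnly(Op fsParent, Op fsOp) { fsParent.children.remove(fsOp); }

  public static void main(String[] args) {
    Op fsParent = new Op("fsParent");
    Op fs = new Op("FS");
    Op sel = new Op("SEL");
    fsParent.children.add(fs);
    fsParent.children.add(sel);

    unlinkOnly(fsParent, fs);
    // The sibling SEL survives the unlink.
    System.out.println(fsParent.children.size());
  }
}
```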





[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084770#comment-16084770
 ] 

Vineet Garg commented on HIVE-16793:


It does if gby keys are constant

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count check, though is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COM

[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)

2017-07-12 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084774#comment-16084774
 ] 

liyunzhang_intel commented on HIVE-17018:
-

[~csun]: 
{quote}
Yes. I think we don't need to change the existing behavior. I'm just suggesting 
that we might need a HoS specific config to replace 
hive.auto.convert.join.nonconditionaltask.size
{quote}
rename {{hive.auto.convert.join.noconditionaltask.size}} to 
{{hive.auto.convert.join.within.sparktask.size}}, and change the description of 
the configuration from
{noformat}
the sum of size for n-1 of the tables/partitions for a n-way join is smaller 
than it
{noformat}
to
{noformat}
the sum of size for n-1 of the tables/partitions for a n-way join is smaller 
than it in 1 MapTask or ReduceTask
{noformat}

Can you give some suggestions?


> Small table is converted to map join even the total size of small tables 
> exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
>  we use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the sum of the sizes of n-1 of the tables/partitions in an n-way join is 
> smaller than it, the join will be converted to a map join. For example, take 
> A join B join C join D join E, where the big table is A(100M) and the small 
> tables are B(10M), C(10M), D(10M), E(10M). If we set 
> hive.auto.convert.join.noconditionaltask.size=20M, the current code converts 
> E, D and B to map joins but not C. In my understanding, because 
> hive.auto.convert.join.noconditionaltask.size can only accommodate E and D, 
> neither C nor B should be converted to a map join.  
> Let's explain more why B can be converted to a map join.
> In the current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364]
>  calculates all the map joins in the parent path and child path. The search 
> stops when encountering a [UnionOperator or 
> ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381].
>  Because C is not converted to a map join ({{(connectedMapJoinSize + 
> totalSize) > maxSize}}, [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330]), the 
> RS before the join of C remains. When calculating whether B will be 
> converted to a map join, {{getConnectedMapJoinSize}} returns 0 on encountering 
> that [RS 
> |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409]
>  and so {{(connectedMapJoinSize + totalSize) < maxSize}} matches.
> [~xuefuz] or [~jxiang]: can you help see whether this is a bug or not  as you 
> are more familiar with SparkJoinOptimizer.
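The accounting the reporter expects can be sketched as a running total against the shared threshold. This is a hypothetical illustration of the behavior argued for above, not Hive's actual optimizer logic:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Greedy map-join conversion under one shared threshold: a small table is
// converted only while the running total of already-converted small tables
// plus its own size stays within the limit. With a 20M threshold and four
// 10M tables, only the first two qualify - matching the expectation that
// C and B should not be converted.
public class MapJoinBudget {
  static Map<String, Boolean> convert(Map<String, Long> smallTables, long maxSize) {
    Map<String, Boolean> decisions = new LinkedHashMap<>();
    long connected = 0;  // total size of small tables already converted
    for (Map.Entry<String, Long> e : smallTables.entrySet()) {
      boolean fits = connected + e.getValue() <= maxSize;
      if (fits) {
        connected += e.getValue();
      }
      decisions.put(e.getKey(), fits);
    }
    return decisions;
  }

  public static void main(String[] args) {
    Map<String, Long> tables = new LinkedHashMap<>();
    tables.put("E", 10L);
    tables.put("D", 10L);
    tables.put("C", 10L);
    tables.put("B", 10L);
    System.out.println(convert(tables, 20L));
  }
}
```

The reported bug is that the real search resets this running total to 0 when it hits a ReduceSink, so B is judged against an empty budget instead of the 20M already consumed by E and D.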





[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084767#comment-16084767
 ] 

Hive QA commented on HIVE-17079:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876908/HIVE-17079.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10874 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5988/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876908 - PreCommit-HIVE-Build

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN for work submission. We should enable it by default to 
> avoid DNS issues. 





[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084760#comment-16084760
 ] 

Gopal V commented on HIVE-16793:


Does enabling this optimization remove the cross-products triggered by the 
scalar sub-query?

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count check, though is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
> 

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Attachment: HIVE-16793.5.patch

Latest patch adds a config param {{hive.optimize.remove.sq_count_check}} to 
enable this optimization. Since this optimization caters to a very specific 
case but could have adverse effects (join reordering, joins not merging), we 
have decided to disable it by default.

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count check, though is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> 

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Open  (was: Patch Available)

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch
>
>
> This query has an sq_count check, though is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>   <-Select Operator

[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16793:
---
Status: Patch Available  (was: Open)

> Scalar sub-query: sq_count_check not required if gby keys are constant
> --
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
> Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, 
> HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch
>
>
> This query has an sq_count check, though is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>

[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084708#comment-16084708
 ] 

Adam Szita commented on HIVE-8838:
--

Thanks for reviewing [~spena], [~sushanth], [~aihuaxu] and committing!

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16732:
--
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

HIVE-16732.03-branch-2.patch committed to branch-2 (2.x).
Thanks Wei for the review.

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q





[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084644#comment-16084644
 ] 

Hive QA commented on HIVE-16732:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876901/HIVE-16732.03-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10585 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5987/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876901 - PreCommit-HIVE-Build

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q





[jira] [Comment Edited] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614
 ] 

Pengcheng Xiong edited comment on HIVE-16907 at 7/12/17 8:18 PM:
-

That is exactly what I am worried about: Hive may not fully support table names 
containing ".". Could you estimate the work needed to support this? Thanks.


was (Author: pxiong):
That is exactly what I am worrying about : Hive may not well support table name 
with ".". Could u evaluate the work that we need to do if we want to support 
this? Thanks.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  

[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614
 ] 

Pengcheng Xiong commented on HIVE-16907:


That is exactly what I am worried about: Hive may not fully support table names 
containing ".". Could you evaluate the work needed to support this? Thanks.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   |
> | compressed: false   
> 

[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084585#comment-16084585
 ] 

Sergio Peña commented on HIVE-8838:
---

Ah, so that's what happened! I tried to push it and got an error saying I had to 
update my local repo; when I updated it, I saw the patch was already there, 
which is why I was confused.

Anyway, no worries, thanks for the heads up.

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084566#comment-16084566
 ] 

Hive QA commented on HIVE-16922:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876876/HIVE-16922.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10874 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5986/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876876 - PreCommit-HIVE-Build

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)





[jira] [Commented] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084479#comment-16084479
 ] 

Hive QA commented on HIVE-17072:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876874/HIVE-17072.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] 
(batchId=94)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5985/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876874 - PreCommit-HIVE-Build

> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Attachments: HIVE-17072.1.patch
>
>
> When running the BeeLine tests parallel, the timeout is hardcoded in the 
> Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
> executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
> throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.
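One way to make it configurable (a sketch only; the property name `beeline.parallel.test.timeout.minutes` is hypothetical, not part of the actual patch) is to read the timeout from a system property, keeping the current hardcoded 10 minutes as the default:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelizedTimeout {
    // Hypothetical property name, for illustration only.
    static final String TIMEOUT_PROP = "beeline.parallel.test.timeout.minutes";

    // Shutdown timeout in minutes; defaults to the previously
    // hardcoded 10 minutes when the property is unset.
    static long timeoutMinutes() {
        return Long.getLong(TIMEOUT_PROP, 10L);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.shutdown();
        // Same call as in Parallelized.finished(), but configurable.
        executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
        System.out.println(timeoutMinutes());  // prints 10 when unset
    }
}
```

A test run could then override the timeout with `-Dbeeline.parallel.test.timeout.minutes=30` on the command line.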





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084462#comment-16084462
 ] 

Sahil Takiar commented on HIVE-17078:
-

If we are printing the child stdout/stderr to the Hive logs, do we also need 
to print them to Hive's stdout/stderr?

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, so that 
> a local task using too many resources cannot affect the Hive service. Currently, 
> the stdout and stderr output of the child process is printed to Hive's own 
> stdout/stderr, which carries no timestamps and is separate from the Hive 
> service logs. This makes it hard to troubleshoot problems in MapredLocalTasks.
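A minimal sketch of the idea (not the actual HIVE-17078 patch): prefix each line read from the child process with a timestamp before handing it to the logger, so child output can be correlated with the service log.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.time.LocalDateTime;

public class ChildOutputLogger {
    // Prefix a child-process output line with a timestamp; in the real
    // task this string would go to the Hive service log, e.g. LOG.info().
    static String format(String line) {
        return LocalDateTime.now() + " [child] " + line;
    }

    public static void main(String[] args) throws Exception {
        // Spawn a trivial child process and relay its stdout.
        Process p = new ProcessBuilder("echo", "hello").start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(format(line));  // stand-in for LOG.info
            }
        }
    }
}
```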





[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-12 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.17.patch

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, 
> HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.





[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode

2017-07-12 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16821:
---
Status: Open  (was: Patch Available)

Temporarily obsoleted by HIVE-17073

> Vectorization: support Explain Analyze in vectorized mode
> -
>
> Key: HIVE-16821
> URL: https://issues.apache.org/jira/browse/HIVE-16821
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Vectorization
>Affects Versions: 2.1.1, 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, 
> HIVE-16821.2.patch, HIVE-16821.3.patch
>
>
> Currently, to avoid a branch in the operator inner loop, the runtime stats 
> are only available in non-vectorized mode.





[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084438#comment-16084438
 ] 

Gopal V commented on HIVE-17079:


LGTM - +1

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN support for work submission. We should enable it by 
> default to avoid DNS issues.





[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17079:
-
Attachment: HIVE-17079.1.patch

[~sseth] can you please take a look? It's a small change.

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN support for work submission. We should enable it by 
> default to avoid DNS issues.





[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17079:
-
Status: Patch Available  (was: Open)

> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17079.1.patch
>
>
> HIVE-14624 added FQDN support for work submission. We should enable it by 
> default to avoid DNS issues.





[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.22.patch

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch, HIVE-16832.22.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030. Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular, _SortedDynPartitionOptimizer_ works without any changes even in 
> cases where there are fewer reducers than buckets.
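The packed triplet described above can be sketched as follows. The bit widths here are illustrative assumptions for the example, not Hive's actual BucketCodec layout:

```java
public class PackedBucketId {
    // Assumed layout for illustration (not Hive's actual encoding):
    // bits 28-31: format version, bits 16-27: bucketId, bits 0-15: statementId.
    static int encode(int version, int bucketId, int statementId) {
        return (version << 28) | (bucketId << 16) | statementId;
    }

    static int version(int packed)     { return (packed >>> 28) & 0xF;   }
    static int bucketId(int packed)    { return (packed >>> 16) & 0xFFF; }
    static int statementId(int packed) { return packed & 0xFFFF;         }

    public static void main(String[] args) {
        int packed = encode(1, 7, 3);
        System.out.println(version(packed) + " " + bucketId(packed)
                + " " + statementId(packed));  // prints "1 7 3"
    }
}
```

Because the version occupies the high bits and the bucket id sits above the statement id, two rows in the same bucket but from different insert clauses differ in the packed value, which is what prevents the ROW__ID collision.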





[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Fix Version/s: 3.0.0

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Fix For: 3.0.0
>
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8838:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks [~szita] for your contribution. I committed to master.

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084425#comment-16084425
 ] 

Vineet Garg commented on HIVE-17066:


Failures are not reproducible/un-related

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Fix For: 3.0.0
>
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join causing bad 
> estimates
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}
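For intuition, the estimation problem can be sketched outside Hive: a filter on the null-generating side of a left outer join should pass roughly the unmatched left-side rows, not a constant 1. A minimal illustration (the `matched` count is hypothetical, and this is not Hive's stats code):

```java
public class FilterEstimateSketch {
    // Naive cardinality estimate for a "col IS NULL" filter applied right
    // after a LEFT OUTER JOIN: left rows that found no match carry NULLs on
    // the right side, so the filter should pass roughly the unmatched rows,
    // rather than collapsing to a constant 1 row as in the plan above.
    static long isNullEstimate(long joinRows, long matchedRows) {
        return Math.max(joinRows - matchedRows, 1L);
    }

    public static void main(String[] args) {
        long joinRows = 71676270660L; // taken from the plan above
        long matched = 70000000000L;  // hypothetical match count
        System.out.println(isNullEstimate(joinRows, matched));
    }
}
```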





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084434#comment-16084434
 ] 

Sushanth Sowmyan commented on HIVE-8838:


([~spena], I just pushed to master too, hopefully our pushes don't conflict :D )

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Fix For: 3.0.0
>
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084432#comment-16084432
 ] 

Vineet Garg commented on HIVE-17066:


Pushed to master

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Assigned] (HIVE-17079) LLAP: Use FQDN by default for work submission

2017-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-17079:



> LLAP: Use FQDN by default for work submission
> -
>
> Key: HIVE-17079
> URL: https://issues.apache.org/jira/browse/HIVE-17079
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> HIVE-14624 added FQDN support for work submission. We should enable it by 
> default to avoid DNS issues. 





[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8838:
--
Issue Type: New Feature  (was: Bug)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: New Feature
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, 
> HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> We should now have bucket files 01_0 and 01_0_copy_1,
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0, thus generating 
> duplicate IDs.
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid recognize copy_N files, but 
> this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because the compactor doesn't handle copy_N files either (it skips them)
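The numbering bug and a fix can be illustrated outside Hive: keep one running rowid counter per bucket across all copy_N files instead of restarting at 0 in every file, so each original row gets a unique (bucket, rowid). The file names and row counts below are illustrative, not Hive's actual reader logic:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CopyNRowIds {
    // Compute the starting rowid for each original file. The bug described
    // above restarts numbering at 0 per file, so 01_0 and 01_0_copy_1 both
    // yield rowid 0 in bucket 1; here the counter continues across copies.
    static Map<String, Long> firstRowIds(List<String> files, Map<String, Integer> rowsPerFile) {
        Map<Integer, Long> nextRowId = new HashMap<>();       // bucket -> next free rowid
        Map<String, Long> firstRowId = new LinkedHashMap<>(); // file -> starting rowid
        for (String f : files) {
            int bucket = Integer.parseInt(f.split("_")[0]);   // crude name parse (assumption)
            long start = nextRowId.getOrDefault(bucket, 0L);
            firstRowId.put(f, start);
            nextRowId.put(bucket, start + rowsPerFile.get(f));
        }
        return firstRowId;
    }
}
```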





[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084360#comment-16084360
 ] 

Eugene Koifman commented on HIVE-16177:
---

HIVE-16177.20-branch-2.patch committed to branch-2 (2.x)


> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, 
> HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> We should now have bucket files 01_0 and 01_0_copy_1,
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files and numbers rows in each bucket from 0, thus generating 
> duplicate IDs.
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid recognize copy_N files, but 
> this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because the compactor doesn't handle copy_N files either (it skips them)





[jira] [Updated] (HIVE-16812) VectorizedOrcAcidRowBatchReader doesn't filter delete events

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16812:
--
Priority: Critical  (was: Major)

> VectorizedOrcAcidRowBatchReader doesn't filter delete events
> 
>
> Key: HIVE-16812
> URL: https://issues.apache.org/jira/browse/HIVE-16812
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
> the c'tor of VectorizedOrcAcidRowBatchReader has
> {noformat}
> // Clone readerOptions for deleteEvents.
> Reader.Options deleteEventReaderOptions = readerOptions.clone();
> // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX 
> because
> // we always want to read all the delete delta files.
> deleteEventReaderOptions.range(0, Long.MAX_VALUE);
> {noformat}
> This is suboptimal since base and deltas are sorted by ROW__ID.  So for each 
> split of the base we can find the min/max ROW__ID and only load events from 
> the deltas that are in the [min,max] range.  This will reduce the number of 
> delete events we load in memory (to no more than there are in the split).
> When we support sorting on PK, the same should apply, but we'd need to make 
> sure to store PKs in the ORC index.
> See OrcRawRecordMerger.discoverKeyBounds()
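The proposed pruning can be sketched with ROW__IDs simplified to plain longs (a toy model of the idea, not OrcRawRecordMerger's real key handling, where keys are (txnid, bucket, rowid) triples):

```java
import java.util.List;
import java.util.stream.Collectors;

public class DeleteEventPruning {
    // Instead of reading delete deltas with range(0, Long.MAX_VALUE), keep
    // only the delete events whose key falls inside the [min, max] ROW__ID
    // bounds of the current base split.
    static List<Long> prune(List<Long> deleteEventKeys, long min, long max) {
        return deleteEventKeys.stream()
                .filter(k -> k >= min && k <= max)
                .collect(Collectors.toList());
    }
}
```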





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084376#comment-16084376
 ] 

Hive QA commented on HIVE-4577:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876871/HIVE-4577.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10841 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] 
(batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dfscmd] (batchId=33)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5984/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876871 - PreCommit-HIVE-Build

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> As designed, Hive supports hadoop dfs commands in the hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop if the path contains spaces and quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
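The expected behavior amounts to shell-style tokenization that strips quotes instead of passing them through to HDFS. A minimal sketch of such a splitter (not Hive's actual CLI code):

```java
import java.util.ArrayList;
import java.util.List;

public class DfsArgSplitter {
    // Split a dfs command line on whitespace while honoring single and double
    // quotes, so `dfs -mkdir "bei jing"` yields one argument `bei jing`
    // instead of the two literal tokens `"bei` and `jing"` shown above.
    static List<String> split(String cmd) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;          // active quote char, 0 if none
        boolean inToken = false;
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0;   // closing quote: drop it
                else cur.append(c);          // quoted chars (incl. spaces) kept
            } else if (c == '"' || c == '\'') {
                quote = c;
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) { args.add(cur.toString()); cur.setLength(0); inToken = false; }
            } else {
                cur.append(c);
                inToken = true;
            }
        }
        if (inToken) args.add(cur.toString());
        return args;
    }
}
```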





[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16732:
--
Attachment: HIVE-16732.03-branch-2.patch

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q





[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084287#comment-16084287
 ] 

Hive QA commented on HIVE-17078:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876861/HIVE-17078.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226)
org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5983/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876861 - PreCommit-HIVE-Build

> Add more logs to MapredLocalTask
> 
>
> Key: HIVE-17078
> URL: https://issues.apache.org/jira/browse/HIVE-17078
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yibing Shi
>Assignee: Yibing Shi
>Priority: Minor
> Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch
>
>
> By default, {{MapredLocalTask}} is executed in a child process of Hive, in 
> case the local task uses too many resources, which may affect Hive. Currently, 
> the stdout and stderr output of the child process is printed in Hive's 
> stdout/stderr log, which doesn't have timestamp information and is 
> separated from the Hive service logs. This makes it hard to troubleshoot 
> problems in MapredLocalTasks.





[jira] [Assigned] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned HIVE-16999:
--

Assignee: Bing Li

> Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource
> 
>
> Key: HIVE-16999
> URL: https://issues.apache.org/jira/browse/HIVE-16999
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Sailee Jain
>Assignee: Bing Li
>Priority: Critical
>
> A performance bottleneck is found when adding a resource [which resides on 
> HDFS] to the distributed cache. 
> The commands used are:
> {code:java}
> 1. ADD ARCHIVE "hdfs://some_dir/archive.tar"
> 2. ADD FILE "hdfs://some_dir/file.txt"
> {code}
> Here is the log corresponding to the archive adding operation:-
> {noformat}
>  converting to local hdfs://some_dir/archive.tar
>  Added resources: [hdfs://some_dir/archive.tar
> {noformat}
> Hive is downloading the resource to the local filesystem [shown in the log by 
> "converting to local"]. 
> {color:#d04437}Ideally there is no need to bring the file to the local 
> filesystem when this operation is all about copying the file from one 
> location on HDFS to another location on HDFS [the distributed cache].{color}
> This adds a lot of overhead when the resource is a big file 
> and all commands need the same resource.
> After debugging, the impacted piece of code was found to be:
> {code:java}
> public List<String> add_resources(ResourceType t, Collection<String> values, 
> boolean convertToUnix)
>   throws RuntimeException {
> Set<String> resourceSet = resourceMaps.getResourceSet(t);
> Map<String, List<String>> resourcePathMap = 
> resourceMaps.getResourcePathMap(t);
> Map<String, List<String>> reverseResourcePathMap = 
> resourceMaps.getReverseResourcePathMap(t);
> List<String> localized = new ArrayList<String>();
> try {
>   for (String value : values) {
> String key;
>  {color:#d04437}// get the local path of downloaded jars{color}
> List<URI> downloadedURLs = resolveAndDownload(t, value, 
> convertToUnix);
>  ...
>   ...
> {code}
> {code:java}
>   List<URI> resolveAndDownload(ResourceType t, String value, boolean 
> convertToUnix) throws URISyntaxException,
>   IOException {
> URI uri = createURI(value);
> if (getURLType(value).equals("file")) {
>   return Arrays.asList(uri);
> } else if (getURLType(value).equals("ivy")) {
>   return dependencyResolver.downloadDependencies(uri);
> } else { // goes here for HDFS
>   return Arrays.asList(createURI(downloadResource(value, 
> convertToUnix))); // Here when the resource is not local it will download it 
> to the local machine.
> }
>   }
> {code}
> Here, the function resolveAndDownload() always calls the downloadResource() 
> API for an external filesystem. It should take into account that when the 
> resource is on the same HDFS, bringing it to the local machine is an 
> unnecessary step and can be skipped for better performance.
> Thanks,
> Sailee
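The suggested check can be sketched as a scheme/authority comparison against the cluster's default filesystem (`defaultFs` stands in for the fs.defaultFS setting; this is an illustration, not the actual SessionState code):

```java
import java.net.URI;
import java.util.Objects;

public class ResourceLocalization {
    // Only download ("localize") a resource when it is NOT already on the
    // cluster's default filesystem; an HDFS-to-HDFS copy needs no local hop.
    static boolean needsLocalization(URI resource, URI defaultFs) {
        String scheme = resource.getScheme();
        if (scheme == null || scheme.equals("file")) {
            return false; // already on the local filesystem
        }
        boolean sameFs = scheme.equals(defaultFs.getScheme())
                && Objects.equals(resource.getAuthority(), defaultFs.getAuthority());
        return !sameFs; // same HDFS: skip the download-to-local step
    }
}
```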





[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084245#comment-16084245
 ] 

Bing Li commented on HIVE-16907:


[~pxiong] and [~lirui], thank you for your comments.
I tried the CREATE TABLE statement in MySQL, and found that it treats `db.tbl` 
as the table name; a "dot" is allowed in the table name. 
e.g.

{code:java}
mysql> create table xxx (col int);
mysql> create table test.yyy (col int);
mysql> create table `test.zzz` (col int);
mysql> create table `test.test.tbl` (col int);

mysql> show tables;
++
| Tables_in_test |
++
| test.test.tbl  |
| test.zzz   |
| xxx|
| yyy|
++
{code}

Back to Hive: if we would like it to have the same behavior as MySQL, we 
should change the logic for processing table names.
My previous patch is NOT enough and can't handle `db.db.tbl` either.
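The MySQL-like interpretation described above can be sketched as follows (a hypothetical helper, not Hive's parser):

```java
public class QuotedTableName {
    // A fully backquoted identifier is a single table name (dots included)
    // in the current database, while an unquoted db.tbl splits into
    // database + table. Returns {database, table}.
    static String[] resolve(String name, String currentDb) {
        if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
            return new String[] { currentDb, name.substring(1, name.length() - 1) };
        }
        int dot = name.indexOf('.');
        if (dot >= 0) {
            return new String[] { name.substring(0, dot), name.substring(dot + 1) };
        }
        return new String[] { currentDb, name };
    }
}
```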

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  

[jira] [Updated] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-13384:
---
Description: 
I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
But I found that it can't create a HiveMetaStoreClient object successfully via 
a proxy user in a Kerberos env.

===
15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
==

When debugging Hive, I found that the error came from the open() method in the 
HiveMetaStoreClient class.

Around line 406,
 transport = UserGroupInformation.getCurrentUser().doAs(new 
PrivilegedExceptionAction<TTransport>() {  //FAILED, because the current user 
doesn't have the credential

But it will work if I change the above line to
 transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
PrivilegedExceptionAction<TTransport>() {  //PASS

I found that DRILL-3413 fixes this error on the Drill side as a workaround. But 
if I submit a mapreduce job via Pig/HCatalog, it runs into the same issue again 
when initializing the object via HCatalog.

It would be better to fix this issue on the Hive side.
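The behavior described above can be modeled in plain Java: when the current identity is a proxy user without Kerberos credentials of its own, the connection must be opened as the real user that holds the TGT. This is a stand-in for UserGroupInformation's getCurrentUser()/getRealUser()/doAs(), for illustration only:

```java
public class ProxyUserSketch {
    // Simplified user model: a proxy user has no TGT and points at a real user.
    static final class User {
        final String name;
        final boolean hasTgt;
        final User realUser; // non-null for proxy users
        User(String name, boolean hasTgt, User realUser) {
            this.name = name;
            this.hasTgt = hasTgt;
            this.realUser = realUser;
        }
    }

    // Choose the identity to authenticate with: a proxy without a TGT fails
    // the GSS handshake, so fall back to its real user's credentials.
    static User authIdentity(User current) {
        if (!current.hasTgt && current.realUser != null) {
            return current.realUser;
        }
        return current;
    }
}
```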

  was:
I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
But found that it can't new a HiveMetaStoreClient object successfully via a 
proxy using in Kerberos env.

===
15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
==

When I debugging on Hive, I found that the error came from open() method in 
HiveMetaStoreClient class.

Around line 406,
 transport = UserGroupInformation.getCurrentUser().doAs(new 
PrivilegedExceptionAction() {  //FAILED, because the current user 
doesn't have the cridential

But it will work if I change above line to
 transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
PrivilegedExceptionAction() {  //PASS

I found DRILL-3413 fixes this error in Drill side as a workaround. But if I 
submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when 
initialize the object via HCatalog.

It would be better to fix this issue in Hive side.


> Failed to create HiveMetaStoreClient object with proxy user when Kerberos 
> enabled
> -
>
> Key: HIVE-13384
> URL: https://issues.apache.org/jira/browse/HIVE-13384
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>
> I wrote a Java client to talk with HiveMetaStore. (Hive 1.2.0)
> But I found that it can't create a HiveMetaStoreClient object successfully 
> via a proxy user in a Kerberos env.
> ===
> 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
> ==
> When debugging Hive, I found that the error came from the open() method in 
> the HiveMetaStoreClient class.
> Around line 406,
>  transport = UserGroupInformation.getCurrentUser().doAs(new 
> PrivilegedExceptionAction<TTransport>() {  //FAILED, because the current user 
> doesn't have the credential
> But it will work if I change the above line to
>  transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new 
> PrivilegedExceptionAction<TTransport>() {  //PASS
> I found that DRILL-3413 fixes this error on the Drill side as a workaround. 
> But if I submit a mapreduce job via Pig/HCatalog, it runs into the same issue 
> again when initializing the object

[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084226#comment-16084226
 ] 

Adam Szita commented on HIVE-8838:
--

The test results above are irrelevant again - I think this is ready for commit

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084197#comment-16084197
 ] 

Hive QA commented on HIVE-8838:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876860/HIVE-8838.4.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10873 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5982/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876860 - PreCommit-HIVE-Build

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch, HIVE-8838.4.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-12 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17073:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Fixed TestVectorSelectOperator and pushed to master, thanks for reviewing 
[~mmccline]!

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.03.patch, HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to be 
> re-initialized after the batch has been forwarded to each output.
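The stale-state effect can be illustrated with a minimal sketch. This is a hypothetical model of a mutable row batch, not Hive's actual VectorizedRowBatch API: a vectorized filter narrows the batch in place, so forwarding the same batch object to a second output without re-initializing it makes that output operate on the already-narrowed state.

```java
public class SharedBatchSketch {
    // Hypothetical mutable batch: a size and a selected-rows view that a
    // consumer can narrow in place (loosely modeled on vectorized execution).
    public static class Batch {
        public int size;
        public int[] selected;
        public boolean selectedInUse;
        public Batch(int size) { this.size = size; this.selected = new int[size]; }
        // Re-initialize before forwarding the batch to the next output.
        public void reset(int originalSize) {
            this.size = originalSize;
            this.selectedInUse = false;
        }
    }

    // A filter that narrows the batch in place, like a vectorized predicate.
    public static int filterEven(Batch b) {
        int n = 0;
        for (int i = 0; i < b.size; i++) {
            if (i % 2 == 0) b.selected[n++] = i;
        }
        b.size = n;
        b.selectedInUse = true;
        return n;
    }

    public static void main(String[] args) {
        Batch b = new Batch(10);
        int first = filterEven(b);   // first output sees 5 of 10 rows
        // Without a reset, the second output starts from the narrowed
        // batch (5 rows) instead of the original 10 -- the bug pattern.
        int wrong = filterEven(b);   // 3 rows: operating on stale state
        b.reset(10);
        int second = filterEven(b);  // 5 rows again after re-initialization
        System.out.println(first + " " + wrong + " " + second); // prints 5 3 5
    }
}
```

The fix in the patch follows the same idea: restore the batch's shared mutable state between forwards so every output observes the original rows.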



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests

2017-07-12 Thread Marta Kuczora (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-17072:
-
Status: Patch Available  (was: Open)

> Make the parallelized timeout configurable in BeeLine tests
> ---
>
> Key: HIVE-17072
> URL: https://issues.apache.org/jira/browse/HIVE-17072
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Minor
> Attachments: HIVE-17072.1.patch
>
>
> When running the BeeLine tests parallel, the timeout is hardcoded in the 
> Parallelized.java:
> {noformat}
> @Override
> public void finished() {
>   executor.shutdown();
>   try {
> executor.awaitTermination(10, TimeUnit.MINUTES);
>   } catch (InterruptedException exc) {
> throw new RuntimeException(exc);
>   }
> }
> {noformat}
> It would be better to make it configurable.
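One straightforward way to make the timeout configurable is to read it from a system property with the current value as the default. The sketch below is illustrative only; the property name `test.beeline.parallel.timeout.minutes` is invented here, and the actual patch may use a different key or mechanism.

```java
public class ConfigurableTimeout {
    // Hypothetical property name, for illustration only.
    public static final String PROP = "test.beeline.parallel.timeout.minutes";
    public static final long DEFAULT_MINUTES = 10;

    // Returns the configured timeout in minutes, falling back to the
    // previous hardcoded value when the property is absent or malformed.
    public static long timeoutMinutes() {
        String v = System.getProperty(PROP);
        if (v == null) {
            return DEFAULT_MINUTES;
        }
        try {
            return Long.parseLong(v);
        } catch (NumberFormatException e) {
            return DEFAULT_MINUTES;
        }
    }

    public static void main(String[] args) {
        System.out.println(timeoutMinutes());   // 10 by default
        System.setProperty(PROP, "30");
        System.out.println(timeoutMinutes());   // 30 once overridden
    }
}
```

The call site would then become `executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES)`, so slow parallel BeeLine runs can raise the limit without a code change.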



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084183#comment-16084183
 ] 

Bing Li commented on HIVE-16922:


Thank you, [~lirui]. It seems the result page has expired. I have just 
re-submitted the patch to check.

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: HIVE-16922.2.patch

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-12 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16922:
---
Attachment: (was: HIVE-16922.2.patch)

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-12 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084174#comment-16084174
 ] 

Bing Li commented on HIVE-4577:
---

Thank you, [~vgumashta]. I could reproduce TestPerfCliDriver [query14] in my 
env, and updated its golden file. The failures of 
TestMiniLlapLocalCliDriver[vector_if_expr] and 
TestBeeLineDriver[materialized_view_create_rewrite] should not be caused by this 
patch.

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch
>
>
> As designed, Hive supports hadoop dfs commands in the hive shell, such as 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
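The behavior above suggests the CLI splits the dfs arguments on raw whitespace and passes the quote characters through literally. A fix would need to tokenize the way a shell does: treat quoted runs as single tokens and strip the quotes. Below is a minimal quote-aware splitter sketch, illustrative only and not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplitter {
    // Split a command line on whitespace, treating single- or double-quoted
    // runs as one token and stripping the quotes, similar to a shell.
    public static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;           // the quote char we are currently inside, or 0
        boolean inToken = false;
        for (char c : line.toCharArray()) {
            if (quote != 0) {
                // Inside quotes: whitespace is part of the token.
                if (c == quote) quote = 0; else cur.append(c);
                continue;
            }
            if (c == '"' || c == '\'') { quote = c; inToken = true; continue; }
            if (Character.isWhitespace(c)) {
                if (inToken) { tokens.add(cur.toString()); cur.setLength(0); inToken = false; }
                continue;
            }
            cur.append(c);
            inToken = true;
        }
        if (inToken) tokens.add(cur.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // "bei jing" stays one token and loses its quotes, matching hadoop.
        System.out.println(split("-mkdir \"bei jing\""));  // [-mkdir, bei jing]
    }
}
```

With splitting like this, `dfs -mkdir "bei jing";` would create a single directory named `bei jing`, matching the behavior of the hadoop shell.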



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

