[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979736#comment-15979736
 ] 

Josh Elser commented on HIVE-15795:
---

Lefty, yes, Mike had pinged me offline. He's putting something together, and I 
still have write karma, as far as I know.

I'll try to take a look at the test failures tomorrow. They passed for me 
locally, so hopefully it's just something minor.

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch
>
>
> Ability to specify an Accumulo index table for an Accumulo-backed Hive table.
> This would greatly improve performance for non-rowid query predicates.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16497) FileUtils.isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file system operations should be impersonated

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979733#comment-15979733
 ] 

Hive QA commented on HIVE-16497:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864459/HIVE-16497.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10622 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_add_partition]
 (batchId=87)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_alterpart_loc]
 (batchId=87)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_create_table1]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_create_table_ext]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_createdb]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_index]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_insert]
 (batchId=87)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_insert_local]
 (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_load_data]
 (batchId=87)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=236)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4837/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4837/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4837/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864459 - PreCommit-HIVE-Build

> FileUtils.isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file 
> system operations should be impersonated
> --
>
> Key: HIVE-16497
> URL: https://issues.apache.org/jira/browse/HIVE-16497
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 3.0.0
>
> Attachments: HIVE-16497.1.patch
>
>
> FileUtils.isActionPermittedForFileHierarchy checks if the user has permissions 
> for a given action. The checks are made by impersonating the user.
> However, the listing of child dirs is done as the hiveserver2 user. If the 
> hive user doesn't have permissions on the filesystem, it gives the incorrect 
> error that the user doesn't have permissions to perform the action.
> Impersonating the end user for all file operations in that function is also 
> the logically correct thing to do.
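A minimal sketch of the impersonated listing the description calls for, using Hadoop's UserGroupInformation proxy-user API; the helper and its surrounding class are illustrative, not the actual patch (which would apply the same pattern inside FileUtils itself):

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonatedListSketch {
  // Lists the children of 'path' as 'user' rather than as the hiveserver2 user.
  static FileStatus[] listAsUser(final Configuration conf, final Path path, String user)
      throws Exception {
    UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
    return proxyUser.doAs(new PrivilegedExceptionAction<FileStatus[]>() {
      @Override
      public FileStatus[] run() throws Exception {
        // Obtain the FileSystem inside doAs so the RPCs carry the proxy user's identity.
        FileSystem fs = path.getFileSystem(conf);
        return fs.listStatus(path);
      }
    });
  }
}
{code}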



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16296) use LLAP executor count to configure reducer auto-parallelism

2017-04-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979732#comment-15979732
 ] 

Lefty Leverenz commented on HIVE-16296:
---

Doc note:  This adds *hive.tez.llap.min.reducer.per.executor* to HiveConf.java, 
so it needs to be documented in the Tez section of Configuration Properties 
with a cross-reference in the LLAP section.

* [Configuration Properties -- Tez | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]
* [Configuration Properties -- LLAP | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAP]

Added a TODOC3.0 label.

> use LLAP executor count to configure reducer auto-parallelism
> -
>
> Key: HIVE-16296
> URL: https://issues.apache.org/jira/browse/HIVE-16296
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16296.01.patch, HIVE-16296.03.patch, 
> HIVE-16296.04.patch, HIVE-16296.05.patch, HIVE-16296.06.patch, 
> HIVE-16296.07.patch, HIVE-16296.08.patch, HIVE-16296.09.patch, 
> HIVE-16296.10.patch, HIVE-16296.10.patch, HIVE-16296.11.patch, 
> HIVE-16296.12.patch, HIVE-16296.2.patch, HIVE-16296.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16296) use LLAP executor count to configure reducer auto-parallelism

2017-04-21 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16296:
--
Labels: TODOC3.0  (was: )

> use LLAP executor count to configure reducer auto-parallelism
> -
>
> Key: HIVE-16296
> URL: https://issues.apache.org/jira/browse/HIVE-16296
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16296.01.patch, HIVE-16296.03.patch, 
> HIVE-16296.04.patch, HIVE-16296.05.patch, HIVE-16296.06.patch, 
> HIVE-16296.07.patch, HIVE-16296.08.patch, HIVE-16296.09.patch, 
> HIVE-16296.10.patch, HIVE-16296.10.patch, HIVE-16296.11.patch, 
> HIVE-16296.12.patch, HIVE-16296.2.patch, HIVE-16296.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16058) Disable falling back to non-cbo for SemanticException for tests

2017-04-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16058:

Status: Patch Available  (was: Open)

> Disable falling back to non-cbo for SemanticException for tests
> ---
>
> Key: HIVE-16058
> URL: https://issues.apache.org/jira/browse/HIVE-16058
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16058.1.patch, HIVE-16058.2.patch, 
> HIVE-16058.2.patch, HIVE-16058.3.patch
>
>
> Currently the optimizer falls back to the non-cbo path if the cbo path throws 
> an exception of type SemanticException. This might be eclipsing some genuine 
> issues within the cbo path.
> We would like to turn off the fallback mechanism for tests to see if there 
> are indeed genuine issues/bugs within the cbo path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16058) Disable falling back to non-cbo for SemanticException for tests

2017-04-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16058:

Status: Open  (was: Patch Available)

> Disable falling back to non-cbo for SemanticException for tests
> ---
>
> Key: HIVE-16058
> URL: https://issues.apache.org/jira/browse/HIVE-16058
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16058.1.patch, HIVE-16058.2.patch, 
> HIVE-16058.2.patch, HIVE-16058.3.patch
>
>
> Currently the optimizer falls back to the non-cbo path if the cbo path throws 
> an exception of type SemanticException. This might be eclipsing some genuine 
> issues within the cbo path.
> We would like to turn off the fallback mechanism for tests to see if there 
> are indeed genuine issues/bugs within the cbo path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16058) Disable falling back to non-cbo for SemanticException for tests

2017-04-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16058:

Attachment: HIVE-16058.3.patch

> Disable falling back to non-cbo for SemanticException for tests
> ---
>
> Key: HIVE-16058
> URL: https://issues.apache.org/jira/browse/HIVE-16058
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16058.1.patch, HIVE-16058.2.patch, 
> HIVE-16058.2.patch, HIVE-16058.3.patch
>
>
> Currently the optimizer falls back to the non-cbo path if the cbo path throws 
> an exception of type SemanticException. This might be eclipsing some genuine 
> issues within the cbo path.
> We would like to turn off the fallback mechanism for tests to see if there 
> are indeed genuine issues/bugs within the cbo path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15982) Support the width_bucket function

2017-04-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979716#comment-15979716
 ] 

Lefty Leverenz commented on HIVE-15982:
---

Doc note:  The width_bucket function needs to be documented in the UDFs wiki.  
I'm not sure which section it belongs in -- perhaps UDAFs.

* [Hive Operators and UDFs | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF]

Added a TODOC3.0 label.

> Support the width_bucket function
> -
>
> Key: HIVE-15982
> URL: https://issues.apache.org/jira/browse/HIVE-15982
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Sahil Takiar
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-15982.1.patch, HIVE-15982.2.patch, 
> HIVE-15982.3.patch, HIVE-15982.4.patch, HIVE-15982.5.patch, HIVE-15982.6.patch
>
>
> Support the width_bucket(wbo, wbb1, wbb2, wbc) function, which returns an 
> integer between 0 and wbc+1 by mapping wbo into the ith equally sized bucket 
> made by dividing the range between wbb1 and wbb2 into wbc equally sized 
> regions. If wbo < wbb1, return 0; if wbo > wbb2, return wbc+1. Reference: SQL 
> standard section 4.4.
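For illustration, a minimal Java sketch of the semantics described above, covering only the wbb1 < wbb2 case (the reversed-bounds case from the standard, and the actual Hive UDF plumbing, are omitted):

{code}
public class WidthBucketSketch {
  // width_bucket(wbo, wbb1, wbb2, wbc) for wbb1 < wbb2, per SQL standard 4.4:
  // 0 if wbo falls below the range, wbc+1 if above, otherwise the 1-based bucket.
  static long widthBucket(double wbo, double wbb1, double wbb2, long wbc) {
    if (wbo < wbb1) {
      return 0;
    }
    if (wbo >= wbb2) {
      return wbc + 1;
    }
    // Buckets partition [wbb1, wbb2) into wbc equally sized regions.
    return (long) Math.floor(wbc * (wbo - wbb1) / (wbb2 - wbb1)) + 1;
  }
}
{code}

For example, widthBucket(5.3, 0.2, 10.6, 5) returns 3, because 5.3 falls in the third of the five equal buckets between 0.2 and 10.6.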



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979712#comment-15979712
 ] 

Hive QA commented on HIVE-15665:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864574/HIVE-15665.wo.16233.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4836/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4836/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4836/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-22 02:24:04.440
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4836/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-22 02:24:04.442
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java.orig
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-22 02:24:09.802
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestLowLevelCacheImpl.java:394
error: 
llap-server/src/test/org/apache/hadoop/hive/llap/cache/TestLowLevelCacheImpl.java:
 patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864574 - PreCommit-HIVE-Build

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.04.patch, HIVE-15665.patch, 
> HIVE-15665.wo.16233.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16441) De-duplicate semijoin branches in n-way joins

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979706#comment-15979706
 ] 

Hive QA commented on HIVE-16441:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864564/HIVE-16441.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10626 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[semijoin_hint]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4835/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4835/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4835/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864564 - PreCommit-HIVE-Build

> De-duplicate semijoin branches in n-way joins
> -
>
> Key: HIVE-16441
> URL: https://issues.apache.org/jira/browse/HIVE-16441
> Project: Hive
>  Issue Type: Improvement
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16441.1.patch, HIVE-16441.2.patch, 
> HIVE-16441.3.patch
>
>
> Currently in n-way joins, the semi join optimization creates n branches on 
> the same key. Instead it should reuse one branch for all the joins.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15982) Support the width_bucket function

2017-04-21 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-15982:
--
Labels: TODOC3.0  (was: )

> Support the width_bucket function
> -
>
> Key: HIVE-15982
> URL: https://issues.apache.org/jira/browse/HIVE-15982
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Sahil Takiar
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-15982.1.patch, HIVE-15982.2.patch, 
> HIVE-15982.3.patch, HIVE-15982.4.patch, HIVE-15982.5.patch, HIVE-15982.6.patch
>
>
> Support the width_bucket(wbo, wbb1, wbb2, wbc) function, which returns an 
> integer between 0 and wbc+1 by mapping wbo into the ith equally sized bucket 
> made by dividing the range between wbb1 and wbb2 into wbc equally sized 
> regions. If wbo < wbb1, return 0; if wbo > wbb2, return wbc+1. Reference: SQL 
> standard section 4.4.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16392) Remove hive.warehouse.subdir.inherit.perms and all permissions inheritance logic

2017-04-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979693#comment-15979693
 ] 

Lefty Leverenz commented on HIVE-16392:
---

Doc note:  The removal of *hive.warehouse.subdir.inherit.perms* needs to be 
documented in the wiki.  It would be good to mention the use of external 
security systems and include Ashutosh's comment "user may also choose 
SQLStdAuth which ships with Hive if user doesn't want to use external security 
system."

Also, the wiki says that when *hive.files.umask.value* was removed in 0.9.0, it 
was replaced by *hive.warehouse.subdir.inherit.perms* -- perhaps that should be 
updated too.

* [Configuration Properties -- hive.warehouse.subdir.inherit.perms | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.warehouse.subdir.inherit.perms]
* [Configuration Properties -- hive.files.umask.value | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.files.umask.value]

Added a TODOC3.0 label.

> Remove hive.warehouse.subdir.inherit.perms and all permissions inheritance 
> logic
> 
>
> Key: HIVE-16392
> URL: https://issues.apache.org/jira/browse/HIVE-16392
> Project: Hive
>  Issue Type: Task
>  Components: Security
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC3.0, backwards-incompatible
> Fix For: 3.0.0
>
> Attachments: HIVE-16392.1.patch, HIVE-16392.2.patch
>
>
> As discussed in HIVE-16346 we should remove the config 
> {{hive.warehouse.subdir.inherit.perms}} and all the permissions inheritance 
> logic.
> This feature is no longer needed in Hive as the traditional permission model 
> has largely been replaced by external security systems such as Ranger and 
> Sentry.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16392) Remove hive.warehouse.subdir.inherit.perms and all permissions inheritance logic

2017-04-21 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16392:
--
Labels: TODOC3.0 backwards-incompatible  (was: backwards-incompatible)

> Remove hive.warehouse.subdir.inherit.perms and all permissions inheritance 
> logic
> 
>
> Key: HIVE-16392
> URL: https://issues.apache.org/jira/browse/HIVE-16392
> Project: Hive
>  Issue Type: Task
>  Components: Security
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC3.0, backwards-incompatible
> Fix For: 3.0.0
>
> Attachments: HIVE-16392.1.patch, HIVE-16392.2.patch
>
>
> As discussed in HIVE-16346 we should remove the config 
> {{hive.warehouse.subdir.inherit.perms}} and all the permissions inheritance 
> logic.
> This feature is no longer needed in Hive as the traditional permission model 
> has largely been replaced by external security systems such as Ranger and 
> Sentry.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979688#comment-15979688
 ] 

Ashutosh Chauhan commented on HIVE-15795:
-

[~sershe] accumulo_index.q test is failing after this commit. Please take a 
look?

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch
>
>
> Ability to specify an Accumulo index table for an Accumulo-backed Hive table.
> This would greatly improve performance for non-rowid query predicates.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16239) remove useless hiveserver

2017-04-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979681#comment-15979681
 ] 

Lefty Leverenz commented on HIVE-16239:
---

I added the fix versions.

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Fix For: 2.0.2, 2.1.2
>
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch, HIVE-16239.2-branch-2.0.patch, 
> HIVE-16239.2-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> hiveserver does not exist; we should remove hiveserver from the CLI on branch-2.0.
> After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16239) remove useless hiveserver

2017-04-21 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16239:
--
Fix Version/s: 2.1.2
   2.0.2

> remove useless hiveserver
> -
>
> Key: HIVE-16239
> URL: https://issues.apache.org/jira/browse/HIVE-16239
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.0.1, 2.1.1
>Reporter: Fei Hui
>Assignee: Fei Hui
> Fix For: 2.0.2, 2.1.2
>
> Attachments: HIVE-16239.1-branch-2.0.patch, 
> HIVE-16239.1-branch-2.1.patch, HIVE-16239.2-branch-2.0.patch, 
> HIVE-16239.2-branch-2.1.patch
>
>
> {quote}
> [hadoop@header hive]$ hive --service hiveserver
> Starting Hive Thrift Server
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/apps/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/spark-1.6.2-bin-hadoop2.7/lib/spark-assembly-1.6.2-hadoop2.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/apps/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.service.HiveServer
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {quote}
> hiveserver does not exist; we should remove hiveserver from the CLI on branch-2.0.
> After removing it, we get a useful message:
> {quote}
> Service hiveserver not found
> Available Services: beeline cli hbaseimport hbaseschematool help 
> hiveburninclient hiveserver2 hplsql hwi jar lineage llap metastore metatool 
> orcfiledump rcfilecat schemaTool version
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979677#comment-15979677
 ] 

Hive QA commented on HIVE-16079:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864561/HIVE-16079.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10626 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=96)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4834/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4834/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4834/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864561 - PreCommit-HIVE-Build

> HS2: high memory pressure due to duplicate Properties objects
> -
>
> Key: HIVE-16079
> URL: https://issues.apache.org/jira/browse/HIVE-16079
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, 
> HIVE-16079.03.patch, hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:1 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in an HS2 
> server with -Xmx200m, and with 50 queries in one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50-queries/500m-heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code. One (duplicate strings) has been addressed in 
> https://issues.apache.org/jira/browse/HIVE-15882. In this ticket, I am going 
> to address the fact that almost 20% of memory is used by instances of 
> java.util.Properties. These objects are highly duplicated, since for each 
> partition each concurrently running query creates its own copy of Partition, 
> PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 
> partitions) Properties in memory. By interning/deduplicating these objects we 
> may be able to save perhaps 15% of memory.
> Note, however, that if there are queries that mutate partitions, the 
> corresponding Properties would be mutated as well. Thus we cannot simply use 
> a single "canonicalized" Properties object at all times for all Partition 
> objects representing the same DB partition. Instead, I am going to introduce 
> a special CopyOnFirstWriteProperties class. Such an object initially 
> internally references a canonicalized Properties object, and keeps doing so 
> while only read methods are called. However, once any mutating method is 
> called, the given CopyOnFirstWriteProperties copies the data into its own 
> table from the canonicalized table, and uses it ever after.
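A minimal sketch of that copy-on-first-write mechanism; the real class would have to override every mutating method of java.util.Properties (put, putAll, remove, clear, load, ...), which is elided here:

{code}
import java.util.Properties;

public class CopyOnFirstWritePropertiesSketch extends Properties {
  // Shared, canonicalized Properties; referenced until the first mutation.
  private Properties interned;

  public CopyOnFirstWritePropertiesSketch(Properties canonical) {
    this.interned = canonical;
  }

  @Override
  public String getProperty(String key) {
    // Reads are served from the shared copy while it is still referenced.
    return interned != null ? interned.getProperty(key) : super.getProperty(key);
  }

  @Override
  public synchronized Object setProperty(String key, String value) {
    copyOnWrite();
    return super.setProperty(key, value);
  }

  // On the first mutating call, copy the shared data into this object's own
  // table and drop the reference to the canonical copy.
  private synchronized void copyOnWrite() {
    if (interned != null) {
      super.putAll(interned);
      interned = null;
    }
  }
}
{code}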



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2017-04-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979661#comment-15979661
 ] 

Lefty Leverenz commented on HIVE-11418:
---

Okay, thanks Sergio.

> Dropping a database in an encryption zone with CASCADE and trash enabled fails
> --
>
> Key: HIVE-11418
> URL: https://issues.apache.org/jira/browse/HIVE-11418
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-11418.1.patch, HIVE-11418.2.patch
>
>
> Here's the query that fails:
> {noformat}
> hive> CREATE DATABASE db;
> hive> USE db;
> hive> CREATE TABLE a(id int);
> hive> SET fs.trash.interval=1;
> hive> DROP DATABASE db CASCADE;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> db.a because it is in an encryption zone and trash
>  is enabled.  Use PURGE option to skip trash.)
> {noformat}
> DROP DATABASE does not support PURGE, so we have to remove the tables one by 
> one, and then drop the database.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979659#comment-15979659
 ] 

Lefty Leverenz commented on HIVE-15795:
---

Should this be documented in the wiki?

* [Hive Accumulo Integration | 
https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration]

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch
>
>
> Ability to specify an Accumulo index table for an Accumulo-backed Hive table.
> This would greatly improve performance for non-rowid query predicates.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16058) Disable falling back to non-cbo for SemanticException for tests

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979642#comment-15979642
 ] 

Hive QA commented on HIVE-16058:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864560/HIVE-16058.2.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10626 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_in_file] (batchId=22)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_gby] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_join]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_limit]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_semijoin]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[udf_in_file] 
(batchId=108)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4833/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4833/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4833/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864560 - PreCommit-HIVE-Build

> Disable falling back to non-cbo for SemanticException for tests
> ---
>
> Key: HIVE-16058
> URL: https://issues.apache.org/jira/browse/HIVE-16058
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16058.1.patch, HIVE-16058.2.patch, 
> HIVE-16058.2.patch
>
>
> Currently the optimizer falls back to the non-cbo path if the cbo path throws 
> an exception of type SemanticException. This might be eclipsing some genuine 
> issues within the cbo path.
> We would like to turn off the fallback mechanism for tests to see if there 
> are indeed genuine issues/bugs within the cbo path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Attachment: HIVE-16321.03-branch-2.patch

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]
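A compressed sketch of the failure mode (not Hive code): each handler thread holds one pooled connection for the operation itself and then requests a second connection from the same bounded pool for the mutex, so with pool-size concurrent callers no thread can proceed:

{code}
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class PoolDeadlockSketch {
  // With a pool of size N and N concurrent lock() calls, every thread holds
  // one connection and blocks waiting for a second that can never be freed.
  void lock(DataSource pool) throws SQLException {
    try (Connection dbConn = pool.getConnection()) {        // 1st connection: the operation
      // ... enqueue the lock using dbConn ...
      try (Connection mutexConn = pool.getConnection()) {   // 2nd connection: MutexAPI
        // ... acquire the CheckLock mutex, then run checkLock(dbConn, extLockId) ...
      }
    }
  }
}
{code}

Either proposed fix breaks the cycle: a separate (or larger) mutex pool means the second acquisition can't be starved by the first, and returning from lock() right after enqueueing means no thread ever holds two connections at once.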



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Comment: was deleted

(was: don't get why this failed to apply)

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Attachment: (was: HIVE-16321.03.branch-2.patch)

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Attachment: HIVE-16321.03.branch-2.patch

don't get why this failed to apply

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03.branch-2.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16484:

Attachment: HIVE-16484.3.patch

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, 
> HIVE-16484.3.patch
>
>
> The {{SparkClientImpl#startDriver}} method currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS session --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}}, which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners
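For reference, a minimal sketch of what the SparkLauncher path could look like; the app resource path is a placeholder, and all the Hive-specific configuration that SparkClientImpl passes today is omitted:

{code}
import java.io.IOException;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherSketch {
  public static void main(String[] args) throws IOException {
    SparkAppHandle handle = new SparkLauncher()
        .setMaster("yarn")
        .setAppResource("/path/to/hive-exec.jar")   // placeholder
        .setMainClass("org.apache.hive.spark.client.RemoteDriver")
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
        .startApplication(new SparkAppHandle.Listener() {
          @Override
          public void stateChanged(SparkAppHandle h) {
            // The handle reports state transitions without any log scraping.
            System.out.println("Spark app state: " + h.getState());
          }
          @Override
          public void infoChanged(SparkAppHandle h) { }
        });
    // handle.getState(), handle.getAppId(), and handle.stop() provide the
    // programmatic control described above, with no bin/spark-submit fork.
  }
}
{code}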



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979576#comment-15979576
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864558/HIVE-16321.02.branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4832/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4832/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4832/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-21 23:31:56.671
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4832/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 23:31:56.674
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 23:31:57.802
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:737
error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does 
not apply
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:147
error: metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: 
patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864558 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979568#comment-15979568
 ] 

Hive QA commented on HIVE-12636:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864557/HIVE-12636.13.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10618 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoinopt1] 
(batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4831/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4831/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4831/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864557 - PreCommit-HIVE-Build

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch
>
>
> Assuming Hive is using DbTxnManager.
> Currently (as of this writing only auto-commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and error 
> prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # Add a metastore call to do openTxn() and acquireLocks() in a single call; 
> see the interface sketch after this list.  This is to make sure perf doesn't 
> degrade for read-only queries.  (Would also be useful for auto-commit write 
> queries.)
> # Should RO queries generate txn ids from the same sequence?  (They could for 
> example use negative values of a different sequence.)  Txnid is part of the 
> delta/base file name.  Currently it's 7 digits.  If we use the same sequence, 
> we'll exceed 7 digits faster (possible upgrade issue).  On the other hand 
> there is value in being able to pick a txn id and commit timestamp out of the 
> same logical sequence.
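To make item 1 concrete, a hypothetical shape for the combined call; every name below is invented for illustration (the real API would be a metastore Thrift change, not this Java interface):

{code}
import java.util.List;

// Hypothetical combined metastore call: one round trip both opens the txn and
// enqueues/acquires the locks, so read-only queries pay no extra RPC.
public interface TxnAndLockApiSketch {
  final class OpenTxnAndLockResult {
    public final long txnId;        // the txn id doubles as the handle for all locks
    public final boolean acquired;  // false => caller must follow up with checkLock()
    public OpenTxnAndLockResult(long txnId, boolean acquired) {
      this.txnId = txnId;
      this.acquired = acquired;
    }
  }

  // Atomically opens a transaction and enqueues/acquires the requested locks.
  OpenTxnAndLockResult openTxnAndAcquireLocks(String user, String hostname,
      List<String> lockComponents) throws Exception;
}
{code}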



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979553#comment-15979553
 ] 

Carter Shanklin commented on HIVE-16496:


I'll mention that the first use case, that of excluding columns with duplicate 
names, is handled by the named columns join, HIVE-15983

So: select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
Is the same as: select * from t1 join t2 using (x);

If that's a very common problem this may offer some relief. I think there's 
also a ticket for natural join support which automatically resolves the column 
names and uses them for the join list.

IMO the proposal is interesting. One question I have: if the table is wide, say 
2000 columns, how likely is a user to know the ordinal positions of the columns 
to exclude? It seems like the name would be the most likely info to have at 
hand.

> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering a table may hold hundreds or even thousands of columns, this can 
> become very ugly and error prone.
> Often this requires some scripting work.
> h4. Extended syntax: positional notation support
> The positional notation is similar to the one used but the *cut* unix command 
> {code}
> select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
> select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
> select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
> select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 
> (same as previous example)
> select * exclude ($(3,5-))    from t   -- exclude the 3rd column and all 
> columns from the 5th column to the end
> select * exclude (-$(1,2))    from t   -- exclude last 2 columns
> select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
> column from the end
> select * exclude (-$(4-)) from t   -- exclude all columns except for the 
> last 3
> {code}
> A complex example would look like:
> {code}
> select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - last 2 columns
> P.s. 1
> There should be *no* error raised for the following scenarios (illustrated 
> in the sketch below):
> - Excluding a column that does not exist in the columns set
> - Excluding the same column more than once (e.g. by name and by position)
> - Excluding all columns
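> A small illustrative resolver for these semantics (a sketch only, not 
> proposed implementation code; all names are made up):
> {code}
> import java.util.*;
> 
> // Unknown names and duplicate exclusions are silently ignored; positions
> // are 1-based; fromEnd counts from the last column (the -$(...) form).
> class ExcludeResolver {
>   static List<String> resolve(List<String> cols, Set<String> byName,
>       Set<Integer> fromStart, Set<Integer> fromEnd) {
>     int n = cols.size();
>     List<String> kept = new ArrayList<>();
>     for (int i = 0; i < n; i++) {
>       boolean drop = byName.contains(cols.get(i))
>           || fromStart.contains(i + 1)   // 1-based from the left
>           || fromEnd.contains(n - i);    // 1-based from the right
>       if (!drop) {
>         kept.add(cols.get(i));
>       }
>     }
>     return kept;  // may be empty: excluding all columns is not an error
>   }
> }
> {code}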
> P.s. 2
> This enhancement answers a real need that is being raised again and again in 
> the Hive users community as well as in legacy RDBMS communities. 
> As far as I know, no provider has yet implemented something similar, and we 
> have an opportunity here to lead the SQL ecosystem.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16496) Enhance asterisk expression (as in "select *") with EXCLUDE clause

2017-04-21 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979553#comment-15979553
 ] 

Carter Shanklin edited comment on HIVE-16496 at 4/21/17 11:17 PM:
--

I'll mention that the first use case, that of excluding columns with duplicate 
names, is handled by the named columns join, HIVE-15983

So:
{code}
select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
{code}
Is the same as:
{code}
select * from t1 join t2 using (x);
{code}

If that's a very common problem this may offer some relief. I think there's 
also a ticket for natural join support which automatically resolves the column 
names and uses them for the join list.

IMO the proposal is interesting. A question I have: if the table is wide, say 
2000 columns, how likely is a user to know the ordinal positions of the 
columns to exclude? It seems like the name would be the most likely info to 
have at hand.


was (Author: cartershanklin):
I'll mention that the first use case, that of excluding columns with duplicate 
names, is handled by the named columns join, HIVE-15983

So: select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
Is the same as: select * from t1 join t2 using (x);

If that's a very common problem this may offer some relief. I think there's 
also a ticket for natural join support which automatically resolves the column 
names and uses them for the join list.

IMO the proposal is interesting. A question I have: if the table is wide, say 
2000 columns, how likely is a user to know the ordinal positions of the 
columns to exclude? It seems like the name would be the most likely info to 
have at hand.

> Enhance asterisk expression (as in "select *") with EXCLUDE clause
> --
>
> Key: HIVE-16496
> URL: https://issues.apache.org/jira/browse/HIVE-16496
> Project: Hive
>  Issue Type: Wish
>  Components: Parser
>Reporter: Dudu Markovitz
>
> support the following syntax:
> {code}
> select * exclude (a,b,e) from t
> {code}
> which for a table t with columns a,b,c,d,e would be equal to:
> {code}
> select c,d from t
> {code}
> Please note that the EXCLUDE clause relates directly to its preceding 
> asterisk.
> A common use case would be:
> {code}
> select t1.* exclude (x), t2.* from t1 join t2 on t1.x=t2.x;
> {code}
> This supplies a very clean way to select all columns without getting 
> "Ambiguous column reference" and without the need to specify all the columns 
> of at least one of the tables.
>  
> Currently, without this enhancement, the query would look something like this:
> {code}
> select a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,t2.* from t1 join t2 
> on t1.x=t2.x;
> {code}
> Considering a table may hold hundreds or even thousands of columns, this can 
> become very ugly and error prone.
> Often this requires some scripting work.
> h4. Extended syntax: positional notation support
> The positional notation is similar to the one used by the *cut* unix command 
> {code}
> select * exclude ($(1,2,5))   from t   -- exclude columns 1, 2 and 5
> select * exclude ($(1-3)) from t   -- exclude columns 1 to 3
> select * exclude ($(1,3-5,7)) from t   -- exclude columns 1,3 to 5 and 7
> select * exclude ($(7,3-5,1)) from t   -- exclude columns 1,3 to 5 and 7 
> (same as previous example)
> select * exclude ($(3,5-))    from t   -- exclude the 3rd column and all 
> columns from the 5th column to the end
> select * exclude (-$(1,2))    from t   -- exclude last 2 columns
> select * exclude (-$(1-3,7))  from t   -- exclude last 3 columns and the 7th 
> column from the end
> select * exclude (-$(4-)) from t   -- exclude all columns except for the 
> last 3
> {code}
> A complex example would look like:
> {code}
> select * exclude ($(1-3,5,7),x,y,-$(1-2)) from t  
> {code}
> exclude:
> - first 3 columns
> - 5th and 7th columns
> - x and y 
> - last 2 columns
> P.s. 1
> There should be *no* error raised for the following scenarios:
> - Excluding a column that does not exist in the columns set
> - Excluding the same column more than once (e.g. by name and by position)
> - Excluding all columns
> P.s. 2
> This enhancement answers a real need that is being raised again and again in 
> the Hive users community as well as in legacy RDBMS communities. 
> As far as I know, no provider has yet implemented something similar, and we 
> have an opportunity here to lead the SQL ecosystem.
>   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-21 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979552#comment-15979552
 ] 

Prasanth Jayachandran commented on HIVE-16503:
--

[~sseth] [~hagleitn] could someone please take a look?

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16503.1.patch
>
>
> When running map joins in llap, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemons have some memory to 
> spare). This map join conversion decision has to be made during compilation, 
> which can provide some more room for LLAP. 
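> Illustrative arithmetic only (a sketch of the idea, not the actual 
> HIVE-16503 logic; the factor and slot count are made-up knobs):
> {code}
> // Allow the compile-time map-join threshold to exceed the configured
> // no-conditional-task size by a bounded oversubscription amount.
> class LlapMemoryOversubscription {
>   static long threshold(long noConditionalTaskSize, double factor, int slots) {
>     long extra = (long) (noConditionalTaskSize * factor) * slots;
>     return noConditionalTaskSize + extra;  // e.g. 100MB, 0.2, 3 -> 160MB
>   }
> }
> {code}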



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979501#comment-15979501
 ] 

Hive QA commented on HIVE-16484:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864554/HIVE-16484.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4830/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4830/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4830/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-21 22:32:37.242
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4830/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 22:32:37.245
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/QueryInfo.java
Removing 
service/src/java/org/apache/hive/service/cli/operation/QueryInfoCache.java
Removing 
service/src/test/org/apache/hive/service/cli/operation/TestQueryLifeTimeHooksWithSQLOperation.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 22:32:38.403
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file common/src/java/org/apache/hadoop/hive/common/ProcessRunner.java
patching file 
spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java
patching file 
spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriverLocalRunner.java
patching file 
spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
[ERROR] COMPILATION ERROR : 
[ERROR] 
/data/hiveptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java:[80,12]
 unreported exception java.lang.InterruptedException; must be caught or 
declared to be thrown
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) 
on project spark-client: Compilation failure
[ERROR] 
/data/hiveptest/working/apache-github-source-source/spark-client/src/main/java/org/apache/hive/spark/client/SparkClientFactory.java:[80,12]
 unreported exception java.lang.InterruptedException; must be caught or 
declared to be thrown
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :spark-client
+ exit 1
'
{noformat}
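For reference, the failure above is the standard unreported-checked-exception 
compile error. A minimal sketch of the kind of change that resolves it 
(illustrative only, not the actual HIVE-16484 fix):
{code}
// A checked InterruptedException must be caught or declared to be thrown.
// One common fix: catch it and restore the thread's interrupt status.
class InterruptedExceptionExample {
  static void await(Thread worker) {
    try {
      worker.join();                       // declares InterruptedException
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt flag
      throw new IllegalStateException(e);
    }
  }
}
{code}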

This message is automatically generated.

ATTACHMENT ID: 12864554 - PreCommit-HIVE-Build

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> ---

[jira] [Commented] (HIVE-16389) Allow HookContext to access SQLOperationDisplay

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979496#comment-15979496
 ] 

Hive QA commented on HIVE-16389:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864553/HIVE-16389.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10623 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=226)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=237)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4829/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4829/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4829/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864553 - PreCommit-HIVE-Build

> Allow HookContext to access SQLOperationDisplay
> ---
>
> Key: HIVE-16389
> URL: https://issues.apache.org/jira/browse/HIVE-16389
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16389.1.patch, HIVE-16389.2.patch, 
> HIVE-16389.3.patch, HIVE-16389.4.patch, HIVE-16389.5.patch
>
>
> There is a lot of useful information in {{SQLOperationDisplay}} that users of 
> Hive Hooks may be interested in.
> We should allow Hive Hooks to access this info by adding the 
> {{SQLOperationDisplay}} to {{HookContext}}.
> This will allow hooks to have access to all information available in the HS2 
> Web UI.
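> A sketch of a hook consuming the proposed accessor (the getter name is 
> hypothetical; the real patch may expose it differently):
> {code}
> import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
> import org.apache.hadoop.hive.ql.hooks.HookContext;
> 
> public class QueryDisplayHook implements ExecuteWithHookContext {
>   @Override
>   public void run(HookContext ctx) {
>     // getSQLOperationDisplay() is a hypothetical accessor for this sketch.
>     Object display = ctx.getSQLOperationDisplay();
>     if (display != null) {
>       System.out.println("operation display: " + display);
>     }
>   }
> }
> {code}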



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16503:
-
Status: Patch Available  (was: Open)

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16503.1.patch
>
>
> When running map joins in llap, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemons have some memory to 
> spare). This map join conversion decision has to be made during compilation, 
> which can provide some more room for LLAP. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16503:
-
Attachment: HIVE-16503.1.patch

Currently testing this patch. The logic is up for review. Will include unit 
tests and qfile tests shortly. 

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16503.1.patch
>
>
> When running map joins in llap, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemons have some memory to 
> spare). This map join conversion decision has to be made during compilation, 
> which can provide some more room for LLAP. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979487#comment-15979487
 ] 

Sahil Takiar commented on HIVE-11133:
-

Thanks [~xuefuz] - addressed the comments from the RB, updated the RB, attached 
an updated patch. Created HIVE-16507 to track the naming issue about vertex 
dependencies. Created HIVE-16508 and HIVE-16509 as additional follow-up tasks 
for this JIRA.

[~pxiong] yeah, it's odd. I created HIVE-16507 as a follow-up item to address 
the issue. It has all the details.

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, 
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, 
> HIVE-11133.6.patch, HIVE-11133.7.patch, HIVE-11133.8.patch
>
>
> User friendly explain output ({{set hive.explain.user=true}}) should support 
> Spark as well. 
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16509) Improve HoS user-level explain plan for Local Work and Map Reduce Local

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16509:
---


> Improve HoS user-level explain plan for Local Work and Map Reduce Local
> ---
>
> Key: HIVE-16509
> URL: https://issues.apache.org/jira/browse/HIVE-16509
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The current user-level explain plans (this may affect regular explain plans 
> too, haven't checked) for HoS create empty entries for {{Local Work}} and 
> {{Map Reduce Local}}.
> The sections for local work are largely empty. We should add more info to 
> this section (or remove it if it isn't needed).
> Example:
> {code}
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (GROUP)
> Reducer 3 <- Reducer 2 (GROUP)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 3
>   File Output Operator [FS_14]
> Group By Operator [GBY_12] (rows=1 width=16)
>   
> Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"]
> <-Reducer 2 [GROUP]
>   GROUP [RS_11]
> Group By Operator [GBY_10] (rows=1 width=16)
>   
> Output:["_col0","_col1"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)"]
> <-Map 1 [GROUP]
>   GROUP [RS_9]
> PartitionCols:rand()
> Select Operator [SEL_7] (rows=1 width=33)
>   Output:["_col0","_col1"]
>   Map Join Operator [MAPJOIN_15] (rows=1 width=33)
> Conds:SEL_1.(UDFToDouble(_col0) + 
> 1.0)=SEL_1.UDFToDouble(_col0)(Left Outer),Output:["_col0","_col2"]
>   <-Select Operator [SEL_1] (rows=1 width=30)
>   Output:["_col0"]
>   TableScan [TS_0] (rows=1 width=30)
> default@t1,k,Tbl:COMPLETE,Col:NONE,Output:["key"]
>   Map Reduce Local Work
> Stage-2
>   Map 4
>   keys: [HASHTABLESINK_17]
> 0(UDFToDouble(_col0) + 1.0),1UDFToDouble(_col0)
> Select Operator [SEL_3] (rows=1 width=30)
>   Output:["_col0","_col1"]
>   TableScan [TS_2] (rows=1 width=30)
> default@t1,v,Tbl:COMPLETE,Col:NONE,Output:["key","val"]
>   Map Reduce Local Work
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16508) Add additional tests for HoS user-level explain plans

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16508:
---


> Add additional tests for HoS user-level explain plans
> -
>
> Key: HIVE-16508
> URL: https://issues.apache.org/jira/browse/HIVE-16508
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> In HIVE-11133 we added support for user-level explain plans to HoS. In order 
> to test this new feature we copied the {{explainuser_1.q}} qfile for Spark. 
> However, there are three other qfiles that test user-level explain plans 
> ({{explainuser_2.q}}, {{explainuser_3.q}}, {{explainuser_4.q}}).
> We should add these qfiles for HoS too and fix any bugs that come up while 
> adding them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16507) Hive Explain User-Level may print out "Vertex dependency in root stage" twice

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16507:

Summary: Hive Explain User-Level may print out "Vertex dependency in root 
stage" twice  (was: Hive Explain User-Level may print out "Vertex dependency in 
root stage")

> Hive Explain User-Level may print out "Vertex dependency in root stage" twice
> -
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> User-level explain plans have a section titled {{Vertex dependency in root 
> stage}} - which (according to the name) prints out the dependencies between 
> all vertices that are in the root stage.
> This logic is controlled by {{DagJsonParser#print}} and it may print out 
> {{Vertex dependency in root stage}} twice.
> The logic in this method first extracts all stages and plans. It then 
> iterates over all the stages, and if the stage contains any edges, it prints 
> them out.
> If we want to be consistent with the statement {{Vertex dependency in root 
> stage}} then we should add a check to see if the stage we are processing 
> during the iteration is the root stage or not.
> Alternatively, we could print out the edges for each stage and change the 
> line from {{Vertex dependency in root stage}} to {{Vertex dependency in 
> [stage-id]}}
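> A sketch of that check (not the actual {{DagJsonParser}} code; names are 
> illustrative):
> {code}
> import java.io.PrintStream;
> import java.util.List;
> import java.util.Map;
> 
> // Label the edge section per stage instead of always printing
> // "Vertex dependency in root stage".
> class EdgePrinter {
>   static void printEdges(Map<String, List<String>> edgesByStage,
>                          String rootStage, PrintStream out) {
>     for (Map.Entry<String, List<String>> e : edgesByStage.entrySet()) {
>       if (e.getValue().isEmpty()) {
>         continue;                  // stages without edges print nothing
>       }
>       out.println(e.getKey().equals(rootStage)
>           ? "Vertex dependency in root stage"
>           : "Vertex dependency in " + e.getKey());
>       e.getValue().forEach(out::println);
>     }
>   }
> }
> {code}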
> I'm not sure if it's possible for Hive-on-Tez to create a plan with a non-root 
> stage that contains edges, but it is possible for Hive-on-Spark (support 
> added for HoS in HIVE-11133).
> Example for HoS:
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.explain.user=true;
> set hive.spark.dynamic.partition.pruning=true;
> EXPLAIN select count(*) from srcpart where srcpart.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> Prints
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Map 9 (GROUP)
> Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
> Reducer 13 <- Map 12 (GROUP)
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
> Reducer 3 <- Reducer 2 (GROUP)
> Reducer 5 <- Map 4 (GROUP)
> Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
> Reducer 8 <- Map 7 (GROUP)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 3
>   File Output Operator [FS_34]
> Group By Operator [GBY_32] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count(VALUE._col0)"]
> <-Reducer 2 [GROUP]
>   GROUP [RS_31]
> Group By Operator [GBY_30] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count()"]
>   Join Operator [JOIN_28] (rows=2200 width=10)
> condition 
> map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
>   <-Map 1 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_26]
>   PartitionCols:_col0
>   Select Operator [SEL_2] (rows=2000 width=10)
> Output:["_col0"]
> TableScan [TS_0] (rows=2000 width=10)
>   default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
>   <-Reducer 6 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_27]
>   PartitionCols:_col0
>   Group By Operator [GBY_24] (rows=1 width=184)
> Output:["_col0"],keys:KEY._col0
>   <-Reducer 5 [GROUP]
> GROUP [RS_23]
>   PartitionCols:_col0
>   Group By Operator [GBY_22] (rows=2 width=184)
> Output:["_col0"],keys:_col0
> Filter Operator [FIL_9] (rows=1 width=184)
>   predicate:_col0 is not null
>   Group By Operator [GBY_7] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(VALUE._col0)"]
>   <-Map 4 [GROUP]
> GROUP [RS_6]
>   Group By Operator [GBY_5] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(ds)"]
> Select Operator [SEL_4] (rows=2000 width=10)
>   Output:["ds"]
>   TableScan [TS_3] (rows=2000 width=10)
> 
> de

[jira] [Assigned] (HIVE-16507) Hive Explain User-Level may print out "Vertex dependency in root stage"

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16507:
---


> Hive Explain User-Level may print out "Vertex dependency in root stage"
> ---
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> User-level explain plans have a section titled {{Vertex dependency in root 
> stage}} - which (according to the name) prints out the dependencies between 
> all vertices that are in the root stage.
> This logic is controlled by {{DagJsonParser#print}} and it may print out 
> {{Vertex dependency in root stage}} twice.
> The logic in this method first extracts all stages and plans. It then 
> iterates over all the stages, and if the stage contains any edges, it prints 
> them out.
> If we want to be consistent with the statement {{Vertex dependency in root 
> stage}} then we should add a check to see if the stage we are processing 
> during the iteration is the root stage or not.
> Alternatively, we could print out the edges for each stage and change the 
> line from {{Vertex dependency in root stage}} to {{Vertex dependency in 
> [stage-id]}}
> I'm not sure if it's possible for Hive-on-Tez to create a plan with a non-root 
> stage that contains edges, but it is possible for Hive-on-Spark (support 
> added for HoS in HIVE-11133).
> Example for HoS:
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.explain.user=true;
> set hive.spark.dynamic.partition.pruning=true;
> EXPLAIN select count(*) from srcpart where srcpart.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> Prints
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Map 9 (GROUP)
> Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
> Reducer 13 <- Map 12 (GROUP)
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
> Reducer 3 <- Reducer 2 (GROUP)
> Reducer 5 <- Map 4 (GROUP)
> Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
> Reducer 8 <- Map 7 (GROUP)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 3
>   File Output Operator [FS_34]
> Group By Operator [GBY_32] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count(VALUE._col0)"]
> <-Reducer 2 [GROUP]
>   GROUP [RS_31]
> Group By Operator [GBY_30] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count()"]
>   Join Operator [JOIN_28] (rows=2200 width=10)
> condition 
> map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
>   <-Map 1 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_26]
>   PartitionCols:_col0
>   Select Operator [SEL_2] (rows=2000 width=10)
> Output:["_col0"]
> TableScan [TS_0] (rows=2000 width=10)
>   default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
>   <-Reducer 6 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_27]
>   PartitionCols:_col0
>   Group By Operator [GBY_24] (rows=1 width=184)
> Output:["_col0"],keys:KEY._col0
>   <-Reducer 5 [GROUP]
> GROUP [RS_23]
>   PartitionCols:_col0
>   Group By Operator [GBY_22] (rows=2 width=184)
> Output:["_col0"],keys:_col0
> Filter Operator [FIL_9] (rows=1 width=184)
>   predicate:_col0 is not null
>   Group By Operator [GBY_7] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(VALUE._col0)"]
>   <-Map 4 [GROUP]
> GROUP [RS_6]
>   Group By Operator [GBY_5] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(ds)"]
> Select Operator [SEL_4] (rows=2000 width=10)
>   Output:["ds"]
>   TableScan [TS_3] (rows=2000 width=10)
> 
> default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
>   <-Reducer 8 [GROUP]
> GROUP [RS_23]
>   PartitionCols:_col0
> 

[jira] [Comment Edited] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979385#comment-15979385
 ] 

Sahil Takiar edited comment on HIVE-11133 at 4/21/17 10:08 PM:
---

[~xuefuz]

The query:

{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.explain.user=true;
set hive.spark.dynamic.partition.pruning=true;

EXPLAIN select count(*) from srcpart where srcpart.ds in (select 
max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
{code}

Prints

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 10 <- Map 9 (GROUP)
Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
Reducer 13 <- Map 12 (GROUP)

Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
Reducer 5 <- Map 4 (GROUP)
Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
Reducer 8 <- Map 7 (GROUP)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 3
  File Output Operator [FS_34]
Group By Operator [GBY_32] (rows=1 width=8)
  Output:["_col0"],aggregations:["count(VALUE._col0)"]
<-Reducer 2 [GROUP]
  GROUP [RS_31]
Group By Operator [GBY_30] (rows=1 width=8)
  Output:["_col0"],aggregations:["count()"]
  Join Operator [JOIN_28] (rows=2200 width=10)
condition 
map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
  <-Map 1 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_26]
  PartitionCols:_col0
  Select Operator [SEL_2] (rows=2000 width=10)
Output:["_col0"]
TableScan [TS_0] (rows=2000 width=10)
  default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
  <-Reducer 6 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_27]
  PartitionCols:_col0
  Group By Operator [GBY_24] (rows=1 width=184)
Output:["_col0"],keys:KEY._col0
  <-Reducer 5 [GROUP]
GROUP [RS_23]
  PartitionCols:_col0
  Group By Operator [GBY_22] (rows=2 width=184)
Output:["_col0"],keys:_col0
Filter Operator [FIL_9] (rows=1 width=184)
  predicate:_col0 is not null
  Group By Operator [GBY_7] (rows=1 width=184)
Output:["_col0"],aggregations:["max(VALUE._col0)"]
  <-Map 4 [GROUP]
GROUP [RS_6]
  Group By Operator [GBY_5] (rows=1 width=184)
Output:["_col0"],aggregations:["max(ds)"]
Select Operator [SEL_4] (rows=2000 width=10)
  Output:["ds"]
  TableScan [TS_3] (rows=2000 width=10)

default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
  <-Reducer 8 [GROUP]
GROUP [RS_23]
  PartitionCols:_col0
  Group By Operator [GBY_22] (rows=2 width=184)
Output:["_col0"],keys:_col0
Filter Operator [FIL_17] (rows=1 width=184)
  predicate:_col0 is not null
  Group By Operator [GBY_15] (rows=1 width=184)
Output:["_col0"],aggregations:["min(VALUE._col0)"]
  <-Map 7 [GROUP]
GROUP [RS_14]
  Group By Operator [GBY_13] (rows=1 width=184)
Output:["_col0"],aggregations:["min(ds)"]
Select Operator [SEL_12] (rows=2000 width=10)
  Output:["ds"]
  TableScan [TS_11] (rows=2000 width=10)

default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
Stage-2
  Reducer 11
{code}

So there are two sections that say {{Vertex dependency in root stage}}. I 
haven't checked to see if this is possible with Hive-on-Tez, but it looks like 
an existing bug in the user-level explain code.


was (Author: stakiar):
[~xuefuz]

The query:

{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.explain.use

[jira] [Updated] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-11133:

Attachment: HIVE-11133.8.patch

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, 
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, 
> HIVE-11133.6.patch, HIVE-11133.7.patch, HIVE-11133.8.patch
>
>
> User friendly explain output ({{set hive.explain.user=true}}) should support 
> Spark as well. 
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16506) HIVE-15982 broke show_functions.q

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar resolved HIVE-16506.
-
Resolution: Fixed

Looks like it was fixed when committing HIVE-15982

> HIVE-15982 broke show_functions.q
> -
>
> Key: HIVE-16506
> URL: https://issues.apache.org/jira/browse/HIVE-16506
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16483) HoS should populate split related configurations to HiveConf

2017-04-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16483:

Attachment: (was: HIVE-16483.1.patch)

> HoS should populate split related configurations to HiveConf
> 
>
> Key: HIVE-16483
> URL: https://issues.apache.org/jira/browse/HIVE-16483
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-16483.1.patch
>
>
> There are several split related configurations, such as 
> {{MAPREDMINSPLITSIZE}}, {{MAPREDMINSPLITSIZEPERNODE}}, 
> {{MAPREDMINSPLITSIZEPERRACK}}, etc., that should be populated to HiveConf. 
> Currently we only do this for {{MAPREDMINSPLITSIZE}}.
> All the others, if not set, will be using the default value, which is 1.
> Without these, Spark sometimes will not merge small files for file formats 
> such as text.
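> A sketch of the idea (not the actual HIVE-16483 patch):
> {code}
> import org.apache.hadoop.hive.conf.HiveConf;
> import org.apache.hadoop.mapred.JobConf;
> 
> // Copy each split-size setting from HiveConf into the JobConf handed
> // to Spark, instead of only MAPREDMINSPLITSIZE.
> class SplitConfPopulator {
>   static void populate(HiveConf hiveConf, JobConf jobConf) {
>     for (HiveConf.ConfVars v : new HiveConf.ConfVars[] {
>         HiveConf.ConfVars.MAPREDMINSPLITSIZE,
>         HiveConf.ConfVars.MAPREDMINSPLITSIZEPERNODE,
>         HiveConf.ConfVars.MAPREDMINSPLITSIZEPERRACK }) {
>       jobConf.setLong(v.varname, HiveConf.getLongVar(hiveConf, v));
>     }
>   }
> }
> {code}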



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16483) HoS should populate split related configurations to HiveConf

2017-04-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16483:

Attachment: HIVE-16483.1.patch

> HoS should populate split related configurations to HiveConf
> 
>
> Key: HIVE-16483
> URL: https://issues.apache.org/jira/browse/HIVE-16483
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-16483.1.patch
>
>
> There are several split related configurations, such as 
> {{MAPREDMINSPLITSIZE}}, {{MAPREDMINSPLITSIZEPERNODE}}, 
> {{MAPREDMINSPLITSIZEPERRACK}}, etc., that should be populated to HiveConf. 
> Currently we only do this for {{MAPREDMINSPLITSIZE}}.
> All the others, if not set, will be using the default value, which is 1.
> Without these, Spark sometimes will not merge small files for file formats 
> such as text.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16506) HIVE-15982 broke show_functions.q

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16506:
---


> HIVE-15982 broke show_functions.q
> -
>
> Key: HIVE-16506
> URL: https://issues.apache.org/jira/browse/HIVE-16506
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16483) HoS should populate split related configurations to HiveConf

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979419#comment-15979419
 ] 

Hive QA commented on HIVE-16483:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864555/HIVE-16483.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4828/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4828/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4828/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-21 21:28:51.197
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4828/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 21:28:51.200
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   17fcac0..6566065  master -> origin/master
   ca29a7c..0ecdfcd  branch-2   -> origin/branch-2
+ git reset --hard HEAD
HEAD is now at 17fcac0 HIVE-16419: Exclude hadoop related classes for JDBC 
stabdalone jar (Tao Li reviewed by Vaibhav Gumashta)
+ git clean -f -d
Removing 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFWidthBucket.java
Removing 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFWidthBucket.java
Removing ql/src/test/queries/clientpositive/udf_width_bucket.q
Removing ql/src/test/results/clientpositive/udf_width_bucket.q.out
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 21:28:53.506
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file 
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.17) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartition
ENHANCED (Persi

[jira] [Commented] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects

2017-04-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979413#comment-15979413
 ] 

Sergio Peña commented on HIVE-16079:


The patch looks good, [~mi...@cloudera.com].
+1

I'll wait for the tests before committing it.

> HS2: high memory pressure due to duplicate Properties objects
> -
>
> Key: HIVE-16079
> URL: https://issues.apache.org/jira/browse/HIVE-16079
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, 
> HIVE-16079.03.patch, hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:1 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
it results in a big memory spike. With 20 queries I caused an OOM in an HS2 
server with -Xmx200m, and with 50 queries, in one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code. One (duplicate strings) has been addressed in 
> https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going 
> to address the fact that almost 20% of memory is used by instances of 
> java.util.Properties. These objects are largely duplicates, since for each 
> partition each concurrently running query creates its own copy of Partition, 
> PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 
> partitions) Properties in memory. By interning/deduplicating these objects we 
> may be able to save perhaps 15% of memory.
> Note, however, that if there are queries that mutate partitions, the 
> corresponding Properties would be mutated as well. Thus we cannot simply use 
> a single "canonicalized" Properties object at all times for all Partition 
> objects representing the same DB partition. Instead, I am going to introduce 
> a special CopyOnFirstWriteProperties class. Such an object initially 
> internally references a canonicalized Properties object, and keeps doing so 
> while only read methods are called. However, once any mutating method is 
> called, the given CopyOnFirstWriteProperties copies the data into its own 
> table from the canonicalized table, and uses it ever after.
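> A simplified sketch of the copy-on-first-write idea (the real class must 
> override many more readers and mutators than shown here):
> {code}
> import java.util.Properties;
> 
> class CopyOnFirstWriteProperties extends Properties {
>   private Properties interned;  // shared, canonicalized copy
> 
>   CopyOnFirstWriteProperties(Properties interned) {
>     this.interned = interned;
>   }
> 
>   @Override
>   public String getProperty(String key) {
>     return interned != null ? interned.getProperty(key)
>                             : super.getProperty(key);
>   }
> 
>   @Override
>   public synchronized Object put(Object key, Object value) {
>     if (interned != null) {     // first write: detach from the shared copy
>       Properties source = interned;
>       interned = null;          // clear first so the copy below can't recurse
>       super.putAll(source);
>     }
>     return super.put(key, value);
>   }
> }
> {code}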



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15982) Support the width_bucket function

2017-04-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15982:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Sahil!

> Support the width_bucket function
> -
>
> Key: HIVE-15982
> URL: https://issues.apache.org/jira/browse/HIVE-15982
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-15982.1.patch, HIVE-15982.2.patch, 
> HIVE-15982.3.patch, HIVE-15982.4.patch, HIVE-15982.5.patch, HIVE-15982.6.patch
>
>
> Support width_bucket(wbo, wbb1, wbb2, wbc), which returns an integer between 
> 0 and wbc+1 by mapping wbo into the ith equally sized bucket made by dividing 
> the range between wbb1 and wbb2 into wbc equally sized regions. If wbo < 
> wbb1, return 0; if wbo > wbb2, return wbc+1. Reference: SQL standard section 4.4.
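> A sketch of these semantics (assuming wbb1 < wbb2; the standard also defines 
> the reversed-bounds case, omitted here):
> {code}
> class WidthBucket {
>   static long widthBucket(double wbo, double wbb1, double wbb2, long wbc) {
>     if (wbo < wbb1) {
>       return 0;            // below the range
>     }
>     if (wbo >= wbb2) {
>       return wbc + 1;      // at or above the range
>     }
>     // bucket i covers [wbb1 + (i-1)*w, wbb1 + i*w), w = (wbb2 - wbb1) / wbc
>     return (long) Math.floor((wbo - wbb1) / (wbb2 - wbb1) * wbc) + 1;
>   }
> }
> {code}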



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979394#comment-15979394
 ] 

Xuefu Zhang commented on HIVE-11133:


Looks good. Two things:
1. Please update the patch based on the review comments. It's mostly cosmetic, 
so we don't have to wait for another round of test run.
2. Create a JIRA to track the naming issue about vertex dependencies.

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, 
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, 
> HIVE-11133.6.patch, HIVE-11133.7.patch
>
>
> User friendly explain output ({{set hive.explain.user=true}}) should support 
> Spark as well. 
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979395#comment-15979395
 ] 

Pengcheng Xiong commented on HIVE-11133:


Why does it output "Vertex dependency in root stage" twice? Sounds weird...

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, 
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, 
> HIVE-11133.6.patch, HIVE-11133.7.patch
>
>
> User friendly explain output ({{set hive.explain.user=true}}) should support 
> Spark as well. 
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-04-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.wo.16233.patch
HIVE-15665.patch

Rebased the patch on top of HIVE-16233. Also including the patch with both, for 
a test run; this is still broken iirc.

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.04.patch, HIVE-15665.patch, 
> HIVE-15665.wo.16233.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are 
> allocated on the heap. On large data sets, this could have an impact on the 
> heap usage and the memory usage of different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979385#comment-15979385
 ] 

Sahil Takiar commented on HIVE-11133:
-

[~xuefuz]

The query:

{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.explain.user=true;
set hive.spark.dynamic.partition.pruning=true;

EXPLAIN select count(*) from srcpart where srcpart.ds in (select 
max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
{code}

Prints

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 10 <- Map 9 (GROUP)
Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
Reducer 13 <- Map 12 (GROUP)

Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
Reducer 5 <- Map 4 (GROUP)
Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
Reducer 8 <- Map 7 (GROUP)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Reducer 3
  File Output Operator [FS_34]
Group By Operator [GBY_32] (rows=1 width=8)
  Output:["_col0"],aggregations:["count(VALUE._col0)"]
<-Reducer 2 [GROUP]
  GROUP [RS_31]
Group By Operator [GBY_30] (rows=1 width=8)
  Output:["_col0"],aggregations:["count()"]
  Join Operator [JOIN_28] (rows=2200 width=10)
condition 
map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
  <-Map 1 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_26]
  PartitionCols:_col0
  Select Operator [SEL_2] (rows=2000 width=10)
Output:["_col0"]
TableScan [TS_0] (rows=2000 width=10)
  default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
  <-Reducer 6 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_27]
  PartitionCols:_col0
  Group By Operator [GBY_24] (rows=1 width=184)
Output:["_col0"],keys:KEY._col0
  <-Reducer 5 [GROUP]
GROUP [RS_23]
  PartitionCols:_col0
  Group By Operator [GBY_22] (rows=2 width=184)
Output:["_col0"],keys:_col0
Filter Operator [FIL_9] (rows=1 width=184)
  predicate:_col0 is not null
  Group By Operator [GBY_7] (rows=1 width=184)
Output:["_col0"],aggregations:["max(VALUE._col0)"]
  <-Map 4 [GROUP]
GROUP [RS_6]
  Group By Operator [GBY_5] (rows=1 width=184)
Output:["_col0"],aggregations:["max(ds)"]
Select Operator [SEL_4] (rows=2000 width=10)
  Output:["ds"]
  TableScan [TS_3] (rows=2000 width=10)

default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
  <-Reducer 8 [GROUP]
GROUP [RS_23]
  PartitionCols:_col0
  Group By Operator [GBY_22] (rows=2 width=184)
Output:["_col0"],keys:_col0
Filter Operator [FIL_17] (rows=1 width=184)
  predicate:_col0 is not null
  Group By Operator [GBY_15] (rows=1 width=184)
Output:["_col0"],aggregations:["min(VALUE._col0)"]
  <-Map 7 [GROUP]
GROUP [RS_14]
  Group By Operator [GBY_13] (rows=1 width=184)
Output:["_col0"],aggregations:["min(ds)"]
Select Operator [SEL_12] (rows=2000 width=10)
  Output:["ds"]
  TableScan [TS_11] (rows=2000 width=10)

default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
Stage-2
  Reducer 11
{code}

So there are two sections that say {{Vertex dependency in root stage}}. I 
haven't checked to see if this is possible with Hive-on-Tez, but it looks like 
an existing bug in the user-level explain code.

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: H

[jira] [Commented] (HIVE-14881) integrate MM tables into ACID: merge cleaner into ACID threads

2017-04-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979384#comment-15979384
 ] 

Wei Zheng commented on HIVE-14881:
--

You mean MmCleanerThread? It's removed in the HIVE-14879 patch

> integrate MM tables into ACID: merge cleaner into ACID threads 
> ---
>
> Key: HIVE-14881
> URL: https://issues.apache.org/jira/browse/HIVE-14881
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-14881.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15982) Support the width_bucket function

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979383#comment-15979383
 ] 

Hive QA commented on HIVE-15982:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864551/HIVE-15982.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10626 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=69)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4827/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4827/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4827/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864551 - PreCommit-HIVE-Build

> Support the width_bucket function
> -
>
> Key: HIVE-15982
> URL: https://issues.apache.org/jira/browse/HIVE-15982
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Sahil Takiar
> Attachments: HIVE-15982.1.patch, HIVE-15982.2.patch, 
> HIVE-15982.3.patch, HIVE-15982.4.patch, HIVE-15982.5.patch, HIVE-15982.6.patch
>
>
> Support the width_bucket(wbo, wbb1, wbb2, wbc) function, which returns an integer 
> between 0 and wbc+1 by mapping wbo into the ith of wbc equally sized buckets made 
> by dividing the range from wbb1 to wbb2 into equally sized regions. If wbo < wbb1, 
> return 0; if wbo > wbb2, return wbc+1. Reference: SQL standard section 4.4.
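
A minimal standalone sketch of the bucket arithmetic, assuming wbb1 < wbb2 and the SQL-standard convention that out-of-range values map to 0 and wbc+1; the class and method names are illustrative, not the Hive UDF implementation:
{code}
public final class WidthBucketSketch {
  // Maps wbo into one of wbc equal-width buckets spanning [wbb1, wbb2).
  static long widthBucket(double wbo, double wbb1, double wbb2, long wbc) {
    if (wbo < wbb1) {
      return 0;                        // below the range
    }
    if (wbo >= wbb2) {
      return wbc + 1;                  // above the range
    }
    double bucketWidth = (wbb2 - wbb1) / wbc;
    return (long) ((wbo - wbb1) / bucketWidth) + 1;  // buckets 1..wbc
  }

  public static void main(String[] args) {
    System.out.println(widthBucket(5.35, 0.024, 10.06, 5)); // prints 3
  }
}
{code}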



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14881) integrate MM tables into ACID: merge cleaner into ACID threads

2017-04-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979377#comment-15979377
 ] 

Sergey Shelukhin commented on HIVE-14881:
-

Should it remove the cleaner, or does it still rely on the cleaner?

> integrate MM tables into ACID: merge cleaner into ACID threads 
> ---
>
> Key: HIVE-14881
> URL: https://issues.apache.org/jira/browse/HIVE-14881
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-14881.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14881) integrate MM tables into ACID: merge cleaner into ACID threads

2017-04-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14881:
-
Attachment: HIVE-14881.1.patch

preliminary patch 1

> integrate MM tables into ACID: merge cleaner into ACID threads 
> ---
>
> Key: HIVE-14881
> URL: https://issues.apache.org/jira/browse/HIVE-14881
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-14881.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16497) FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file system operations should be impersonated

2017-04-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979364#comment-15979364
 ] 

Sushanth Sowmyan commented on HIVE-16497:
-

(once the unit tests return, we can commit)

> FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file 
> system operations should be impersonated
> --
>
> Key: HIVE-16497
> URL: https://issues.apache.org/jira/browse/HIVE-16497
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 3.0.0
>
> Attachments: HIVE-16497.1.patch
>
>
> FileUtils.isActionPermittedForFileHierarchy checks if the user has permissions 
> for a given action. The checks are made by impersonating the user.
> However, the listing of child dirs is done as the hiveserver2 user. If the 
> hive user doesn't have permissions on the filesystem, it gives an incorrect 
> error saying that the user doesn't have permissions to perform the action.
> Impersonating the end user for all file operations in that function is also 
> the logically correct thing to do.
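
For reference, a minimal sketch of what an impersonated listing looks like with Hadoop's UserGroupInformation; the class and method names here are illustrative, not the actual patch:
{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonatedListing {
  static FileStatus[] listAsUser(final Path path, final Configuration conf,
      String userName) throws Exception {
    UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
        userName, UserGroupInformation.getLoginUser());
    // Both the FileSystem handle and the listing are obtained inside doAs(),
    // so the child-dir enumeration also runs as the impersonated user.
    return proxyUser.doAs(new PrivilegedExceptionAction<FileStatus[]>() {
      public FileStatus[] run() throws Exception {
        FileSystem fs = FileSystem.get(path.toUri(), conf);
        return fs.listStatus(path);
      }
    });
  }
}
{code}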



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-14881) integrate MM tables into ACID: merge cleaner into ACID threads

2017-04-21 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng reassigned HIVE-14881:


Assignee: Wei Zheng

> integrate MM tables into ACID: merge cleaner into ACID threads 
> ---
>
> Key: HIVE-14881
> URL: https://issues.apache.org/jira/browse/HIVE-14881
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16497) FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file system operations should be impersonated

2017-04-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979362#comment-15979362
 ] 

Sushanth Sowmyan commented on HIVE-16497:
-

+1, looks good to me.

> FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file 
> system operations should be impersonated
> --
>
> Key: HIVE-16497
> URL: https://issues.apache.org/jira/browse/HIVE-16497
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 3.0.0
>
> Attachments: HIVE-16497.1.patch
>
>
> FileUtils.isActionPermittedForFileHierarchy checks if the user has permissions 
> for a given action. The checks are made by impersonating the user.
> However, the listing of child dirs is done as the hiveserver2 user. If the 
> hive user doesn't have permissions on the filesystem, it gives an incorrect 
> error saying that the user doesn't have permissions to perform the action.
> Impersonating the end user for all file operations in that function is also 
> the logically correct thing to do.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-04-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: (was: HIVE-15665.03.patch)

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.04.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-04-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: (was: HIVE-15665.02.patch)

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.04.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-04-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: (was: HIVE-15665.01.patch)

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.04.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-04-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: (was: HIVE-15665.patch)

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.04.patch
>
>
> OrcFileMetadata internally has filestats, stripestats, etc., which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16441) De-duplicate semijoin branches in n-way joins

2017-04-21 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16441:
--
Attachment: HIVE-16441.3.patch

Resubmitting to trigger a run due to some issues in Jenkins.

> De-duplicate semijoin branches in n-way joins
> -
>
> Key: HIVE-16441
> URL: https://issues.apache.org/jira/browse/HIVE-16441
> Project: Hive
>  Issue Type: Improvement
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16441.1.patch, HIVE-16441.2.patch, 
> HIVE-16441.3.patch
>
>
> Currently in n-way joins, semi join optimization creates n branches on the same 
> key. Instead, it should reuse one branch for all the joins.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16493) Skip column stats when colStats is empty

2017-04-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979328#comment-15979328
 ] 

Ashutosh Chauhan commented on HIVE-16493:
-

+1

> Skip column stats when colStats is empty
> 
>
> Key: HIVE-16493
> URL: https://issues.apache.org/jira/browse/HIVE-16493
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16493.01.patch
>
>
> Otherwise it will throw an NPE.
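
A hedged sketch of the guard the summary implies; the stats type and the downstream call are illustrative stand-ins, not the actual patch:
{code}
// Skip the column-stats annotation entirely when colStats is absent.
void annotateWithStats(java.util.List<?> colStats) {
  if (colStats == null || colStats.isEmpty()) {
    return;                    // nothing to annotate; avoids the NPE downstream
  }
  applyColumnStats(colStats);  // hypothetical downstream consumer
}

void applyColumnStats(java.util.List<?> colStats) { /* ... */ }
{code}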



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15761) ObjectStore.getNextNotification could return an empty NotificationEventResponse causing TProtocolException

2017-04-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-15761:
---
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

> ObjectStore.getNextNotification could return an empty 
> NotificationEventResponse causing TProtocolException 
> ---
>
> Key: HIVE-15761
> URL: https://issues.apache.org/jira/browse/HIVE-15761
> Project: Hive
>  Issue Type: Bug
>Reporter: Hao Hao
>Assignee: Sergio Peña
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-15761.1.patch
>
>
> If there are no new events greater than the requested event,  
> ObjectStore.getNextNotification will return an empty 
> NotificationEventResponse. And the client side will get the following 
> exception:
> {noformat} [ERROR - 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:295)]
>  Thrift error occurred during processing of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'events' is 
> unset! Struct:NotificationEventResponse(events:null)
>   at 
> org.apache.hadoop.hive.metastore.api.NotificationEventResponse.validate(NotificationEventResponse.java:310)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result.validate(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result$get_next_notification_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result$get_next_notification_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result.write(ThriftHiveMetastore.java)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745){noformat}
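
A hedged sketch of the kind of server-side guard involved: the Thrift struct marks {{events}} as required, so it has to be set to an empty list rather than left null (the fetch helper here is a hypothetical stand-in for the ObjectStore query):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.NotificationEvent;
import org.apache.hadoop.hive.metastore.api.NotificationEventResponse;

public class NotificationGuard {
  NotificationEventResponse getNextNotification(long lastEventId) {
    NotificationEventResponse result = new NotificationEventResponse();
    result.setEvents(new ArrayList<NotificationEvent>()); // required field: set even when empty
    for (NotificationEvent event : fetchNewerEvents(lastEventId)) {
      result.addToEvents(event);
    }
    return result; // an empty-but-set 'events' list passes Thrift validation
  }

  // Hypothetical stand-in for the ObjectStore query.
  List<NotificationEvent> fetchNewerEvents(long lastEventId) {
    return new ArrayList<NotificationEvent>();
  }
}
{code}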



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects

2017-04-21 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979322#comment-15979322
 ] 

Misha Dmitriev commented on HIVE-16079:
---

Uploaded the 3rd patch, which addresses Sergio's comments.

> HS2: high memory pressure due to duplicate Properties objects
> -
>
> Key: HIVE-16079
> URL: https://issues.apache.org/jira/browse/HIVE-16079
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, 
> HIVE-16079.03.patch, hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in a HS2 
> server with -Xmx200m and with 50 queries - in the one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code. One (duplicate strings) has been addressed in 
> https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going 
> to address the fact that almost 20% of memory is used by instances of 
> java.util.Properties. These objects are highly duplicated, since for each 
> partition each concurrently running query creates its own copy of Partition, 
> PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 
> partitions) Properties in memory. By interning/deduplicating these objects we 
> may be able to save perhaps 15% of memory.
> Note, however, that if there are queries that mutate partitions, the 
> corresponding Properties would be mutated as well. Thus we cannot simply use 
> a single "canonicalized" Properties object at all times for all Partition 
> objects representing the same DB partition. Instead, I am going to introduce 
> a special CopyOnFirstWriteProperties class. Such an object initially 
> internally references a canonicalized Properties object, and keeps doing so 
> while only read methods are called. However, once any mutating method is 
> called, the given CopyOnFirstWriteProperties copies the data into its own 
> table from the canonicalized table, and uses it ever after.
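
A hedged sketch of the copy-on-first-write idea described above; it is deliberately partial (a real implementation has to override every mutator that Properties inherits from Hashtable):
{code}
import java.util.Properties;

public class CopyOnFirstWriteProperties extends Properties {
  // Shared, canonicalized Properties; never mutated through this wrapper.
  private Properties interned;

  public CopyOnFirstWriteProperties(Properties canonical) {
    this.interned = canonical;
  }

  @Override
  public String getProperty(String key) {
    // Reads are served from the shared copy until the first write.
    return interned != null ? interned.getProperty(key) : super.getProperty(key);
  }

  @Override
  public synchronized Object setProperty(String key, String value) {
    copyOnWrite();               // first mutation: detach from the shared copy
    return super.setProperty(key, value);
  }

  private synchronized void copyOnWrite() {
    if (interned != null) {
      super.putAll(interned);    // copy shared entries into this object's own table
      interned = null;           // all later calls use the private copy
    }
  }
}
{code}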



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15761) ObjectStore.getNextNotification could return an empty NotificationEventResponse causing TProtocolException

2017-04-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979321#comment-15979321
 ] 

Sergio Peña commented on HIVE-15761:


The tests are unrelated. Thanks [~aihuaxu].

> ObjectStore.getNextNotification could return an empty 
> NotificationEventResponse causing TProtocolException 
> ---
>
> Key: HIVE-15761
> URL: https://issues.apache.org/jira/browse/HIVE-15761
> Project: Hive
>  Issue Type: Bug
>Reporter: Hao Hao
>Assignee: Sergio Peña
> Attachments: HIVE-15761.1.patch
>
>
> If there are no new events greater than the requested event,  
> ObjectStore.getNextNotification will return an empty 
> NotificationEventResponse. And the client side will get the following 
> exception:
> {noformat} [ERROR - 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:295)]
>  Thrift error occurred during processing of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'events' is 
> unset! Struct:NotificationEventResponse(events:null)
>   at 
> org.apache.hadoop.hive.metastore.api.NotificationEventResponse.validate(NotificationEventResponse.java:310)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result.validate(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result$get_next_notification_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result$get_next_notification_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result.write(ThriftHiveMetastore.java)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745){noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects

2017-04-21 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-16079:
--
Attachment: HIVE-16079.03.patch

> HS2: high memory pressure due to duplicate Properties objects
> -
>
> Key: HIVE-16079
> URL: https://issues.apache.org/jira/browse/HIVE-16079
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, 
> HIVE-16079.03.patch, hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in a HS2 
> server with -Xmx200m and with 50 queries - in the one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code. One (duplicate strings) has been addressed in 
> https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going 
> to address the fact that almost 20% of memory is used by instances of 
> java.util.Properties. These objects are highly duplicated, since for each 
> partition each concurrently running query creates its own copy of Partition, 
> PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 
> partitions) Properties in memory. By interning/deduplicating these objects we 
> may be able to save perhaps 15% of memory.
> Note, however, that if there are queries that mutate partitions, the 
> corresponding Properties would be mutated as well. Thus we cannot simply use 
> a single "canonicalized" Properties object at all times for all Partition 
> objects representing the same DB partition. Instead, I am going to introduce 
> a special CopyOnFirstWriteProperties class. Such an object initially 
> internally references a canonicalized Properties object, and keeps doing so 
> while only read methods are called. However, once any mutating method is 
> called, the given CopyOnFirstWriteProperties copies the data into its own 
> table from the canonicalized table, and uses it ever after.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects

2017-04-21 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-16079:
--
Status: Patch Available  (was: Open)

> HS2: high memory pressure due to duplicate Properties objects
> -
>
> Key: HIVE-16079
> URL: https://issues.apache.org/jira/browse/HIVE-16079
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, 
> HIVE-16079.03.patch, hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in a HS2 
> server with -Xmx200m and with 50 queries - in the one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code. One (duplicate strings) has been addressed in 
> https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going 
> to address the fact that almost 20% of memory is used by instances of 
> java.util.Properties. These objects are highly duplicated, since for each 
> partition each concurrently running query creates its own copy of Partition, 
> PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 
> partitions) Properties in memory. By interning/deduplicating these objects we 
> may be able to save perhaps 15% of memory.
> Note, however, that if there are queries that mutate partitions, the 
> corresponding Properties would be mutated as well. Thus we cannot simply use 
> a single "canonicalized" Properties object at all times for all Partition 
> objects representing the same DB partition. Instead, I am going to introduce 
> a special CopyOnFirstWriteProperties class. Such an object initially 
> internally references a canonicalized Properties object, and keeps doing so 
> while only read methods are called. However, once any mutating method is 
> called, the given CopyOnFirstWriteProperties copies the data into its own 
> table from the canonicalized table, and uses it ever after.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects

2017-04-21 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-16079:
--
Status: Open  (was: Patch Available)

> HS2: high memory pressure due to duplicate Properties objects
> -
>
> Key: HIVE-16079
> URL: https://issues.apache.org/jira/browse/HIVE-16079
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, 
> hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in a HS2 
> server with -Xmx200m and with 50 queries - in the one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code. One (duplicate strings) has been addressed in 
> https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going 
> to address the fact that almost 20% of memory is used by instances of 
> java.util.Properties. These objects are highly duplicated, since for each 
> partition each concurrently running query creates its own copy of Partition, 
> PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 
> partitions) Properties in memory. By interning/deduplicating these objects we 
> may be able to save perhaps 15% of memory.
> Note, however, that if there are queries that mutate partitions, the 
> corresponding Properties would be mutated as well. Thus we cannot simply use 
> a single "canonicalized" Properties object at all times for all Partition 
> objects representing the same DB partition. Instead, I am going to introduce 
> a special CopyOnFirstWriteProperties class. Such an object initially 
> internally references a canonicalized Properties object, and keeps doing so 
> while only read methods are called. However, once any mutating method is 
> called, the given CopyOnFirstWriteProperties copies the data into its own 
> table from the canonicalized table, and uses it ever after.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16058) Disable falling back to non-cbo for SemanticException for tests

2017-04-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16058:

Status: Open  (was: Patch Available)

> Disable falling back to non-cbo for SemanticException for tests
> ---
>
> Key: HIVE-16058
> URL: https://issues.apache.org/jira/browse/HIVE-16058
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16058.1.patch, HIVE-16058.2.patch, 
> HIVE-16058.2.patch
>
>
> Currently the optimizer falls back to the non-cbo path if the cbo path throws 
> an exception of type SemanticException. This might be eclipsing some genuine 
> issues within the cbo path.
> We would like to turn off the fall-back mechanism for tests to see if there 
> are indeed genuine issues/bugs within the cbo path.
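
An illustrative sketch of the behavior being discussed; the flag and method names are assumptions, not the actual patch:
{code}
class CboCompilerSketch {
  boolean cboFallbackDisabledForTests;    // hypothetical test-only switch

  Object compile(String query) throws Exception {
    try {
      return optimizeWithCbo(query);
    } catch (Exception e) {               // stands in for SemanticException
      if (cboFallbackDisabledForTests) {
        throw e;                          // tests: surface genuine cbo bugs
      }
      return optimizeWithoutCbo(query);   // production behavior: fall back
    }
  }

  Object optimizeWithCbo(String query) throws Exception { return null; }   // stub
  Object optimizeWithoutCbo(String query) { return null; }                 // stub
}
{code}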



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16058) Disable falling back to non-cbo for SemanticException for tests

2017-04-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16058:

Status: Patch Available  (was: Open)

> Disable falling back to non-cbo for SemanticException for tests
> ---
>
> Key: HIVE-16058
> URL: https://issues.apache.org/jira/browse/HIVE-16058
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16058.1.patch, HIVE-16058.2.patch, 
> HIVE-16058.2.patch
>
>
> Currently the optimizer falls back to the non-cbo path if the cbo path throws 
> an exception of type SemanticException. This might be eclipsing some genuine 
> issues within the cbo path.
> We would like to turn off the fall-back mechanism for tests to see if there 
> are indeed genuine issues/bugs within the cbo path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16058) Disable falling back to non-cbo for SemanticException for tests

2017-04-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16058:

Attachment: HIVE-16058.2.patch

> Disable falling back to non-cbo for SemanticException for tests
> ---
>
> Key: HIVE-16058
> URL: https://issues.apache.org/jira/browse/HIVE-16058
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16058.1.patch, HIVE-16058.2.patch, 
> HIVE-16058.2.patch
>
>
> Currently the optimizer falls back to the non-cbo path if the cbo path throws 
> an exception of type SemanticException. This might be eclipsing some genuine 
> issues within the cbo path.
> We would like to turn off the fall-back mechanism for tests to see if there 
> are indeed genuine issues/bugs within the cbo path.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15761) ObjectStore.getNextNotification could return an empty NotificationEventResponse causing TProtocolException

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979302#comment-15979302
 ] 

Hive QA commented on HIVE-15761:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864381/HIVE-15761.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10618 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_11] 
(batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer7] 
(batchId=20)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4826/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4826/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4826/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864381 - PreCommit-HIVE-Build

> ObjectStore.getNextNotification could return an empty 
> NotificationEventResponse causing TProtocolException 
> ---
>
> Key: HIVE-15761
> URL: https://issues.apache.org/jira/browse/HIVE-15761
> Project: Hive
>  Issue Type: Bug
>Reporter: Hao Hao
>Assignee: Sergio Peña
> Attachments: HIVE-15761.1.patch
>
>
> If there are no new events greater than the requested event,  
> ObjectStore.getNextNotification will return an empty 
> NotificationEventResponse. And the client side will get the following 
> exception:
> {noformat} [ERROR - 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:295)]
>  Thrift error occurred during processing of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'events' is 
> unset! Struct:NotificationEventResponse(events:null)
>   at 
> org.apache.hadoop.hive.metastore.api.NotificationEventResponse.validate(NotificationEventResponse.java:310)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result.validate(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result$get_next_notification_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result$get_next_notification_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_next_notification_result.write(ThriftHiveMetastore.java)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745){noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979287#comment-15979287
 ] 

Xuefu Zhang commented on HIVE-11133:


[~stakiar], could you please provide an example where non-root stage vertex 
dependency is shown? Thanks.

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, 
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, 
> HIVE-11133.6.patch, HIVE-11133.7.patch
>
>
> User friendly explain output ({{set hive.explain.user=true}}) should support 
> Spark as well. 
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Attachment: HIVE-16321.02.branch-2.patch

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBC Connection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]
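
A toy illustration of the pool-exhaustion pattern described above, with a Semaphore standing in for the fixed-size JDBC pool (this is not TxnHandler code):
{code}
import java.util.concurrent.Semaphore;

public class PoolDeadlockDemo {
  static final int POOL_SIZE = 2;
  static final Semaphore pool = new Semaphore(POOL_SIZE); // the "connection pool"

  public static void main(String[] args) {
    for (int i = 0; i < POOL_SIZE; i++) {
      new Thread(() -> {
        pool.acquireUninterruptibly(); // connection held by the lock() call itself
        pool.acquireUninterruptibly(); // MutexAPI asks for a second connection...
        // ...never reached: every permit is held by a blocked worker,
        // so the JVM hangs here, mirroring the deadlock described above.
        pool.release();
        pool.release();
      }).start();
    }
  }
}
{code}
With POOL_SIZE concurrent workers each holding one permit, no second acquire can ever succeed, which is why a separate (or larger) pool for the mutex connections breaks the cycle.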



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12636:
--
Attachment: HIVE-12636.13.patch

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch
>
>
> Assuming Hive is using DbTxnManager
> Currently (as of this writing only auto commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and error 
> prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # add a metastore call to openTxn() and acquireLocks() in a single call.  This 
> is to make sure perf doesn't degrade for read-only queries.  (Would also be 
> useful for auto commit write queries)
> # Should RO queries generate txn ids from the same sequence?  (they could for 
> example use negative values of a different sequence).  Txnid is part of the 
> delta/base file name.  Currently it's 7 digits.  If we use the same sequence, 
> we'll exceed 7 digits faster. (possible upgrade issue).  On the other hand 
> there is value in being able to pick txn id and commit timestamp out of the 
> same logical sequence.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16483) HoS should populate split related configurations to HiveConf

2017-04-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16483:

Attachment: HIVE-16483.1.patch

> HoS should populate split related configurations to HiveConf
> 
>
> Key: HIVE-16483
> URL: https://issues.apache.org/jira/browse/HIVE-16483
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-16483.1.patch
>
>
> There are several split related configurations, such as 
> {{MAPREDMINSPLITSIZE}}, {{MAPREDMINSPLITSIZEPERNODE}}, 
> {{MAPREDMINSPLITSIZEPERRACK}}, etc., that should be populated to HiveConf. 
> Currently we only do this for {{MAPREDMINSPLITSIZE}}.
> All the others, if not set, will be using the default value, which is 1.
> Without these, Spark sometimes will not merge small files for file formats 
> such as text.
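
A hedged sketch of the kind of propagation the fix calls for (the actual patch may differ); the ConfVars entries are the real HiveConf settings named above, while the method itself is illustrative:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;

public class SplitConfPropagation {
  static void copySplitSettings(Configuration jobConf, HiveConf hiveConf) {
    ConfVars[] splitVars = { ConfVars.MAPREDMINSPLITSIZE,
        ConfVars.MAPREDMINSPLITSIZEPERNODE, ConfVars.MAPREDMINSPLITSIZEPERRACK };
    for (ConfVars var : splitVars) {
      String value = jobConf.get(var.varname);
      if (value != null) {            // only propagate explicitly set values
        hiveConf.set(var.varname, value);
      }
    }
  }
}
{code}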



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16483) HoS should populate split related configurations to HiveConf

2017-04-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16483:

Attachment: (was: HIVE-16483.1.patch)

> HoS should populate split related configurations to HiveConf
> 
>
> Key: HIVE-16483
> URL: https://issues.apache.org/jira/browse/HIVE-16483
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-16483.1.patch
>
>
> There are several split related configurations, such as 
> {{MAPREDMINSPLITSIZE}}, {{MAPREDMINSPLITSIZEPERNODE}}, 
> {{MAPREDMINSPLITSIZEPERRACK}}, etc., that should be populated to HiveConf. 
> Currently we only do this for {{MAPREDMINSPLITSIZE}}.
> All the others, if not set, will be using the default value, which is 1.
> Without these, Spark sometimes will not merge small files for file formats 
> such as text.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16389) Allow HookContext to access SQLOperationDisplay

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16389:

Attachment: HIVE-16389.5.patch

> Allow HookContext to access SQLOperationDisplay
> ---
>
> Key: HIVE-16389
> URL: https://issues.apache.org/jira/browse/HIVE-16389
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16389.1.patch, HIVE-16389.2.patch, 
> HIVE-16389.3.patch, HIVE-16389.4.patch, HIVE-16389.5.patch
>
>
> There is a lot of useful information in {{SQLOperationDisplay}} that users of 
> Hive Hooks may be interested in.
> We should allow Hive Hooks to access this info by adding the 
> {{SQLOperationDisplay}} to {{HookContext}}.
> This will allow hooks to have access to all information available in the HS2 
> Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16484:

Attachment: HIVE-16484.2.patch

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS session --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners
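
For reference, a minimal sketch of the SparkLauncher API mentioned above; the jar path, driver class, and master are placeholders, and SparkLauncher still needs SPARK_HOME (or spark.home) to locate the Spark install:
{code}
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherDemo {
  public static void main(String[] args) throws Exception {
    SparkAppHandle handle = new SparkLauncher()
        .setAppResource("/path/to/app.jar")          // placeholder jar
        .setMainClass("org.example.RemoteDriver")    // placeholder driver class
        .setMaster("yarn")
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
        .startApplication(new SparkAppHandle.Listener() {
          @Override
          public void stateChanged(SparkAppHandle h) {
            System.out.println("state: " + h.getState()); // in-process state updates
          }
          @Override
          public void infoChanged(SparkAppHandle h) { }
        });
    System.out.println("app id: " + handle.getAppId()); // may be null until submitted
  }
}
{code}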



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15982) Support the width_bucket function

2017-04-21 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15982:

Attachment: HIVE-15982.6.patch

> Support the width_bucket function
> -
>
> Key: HIVE-15982
> URL: https://issues.apache.org/jira/browse/HIVE-15982
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Sahil Takiar
> Attachments: HIVE-15982.1.patch, HIVE-15982.2.patch, 
> HIVE-15982.3.patch, HIVE-15982.4.patch, HIVE-15982.5.patch, HIVE-15982.6.patch
>
>
> Support the width_bucket(wbo, wbb1, wbb2, wbc) function, which returns an integer 
> between 0 and wbc+1 by mapping wbo into the ith of wbc equally sized buckets made 
> by dividing the range from wbb1 to wbb2 into equally sized regions. If wbo < wbb1, 
> return 0; if wbo > wbb2, return wbc+1. Reference: SQL standard section 4.4.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark

2017-04-21 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979255#comment-15979255
 ] 

Sahil Takiar commented on HIVE-11133:
-

[~lirui] this may be an existing bug in the user-level explain plan, or maybe 
something that needs more documentation. From what I can tell, the section 
labeled {{Vertex dependency in root stage}} basically iterates through all the 
stages, and prints out the edges between the vertices. It doesn't seem to be 
limited to the vertices of the root stage. The code that controls this is 
{{DagJsonParser#print}} - specifically the section that contains 
{{printer.println("Vertex dependency in root stage");}}

It does seem confusing to me that the section says {{root stage}} but iterates 
through all the stages. I can file a separate JIRA for this.

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, 
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, 
> HIVE-11133.6.patch, HIVE-11133.7.patch
>
>
> User friendly explain output ({{set hive.explain.user=true}}) should support 
> Spark as well. 
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13583) E061-14: Search Conditions

2017-04-21 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979254#comment-15979254
 ] 

Zoltan Haindrich commented on HIVE-13583:
-

Yeah...sure - it's a good idea to separate them; I've read it in connection 
with this...and it stuck in my mind :)

> E061-14: Search Conditions
> --
>
> Key: HIVE-13583
> URL: https://issues.apache.org/jira/browse/HIVE-13583
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
>
> This is a part of the SQL:2011 Analytics Complete Umbrella JIRA HIVE-13554. 
> Support for various forms of search conditions is mandatory in the SQL 
> standard, for example "<predicate> is not true;". Hive should support those 
> forms mandated by the standard.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-13583) E061-14: Search Conditions

2017-04-21 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-13583:
---

Assignee: Zoltan Haindrich

> E061-14: Search Conditions
> --
>
> Key: HIVE-13583
> URL: https://issues.apache.org/jira/browse/HIVE-13583
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
>
> This is a part of the SQL:2011 Analytics Complete Umbrella JIRA HIVE-13554. 
> Support for various forms of search conditions is mandatory in the SQL 
> standard, for example "<predicate> is not true;". Hive should support those 
> forms mandated by the standard.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16446) org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by sett

2017-04-21 Thread Kalexin Baoerjiin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979224#comment-15979224
 ] 

Kalexin Baoerjiin edited comment on HIVE-16446 at 4/21/17 7:00 PM:
---

[~vihangk1] no it doesn't work with fs.s3a.secret.key and fs.s3a.access.key.


was (Author: kalexin):
[~vihangk1] no it doesn't work with fs.s3a.secret.key and fs.s3a.access.key?

> org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.IllegalArgumentException: AWS Access Key ID 
> and Secret Access Key must be specified by setting the fs.s3n.awsAccessKeyId 
> and fs.s3n.awsSecretAccessKey properties
> -
>
> Key: HIVE-16446
> URL: https://issues.apache.org/jira/browse/HIVE-16446
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Kalexin Baoerjiin
>Assignee: Vihang Karajgaonkar
>
> After upgrading our Cloudera cluster to CDH 5.10.1 we are experiencing the 
> following problem during some Hive DDL.
> 
> SET fs.s3n.awsSecretAccessKey=;
> SET fs.s3n.awsAccessKeyId=;
> 
> ALTER TABLE hive_1k_partitions ADD IF NOT EXISTS partition (year='2014', 
> month='2014-01', dt='2014-01-01', hours='00', minutes='16', seconds='22') 
> location 's3n://'
> 
> Stack trace I was able to recover: 
> [ Message content over the limit has been removed. ]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Job Submission failed with exception ‘java.lang.IllegalArgumentException(AWS 
> Access Key ID and Secret Access Key must be specified by setting the 
> fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties 
> (respectively).)’
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> [9:31] 
> Logging initialized using configuration in 
> jar:file:/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/hive-common-1.1.0-cdh5.10.1.jar!/hive-log4j.properties
> In the past we did not have to set s3 key and ID in core-site.xml because we 
> were using them dynamically inside our hive DDL scripts.
> After setting S3 secret key and Access ID in core-site.xml this problem goes 
> away. However, this is an incompatible change from the previous Hive 
> shipped in CDH 5.9. 
> The Cloudera 5.10.x release notes mention HIVE-14269 (Enhanced write 
> performance for Hive tables stored on Amazon S3) as the only Hive-related 
> change. 
> https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_new_in_cdh_510.html
> https://issues.apache.org/jira/browse/HIVE-14269



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16446) org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting t

2017-04-21 Thread Kalexin Baoerjiin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979224#comment-15979224
 ] 

Kalexin Baoerjiin commented on HIVE-16446:
--

[~vihangk1] no it doesn't work with fs.s3a.secret.key and fs.s3a.access.key?

> org.apache.hadoop.hive.ql.exec.DDLTask. 
> MetaException(message:java.lang.IllegalArgumentException: AWS Access Key ID 
> and Secret Access Key must be specified by setting the fs.s3n.awsAccessKeyId 
> and fs.s3n.awsSecretAccessKey properties
> -
>
> Key: HIVE-16446
> URL: https://issues.apache.org/jira/browse/HIVE-16446
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Kalexin Baoerjiin
>Assignee: Vihang Karajgaonkar
>
> After upgrading our Cloudera cluster to CDH 5.10.1 we are experiencing the 
> following problem during some Hive DDL.
> 
> SET fs.s3n.awsSecretAccessKey=;
> SET fs.s3n.awsAccessKeyId=;
> 
> ALTER TABLE hive_1k_partitions ADD IF NOT EXISTS partition (year='2014', 
> month='2014-01', dt='2014-01-01', hours='00', minutes='16', seconds='22') 
> location 's3n://'
> 
> Stack trace I was able to recover: 
> [ Message content over the limit has been removed. ]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318)
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416)
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Job Submission failed with exception ‘java.lang.IllegalArgumentException(AWS 
> Access Key ID and Secret Access Key must be specified by setting the 
> fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties 
> (respectively).)’
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> [9:31] 
> Logging initialized using configuration in 
> jar:file:/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/hive-common-1.1.0-cdh5.10.1.jar!/hive-log4j.properties
> In the past we did not have to set s3 key and ID in core-site.xml because we 
> were using them dynamically inside our hive DDL scripts.
> After setting S3 secret key and Access ID in core-site.xml this problem goes 
> away. However, this is an incompatible change from the previous Hive 
> shipped in CDH 5.9. 
> The Cloudera 5.10.x release notes mention HIVE-14269 (Enhanced write 
> performance for Hive tables stored on Amazon S3) as the only Hive-related 
> change. 
> https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_new_in_cdh_510.html
> https://issues.apache.org/jira/browse/HIVE-14269



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15035) Clean up Hive licenses for binary distribution

2017-04-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979209#comment-15979209
 ] 

Alan Gates commented on HIVE-15035:
---

Created HIVE-16504 to deal with the rat issue.

> Clean up Hive licenses for binary distribution
> --
>
> Key: HIVE-15035
> URL: https://issues.apache.org/jira/browse/HIVE-15035
> Project: Hive
>  Issue Type: Bug
>  Components: distribution
>Affects Versions: 2.1.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
>  Labels: 2.3.0
> Fix For: 2.2.0, 2.3.0
>
> Attachments: HIVE-15035.2.patch, HIVE-15035.3.patch, HIVE-15035.patch
>
>
> Hive's current LICENSE file contains information not needed for the source 
> distribution.  For the binary distribution we are missing many license files 
> as a number of jars included in Hive come with various licenses.  This all 
> needs to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16504) Addition of binary licenses broke rat check

2017-04-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned HIVE-16504:
-


> Addition of binary licenses broke rat check
> ---
>
> Key: HIVE-16504
> URL: https://issues.apache.org/jira/browse/HIVE-16504
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
>
> The clean up of Hive's licenses (HIVE-15035) broke the rat check, as all the 
> license files get reported as invalid licenses.  The rat check needs to be 
> modified to ignore those files.
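Such an exclusion is typically expressed in the apache-rat-plugin 
configuration. A minimal sketch, assuming the new license files live under a 
binary-package-licenses directory (the path is an assumption, not the actual 
patch):

{noformat}
<!-- pom.xml: tell RAT to skip the bundled third-party license texts -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <exclude>binary-package-licenses/**</exclude>
    </excludes>
  </configuration>
</plugin>
{noformat}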



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15035) Clean up Hive licenses for binary distribution

2017-04-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979204#comment-15979204
 ] 

Alan Gates commented on HIVE-15035:
---

Ah, I forgot to modify RAT to tell it not to pay attention to these.  I'll open 
a separate issue for that.

> Clean up Hive licenses for binary distribution
> --
>
> Key: HIVE-15035
> URL: https://issues.apache.org/jira/browse/HIVE-15035
> Project: Hive
>  Issue Type: Bug
>  Components: distribution
>Affects Versions: 2.1.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
>  Labels: 2.3.0
> Fix For: 2.2.0, 2.3.0
>
> Attachments: HIVE-15035.2.patch, HIVE-15035.3.patch, HIVE-15035.patch
>
>
> Hive's current LICENSE file contains information not needed for the source 
> distribution.  For the binary distribution we are missing many license files 
> as a number of jars included in Hive come with various licenses.  This all 
> needs to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15035) Clean up Hive licenses for binary distribution

2017-04-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979196#comment-15979196
 ] 

Pengcheng Xiong commented on HIVE-15035:


[~alangates], I tried master and 2.3 just now. However, the same problem 
still remains...

> Clean up Hive licenses for binary distribution
> --
>
> Key: HIVE-15035
> URL: https://issues.apache.org/jira/browse/HIVE-15035
> Project: Hive
>  Issue Type: Bug
>  Components: distribution
>Affects Versions: 2.1.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Blocker
>  Labels: 2.3.0
> Fix For: 2.2.0, 2.3.0
>
> Attachments: HIVE-15035.2.patch, HIVE-15035.3.patch, HIVE-15035.patch
>
>
> Hive's current LICENSE file contains information not needed for the source 
> distribution.  For the binary distribution we are missing many license files 
> as a number of jars included in Hive come with various licenses.  This all 
> needs to be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13583) E061-14: Search Conditions

2017-04-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979186#comment-15979186
 ] 

Ashutosh Chauhan commented on HIVE-13583:
-

Are you suggesting adding {{UNKNOWN}} as a keyword in the grammar?  I am not 
sure how that is connected to this JIRA, which is about supporting search 
conditions of the form " is not true".  It should be possible (in fact, 
desirable) that we add this support with the current grammar.
Even if {{UNKNOWN}} is specified in the standard, the discussion of how and 
when to add it should be a separate effort. 
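For context, the standard search-condition form under discussion looks like 
the following; the table and column names are illustrative only:

{noformat}
-- Keeps rows where the predicate evaluates to FALSE or UNKNOWN (NULL):
SELECT * FROM t WHERE (t.col > 10) IS NOT TRUE;
{noformat}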

> E061-14: Search Conditions
> --
>
> Key: HIVE-13583
> URL: https://issues.apache.org/jira/browse/HIVE-13583
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Carter Shanklin
>
> This is a part of the SQL:2011 Analytics Complete Umbrella JIRA HIVE-13554. 
> Support for various forms of search conditions is mandatory in the SQL 
> standard, for example " is not true;".  Hive should support those forms 
> mandated by the standard.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Priority: Blocker  (was: Critical)

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBC connection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]
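A minimal Java sketch of the first option, a mutex connection pool separate 
from the pool serving the protected operations. This is illustrative only, 
not the actual patch; the class and method names are invented:

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Two fixed-size pools backed by blocking queues.  Mutex acquisition draws
// from its own pool, so it can never wait on a connection held by the very
// operation it is protecting.
public class TwoPoolSketch {
  private final BlockingQueue<Connection> opPool;
  private final BlockingQueue<Connection> mutexPool;

  public TwoPoolSketch(String jdbcUrl, int opSize, int mutexSize) throws SQLException {
    opPool = new ArrayBlockingQueue<>(opSize);
    mutexPool = new ArrayBlockingQueue<>(mutexSize);
    for (int i = 0; i < opSize; i++) opPool.add(DriverManager.getConnection(jdbcUrl));
    for (int i = 0; i < mutexSize; i++) mutexPool.add(DriverManager.getConnection(jdbcUrl));
  }

  // Used by lock()/checkLock()-style operations.
  public Connection takeOperationConnection() throws InterruptedException {
    return opPool.take();
  }

  // Used only by the mutex; even if every operation connection is checked
  // out, this call cannot block on the operation pool.
  public Connection takeMutexConnection() throws InterruptedException {
    return mutexPool.take();
  }

  // Return a connection to whichever pool it came from.
  public void release(BlockingQueue<Connection> pool, Connection c) {
    pool.offer(c);
  }
}
{noformat}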



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16321:
--
Target Version/s: 2.3.0, 3.0.0, 2.4.0  (was: 2.3.0, 3.0.0)

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism by which different Metastore instances can 
> coordinate their operations.  It uses a JDBC connection to achieve this.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to TxnHandler.lock(), 
> where X is >= the size of the pool.  This takes all connections from the 
> pool, so when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-21 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-16503:


Assignee: Prasanth Jayachandran

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> When running map joins in LLAP, a query can potentially use more memory for 
> hash table loading (assuming other executors in the daemon have some memory 
> to spare).  The map join conversion decision has to be made during 
> compilation, which is where we can provide some more room for LLAP. 
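For context, the compile-time budget in question is the existing 
noconditional task size setting; a hedged illustration (the size value below 
is arbitrary):

{noformat}
-- The map-join conversion decision is bounded by this compile-time budget;
-- oversubscription would let LLAP plan with a larger effective value.
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=300000000;
{noformat}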



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16399) create an index for tc_txnid in TXN_COMPONENTS

2017-04-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16399:
--
Target Version/s: 3.0.0, 2.4.0  (was: 3.0.0)

> create an index for tc_txnid in TXN_COMPONENTS
> --
>
> Key: HIVE-16399
> URL: https://issues.apache.org/jira/browse/HIVE-16399
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-16399.branch-2.3.patch, HIVE-16399.branch-2.patch, 
> HIVE-16399.master.patch
>
>
> Without this index, TxnStore.cleanEmptyAbortedTxns() can be very slow.
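A sketch of the kind of DDL involved, with an assumed index name; the actual 
schema scripts may differ per backing database:

{noformat}
CREATE INDEX TC_TXNID_INDEX ON TXN_COMPONENTS (TC_TXNID);
{noformat}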



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16419) Exclude hadoop related classes for JDBC standalone jar

2017-04-21 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-16419:

  Resolution: Fixed
Target Version/s: 2.3.0, 3.0.0, 2.4.0  (was: 2.3.0)
  Status: Resolved  (was: Patch Available)

> Exclude hadoop related classes for JDBC standalone jar
> --
>
> Key: HIVE-16419
> URL: https://issues.apache.org/jira/browse/HIVE-16419
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Blocker
> Attachments: HIVE-16419.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16399) create an index for tc_txnid in TXN_COMPONENTS

2017-04-21 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979176#comment-15979176
 ] 

Wei Zheng commented on HIVE-16399:
--

[~ekoifman] Do we want to ship this to both 2.3 and 3.0?

> create an index for tc_txnid in TXN_COMPONENTS
> --
>
> Key: HIVE-16399
> URL: https://issues.apache.org/jira/browse/HIVE-16399
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-16399.branch-2.3.patch, HIVE-16399.branch-2.patch, 
> HIVE-16399.master.patch
>
>
> Without this index, TxnStore.cleanEmptyAbortedTxns() can be very slow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16419) Exclude hadoop related classes for JDBC standalone jar

2017-04-21 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979173#comment-15979173
 ] 

Vaibhav Gumashta commented on HIVE-16419:
-

Committed. Thanks [~taoli-hwx]

> Exclude hadoop related classes for JDBC standalone jar
> --
>
> Key: HIVE-16419
> URL: https://issues.apache.org/jira/browse/HIVE-16419
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Blocker
> Attachments: HIVE-16419.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

