[jira] [Commented] (HIVE-4239) Remove lock on compilation stage

2015-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647283#comment-14647283
 ] 

Lefty Leverenz commented on HIVE-4239:
--

Hmm ... a compiler section would be nice to have.  Maybe we could add one.  
Thanks Carl.

> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.08.patch, HIVE-4239.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647306#comment-14647306
 ] 

Hive QA commented on HIVE-11409:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747926/HIVE-11409.01.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4756/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4756/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4756/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747926 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
> UNION
> --
>
> Key: HIVE-11409
> URL: https://issues.apache.org/jira/browse/HIVE-11409
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11409.01.patch
>
>
> Three purposes: (1) to ensure that the data type of each non-primary branch 
> (the first branch is the primary branch) of the union can be cast to that of 
> the primary branch; (2) to make the UnionProcessor optimizer work; (3) if the 
> SEL is redundant, it will be removed by the IdentityProjectRemover optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11406) Vectorization: StringExpr::compare() == 0 is bad for performance

2015-07-30 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11406:

Attachment: HIVE-11406.01.patch

> Vectorization: StringExpr::compare() == 0 is bad for performance
> 
>
> Key: HIVE-11406
> URL: https://issues.apache.org/jira/browse/HIVE-11406
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-11406.01.patch
>
>
> {{StringExpr::compare() == 0}} is forced to run the entire memory-comparison 
> loop even for strings of differing lengths, though such strings can never be 
> equal.
> Add a {{StringExpr::equals}}, which can be a smaller and tighter loop.
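The idea in the description can be sketched as follows. This is a hypothetical illustration of the technique, not the actual HIVE-11406 patch; the class name StringExprSketch and the method signatures are invented for the example.

```java
// Hypothetical sketch: an equals() that bails out on differing lengths
// before touching memory, versus a lexicographic compare() that must
// walk bytes to produce an ordering.
public final class StringExprSketch {

  // compare() must examine bytes even when lengths differ, because it
  // returns an ordering (negative/zero/positive), not just equality.
  public static int compare(byte[] a, int aStart, int aLen,
                            byte[] b, int bStart, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int d = (a[aStart + i] & 0xff) - (b[bStart + i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return aLen - bLen;
  }

  // equals() can short-circuit: strings of different lengths can never
  // be equal, so the memory loop runs only when the lengths match.
  public static boolean equals(byte[] a, int aStart, int aLen,
                               byte[] b, int bStart, int bLen) {
    if (aLen != bLen) {
      return false;
    }
    for (int i = 0; i < aLen; i++) {
      if (a[aStart + i] != b[bStart + i]) {
        return false;
      }
    }
    return true;
  }
}
```

For an equality predicate, the length check alone rejects most non-matching pairs without entering the loop, which is the performance win the issue describes.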



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647384#comment-14647384
 ] 

Hive QA commented on HIVE-10319:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747929/HIVE-10319.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9276 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4757/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4757/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4757/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747929 - PreCommit-HIVE-TRUNK-Build

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is in the order of the number of 
> databases. In production we have several hundreds of databases so Hive makes 
> several hundreds of RPC calls during startup, taking 30+ seconds.
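The per-database RPC pattern described above can be sketched as follows. This is a hedged illustration only: the MetastoreClient interface and its method names here are invented for the example and are not Hive's actual HiveMetaStoreClient API.

```java
// Hypothetical sketch of the startup pattern in the report: one
// metastore RPC per database scales linearly with the number of
// databases, while a single batched call does not.
import java.util.ArrayList;
import java.util.List;

interface MetastoreClient {
  List<String> getAllDatabases();        // 1 RPC
  List<String> getFunctions(String db);  // 1 RPC per database
  List<String> getAllFunctions();        // 1 RPC for everything
}

public class UdfLoaderSketch {
  // O(N) RPCs: the pattern the report describes -- with several
  // hundred databases this means several hundred round trips.
  static List<String> loadPerDatabase(MetastoreClient client) {
    List<String> fns = new ArrayList<>();
    for (String db : client.getAllDatabases()) {
      fns.addAll(client.getFunctions(db));
    }
    return fns;
  }

  // O(1) RPCs: fetch every permanent function in one batched call.
  static List<String> loadBatched(MetastoreClient client) {
    return client.getAllFunctions();
  }
}
```

Under this sketch, both paths return the same function list; only the number of round trips differs, which is what dominates the 30+ second startup.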



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11383:
---
Attachment: HIVE-11383.7.patch

> Upgrade Hive to Calcite 1.4
> ---
>
> Key: HIVE-11383
> URL: https://issues.apache.org/jira/browse/HIVE-11383
> Project: Hive
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, 
> HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
> HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, HIVE-11383.7.patch
>
>
> CLEAR LIBRARY CACHE
> Upgrade Hive to Calcite 1.4.0-incubating.
> There is currently a snapshot release, which is close to what will be in 1.4. 
> I have checked that Hive compiles against the new snapshot, fixing one issue. 
> The patch is attached.
> Next step is to validate that Hive runs against the new Calcite, and post any 
> issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
> can you please do that?
> [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
> the new Calcite version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11391:
---
Attachment: HIVE-11391.patch

> CBO (Calcite Return Path): Add CBO tests with return path on
> 
>
> Key: HIVE-11391
> URL: https://issues.apache.org/jira/browse/HIVE-11391
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11391.patch, HIVE-11391.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11406) Vectorization: StringExpr::compare() == 0 is bad for performance

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647474#comment-14647474
 ] 

Hive QA commented on HIVE-11406:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747946/HIVE-11406.01.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4758/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4758/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4758/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747946 - PreCommit-HIVE-TRUNK-Build

> Vectorization: StringExpr::compare() == 0 is bad for performance
> 
>
> Key: HIVE-11406
> URL: https://issues.apache.org/jira/browse/HIVE-11406
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-11406.01.patch
>
>
> {{StringExpr::compare() == 0}} is forced to run the entire memory-comparison 
> loop even for strings of differing lengths, though such strings can never be 
> equal.
> Add a {{StringExpr::equals}}, which can be a smaller and tighter loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647478#comment-14647478
 ] 

Hive QA commented on HIVE-11383:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747964/HIVE-11383.7.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4759/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4759/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4759/

Messages:
{noformat}
 This message was trimmed, see log for full details 
Downloaded: 
http://repository.apache.org/snapshots/org/apache/calcite/calcite-core/1.4.0-incubating-SNAPSHOT/calcite-core-1.4.0-incubating-20150729.211031-2.jar
 (3551 KB at 1005.9 KB/sec)
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-exec ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/ql 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-exec ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (generate-sources) @ hive-exec ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/gen
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-test-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen
Generating vector expression code
Generating vector expression test code
[INFO] Executed tasks
[INFO] 
[INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-exec ---
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/protobuf/gen-java
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/src/gen/thrift/gen-javabean
 added.
[INFO] Source directory: 
/data/hive-ptest/working/apache-github-source-source/ql/target/generated-sources/java
 added.
[INFO] 
[INFO] --- antlr3-maven-plugin:3.4:antlr (default) @ hive-exec ---
[INFO] ANTLR: Processing source directory 
/data/hive-ptest/working/apache-github-source-source/ql/src/java
ANTLR Parser Generator  Version 3.4
org/apache/hadoop/hive/ql/parse/HiveLexer.g
org/apache/hadoop/hive/ql/parse/HiveParser.g
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY" 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_UNION KW_ALL" using 
multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" using 
multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_UNION KW_MAP" using 
multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE" 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_UNION KW_SELECT" 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "{KW_REGEXP, KW_RLIKE} KW_UNION KW_REDUCE" 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:455:5: 
Decision can match input such as "

[jira] [Updated] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Nicholas Brenwald (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Brenwald updated HIVE-11410:
-
Attachment: hive-site.xml

> Join with subquery containing a group by incorrectly returns no results
> ---
>
> Key: HIVE-11410
> URL: https://issues.apache.org/jira/browse/HIVE-11410
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Nicholas Brenwald
>Priority: Minor
> Attachments: hive-site.xml
>
>
> Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
> row of data. For example create table *t* from an existing table which 
> contains at least 1 row of data by running:
> {code}
> create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
> {code}
> Table *t* looks like the following:
> ||c1||c2||
> |abc|0|
> Running the following query then returns zero results.
> {code}
> SELECT 
>   t1.c1
> FROM 
>   t t1
> JOIN
> (SELECT 
>t2.c1,
>MAX(t2.c2) AS c2
>  FROM 
>t t2 
>  GROUP BY 
>t2.c1
> ) t3
> ON t1.c2=t3.c2
> {code}
> However, we expected to see the following:
> ||c1||
> |abc|
> The problem seems to relate to the fact that in the subquery, we group by 
> column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647491#comment-14647491
 ] 

Jesus Camacho Rodriguez commented on HIVE-11397:


[~hagleitn], this looks good to me, we are just transforming the left deep tree 
into a right deep tree; the transformation is legal.

> Parse Hive OR clauses as they are written into the AST
> --
>
> Key: HIVE-11397
> URL: https://issues.apache.org/jira/browse/HIVE-11397
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
>
> When parsing A OR B OR C, hive converts it into 
> (C OR B) OR A
> instead of turning it into
> A OR (B OR C)
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
> expressions.add(previous);
> expressions.add(current);
> {code}
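The two association orders can be sketched with a toy string fold. This is a hypothetical illustration: Hive's real code builds ExprNodeDesc trees rather than strings, and the exact operand order Hive produces is as quoted in the description.

```java
// Hypothetical sketch of left-deep vs. right-deep folding of a list
// of OR terms, using plain strings instead of Hive's expression nodes.
import java.util.List;

public class OrFoldSketch {
  // Left-deep: each new term wraps the previously built expression,
  // yielding ((A OR B) OR C) for the input [A, B, C].
  static String leftDeep(List<String> terms) {
    String acc = terms.get(0);
    for (int i = 1; i < terms.size(); i++) {
      acc = "(" + acc + " OR " + terms.get(i) + ")";
    }
    return acc;
  }

  // Right-deep: fold from the right instead, yielding (A OR (B OR C)),
  // which matches how the clause was written.
  static String rightDeep(List<String> terms) {
    String acc = terms.get(terms.size() - 1);
    for (int i = terms.size() - 2; i >= 0; i--) {
      acc = "(" + terms.get(i) + " OR " + acc + ")";
    }
    return acc;
  }
}
```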



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11383:
---
Attachment: HIVE-11383.8.patch

> Upgrade Hive to Calcite 1.4
> ---
>
> Key: HIVE-11383
> URL: https://issues.apache.org/jira/browse/HIVE-11383
> Project: Hive
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11383.1.patch, HIVE-11383.2.patch, 
> HIVE-11383.3.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
> HIVE-11383.4.patch, HIVE-11383.5.patch, HIVE-11383.6.patch, 
> HIVE-11383.7.patch, HIVE-11383.8.patch
>
>
> CLEAR LIBRARY CACHE
> Upgrade Hive to Calcite 1.4.0-incubating.
> There is currently a snapshot release, which is close to what will be in 1.4. 
> I have checked that Hive compiles against the new snapshot, fixing one issue. 
> The patch is attached.
> Next step is to validate that Hive runs against the new Calcite, and post any 
> issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
> can you please do that?
> [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
> the new Calcite version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647573#comment-14647573
 ] 

Hive QA commented on HIVE-11391:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747966/HIVE-11391.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9289 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4760/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4760/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4760/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12747966 - PreCommit-HIVE-TRUNK-Build

> CBO (Calcite Return Path): Add CBO tests with return path on
> 
>
> Key: HIVE-11391
> URL: https://issues.apache.org/jira/browse/HIVE-11391
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11391.patch, HIVE-11391.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647574#comment-14647574
 ] 

Rajat Khandelwal commented on HIVE-11376:
-

Created https://reviews.apache.org/r/36939/

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
> found for one of the input files
> -
>
> Key: HIVE-11376
> URL: https://issues.apache.org/jira/browse/HIVE-11376
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in 
> the tree or not,
>   // we use a configuration variable for the same
>   if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
> // The following code should be removed, once
> // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
> // Hadoop does not handle non-splittable files correctly for 
> CombineFileInputFormat,
> // so don't use CombineFileInputFormat for non-splittable files
> //ie, dont't combine if inputformat is a TextInputFormat and has 
> compression turned on
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11411) Transaction lock

2015-07-30 Thread shiqian.huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647575#comment-14647575
 ] 

shiqian.huang commented on HIVE-11411:
--

The transaction lock times out when job progress can't be tracked (Hive 1.2). 
When the Hive client can't connect to the ApplicationMaster to track job 
progress and the job runs for more than 5 minutes, Hive can't refresh the lock 
heartbeat, and the following exception is thrown:
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
lock: 3645)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
at 
org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
I find that the check {{if (initializing && rj.getJobState() == JobStatus.PREP)}} 
throws an exception when no Hadoop slave host is configured in /etc/hosts. 
Is this a bug?

> Transaction lock
> 
>
> Key: HIVE-11411
> URL: https://issues.apache.org/jira/browse/HIVE-11411
> Project: Hive
>  Issue Type: Wish
>Reporter: shiqian.huang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11411) Transaction lock

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
The transaction lock times out when job progress can't be tracked (Hive 1.2). 
When the Hive client can't connect to the ApplicationMaster to track job 
progress and the job runs for more than 5 minutes, Hive can't refresh the lock 
heartbeat, and the following exception is thrown:
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
lock: 3645)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
I find that the check {{if (initializing && rj.getJobState() == JobStatus.PREP)}} 
throws an exception when no Hadoop slave host is configured in /etc/hosts. 
Is this a bug?

> Transaction lock
> 
>
> Key: HIVE-11411
> URL: https://issues.apache.org/jira/browse/HIVE-11411
> Project: Hive
>  Issue Type: Wish
>Reporter: shiqian.huang
>
> The transaction lock times out when job progress can't be tracked (Hive 1.2). 
> When the Hive client can't connect to the ApplicationMaster to track job 
> progress and the job runs for more than 5 minutes, Hive can't refresh the 
> lock heartbeat, and the following exception is thrown:
> 2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
> (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
> lock: 3645)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
> at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
> at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
> at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
> at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExe

[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Affects Version/s: 1.2.0
 Priority: Minor  (was: Major)
  Summary: Transaction lock time out when can't tracking job 
progress  (was: Transaction lock)

> Transaction lock time out when can't tracking job progress
> --
>
> Key: HIVE-11411
> URL: https://issues.apache.org/jira/browse/HIVE-11411
> Project: Hive
>  Issue Type: Wish
>Affects Versions: 1.2.0
>Reporter: shiqian.huang
>Priority: Minor
>
> The transaction lock times out when job progress can't be tracked (Hive 1.2). 
> When the Hive client can't connect to the ApplicationMaster to track job 
> progress and the job runs for more than 5 minutes, Hive can't refresh the 
> lock heartbeat, and the following exception is thrown:
> 2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
> (RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
> lock: 3645)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
> at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
> at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
> at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
> at 
> org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
> at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
> I found the cause: the check "if (initializing && rj.getJobState() == JobStatus.PREP)"
> throws an exception when no Hadoop slave host is configured in /etc/hosts.
> Is this a bug?
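A defensive variant of the polling loop in HadoopJobExecHelper can be sketched as follows. This is a minimal illustration with stand-in interfaces (Heartbeater and RunningJob here are simplified, not Hive's real classes): the lock heartbeat is sent unconditionally at the top of each iteration, and the job-state poll is confined to its own try/catch, so an unreachable ApplicationMaster can no longer starve the heartbeat.

```java
// Sketch only: decouple the lock heartbeat from job-state polling.
import java.io.IOException;

interface Heartbeater { void heartbeat(); }
interface RunningJob { int getJobState() throws IOException; }

class ProgressLoop {
    static final int PREP = 1; // stand-in for JobStatus.PREP

    static void progress(Heartbeater heartbeater, RunningJob rj) {
        boolean initializing = true;
        while (initializing) {
            // Keep the transaction lock alive no matter what happens below.
            heartbeater.heartbeat();
            try {
                if (rj.getJobState() != PREP) {
                    // Job left PREP; stop making extra getJobState() RPCs.
                    initializing = false;
                }
            } catch (IOException e) {
                // The ApplicationMaster was unreachable; the next iteration
                // still sends a heartbeat. (Real code would also sleep here
                // and eventually give up.)
            }
        }
    }
}
```

With this shape, a transient tracking failure delays initialization detection but never causes the NoSuchLockException described above.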



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
Transaction lock times out when Hive can't track job progress (Hive 1.2).
When the Hive client can't connect to the ApplicationMaster to track job
progress and the job runs for more than five minutes, Hive fails to refresh
the lock heartbeat and the following exception is thrown:
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
lock: 3645)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
I found the cause: the check "if (initializing && rj.getJobState() == JobStatus.PREP)"
throws an exception when no Hadoop slave host is configured in /etc/hosts.
Is this a bug?

  was:
Transaction lock time out when can't tracking job progress. hive 1.2 
when hive client cann't connect to appmaster to tracking job progress and job 
running more than 5mins, hive can't refresh lock heartbeat. then will get 
exception
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
lock: 3645)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.ru

[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647586#comment-14647586
 ] 

Jesus Camacho Rodriguez commented on HIVE-11391:


[~pxiong], could you review it? This adds the CBO tests to the test suite with 
the return path enabled; it is useful to verify that we do not introduce any 
regressions while working on the return path. Thanks

> CBO (Calcite Return Path): Add CBO tests with return path on
> 
>
> Key: HIVE-11391
> URL: https://issues.apache.org/jira/browse/HIVE-11391
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11391.patch, HIVE-11391.patch
>
>






[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: In the progress(ExecDriverTaskHandle th) method of 
HadoopJobExecHelper, the Hive lock heartbeat runs in the same loop as job 
progress tracking; when progress tracking throws any exception, long-running 
queries eventually fail with a lock timeout.   (was: Transaction lock time out when can't 
tracking job progress. hive 1.2 
when hive client cann't connect to appmaster to tracking job progress and job 
running more than 5mins, hive can't refresh lock heartbeat. then will get 
exception
2015-07-30 17:23:30,161 ERROR [Thread-206]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(159)) - NoSuchLockException(message:No such 
lock: 3645)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1710)
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:622)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy7.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:1891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy8.heartbeat(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:293)
at org.apache.hadoop.hive.ql.exec.Heartbeater.heartbeat(Heartbeater.java:81)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:242)
at 
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:437)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
I find the reason is ’if (initializing && rj.getJobState() == JobStatus.PREP)‘ 
throw exception when doesn't configure any hadoop slave host in /etc/hosts. 
is it a bug?)

> Transaction lock time out when can't tracking job progress
> --
>
> Key: HIVE-11411
> URL: https://issues.apache.org/jira/browse/HIVE-11411
> Project: Hive
>  Issue Type: Wish
>Affects Versions: 1.2.0
>Reporter: shiqian.huang
>Priority: Minor
>
> In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the
> Hive lock heartbeat runs in the same loop as job progress tracking; when
> progress tracking throws any exception, long-running queries eventually fail
> with a lock timeout.





[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the
Hive lock heartbeat runs in the same loop as job progress tracking,
like this:
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
// No reason to poll untill the job is initialized
continue;
  } else {
// By now the job is initialized so no reason to do
// rj.getJobState() again and we do not want to do an extra RPC call
initializing = false;
  }
When the rj.getJobState() call throws any exception, long-running queries
eventually fail with a lock timeout.

  was:in method progress(ExecDriverTaskHandle th) of class HadoopJobExecHelper, 
hive lock heartbeating progress work with job tracking progress. When job 
tracking progress got any exception, will bring lock time out exception to big 
query job. 


> Transaction lock time out when can't tracking job progress
> --
>
> Key: HIVE-11411
> URL: https://issues.apache.org/jira/browse/HIVE-11411
> Project: Hive
>  Issue Type: Wish
>Affects Versions: 1.2.0
>Reporter: shiqian.huang
>Priority: Minor
>
> In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the
> Hive lock heartbeat runs in the same loop as job progress tracking,
> like this:
>   heartbeater.heartbeat();
>   if (initializing && rj.getJobState() == JobStatus.PREP) {
> // No reason to poll untill the job is initialized
> continue;
>   } else {
> // By now the job is initialized so no reason to do
> // rj.getJobState() again and we do not want to do an extra RPC call
> initializing = false;
>   }
> When the rj.getJobState() call throws any exception, long-running queries
> eventually fail with a lock timeout.





[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the
Hive lock heartbeat runs in the same loop as job progress tracking,
like this:
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
// No reason to poll untill the job is initialized
continue;
  } else {
// By now the job is initialized so no reason to do
// rj.getJobState() again and we do not want to do an extra RPC call
initializing = false;
  }
When the rj.getJobState() call throws any exception, long-running queries
eventually fail with a lock timeout.

  was:
in method progress(ExecDriverTaskHandle th) of class HadoopJobExecHelper, hive 
lock heartbeating progress works with job tracking progress. 
such like 
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
// No reason to poll untill the job is initialized
continue;
  } else {
// By now the job is initialized so no reason to do
// rj.getJobState() again and we do not want to do an extra RPC call
initializing = false;
  }
When job tracking progress got any exception in  rj.getJobState() == 
JobStatus.PREP, will bring lock time out exception to big query job.  


> Transaction lock time out when can't tracking job progress
> --
>
> Key: HIVE-11411
> URL: https://issues.apache.org/jira/browse/HIVE-11411
> Project: Hive
>  Issue Type: Wish
>Affects Versions: 1.2.0
>Reporter: shiqian.huang
>Priority: Minor
>
> in method progress(ExecDriverTaskHandle th) of class HadoopJobExecHelper, 
> hive lock heartbeating progress works with job tracking progress. 
> such like 
>   heartbeater.heartbeat();
>   if (initializing && rj.getJobState() == JobStatus.PREP) {
> // No reason to poll untill the job is initialized
> continue;
>   } else {
> // By now the job is initialized so no reason to do
> // rj.getJobState() again and we do not want to do an extra RPC call
> initializing = false;
>   }
> When job tracking progress got any exception in  rj.getJobState() == 
> JobStatus.PREP, will bring lock time out exception to big query job finally.  





[jira] [Updated] (HIVE-11411) Transaction lock time out when can't tracking job progress

2015-07-30 Thread shiqian.huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shiqian.huang updated HIVE-11411:
-
Description: 
In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the
Hive lock heartbeat runs in the same loop as job progress tracking,
like this:
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
// No reason to poll untill the job is initialized
continue;
  } else {
// By now the job is initialized so no reason to do
// rj.getJobState() again and we do not want to do an extra RPC call
initializing = false;
  }
When the rj.getJobState() call throws any exception, long-running queries
eventually fail with NoSuchLockException (client-side message: "No record of
lock could be found, may have timed out").

  was:
in method progress(ExecDriverTaskHandle th) of class HadoopJobExecHelper, hive 
lock heartbeating progress works with job tracking progress. 
such like 
  heartbeater.heartbeat();

  if (initializing && rj.getJobState() == JobStatus.PREP) {
// No reason to poll untill the job is initialized
continue;
  } else {
// By now the job is initialized so no reason to do
// rj.getJobState() again and we do not want to do an extra RPC call
initializing = false;
  }
When job tracking progress got any exception in  rj.getJobState() == 
JobStatus.PREP, will bring lock time out exception to big query job finally.  


> Transaction lock time out when can't tracking job progress
> --
>
> Key: HIVE-11411
> URL: https://issues.apache.org/jira/browse/HIVE-11411
> Project: Hive
>  Issue Type: Wish
>Affects Versions: 1.2.0
>Reporter: shiqian.huang
>Priority: Minor
>
> In the progress(ExecDriverTaskHandle th) method of HadoopJobExecHelper, the
> Hive lock heartbeat runs in the same loop as job progress tracking,
> like this:
>   heartbeater.heartbeat();
>   if (initializing && rj.getJobState() == JobStatus.PREP) {
> // No reason to poll untill the job is initialized
> continue;
>   } else {
> // By now the job is initialized so no reason to do
> // rj.getJobState() again and we do not want to do an extra RPC call
> initializing = false;
>   }
> When the rj.getJobState() call throws any exception, long-running queries
> eventually fail with NoSuchLockException (client-side message: "No record of
> lock could be found, may have timed out").





[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647608#comment-14647608
 ] 

Rajat Khandelwal commented on HIVE-11376:
-

Taking the patch from ReviewBoard and attaching it.

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
> found for one of the input files
> -
>
> Key: HIVE-11376
> URL: https://issues.apache.org/jira/browse/HIVE-11376
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-11376_02.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in 
> the tree or not,
>   // we use a configuration variable for the same
>   if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
> // The following code should be removed, once
> // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
> // Hadoop does not handle non-splittable files correctly for 
> CombineFileInputFormat,
> // so don't use CombineFileInputFormat for non-splittable files
> //ie, dont't combine if inputformat is a TextInputFormat and has 
> compression turned on
> {noformat}
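The fallback policy that comment describes can be sketched in isolation. This is illustrative only: the types below are stand-ins, and a file-suffix check stands in for Hive's real codec lookup (which goes through Hadoop's CompressionCodecFactory).

```java
// Stand-in sketch: if splittable compression is not supported and any input
// file uses a non-splittable codec, give up on combining splits and use the
// plain input format for the whole job.
import java.util.List;

class InputFormatChooser {
    // Hypothetical check; a gzip suffix stands in for "has a codec that
    // cannot be split".
    static boolean hasNonSplittableCodec(String path) {
        return path.endsWith(".gz");
    }

    /** Returns "combine" (CombineHiveInputFormat) or "plain" (HiveInputFormat). */
    static String choose(List<String> inputFiles, boolean hadoopSupportsSplittable) {
        if (!hadoopSupportsSplittable) {
            for (String f : inputFiles) {
                if (hasNonSplittableCodec(f)) {
                    return "plain"; // one bad file forces the fallback for all
                }
            }
        }
        return "combine";
    }
}
```

Note how a single non-splittable file switches the entire job to the plain format, which is exactly the behavior this issue questions.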





[jira] [Assigned] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal reassigned HIVE-11376:
---

Assignee: Rajat Khandelwal

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
> found for one of the input files
> -
>
> Key: HIVE-11376
> URL: https://issues.apache.org/jira/browse/HIVE-11376
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-11376_02.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in 
> the tree or not,
>   // we use a configuration variable for the same
>   if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
> // The following code should be removed, once
> // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
> // Hadoop does not handle non-splittable files correctly for 
> CombineFileInputFormat,
> // so don't use CombineFileInputFormat for non-splittable files
> //ie, dont't combine if inputformat is a TextInputFormat and has 
> compression turned on
> {noformat}





[jira] [Updated] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-07-30 Thread Rajat Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajat Khandelwal updated HIVE-11376:

Attachment: HIVE-11376_02.patch

> CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
> found for one of the input files
> -
>
> Key: HIVE-11376
> URL: https://issues.apache.org/jira/browse/HIVE-11376
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-11376_02.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
> This is the exact code snippet:
> {noformat}
> // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in 
> the tree or not,
>   // we use a configuration variable for the same
>   if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
> // The following code should be removed, once
> // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
> // Hadoop does not handle non-splittable files correctly for 
> CombineFileInputFormat,
> // so don't use CombineFileInputFormat for non-splittable files
> //ie, dont't combine if inputformat is a TextInputFormat and has 
> compression turned on
> {noformat}





[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647680#comment-14647680
 ] 

Hive QA commented on HIVE-11383:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12747969/HIVE-11383.8.patch

{color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 9276 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_cond_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subq_where_serialization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_temp_table_subquery1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_unencrypted_tbl
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_join_filters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_leftsemi_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_filter_join_breaktask2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_semijoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.

[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: HIVE-11401.1.patch

> Predicate push down does not work with Parquet when partitions are in the 
> expression
> 
>
> Key: HIVE-11401
> URL: https://issues.apache.org/jira/browse/HIVE-11401
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11401.1.patch
>
>
> When filtering Parquet tables using a partition column, the query fails 
> saying the column does not exist:
> {noformat}
> hive> create table part1 (id int, content string) partitioned by (p string) 
> stored as parquet;
> hive> alter table part1 add partition (p='p1');
> hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
> hive> select id from part1 where p='p1';
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Column [p] was not found in schema!
> Time taken: 0.151 seconds
> {noformat}
> It is correct that the partition column is not part of the Parquet schema. 
> So, the fix should be to remove such expression from the Parquet PPD.
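The suggested fix can be sketched on a toy expression tree. This is illustrative only: Hive's real predicate representation is ExprNodeDesc converted to a Parquet filter, not these classes. The idea is to drop any leaf that references a partition column before handing the filter to the Parquet reader, since partition columns are not in the file schema.

```java
// Toy expression model: equality leaves joined by AND.
import java.util.Set;

abstract class Expr {}
class ColEq extends Expr {               // column = literal
    final String col; final String val;
    ColEq(String c, String v) { col = c; val = v; }
}
class And extends Expr {
    final Expr left, right;
    And(Expr l, Expr r) { left = l; right = r; }
}

class PartitionPruner {
    /** Returns the expression with partition-column leaves removed,
     *  or null when nothing pushable remains. */
    static Expr prune(Expr e, Set<String> partitionCols) {
        if (e instanceof ColEq) {
            return partitionCols.contains(((ColEq) e).col) ? null : e;
        }
        And a = (And) e;
        Expr l = prune(a.left, partitionCols);
        Expr r = prune(a.right, partitionCols);
        if (l == null) return r;  // AND with a dropped side collapses
        if (r == null) return l;
        return new And(l, r);
    }
}
```

For the failing query above, a filter like p = 'p1' AND id = 1 would prune to just id = 1, and a filter that is only p = 'p1' would prune to nothing, so no Parquet PPD is attempted at all.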





[jira] [Updated] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11380:

Summary: NPE when FileSinkOperator is not initialized  (was: NPE when 
FileSinkOperator is not inialized)

> NPE when FileSinkOperator is not initialized
> 
>
> Key: HIVE-11380
> URL: https://issues.apache.org/jira/browse/HIVE-11380
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>
> When FileSinkOperator's initializeOp is not called (which may happen when an 
> operator before FileSinkOperator initializeOp failed), FileSinkOperator will 
> throw NPE at close time. The stacktrace:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
> ... 18 more
> {noformat}
> This Exception is misleading and often distracts users from finding real 
> issues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: (was: HIVE-11401.1.patch)

> Predicate push down does not work with Parquet when partitions are in the 
> expression
> 
>
> Key: HIVE-11401
> URL: https://issues.apache.org/jira/browse/HIVE-11401
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11401.1.patch
>
>
> When filtering Parquet tables using a partition column, the query fails 
> saying the column does not exist:
> {noformat}
> hive> create table part1 (id int, content string) partitioned by (p string) 
> stored as parquet;
> hive> alter table part1 add partition (p='p1');
> hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
> hive> select id from part1 where p='p1';
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Column [p] was not found in schema!
> Time taken: 0.151 seconds
> {noformat}
> It is correct that the partition column is not part of the Parquet schema. 
> So, the fix should be to remove such expression from the Parquet PPD.
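The proposed fix can be sketched in plain Java: before handing a pushed-down predicate to the Parquet reader, drop any conjunct that references a partition column, since partition columns are never part of the file schema. This is a hypothetical, simplified model (column names stand in for expression nodes), not Hive's actual PPD code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed fix: remove partition-column
// conjuncts from the predicate before it reaches the Parquet reader.
public class PartitionPredicateFilter {
  static List<String> removePartitionConjuncts(List<String> conjunctColumns,
                                               Set<String> partitionColumns) {
    List<String> kept = new ArrayList<>();
    for (String col : conjunctColumns) {
      if (!partitionColumns.contains(col)) {
        kept.add(col); // only columns present in the file schema survive
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Set<String> parts = new HashSet<>(Arrays.asList("p"));
    // For "where id = 1 and p = 'p1'", only the id conjunct is pushed down.
    System.out.println(removePartitionConjuncts(Arrays.asList("id", "p"), parts));
  }
}
```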



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: HIVE-11401.1.patch

> Predicate push down does not work with Parquet when partitions are in the 
> expression
> 
>
> Key: HIVE-11401
> URL: https://issues.apache.org/jira/browse/HIVE-11401
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11401.1.patch
>
>
> When filtering Parquet tables using a partition column, the query fails 
> saying the column does not exist:
> {noformat}
> hive> create table part1 (id int, content string) partitioned by (p string) 
> stored as parquet;
> hive> alter table part1 add partition (p='p1');
> hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
> hive> select id from part1 where p='p1';
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Column [p] was not found in schema!
> Time taken: 0.151 seconds
> {noformat}
> It is correct that the partition column is not part of the Parquet schema. 
> So, the fix should be to remove such expression from the Parquet PPD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11397:
---
Attachment: HIVE-11397.patch

> Parse Hive OR clauses as they are written into the AST
> --
>
> Key: HIVE-11397
> URL: https://issues.apache.org/jira/browse/HIVE-11397
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11397.patch
>
>
> When parsing A OR B OR C, hive converts it into 
> (C OR B) OR A
> instead of turning it into
> A OR (B OR C)
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
> expressions.add(previous);
> expressions.add(current);
> {code}
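The left-deep versus flattened shapes can be illustrated with a toy tree (hypothetical `Node` class, not Hive's AST): left-folding each new disjunct into a binary OR makes the tree depth grow with the number of operands, while a flattened n-ary OR keeps it constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: "A OR B OR C OR D" built by left-folding
// pairs versus flattened into a single n-ary OR node.
public class OrFlatten {
  static class Node {
    final String op;                              // "OR" or an operand name
    final List<Node> children = new ArrayList<>();
    Node(String op) { this.op = op; }
    int depth() {
      int d = 0;
      for (Node c : children) d = Math.max(d, c.depth());
      return d + 1;
    }
  }

  // Left fold: each step wraps the previous result in a new binary OR,
  // producing ((A OR B) OR C) OR D.
  static Node leftFold(List<String> operands) {
    Node previous = new Node(operands.get(0));
    for (int i = 1; i < operands.size(); i++) {
      Node or = new Node("OR");
      or.children.add(previous);
      or.children.add(new Node(operands.get(i)));
      previous = or;
    }
    return previous;
  }

  // Flattened: one OR node with all operands as direct children.
  static Node flatten(List<String> operands) {
    Node or = new Node("OR");
    for (String s : operands) or.children.add(new Node(s));
    return or;
  }

  public static void main(String[] args) {
    List<String> ops = Arrays.asList("A", "B", "C", "D");
    System.out.println(leftFold(ops).depth()); // 4: grows with operand count
    System.out.println(flatten(ops).depth());  // 2: constant
  }
}
```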



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-30 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647798#comment-14647798
 ] 

Jesus Camacho Rodriguez commented on HIVE-11397:


[~hagleitn], the patch flattens the OR clause in those cases. What do you think?

> Parse Hive OR clauses as they are written into the AST
> --
>
> Key: HIVE-11397
> URL: https://issues.apache.org/jira/browse/HIVE-11397
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11397.patch
>
>
> When parsing A OR B OR C, hive converts it into 
> (C OR B) OR A
> instead of turning it into
> A OR (B OR C)
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List<ExprNodeDesc> expressions = new ArrayList<ExprNodeDesc>(2);
> expressions.add(previous);
> expressions.add(current);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11380:

Attachment: HIVE-11380.1.patch

Null check to prevent NPE caused by an uninitialized FileSinkOperator

> NPE when FileSinkOperator is not initialized
> 
>
> Key: HIVE-11380
> URL: https://issues.apache.org/jira/browse/HIVE-11380
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11380.1.patch
>
>
> When FileSinkOperator's initializeOp is not called (which may happen when an 
> operator before FileSinkOperator initializeOp failed), FileSinkOperator will 
> throw NPE at close time. The stacktrace:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
> ... 18 more
> {noformat}
> This Exception is misleading and often distracts users from finding real 
> issues. 
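The shape of the fix can be sketched with a simplified stand-in class (not Hive's actual FileSinkOperator): close-time work is guarded behind a null check, so an operator whose initialize step never ran skips cleanup instead of dereferencing a null field.

```java
// Hedged, simplified sketch of the null-check fix: fsPaths stays null
// when initialize() was never called, and close() detects that instead
// of throwing a NullPointerException.
public class SinkCloseGuard {
  private String[] fsPaths; // remains null if initialize() never ran

  void initialize() {
    fsPaths = new String[] { "bucket_0" };
  }

  // Returns true when the normal close path ran, false when the
  // operator was never initialized and cleanup was skipped.
  boolean close() {
    if (fsPaths == null) {
      return false; // not initialized: nothing to flush, skip quietly
    }
    return true;    // normal close path
  }

  public static void main(String[] args) {
    SinkCloseGuard uninitialized = new SinkCloseGuard();
    System.out.println(uninitialized.close()); // false, and no NPE
  }
}
```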



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647841#comment-14647841
 ] 

Sergio Peña commented on HIVE-11401:


[~szehon] [~dongc] [~Ferd] Could you help me review this code?

> Predicate push down does not work with Parquet when partitions are in the 
> expression
> 
>
> Key: HIVE-11401
> URL: https://issues.apache.org/jira/browse/HIVE-11401
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11401.1.patch
>
>
> When filtering Parquet tables using a partition column, the query fails 
> saying the column does not exist:
> {noformat}
> hive> create table part1 (id int, content string) partitioned by (p string) 
> stored as parquet;
> hive> alter table part1 add partition (p='p1');
> hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
> hive> select id from part1 where p='p1';
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Column [p] was not found in schema!
> Time taken: 0.151 seconds
> {noformat}
> It is correct that the partition column is not part of the Parquet schema. 
> So, the fix should be to remove such expression from the Parquet PPD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner

2015-07-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8343:
-
Description: 
In addEvent() and processVertex(), there is call such as the following:
{code}
  queue.offer(event);
{code}

The return value should be checked. If false is returned, event would not have 
been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html

  was:
In addEvent() and processVertex(), there is call such as the following:
{code}
  queue.offer(event);
{code}
The return value should be checked. If false is returned, event would not have 
been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html


> Return value from BlockingQueue.offer() is not checked in 
> DynamicPartitionPruner
> 
>
> Key: HIVE-8343
> URL: https://issues.apache.org/jira/browse/HIVE-8343
> Project: Hive
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: JongWon Park
>Priority: Minor
> Attachments: HIVE-8343.patch
>
>
> In addEvent() and processVertex(), there is call such as the following:
> {code}
>   queue.offer(event);
> {code}
> The return value should be checked. If false is returned, event would not 
> have been queued.
> Take a look at line 328 in:
> http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html
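The behavior in question is easy to demonstrate: on a bounded queue, offer() returns false and silently drops the element when the queue is full, whereas put() would block until space is available. A minimal sketch (hypothetical tryEnqueue helper):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Demonstrates why the offer() return value must be checked: a full
// bounded queue rejects the element and returns false.
public class OfferCheck {
  // Checked variant: the caller learns whether the event was queued.
  public static boolean tryEnqueue(BlockingQueue<String> queue, String event) {
    return queue.offer(event);
  }

  public static void main(String[] args) {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(1); // capacity 1
    System.out.println(tryEnqueue(queue, "first"));  // true: queued
    System.out.println(tryEnqueue(queue, "second")); // false: dropped if unchecked
  }
}
```

When dropping an event is not acceptable, put() (blocking) or offer() with a timeout plus error handling are the usual alternatives.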



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()

2015-07-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8458:
-
Description: 
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, getFileSystem() call would 
produce NPE

  was:
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}

If mapPath is null but reducePath is not null, getFileSystem() call would 
produce NPE


> Potential null dereference in Utilities#clearWork()
> ---
>
> Key: HIVE-8458
> URL: https://issues.apache.org/jira/browse/HIVE-8458
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Ted Yu
>Assignee: skrho
>Priority: Minor
> Attachments: HIVE-8458_001.patch
>
>
> {code}
> Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
> Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);
> // if the plan path hasn't been initialized just return, nothing to clean.
> if (mapPath == null && reducePath == null) {
>   return;
> }
> try {
>   FileSystem fs = mapPath.getFileSystem(conf);
> {code}
> If mapPath is null but reducePath is not null, getFileSystem() call would 
> produce NPE
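One way to fix this, sketched below with Strings standing in for Path objects (hypothetical helper, not the actual patch), is to pick whichever plan path is non-null before calling getFileSystem(), since the early return only covers the case where both are null.

```java
// Hedged sketch of a null-safe fix: after the "both null" early return,
// select a non-null path before dereferencing it.
public class ClearWorkGuard {
  static String pickPlanPath(String mapPath, String reducePath) {
    // if the plan path hasn't been initialized just return, nothing to clean.
    if (mapPath == null && reducePath == null) {
      return null;
    }
    // mapPath alone may still be null here; fall back to reducePath.
    return (mapPath != null) ? mapPath : reducePath;
  }

  public static void main(String[] args) {
    System.out.println(pickPlanPath(null, "/tmp/reduce.plan")); // no NPE
  }
}
```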



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11391) CBO (Calcite Return Path): Add CBO tests with return path on

2015-07-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647930#comment-14647930
 ] 

Pengcheng Xiong commented on HIVE-11391:


[~jcamachorodriguez], can you resubmit the patch for a QA run, given the recent 
commit of multijoin? If it passes, +1.

> CBO (Calcite Return Path): Add CBO tests with return path on
> 
>
> Key: HIVE-11391
> URL: https://issues.apache.org/jira/browse/HIVE-11391
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11391.patch, HIVE-11391.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8950) Add support in ParquetHiveSerde to create table schema from a parquet file

2015-07-30 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647986#comment-14647986
 ] 

Ryan Blue commented on HIVE-8950:
-

[~gauravkumar37], the schema is read from the file once and then converted to 
DDL. The file is no longer used after that, so schema evolution proceeds as it 
normally would for any table.

> Add support in ParquetHiveSerde to create table schema from a parquet file
> --
>
> Key: HIVE-8950
> URL: https://issues.apache.org/jira/browse/HIVE-8950
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish K Singh
>Assignee: Gaurav Kumar
> Attachments: HIVE-8950.1.patch, HIVE-8950.2.patch, HIVE-8950.3.patch, 
> HIVE-8950.4.patch, HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch, 
> HIVE-8950.8.patch, HIVE-8950.patch
>
>
> PARQUET-76 and PARQUET-47 ask for creating parquet backed tables without 
> having to specify the column names and types. As, parquet files store schema 
> in their footer, it is possible to generate hive schema from parquet file's 
> metadata. This will improve usability of parquet backed tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11397) Parse Hive OR clauses as they are written into the AST

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648004#comment-14648004
 ] 

Hive QA commented on HIVE-11397:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748011/HIVE-11397.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9274 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4762/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4762/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4762/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748011 - PreCommit-HIVE-TRUNK-Build

> Parse Hive OR clauses as they are written into the AST
> --
>
> Key: HIVE-11397
> URL: https://issues.apache.org/jira/browse/HIVE-11397
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11397.patch
>
>
> When parsing A OR B OR C, hive converts it into 
> (C OR B) OR A
> instead of turning it into
> A OR (B OR C)
> {code}
> GenericUDFOPOr or = new GenericUDFOPOr();
> List expressions = new ArrayList(2);
> expressions.add(previous);
> expressions.add(current);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10319) Hive CLI startup takes a long time with a large number of databases

2015-07-30 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648023#comment-14648023
 ] 

Jason Dere commented on HIVE-10319:
---

+1

> Hive CLI startup takes a long time with a large number of databases
> ---
>
> Key: HIVE-10319
> URL: https://issues.apache.org/jira/browse/HIVE-10319
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.0.0
>Reporter: Nezih Yigitbasi
>Assignee: Nezih Yigitbasi
> Attachments: HIVE-10319.1.patch, HIVE-10319.2.patch, 
> HIVE-10319.3.patch, HIVE-10319.4.patch, HIVE-10319.5.patch, HIVE-10319.patch
>
>
> The Hive CLI takes a long time to start when there is a large number of 
> databases in the DW. I think the root cause is the way permanent UDFs are 
> loaded from the metastore. When I looked at the logs and the source code I 
> see that at startup Hive first gets all the databases from the metastore and 
> then for each database it makes a metastore call to get the permanent 
> functions for that database [see Hive.java | 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L162-185].
>  So the number of metastore calls made is in the order of the number of 
> databases. In production we have several hundreds of databases so Hive makes 
> several hundreds of RPC calls during startup, taking 30+ seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11412) StackOverFlow in SemanticAnalyzer for huge filters (~5000)

2015-07-30 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648053#comment-14648053
 ] 

Prasanth Jayachandran commented on HIVE-11412:
--

[~hsubramaniyan] fyi..

> StackOverFlow in SemanticAnalyzer for huge filters (~5000)
> --
>
> Key: HIVE-11412
> URL: https://issues.apache.org/jira/browse/HIVE-11412
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>
> Queries with ~5000 filter conditions fail during semantic analysis.
> Stack trace:
> {code}
> Exception in thread "main" java.lang.StackOverflowError
>   at java.util.HashMap.hash(HashMap.java:366)
>   at java.util.HashMap.getEntry(HashMap.java:466)
>   at java.util.HashMap.containsKey(HashMap.java:453)
>   at 
> org.apache.commons.collections.map.AbstractMapDecorator.containsKey(AbstractMapDecorator.java:83)
>   at 
> org.apache.hadoop.conf.Configuration.isDeprecated(Configuration.java:558)
>   at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:605)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:885)
>   at 
> org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:907)
>   at 
> org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1308)
>   at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:2641)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11132)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.processPositionAlias(SemanticAnalyzer.java:11226)
> {code}
> Query:
> {code}
> explain select count(*) from over1k where (
> (t=1 and si=2)
> or (t=2 and si=3)
> or (t=3 and si=4) 
> or (t=4 and si=5) 
> or (t=5 and si=6) 
> or (t=6 and si=7) 
> or (t=7 and si=8)
> or (t=7 and si=8)
> or (t=7 and si=8)
> ...
> {code}
> Repeat the filter around 5000 times. 
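The failure mode is the recursive walk itself: a ~5000-deep left-nested expression exhausts the JVM call stack. A common remedy, sketched here with a toy node chain (illustrative only, not Hive's AST or the eventual fix), is to replace recursion with an explicit work stack, which visits the same nodes in constant JVM stack depth.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch: iterative traversal of a deeply nested chain that would
// overflow the call stack if walked recursively.
public class DeepWalk {
  static class Node {
    Node child;
  }

  // Build a chain of the given depth, mimicking ~5000 nested conditions.
  static Node deepChain(int depth) {
    Node root = new Node();
    Node cur = root;
    for (int i = 1; i < depth; i++) {
      cur.child = new Node();
      cur = cur.child;
    }
    return root;
  }

  // Explicit stack instead of recursion: JVM stack depth stays constant.
  static int countIterative(Node root) {
    int count = 0;
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node n = stack.pop();
      count++;
      if (n.child != null) stack.push(n.child);
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(countIterative(deepChain(5000))); // 5000, no overflow
  }
}
```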



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11384) Add Test case which cover both HIVE-11271 and HIVE-11333

2015-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648058#comment-14648058
 ] 

Szehon Ho commented on HIVE-11384:
--

No problem, makes sense.  +1, always good to have more tests.

> Add Test case which cover both HIVE-11271 and HIVE-11333
> 
>
> Key: HIVE-11384
> URL: https://issues.apache.org/jira/browse/HIVE-11384
> Project: Hive
>  Issue Type: Test
>  Components: Logical Optimizer, Parser
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11384.1.patch
>
>
> Add some test queries that require both HIVE-11271 and HIVE-11333 to be fixed 
> in order to pass. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648063#comment-14648063
 ] 

Szehon Ho commented on HIVE-11380:
--

+1, seems good to add the null check here to me

> NPE when FileSinkOperator is not initialized
> 
>
> Key: HIVE-11380
> URL: https://issues.apache.org/jira/browse/HIVE-11380
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11380.1.patch
>
>
> When FileSinkOperator's initializeOp is not called (which may happen when an 
> operator before FileSinkOperator initializeOp failed), FileSinkOperator will 
> throw NPE at close time. The stacktrace:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
> ... 18 more
> {noformat}
> This Exception is misleading and often distracts users from finding real 
> issues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-30 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648081#comment-14648081
 ] 

Prasanth Jayachandran commented on HIVE-11405:
--

[~gopalv], are column stats available for this query? If not, your patch will 
terminate early because the data size becomes 0 and the AND evaluation 
short-circuits. Also, I am not sure this assumption is correct:
{code}
final long branch2Rows = (newNumRows <= branchRows) ? 0 : (newNumRows - 
branchRows);
{code}

I am still evaluating this change. The idea of mirroring the tree and passing 
the branchRows to the sibling branch looks good so far.

> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
> --
>
> Key: HIVE-11405
> URL: https://issues.apache.org/jira/browse/HIVE-11405
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Prasanth Jayachandran
>
> Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
> him,
> "The recursion protection works well with an AND expr, but it doesn't work 
> against
> (OR a=1 (OR a=2 (OR a=3 (OR ...)
> since the for the rows will never be reduced during recursion due to the 
> nature of the OR.
> We need to execute a short-circuit to satisfy the OR properly - no case which 
> matches a=1 qualifies for the rest of the filters.
> Recursion should pass in the numRows - branch1Rows for the branch-2."
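The arithmetic being proposed mirrors the snippet quoted in the comment above: rows already matched by branch 1 of an OR cannot also qualify for branch 2, so branch 2 should be evaluated against the remaining rows, floored at zero. A minimal sketch (standalone helper, not the actual patch):

```java
// Sketch of the OR short-circuit: branch 2 of the disjunction sees only
// the rows branch 1 did not already match, never a negative count.
public class OrStats {
  static long branch2Input(long numRows, long branch1Rows) {
    return (numRows <= branch1Rows) ? 0 : (numRows - branch1Rows);
  }

  public static void main(String[] args) {
    System.out.println(branch2Input(1000, 400));  // 600 rows left for branch 2
    System.out.println(branch2Input(1000, 1000)); // 0: recursion can stop early
  }
}
```

When the result reaches 0, no further disjuncts can match, which is exactly the early-termination opportunity this issue asks for.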



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648100#comment-14648100
 ] 

Szehon Ho commented on HIVE-11401:
--

+1 makes sense from my end.

> Predicate push down does not work with Parquet when partitions are in the 
> expression
> 
>
> Key: HIVE-11401
> URL: https://issues.apache.org/jira/browse/HIVE-11401
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11401.1.patch
>
>
> When filtering Parquet tables using a partition column, the query fails 
> saying the column does not exist:
> {noformat}
> hive> create table part1 (id int, content string) partitioned by (p string) 
> stored as parquet;
> hive> alter table part1 add partition (p='p1');
> hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
> hive> select id from part1 where p='p1';
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Column [p] was not found in schema!
> Time taken: 0.151 seconds
> {noformat}
> It is correct that the partition column is not part of the Parquet schema. 
> So, the fix should be to remove such expression from the Parquet PPD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11160) Auto-gather column stats

2015-07-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11160:
---
Attachment: HIVE-11160.03.patch

rebase the patch

> Auto-gather column stats
> 
>
> Key: HIVE-11160
> URL: https://issues.apache.org/jira/browse/HIVE-11160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11160.01.patch, HIVE-11160.02.patch, 
> HIVE-11160.03.patch
>
>
> Hive collects table stats when hive.stats.autogather=true is set during the 
> INSERT OVERWRITE command. Users then need to collect the column stats 
> themselves using the "Analyze" command. With this patch, the column stats 
> will also be collected automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648123#comment-14648123
 ] 

Pengcheng Xiong commented on HIVE-11409:


a good example is union_remove_10.q
{code}
Group By Operator
  aggregations: count(VALUE._col0)
  keys: KEY._col0 (type: string)
  mode: mergepartial
  outputColumnNames: $f0, $f1
  Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE
  Select Operator
    expressions: $f0 (type: string), $f1 (type: bigint)
    outputColumnNames: key, values
    Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE
    File Output Operator
      compressed: false
      Statistics: Num rows: 1 Data size: 30 Basic stats: COMPLETE Column stats: NONE
      table:
          input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
          output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
          serde: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
          name: default.outputtbl1
{code}

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
> UNION
> --
>
> Key: HIVE-11409
> URL: https://issues.apache.org/jira/browse/HIVE-11409
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11409.01.patch
>
>
> Three purposes: (1) to ensure that the data type of a non-primary branch (the 
> 1st branch is the primary branch) of the union can be cast to that of the 
> primary branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL 
> is redundant, it will be removed by the IdentityProjectRemover optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks

2015-07-30 Thread Raajay Viswanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raajay Viswanathan reassigned HIVE-11413:
-

Assignee: Raajay Viswanathan

> Error in detecting availability of HiveSemanticAnalyzerHooks
> 
>
> Key: HIVE-11413
> URL: https://issues.apache.org/jira/browse/HIVE-11413
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.0.0
>Reporter: Raajay Viswanathan
>Assignee: Raajay Viswanathan
>Priority: Trivial
>  Labels: newbie
>
> In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of
> available {{HiveSemanticAnalyzerHook}}s (_saHooks_) is obtained using the
> {{getHooks}} method. This method always returns a {{List}} of hooks.
> However, when checking for the availability of hooks, the current version of
> the code compares _saHooks_ with NULL. This is incorrect, as the segment of
> code that calls the pre- and post-analyze functions gets executed even when
> the list is empty. The comparison should be changed to
> {{saHooks.size() > 0}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11380) NPE when FileSinkOperator is not initialized

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648150#comment-14648150
 ] 

Hive QA commented on HIVE-11380:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748014/HIVE-11380.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9276 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4763/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4763/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4763/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748014 - PreCommit-HIVE-TRUNK-Build

> NPE when FileSinkOperator is not initialized
> 
>
> Key: HIVE-11380
> URL: https://issues.apache.org/jira/browse/HIVE-11380
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11380.1.patch
>
>
> When FileSinkOperator's initializeOp is not called (which may happen when the
> initializeOp of an operator before FileSinkOperator failed), FileSinkOperator
> will throw an NPE at close time. The stack trace:
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:523)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:519)
> ... 18 more
> {noformat}
> This exception is misleading and often distracts users from finding the real
> issue.
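The defensive pattern the report calls for can be sketched as follows (SinkSketch and its fields are illustrative stand-ins, not the actual FileSinkOperator code):

```java
// Hypothetical sketch of the defensive close path: skip bucket-file cleanup
// when initializeOp never ran, instead of dereferencing null state.
public class SinkSketch {

    private Object[] fsPaths;      // normally allocated in initializeOp()
    private boolean initialized;   // stays false if an upstream op failed

    void initializeOp() {
        fsPaths = new Object[1];
        initialized = true;
    }

    // Returns true when close actually had buckets to flush.
    boolean closeOp() {
        if (!initialized || fsPaths == null) {
            return false;          // nothing was opened; avoid the NPE
        }
        // ... flush and close bucket files here ...
        return true;
    }

    public static void main(String[] args) {
        SinkSketch sink = new SinkSketch();
        System.out.println(sink.closeOp()); // false: initializeOp never ran
    }
}
```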



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11413) Error in detecting availability of HiveSemanticAnalyzerHooks

2015-07-30 Thread Raajay Viswanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raajay Viswanathan updated HIVE-11413:
--
Attachment: HIVE-11413.patch

Check whether _saHooks_ is empty instead of checking whether it is NULL. Needs code review.
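A minimal Java sketch of the guard change under discussion (`getHooks` and `shouldRunHooks` here are hypothetical names, not the actual Driver.java code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the real Driver.java code differs in shape.
public class HookGuard {

    // Stand-in for getHooks(): returns an empty list, never null.
    static List<String> getHooks() {
        return new ArrayList<>();
    }

    static boolean shouldRunHooks(List<String> saHooks) {
        // Buggy form was: saHooks != null -- true even for an empty list,
        // so the pre/post-analyze block ran with nothing to do.
        return saHooks != null && saHooks.size() > 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldRunHooks(getHooks())); // false: list is empty
    }
}
```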

> Error in detecting availability of HiveSemanticAnalyzerHooks
> 
>
> Key: HIVE-11413
> URL: https://issues.apache.org/jira/browse/HIVE-11413
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.0.0
>Reporter: Raajay Viswanathan
>Assignee: Raajay Viswanathan
>Priority: Trivial
>  Labels: newbie
> Attachments: HIVE-11413.patch
>
>
> In the {{compile(String, Boolean)}} function in {{Driver.java}}, the list of
> available {{HiveSemanticAnalyzerHook}}s (_saHooks_) is obtained using the
> {{getHooks}} method. This method always returns a {{List}} of hooks.
> However, when checking for the availability of hooks, the current version of
> the code compares _saHooks_ with NULL. This is incorrect, as the segment of
> code that calls the pre- and post-analyze functions gets executed even when
> the list is empty. The comparison should be changed to
> {{saHooks.size() > 0}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: HIVE-11407-branch-1.0.patch

> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407-branch-1.0.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: (was: HIVE-11407-branch-1.0.patch)

> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: (was: HIVE-11407-branch-1.0.patch)

> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: HIVE-11407-branch-1.0.patch

> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11407:
-
Attachment: HIVE-11407.1.patch

Patch for master and branch-1.


> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648237#comment-14648237
 ] 

Thejas M Nair edited comment on HIVE-11407 at 7/30/15 8:17 PM:
---

HIVE-11407.1.patch - Patch for master and branch-1.



was (Author: thejas):
Patch for master and branch-1 .


> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648241#comment-14648241
 ] 

Thejas M Nair commented on HIVE-11407:
--

[~sushanth] Can you please review my edits to your patch?


> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11409:
---
Attachment: HIVE-11409.02.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
> UNION
> --
>
> Key: HIVE-11409
> URL: https://issues.apache.org/jira/browse/HIVE-11409
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11409.01.patch, HIVE-11409.02.patch
>
>
> Three purposes: (1) to ensure that the data type of a non-primary branch (the
> 1st branch is the primary branch) of the union can be cast to that of the
> primary branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL
> is redundant, it will be removed by the IdentityProjectRemover optimizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11407) JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM

2015-07-30 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648281#comment-14648281
 ] 

Sushanth Sowmyan commented on HIVE-11407:
-

The edits look good, +1.

> JDBC DatabaseMetaData.getTables with large no of tables call leads to HS2 OOM 
> --
>
> Key: HIVE-11407
> URL: https://issues.apache.org/jira/browse/HIVE-11407
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-11407-branch-1.0.patch, HIVE-11407.1.patch
>
>
> With around 7000 tables having around 1500 columns each, and 512MB of HS2 
> memory, I am able to reproduce this OOM .
> Most of the memory is consumed by the datanucleus objects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648291#comment-14648291
 ] 

Hive QA commented on HIVE-11401:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748010/HIVE-11401.1.patch

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 9278 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilder
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilderComplexTypes
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilderComplexTypes2
org.apache.hadoop.hive.ql.io.parquet.TestParquetRecordReaderWrapper.testBuilderFloat
org.apache.hadoop.hive.ql.io.sarg.TestConvertAstToSearchArg.testExpression1
org.apache.hadoop.hive.ql.io.sarg.TestConvertAstToSearchArg.testExpression10
org.apache.hadoop.hive.ql.io.sarg.TestConvertAstToSearchArg.testExpression2
org.apache.hadoop.hive.ql.io.sarg.TestConvertAstToSearchArg.testExpression3
org.apache.hadoop.hive.ql.io.sarg.TestConvertAstToSearchArg.testExpression4
org.apache.hadoop.hive.ql.io.sarg.TestConvertAstToSearchArg.testExpression5
org.apache.hadoop.hive.ql.io.sarg.TestConvertAstToSearchArg.testExpression7
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4764/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4764/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4764/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748010 - PreCommit-HIVE-TRUNK-Build

> Predicate push down does not work with Parquet when partitions are in the 
> expression
> 
>
> Key: HIVE-11401
> URL: https://issues.apache.org/jira/browse/HIVE-11401
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11401.1.patch
>
>
> When filtering Parquet tables using a partition column, the query fails 
> saying the column does not exist:
> {noformat}
> hive> create table part1 (id int, content string) partitioned by (p string) 
> stored as parquet;
> hive> alter table part1 add partition (p='p1');
> hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
> hive> select id from part1 where p='p1';
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Column [p] was not found in schema!
> Time taken: 0.151 seconds
> {noformat}
> It is correct that the partition column is not part of the Parquet schema.
> So, the fix should be to remove such expressions from the Parquet PPD.
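The fix direction can be sketched as follows (a hypothetical helper over column names; the real patch operates on predicate expression trees, not name lists):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch: drop predicates on partition columns before pushdown,
// since those columns are absent from the Parquet file schema.
public class PpdSketch {

    static List<String> pushableColumns(List<String> predicateColumns,
                                        Set<String> partitionColumns) {
        List<String> pushable = new ArrayList<>();
        for (String col : predicateColumns) {
            if (!partitionColumns.contains(col)) {
                pushable.add(col); // only columns present in the file survive
            }
        }
        return pushable;
    }

    public static void main(String[] args) {
        // 'p' is the partition column from the repro above.
        System.out.println(pushableColumns(List.of("id", "p"), Set.of("p")));
    }
}
```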



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2015-07-30 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648302#comment-14648302
 ] 

Vaibhav Gumashta commented on HIVE-11408:
-

Looks like we fixed this in 1.2 via HIVE-10329.

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> I'm able to reproduce this with 0.14. I have yet to verify whether HIVE-10453
> fixes the issue (since it's on top of a larger patch, HIVE-2573, that was
> added in 1.2). Basically, add jar creates a new classloader for loading the
> classes from the new jar and adds the new classloader to the user's
> SessionState object, making the older one its parent. Creating a temporary
> function uses the new classloader to load the class used for the function.
> On closing a session, although there is code to close the session's
> classloader, I'm not seeing the new classloader getting GCed, and from the
> heap dump I can see it holds on to the temporary function's class, which
> should have gone away after the session close.
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11401) Predicate push down does not work with Parquet when partitions are in the expression

2015-07-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-11401:
---
Attachment: HIVE-11401.2.patch

> Predicate push down does not work with Parquet when partitions are in the 
> expression
> 
>
> Key: HIVE-11401
> URL: https://issues.apache.org/jira/browse/HIVE-11401
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-11401.1.patch, HIVE-11401.2.patch
>
>
> When filtering Parquet tables using a partition column, the query fails 
> saying the column does not exist:
> {noformat}
> hive> create table part1 (id int, content string) partitioned by (p string) 
> stored as parquet;
> hive> alter table part1 add partition (p='p1');
> hive> insert into table part1 partition (p='p1') values (1, 'a'), (2, 'b');
> hive> select id from part1 where p='p1';
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Column [p] was not found in schema!
> Time taken: 0.151 seconds
> {noformat}
> It is correct that the partition column is not part of the Parquet schema.
> So, the fix should be to remove such expressions from the Parquet PPD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's cachedLazyStruct weakly referenced

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Component/s: File Formats

> Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's 
> cachedLazyStruct weakly referenced
> --
>
> Key: HIVE-11414
> URL: https://issues.apache.org/jira/browse/HIVE-11414
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Zheng Shao
>Priority: Minor
>
> MapTask hit OOM in the following situation in our production environment:
> * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
> * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
> * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
> * MapTask memory Xmx: 1.5GB
> By analyzing the heap dump using jhat, we realized that the problem is:
> * One single mapper is processing many partitions (because of 
> CombineHiveInputFormat)
> * Each input path (equivalent to partition here) will construct its own SerDe
> * Each SerDe will do its own caching of deserialized object (and try to reuse 
> it), but will never release it (in this case, the 
> serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take 
> a lot of space - pretty much the last N rows of a file where N is the number 
> of rows in a columnar block).
> * This problem may exist in other SerDes as well, but columnar file formats
> are affected the most because they need a bigger cache for the last N rows
> instead of 1 row.
> Proposed solution:
> * Make cachedLazyStruct a weakly referenced object.  Do similar changes to 
> other columnar serde if any (e.g. maybe ORCFile's serde as well).
> Alternative solutions:
> * We can also free up the whole SerDe after processing a block/file.  The 
> problem with that is that the input splits may contain multiple blocks/files 
> that maps to the same SerDe, and recreating a SerDe is just more work.
> * We can also move the SerDe creation/free-up to the place when input file 
> changes.  But that requires a much bigger change to the code.
> * We can also add a "cleanup()" method to the SerDe interface that releases
> the cached object, but that change is not backward compatible with the many
> SerDes that people have written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Summary: Fix OOM in MapTask with many input partitions with RCFile  (was: 
Fix OOM in MapTask with many input partitions by making ColumnarSerDeBase's 
cachedLazyStruct weakly referenced)

> Fix OOM in MapTask with many input partitions with RCFile
> -
>
> Key: HIVE-11414
> URL: https://issues.apache.org/jira/browse/HIVE-11414
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Zheng Shao
>Priority: Minor
>
> MapTask hit OOM in the following situation in our production environment:
> * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
> * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
> * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
> * MapTask memory Xmx: 1.5GB
> By analyzing the heap dump using jhat, we realized that the problem is:
> * One single mapper is processing many partitions (because of 
> CombineHiveInputFormat)
> * Each input path (equivalent to partition here) will construct its own SerDe
> * Each SerDe will do its own caching of deserialized object (and try to reuse 
> it), but will never release it (in this case, the 
> serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take 
> a lot of space - pretty much the last N rows of a file where N is the number 
> of rows in a columnar block).
> * This problem may exist in other SerDes as well, but columnar file formats
> are affected the most because they need a bigger cache for the last N rows
> instead of 1 row.
> Proposed solution:
> * Make cachedLazyStruct a weakly referenced object.  Do similar changes to 
> other columnar serde if any (e.g. maybe ORCFile's serde as well).
> Alternative solutions:
> * We can also free up the whole SerDe after processing a block/file.  The 
> problem with that is that the input splits may contain multiple blocks/files 
> that maps to the same SerDe, and recreating a SerDe is just more work.
> * We can also move the SerDe creation/free-up to the place when input file 
> changes.  But that requires a much bigger change to the code.
> * We can also add a "cleanup()" method to the SerDe interface that releases
> the cached object, but that change is not backward compatible with the many
> SerDes that people have written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Description: 
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead 
of 1 row.

Proposed solution:
* Make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that maps to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place when input file 
changes.  But that requires a much bigger change to the code.
* We can also add a "cleanup()" method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes 
that people have written.
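A minimal sketch of the weak-reference caching pattern proposed above (WeakCacheSketch and ColumnarRow are illustrative stand-ins, not the actual ColumnarSerDeBase code):

```java
import java.lang.ref.WeakReference;

// Hypothetical sketch: hold the cached deserialized row through a
// WeakReference so the GC may reclaim it under memory pressure.
public class WeakCacheSketch {

    static class ColumnarRow {}    // stand-in for cachedLazyStruct's type

    private WeakReference<ColumnarRow> cachedRow = new WeakReference<>(null);

    ColumnarRow deserialize() {
        ColumnarRow row = cachedRow.get();
        if (row == null) {                 // first call, or GC reclaimed it
            row = new ColumnarRow();
            cachedRow = new WeakReference<>(row);
        }
        return row;                        // reused while strongly reachable
    }

    public static void main(String[] args) {
        WeakCacheSketch serde = new WeakCacheSketch();
        ColumnarRow first = serde.deserialize();
        System.out.println(first == serde.deserialize()); // true: cache hit
    }
}
```

While a caller holds a strong reference to the returned row, the cache keeps serving it; once only the WeakReference remains, the GC is free to collect it, bounding per-SerDe memory.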


  was:
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead 
of 1 row.

Proposed solution:
* Make cachedLazyStruct a weakly referenced object.  Do similar changes to 
other columnar serde if any (e.g. maybe ORCFile's serde as well).

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that maps to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place when input file 
changes.  But that requires a much bigger change to the code.
* We can also add a "cleanup()" method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes 
that people have written.



> Fix OOM in MapTask with many input partitions with RCFile
> -
>
> Key: HIVE-11414
> URL: https://issues.apache.org/jira/browse/HIVE-11414
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Zheng Shao
>Priority: Minor
>
> MapTask hit OOM in the following situation in our production environment:
> * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
> * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
> * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
> * MapTask memory Xmx: 1.5GB
> By analyzing the heap dump using jhat, we realized that the problem is:
> * One single mapper is processing many partitions (because of 
> CombineHiveInputFormat)
> * Each input path (equivalent to partition here) will construct its own SerDe
> * Each SerDe will do its own caching of deserialized object (and try to reuse 
> it), but will never release it (in this case, the 
> serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take 
> a lot of space - pretty much the last N rows of a file where N is the number 
> of rows

[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Description: 
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of the deserialized object (and try to reuse 
it), but will never release it (in this case, 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file, where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase.  The cost 
saving of not recreating a single object is too small compared to processing N 
rows.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the point where the input file 
changes, but that requires a much bigger change to the code.
* We can also add a "cleanup()" method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes that 
people have written.
* We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object, but that feels like overkill.
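The weak-reference alternative can be sketched as below. This is illustrative only: the class and method names (WeaklyCachedDeserializer, rowBuffer) are hypothetical stand-ins for serde2.columnar.ColumnarSerDeBase and its cachedLazyStruct field, not Hive's actual API.

```java
import java.lang.ref.WeakReference;

// Hypothetical sketch of the weak-reference alternative: the cached row object
// can be reclaimed by GC under memory pressure, instead of pinning the last N
// rows of every partition's file for the lifetime of the SerDe.
public class WeaklyCachedDeserializer {
    private WeakReference<byte[]> cachedRow = new WeakReference<>(null);

    // Returns a reusable buffer, recreating it only when GC has reclaimed it
    // (or the required size has changed).
    public byte[] rowBuffer(int size) {
        byte[] buf = cachedRow.get();
        if (buf == null || buf.length != size) {
            buf = new byte[size];
            cachedRow = new WeakReference<>(buf);
        }
        return buf;
    }
}
```

Under normal operation the buffer is reused exactly as before; only when the heap is tight does the collector drop it, which is why removing the cache outright is the simpler fix.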



  was:
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of the deserialized object (and try to reuse 
it), but will never release it (in this case, 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file, where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the point where the input file 
changes, but that requires a much bigger change to the code.
* We can also add a "cleanup()" method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes that 
people have written.



> Fix OOM in MapTask with many input partitions with RCFile
> -
>
> Key: HIVE-11414
> URL: https://issues.apache.org/jira/browse/HIVE-11414
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Zheng Shao
>Priority: Minor
>
> MapTask hit OOM in the following situation in our production environment:
> * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
> * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
> * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
> * MapTask memory Xmx: 1.5GB
> By analyzing the heap dump using jhat, we realized that the problem is:
> * One single mapper is processing many partitions (because of 
> CombineHiveInputFormat)
> * Each input path (equivalent to partition here) will construct its own SerDe
> * Each SerDe will do its own caching of deserialized object (and try to reuse 
> it), but will never release it (in this case, the 
> serde2.columnar.Co

[jira] [Commented] (HIVE-11415) Add early termination for recursion in vectorization for deep filter queries

2015-07-30 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648352#comment-14648352
 ] 

Prasanth Jayachandran commented on HIVE-11415:
--

[~hsubramaniyan]/[~mmccline] fyi..

> Add early termination for recursion in vectorization for deep filter queries
> 
>
> Key: HIVE-11415
> URL: https://issues.apache.org/jira/browse/HIVE-11415
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>
> Queries with deep filters (left deep) throws StackOverflowException in 
> vectorization
> {code}
> Exception in thread "main" java.lang.StackOverflowError
>   at java.lang.Class.getAnnotation(Class.java:3415)
>   at 
> org.apache.hive.common.util.AnnotationUtils.getAnnotation(AnnotationUtils.java:29)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExpressionDescriptor.getVectorExpressionClass(VectorExpressionDescriptor.java:332)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:988)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:439)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.createVectorExpression(VectorizationContext.java:1014)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpressionForUdf(VectorizationContext.java:996)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1164)
> {code}
> Sample query:
> {code}
> explain select count(*) from over1k where (
> (t=1 and si=2)
> or (t=2 and si=3)
> or (t=3 and si=4) 
> or (t=4 and si=5) 
> or (t=5 and si=6) 
> or (t=6 and si=7) 
> or (t=7 and si=8)
> ...
> ..
> {code}
> Repeat the filter a few thousand times to reproduce the issue.
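One way to terminate the recursion early is to avoid deep recursion altogether when walking a left-deep chain. The sketch below is illustrative only — the Node and predicates names are hypothetical, not Hive's VectorizationContext API — and shows the general technique: flattening the chain with an explicit stack, so depth is bounded by heap rather than the thread stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class LeftDeepFlatten {
    // Hypothetical left-deep expression node: internal nodes hold a left child
    // plus one predicate; the deepest leaf holds only a predicate.
    static class Node {
        final Node left;
        final String predicate;
        Node(Node left, String predicate) { this.left = left; this.predicate = predicate; }
    }

    // Collect all predicates without recursion, so a filter repeated thousands
    // of times cannot throw StackOverflowError.
    static List<String> predicates(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            out.add(n.predicate);
            if (n.left != null) stack.push(n.left);
        }
        return out;
    }
}
```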



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-10863:
--

Assignee: Xuefu Zhang  (was: Deepesh Khandelwal)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Attachment: (was: HIVE-10863.1-spark.patch)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Summary: Merge trunk to Spark branch 7/29/2015 [Spark Branch]  (was: Merge 
trunk to Spark branch 5/28/2015 [Spark Branch])

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Deepesh Khandelwal
> Attachments: mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Attachment: (was: HIVE-10863.0-spark.patch)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: mj.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Comment: was deleted

(was: I resolved the mj.patch conflicts and pushed it into the spark branch. 
Attached the dummy patch again to trigger the tests.)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Comment: was deleted

(was: 

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736177/HIVE-10863.1-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7948 tests executed
*Failed tests:*
{noformat}
TestCliDriver-infer_bucket_sort_multi_insert.q-insert_values_tmp_table.q-union_remove_11.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/871/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/871/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-871/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736177 - PreCommit-HIVE-SPARK-Build)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Comment: was deleted

(was: Unfortunately, the patch is too big to be attached here. I had to commit 
the merge and fix things later on. There are conflicts, as shown below:
{code}
Conflicts:
pom.xml
ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java

ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
ql/src/test/results/clientpositive/runtime_skewjoin_mapjoin_spark.q.out
ql/src/test/results/clientpositive/spark/cbo_gby.q.out
ql/src/test/results/clientpositive/spark/cbo_simple_select.q.out
ql/src/test/results/clientpositive/spark/cbo_udf_udaf.q.out

ql/src/test/results/clientpositive/spark/runtime_skewjoin_mapjoin_spark.q.out
ql/src/test/results/clientpositive/spark/union12.q.out
ql/src/test/results/clientpositive/spark/union17.q.out
ql/src/test/results/clientpositive/spark/union20.q.out
ql/src/test/results/clientpositive/spark/union21.q.out
ql/src/test/results/clientpositive/spark/union22.q.out
ql/src/test/results/clientpositive/spark/union24.q.out
ql/src/test/results/clientpositive/spark/union26.q.out
ql/src/test/results/clientpositive/spark/union27.q.out
ql/src/test/results/clientpositive/spark/union31.q.out
ql/src/test/results/clientpositive/spark/union32.q.out
ql/src/test/results/clientpositive/spark/union34.q.out
ql/src/test/results/clientpositive/spark/union_lateralview.q.out
ql/src/test/results/clientpositive/spark/union_remove_12.q.out
ql/src/test/results/clientpositive/spark/union_remove_13.q.out
ql/src/test/results/clientpositive/spark/union_remove_14.q.out
ql/src/test/results/clientpositive/spark/union_remove_22.q.out
ql/src/test/results/clientpositive/spark/union_remove_23.q.out
ql/src/test/results/clientpositive/spark/union_remove_6_subq.q.out
ql/src/test/results/clientpositive/spark/union_top_level.q.out

service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java

spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java

spark-client/src/main/java/org/apache/hive/spark/client/SparkClientUtilities.java
{code}
I resolved most of them, except that some changes from the Spark branch are lost. The 
diff is shown in the attached mj.patch file. [~jxiang], could you take a look 
and see how to apply the diff?

We will need to watch the test result and fix them as needed.
)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Attachment: (was: mj.patch)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Comment: was deleted

(was: 

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12736048/HIVE-10863.0-spark.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7962 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/869/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/869/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-869/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12736048 - PreCommit-HIVE-SPARK-Build)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10863) Merge trunk to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Comment: was deleted

(was: Patch #0 is a dummy patch to trigger a test run.)

> Merge trunk to Spark branch 7/29/2015 [Spark Branch]
> 
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge master to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Summary: Merge master to Spark branch 7/29/2015 [Spark Branch]  (was: Merge 
trunk to Spark branch 7/29/2015 [Spark Branch])

> Merge master to Spark branch 7/29/2015 [Spark Branch]
> -
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10863) Merge master to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10863:
---
Attachment: HIVE-10863.1-spark.patch

> Merge master to Spark branch 7/29/2015 [Spark Branch]
> -
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10863.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11318) Move ORC table properties from OrcFile to OrcOutputFormat

2015-07-30 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-11318:


Assignee: Owen O'Malley

> Move ORC table properties from OrcFile to OrcOutputFormat
> -
>
> Key: HIVE-11318
> URL: https://issues.apache.org/jira/browse/HIVE-11318
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Prasanth Jayachandran
>Assignee: Owen O'Malley
> Fix For: 2.0.0
>
>
> OrcFile contains TableProperties which can be moved to OrcOutputFormat. Also 
> remove deprecated configs that are no longer used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11160) Auto-gather column stats

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648420#comment-14648420
 ] 

Hive QA commented on HIVE-11160:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748041/HIVE-11160.03.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9277 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4765/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4765/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4765/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748041 - PreCommit-HIVE-TRUNK-Build

> Auto-gather column stats
> 
>
> Key: HIVE-11160
> URL: https://issues.apache.org/jira/browse/HIVE-11160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11160.01.patch, HIVE-11160.02.patch, 
> HIVE-11160.03.patch
>
>
> Hive collects table stats during the INSERT OVERWRITE command when 
> hive.stats.autogather=true is set, and the users then need to collect the column 
> stats themselves using the "ANALYZE" command. With this patch, the column stats 
> will also be collected automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648434#comment-14648434
 ] 

Sergio Peña commented on HIVE-11418:


I think we should support PURGE when dropping a database as well. 
[~ekoifman] What do you think about this?

> Dropping a database in an encryption zone with CASCADE and trash enabled fails
> --
>
> Key: HIVE-11418
> URL: https://issues.apache.org/jira/browse/HIVE-11418
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>
> Here's the query that fails:
> {noformat}
> hive> CREATE DATABASE db;
> hive> USE db;
> hive> CREATE TABLE a(id int);
> hive> SET fs.trash.interval=1;
> hive> DROP DATABASE db CASCADE;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> db.a because it is in an encryption zone and trash
>  is enabled.  Use PURGE option to skip trash.)
> {noformat}
> DROP DATABASE does not support PURGE, so we have to remove the tables one by 
> one, and then drop the database.
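The table-by-table workaround the description hints at can be sketched as below. Since DROP DATABASE takes no PURGE clause, each table is dropped with PURGE first (bypassing trash, which is what the encryption-zone check requires), and the empty database is dropped last. The helper only builds the DDL strings; the dropSequence method and table names are illustrative assumptions, not a Hive API.

```java
import java.util.ArrayList;
import java.util.List;

public class DropDbWorkaround {
    // Build the DDL sequence: DROP TABLE ... PURGE for every table in the
    // database, followed by a plain DROP DATABASE once it is empty.
    static List<String> dropSequence(String db, List<String> tables) {
        List<String> ddl = new ArrayList<>();
        for (String table : tables) {
            ddl.add("DROP TABLE " + db + "." + table + " PURGE");
        }
        ddl.add("DROP DATABASE " + db);
        return ddl;
    }
}
```

Each statement would then be executed in order via the CLI or a JDBC Statement.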



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648460#comment-14648460
 ] 

Sergio Peña commented on HIVE-10975:


[~Ferd] Parquet 1.8.1 is officially released. We can bump up to this new 
version.

> Parquet: Bump the parquet version up to 1.8.0
> -
>
> Key: HIVE-10975
> URL: https://issues.apache.org/jira/browse/HIVE-10975
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>Priority: Minor
> Attachments: HIVE-10975-parquet.patch, HIVE-10975.1-parquet.patch
>
>
> There are lots of changes since parquet's graduation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2015-07-30 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648458#comment-14648458
 ] 

Eugene Koifman commented on HIVE-11418:
---

This feels dangerous, but if "rm -Rf" exists perhaps this is valid as well.

On a separate note, setting fs.trash.interval in a Hive session (or 
hive-site.xml) will lead to unexpected behavior:
Hadoop code won't see this value (HIVE-10986).

> Dropping a database in an encryption zone with CASCADE and trash enabled fails
> --
>
> Key: HIVE-11418
> URL: https://issues.apache.org/jira/browse/HIVE-11418
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>
> Here's the query that fails:
> {noformat}
> hive> CREATE DATABASE db;
> hive> USE db;
> hive> CREATE TABLE a(id int);
> hive> SET fs.trash.interval=1;
> hive> DROP DATABASE db CASCADE;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> db.a because it is in an encryption zone and trash
>  is enabled.  Use PURGE option to skip trash.)
> {noformat}
> DROP DATABASE does not support PURGE, so we have to remove the tables one by 
> one, and then drop the database.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8128) Improve Parquet Vectorization

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648461#comment-14648461
 ] 

Sergio Peña commented on HIVE-8128:
---

Parquet 1.8.1 is now officially released.  Would it help if we bump up to 1.8.1?

> Improve Parquet Vectorization
> -
>
> Key: HIVE-8128
> URL: https://issues.apache.org/jira/browse/HIVE-8128
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Dong Chen
> Fix For: parquet-branch
>
> Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch, 
> HIVE-8128.6-parquet.patch, HIVE-8128.6-parquet.patch, testParquetFile
>
>
> What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde, 
> VectorizedOrcSerde) which was partially done in HIVE-5998.
> As discussed in PARQUET-131, we will work out a Hive POC based on the new 
> Parquet vectorized API, and then finish the implementation after it is finalized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-07-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-11416:
--

Assignee: Pengcheng Xiong

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby 
> Optimizer assumes the schema can match after removing RS and GBY
> --
>
> Key: HIVE-11416
> URL: https://issues.apache.org/jira/browse/HIVE-11416
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11416.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11416) CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-07-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11416:
---
Attachment: HIVE-11416.01.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby 
> Optimizer assumes the schema can match after removing RS and GBY
> --
>
> Key: HIVE-11416
> URL: https://issues.apache.org/jira/browse/HIVE-11416
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
> Attachments: HIVE-11416.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11419) hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as provided

2015-07-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11419:
---
Description: 
hive-shims-0.23 doesn't declare its {{hadoop-yarn-server-resourcemanager}} 
dependency as optional, so you get hive 2.6.0 on your classpath unless you 
explicitly excluded it.

see: 
[[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]

  was:
hive-shims-0.23 doesn't declare its {{hadoop-arn-server-resourcemanager}} 
dependency as optional, so you get hive 2.6.0 on your classpath unless you 
explicitly excluded it.

see: 
[[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]


> hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as 
> provided
> --
>
> Key: HIVE-11419
> URL: https://issues.apache.org/jira/browse/HIVE-11419
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 1.2.1
>Reporter: Steve Loughran
>Priority: Minor
>
> hive-shims-0.23 doesn't declare its {{hadoop-yarn-server-resourcemanager}} 
> dependency as optional, so you get hive 2.6.0 on your classpath unless you 
> explicitly excluded it.
> see: 
> [[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11419) hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as provided

2015-07-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11419:
---
Assignee: (was: Gopal V)

> hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as 
> provided
> --
>
> Key: HIVE-11419
> URL: https://issues.apache.org/jira/browse/HIVE-11419
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 1.2.1
>Reporter: Steve Loughran
>Priority: Minor
>
> hive-shims-0.23 doesn't declare its {{hadoop-arn-server-resourcemanager}} 
> dependency as optional, so you get hive 2.6.0 on your classpath unless you 
> explicitly excluded it.
> see: 
> [[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]





[jira] [Assigned] (HIVE-11419) hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as provided

2015-07-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-11419:
--

Assignee: Gopal V

> hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as 
> provided
> --
>
> Key: HIVE-11419
> URL: https://issues.apache.org/jira/browse/HIVE-11419
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 1.2.1
>Reporter: Steve Loughran
>Assignee: Gopal V
>Priority: Minor
>
> hive-shims-0.23 doesn't declare its {{hadoop-arn-server-resourcemanager}} 
> dependency as optional, so you get hive 2.6.0 on your classpath unless you 
> explicitly excluded it.
> see: 
> [[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]





[jira] [Updated] (HIVE-11419) hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as provided

2015-07-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11419:
---
Description: 
hive-shims-0.23 doesn't declare its {{hadoop-yarn-server-resourcemanager}} 
dependency as optional, so you get hadoop 2.6.0 on your classpath unless you 
explicitly excluded it.

see: 
[[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]

  was:
hive-shims-0.23 doesn't declare its {{hadoop-yarn-server-resourcemanager}} 
dependency as optional, so you get hive 2.6.0 on your classpath unless you 
explicitly excluded it.

see: 
[[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]


> hive-shims-0.23 doesn't declare yarn-server-resourcemanager dependency as 
> provided
> --
>
> Key: HIVE-11419
> URL: https://issues.apache.org/jira/browse/HIVE-11419
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 1.2.1
>Reporter: Steve Loughran
>Priority: Minor
>
> hive-shims-0.23 doesn't declare its {{hadoop-yarn-server-resourcemanager}} 
> dependency as optional, so you get hadoop 2.6.0 on your classpath unless you 
> explicitly excluded it.
> see: 
> [[http://mvnrepository.com/artifact/org.apache.hive.shims/hive-shims-0.23/1.2.1]]
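Until the shims POM marks the artifact as provided/optional, a downstream project can exclude the transitive dependency on the consumer side. A sketch of that workaround (version numbers illustrative):

```xml
<!-- Consumer-side workaround (sketch): exclude the transitive YARN artifact
     pulled in by hive-shims-0.23. Version numbers are illustrative. -->
<dependency>
  <groupId>org.apache.hive.shims</groupId>
  <artifactId>hive-shims-0.23</artifactId>
  <version>1.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```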





[jira] [Updated] (HIVE-11414) Fix OOM in MapTask with many input partitions with RCFile

2015-07-30 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-11414:
--
Description: 
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase.  The cost 
saving of not recreating a single object is too small compared to processing N 
rows.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is a much bigger change to 
the code.
* We can also move the SerDe creation/free-up to the place when input file 
changes.  But that requires a much bigger change to the code.
* We can also add a "cleanup()" method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with many SerDes that 
people have written.
* We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object, but that feels like overkill.
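The accumulation described above can be modeled in a few lines. This is an illustrative sketch only: the class and field names mirror the report (ColumnarSerDeBase, cachedLazyStruct), but the code below is not Hive's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class SerDeCacheDemo {

    // Stand-in for ColumnarSerDeBase: cachedLazyStruct is allocated once per
    // SerDe instance, reused for every row, and never released.
    static class ColumnarSerDeModel {
        byte[] cachedLazyStruct;

        void deserialize(int columnarBlockBytes) {
            if (cachedLazyStruct == null) {
                cachedLazyStruct = new byte[columnarBlockBytes];
            }
        }
    }

    // One mapper handles many partitions (CombineHiveInputFormat), and each
    // input path constructs its own SerDe, so the caches accumulate per task.
    static long retainedBytes(int partitions, int blockBytes) {
        Map<String, ColumnarSerDeModel> serdePerPath = new HashMap<>();
        for (int p = 0; p < partitions; p++) {
            ColumnarSerDeModel serde = new ColumnarSerDeModel();
            serde.deserialize(blockBytes);
            serdePerPath.put("part=" + p, serde); // never freed until task end
        }
        long total = 0;
        for (ColumnarSerDeModel s : serdePerPath.values()) {
            total += s.cachedLazyStruct.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Small-scale demo; at the reported scale (2048 partitions, a few
        // hundred KB of cached columnar block each) the same pattern retains
        // on the order of 1 GB inside a 1.5 GB MapTask heap.
        System.out.println(retainedBytes(16, 1024) + " bytes retained");
    }
}
```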



  was:
MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump using jhat, we realized that the problem is:
* One single mapper is processing many partitions (because of 
CombineHiveInputFormat)
* Each input path (equivalent to partition here) will construct its own SerDe
* Each SerDe will do its own caching of deserialized object (and try to reuse 
it), but will never release it (in this case, the 
serde2.columnar.ColumnarSerDeBase has a field cachedLazyStruct which can take a 
lot of space - pretty much the last N rows of a file where N is the number of 
rows in a columnar block).
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead of 
1 row.

Proposed solution:
* Remove cachedLazyStruct in serde2.columnar.ColumnarSerDeBase.  The cost 
saving of not recreating a single object is too small compared to processing N 
rows.

Alternative solutions:
* We can also free up the whole SerDe after processing a block/file.  The 
problem with that is that the input splits may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move the SerDe creation/free-up to the place when input file 
changes.  But that requires a much bigger change to the code.
* We can also add a "cleanup()" method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with many SerDes that 
people have written.
* We can make cachedLazyStruct in serde2.columnar.ColumnarSerDeBase a weakly 
referenced object, but that feels like overkill.




> Fix OOM in MapTask with many input partitions with RCFile
> -
>
> Key: HIVE-11414
> URL: https://issues.apache.org/jira/browse/HIVE-11414
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0, 0.14.0, 0.13.1, 1.2.0
>Reporter: Zheng Shao
>Priority: Minor
>
> MapTask hit OOM in the following situation in our production environment:
> * src: 2048 partitions, each with 1 file of about 2MB using RCFile format
> * query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
> * Hadoop version: Both on CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
> * MapTask memory Xmx: 1.5GB
> By analyzing the heap dump using jhat, we realized that the problem is:
> * One single mapper is processing many partitions (because of 
> CombineHiveInputFormat)
> * Each inpu

[jira] [Commented] (HIVE-11418) Dropping a database in an encryption zone with CASCADE and trash enabled fails

2015-07-30 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648468#comment-14648468
 ] 

Eugene Koifman commented on HIVE-11418:
---

I meant the Hadoop code that actually checks whether a file should be moved to trash.

> Dropping a database in an encryption zone with CASCADE and trash enabled fails
> --
>
> Key: HIVE-11418
> URL: https://issues.apache.org/jira/browse/HIVE-11418
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 1.2.0
>Reporter: Sergio Peña
>
> Here's the query that fails:
> {noformat}
> hive> CREATE DATABASE db;
> hive> USE db;
> hive> CREATE TABLE a(id int);
> hive> SET fs.trash.interval=1;
> hive> DROP DATABASE db CASCADE;
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop 
> db.a because it is in an encryption zone and trash
>  is enabled.  Use PURGE option to skip trash.)
> {noformat}
> DROP DATABASE does not support PURGE, so we have to remove the tables one by 
> one, and then drop the database.
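The workaround described in the last line can be sketched in HiveQL (assuming the only table is *a*, as in the example above):

```sql
-- Workaround sketch: PURGE each table so nothing is moved to trash,
-- then drop the now-empty database. Table name taken from the example.
USE db;
DROP TABLE a PURGE;
DROP DATABASE db;
```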





[jira] [Commented] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-07-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648470#comment-14648470
 ] 

Sergio Peña commented on HIVE-10884:


Does this issue happen only with the attached patch, or did it happen because I 
enabled the TestBeeLineDriver tests?
The directory is preserved on the Jenkins slaves, but those slaves expire after 
a while and are then destroyed, so we no longer have access to those logs.

> Enable some beeline tests and turn on HIVE-4239 by default
> --
>
> Key: HIVE-10884
> URL: https://issues.apache.org/jira/browse/HIVE-10884
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
> HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
> HIVE-10884.06.patch, HIVE-10884.07.patch, HIVE-10884.07.patch, 
> HIVE-10884.patch
>
>
> See comments in HIVE-4239.
> Beeline tests with parallelism need to be enabled to turn compilation 
> parallelism on by default.





[jira] [Commented] (HIVE-10863) Merge master to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648520#comment-14648520
 ] 

Hive QA commented on HIVE-10863:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748082/HIVE-10863.1-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7742 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/945/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/945/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-945/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748082 - PreCommit-HIVE-SPARK-Build

> Merge master to Spark branch 7/29/2015 [Spark Branch]
> -
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10863.1-spark.patch
>
>






[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648523#comment-14648523
 ] 

Mostafa Mokhtar commented on HIVE-11410:


[~mmccline]

> Join with subquery containing a group by incorrectly returns no results
> ---
>
> Key: HIVE-11410
> URL: https://issues.apache.org/jira/browse/HIVE-11410
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Nicholas Brenwald
>Priority: Minor
> Attachments: hive-site.xml
>
>
> Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
> row of data. For example create table *t* from an existing table which 
> contains at least 1 row of data by running:
> {code}
> create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
> {code}
> Table *t* looks like the following:
> ||c1||c2||
> |abc|0|
> Running the following query then returns zero results.
> {code}
> SELECT 
>   t1.c1
> FROM 
>   t t1
> JOIN
> (SELECT 
>t2.c1,
>MAX(t2.c2) AS c2
>  FROM 
>t t2 
>  GROUP BY 
>t2.c1
> ) t3
> ON t1.c2=t3.c2
> {code}
> However, we expected to see the following:
> ||c1||
> |abc|
> The problem seems to relate to the fact that in the subquery, we group by 
> column *c1*, but this is not subsequently used in the join condition.





[jira] [Assigned] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-11410:
---

Assignee: Matt McCline

> Join with subquery containing a group by incorrectly returns no results
> ---
>
> Key: HIVE-11410
> URL: https://issues.apache.org/jira/browse/HIVE-11410
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Nicholas Brenwald
>Assignee: Matt McCline
>Priority: Minor
> Attachments: hive-site.xml
>
>
> Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
> row of data. For example create table *t* from an existing table which 
> contains at least 1 row of data by running:
> {code}
> create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
> {code}
> Table *t* looks like the following:
> ||c1||c2||
> |abc|0|
> Running the following query then returns zero results.
> {code}
> SELECT 
>   t1.c1
> FROM 
>   t t1
> JOIN
> (SELECT 
>t2.c1,
>MAX(t2.c2) AS c2
>  FROM 
>t t2 
>  GROUP BY 
>t2.c1
> ) t3
> ON t1.c2=t3.c2
> {code}
> However, we expected to see the following:
> ||c1||
> |abc|
> The problem seems to relate to the fact that in the subquery, we group by 
> column *c1*, but this is not subsequently used in the join condition.





[jira] [Commented] (HIVE-11409) CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before UNION

2015-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648541#comment-14648541
 ] 

Hive QA commented on HIVE-11409:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12748059/HIVE-11409.02.patch

{color:green}SUCCESS:{color} +1 9276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4766/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4766/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4766/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12748059 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): add SEL before 
> UNION
> --
>
> Key: HIVE-11409
> URL: https://issues.apache.org/jira/browse/HIVE-11409
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11409.01.patch, HIVE-11409.02.patch
>
>
> Three purposes: (1) to ensure that the data type of a non-primary branch of the 
> union (the 1st branch is the primary branch) can be cast to that of the primary 
> branch; (2) to make the UnionProcessor optimizer work; (3) if the SEL is 
> redundant, it will be removed by the IdentityProjectRemover optimizer.





[jira] [Commented] (HIVE-10863) Merge master to Spark branch 7/29/2015 [Spark Branch]

2015-07-30 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648548#comment-14648548
 ] 

Chao Sun commented on HIVE-10863:
-

+1

> Merge master to Spark branch 7/29/2015 [Spark Branch]
> -
>
> Key: HIVE-10863
> URL: https://issues.apache.org/jira/browse/HIVE-10863
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10863.1-spark.patch
>
>






[jira] [Commented] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-07-30 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648556#comment-14648556
 ] 

Prasanth Jayachandran commented on HIVE-11405:
--

[~gopalv] can you take a look at the patch?

> Add early termination for recursion in 
> StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
> --
>
> Key: HIVE-11405
> URL: https://issues.apache.org/jira/browse/HIVE-11405
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11405.patch
>
>
> Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
> him,
> "The recursion protection works well with an AND expr, but it doesn't work 
> against
> (OR a=1 (OR a=2 (OR a=3 (OR ...)
> since the row count will never be reduced during recursion due to the 
> nature of the OR.
> We need to execute a short-circuit to satisfy the OR properly - no case which 
> matches a=1 qualifies for the rest of the filters.
> Recursion should pass in numRows - branch1Rows for branch 2."
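The proposed short-circuit can be illustrated with a toy model. This is not the actual StatsRulesProcFactory code: the 1/ndv selectivity for an equality predicate and all numbers below are hypothetical; it only shows why evaluating each OR branch against the rows left over from earlier branches bounds the estimate.

```java
public class OrShortCircuitDemo {

    // Hypothetical per-branch estimate: an equality predicate keeps 1/ndv of rows.
    static long evaluateBranch(long numRows, long ndv) {
        return Math.max(1, numRows / ndv);
    }

    // Naive recursion: every branch of (OR a=1 (OR a=2 ...)) sees the full
    // row count, so the input never shrinks as the recursion deepens.
    static long naiveOr(long numRows, long ndv, int branches) {
        long matched = 0;
        for (int i = 0; i < branches; i++) {
            matched += evaluateBranch(numRows, ndv);
        }
        return Math.min(matched, numRows);
    }

    // Short-circuit: branch i is evaluated only against rows not already
    // matched, i.e. the recursion passes numRows - branch1Rows to branch 2.
    static long shortCircuitOr(long numRows, long ndv, int branches) {
        long remaining = numRows;
        long matched = 0;
        for (int i = 0; i < branches && remaining > 0; i++) {
            long m = evaluateBranch(remaining, ndv);
            matched += m;
            remaining -= m;
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(naiveOr(1000, 10, 5));        // every branch sees 1000 rows
        System.out.println(shortCircuitOr(1000, 10, 5)); // each branch sees fewer rows
    }
}
```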




