[jira] [Commented] (HIVE-11600) Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706369#comment-14706369
 ] 

Hive QA commented on HIVE-11600:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751613/HIVE-11600.03.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9369 tests executed
*Failed tests:*
{noformat}
TestSSL - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_varchar_udf1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5028/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5028/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5028/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751613 - PreCommit-HIVE-TRUNK-Build

> Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())
> 
>
> Key: HIVE-11600
> URL: https://issues.apache.org/jira/browse/HIVE-11600
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11600.01.patch, HIVE-11600.02.patch, 
> HIVE-11600.03.patch
>
>
> Currently, Hive only supports a single column in the IN clause, e.g., 
> {code}select * from src where  col0 in (v1,v2,v3);{code}
> We want it to support 
> {code}select * from src where (col0,col1+3) in 
> ((col0+v1,v2),(v3,v4-col1));{code}
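For intuition, the requested predicate is sugar for a disjunction of per-tuple equality conjunctions: (x, y) IN ((1, 2), (3, 4)) is equivalent to (x = 1 AND y = 2) OR (x = 3 AND y = 4). A minimal Java sketch of that membership semantics (illustrative only, not Hive's parser code):

```java
import java.util.Arrays;

// Illustrative membership test, not Hive's parser: the parser could
// rewrite the multi-column IN clause into this disjunction of
// per-tuple conjunctions.
class MultiColumnIn {
    static boolean tupleIn(int[] left, int[][] right) {
        for (int[] candidate : right) {
            if (Arrays.equals(left, candidate)) {  // x = a AND y = b
                return true;
            }
        }
        return false;                              // no tuple matched
    }
}
```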



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11613) schematool should return non zero exit status for info command, if state is inconsistent

2015-08-21 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706377#comment-14706377
 ] 

Prasad Mujumdar commented on HIVE-11613:


+1 for both patches

> schematool should return non zero exit status for info command, if state is 
> inconsistent
> 
>
> Key: HIVE-11613
> URL: https://issues.apache.org/jira/browse/HIVE-11613
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.0.0, 1.1.1, 1.2.1
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-11613-1.0.patch, HIVE-11613.1.patch
>
>
> schematool -info just prints the version information, but it is not easy for a 
> tool to consume the validity of the state, as the exit code is 0 even if 
> there is a schema version mismatch.
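A minimal sketch of the requested behavior, assuming a hypothetical helper that maps the schema version check to an exit status (names are illustrative, not Hive's actual schematool code):

```java
// Hypothetical helper, not Hive's schematool code: derive the info
// command's exit status from the schema version check instead of
// always exiting 0.
class SchemaInfoSketch {
    static int infoExitCode(String expectedVersion, String actualVersion) {
        if (expectedVersion.equals(actualVersion)) {
            return 0;   // schema is consistent
        }
        return 1;       // non-zero lets scripts detect the mismatch
    }
}
```

A wrapper script could then branch directly on the exit status of `schematool -info`.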



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11526) LLAP: implement LLAP UI as a separate service

2015-08-21 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706437#comment-14706437
 ] 

Kai Sasaki commented on HIVE-11526:
---

[~sershe]
I'm currently working on running the LLAP Monitor in a YARN container through 
Slider. Regarding this, I have a question.
{{LlapDaemon}} seems to use {{LlapWebServices}}, which provides a service over 
HTTP. And {{llap-server/src/main/resources/webapps/llap}} seems to be prepared 
for web apps that will be provided by LLAP Monitor or LLAP, but it has not been 
used yet. 
For the LLAP Monitor, how can we use {{LlapWebServices}} to serve the resources 
under {{llap-server/src/main/resources/webapps/llap}}? Or is it required to use 
the {{LlapWebServices}} class even in the LLAP Monitor daemon?

> LLAP: implement LLAP UI as a separate service
> -
>
> Key: HIVE-11526
> URL: https://issues.apache.org/jira/browse/HIVE-11526
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Kai Sasaki
>
> The specifics are vague at this point. 
> Hadoop metrics can be output, as well as the metrics we collect and output in 
> JMX, and those we collect per fragment and currently log. 
> This service can do LLAP-specific views, and per-query aggregation.
> [~gopalv] may have some information on how to reuse existing solutions for 
> part of the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11383) Upgrade Hive to Calcite 1.4

2015-08-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706454#comment-14706454
 ] 

Jesus Camacho Rodriguez commented on HIVE-11383:


[~ashutoshc], thanks.

I logged CALCITE-811, CALCITE-826, CALCITE-834, and CALCITE-850 to address 
issues discovered running QA in Hive. Once CALCITE-850 goes in (hopefully 
today), I'll ask for a new snapshot, and I'll trigger a new Hive QA run. I 
don't expect more issues, so then we would be ready to move to 1.4 when it is 
released.

> Upgrade Hive to Calcite 1.4
> ---
>
> Key: HIVE-11383
> URL: https://issues.apache.org/jira/browse/HIVE-11383
> Project: Hive
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11383.1.patch, HIVE-11383.10.patch, 
> HIVE-11383.2.patch, HIVE-11383.3.patch, HIVE-11383.3.patch, 
> HIVE-11383.3.patch, HIVE-11383.4.patch, HIVE-11383.5.patch, 
> HIVE-11383.6.patch, HIVE-11383.7.patch, HIVE-11383.8.patch, 
> HIVE-11383.8.patch, HIVE-11383.9.patch
>
>
> CLEAR LIBRARY CACHE
> Upgrade Hive to Calcite 1.4.0-incubating.
> There is currently a snapshot release, which is close to what will be in 1.4. 
> I have checked that Hive compiles against the new snapshot, fixing one issue. 
> The patch is attached.
> The next step is to validate that Hive runs against the new Calcite, and post 
> any issues to the Calcite list or log Calcite Jira cases. [~jcamachorodriguez], 
> can you please do that?
> [~pxiong], I gather you are dependent on CALCITE-814, which will be fixed in 
> the new Calcite version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706466#comment-14706466
 ] 

Hive QA commented on HIVE-11544:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750387/HIVE-11544.1.patch

{color:green}SUCCESS:{color} +1 9371 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5029/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5029/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5029/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750387 - PreCommit-HIVE-TRUNK-Build

> LazyInteger should avoid throwing NumberFormatException
> ---
>
> Key: HIVE-11544
> URL: https://issues.apache.org/jira/browse/HIVE-11544
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.14.0, 1.2.0, 1.3.0, 2.0.0
>Reporter: William Slacum
>Assignee: Gopal V
>Priority: Minor
>  Labels: Performance
> Attachments: HIVE-11544.1.patch
>
>
> {{LazyInteger#parseInt}} will throw a {{NumberFormatException}} under these 
> conditions:
> # bytes are null
> # radix is invalid
> # length is 0
> # the string is '+' or '-'
> # {{LazyInteger#parse}} throws a {{NumberFormatException}}
> Most of the time, such as in {{LazyInteger#init}} and {{LazyByte#init}}, the 
> exception is caught, swallowed, and {{isNull}} is set to {{true}}.
> This is generally a bad workflow: exception creation is a performance 
> bottleneck, and repeating it for many rows in a query can have drastic 
> performance consequences.
> It would be better if this method returned an {{Optional}}, which 
> would provide similar functionality with a higher throughput rate.
> I've tested against 0.14.0, and saw that the logic is unchanged in 1.2.0, so 
> I've marked those as affected. Any version in between would also suffer from 
> this.
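A hedged sketch of what an exception-free parse could look like, returning {{OptionalInt}} instead of throwing (illustrative only; overflow handling is omitted, and the names do not match Hive's LazyInteger):

```java
import java.util.OptionalInt;

// Hedged sketch, not Hive's LazyInteger: an exception-free parse that
// returns OptionalInt.empty() for every invalid input listed above
// instead of throwing NumberFormatException. Overflow is not handled.
class SafeParse {
    static OptionalInt parseInt(byte[] bytes, int start, int length) {
        if (bytes == null || length == 0) {
            return OptionalInt.empty();            // null bytes or length 0
        }
        int i = start, end = start + length;
        boolean negative = false;
        if (bytes[i] == '+' || bytes[i] == '-') {
            negative = bytes[i] == '-';
            if (++i == end) {
                return OptionalInt.empty();        // bare '+' or '-'
            }
        }
        int result = 0;
        for (; i < end; i++) {
            int digit = bytes[i] - '0';
            if (digit < 0 || digit > 9) {
                return OptionalInt.empty();        // non-digit byte
            }
            result = result * 10 + digit;
        }
        return OptionalInt.of(negative ? -result : result);
    }
}
```

Callers such as {{init}} would then test {{isPresent()}} instead of catching and swallowing an exception per bad row.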



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11526) LLAP: implement LLAP UI as a separate service

2015-08-21 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated HIVE-11526:
--
Attachment: llap_monitor_design.pdf

> LLAP: implement LLAP UI as a separate service
> -
>
> Key: HIVE-11526
> URL: https://issues.apache.org/jira/browse/HIVE-11526
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Kai Sasaki
> Attachments: llap_monitor_design.pdf
>
>
> The specifics are vague at this point. 
> Hadoop metrics can be output, as well as metrics we collect and output in 
> jmx, as well as those we collect per fragment and log right now. 
> This service can do LLAP-specific views, and per-query aggregation.
> [~gopalv] may have some information on how to reuse existing solutions for 
> part of the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706519#comment-14706519
 ] 

Jesus Camacho Rodriguez commented on HIVE-11604:


[~ychena], the patch seems like a workaround to me.

From the JIRA case, I assume that the problem cannot be reproduced in master? 
How was it fixed?
If you compare master vs. any of the branches where the problem appears:
1) Does the RowSchema out of the PTF differ in master vs. the branch?
2) If it doesn't, does the {{isIdentityProject}} method in SelectOperator return 
a different result in master vs. the branch?

Thanks

> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch
>
>
> The following query returns an empty result, which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the right result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> No rows selected (14.248 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11615) Create test for max thrift message setting

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706555#comment-14706555
 ] 

Hive QA commented on HIVE-11615:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751633/HIVE-11615.1.patch

{color:green}SUCCESS:{color} +1 9371 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5030/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5030/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5030/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751633 - PreCommit-HIVE-TRUNK-Build

> Create test for max thrift message setting
> --
>
> Key: HIVE-11615
> URL: https://issues.apache.org/jira/browse/HIVE-11615
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11615.1.patch
>
>
> Create a test case for HIVE-8680



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706577#comment-14706577
 ] 

Yongzhi Chen commented on HIVE-11604:
-

[~jcamachorodriguez], the problem is reproducible in master (my master is 
around 1 week old though). The fix follows a similar pattern to other cases 
in ProjectRemover, for example:
{noformat}
  Operator parent = parents.get(0);
  if (parent instanceof ReduceSinkOperator &&
      Iterators.any(sel.getChildOperators().iterator(),
                    Predicates.instanceOf(ReduceSinkOperator.class))) {
    // For the RS-SEL-RS case: the reducer operator in the reducer task
    // cannot be null in the task compiler
    return null;
  }
{noformat}
For the PTF case, it needs a select operator to follow it. We can add other 
cases before the return null if they fall into the same category. 
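As a toy model of the guard logic described above (the class names only mirror Hive's operators; this is not the actual ProjectRemover code, and the PTF branch is an assumption based on this comment):

```java
import java.util.List;

// Toy model of the ProjectRemover guard, not Hive code. The PTF branch
// is an assumption: the identity SELECT is kept when a PTF precedes it.
class Op { }
class ReduceSinkOp extends Op { }
class PTFOp extends Op { }

class SelectRemoverSketch {
    // returns null when the identity SELECT must be kept
    static Op tryRemove(Op parent, List<Op> children) {
        boolean childIsRS = false;
        for (Op c : children) {
            if (c instanceof ReduceSinkOp) {
                childIsRS = true;
            }
        }
        if (parent instanceof ReduceSinkOp && childIsRS) {
            return null;  // RS-SEL-RS: reducer cannot be null in task compiler
        }
        if (parent instanceof PTFOp) {
            return null;  // PTF needs a select operator to follow it
        }
        return parent;    // safe to splice the SELECT out
    }
}
```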

> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch
>
>
> The following query returns an empty result, which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the right result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> No rows selected (14.248 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11604:

Affects Version/s: 2.0.0

> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch
>
>
> The following query returns an empty result, which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the right result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> No rows selected (14.248 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow

2015-08-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706690#comment-14706690
 ] 

Aihua Xu commented on HIVE-11617:
-

For lateral view, there will be one parent with 2 children and one child with 2 
parents. For such case, seems PreOrderWalker could need to visit same nodes 
multiple times (including all the descendants). That seems to cause the job to 
run for ever. I will investigate to switch to a different Walker (something 
like level order walker). 
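The blow-up can be sketched as follows: without a visited set, each diamond (one parent, two children, one shared grandchild) doubles the walks below it, so chained lateral views make a plain pre-order walk exponential. Illustrative Java, not Hive's walker code:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy DAG walk, illustrative only (not Hive's PreOrderWalker).
class DagWalk {
    // Plain pre-order walk: a node shared by k parents is re-walked k
    // times, so each lateral-view "diamond" doubles the work below it.
    static int walkNaive(Map<Integer, List<Integer>> g, int node) {
        int visits = 1;
        for (int child : g.getOrDefault(node, Collections.<Integer>emptyList())) {
            visits += walkNaive(g, child);
        }
        return visits;
    }

    // Graph-aware walk: a visited set caps each node at one visit.
    static int walkOnce(Map<Integer, List<Integer>> g, int node, Set<Integer> seen) {
        if (!seen.add(node)) {
            return 0; // already visited via another parent
        }
        int visits = 1;
        for (int child : g.getOrDefault(node, Collections.<Integer>emptyList())) {
            visits += walkOnce(g, child, seen);
        }
        return visits;
    }
}
```

On two chained diamonds (7 nodes), the naive walk already performs 13 visits versus 7 for the deduplicated walk, and the gap doubles with every further lateral view.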

> Explain plan for multiple lateral views is very slow
> 
>
> Key: HIVE-11617
> URL: https://issues.apache.org/jira/browse/HIVE-11617
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> The following explain job will be very slow or never finish if there are many 
> lateral views involved. High CPU usage is also noticed.
> {noformat}
> EXPLAIN
> SELECT
> *
> from
> (
> SELECT * FROM table1 
> ) x
> LATERAL VIEW json_tuple(...) x1 
> LATERAL VIEW json_tuple(...) x2 
> ...
> {noformat}
> From jstack, the job is busy with the pre-order tree traversal. 
> {noformat}
> at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
> at java.util.regex.Matcher.reset(Matcher.java:308)
> at java.util.regex.Matcher.(Matcher.java:228)
> at java.util.regex.Pattern.matcher(Pattern.java:1088)
> at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> [the PreOrderWalker.walk(PreOrderWalker.java:61) frame repeats for the rest
> of the deep recursion; the trace is truncated in the original message]
> {noformat}

[jira] [Commented] (HIVE-11540) HDP 2.3 and Flume 1.6: Hive Streaming – Too many delta files during Compaction

2015-08-21 Thread Nivin Mathew (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706691#comment-14706691
 ] 

Nivin Mathew commented on HIVE-11540:
-

Update from my side: I got the streaming working after changing the Flume 
configs for transactions, so I don't get "too many files" errors now. 

> HDP 2.3 and Flume 1.6: Hive Streaming – Too many delta files during Compaction
> --
>
> Key: HIVE-11540
> URL: https://issues.apache.org/jira/browse/HIVE-11540
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Nivin Mathew
>Assignee: Alan Gates
>
> Hello,
> I am streaming weblogs to Kafka and then to Flume 1.6 using a Hive sink, with 
> an average of 20 million records a day. I have 5 compactors running at 
> various times (30m/5m/5s), no matter what time I give, the compactors seem to 
> run out of memory cleaning up a couple thousand delta files and ultimately 
> falls behind compacting/cleaning delta files. Any suggestions on what I can 
> do to improve performance? Or can Hive streaming not handle this kind of load?
> I used this post as reference: 
> http://henning.kropponline.de/2015/05/19/hivesink-for-flume/
> 2015-08-12 15:05:01,197 FATAL [main] org.apache.hadoop.mapred.YarnChild: 
> Error running child : java.lang.OutOfMemoryError: Direct buffer memory
> Max block location exceeded for split: CompactorInputSplit{base: 
> hdfs://Dev01HWNameService/user/hive/warehouse/weblogs.db/dt=15-08-12/base_1056406,
>  bucket: 0, length: 6493042, deltas: [delta_1056407_1056408, 
> delta_1056409_1056410, delta_1056411_1056412, delta_1056413_1056414, 
> delta_1056415_1056416, delta_1056417_1056418,…
> , delta_1074039_1074040, delta_1074041_1074042, delta_1074043_1074044, 
> delta_1074045_1074046, delta_1074047_1074048, delta_1074049_1074050, 
> delta_1074051_1074052]} splitsize: 8772 maxsize: 10
> 2015-08-12 15:34:25,271 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of 
> splits:3
> 2015-08-12 15:34:25,367 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting 
> tokens for job: job_1439397150426_0068
> 2015-08-12 15:34:25,603 INFO  [upladevhwd04v.researchnow.com-18]: 
> impl.YarnClientImpl (YarnClientImpl.java:submitApplication(274)) - Submitted 
> application application_1439397150426_0068
> 2015-08-12 15:34:25,610 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:submit(1294)) - The url to track the job: 
> http://upladevhwd02v.researchnow.com:8088/proxy/application_1439397150426_0068/
> 2015-08-12 15:34:25,611 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: 
> job_1439397150426_0068
> 2015-08-12 15:34:30,170 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:33,756 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job 
> job_1439397150426_0068 running in uber mode : false
> 2015-08-12 15:34:33,757 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 0% reduce 0%
> 2015-08-12 15:34:35,147 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:40,155 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:45,184 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:50,201 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:34:55,256 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:00,205 INFO  [Thread-7]: compactor.Initiator 
> (Initiator.java:run(88)) - Checking to see if we should compact 
> weblogs.vop_hs.dt=15-08-12
> 2015-08-12 15:35:02,975 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 33% reduce 0%
> 2015-08-12 15:35:02,982 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : 
> attempt_1439397150426_0068_m_00_0, Status : FAILED
> 2015-08-12 15:35:03,000 INFO  [upladevhwd04v.researchnow.com-18]: 
> mapreduce.Job (Job.java:printTaskEvents(1406)) - Task Id : 
> attempt_1439397150426_0068_m_01_0, Stat

[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706692#comment-14706692
 ] 

Aihua Xu commented on HIVE-11375:
-

[~ashutoshc] or [~csun] Can you help submit the patch? Thanks for reviewing the 
code, guys.

> Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
> --
>
> Key: HIVE-11375
> URL: https://issues.apache.org/jira/browse/HIVE-11375
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mariusz Sakowski
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11375.2.patch, HIVE-11375.3.patch, 
> HIVE-11375.4.patch, HIVE-11375.patch
>
>
> When running query like this:
> {code}explain select * from test where (val is not null and val <> 0);{code}
> Hive will simplify the expression in parentheses and omit the is-not-null check:
> {code}
>   Filter Operator
> predicate: (val <> 0) (type: boolean)
> {code}
> which is fine.
> but if we negate the condition using the NOT operator:
> {code}explain select * from test where not (val is not null and val <> 
> 0);{code}
> Hive will also simplify it, but now it breaks the predicate:
> {code}
>   Filter Operator
> predicate: (not (val <> 0)) (type: boolean)
> {code}
> because the valid predicate should be *val == 0 or val is null*, while the row 
> above is equivalent to *val == 0* only, filtering away rows where val is null.
> A simple example:
> {code}
> CREATE TABLE example (
> val bigint
> );
> INSERT INTO example VALUES (1), (NULL), (0);
> -- returns 2 rows - NULL and 0
> select * from example where (val is null or val == 0);
> -- returns 1 row - 0
> select * from example where not (val is not null and val <> 0);
> {code}
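Under SQL three-valued logic, the correct negation is NOT (val IS NOT NULL AND val <> 0), which is equivalent to (val IS NULL OR val = 0). A small Java model (using a null Boolean for SQL UNKNOWN; illustrative only, not Hive code) shows why dropping the IS NOT NULL conjunct loses the NULL rows:

```java
// Model of SQL three-valued logic with java.lang.Boolean, where a null
// Boolean stands in for SQL UNKNOWN. Illustrative only, not Hive code.
class ThreeValued {
    static Boolean ne(Long val, long k) {            // val <> k
        return val == null ? null : val != k;
    }
    static Boolean isNotNull(Long val) {             // val IS NOT NULL
        return val != null;
    }
    static Boolean and(Boolean a, Boolean b) {       // SQL AND
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return false;
        if (a == null || b == null) return null;     // UNKNOWN propagates
        return true;
    }
    static Boolean not(Boolean a) {                  // SQL NOT
        return a == null ? null : !a;
    }
    static boolean passes(Boolean p) {               // filter keeps only TRUE
        return Boolean.TRUE.equals(p);
    }
}
```

For a NULL row, the full predicate NOT (val IS NOT NULL AND val <> 0) evaluates to NOT(FALSE) = TRUE, so the row is kept; the simplified NOT (val <> 0) evaluates to NOT(UNKNOWN) = UNKNOWN, so the row is wrongly filtered out.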



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner

2015-08-21 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706694#comment-14706694
 ] 

Navis commented on HIVE-11515:
--

[~sseth] Sorry for the delay.
I saw this intermittently in a PoC scenario about a month ago and made this 
patch in a hurry. After applying it, the problem went away and I moved on 
(there were many other issues at the time). So I cannot remember the exact 
symptom, but I believe it looked like a query-hang situation. Sorry for the 
vague description.

> Still some possible race condition in DynamicPartitionPruner
> 
>
> Key: HIVE-11515
> URL: https://issues.apache.org/jira/browse/HIVE-11515
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Tez
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-11515.1.patch.txt
>
>
> Even after HIVE-9976, I could still see a race condition in DPP sometimes. It 
> is hard to reproduce, but it seems related to the fact that prune() is called 
> from a thread pool: with some delay in the queue, events from fast tasks 
> arrive before prune() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11518) Provide interface to adjust required resource for tez tasks

2015-08-21 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706695#comment-14706695
 ] 

Navis commented on HIVE-11518:
--

[~hagleitn] Any interest in this? With it, I could assign 4G to 20+ way join 
map tasks while assigning just 1G to other simple tasks.

> Provide interface to adjust required resource for tez tasks
> ---
>
> Key: HIVE-11518
> URL: https://issues.apache.org/jira/browse/HIVE-11518
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-11518.1.patch.txt
>
>
> Resource requirements vary from task to task, but currently they are fixed to 
> a single value (via hive.tez.container.size). It would be good to customize 
> resource requirements to match the expected work.
> Suggested interface is quite simple.
> {code}
> public interface ResourceCalculator {
>   Resource adjust(Resource resource, MapWork mapWork);
>   Resource adjust(Resource resource, ReduceWork reduceWork);
> }
> {code}
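As a rough illustration of how such a calculator might behave, here is a Python analogue of the Java interface above; the Resource and Work shapes and the thresholds are hypothetical, chosen to mirror the 4G/1G example from the comment:

```python
# Hypothetical analogue of the proposed ResourceCalculator: pick a
# container size per vertex instead of one global hive.tez.container.size.
from dataclasses import dataclass

@dataclass
class Resource:
    memory_mb: int

@dataclass
class Work:
    kind: str            # "map" or "reduce"
    num_join_inputs: int

def adjust(resource, work):
    # Give large multi-way join tasks more memory, simple tasks less.
    if work.num_join_inputs >= 20:
        return Resource(memory_mb=4096)
    return Resource(memory_mb=1024)

default = Resource(memory_mb=2048)
print(adjust(default, Work("map", 25)).memory_mb)   # 4096
print(adjust(default, Work("reduce", 0)).memory_mb) # 1024
```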



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706710#comment-14706710
 ] 

Hive QA commented on HIVE-11604:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751643/HIVE-11604.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9372 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5031/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5031/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5031/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751643 - PreCommit-HIVE-TRUNK-Build

> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch
>
>
> The following query returns an empty result, which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the right result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> No rows selected (14.248 seconds)
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV

2015-08-21 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706734#comment-14706734
 ] 

Jesus Camacho Rodriguez commented on HIVE-11573:


+1 pending QA run.

> PointLookupOptimizer can be pessimistic at a low nDV
> 
>
> Key: HIVE-11573
> URL: https://issues.apache.org/jira/browse/HIVE-11573
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, 
> HIVE-11573.3.patch
>
>
> The PointLookupOptimizer can turn off some of the optimizations due to its 
> use of tuple IN() clauses.
> Limit the application of the optimizer for very low nDV cases and extract the 
> sub-clause as a pre-condition during runtime, to trigger the simple column 
> predicate index lookups.
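The extraction described above can be sketched as follows: the per-column IN preconditions are strictly weaker than the tuple IN predicate, so prepending them never changes the result while giving a simple column-predicate index something cheap to evaluate. A Python sketch, not the actual optimizer code:

```python
# Tuple IN: (a, b) IN ((1, 2), (3, 4))
tuples = [(1, 2), (3, 4)]

def tuple_in(row):
    return (row["a"], row["b"]) in tuples

# Extracted per-column preconditions: a IN (1, 3) AND b IN (2, 4).
# Weaker than the tuple predicate but index-friendly, so they can
# prune rows cheaply before the exact check runs.
a_vals = {t[0] for t in tuples}
b_vals = {t[1] for t in tuples}

def with_precondition(row):
    return row["a"] in a_vals and row["b"] in b_vals and tuple_in(row)

rows = [{"a": 1, "b": 2}, {"a": 1, "b": 4}, {"a": 3, "b": 4}, {"a": 5, "b": 2}]
exact = [r for r in rows if tuple_in(r)]
pre   = [r for r in rows if with_precondition(r)]
assert exact == pre  # the precondition never changes the result
print(exact)  # [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
```

Note that {"a": 1, "b": 4} passes both per-column preconditions but fails the exact tuple check, which is why the tuple predicate must still run last.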



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706800#comment-14706800
 ] 

Hive QA commented on HIVE-11573:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751653/HIVE-11573.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9372 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pointlookup
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5032/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5032/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5032/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751653 - PreCommit-HIVE-TRUNK-Build

> PointLookupOptimizer can be pessimistic at a low nDV
> 
>
> Key: HIVE-11573
> URL: https://issues.apache.org/jira/browse/HIVE-11573
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, 
> HIVE-11573.3.patch
>
>
> The PointLookupOptimizer can turn off some of the optimizations due to its 
> use of tuple IN() clauses.
> Limit the application of the optimizer for very low nDV cases and extract the 
> sub-clause as a pre-condition during runtime, to trigger the simple column 
> predicate index lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11604:

Attachment: HIVE-11604.2.patch

> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch, HIVE-11604.2.patch
>
>
> The following query returns an empty result, which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the right result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> No rows selected (14.248 seconds)
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706832#comment-14706832
 ] 

Yongzhi Chen commented on HIVE-11604:
-

My fix changes the query plan slightly in some cases (especially the cases that 
returned wrong results without the fix).
The failure is caused by an extra select operator in the query plan; it should 
be harmless: it reorders RS_5's output columns into the right sequence, 
although that may not be so important in this scenario.

I attach a second patch to update the test output and to add more test cases 
which succeed on master even without my fix, to catch possible regressions in 
the future.

On current master, this query returns wrong results:
{noformat}
select ddd.id, ddd.fkey, aaa.name
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}

while the following returns the right values (the only difference is the extra ddd.rnum):
{noformat}
select ddd.id, ddd.fkey, aaa.name, ddd.rnum
from (
select id, fkey, 
row_number() over (partition by id, fkey) as rnum
from tlb1 group by id, fkey
 ) ddd 
inner join tlb2 aaa on aaa.fid = ddd.fkey;
{noformat}


> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch, HIVE-11604.2.patch
>
>
> The following query returns an empty result, which is not right:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the right result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> No rows selected (14.248 seconds)
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11607) Export tables broken for data > 32 MB

2015-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706834#comment-14706834
 ] 

Sergio Peña commented on HIVE-11607:


Hi [~ashutoshc]

I added that code because there were some issues in CDH when compiling the 
code. CDH compiles against hadoop-2 even if it runs with MR1 or MR2. And the 
distcp for MR1 was failing to compile.

I see: the problem is that the distcp JAR is not downloaded by Maven when 
dynamic class loading is used.
I reviewed it, and it looks good. (+1)

Thanks for fixing this here.

> Export tables broken for data > 32 MB
> -
>
> Key: HIVE-11607
> URL: https://issues.apache.org/jira/browse/HIVE-11607
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11607.2.patch, HIVE-11607.3.patch, HIVE-11607.patch
>
>
> Broken for both hadoop-1 as well as hadoop-2 line



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException

2015-08-21 Thread William Slacum (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706905#comment-14706905
 ] 

William Slacum commented on HIVE-11544:
---

Thanks, Gopal! Apologies for not getting to this sooner. The patch looks good 
and should solve my problem. I'll give writing a small benchmark a shot too.

> LazyInteger should avoid throwing NumberFormatException
> ---
>
> Key: HIVE-11544
> URL: https://issues.apache.org/jira/browse/HIVE-11544
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.14.0, 1.2.0, 1.3.0, 2.0.0
>Reporter: William Slacum
>Assignee: Gopal V
>Priority: Minor
>  Labels: Performance
> Attachments: HIVE-11544.1.patch
>
>
> {{LazyInteger#parseInt}} will throw a {{NumberFormatException}} under these 
> conditions:
> # bytes are null
> # radix is invalid
> # length is 0
> # the string is '+' or '-'
> # {{LazyInteger#parse}} throws a {{NumberFormatException}}
> Most of the time, such as in {{LazyInteger#init}} and {{LazyByte#init}}, the 
> exception is caught, swallowed, and {{isNull}} is set to {{true}}.
> This is generally a bad workflow: exception creation is a performance 
> bottleneck, and repeating it for many rows in a query can have drastic 
> performance consequences.
> It would be better if this method returned an {{Optional}}, which 
> would provide similar functionality with a higher throughput rate.
> I've tested against 0.14.0, and saw that the logic is unchanged in 1.2.0, so 
> I've marked those as affected. Any version in between would also suffer from 
> this.
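The suggested exception-free pattern — report failure through the return value instead of throwing in a hot parse loop — looks roughly like this Python sketch, where None plays the role of an empty Optional (the real code is Java; this is illustrative, not LazyInteger itself):

```python
def parse_int(s):
    """Parse a decimal integer without raising: return None on bad input.

    Mirrors the failure conditions listed above: null/empty input, a bare
    sign, or any non-digit character yields None rather than an exception.
    """
    if s is None or len(s) == 0:
        return None
    sign = 1
    i = 0
    if s[0] in "+-":
        if len(s) == 1:          # the string is just '+' or '-'
            return None
        sign = -1 if s[0] == "-" else 1
        i = 1
    value = 0
    for ch in s[i:]:
        if not ("0" <= ch <= "9"):
            return None
        value = value * 10 + (ord(ch) - ord("0"))
    return sign * value

print(parse_int("123"))   # 123
print(parse_int("-42"))   # -42
print(parse_int("-"))     # None
print(parse_int("12x"))   # None
```

The caller checks for None (or an empty Optional in Java) instead of catching an exception, so malformed rows cost a branch rather than a stack-trace allocation.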



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10497) Upgrade hive branch to latest Tez

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706941#comment-14706941
 ] 

Hive QA commented on HIVE-10497:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751654/HIVE-10497.4.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9371 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_vector_dynpart_hashjoin_1
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5033/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5033/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5033/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751654 - PreCommit-HIVE-TRUNK-Build

> Upgrade hive branch to latest Tez
> -
>
> Key: HIVE-10497
> URL: https://issues.apache.org/jira/browse/HIVE-10497
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-10497.1.patch, HIVE-10497.1.patch, 
> HIVE-10497.2.patch, HIVE-10497.3.patch, HIVE-10497.4.patch
>
>
> Upgrade hive to the upcoming tez-0.7 release 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706981#comment-14706981
 ] 

Owen O'Malley commented on HIVE-11504:
--

The goal is simplicity.

For types where one is a strict super-type of the other, there is no point in 
distinguishing them. It would just make all of the users of the API add more 
switch statements.

So my point is that x = 1000 means exactly the same thing whether x and 1000 
are longs or integers. Note that this is completely different from the case 
where x is a double or a string.

> Predicate pushing down doesn't work for float type for Parquet
> --
>
> Key: HIVE-11504
> URL: https://issues.apache.org/jira/browse/HIVE-11504
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, 
> HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch
>
>
> Predicate builder should use PrimitiveTypeName type in parquet side to 
> construct predicate leaf instead of the type provided by PredicateLeaf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706990#comment-14706990
 ] 

Owen O'Malley commented on HIVE-11504:
--

If you need to differentiate the types for parquet, please do that in the 
parquet bindings and not in the generic api. Hive of course gives you the 
precise types so the input format has the necessary information already.

> Predicate pushing down doesn't work for float type for Parquet
> --
>
> Key: HIVE-11504
> URL: https://issues.apache.org/jira/browse/HIVE-11504
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, 
> HIVE-11504.2.patch, HIVE-11504.3.patch, HIVE-11504.patch
>
>
> Predicate builder should use PrimitiveTypeName type in parquet side to 
> construct predicate leaf instead of the type provided by PredicateLeaf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11525) Bucket pruning

2015-08-21 Thread Maciek Kocon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707006#comment-14707006
 ] 

Maciek Kocon commented on HIVE-11525:
-

It's not exactly a clone of HIVE-9523, as it doesn't cover the other 
optimization proposal there: [Sort Merge] PARTITION Map join.

> Bucket pruning
> --
>
> Key: HIVE-11525
> URL: https://issues.apache.org/jira/browse/HIVE-11525
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.0
>Reporter: Maciek Kocon
>Assignee: Takuya Fukudome
>  Labels: gsoc2015
>
> Logically and functionally, bucketing and partitioning are quite similar: 
> both provide a mechanism to segregate and separate the table's data based on 
> its content. Thanks to that, significant further optimisations like 
> [partition] PRUNING or [bucket] MAP JOIN are possible.
> The difference seems to be imposed by design: PARTITIONing is 
> open/explicit, while BUCKETing is discrete/implicit.
> Partitioning seems to be very common, if not a standard feature, in all 
> current RDBMS, while BUCKETing seems to be Hive specific.
> In a way, BUCKETing could also be called "hashing" or simply "IMPLICIT 
> PARTITIONING".
> Regardless of the fact that these two are recognised as two separate features 
> in Hive, nothing should prevent leveraging the same existing query/join 
> optimisations across the two.
> BUCKET pruning:
> Enable the partition-PRUNING-equivalent optimisation for queries on BUCKETED 
> tables.
> The simplest example is queries like:
> "SELECT … FROM x WHERE colA=123123"
> which should read only the relevant bucket file rather than all file-buckets 
> that belong to the table.
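The pruning itself reduces to recomputing the writer's bucket function on the predicate constant and scanning only that one file. A Python sketch under the assumption that read and write use the same hash function (Hive's actual hashing lives in ObjectInspectorUtils; the file names here are illustrative):

```python
def bucket_for(value, num_buckets):
    # Rows with colA == value were all written to this one bucket,
    # assuming the same hash function is used at write and read time.
    # Mask to a non-negative int before taking the modulus.
    return (hash(value) & 0x7FFFFFFF) % num_buckets

num_buckets = 8
bucket_files = [f"00000{i}_0" for i in range(num_buckets)]

# SELECT ... FROM x WHERE colA = 123123
# -> scan a single bucket file instead of all num_buckets files
target = bucket_files[bucket_for(123123, num_buckets)]
print(target)
```

Without pruning, an equality predicate on the bucketing column still scans every bucket file; with it, the scan shrinks by roughly a factor of num_buckets.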



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8289) Exclude temp tables in compactor threads

2015-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707043#comment-14707043
 ] 

Jakob Stengård commented on HIVE-8289:
--

Please vote for the issue.

> Exclude temp tables in compactor threads
> 
>
> Key: HIVE-8289
> URL: https://issues.apache.org/jira/browse/HIVE-8289
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Priority: Critical
>
> Currently, the compactor thread tries to compact temp tables.
> This throws errors like this one:
> {noformat}
> 2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator 
> (Initiator.java:run(111)) - Caught exception while trying to determine if we 
> should compact testsimon.values__tmp__table__11.  Marking clean to avoid 
> repeated failures, java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)
> 2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler 
> (CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction 
> record
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8289) Exclude temp tables in compactor threads

2015-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707049#comment-14707049
 ] 

Jakob Stengård commented on HIVE-8289:
--

Please vote for the issue.

> Exclude temp tables in compactor threads
> 
>
> Key: HIVE-8289
> URL: https://issues.apache.org/jira/browse/HIVE-8289
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Priority: Critical
>
> Currently, compactor thread try to compact temp table.
> This throws errors like this one :
> {noformat}
> 2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator 
> (Initiator.java:run(111)) - Caught exception while trying to determine if we 
> should compact testsimon.values__tmp__table__11.  Marking clean to avoid 
> repeated failures, java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)
> 2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler 
> (CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction 
> record
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8289) Exclude temp tables in compactor threads

2015-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707050#comment-14707050
 ] 

Jakob Stengård commented on HIVE-8289:
--

Please vote for the issue.

> Exclude temp tables in compactor threads
> 
>
> Key: HIVE-8289
> URL: https://issues.apache.org/jira/browse/HIVE-8289
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Priority: Critical
>
> Currently, compactor thread try to compact temp table.
> This throws errors like this one :
> {noformat}
> 2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator 
> (Initiator.java:run(111)) - Caught exception while trying to determine if we 
> should compact testsimon.values__tmp__table__11.  Marking clean to avoid 
> repeated failures, java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)
> 2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler 
> (CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction 
> record
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8289) Exclude temp tables in compactor threads

2015-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707048#comment-14707048
 ] 

Jakob Stengård commented on HIVE-8289:
--

Please vote for the issue.

> Exclude temp tables in compactor threads
> 
>
> Key: HIVE-8289
> URL: https://issues.apache.org/jira/browse/HIVE-8289
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Priority: Critical
>
> Currently, compactor thread try to compact temp table.
> This throws errors like this one :
> {noformat}
> 2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator 
> (Initiator.java:run(111)) - Caught exception while trying to determine if we 
> should compact testsimon.values__tmp__table__11.  Marking clean to avoid 
> repeated failures, java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)
> 2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler 
> (CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction 
> record
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8289) Exclude temp tables in compactor threads

2015-08-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707047#comment-14707047
 ] 

Jakob Stengård commented on HIVE-8289:
--

Please vote for the issue.

> Exclude temp tables in compactor threads
> 
>
> Key: HIVE-8289
> URL: https://issues.apache.org/jira/browse/HIVE-8289
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Priority: Critical
>
> Currently, compactor thread try to compact temp table.
> This throws errors like this one :
> {noformat}
> 2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator 
> (Initiator.java:run(111)) - Caught exception while trying to determine if we 
> should compact testsimon.values__tmp__table__11.  Marking clean to avoid 
> repeated failures, java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)
> 2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler 
> (CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction 
> record
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist

2015-08-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707051#comment-14707051
 ] 

Alan Gates commented on HIVE-8583:
--

+1, again.

> HIVE-8341 Cleanup & Test for hive.script.operator.env.blacklist
> ---
>
> Key: HIVE-8583
> URL: https://issues.apache.org/jira/browse/HIVE-8583
> Project: Hive
>  Issue Type: Improvement
>Reporter: Lars Francke
>Assignee: Lars Francke
>Priority: Minor
> Attachments: HIVE-8583.1.patch, HIVE-8583.2.patch, HIVE-8583.3.patch, 
> HIVE-8583.4.patch, HIVE-8583.5.patch
>
>
> [~alangates] added the following in HIVE-8341:
> {code}
> String bl = 
> hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
> if (bl != null && bl.length() > 0) {
>   String[] bls = bl.split(",");
>   for (String b : bls) {
> b.replaceAll(".", "_");
> blackListedConfEntries.add(b);
>   }
> }
> {code}
> The {{replaceAll}} call is confusing as its result is not used at all.
> This patch contains the following:
> * Adds reading of default value for HIVESCRIPT_ENV_BLACKLIST
> * Lets blackListed take a Configuration job as parameter which allowed me to 
> add a test for this
> * Tabs to Spaces conversion
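The confusion called out above stems from `String` immutability: `replaceAll` returns a new string, and the original code discards it. A minimal sketch of the presumably intended behavior (names here are illustrative, not the actual Hive patch) captures the return value, and uses literal `replace` since `.` is a regex metacharacter for `replaceAll`:

```java
import java.util.HashSet;
import java.util.Set;

public class BlacklistDemo {
    // Sketch: String.replaceAll returns a new string (Strings are immutable),
    // so the result must be captured. replace('.', '_') substitutes literally;
    // replaceAll(".", "_") would treat "." as a regex matching any character.
    static Set<String> parseBlacklist(String bl) {
        Set<String> entries = new HashSet<>();
        if (bl != null && bl.length() > 0) {
            for (String b : bl.split(",")) {
                entries.add(b.replace('.', '_')); // capture the returned value
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(parseBlacklist("hive.script.operator.env.blacklist"));
    }
}
```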



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11607) Export tables broken for data > 32 MB

2015-08-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11607:

Fix Version/s: 2.0.0

> Export tables broken for data > 32 MB
> -
>
> Key: HIVE-11607
> URL: https://issues.apache.org/jira/browse/HIVE-11607
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.0.0
>
> Attachments: HIVE-11607.2.patch, HIVE-11607.3.patch, HIVE-11607.patch
>
>
> Broken for both hadoop-1 as well as hadoop-2 line



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8007) Clean up Thrift definitions

2015-08-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707095#comment-14707095
 ] 

Alan Gates commented on HIVE-8007:
--

bq. I can't get Thrift 0.9 to build on my Mac so I'm using Hive QA to verify 
that everything still works.
Did you regenerate the Java etc. files with Thrift?  If not, the tests 
aren't testing your changes.  The standard build doesn't rebuild the generated 
files.  We actually check those in, since Thrift is such a pain to use.

> Clean up Thrift definitions
> ---
>
> Key: HIVE-8007
> URL: https://issues.apache.org/jira/browse/HIVE-8007
> Project: Hive
>  Issue Type: Improvement
>Reporter: Lars Francke
>Assignee: Lars Francke
>Priority: Minor
> Attachments: HIVE-8007.1.patch, HIVE-8007.2.patch, HIVE-8007.3.patch
>
>
> This patch changes the following:
> * Currently the Thrift file uses {{//}} to denote comments. Thrift 
> understands the {{/** ... */}} syntax and converts that into documentation in 
> the generated code. This patch changes the syntax accordingly
> * Change tabs to spaces
> * Consistent indentation
> * Minor whitespace and/or formatting issues
> There should be no changes to functionality at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11611) A bad performance regression issue with Parquet happens if Hive does not select any columns

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707143#comment-14707143
 ] 

Hive QA commented on HIVE-11611:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751658/HIVE-11611.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9371 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5034/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5034/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5034/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751658 - PreCommit-HIVE-TRUNK-Build

> A bad performance regression issue with Parquet happens if Hive does not 
> select any columns
> ---
>
> Key: HIVE-11611
> URL: https://issues.apache.org/jira/browse/HIVE-11611
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Sergio Peña
>Assignee: Ferdinand Xu
> Attachments: HIVE-11611.patch
>
>
> A possible performance issue may happen with the below code when using a 
> query like this {{SELECT count(1) FROM parquetTable}}.
> {code}
> if (!ColumnProjectionUtils.isReadAllColumns(configuration) && 
> !indexColumnsWanted.isEmpty()) {
> MessageType requestedSchemaByUser =
> getSchemaByIndex(tableSchema, columnNamesList, 
> indexColumnsWanted);
> return new ReadContext(requestedSchemaByUser, contextMetadata);
> } else {
>   return new ReadContext(tableSchema, contextMetadata);
> }
> {code}
> If no columns or indexes are selected, the above code will read the full 
> schema from Parquet even though Hive does nothing with those values.
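The decision logic described above, together with the fallback suggested in the patch discussion (request at least one column so a `count(1)` query can still see row counts), can be sketched as follows. This is a simplified, hypothetical model using column names as strings, not the actual Hive/Parquet `ReadContext` API:

```java
import java.util.Arrays;
import java.util.List;

public class ProjectionDemo {
    // Hypothetical sketch (not the real Hive/Parquet API): when no columns are
    // requested and we are not in the read-all-columns path, fall back to a
    // single column rather than the full table schema, so row counting does
    // not force every column to be materialized.
    static List<String> effectiveProjection(List<String> tableSchema,
                                            List<String> requested,
                                            boolean readAllColumns) {
        if (readAllColumns) {
            return tableSchema;               // SELECT * path: full schema
        }
        if (requested.isEmpty()) {
            return tableSchema.subList(0, 1); // count(1)-style query: one column
        }
        return requested;                     // normal pruned projection
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "payload", "blob");
        System.out.println(effectiveProjection(schema, Arrays.asList(), false));
    }
}
```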



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11586) ObjectInspectorFactory.getReflectionObjectInspector is not thread-safe

2015-08-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707147#comment-14707147
 ] 

Szehon Ho commented on HIVE-11586:
--

The concerns are addressed, +1

> ObjectInspectorFactory.getReflectionObjectInspector is not thread-safe
> --
>
> Key: HIVE-11586
> URL: https://issues.apache.org/jira/browse/HIVE-11586
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11586.1.patch, HIVE-11586.2.patch
>
>
> ObjectInspectorFactory#getReflectionObjectInspectorNoCache adds the newly 
> created object inspector to the cache before calling its init() method, to 
> allow reusing the cache when dealing with recursive types. A second thread 
> can then call getReflectionObjectInspector and fetch an uninitialized 
> instance of ReflectionStructObjectInspector.
> Another issue arises if two threads call 
> ObjectInspectorFactory.getReflectionObjectInspector at the same time: one 
> thread could get an object inspector that is not in the cache, i.e. both 
> could call getReflectionObjectInspectorNoCache() but only one will put its 
> new object inspector into the cache successfully.
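The publish-before-init race described above is the classic unsafe-publication pattern. A minimal sketch of one standard remedy (illustrative only; Hive's actual fix must still handle recursive types, which this pattern does not address by itself) uses `ConcurrentHashMap.computeIfAbsent`, which guarantees a single construction per key and never exposes a value before the mapping function returns:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InspectorCacheDemo {
    // Illustrative only: computeIfAbsent ensures (a) at most one thread
    // constructs the value for a given key and (b) no thread observes the
    // value before the mapping function has returned, so a half-initialized
    // object can never be published through the cache. Note: if construction
    // recursed back into the same key (as recursive types would), this would
    // throw/deadlock, which is why Hive pre-inserts before init() instead.
    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    static String getInspector(String typeName) {
        return CACHE.computeIfAbsent(typeName, t -> "inspector:" + t);
    }
}
```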



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11544) LazyInteger should avoid throwing NumberFormatException

2015-08-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707151#comment-14707151
 ] 

Gopal V commented on HIVE-11544:


[~bills]: Would be nice - once you have a preliminary benchmark, I can write a 
more rigorous test-case (i.e., test every 0-, 1-, and 2-byte pattern against 
the new impl vs. the old impl).

> LazyInteger should avoid throwing NumberFormatException
> ---
>
> Key: HIVE-11544
> URL: https://issues.apache.org/jira/browse/HIVE-11544
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.14.0, 1.2.0, 1.3.0, 2.0.0
>Reporter: William Slacum
>Assignee: Gopal V
>Priority: Minor
>  Labels: Performance
> Attachments: HIVE-11544.1.patch
>
>
> {{LazyInteger#parseInt}} will throw a {{NumberFormatException}} under these 
> conditions:
> # bytes are null
> # radix is invalid
> # length is 0
> # the string is '+' or '-'
> # {{LazyInteger#parse}} throws a {{NumberFormatException}}
> Most of the time, such as in {{LazyInteger#init}} and {{LazyByte#init}}, the 
> exception is caught, swallowed, and {{isNull}} is set to {{true}}.
> This is generally a bad workflow: exception creation is a performance 
> bottleneck, and repeating it for many rows in a query can have drastic 
> performance consequences.
> It would be better if this method returned an {{Optional}}, which 
> would provide similar functionality with a higher throughput rate.
> I've tested against 0.14.0, and saw that the logic is unchanged in 1.2.0, so 
> I've marked those as affected. Any version in between would also suffer from 
> this.
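An exception-free parse along the lines suggested above might look like the following. This is a sketch, not the actual patch: it handles base-10 only for brevity (the real method also takes a radix) and returns `OptionalInt.empty()` for every malformed case the description enumerates, so callers can set their null flag without paying exception-construction cost:

```java
import java.util.OptionalInt;

public class SafeParseDemo {
    // Sketch of an exception-free LazyInteger-style parse: returns empty()
    // for null input, zero length, a bare '+'/'-', non-digit characters,
    // or int overflow, instead of throwing NumberFormatException.
    static OptionalInt parseInt(byte[] bytes, int start, int length) {
        if (bytes == null || length == 0) return OptionalInt.empty();
        int i = start, end = start + length, sign = 1;
        if (bytes[i] == '+' || bytes[i] == '-') {
            if (length == 1) return OptionalInt.empty();   // bare '+' or '-'
            sign = (bytes[i] == '-') ? -1 : 1;
            i++;
        }
        long value = 0;                                    // long simplifies overflow checks
        for (; i < end; i++) {
            int d = bytes[i] - '0';
            if (d < 0 || d > 9) return OptionalInt.empty(); // non-digit byte
            value = value * 10 + d;
            if (sign * value < Integer.MIN_VALUE || sign * value > Integer.MAX_VALUE)
                return OptionalInt.empty();                 // out of int range
        }
        return OptionalInt.of((int) (sign * value));
    }
}
```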



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707170#comment-14707170
 ] 

Ashutosh Chauhan commented on HIVE-11375:
-

I think it will be good to push it to branch-1 as well, since this has the 
potential to generate incorrect results without any warning. [~aihuaxu] If you 
can generate a branch-1 patch, that would be great.

> Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
> --
>
> Key: HIVE-11375
> URL: https://issues.apache.org/jira/browse/HIVE-11375
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mariusz Sakowski
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11375.2.patch, HIVE-11375.3.patch, 
> HIVE-11375.4.patch, HIVE-11375.patch
>
>
> When running a query like this:
> {code}explain select * from test where (val is not null and val <> 0);{code}
> Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
> {code}
>   Filter Operator
> predicate: (val <> 0) (type: boolean)
> {code}
> which is fine.
> But if we negate the condition using the NOT operator:
> {code}explain select * from test where not (val is not null and val <> 
> 0);{code}
> Hive will also simplify it, but this time the rewrite is wrong:
> {code}
>   Filter Operator
> predicate: (not (val <> 0)) (type: boolean)
> {code}
> because the valid predicate should be *val == 0 or val is null*, while the 
> above is equivalent to *val == 0* only, filtering away rows where val is null.
> A simple example:
> {code}
> CREATE TABLE example (
> val bigint
> );
> INSERT INTO example VALUES (1), (NULL), (0);
> -- returns 2 rows - NULL and 0
> select * from example where (val is null or val == 0);
> -- returns 1 row - 0
> select * from example where not (val is not null and val <> 0);
> {code}
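The bug above is a three-valued-logic mistake: `NOT (val IS NOT NULL AND val <> 0)` is TRUE for NULL rows (FALSE AND anything is FALSE, and NOT FALSE is TRUE), while the rewritten `NOT (val <> 0)` is UNKNOWN for NULL rows, so the filter drops them. A small model of SQL's three-valued logic (using `Boolean` with `null` standing for UNKNOWN; purely illustrative, not Hive code) makes the divergence concrete:

```java
public class ThreeValuedLogicDemo {
    // Boolean null models SQL UNKNOWN.
    static Boolean notEqualsZero(Long val) {
        return val == null ? null : (val != 0);   // val <> 0 is UNKNOWN for NULL
    }

    static Boolean not(Boolean b) {
        return b == null ? null : !b;             // NOT UNKNOWN = UNKNOWN
    }

    // NOT (val IS NOT NULL AND val <> 0) -- the original predicate
    static boolean originalPredicate(Long val) {
        Boolean and = (val == null) ? Boolean.FALSE   // FALSE AND x = FALSE
                                    : notEqualsZero(val);
        return Boolean.TRUE.equals(not(and));     // row kept only if TRUE
    }

    // NOT (val <> 0) -- the broken rewrite
    static boolean simplifiedPredicate(Long val) {
        return Boolean.TRUE.equals(not(notEqualsZero(val)));
    }
}
```

For `val = NULL` the original predicate keeps the row but the simplified one does not, matching the 2-row vs. 1-row results in the example.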



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707326#comment-14707326
 ] 

Hive QA commented on HIVE-11614:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751666/HIVE-11614.01.patch

{color:red}ERROR:{color} -1 due to 121 failed/errored test(s), 9373 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fouter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_gby_star
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_innerjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_alt_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_merging
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mergejoins
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optional_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_print_header
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_query_properties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_regex_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_router_join_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unionDistinct_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
or

[jira] [Updated] (HIVE-11617) Explain plan for multiple lateral views is very slow

2015-08-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11617:

Attachment: HIVE-11617.patch

> Explain plan for multiple lateral views is very slow
> 
>
> Key: HIVE-11617
> URL: https://issues.apache.org/jira/browse/HIVE-11617
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11617.patch
>
>
> The following explain job will be very slow or never finish if there are many 
> lateral views involved. High CPU usage is also noticed.
> {noformat}
> EXPLAIN
> SELECT
> *
> from
> (
> SELECT * FROM table1 
> ) x
> LATERAL VIEW json_tuple(...) x1 
> LATERAL VIEW json_tuple(...) x2 
> ...
> {noformat}
> From jstack, the job is busy with preorder tree traverse. 
> {noformat}
> at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
> at java.util.regex.Matcher.reset(Matcher.java:308)
> at java.util.regex.Matcher.<init>(Matcher.java:228)
> at java.util.regex.Pattern.matcher(Pattern.java:1088)
> at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> ... (the PreOrderWalker.walk frame at PreOrderWalker.java:61 repeats dozens 
> of times; trace truncated)
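The hot path in the trace is `RuleRegExp.cost`, which builds a fresh `Matcher` for every node visited during the pre-order walk. One general mitigation for regex cost in hot traversal loops (a standard technique, not necessarily the fix adopted for this issue) is to compile the pattern once and memoize match results per node-name string:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class RuleCostCacheDemo {
    // Illustrative sketch: the same rule pattern is evaluated against the same
    // node-name strings over and over while walking the operator tree, so
    // memoizing results avoids re-creating Matcher objects per visited node.
    private final Pattern pattern;
    private final Map<String, Boolean> matchCache = new ConcurrentHashMap<>();

    RuleCostCacheDemo(String regex) {
        this.pattern = Pattern.compile(regex);  // compile once, not per node
    }

    boolean matches(String nodeName) {
        return matchCache.computeIfAbsent(nodeName,
                n -> pattern.matcher(n).matches());
    }
}
```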

[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707405#comment-14707405
 ] 

Aihua Xu commented on HIVE-11375:
-

Seems we need to backport HIVE-11398 to branch-1 first before checking in this 
patch. Can you guys backport that first?

> Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
> --
>
> Key: HIVE-11375
> URL: https://issues.apache.org/jira/browse/HIVE-11375
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mariusz Sakowski
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11375.2.patch, HIVE-11375.3.patch, 
> HIVE-11375.4.patch, HIVE-11375.patch
>
>
> When running a query like this:
> {code}explain select * from test where (val is not null and val <> 0);{code}
> Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
> {code}
>   Filter Operator
> predicate: (val <> 0) (type: boolean)
> {code}
> which is fine.
> But if we negate the condition using the NOT operator:
> {code}explain select * from test where not (val is not null and val <> 
> 0);{code}
> Hive will also simplify it, but this time the rewrite is wrong:
> {code}
>   Filter Operator
> predicate: (not (val <> 0)) (type: boolean)
> {code}
> because the valid predicate should be *val == 0 or val is null*, while the 
> above is equivalent to *val == 0* only, filtering away rows where val is null.
> A simple example:
> {code}
> CREATE TABLE example (
> val bigint
> );
> INSERT INTO example VALUES (1), (NULL), (0);
> -- returns 2 rows - NULL and 0
> select * from example where (val is null or val == 0);
> -- returns 1 row - 0
> select * from example where not (val is not null and val <> 0);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11620) Fix several qtest output order

2015-08-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-11620:
---
Attachment: HIVE-11620.1.patch

> Fix several qtest output order
> --
>
> Key: HIVE-11620
> URL: https://issues.apache.org/jira/browse/HIVE-11620
> Project: Hive
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11620.1.patch
>
>
> selectDistinctStar.q
> unionall_unbalancedppd.q
> vector_cast_constant.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-08-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707416#comment-14707416
 ] 

Pengcheng Xiong commented on HIVE-11614:


Seems like most of the test failures are due to differing golden files. 
[~jpullokkaran], could you please take a look and give some suggestions? Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11445) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby distinct does not work

2015-08-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707417#comment-14707417
 ] 

Pengcheng Xiong commented on HIVE-11445:


[~jpullokkaran], here is one more issue that needs your input. If you take a 
look at my temporary patch, you will see where the problem is. My concern is 
that, since there are no comments around the code (which you wrote), I am not 
sure my fix is good enough... Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : groupby 
> distinct does not work
> -
>
> Key: HIVE-11445
> URL: https://issues.apache.org/jira/browse/HIVE-11445
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11445.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707419#comment-14707419
 ] 

Ashutosh Chauhan commented on HIVE-11375:
-

HIVE-11398 is a new performance enhancement; I am not sure whether it's a good 
idea to backport such changes.

> Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
> --
>
> Key: HIVE-11375
> URL: https://issues.apache.org/jira/browse/HIVE-11375
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mariusz Sakowski
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11375.2.patch, HIVE-11375.3.patch, 
> HIVE-11375.4.patch, HIVE-11375.patch
>
>
> When running a query like this:
> {code}explain select * from test where (val is not null and val <> 0);{code}
> Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
> {code}
>   Filter Operator
> predicate: (val <> 0) (type: boolean)
> {code}
> which is fine.
> But if we negate the condition using the NOT operator:
> {code}explain select * from test where not (val is not null and val <> 
> 0);{code}
> Hive will also simplify it, but this time the rewrite is wrong:
> {code}
>   Filter Operator
> predicate: (not (val <> 0)) (type: boolean)
> {code}
> because the valid predicate should be *val == 0 or val is null*, while the 
> above is equivalent to *val == 0* only, filtering away rows where val is null.
> A simple example:
> {code}
> CREATE TABLE example (
> val bigint
> );
> INSERT INTO example VALUES (1), (NULL), (0);
> -- returns 2 rows - NULL and 0
> select * from example where (val is null or val == 0);
> -- returns 1 row - 0
> select * from example where not (val is not null and val <> 0);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11355) Hive on tez: memory manager for sort buffers (input/output) and operators

2015-08-21 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11355:
--
Attachment: HIVE-11355.4.patch

Rebased.

> Hive on tez: memory manager for sort buffers (input/output) and operators
> -
>
> Key: HIVE-11355
> URL: https://issues.apache.org/jira/browse/HIVE-11355
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11355.1.patch, HIVE-11355.2.patch, 
> HIVE-11355.3.patch, HIVE-11355.4.patch
>
>
> We need to better manage the sort buffer allocations to ensure better 
> performance. Also, we need to provide configurations to certain operators to 
> stay within memory limits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11611) A bad performance regression issue with Parquet happens if Hive does not select any columns

2015-08-21 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707433#comment-14707433
 ] 

Sergio Peña commented on HIVE-11611:


[~Ferd]. Thanks for working on this.

[~rdblue] I see you fixed the validation in PARQUET-363. If we now pass an 
empty schema to Parquet, how is 'select count(1)' going to count the # of 
records if Parquet does not return any data? I am not sure how Parquet handles 
an empty schema; could you help me understand this? Do you think that 
requesting at least 1 column from Parquet will allow Hive to count the records 
(see [~Ferd]'s patch)?

> A bad performance regression issue with Parquet happens if Hive does not 
> select any columns
> ---
>
> Key: HIVE-11611
> URL: https://issues.apache.org/jira/browse/HIVE-11611
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Sergio Peña
>Assignee: Ferdinand Xu
> Attachments: HIVE-11611.patch
>
>
> A possible performance issue may occur with the code below when running a 
> query like {{SELECT count(1) FROM parquetTable}}.
> {code}
> if (!ColumnProjectionUtils.isReadAllColumns(configuration) && 
> !indexColumnsWanted.isEmpty()) {
> MessageType requestedSchemaByUser =
> getSchemaByIndex(tableSchema, columnNamesList, 
> indexColumnsWanted);
> return new ReadContext(requestedSchemaByUser, contextMetadata);
> } else {
>   return new ReadContext(tableSchema, contextMetadata);
> }
> {code}
> If no columns or indexes are selected, then the above code will read the 
> full schema from Parquet even though Hive does not do anything with those 
> values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-21 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707434#comment-14707434
 ] 

Aihua Xu commented on HIVE-11375:
-

If we can't backport that, then I need to regenerate the baselines for some 
tests. I will do that later.

> Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
> --
>
> Key: HIVE-11375
> URL: https://issues.apache.org/jira/browse/HIVE-11375
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mariusz Sakowski
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11375.2.patch, HIVE-11375.3.patch, 
> HIVE-11375.4.patch, HIVE-11375.patch
>
>
> When running a query like this:
> {code}explain select * from test where (val is not null and val <> 0);{code}
> Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
> {code}
>   Filter Operator
> predicate: (val <> 0) (type: boolean)
> {code}
> which is fine.
> But if we negate the condition using the NOT operator:
> {code}explain select * from test where not (val is not null and val <> 
> 0);{code}
> Hive will also simplify, but now the simplified predicate is wrong:
> {code}
>   Filter Operator
> predicate: (not (val <> 0)) (type: boolean)
> {code}
> The valid predicate should be *val == 0 or val is null*, while the row above 
> is equivalent to *val == 0* only, filtering away rows where val is null.
> A simple example:
> {code}
> CREATE TABLE example (
> val bigint
> );
> INSERT INTO example VALUES (1), (NULL), (0);
> -- returns 2 rows - NULL and 0
> select * from example where (val is null or val == 0);
> -- returns 1 row - 0
> select * from example where not (val is not null and val <> 0);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11611) A bad performance regression issue with Parquet happens if Hive does not select any columns

2015-08-21 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707476#comment-14707476
 ] 

Ryan Blue commented on HIVE-11611:
--

[~spena], Lian's analysis on PARQUET-363 shows that Parquet does actually 
support this case and has a code path that calls the record materializer the 
right number of times based on the row group metadata. No need to keep at least 
one column.

> A bad performance regression issue with Parquet happens if Hive does not 
> select any columns
> ---
>
> Key: HIVE-11611
> URL: https://issues.apache.org/jira/browse/HIVE-11611
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Sergio Peña
>Assignee: Ferdinand Xu
> Attachments: HIVE-11611.patch
>
>
> A possible performance issue may occur with the code below when running a 
> query like {{SELECT count(1) FROM parquetTable}}.
> {code}
> if (!ColumnProjectionUtils.isReadAllColumns(configuration) && 
> !indexColumnsWanted.isEmpty()) {
> MessageType requestedSchemaByUser =
> getSchemaByIndex(tableSchema, columnNamesList, 
> indexColumnsWanted);
> return new ReadContext(requestedSchemaByUser, contextMetadata);
> } else {
>   return new ReadContext(tableSchema, contextMetadata);
> }
> {code}
> If no columns or indexes are selected, then the above code will read the 
> full schema from Parquet even though Hive does not do anything with those 
> values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707503#comment-14707503
 ] 

Hive QA commented on HIVE-11604:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751736/HIVE-11604.2.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9376 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_avro_compression_enabled_native
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_avro_joins
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_clusterby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby5_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_list_bucket_dml_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_louter_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_reduce_deduplicate_exclude_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5036/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5036/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5036/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751736 - PreCommit-HIVE-TRUNK-Build

> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch, HIVE-11604.2.patch
>
>
> The following query returns an empty result, which is not correct:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the correct result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> No rows selected (14.248 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-11600) Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

2015-08-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11600:
---
Attachment: HIVE-11600.04.patch

> Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())
> 
>
> Key: HIVE-11600
> URL: https://issues.apache.org/jira/browse/HIVE-11600
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11600.01.patch, HIVE-11600.02.patch, 
> HIVE-11600.03.patch, HIVE-11600.04.patch
>
>
> Currently Hive only supports a single column in the IN clause, e.g., 
> {code}select * from src where  col0 in (v1,v2,v3);{code}
> We want it to support 
> {code}select * from src where (col0,col1+3) in 
> ((col0+v1,v2),(v3,v4-col1));{code}
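Semantically, a multi-column IN is an OR over element-wise equalities: `(col0, col1+3) in ((a,b),(c,d))` holds when `(col0 = a AND col1+3 = b) OR (col0 = c AND col1+3 = d)`. A standalone sketch of that semantics (illustrative only, not the parser change itself; NULL handling is omitted):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of multi-column IN semantics: the left-hand tuple
// matches if it is element-wise equal to any right-hand tuple.
public class TupleInDemo {
    static boolean tupleIn(List<Integer> left, List<List<Integer>> right) {
        for (List<Integer> candidate : right) {
            if (candidate.equals(left)) return true;  // element-wise equality
        }
        return false;
    }
    public static void main(String[] args) {
        int col0 = 5, col1 = 2;
        // WHERE (col0, col1+3) IN ((5, 5), (7, 1))
        boolean matches = tupleIn(
            Arrays.asList(col0, col1 + 3),
            Arrays.asList(Arrays.asList(5, 5), Arrays.asList(7, 1)));
        System.out.println(matches);  // true: (5, 5) matches the first tuple
    }
}
```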



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11606) Bucket map joins fail at hash table construction time

2015-08-21 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707581#comment-14707581
 ] 

Gunther Hagleitner commented on HIVE-11606:
---

[~vikram.dixit] can you add a test for this please? Otherwise it looks good.

> Bucket map joins fail at hash table construction time
> -
>
> Key: HIVE-11606
> URL: https://issues.apache.org/jira/browse/HIVE-11606
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11606.1.patch
>
>
> {code}
> info=[Error: Failure while running task:java.lang.RuntimeException: 
> java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a 
> power of two
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity 
> must be a power of two
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>  
> {code}
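The assertion suggests the hash table indexes buckets with a bit mask, which only works when the capacity is a power of two. A standalone sketch of rounding a requested capacity up before allocation (illustrative only, not the actual HIVE-11606 fix):

```java
// Illustrative sketch: hash tables that index buckets with a bit mask
// require a power-of-two capacity, so a requested size is rounded up.
public class CapacityDemo {
    static int nextPowerOfTwo(int requested) {
        if (requested <= 1) return 1;
        int highest = Integer.highestOneBit(requested);
        return highest == requested ? requested : highest << 1;
    }
    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(1000));  // 1024
        System.out.println(nextPowerOfTwo(1024));  // 1024
        // A power-of-two capacity lets (hash & (capacity - 1)) replace modulo:
        int capacity = nextPowerOfTwo(1000);
        int bucket = 123456789 & (capacity - 1);
        System.out.println(bucket >= 0 && bucket < capacity);  // true
    }
}
```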



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11604) HIVE return wrong results in some queries with PTF function

2015-08-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707580#comment-14707580
 ] 

Yongzhi Chen commented on HIVE-11604:
-

All 15 Spark failures are unrelated; they all fail with the same error: 
"Timed out waiting for Spark cluster to init".
Patch 2's source code is the same as patch 1's, and there was no Spark failure 
for the first patch.

> HIVE return wrong results in some queries with PTF function
> ---
>
> Key: HIVE-11604
> URL: https://issues.apache.org/jira/browse/HIVE-11604
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11604.1.patch, HIVE-11604.2.patch
>
>
> The following query returns an empty result, which is not correct:
> {noformat}
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> {noformat}
> After removing row_number() over (partition by id, fkey) as rnum from the 
> query, the correct result is returned.
> Reproduce:
> {noformat}
> create table tlb1 (id int, fkey int, val string);
> create table tlb2 (fid int, name string);
> insert into table tlb1 values(100,1,'abc');
> insert into table tlb1 values(200,1,'efg');
> insert into table tlb2 values(1, 'key1');
> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey, 
> row_number() over (partition by id, fkey) as rnum
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> 
> INFO  : Ended Job = job_local1070163923_0017
> +-+---+---+--+
> No rows selected (14.248 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> +-+---+---+--+
> 0: jdbc:hive2://localhost:1> select ddd.id, ddd.fkey, aaa.name
> from (
> select id, fkey 
> from tlb1 group by id, fkey
>  ) ddd 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;select ddd.id, ddd.fkey, aaa.name
> 0: jdbc:hive2://localhost:1> from (
> 0: jdbc:hive2://localhost:1> select id, fkey 
> 0: jdbc:hive2://localhost:1> from tlb1 group by id, fkey
> 0: jdbc:hive2://localhost:1>  ) ddd 
> 0: jdbc:hive2://localhost:1> 
> inner join tlb2 aaa on aaa.fid = ddd.fkey;
> INFO  : Number of reduce tasks not specified. Estimated from input data size: 
> 1
> ...
> INFO  : Ended Job = job_local672340505_0019
> +-+---+---+--+
> 2 rows selected (14.383 seconds)
> | ddd.id  | ddd.fkey  | aaa.name  |
> +-+---+---+--+
> | 100 | 1 | key1  |
> | 200 | 1 | key1  |
> +-+---+---+--+
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11618) Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG

2015-08-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-11618:


Assignee: Owen O'Malley

> Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG
> ---
>
> Key: HIVE-11618
> URL: https://issues.apache.org/jira/browse/HIVE-11618
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> The Parquet binding leaked implementation details into the generic SARG api. 
> Rather than make all users of the SARG api deal with each of the specific 
> types, reunify the INTEGER and LONG types. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11618) Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG

2015-08-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11618:
-
Attachment: HIVE-11618.patch

OK, here is a patch that fixes the problem. The major change was in the Parquet 
unit tests, which didn't pass in a schema for the file.

> Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG
> ---
>
> Key: HIVE-11618
> URL: https://issues.apache.org/jira/browse/HIVE-11618
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11618.patch
>
>
> The Parquet binding leaked implementation details into the generic SARG api. 
> Rather than make all users of the SARG api deal with each of the specific 
> types, reunify the INTEGER and LONG types. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11592) ORC metadata section can sometimes exceed protobuf message size limit

2015-08-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707674#comment-14707674
 ] 

Owen O'Malley commented on HIVE-11592:
--

Does this patch detect the case where a field ends at the buffer boundary? It 
seems like that would be undetected and thus not expand the range.

> ORC metadata section can sometimes exceed protobuf message size limit
> -
>
> Key: HIVE-11592
> URL: https://issues.apache.org/jira/browse/HIVE-11592
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11592.1.patch, HIVE-11592.2.patch, 
> HIVE-11592.3.patch
>
>
> If there are too many small stripes with many columns, the overhead of 
> storing metadata (column stats) can exceed the default protobuf message size 
> of 64MB. Reading such files will throw the following exception:
> {code}
> Exception in thread "main" 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
> at 
> com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> at 
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
> at 
> com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
> at 
> com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.(OrcProto.java:1331)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.(OrcProto.java:1281)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
> at 
> com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.(OrcProto.java:4887)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.(OrcProto.java:4803)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
> at 
> com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.(OrcProto.java:12925)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.(OrcProto.java:12872)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
> at 
> com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.(OrcProto.java:13599)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.(OrcProto.java:13546)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
> at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
> at 
> com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
> at 
> com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
> at 
> com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.(ReaderImpl.java:468)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.(ReaderImpl.java:314)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
> at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}

[jira] [Commented] (HIVE-11518) Provide interface to adjust required resource for tez tasks

2015-08-21 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707683#comment-14707683
 ] 

Gunther Hagleitner commented on HIVE-11518:
---

[~navis] very cool. Let me take a closer look. [~vikram.dixit]/[~t3rmin4t0r] 
you might be interested too.

> Provide interface to adjust required resource for tez tasks
> ---
>
> Key: HIVE-11518
> URL: https://issues.apache.org/jira/browse/HIVE-11518
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-11518.1.patch.txt
>
>
> Resource requirements vary across tasks, but currently they are fixed to one 
> value (via hive.tez.container.size). It would be good to customize resource 
> requirements appropriately for the expected work.
> The suggested interface is quite simple:
> {code}
> public interface ResourceCalculator {
>   Resource adjust(Resource resource, MapWork mapWork);
>   Resource adjust(Resource resource, ReduceWork reduceWork);
> }
> {code}
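One way the proposed interface could be used, sketched with stand-in stubs: `Resource`, `MapWork`, and `ReduceWork` below are hypothetical placeholders, not the actual Hive/Tez/YARN classes, and the scaling policy is invented for illustration.

```java
// Illustrative sketch only: the types below are stubs, not the real
// Hive/Tez classes; the policy shown is a hypothetical example.
public class ResourceCalculatorDemo {
    static class Resource {
        final int memoryMb;
        Resource(int memoryMb) { this.memoryMb = memoryMb; }
    }
    static class MapWork { long inputSizeBytes; }
    static class ReduceWork { int numReducers; }

    interface ResourceCalculator {
        Resource adjust(Resource resource, MapWork mapWork);
        Resource adjust(Resource resource, ReduceWork reduceWork);
    }

    // Example policy: double memory for map inputs larger than 1 GB.
    static class SizeBasedCalculator implements ResourceCalculator {
        public Resource adjust(Resource r, MapWork w) {
            double factor = w.inputSizeBytes > (1L << 30) ? 2.0 : 1.0;
            return new Resource((int) (r.memoryMb * factor));
        }
        public Resource adjust(Resource r, ReduceWork w) {
            return r;  // leave reducers at the configured size
        }
    }

    public static void main(String[] args) {
        MapWork big = new MapWork();
        big.inputSizeBytes = 4L << 30;  // 4 GB input
        Resource adjusted = new SizeBasedCalculator()
            .adjust(new Resource(1024), big);
        System.out.println(adjusted.memoryMb);  // 2048
    }
}
```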



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV

2015-08-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11573:
---
Attachment: HIVE-11573.4.patch

Fix the hashset traversal order and update qfiles.

> PointLookupOptimizer can be pessimistic at a low nDV
> 
>
> Key: HIVE-11573
> URL: https://issues.apache.org/jira/browse/HIVE-11573
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, 
> HIVE-11573.3.patch, HIVE-11573.4.patch
>
>
> The PointLookupOptimizer can turn off some of the optimizations due to its 
> use of tuple IN() clauses.
> Limit the application of the optimizer for very low nDV cases and extract the 
> sub-clause as a pre-condition during runtime, to trigger the simple column 
> predicate index lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10289) Support filter on non-first partition key and non-string partition key

2015-08-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-10289:
--
Attachment: HIVE-10289.3.patch

Addressing [~alangates]'s review comments. Also added some more unit tests and 
documentation.

> Support filter on non-first partition key and non-string partition key
> --
>
> Key: HIVE-10289
> URL: https://issues.apache.org/jira/browse/HIVE-10289
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore, Metastore
>Affects Versions: hbase-metastore-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-10289.1.patch, HIVE-10289.2.patch, 
> HIVE-10289.3.patch
>
>
> Currently, partition filtering only handles the first partition key, and the 
> type of this partition key must be string. In order to remove this 
> limitation, several improvements are required:
> 1. Change the serialization format for partition keys. Currently partition 
> keys are serialized into a delimited string, which sorts in string order 
> rather than according to the actual type of the partition key. We use 
> BinarySortableSerDe for this purpose.
> 2. For filter conditions not on the initial partition keys, push them into an 
> HBase RowFilter. The RowFilter will deserialize the partition key and 
> evaluate the filter condition.
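The string-order problem in point 1 can be seen with a tiny standalone example (illustrative only; the actual fix relies on BinarySortableSerDe's type-aware byte encoding, which this sketch does not implement):

```java
import java.util.Arrays;

// Illustrative sketch: delimited-string serialization of partition keys
// sorts lexicographically, which missorts numeric values.
public class SortOrderDemo {
    public static void main(String[] args) {
        // String order, as with delimited-string serialization:
        String[] asStrings = {"2", "10", "9"};
        Arrays.sort(asStrings);
        System.out.println(Arrays.toString(asStrings));  // [10, 2, 9]

        // Numeric order, which a type-aware encoding must preserve:
        Integer[] keys = {2, 10, 9};
        Arrays.sort(keys);
        System.out.println(Arrays.toString(keys));       // [2, 9, 10]
    }
}
```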



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-21 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707706#comment-14707706
 ] 

Aaron Tokhy commented on HIVE-10631:


Code review posted:

https://reviews.apache.org/r/37484/

> create_table_core method has invalid update for Fast Stats
> --
>
> Key: HIVE-10631
> URL: https://issues.apache.org/jira/browse/HIVE-10631
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Dongwook Kwon
>Assignee: Aaron Tokhy
>Priority: Minor
> Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch
>
>
> The HiveMetaStore.create_table_core method calls 
> MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
> is on; however, for a partitioned table, this call scans the warehouse dir 
> and doesn't seem to use the result.
> "Fast Stats" was implemented by HIVE-3959
> https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
> From create_table_core method
> {code}
> if (HiveConf.getBoolVar(hiveConf, 
> HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
> !MetaStoreUtils.isView(tbl)) {
>   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
> MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
> madeDir);
>   } else { // Partitioned table with no partitions.
> MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
> true);
>   }
> }
> {code}
> Particularly Line 1363: // Partitioned table with no partitions.
> {code}
> MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
> {code}
> This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
> doing nothing in the MetaStoreUtils.updateUnpartitionedTableStatsFast method 
> because the newDir flag is always true.
> The impact of this bug is minor with an HDFS warehouse location 
> (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse 
> location, especially for large existing partitions.
> The impact is also heightened with HIVE-6727 when the warehouse location is 
> S3: it could recursively scan the wrong S3 directory and do nothing with the 
> result. I will add more detail on these cases in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11621) Fix TestMiniTezCliDriver test failures when HBase Metastore is used

2015-08-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11621:
--
Attachment: HIVE-11621.1.patch

Note this patch needs to be applied on top of HIVE-10289.

> Fix TestMiniTezCliDriver test failures when HBase Metastore is used
> ---
>
> Key: HIVE-11621
> URL: https://issues.apache.org/jira/browse/HIVE-11621
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Affects Versions: hbase-metastore-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: hbase-metastore-branch
>
> Attachments: HIVE-11621.1.patch
>
>
> As a first step, fix the hbase-metastore unit tests with TestMiniTezCliDriver 
> so we can test LLAP and the hbase-metastore together.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11620) Fix several qtest output order

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707725#comment-14707725
 ] 

Hive QA commented on HIVE-11620:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751791/HIVE-11620.1.patch

{color:green}SUCCESS:{color} +1 9376 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5038/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5038/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5038/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751791 - PreCommit-HIVE-TRUNK-Build

> Fix several qtest output order
> --
>
> Key: HIVE-11620
> URL: https://issues.apache.org/jira/browse/HIVE-11620
> Project: Hive
>  Issue Type: Test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11620.1.patch
>
>
> selectDistinctStar.q
> unionall_unbalancedppd.q
> vector_cast_constant.q





[jira] [Updated] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.

2015-08-21 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11581:

Attachment: HIVE-11581.3.patch

> HiveServer2 should store connection params in ZK when using dynamic service 
> discovery for simpler client connection string.
> ---
>
> Key: HIVE-11581
> URL: https://issues.apache.org/jira/browse/HIVE-11581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11581.1.patch, HIVE-11581.2.patch, 
> HIVE-11581.3.patch
>
>
> Currently, the client needs to specify several parameters based on which an 
> appropriate connection is created with the server. In the case of dynamic service 
> discovery, when multiple HS2 instances are running, it is much more usable 
> for the server to add its config parameters to ZK, which the driver can use to 
> configure the connection, instead of the JDBC/ODBC user adding those to the 
> connection string.
> However, at a minimum, the client will need to specify the ZooKeeper ensemble and 
> that it wants the JDBC driver to use ZooKeeper:
> {noformat}
> beeline> !connect 
> jdbc:hive2://vgumashta.local:2181,vgumashta.local:2182,vgumashta.local:2183/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
>  vgumashta vgumashta org.apache.hive.jdbc.HiveDriver
> {noformat} 





[jira] [Commented] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.

2015-08-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707757#comment-14707757
 ] 

Thejas M Nair commented on HIVE-11581:
--

+1
Can you also make a minor change before committing: rename the pattern variable 
to something like kvPattern?
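For illustration, a key=value pattern of the kind suggested above might look like the following sketch; the payload format and all names here are assumptions for the example, not taken from the patch:

```python
import re

# Hypothetical kvPattern: matches key=value pairs in a semicolon-separated
# payload such as an HS2 config string published to a ZooKeeper znode.
kvPattern = re.compile(r"([^=;]+)=([^;]+)")

def parse_params(payload: str) -> dict:
    """Parse 'k1=v1;k2=v2;...' into a dict of connection parameters."""
    return {k.strip(): v.strip() for k, v in kvPattern.findall(payload)}

params = parse_params(
    "hive.server2.thrift.bind.host=node1.example.com;hive.server2.thrift.port=10001"
)
print(params["hive.server2.thrift.port"])  # → 10001
```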


> HiveServer2 should store connection params in ZK when using dynamic service 
> discovery for simpler client connection string.
> ---
>
> Key: HIVE-11581
> URL: https://issues.apache.org/jira/browse/HIVE-11581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11581.1.patch, HIVE-11581.2.patch, 
> HIVE-11581.3.patch
>
>
> Currently, the client needs to specify several parameters based on which an 
> appropriate connection is created with the server. In the case of dynamic service 
> discovery, when multiple HS2 instances are running, it is much more usable 
> for the server to add its config parameters to ZK, which the driver can use to 
> configure the connection, instead of the JDBC/ODBC user adding those to the 
> connection string.
> However, at a minimum, the client will need to specify the ZooKeeper ensemble and 
> that it wants the JDBC driver to use ZooKeeper:
> {noformat}
> beeline> !connect 
> jdbc:hive2://vgumashta.local:2181,vgumashta.local:2182,vgumashta.local:2183/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
>  vgumashta vgumashta org.apache.hive.jdbc.HiveDriver
> {noformat} 





[jira] [Updated] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11375:

Attachment: HIVE-11375.branch-1.patch

Attached the patch for the branch-1 branch.

> Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
> --
>
> Key: HIVE-11375
> URL: https://issues.apache.org/jira/browse/HIVE-11375
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mariusz Sakowski
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11375.2.patch, HIVE-11375.3.patch, 
> HIVE-11375.4.patch, HIVE-11375.branch-1.patch, HIVE-11375.patch
>
>
> When running query like this:
> {code}explain select * from test where (val is not null and val <> 0);{code}
> Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
> {code}
>   Filter Operator
> predicate: (val <> 0) (type: boolean)
> {code}
> which is fine.
> but if we negate the condition using the NOT operator:
> {code}explain select * from test where not (val is not null and val <> 
> 0);{code}
> Hive will also simplify the expression, but now it breaks the query:
> {code}
>   Filter Operator
> predicate: (not (val <> 0)) (type: boolean)
> {code}
> because the valid predicate should be *val == 0 or val is null*, while the row 
> above is equivalent to *val == 0* only, filtering away rows where val is null.
> A simple example:
> {code}
> CREATE TABLE example (
> val bigint
> );
> INSERT INTO example VALUES (1), (NULL), (0);
> -- returns 2 rows - NULL and 0
> select * from example where (val is null or val == 0);
> -- returns 1 row - 0
> select * from example where not (val is not null and val <> 0);
> {code}
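The NULL handling can be illustrated outside of Hive with a small three-valued-logic sketch (None models SQL NULL; this is an illustration of the semantics, not Hive code):

```python
# Three-valued-logic sketch (None models SQL NULL), illustrating why
# NOT (val IS NOT NULL AND val <> 0) must keep NULL rows: the simplified
# predicate has to be (val IS NULL OR val = 0), not just (val = 0).

def tv_not(x):
    # NOT UNKNOWN is UNKNOWN
    return None if x is None else (not x)

def tv_and(a, b):
    # FALSE AND anything is FALSE; otherwise UNKNOWN taints the result
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def neq_zero(val):
    # val <> 0 is UNKNOWN when val is NULL
    return None if val is None else (val != 0)

def predicate(val):
    # NOT (val IS NOT NULL AND val <> 0)
    return tv_not(tv_and(val is not None, neq_zero(val)))

rows = [1, None, 0]
# WHERE keeps only rows where the predicate evaluates to TRUE
kept = [v for v in rows if predicate(v) is True]
print(kept)  # → [None, 0]
```

For the NULL row, `val IS NOT NULL` is FALSE, so the conjunction is FALSE regardless of the UNKNOWN comparison, and the negation is TRUE; the broken simplification `not (val <> 0)` instead evaluates to UNKNOWN and drops the row.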





[jira] [Updated] (HIVE-11581) HiveServer2 should store connection params in ZK when using dynamic service discovery for simpler client connection string.

2015-08-21 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11581:

Attachment: HIVE-11581.3.patch

Patch with kvPattern variable name.

> HiveServer2 should store connection params in ZK when using dynamic service 
> discovery for simpler client connection string.
> ---
>
> Key: HIVE-11581
> URL: https://issues.apache.org/jira/browse/HIVE-11581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11581.1.patch, HIVE-11581.2.patch, 
> HIVE-11581.3.patch, HIVE-11581.3.patch
>
>
> Currently, the client needs to specify several parameters based on which an 
> appropriate connection is created with the server. In the case of dynamic service 
> discovery, when multiple HS2 instances are running, it is much more usable 
> for the server to add its config parameters to ZK, which the driver can use to 
> configure the connection, instead of the JDBC/ODBC user adding those to the 
> connection string.
> However, at a minimum, the client will need to specify the ZooKeeper ensemble and 
> that it wants the JDBC driver to use ZooKeeper:
> {noformat}
> beeline> !connect 
> jdbc:hive2://vgumashta.local:2181,vgumashta.local:2182,vgumashta.local:2183/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
>  vgumashta vgumashta org.apache.hive.jdbc.HiveDriver
> {noformat} 





[jira] [Updated] (HIVE-11564) HBaseSchemaTool should be able to list objects

2015-08-21 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-11564:
--
Attachment: HIVE-11564.patch

I ended up completely rewriting HBaseSchemaTool.  It now operates on all of the 
tables in the HBase Metastore.  For each table it can find a given row when 
provided with the key.  For each table it can also take a regular expression 
(over the key only) and print out each row that matches the expression.

These changes make it lower level: instead of returning full Table or 
Partition objects, it returns results from the HBase tables. But it should be 
more useful for troubleshooting an actual metastore.
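The regex-over-key behavior described above can be pictured with a toy sketch (a dict stands in for an HBase metastore table; the keys and values here are made up for illustration):

```python
import re

# Toy sketch of "regex over the row key": a dict stands in for an HBase
# metastore table; the keys and serialized values here are made up.
db_table = {
    "default": b"serialized Database proto ...",
    "sales": b"serialized Database proto ...",
    "sales_archive": b"serialized Database proto ...",
}

def scan_keys(table, key_regex):
    """Return (key, value) pairs whose row key matches the regex."""
    pattern = re.compile(key_regex)
    return [(k, v) for k, v in sorted(table.items()) if pattern.match(k)]

matches = scan_keys(db_table, r"sales.*")
print([k for k, _ in matches])  # → ['sales', 'sales_archive']
```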

> HBaseSchemaTool should be able to list objects
> --
>
> Key: HIVE-11564
> URL: https://issues.apache.org/jira/browse/HIVE-11564
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Affects Versions: hbase-metastore-branch
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-11564.patch
>
>
> Currently, HBaseSchemaTool can only fetch objects whose names the user already 
> knows. It should also be able to list available objects (e.g., list all 
> databases).
> It is also very user-unfriendly in terms of error handling. That needs to be 
> fixed.





[jira] [Commented] (HIVE-11600) Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707797#comment-14707797
 ] 

Hive QA commented on HIVE-11600:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751804/HIVE-11600.04.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9381 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5039/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5039/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5039/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751804 - PreCommit-HIVE-TRUNK-Build

> Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())
> 
>
> Key: HIVE-11600
> URL: https://issues.apache.org/jira/browse/HIVE-11600
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11600.01.patch, HIVE-11600.02.patch, 
> HIVE-11600.03.patch, HIVE-11600.04.patch
>
>
> Currently, Hive only supports a single column in the IN clause, e.g., 
> {code}select * from src where  col0 in (v1,v2,v3);{code}
> We want it to support 
> {code}select * from src where (col0,col1+3) in 
> ((col0+v1,v2),(v3,v4-col1));{code}
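Logically, a multi-column IN clause is a disjunction of conjunctions; a hypothetical string-rewrite sketch of that equivalence (not the parser change itself):

```python
# A multi-column IN clause is logically an OR of ANDs:
#   (col0, col1+3) IN ((col0+v1, v2), (v3, v4-col1))
#   <=> (col0 = col0+v1 AND col1+3 = v2) OR (col0 = v3 AND col1+3 = v4-col1)

def expand_tuple_in(lhs, rhs_tuples):
    """Rewrite (a, b, ...) IN ((x1, y1, ...), ...) as a disjunction of conjunctions."""
    disjuncts = []
    for rhs in rhs_tuples:
        conjuncts = " AND ".join(f"{l} = {r}" for l, r in zip(lhs, rhs))
        disjuncts.append(f"({conjuncts})")
    return " OR ".join(disjuncts)

expr = expand_tuple_in(["col0", "col1+3"], [["col0+v1", "v2"], ["v3", "v4-col1"]])
print(expr)
# → (col0 = col0+v1 AND col1+3 = v2) OR (col0 = v3 AND col1+3 = v4-col1)
```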





[jira] [Commented] (HIVE-11618) Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707834#comment-14707834
 ] 

Hive QA commented on HIVE-11618:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751818/HIVE-11618.patch

{color:green}SUCCESS:{color} +1 9376 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5040/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5040/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5040/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751818 - PreCommit-HIVE-TRUNK-Build

> Correct the SARG api to reunify the PredicateLeaf.Type INTEGER and LONG
> ---
>
> Key: HIVE-11618
> URL: https://issues.apache.org/jira/browse/HIVE-11618
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11618.patch
>
>
> The Parquet binding leaked implementation details into the generic SARG API. 
> Rather than making all users of the SARG API deal with each of the specific 
> types, reunify the INTEGER and LONG types. 
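The reunification idea can be sketched as mapping every integer-family Hive type to a single leaf type; all names below are illustrative, not the actual SARG API:

```python
from enum import Enum

# Hypothetical sketch of the reunification: every integer-family Hive type
# maps to a single LONG leaf type, so SARG consumers handle one type
# instead of per-width variants. (Illustrative names, not the real API.)
class PredicateLeafType(Enum):
    LONG = "long"      # covers tinyint/smallint/int/bigint after reunification
    FLOAT = "float"    # covers float/double
    STRING = "string"

_INTEGER_FAMILY = {"tinyint", "smallint", "int", "bigint"}

def leaf_type_for(hive_type):
    """Map a Hive column type name to its (reunified) SARG leaf type."""
    if hive_type in _INTEGER_FAMILY:
        return PredicateLeafType.LONG
    if hive_type in ("float", "double"):
        return PredicateLeafType.FLOAT
    return PredicateLeafType.STRING

print(leaf_type_for("int") is leaf_type_for("bigint"))  # → True
```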





[jira] [Updated] (HIVE-11621) Fix TestMiniTezCliDriver test failures when HBase Metastore is used

2015-08-21 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11621:
--
Attachment: HIVE-11621.2.patch

Added a condition to use the HBase metastore only when running TestMiniTezCliDriver.

> Fix TestMiniTezCliDriver test failures when HBase Metastore is used
> ---
>
> Key: HIVE-11621
> URL: https://issues.apache.org/jira/browse/HIVE-11621
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Affects Versions: hbase-metastore-branch
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: hbase-metastore-branch
>
> Attachments: HIVE-11621.1.patch, HIVE-11621.2.patch
>
>
> As a first step, fix the hbase-metastore unit tests with TestMiniTezCliDriver, so 
> we can test LLAP and hbase-metastore together.





[jira] [Commented] (HIVE-11573) PointLookupOptimizer can be pessimistic at a low nDV

2015-08-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707853#comment-14707853
 ] 

Hive QA commented on HIVE-11573:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12751821/HIVE-11573.4.patch

{color:green}SUCCESS:{color} +1 9377 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5041/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5041/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5041/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12751821 - PreCommit-HIVE-TRUNK-Build

> PointLookupOptimizer can be pessimistic at a low nDV
> 
>
> Key: HIVE-11573
> URL: https://issues.apache.org/jira/browse/HIVE-11573
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: TODOC2.0
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11573.1.patch, HIVE-11573.2.patch, 
> HIVE-11573.3.patch, HIVE-11573.4.patch
>
>
> The PointLookupOptimizer can turn off some of the optimizations due to its 
> use of tuple IN() clauses.
> Limit the application of the optimizer in very low-nDV cases, and extract the 
> sub-clause as a pre-condition at runtime, to trigger the simple 
> column-predicate index lookups.
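The sub-clause extraction can be pictured as deriving per-column IN sets from the tuple list: those sets are index-friendly pre-filters, while the tuple IN remains the exact predicate. A sketch of the idea only, not the optimizer code:

```python
# Sketch of the pre-condition extraction: from the tuple list of a
# multi-column IN, derive per-column value sets that can drive simple
# single-column index lookups, while the tuple IN stays the exact filter.

def extract_preconditions(cols, tuples):
    """For (c1, c2) IN ((a1, b1), ...), return {c1: {a1, ...}, c2: {b1, ...}}."""
    return {col: {t[i] for t in tuples} for i, col in enumerate(cols)}

pre = extract_preconditions(("year", "month"), [(2014, 1), (2014, 7), (2015, 1)])
print(sorted(pre["year"]), sorted(pre["month"]))  # → [2014, 2015] [1, 7]
```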





[jira] [Updated] (HIVE-11622) Creating an Avro table with a complex map-typed column leads to incorrect column type.

2015-08-21 Thread Alexander Behm (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Behm updated HIVE-11622:
--
Description: 
In the CREATE TABLE below, the map-typed column leads to the wrong type. I 
suspect a problem with inferring the Avro schema from the column definitions, 
but I am not sure.

Reproduction:
{code}
hive> create table t (c map>) stored as avro;
OK
Time taken: 0.101 seconds
hive> desc t;
OK
c   array>  from deserializer   
Time taken: 0.135 seconds, Fetched: 1 row(s)
{code}

Note how the type shown in DESCRIBE is not the type originally passed in the 
CREATE TABLE.

However, *sometimes* the DESCRIBE shows the correct output. You may also try 
these steps which produce a similar problem to increase the chance of hitting 
this issue:

{code}
hive> create table t (c array>) stored as avro;
OK
Time taken: 0.063 seconds
hive> desc t;
OK
c   map>  from deserializer   
Time taken: 0.152 seconds, Fetched: 1 row(s)
{code}

  was:
In the following CREATE TABLE the following map-typed column leads to the wrong 
type. I suspect some problem with inferring the Avro schema from the column 
definitions, but I am not sure.

Reproduction:
{code}
hive> create table t (c map>) stored as avro;
OK
Time taken: 0.101 seconds
hive> desc t;
OK
c   array>  from deserializer   
Time taken: 0.135 seconds, Fetched: 1 row(s)
{code}

Note how the type shown in DESCRIBE is not the type originally passed in the 
CREATE TABLE.



> Creating an Avro table with a complex map-typed column leads to incorrect 
> column type.
> --
>
> Key: HIVE-11622
> URL: https://issues.apache.org/jira/browse/HIVE-11622
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 1.1.0
>Reporter: Alexander Behm
>
> In the CREATE TABLE below, the map-typed column leads to the wrong type. I 
> suspect a problem with inferring the Avro schema from the column definitions, 
> but I am not sure.
> Reproduction:
> {code}
> hive> create table t (c map>) stored as avro;
> OK
> Time taken: 0.101 seconds
> hive> desc t;
> OK
> c array>  from deserializer   
> Time taken: 0.135 seconds, Fetched: 1 row(s)
> {code}
> Note how the type shown in DESCRIBE is not the type originally passed in the 
> CREATE TABLE.
> However, *sometimes* the DESCRIBE shows the correct output. You may also try 
> these steps which produce a similar problem to increase the chance of hitting 
> this issue:
> {code}
> hive> create table t (c array>) stored as avro;
> OK
> Time taken: 0.063 seconds
> hive> desc t;
> OK
> c map>  from deserializer   
> Time taken: 0.152 seconds, Fetched: 1 row(s)
> {code}





[jira] [Updated] (HIVE-11622) Creating an Avro table with a complex map-typed column leads to incorrect column type.

2015-08-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-11622:
--
Labels: AvroSerde  (was: )

> Creating an Avro table with a complex map-typed column leads to incorrect 
> column type.
> --
>
> Key: HIVE-11622
> URL: https://issues.apache.org/jira/browse/HIVE-11622
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 1.1.0
>Reporter: Alexander Behm
>  Labels: AvroSerde
>
> In the CREATE TABLE below, the map-typed column leads to the wrong type. I 
> suspect a problem with inferring the Avro schema from the column definitions, 
> but I am not sure.
> Reproduction:
> {code}
> hive> create table t (c map>) stored as avro;
> OK
> Time taken: 0.101 seconds
> hive> desc t;
> OK
> c array>  from deserializer   
> Time taken: 0.135 seconds, Fetched: 1 row(s)
> {code}
> Note how the type shown in DESCRIBE is not the type originally passed in the 
> CREATE TABLE.
> However, *sometimes* the DESCRIBE shows the correct output. You may also try 
> these steps which produce a similar problem to increase the chance of hitting 
> this issue:
> {code}
> hive> create table t (c array>) stored as avro;
> OK
> Time taken: 0.063 seconds
> hive> desc t;
> OK
> c map>  from deserializer   
> Time taken: 0.152 seconds, Fetched: 1 row(s)
> {code}


