[jira] [Commented] (HIVE-13988) zero length file is being created for empty bucket in tez mode (I)

2016-06-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345425#comment-15345425
 ] 

Pengcheng Xiong commented on HIVE-13988:


[~ashutoshc], your comments are valid. Could you take another look? I tried to
use only the move task, but it seems more complicated than I thought. The move
task is followed by the stats task, and we also need to make stats work. Thus,
I made only a very limited optimization: when there is only one "insert into",
we skip the task compilation. Please see the attached q files for examples.

> zero length file is being created for empty bucket in tez mode (I)
> --
>
> Key: HIVE-13988
> URL: https://issues.apache.org/jira/browse/HIVE-13988
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13988.01.patch, HIVE-13988.02.patch
>
>
> Even though the bucket is empty, a zero-length file is created in Tez mode.
> Steps to reproduce the issue:
> {noformat}
> hive> set hive.execution.engine;
> hive.execution.engine=tez
> hive> drop table if exists emptybucket_orc;
> OK
> Time taken: 5.416 seconds
> hive> create table emptybucket_orc(age int) clustered by (age) sorted by 
> (age) into 99 buckets stored as orc;
> OK
> Time taken: 0.493 seconds
> hive> insert into table emptybucket_orc select distinct(age) from 
> studenttab10k limit 0;
> Query ID = hrt_qa_20160523231955_8b981be7-68c4-4416-8a48-5f8c7ff551c3
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1464045121842_0002)
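> --------------------------------------------------------------------------------
> VERTICES      MODE  STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 ..      llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ..  llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ..  llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 4 ..  llap  SUCCEEDED     99         99        0        0       0       0
> --------------------------------------------------------------------------------
> VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 11.00 s
> --------------------------------------------------------------------------------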
> Loading data to table default.emptybucket_orc
> OK
> Time taken: 16.907 seconds
> hive> dfs -ls /apps/hive/warehouse/emptybucket_orc;
> Found 99 items
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/00_0
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/01_0
> ..
> {noformat}
> Expected behavior:
> In Tez mode, a zero-length file shouldn't be created on HDFS if the bucket is empty.
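
For readers following along, the expected behavior can be illustrated with a
small sketch of lazy file creation: a bucket writer that opens its file only
when the first row arrives, so an empty bucket leaves nothing behind. This is
not Hive's implementation and not what the attached patches do; the class
name, the modulo bucketing, and the `bucket_NNNNN` file names are all
hypothetical.

```python
import os
import tempfile


class LazyBucketWriter:
    """Hypothetical writer that creates its file only when the first row arrives."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # no file exists until write() is called

    def write(self, row):
        if self._handle is None:
            self._handle = open(self.path, "w")
        self._handle.write(f"{row}\n")

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def write_buckets(rows, num_buckets, out_dir):
    """Route integer rows to buckets by modulo; return the sorted file names created."""
    writers = [
        LazyBucketWriter(os.path.join(out_dir, f"bucket_{i:05d}"))
        for i in range(num_buckets)
    ]
    for row in rows:
        writers[row % num_buckets].write(row)
    for writer in writers:
        writer.close()
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # An empty input (like "limit 0") leaves the directory empty.
        print(write_buckets([], 99, out_dir))
```

With an eager writer, the same empty input would leave 99 zero-length files,
which is the symptom shown in the listing above.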



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11527) bypass HiveServer2 thrift interface for query results

2016-06-22 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345252#comment-15345252
 ] 

Thejas M Nair commented on HIVE-11527:
--

[~tasanuma0829] I think it makes sense to explore whether we can extend
typeDesc to encode the full information for complex types. Otherwise, we have
duplicate information being sent.
([~prasadm] [~cwsteinbach] please chime in if you have tried that.)

The serialization format is hardcoded in this case. If we use a single serde
format, using LazyBinarySerDe instead of LazySimpleSerDe is likely to be more
performant IMO. [~gopalv] What are your thoughts on that?



> bypass HiveServer2 thrift interface for query results
> -
>
> Key: HIVE-11527
> URL: https://issues.apache.org/jira/browse/HIVE-11527
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sergey Shelukhin
>Assignee: Takanobu Asanuma
> Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, 
> HIVE-11527.WIP.patch
>
>
> Right now, HS2 reads query results and returns them to the caller via its 
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS 
> link?) and for the user to read the results directly off HDFS inside the 
> cluster, or via something like WebHDFS outside the cluster
> Review board link: https://reviews.apache.org/r/40867





[jira] [Updated] (HIVE-14070) hive.tez.exec.print.summary=true returns wrong performance numbers on HS2

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14070:
---
Status: Patch Available  (was: Open)

> hive.tez.exec.print.summary=true returns wrong performance numbers on HS2
> -
>
> Key: HIVE-14070
> URL: https://issues.apache.org/jira/browse/HIVE-14070
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14070.01.patch, HIVE-14070.02.patch, 
> HIVE-14070.03.patch
>
>
> On master, we have 
> {code}
> Query Execution Summary
> --
> OPERATION            DURATION
> --
> Compile Query        -1466208820.74s
> Prepare Plan         0.00s
> Submit Plan          1466208825.50s
> Start DAG            0.26s
> Run DAG              4.39s
> --
> Task Execution Summary
> --
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  
> OUTPUT_RECORDS
> --
>  Map 11014.00 1,534   11  1,500   
> 1
>  Reducer 2  96.00   5410  1   
> 0
> --
> {code}
> sounds like a real issue.
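
The two bad values have the magnitude of a Unix timestamp in seconds (June
2016). That would be consistent with a duration computed from a start or end
time that was never recorded and was left at 0, although the issue text does
not state the root cause, so treat that as an assumption. A minimal sketch of
the effect, with hypothetical function names:

```python
def duration_seconds(start_ms, end_ms):
    """Naive duration: wrong whenever one endpoint was never recorded (left at 0)."""
    return (end_ms - start_ms) / 1000.0


def safe_duration_seconds(start_ms, end_ms):
    """Return None instead of an epoch-sized number when an endpoint is missing."""
    if start_ms <= 0 or end_ms <= 0:
        return None
    return (end_ms - start_ms) / 1000.0


if __name__ == "__main__":
    # A missing end time yields a large negative "duration".
    print(duration_seconds(1466208820740, 0))
    # A missing start time yields a large positive one.
    print(duration_seconds(0, 1466208825500))
    # The guarded version reports the gap instead of a misleading number.
    print(safe_duration_seconds(1466208820740, 0))
```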





[jira] [Updated] (HIVE-14070) hive.tez.exec.print.summary=true returns wrong performance numbers on HS2

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14070:
---
Status: Open  (was: Patch Available)

> hive.tez.exec.print.summary=true returns wrong performance numbers on HS2
> -
>
> Key: HIVE-14070
> URL: https://issues.apache.org/jira/browse/HIVE-14070
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14070.01.patch, HIVE-14070.02.patch, 
> HIVE-14070.03.patch
>
>
> On master, we have 
> {code}
> Query Execution Summary
> --
> OPERATION            DURATION
> --
> Compile Query        -1466208820.74s
> Prepare Plan         0.00s
> Submit Plan          1466208825.50s
> Start DAG            0.26s
> Run DAG              4.39s
> --
> Task Execution Summary
> --
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  
> OUTPUT_RECORDS
> --
>  Map 11014.00 1,534   11  1,500   
> 1
>  Reducer 2  96.00   5410  1   
> 0
> --
> {code}
> sounds like a real issue.





[jira] [Updated] (HIVE-14070) hive.tez.exec.print.summary=true returns wrong performance numbers on HS2

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14070:
---
Attachment: HIVE-14070.03.patch

> hive.tez.exec.print.summary=true returns wrong performance numbers on HS2
> -
>
> Key: HIVE-14070
> URL: https://issues.apache.org/jira/browse/HIVE-14070
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14070.01.patch, HIVE-14070.02.patch, 
> HIVE-14070.03.patch
>
>
> On master, we have 
> {code}
> Query Execution Summary
> --
> OPERATION            DURATION
> --
> Compile Query        -1466208820.74s
> Prepare Plan         0.00s
> Submit Plan          1466208825.50s
> Start DAG            0.26s
> Run DAG              4.39s
> --
> Task Execution Summary
> --
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  
> OUTPUT_RECORDS
> --
>  Map 11014.00 1,534   11  1,500   
> 1
>  Reducer 2  96.00   5410  1   
> 0
> --
> {code}
> sounds like a real issue.





[jira] [Updated] (HIVE-14055) directSql - getting the number of partitions is broken

2016-06-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14055:

       Resolution: Fixed
    Fix Version/s: 2.1.1
                   2.2.0
           Status: Resolved  (was: Patch Available)

Looks like the original patch is not in Hive 1. Committed to all the relevant 
branches. Thanks for the review!

> directSql - getting the number of partitions is broken
> --
>
> Key: HIVE-14055
> URL: https://issues.apache.org/jira/browse/HIVE-14055
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14055.01.patch, HIVE-14055.02.patch, 
> HIVE-14055.patch
>
>
> Noticed while looking at something else. If the filter cannot be pushed down 
> it just returns 0
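
The failure mode described here (a "cannot answer" case reported as a count
of 0) can be sketched as follows. This is only an illustration of the general
fix pattern, returning an explicit "unknown" so the caller can fall back to a
slower path; it is not the committed patch, and every function name below is
hypothetical.

```python
from typing import Callable, Optional


def count_partitions_direct_sql(
    filter_expr: str,
    pushable: Callable[[str], bool],
    run_count: Callable[[str], int],
) -> Optional[int]:
    """Return the count, or None (not 0) when the filter cannot be pushed down."""
    if not pushable(filter_expr):
        return None
    return run_count(filter_expr)


def count_partitions(filter_expr, pushable, run_count, slow_count):
    """Use the direct path when it can answer; otherwise use the fallback."""
    direct = count_partitions_direct_sql(filter_expr, pushable, run_count)
    return direct if direct is not None else slow_count(filter_expr)


if __name__ == "__main__":
    simple_only = lambda f: "like" not in f  # stand-in for a pushdown check
    print(count_partitions("ds like '2016%'", simple_only, lambda f: 3, lambda f: 7))
```

Returning 0 in the first function would be indistinguishable from "no
partitions match", which is why the sketch uses None for the unsupported case.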





[jira] [Commented] (HIVE-14077) revert or fix HIVE-13380

2016-06-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345490#comment-15345490
 ] 

Sergey Shelukhin commented on HIVE-14077:
-

I'll use this JIRA to add a test.

> revert or fix HIVE-13380
> 
>
> Key: HIVE-14077
> URL: https://issues.apache.org/jira/browse/HIVE-14077
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Blocker
>
> See comments in that JIRA





[jira] [Commented] (HIVE-14070) hive.tez.exec.print.summary=true returns wrong performance numbers on HS2

2016-06-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345511#comment-15345511
 ] 

Pengcheng Xiong commented on HIVE-14070:


Addressed [~thejas]'s comments and test failures.

> hive.tez.exec.print.summary=true returns wrong performance numbers on HS2
> -
>
> Key: HIVE-14070
> URL: https://issues.apache.org/jira/browse/HIVE-14070
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14070.01.patch, HIVE-14070.02.patch, 
> HIVE-14070.03.patch
>
>
> On master, we have 
> {code}
> Query Execution Summary
> --
> OPERATION            DURATION
> --
> Compile Query        -1466208820.74s
> Prepare Plan         0.00s
> Submit Plan          1466208825.50s
> Start DAG            0.26s
> Run DAG              4.39s
> --
> Task Execution Summary
> --
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  
> OUTPUT_RECORDS
> --
>  Map 11014.00 1,534   11  1,500   
> 1
>  Reducer 2  96.00   5410  1   
> 0
> --
> {code}
> sounds like a real issue.





[jira] [Updated] (HIVE-14068) make more effort to find hive-site.xml

2016-06-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14068:

Attachment: HIVE-14068.02.patch

Updated... [~thejas] I bet those things could be fixed on commit :P

> make more effort to find hive-site.xml
> --
>
> Key: HIVE-14068
> URL: https://issues.apache.org/jira/browse/HIVE-14068
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14068.01.patch, HIVE-14068.02.patch, 
> HIVE-14068.patch
>
>
> It pretty much doesn't make sense to run Hive w/o the config, so we should 
> make more effort to find one if it's missing on the classpath, or the 
> classloader does not return it for some reason (e.g. classloader ignores some 
> permission issues; explicitly looking for the file may expose them better)
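
As a rough illustration of "explicitly looking for the file", the sketch
below checks a short list of locations and records why each one was rejected,
so a permission problem is reported rather than silently ignored. The search
order and the function names are assumptions made for this example, not what
the attached patch does; HIVE_CONF_DIR and HIVE_HOME are used only as
plausible places to look.

```python
import os
import tempfile


def candidate_paths(env):
    """Yield places to look; the order here is an assumption, not Hive's."""
    if env.get("HIVE_CONF_DIR"):
        yield os.path.join(env["HIVE_CONF_DIR"], "hive-site.xml")
    if env.get("HIVE_HOME"):
        yield os.path.join(env["HIVE_HOME"], "conf", "hive-site.xml")


def find_hive_site(env):
    """Return (path, problems): first readable file, plus why others were skipped."""
    problems = []
    for path in candidate_paths(env):
        if not os.path.isfile(path):
            problems.append(f"{path}: not found")
        elif not os.access(path, os.R_OK):
            problems.append(f"{path}: exists but is not readable")
        else:
            return path, problems
    return None, problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        os.makedirs(os.path.join(home, "conf"))
        open(os.path.join(home, "conf", "hive-site.xml"), "w").close()
        print(find_hive_site({"HIVE_CONF_DIR": "/no/such/dir", "HIVE_HOME": home}))
```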





[jira] [Updated] (HIVE-14071) HIVE-14014 breaks non-file outputs

2016-06-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14071:

       Resolution: Fixed
    Fix Version/s: 2.1.1
                   2.2.0
           Status: Resolved  (was: Patch Available)

Committed to branches; thanks for the review!

> HIVE-14014 breaks non-file outputs
> --
>
> Key: HIVE-14071
> URL: https://issues.apache.org/jira/browse/HIVE-14071
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14071.patch, HIVE-14071.patch
>
>
> Cannot avoid creating outputs when outputs are e.g. streaming





[jira] [Updated] (HIVE-13617) LLAP: support non-vectorized execution in IO

2016-06-22 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13617:

       Resolution: Fixed
    Fix Version/s: 2.2.0
           Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review!

> LLAP: support non-vectorized execution in IO
> 
>
> Key: HIVE-13617
> URL: https://issues.apache.org/jira/browse/HIVE-13617
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-13617-wo-11417.patch, HIVE-13617-wo-11417.patch, 
> HIVE-13617.01.patch, HIVE-13617.03.patch, HIVE-13617.04.patch, 
> HIVE-13617.05.patch, HIVE-13617.06.patch, HIVE-13617.patch, HIVE-13617.patch, 
> HIVE-15396-with-oi.patch
>
>
> Two approaches - a separate decoding path, into rows instead of VRBs; or 
> decoding VRBs into rows on a higher level (the original LlapInputFormat). I 
> think the latter might be better - it's not a hugely important path, and perf 
> in non-vectorized case is not the best anyway, so it's better to make do with 
> much less new code and architectural disruption. 
> Some ORC patches in progress introduce an easy to reuse (or so I hope, 
> anyway) VRB-to-row conversion, so we should just use that.
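
To make the second approach concrete, here is a heavily simplified sketch of
turning a column-oriented batch into rows. It is modeled only loosely on a
vectorized row batch (per-column arrays, a row count, and an optional list of
selected row indexes) and ignores details such as repeating values and
per-column null flags; it is not the ORC conversion code referred to above,
and the function name is hypothetical.

```python
def batch_to_rows(columns, size, selected=None):
    """Convert a column-oriented batch into row tuples.

    columns:  list of equal-length lists, one per column (None marks a null)
    size:     number of valid rows in the batch
    selected: optional list of row indexes that survived filtering
    """
    indexes = selected[:size] if selected is not None else range(size)
    return [tuple(col[i] for col in columns) for i in indexes]


if __name__ == "__main__":
    cols = [[1, 2, 3], ["a", None, "c"]]
    print(batch_to_rows(cols, 3))
    # Only rows 0 and 2 are selected, so the batch size is 2.
    print(batch_to_rows(cols, 2, selected=[0, 2]))
```

Doing the conversion above the reader, as in this sketch, leaves the decoding
path untouched, which matches the "much less new code" argument in the
description.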





[jira] [Commented] (HIVE-14070) hive.tez.exec.print.summary=true returns wrong performance numbers on HS2

2016-06-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345567#comment-15345567
 ] 

Pengcheng Xiong commented on HIVE-14070:


[~prasanth_j], could you take a final look? Thanks.

> hive.tez.exec.print.summary=true returns wrong performance numbers on HS2
> -
>
> Key: HIVE-14070
> URL: https://issues.apache.org/jira/browse/HIVE-14070
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14070.01.patch, HIVE-14070.02.patch, 
> HIVE-14070.03.patch
>
>
> On master, we have 
> {code}
> Query Execution Summary
> --
> OPERATION            DURATION
> --
> Compile Query        -1466208820.74s
> Prepare Plan         0.00s
> Submit Plan          1466208825.50s
> Start DAG            0.26s
> Run DAG              4.39s
> --
> Task Execution Summary
> --
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  
> OUTPUT_RECORDS
> --
>  Map 11014.00 1,534   11  1,500   
> 1
>  Reducer 2  96.00   5410  1   
> 0
> --
> {code}
> sounds like a real issue.





[jira] [Updated] (HIVE-13988) zero length file is being created for empty bucket in tez mode (I)

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13988:
---
Status: Patch Available  (was: Open)

> zero length file is being created for empty bucket in tez mode (I)
> --
>
> Key: HIVE-13988
> URL: https://issues.apache.org/jira/browse/HIVE-13988
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13988.01.patch, HIVE-13988.02.patch
>
>
> Even though the bucket is empty, a zero-length file is created in Tez mode.
> Steps to reproduce the issue:
> {noformat}
> hive> set hive.execution.engine;
> hive.execution.engine=tez
> hive> drop table if exists emptybucket_orc;
> OK
> Time taken: 5.416 seconds
> hive> create table emptybucket_orc(age int) clustered by (age) sorted by 
> (age) into 99 buckets stored as orc;
> OK
> Time taken: 0.493 seconds
> hive> insert into table emptybucket_orc select distinct(age) from 
> studenttab10k limit 0;
> Query ID = hrt_qa_20160523231955_8b981be7-68c4-4416-8a48-5f8c7ff551c3
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1464045121842_0002)
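> --------------------------------------------------------------------------------
> VERTICES      MODE  STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 ..      llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ..  llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ..  llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 4 ..  llap  SUCCEEDED     99         99        0        0       0       0
> --------------------------------------------------------------------------------
> VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 11.00 s
> --------------------------------------------------------------------------------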
> Loading data to table default.emptybucket_orc
> OK
> Time taken: 16.907 seconds
> hive> dfs -ls /apps/hive/warehouse/emptybucket_orc;
> Found 99 items
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/00_0
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/01_0
> ..
> {noformat}
> Expected behavior:
> In Tez mode, a zero-length file shouldn't be created on HDFS if the bucket is empty.





[jira] [Updated] (HIVE-13988) zero length file is being created for empty bucket in tez mode (I)

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13988:
---
Status: Open  (was: Patch Available)

> zero length file is being created for empty bucket in tez mode (I)
> --
>
> Key: HIVE-13988
> URL: https://issues.apache.org/jira/browse/HIVE-13988
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13988.01.patch, HIVE-13988.02.patch
>
>
> Even though the bucket is empty, a zero-length file is created in Tez mode.
> Steps to reproduce the issue:
> {noformat}
> hive> set hive.execution.engine;
> hive.execution.engine=tez
> hive> drop table if exists emptybucket_orc;
> OK
> Time taken: 5.416 seconds
> hive> create table emptybucket_orc(age int) clustered by (age) sorted by 
> (age) into 99 buckets stored as orc;
> OK
> Time taken: 0.493 seconds
> hive> insert into table emptybucket_orc select distinct(age) from 
> studenttab10k limit 0;
> Query ID = hrt_qa_20160523231955_8b981be7-68c4-4416-8a48-5f8c7ff551c3
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id 
> application_1464045121842_0002)
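> --------------------------------------------------------------------------------
> VERTICES      MODE  STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------
> Map 1 ..      llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 2 ..  llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 3 ..  llap  SUCCEEDED      1          1        0        0       0       0
> Reducer 4 ..  llap  SUCCEEDED     99         99        0        0       0       0
> --------------------------------------------------------------------------------
> VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 11.00 s
> --------------------------------------------------------------------------------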
> Loading data to table default.emptybucket_orc
> OK
> Time taken: 16.907 seconds
> hive> dfs -ls /apps/hive/warehouse/emptybucket_orc;
> Found 99 items
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/00_0
> -rwxrwxrwx   3 hrt_qa hdfs  0 2016-05-23 23:20 
> /apps/hive/warehouse/emptybucket_orc/01_0
> ..
> {noformat}
> Expected behavior:
> In Tez mode, a zero-length file shouldn't be created on HDFS if the bucket is empty.





[jira] [Commented] (HIVE-14074) RELOAD FUNCTION should update dropped functions

2016-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345654#comment-15345654
 ] 

Hive QA commented on HIVE-14074:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12812632/HIVE-14074.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 232 failed/errored test(s), 10258 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby_empty
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_semijoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_simple_select
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_not_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_orig_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_orig_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_update_delete
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llap_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llapdecider
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge11
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge7

[jira] [Commented] (HIVE-14070) hive.tez.exec.print.summary=true returns wrong performance numbers on HS2

2016-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345557#comment-15345557
 ] 

Hive QA commented on HIVE-14070:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12812394/HIVE-14070.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10257 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/225/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/225/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-225/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12812394 - PreCommit-HIVE-MASTER-Build

> hive.tez.exec.print.summary=true returns wrong performance numbers on HS2
> -
>
> Key: HIVE-14070
> URL: https://issues.apache.org/jira/browse/HIVE-14070
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14070.01.patch, HIVE-14070.02.patch, 
> HIVE-14070.03.patch
>
>
> On master, we have 
> {code}
> Query Execution Summary
> --
> OPERATION            DURATION
> --
> Compile Query        -1466208820.74s
> Prepare Plan         0.00s
> Submit Plan          1466208825.50s
> Start DAG            0.26s
> Run DAG              4.39s
> --
> Task Execution Summary
> --
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  
> OUTPUT_RECORDS
> --
>  Map 11014.00 1,534   11  1,500   
> 1
>  Reducer 2  96.00   5410  1   
> 0
> --
> {code}
> sounds like a real issue.





[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-06-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13872:

Attachment: HIVE-13872.05.patch

> Vectorization: Fix cross-product reduce sink serialization
> --
>
> Key: HIVE-13872
> URL: https://issues.apache.org/jira/browse/HIVE-13872
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-13872.01.patch, HIVE-13872.02.patch, 
> HIVE-13872.03.patch, HIVE-13872.04.patch, HIVE-13872.05.patch, 
> HIVE-13872.WIP.patch, customer_demographics.txt, vector_include_no_sel.q, 
> vector_include_no_sel.q.out
>
>
> TPC-DS Q13 produces a cross-product without CBO simplifying the query
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 
> projection column num 1
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
> ... 18 more
> {code}
> Simplified query
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)  
>  from store_sales
>  ,customer_demographics
>  where (
> ( 
>   customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'M'
>  )or
>  (
>customer_demographics.cd_demo_sk = ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'U'
>  ))
> ;
> {code}
> {code}
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: customer_demographics
>   Statistics: Num rows: 1920800 Data size: 717255532 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1920800 Data size: 717255532 Basic 
> stats: COMPLETE Column stats: NONE
> value expressions: cd_demo_sk (type: int), 
> cd_marital_status (type: string)
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> {code}
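
The simplified query has the shape (A and B) or (A and C), where A is the
join equality. The kind of simplification the description alludes to can be
sketched as factoring the shared conjunct out of the OR, giving
A and (B or C), so that the join has an equality key instead of being planned
as a cross product. The sketch below is not Hive's optimizer; it assumes the
predicates have already been normalized to identical strings (for example, by
resolving `ss_cdemo_sk` to the same qualified form on both sides), and the
function name is hypothetical.

```python
def factor_common_conjuncts(disjuncts):
    """Split an OR of ANDs into (common, rest).

    disjuncts: list of lists of predicate strings; each inner list is an AND.
    (A and B) or (A and C)  ->  common=[A], rest=[[B], [C]]
    """
    if not disjuncts:
        return [], []
    # A predicate is common if it appears in every branch of the OR.
    common = [p for p in disjuncts[0] if all(p in d for d in disjuncts[1:])]
    rest = [[p for p in d if p not in common] for d in disjuncts]
    return common, rest


if __name__ == "__main__":
    join_key = "cd_demo_sk = ss_cdemo_sk"
    branches = [
        [join_key, "cd_marital_status = 'M'"],
        [join_key, "cd_marital_status = 'U'"],
    ]
    print(factor_common_conjuncts(branches))
```

If any branch ends up empty after factoring, the remaining OR is trivially
true and only the common predicates need to be kept.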





[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-06-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13872:

Attachment: HIVE-13872.05.patch

> Vectorization: Fix cross-product reduce sink serialization
> --
>
> Key: HIVE-13872
> URL: https://issues.apache.org/jira/browse/HIVE-13872
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-13872.01.patch, HIVE-13872.02.patch, 
> HIVE-13872.03.patch, HIVE-13872.04.patch, HIVE-13872.05.patch, 
> HIVE-13872.WIP.patch, customer_demographics.txt, vector_include_no_sel.q, 
> vector_include_no_sel.q.out
>
>
> TPC-DS Q13 produces a cross-product without CBO simplifying the query
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 
> projection column num 1
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
> ... 18 more
> {code}
> Simplified query
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)  
>  from store_sales
>  ,customer_demographics
>  where (
> ( 
>   customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'M'
>  )or
>  (
>customer_demographics.cd_demo_sk = ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'U'
>  ))
> ;
> {code}
> {code}
> Map 3 
>     Map Operator Tree:
>         TableScan
>           alias: customer_demographics
>           Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>           Reduce Output Operator
>             sort order: 
>             Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
>             value expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
>     Execution mode: vectorized, llap
>     LLAP IO: all inputs
> {code}





[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-06-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13872:

Status: In Progress  (was: Patch Available)






[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-06-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13872:

Attachment: (was: HIVE-13872.05.patch)






[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-06-22 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13872:

Status: Patch Available  (was: In Progress)






[jira] [Commented] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-06-22 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345690#comment-15345690
 ] 

Matt McCline commented on HIVE-13872:
-

Thank you Gopal for the review!






[jira] [Commented] (HIVE-13636) Exception using Postgres as metastore with ACID transanctions enabled

2016-06-22 Thread Rajkumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345738#comment-15345738
 ] 

Rajkumar Singh commented on HIVE-13636:
---

[~mgaido], this does not appear to be a Hive issue. You are using the old 
Postgres JDBC3 driver, which throws the AbstractMethodError. Could you please 
update to the JDBC4 driver and check whether you still see this problem?

> Exception using Postgres as metastore with ACID transanctions enabled
> -
>
> Key: HIVE-13636
> URL: https://issues.apache.org/jira/browse/HIVE-13636
> Project: Hive
> Issue Type: Bug
> Components: Metastore, Transactions
> Affects Versions: 1.2.1
> Environment: HDP 2.3.2
> Reporter: Marco Gaido
>
> We are using Postgres as the metastore and have enabled ACID transactions. 
> Since doing so, we have been seeing this error:
> {code}
> FATAL [DeadTxnReaper-0]: txn.AcidHouseKeeperService (AcidHouseKeeperService.java:run(92)) - Serious error in DeadTxnReaper-0: Method org/postgresql/jdbc3/Jdbc3ResultSet.isClosed()Z is abstract
> java.lang.AbstractMethodError: Method org/postgresql/jdbc3/Jdbc3ResultSet.isClosed()Z is abstract
>     at org.postgresql.jdbc3.Jdbc3ResultSet.isClosed(Jdbc3ResultSet.java)
>     at org.apache.hadoop.hive.metastore.txn.TxnHandler.close(TxnHandler.java:934)
>     at org.apache.hadoop.hive.metastore.txn.TxnHandler.close(TxnHandler.java:947)
>     at org.apache.hadoop.hive.metastore.txn.TxnHandler.performTimeOuts(TxnHandler.java:1933)
>     at org.apache.hadoop.hive.ql.txn.AcidHouseKeeperService$TimedoutTxnReaper.run(AcidHouseKeeperService.java:87)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Looking at the code of the class TxnHandler, the close method calls 
> isClosed() on the ResultSet, and the Postgres driver's Jdbc3ResultSet class 
> does not implement that method.
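
For illustration only, the failure mode can be reproduced and tolerated in a few lines of Java. ResultSet.isClosed() was added in JDBC 4.0, so a JDBC3-era driver class has no implementation of it and the call fails with AbstractMethodError at run time. The sketch below is not Hive's TxnHandler code and not a proposed patch; the class name SafeResultSetClose and the helper fakeJdbc3ResultSet are hypothetical, and the fake result set is a java.lang.reflect.Proxy standing in for an old driver. The resolution suggested in this thread is simply to upgrade to the JDBC4 driver.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

class SafeResultSetClose {
    /**
     * Closes rs if it is open. Returns true when close() was invoked.
     * Tolerates drivers that predate JDBC 4.0 and lack isClosed().
     */
    static boolean closeQuietly(ResultSet rs) {
        if (rs == null) {
            return false;
        }
        try {
            boolean alreadyClosed;
            try {
                alreadyClosed = rs.isClosed();
            } catch (AbstractMethodError e) {
                // Old driver: isClosed() is not implemented, so assume still open.
                alreadyClosed = false;
            }
            if (alreadyClosed) {
                return false;
            }
            rs.close();
            return true;
        } catch (SQLException e) {
            return false;
        }
    }

    /** Hypothetical stand-in for a JDBC3 result set; for demonstration only. */
    static ResultSet fakeJdbc3ResultSet() {
        return (ResultSet) Proxy.newProxyInstance(
            SafeResultSetClose.class.getClassLoader(),
            new Class<?>[] {ResultSet.class},
            (proxy, method, args) -> {
                if (method.getName().equals("isClosed")) {
                    throw new AbstractMethodError("isClosed()Z is abstract");
                }
                return null; // close() is a no-op in this fake
            });
    }
}
```

Calling SafeResultSetClose.closeQuietly(SafeResultSetClose.fakeJdbc3ResultSet()) still reaches close(), whereas an unguarded rs.isClosed() call would propagate the error as in the log above.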





[jira] [Updated] (HIVE-14078) LLAP input split should get task attempt number from conf if available

2016-06-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-14078:
--
Attachment: HIVE-14078.1.patch

> LLAP input split should get task attempt number from conf if available
> --
>
> Key: HIVE-14078
> URL: https://issues.apache.org/jira/browse/HIVE-14078
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-14078.1.patch
>
>
> Currently the attempt number is hard-coded to 0. If the split is being 
> fetched as part of a hadoop job we can get the task attempt ID from the conf 
> if it has been set, and use the attempt number from that.
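
A minimal sketch of the idea, not the contents of the attached patch: read the task attempt ID from the configuration if present and take its trailing field as the attempt number, otherwise fall back to 0. The sketch assumes the ID is published under the property mapreduce.task.attempt.id and has the usual form attempt_<timestamp>_<job>_<m|r>_<task>_<attempt>. A plain Map stands in for the Hadoop Configuration, and the class name AttemptNumberFromConf is hypothetical. Real code would more likely parse the value with Hadoop's own TaskAttemptID class than split the string by hand.

```java
import java.util.Map;

class AttemptNumberFromConf {
    // Assumed property name under which a Hadoop job publishes the attempt ID.
    static final String TASK_ATTEMPT_ID_KEY = "mapreduce.task.attempt.id";

    /** Returns the attempt number from the conf, or 0 if it is unset or malformed. */
    static int attemptNumber(Map<String, String> conf) {
        String id = conf.get(TASK_ATTEMPT_ID_KEY);
        if (id == null || id.isEmpty()) {
            return 0; // not running inside a job: keep the old hard-coded value
        }
        int idx = id.lastIndexOf('_');
        if (idx < 0 || idx == id.length() - 1) {
            return 0;
        }
        try {
            // The attempt number is the last underscore-separated field.
            return Integer.parseInt(id.substring(idx + 1));
        } catch (NumberFormatException e) {
            return 0;
        }
    }
}
```

Falling back to 0 keeps the current behaviour whenever the split is fetched outside of a Hadoop job.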





[jira] [Updated] (HIVE-14078) LLAP input split should get task attempt number from conf if available

2016-06-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-14078:
--
Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-14028) stats is not updated

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14028:
---
Status: Patch Available  (was: Open)

> stats is not updated
> 
>
> Key: HIVE-14028
> URL: https://issues.apache.org/jira/browse/HIVE-14028
> Project: Hive
> Issue Type: Sub-task
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-14028.01.patch, HIVE-14028.02.patch
>
>
> {code}
> DROP TABLE users;
> CREATE TABLE users(key string, state string, country string, country_id int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "info:state,info:country,info:country_id"
> );
> INSERT OVERWRITE TABLE users SELECT 'user1', 'IA', 'USA', 0 FROM src;
> desc formatted users;
> {code}
> the result is
> {code}
> #### A masked pattern was here ####
> Table Type:             MANAGED_TABLE
> Table Parameters:
>     COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>     numFiles                0
>     numRows                 0
>     rawDataSize             0
>     storage_handler         org.apache.hadoop.hive.hbase.HBaseStorageHandler
>     totalSize               0
> {code}





[jira] [Updated] (HIVE-14028) stats is not updated

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14028:
---
Attachment: HIVE-14028.02.patch






[jira] [Updated] (HIVE-14028) stats is not updated

2016-06-22 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14028:
---
Status: Open  (was: Patch Available)






[jira] [Updated] (HIVE-14078) LLAP input split should get task attempt number from conf if available

2016-06-22 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-14078:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-12991

> LLAP input split should get task attempt number from conf if available
> --
>
> Key: HIVE-14078
> URL: https://issues.apache.org/jira/browse/HIVE-14078
> Project: Hive
> Issue Type: Sub-task
> Components: llap
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-14078.1.patch
>
>
> Currently the attempt number is hard-coded to 0. If the split is being 
> fetched as part of a hadoop job we can get the task attempt ID from the conf 
> if it has been set, and use the attempt number from that.





[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-22 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Attachment: (was: HIVE-13982.6.patch)

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
> Issue Type: Improvement
> Components: Physical Optimizer
> Affects Versions: 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.5.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}





[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible

2016-06-22 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-13982:
---
Attachment: HIVE-13982.6.patch

Updated the patch to fix the regression; it had to do with windowing. For now, 
we do not support reordering of partitioning/ordering within windowing.

> Extensions to RS dedup: execute with different column order and sorting 
> direction if possible
> -
>
> Key: HIVE-13982
> URL: https://issues.apache.org/jira/browse/HIVE-13982
> Project: Hive
> Issue Type: Improvement
> Components: Physical Optimizer
> Affects Versions: 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, 
> HIVE-13982.4.patch, HIVE-13982.5.patch, HIVE-13982.6.patch, HIVE-13982.patch
>
>
> Pointed out by [~gopalv].
> RS dedup should kick in for these cases, avoiding an additional shuffle stage.
> {code}
> select state, city, sum(sales) from table
> group by state, city
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state, city
> limit 10;
> {code}
> {code}
> select state, city, sum(sales) from table
> group by city, state
> order by state desc, city
> limit 10;
> {code}





[jira] [Updated] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13756:
---
Attachment: HIVE-13756.1.patch
HIVE-13756.1-branch-1.patch

> Map failure attempts to delete reducer _temporary directory on multi-query 
> pig query
> 
>
> Key: HIVE-13756
> URL: https://issues.apache.org/jira/browse/HIVE-13756
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 1.2.1, 2.0.0
> Reporter: Chris Drome
> Assignee: Chris Drome
> Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, 
> HIVE-13756.1.patch, HIVE-13756.patch
>
>
> A pig script, executed with multi-query enabled, that reads the source data 
> and writes it as-is into TABLE_A, and also performs a group-by on the data 
> and writes the result into TABLE_B, can produce erroneous results if any map 
> fails. The script runs as a single MR job that writes the map output to a 
> scratch directory relative to TABLE_A and the reducer output to a scratch 
> directory relative to TABLE_B.
> If one or more maps fail, the cleanup deletes the attempt data relative to 
> TABLE_A, but it also deletes the _temporary directory relative to TABLE_B. 
> This has the unintended side effect of preventing subsequent maps from 
> committing their data. Maps that completed successfully before the first map 
> failure have their data committed as expected, but later maps do not, 
> resulting in an incomplete result set.





[jira] [Updated] (HIVE-13754) Fix resource leak in HiveClientCache

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13754:
---
Attachment: HIVE-13754.1.patch
HIVE-13754.1-branch-1.patch

> Fix resource leak in HiveClientCache
> 
>
> Key: HIVE-13754
> URL: https://issues.apache.org/jira/browse/HIVE-13754
> Project: Hive
> Issue Type: Bug
> Components: Clients
> Affects Versions: 1.2.1, 2.0.0
> Reporter: Chris Drome
> Assignee: Chris Drome
> Attachments: HIVE-13754-branch-1.patch, HIVE-13754.1-branch-1.patch, 
> HIVE-13754.1.patch, HIVE-13754.patch
>
>
> Found that the {{users}} reference count can go negative, which prevents 
> {{tearDownIfUnused}} from closing the client connection when called.
> This leads to a buildup of clients that have been evicted from the cache and 
> are no longer in use, but have not been shut down.
> GC will eventually call {{finalize}}, which forcibly closes the connection 
> and cleans up the client, but I have seen as many as several hundred open 
> client connections as a result.
> The main source of this is RetryingMetaStoreClient, which calls 
> {{reconnect}} on acquire, which in turn calls {{close}}. This decrements 
> {{users}} to -1 on the reconnect; acquire then increases it to 0 while the 
> client is in use, and release takes it back to -1.
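
The counting sequence described above can be replayed with a small stand-alone model. This is a simplified sketch, not the HiveClientCache code: the class RefCountSketch and its methods are hypothetical, reconnect is modelled as nothing more than the extra release it triggers through close, and the guard shown (never decrementing below zero) is only one possible remedy and is not claimed to be what the attached patches do.

```java
class RefCountSketch {
    private int users = 0;
    private boolean tornDown = false;
    private final boolean guardAgainstNegative;

    RefCountSketch(boolean guardAgainstNegative) {
        this.guardAgainstNegative = guardAgainstNegative;
    }

    void acquire() {
        users++;
    }

    void release() {
        if (guardAgainstNegative && users == 0) {
            return; // nothing was acquired, so there is nothing to release
        }
        users--;
    }

    /** Models reconnect(): it calls close(), releasing a reference never acquired. */
    void reconnect() {
        release();
    }

    /** Tears down only when the count is exactly zero, as the issue describes. */
    void tearDownIfUnused() {
        if (users == 0) {
            tornDown = true;
        }
    }

    int users() {
        return users;
    }

    boolean isTornDown() {
        return tornDown;
    }

    /** Replays the sequence from the issue: reconnect, acquire, release, evict. */
    static RefCountSketch replay(boolean guard) {
        RefCountSketch client = new RefCountSketch(guard);
        client.reconnect();
        client.acquire();
        client.release();
        client.tearDownIfUnused();
        return client;
    }
}
```

With replay(false) the count ends at -1 and the client is never torn down, which is the leak; with replay(true) the count returns to 0 and the teardown proceeds.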





[jira] [Commented] (HIVE-14078) LLAP input split should get task attempt number from conf if available

2016-06-22 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345110#comment-15345110
 ] 

Jason Dere commented on HIVE-14078:
---

cc [~sseth]






[jira] [Updated] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13990:
---
Attachment: HIVE-13990.1.patch
HIVE-13990.1-branch-1.patch

> Client should not check dfs.namenode.acls.enabled to determine if extended 
> ACLs are supported
> -
>
> Key: HIVE-13990
> URL: https://issues.apache.org/jira/browse/HIVE-13990
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 1.2.1
> Reporter: Chris Drome
> Attachments: HIVE-13990-branch-1.patch, HIVE-13990.1-branch-1.patch, 
> HIVE-13990.1.patch
>
>
> dfs.namenode.acls.enabled is a server-side configuration, and the client 
> should not presume to know how the server is configured. Barring a method for 
> asking the NN whether ACLs are supported, the client should attempt the ACL 
> operation and catch the appropriate exception.
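
The "try and catch" approach can be sketched as follows. Everything here is hypothetical scaffolding rather than Hadoop or Hive API: AclSource stands in for whatever file system call fetches the ACL, and AclsDisabledException stands in for whichever exception the server raises when ACLs are turned off. The point is only the control flow: attempt the call, treat "unsupported" as an empty result, and never consult a client-side copy of dfs.namenode.acls.enabled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;

class AclProbeSketch {
    /** Hypothetical marker for "the server has ACLs disabled". */
    static class AclsDisabledException extends IOException {
        AclsDisabledException(String message) {
            super(message);
        }
    }

    /** Hypothetical minimal view of a file system that may support ACLs. */
    interface AclSource {
        String readAcl(String path) throws IOException;
    }

    /** Returns the ACL if the server supports ACLs, otherwise an empty Optional. */
    static Optional<String> tryReadAcl(AclSource fs, String path) {
        try {
            return Optional.ofNullable(fs.readAcl(path));
        } catch (UnsupportedOperationException | AclsDisabledException e) {
            // The server told us ACLs are unavailable; carry on without them.
            return Optional.empty();
        } catch (IOException e) {
            // Any other I/O failure is a real error and must not be swallowed.
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the "unsupported" case separate from other I/O failures avoids masking real errors as a missing feature.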





[jira] [Updated] (HIVE-13989) Extended ACLs are not handled according to specification

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13989:
---
Attachment: HIVE-13989.1.patch
HIVE-13989.1-branch-1.patch

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 1.2.1, 2.0.0
> Reporter: Chris Drome
> Assignee: Chris Drome
> Attachments: HIVE-13989-branch-1.patch, HIVE-13989.1-branch-1.patch, 
> HIVE-13989.1.patch
>
>






[jira] [Commented] (HIVE-13990) Client should not check dfs.namenode.acls.enabled to determine if extended ACLs are supported

2016-06-22 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345117#comment-15345117
 ] 

Chris Drome commented on HIVE-13990:


@ashutosh, we are still on branch-1, so I had this patch readily available for 
branch-1. There was a bit of work to get HIVE-13989, which this depends on, 
ported to master.

Patches are available for master and branch-1 now.






[jira] [Updated] (HIVE-13989) Extended ACLs are not handled according to specification

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13989:
---
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Patch Available  (was: Open)

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 2.0.0, 1.2.1
> Reporter: Chris Drome
> Assignee: Chris Drome
> Attachments: HIVE-13989-branch-1.patch, HIVE-13989.1-branch-1.patch, 
> HIVE-13989.1.patch
>
>






[jira] [Updated] (HIVE-13754) Fix resource leak in HiveClientCache

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13754:
---
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Open  (was: Patch Available)

> Fix resource leak in HiveClientCache
> 
>
> Key: HIVE-13754
> URL: https://issues.apache.org/jira/browse/HIVE-13754
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13754-branch-1.patch, HIVE-13754.1-branch-1.patch, 
> HIVE-13754.1.patch, HIVE-13754.patch
>
>
> Found that the {{users}} reference count can go negative, which 
> prevents {{tearDownIfUnused}} from closing the client connection when called.
> This leads to a buildup of clients which have been evicted from the cache 
> and are no longer in use, but have not been shut down.
> GC will eventually call {{finalize}}, which forcibly closes the connection 
> and cleans up the client, but I have seen as many as several hundred open 
> client connections as a result.
> The main source of this is RetryingMetaStoreClient, which will 
> call {{reconnect}} on acquire, which calls {{close}}. This will decrement 
> {{users}} to -1 on the reconnect, then acquire will increase this to 0 while 
> using it, and back to -1 when it releases it.
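
To illustrate the sequence described above, here is a minimal sketch of a reference counter that refuses to go below zero. `GuardedRefCount` and its methods are hypothetical names for illustration; this is not the HiveClientCache code and not necessarily how the attached patches fix the leak.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedRefCount {
    private final AtomicInteger users = new AtomicInteger(0);

    /** Registers one more user of the client. */
    public void acquire() {
        users.incrementAndGet();
    }

    /**
     * Decrements only while the count is positive. Returns false for an
     * unmatched release (for example a close() issued during a reconnect),
     * so the count never becomes negative.
     */
    public boolean release() {
        while (true) {
            int current = users.get();
            if (current <= 0) {
                return false;
            }
            if (users.compareAndSet(current, current - 1)) {
                return true;
            }
        }
    }

    /** True when no user holds the client, so it may be torn down. */
    public boolean isUnused() {
        return users.get() == 0;
    }

    public int users() {
        return users.get();
    }
}
```

With this guard, the reconnect's unmatched release is ignored at 0, the acquire raises the count to 1, and the final release returns it to 0, at which point a teardown check that requires a count of zero can succeed.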





[jira] [Updated] (HIVE-13989) Extended ACLs are not handled according to specification

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13989:
---
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Open  (was: Patch Available)

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989-branch-1.patch, HIVE-13989.1-branch-1.patch, 
> HIVE-13989.1.patch
>
>






[jira] [Updated] (HIVE-13989) Extended ACLs are not handled according to specification

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13989:
---
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Patch Available  (was: Open)

> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989-branch-1.patch, HIVE-13989.1-branch-1.patch, 
> HIVE-13989.1.patch
>
>






[jira] [Updated] (HIVE-13754) Fix resource leak in HiveClientCache

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13754:
---
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Patch Available  (was: Open)

> Fix resource leak in HiveClientCache
> 
>
> Key: HIVE-13754
> URL: https://issues.apache.org/jira/browse/HIVE-13754
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13754-branch-1.patch, HIVE-13754.1-branch-1.patch, 
> HIVE-13754.1.patch, HIVE-13754.patch
>
>
> Found that the {{users}} reference count can go negative, which 
> prevents {{tearDownIfUnused}} from closing the client connection when called.
> This leads to a buildup of clients which have been evicted from the cache 
> and are no longer in use, but have not been shut down.
> GC will eventually call {{finalize}}, which forcibly closes the connection 
> and cleans up the client, but I have seen as many as several hundred open 
> client connections as a result.
> The main source of this is RetryingMetaStoreClient, which will 
> call {{reconnect}} on acquire, which calls {{close}}. This will decrement 
> {{users}} to -1 on the reconnect, then acquire will increase this to 0 while 
> using it, and back to -1 when it releases it.





[jira] [Updated] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13756:
---
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Patch Available  (was: Open)

> Map failure attempts to delete reducer _temporary directory on multi-query 
> pig query
> 
>
> Key: HIVE-13756
> URL: https://issues.apache.org/jira/browse/HIVE-13756
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, 
> HIVE-13756.1.patch, HIVE-13756.patch
>
>
> A pig script, executed with multi-query enabled, that reads the source data 
> and writes it as-is into TABLE_A as well as performing a group-by operation 
> on the data which is written into TABLE_B can produce erroneous results if 
> any map fails. This results in a single MR job that writes the map output to 
> a scratch directory relative to TABLE_A and the reducer output to a scratch 
> directory relative to TABLE_B.
> If one or more maps fail it will delete the attempt data relative to TABLE_A, 
> but it also deletes the _temporary directory relative to TABLE_B. This has 
> the unintended side-effect of preventing subsequent maps from committing 
> their data. This means that any maps which successfully completed before the 
> first map failure will have their data committed as expected, while other maps 
> will not, resulting in an incomplete result set.





[jira] [Updated] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query

2016-06-22 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-13756:
---
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Open  (was: Patch Available)

> Map failure attempts to delete reducer _temporary directory on multi-query 
> pig query
> 
>
> Key: HIVE-13756
> URL: https://issues.apache.org/jira/browse/HIVE-13756
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 2.0.0, 1.2.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, 
> HIVE-13756.1.patch, HIVE-13756.patch
>
>
> A pig script, executed with multi-query enabled, that reads the source data 
> and writes it as-is into TABLE_A as well as performing a group-by operation 
> on the data which is written into TABLE_B can produce erroneous results if 
> any map fails. This results in a single MR job that writes the map output to 
> a scratch directory relative to TABLE_A and the reducer output to a scratch 
> directory relative to TABLE_B.
> If one or more maps fail it will delete the attempt data relative to TABLE_A, 
> but it also deletes the _temporary directory relative to TABLE_B. This has 
> the unintended side-effect of preventing subsequent maps from committing 
> their data. This means that any maps which successfully completed before the 
> first map failure will have their data committed as expected, while other maps 
> will not, resulting in an incomplete result set.





[jira] [Commented] (HIVE-14070) hive.tez.exec.print.summary=true returns wrong results on HS2

2016-06-22 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345045#comment-15345045
 ] 

Pengcheng Xiong commented on HIVE-14070:


I also want to remove PerfLogger.DRIVER_RUN and use the start time of 
PerfLogger.COMPILE instead.

> hive.tez.exec.print.summary=true returns wrong results on HS2
> -
>
> Key: HIVE-14070
> URL: https://issues.apache.org/jira/browse/HIVE-14070
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14070.01.patch, HIVE-14070.02.patch
>
>
> On master, we have 
> {code}
> Query Execution Summary
> --
> OPERATION                            DURATION
> --
> Compile Query                 -1466208820.74s
> Prepare Plan                            0.00s
> Submit Plan                    1466208825.50s
> Start DAG                               0.26s
> Run DAG                                 4.39s
> --
> Task Execution Summary
> --
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  
> OUTPUT_RECORDS
> --
>  Map 11014.00 1,534   11  1,500   
> 1
>  Reducer 2  96.00   5410  1   
> 0
> --
> {code}
> sounds like a real issue.
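
The two outsized durations in the summary have the magnitude of Unix epoch timestamps in seconds, which is consistent with (though the report does not state this) one endpoint of the interval being missing when the difference was taken. A minimal sketch of a timer that only reports a duration when both endpoints were recorded follows; `PhaseTimer` and its methods are hypothetical and are not the PerfLogger API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

public class PhaseTimer {
    private final Map<String, Long> startMs = new HashMap<>();
    private final Map<String, Long> endMs = new HashMap<>();

    /** Records the start timestamp of a phase, in milliseconds. */
    public void begin(String phase, long nowMs) {
        startMs.put(phase, nowMs);
    }

    /** Records the end timestamp of a phase, in milliseconds. */
    public void end(String phase, long nowMs) {
        endMs.put(phase, nowMs);
    }

    /**
     * Returns end - start only when both were recorded. A missing endpoint
     * yields an empty result rather than a difference against an implicit 0,
     * which would otherwise produce an epoch-sized positive or negative value.
     */
    public OptionalLong durationMs(String phase) {
        Long start = startMs.get(phase);
        Long end = endMs.get(phase);
        if (start == null || end == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(end - start);
    }
}
```

A caller printing a summary could then show a placeholder for a phase with no duration instead of a misleading number.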





[jira] [Commented] (HIVE-14068) make more effort to find hive-site.xml

2016-06-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345024#comment-15345024
 ] 

Hive QA commented on HIVE-14068:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12812346/HIVE-14068.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10254 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/222/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/222/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-222/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12812346 - PreCommit-HIVE-MASTER-Build

> make more effort to find hive-site.xml
> --
>
> Key: HIVE-14068
> URL: https://issues.apache.org/jira/browse/HIVE-14068
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14068.01.patch, HIVE-14068.patch
>
>
> It pretty much doesn't make sense to run Hive w/o the config, so we should 
> make more effort to find one if it's missing on the classpath, or the 
> classloader does not return it for some reason (e.g. classloader ignores some 
> permission issues; explicitly looking for the file may expose them better)
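
A minimal sketch of the kind of fallback lookup described above: ask the classloader first, then look for the file explicitly in a list of directories. `ConfigLocator` and its methods are hypothetical, and the choice of `HIVE_CONF_DIR` and `HIVE_HOME/conf` as fallback directories is an assumption for illustration, not a description of what the attached patches do.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class ConfigLocator {
    /** Returns the first regular file named fileName found in dirs, or null. */
    public static File findInDirs(String fileName, List<String> dirs) {
        for (String dir : dirs) {
            if (dir == null || dir.isEmpty()) {
                continue; // e.g. an environment variable that is not set
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    /** Tries the classloader first, then the explicit fallback directories. */
    public static URL locate(String fileName, ClassLoader loader, List<String> fallbackDirs) {
        URL onClasspath = loader.getResource(fileName);
        if (onClasspath != null) {
            return onClasspath;
        }
        File found = findInDirs(fileName, fallbackDirs);
        if (found == null) {
            return null;
        }
        try {
            return found.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException("Cannot convert to URL: " + found, e);
        }
    }

    public static void main(String[] args) {
        // Assumed fallback locations; adjust to the deployment at hand.
        List<String> dirs = new ArrayList<>();
        dirs.add(System.getenv("HIVE_CONF_DIR"));
        String hiveHome = System.getenv("HIVE_HOME");
        if (hiveHome != null) {
            dirs.add(hiveHome + File.separator + "conf");
        }
        System.out.println(locate("hive-site.xml", ClassLoader.getSystemClassLoader(), dirs));
    }
}
```

Looking for the file explicitly also means that a permission problem shows up when the file is actually opened, rather than being hidden behind a classloader that simply returns nothing.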





[jira] [Commented] (HIVE-13380) Decimal should have lower precedence than double in type hierachy

2016-06-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345027#comment-15345027
 ] 

Sergey Shelukhin commented on HIVE-13380:
-

Created HIVE-14077 to track

> Decimal should have lower precedence than double in type hierachy
> -
>
> Key: HIVE-13380
> URL: https://issues.apache.org/jira/browse/HIVE-13380
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-13380.2.patch, HIVE-13380.4.patch, 
> HIVE-13380.5.patch, HIVE-13380.patch, decimal_filter.q
>
>
> Currently it's the other way round. Also, decimal should be lower than float.





[jira] [Commented] (HIVE-14055) directSql - getting the number of partitions is broken

2016-06-22 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345051#comment-15345051
 ] 

Sergio Peña commented on HIVE-14055:


Agree. Let's fix this in another jira.

I took a look at the patch, and it looks good.
+1

Let's wait for HiveQA to verify the patch.

> directSql - getting the number of partitions is broken
> --
>
> Key: HIVE-14055
> URL: https://issues.apache.org/jira/browse/HIVE-14055
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14055.01.patch, HIVE-14055.02.patch, 
> HIVE-14055.patch
>
>
> Noticed while looking at something else. If the filter cannot be pushed down 
> it just returns 0




