[jira] [Resolved] (HIVE-15433) setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in hive configuration

2017-12-10 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov resolved HIVE-15433.
-
Resolution: Invalid

Resolved as invalid due to HIVE-16392

> setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in 
> hive configuration
> 
>
> Key: HIVE-15433
> URL: https://issues.apache.org/jira/browse/HIVE-15433
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 2.0.0
>Reporter: Alina Abramova
>Assignee: Vlad Gudikov
> Fix For: 3.0.0, 1.2.0
>
> Attachments: HIVE-15433-branch-1.2.patch, HIVE-15433.1.patch
>
>
> Setting hive.warehouse.subdir.inherit.perms in HIVE has no effect. It 
> always takes the default value from HiveConf unless you define it in 
> hive-site.xml.
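
For context, defining the property in hive-site.xml does take effect; a minimal fragment (the value shown is illustrative, not a recommendation):

```xml
<property>
  <name>hive.warehouse.subdir.inherit.perms</name>
  <value>true</value>
  <description>Whether warehouse subdirectories inherit the parent
  directory's permissions.</description>
</property>
```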



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15433) setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in hive configuration

2017-12-08 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-15433:

Fix Version/s: 3.0.0

> setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in 
> hive configuration
> 
>
> Key: HIVE-15433
> URL: https://issues.apache.org/jira/browse/HIVE-15433
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 2.0.0
>Reporter: Alina Abramova
>Assignee: Vlad Gudikov
> Fix For: 1.2.0, 3.0.0
>
> Attachments: HIVE-15433-branch-1.2.patch, HIVE-15433.1.patch
>
>
> Setting hive.warehouse.subdir.inherit.perms in HIVE has no effect. It 
> always takes the default value from HiveConf unless you define it in 
> hive-site.xml.





[jira] [Updated] (HIVE-15433) setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in hive configuration

2017-12-08 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-15433:

Status: In Progress  (was: Patch Available)

> setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in 
> hive configuration
> 
>
> Key: HIVE-15433
> URL: https://issues.apache.org/jira/browse/HIVE-15433
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.0.0, 1.2.0, 1.0.0
>Reporter: Alina Abramova
>Assignee: Vlad Gudikov
> Fix For: 3.0.0, 1.2.0
>
> Attachments: HIVE-15433-branch-1.2.patch, HIVE-15433.1.patch
>
>
> Setting hive.warehouse.subdir.inherit.perms in HIVE has no effect. It 
> always takes the default value from HiveConf unless you define it in 
> hive-site.xml.





[jira] [Assigned] (HIVE-15433) setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in hive configuration

2017-11-30 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov reassigned HIVE-15433:
---

Assignee: Vlad Gudikov  (was: Alina Abramova)

> setting hive.warehouse.subdir.inherit.perms in HIVE won't overwrite it in 
> hive configuration
> 
>
> Key: HIVE-15433
> URL: https://issues.apache.org/jira/browse/HIVE-15433
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 2.0.0
>Reporter: Alina Abramova
>Assignee: Vlad Gudikov
> Fix For: 1.2.0
>
> Attachments: HIVE-15433-branch-1.2.patch, HIVE-15433.1.patch
>
>
> Setting hive.warehouse.subdir.inherit.perms in HIVE has no effect. It 
> always takes the default value from HiveConf unless you define it in 
> hive-site.xml.





[jira] [Comment Edited] (HIVE-11309) Replace PidDailyRollingFileAppender with equivalent log4j2 implementation

2017-08-29 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145356#comment-16145356
 ] 

Vlad Gudikov edited comment on HIVE-11309 at 8/29/17 2:12 PM:
--

I don't think it works correctly. I am adding the pid to the file pattern:

{code}
appender.DRFA.filePattern = 
${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd-HH-mm}.%pid
{code}
When it's time to roll, it just takes the whole hive.log and renames it as described 
above. hive.log contains logs from different services, so the logs created on 
roll do not contain information about the processes they were created for. 


was (Author: allgoodok):
I think it doesn't work correctly. I am adding pid to file pattern.

{code}
appender.DRFA.filePattern = 
${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd-HH-mm}.%pid
{code}
When it's time to roll it just takes whole hive.log and renames it as described 
above. hive.log contains logs from different services so the logs create on 
roll are not containing information about processes they've been created for. 

> Replace PidDailyRollingFileAppender with equivalent log4j2 implementation
> -
>
> Key: HIVE-11309
> URL: https://issues.apache.org/jira/browse/HIVE-11309
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11309.patch
>
>
> PidDailyRollingFileAppender appends pid@hostname information to file name 
> output. Similar thing can be achieved by adding a custom file pattern 
> converter in log4j2.
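
As a sketch of how the quoted approach fits together, a log4j2 properties-format appender whose filePattern references a pid converter might look like this. The property names follow Hive's default DRFA appender, but the exact values here are illustrative, and the `%pid` converter itself is assumed to be registered separately as a custom pattern converter plugin:

```properties
# Illustrative log4j2 rolling appender; assumes a custom pattern
# converter registered under the key "pid" that emits the process id.
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${sys:hive.log.dir}/${sys:hive.log.file}
appender.DRFA.filePattern = ${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd-HH-mm}.%pid
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
```

Note that, as the comments above observe, this only changes the name of the rolled file: if several services write to the same hive.log, the rolled file still interleaves all of their output.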





[jira] [Comment Edited] (HIVE-11309) Replace PidDailyRollingFileAppender with equivalent log4j2 implementation

2017-08-29 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145356#comment-16145356
 ] 

Vlad Gudikov edited comment on HIVE-11309 at 8/29/17 2:12 PM:
--

I don't think it works correctly. I am adding the pid to the file pattern:

{code}
appender.DRFA.filePattern = 
${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd-HH-mm}.%pid
{code}
When it's time to roll, it just takes the whole hive.log and renames it as described 
above. hive.log contains logs from different services, so the logs created on 
roll do not contain information about the processes they were created for. 


was (Author: allgoodok):
I think it doesn't work correctly. I am adding pid to file pattern.

appender.DRFA.filePattern = 
${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd-HH-mm}.%pid

When it's time to roll it just takes whole hive.log and renames it as described 
above. hive.log contains logs from different services so the logs create on 
roll are not containing information about processes they've been created for. 

> Replace PidDailyRollingFileAppender with equivalent log4j2 implementation
> -
>
> Key: HIVE-11309
> URL: https://issues.apache.org/jira/browse/HIVE-11309
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11309.patch
>
>
> PidDailyRollingFileAppender appends pid@hostname information to file name 
> output. Similar thing can be achieved by adding a custom file pattern 
> converter in log4j2.





[jira] [Commented] (HIVE-11309) Replace PidDailyRollingFileAppender with equivalent log4j2 implementation

2017-08-29 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145356#comment-16145356
 ] 

Vlad Gudikov commented on HIVE-11309:
-

I don't think it works correctly. I am adding the pid to the file pattern:

appender.DRFA.filePattern = 
${sys:hive.log.dir}/${sys:hive.log.file}.%d{yyyy-MM-dd-HH-mm}.%pid

When it's time to roll, it just takes the whole hive.log and renames it as described 
above. hive.log contains logs from different services, so the logs created on 
roll do not contain information about the processes they were created for. 

> Replace PidDailyRollingFileAppender with equivalent log4j2 implementation
> -
>
> Key: HIVE-11309
> URL: https://issues.apache.org/jira/browse/HIVE-11309
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11309.patch
>
>
> PidDailyRollingFileAppender appends pid@hostname information to file name 
> output. Similar thing can be achieved by adding a custom file pattern 
> converter in log4j2.





[jira] [Commented] (HIVE-17346) TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] is failing every time

2017-08-17 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131009#comment-16131009
 ] 

Vlad Gudikov commented on HIVE-17346:
-

Yeah, it was an intended change; I missed this one. Thanks!

> TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning] is failing 
> every time
> ---
>
> Key: HIVE-17346
> URL: https://issues.apache.org/jira/browse/HIVE-17346
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-17346.patch
>
>
> The TestMiniSparkOnYarnCliDriver.testCliDriver - 
> spark_dynamic_partition_pruning is failing with this diff:
> {code}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing spark_dynamic_partition_pruning.q 
> 714c714
> <   filterExpr: ((date = '2008-04-08') and abs(((- 
> UDFToLong(concat(UDFToString(day(ds)), '0'))) + 10)) is not null) (type: 
> boolean)
> ---
> >   filterExpr: ((date = '2008-04-08') and ds is not null) 
> > (type: boolean)
> 717c717
> < predicate: ((date = '2008-04-08') and abs(((- 
> UDFToLong(concat(UDFToString(day(ds)), '0'))) + 10)) is not null) (type: 
> boolean)
> ---
> > predicate: ((date = '2008-04-08') and ds is not null) 
> > (type: boolean)
> 749c749
> <   filterExpr: abs(((- 
> UDFToLong(concat(UDFToString(day(ds)), '0'))) + 10)) is not null (type: 
> boolean)
> ---
> >   filterExpr: ds is not null (type: boolean)
> 751,752c751,753
> <   Filter Operator
> < predicate: abs(((- 
> UDFToLong(concat(UDFToString(day(ds)), '0'))) + 10)) is not null (type: 
> boolean)
> ---
> >   Select Operator
> > expressions: ds (type: string)
> > outputColumnNames: _col0
> 754,756c755,758
> < Select Operator
> <   expressions: ds (type: string)
> <   outputColumnNames: _col0
> ---
> > Reduce Output Operator
> >   key expressions: abs(((- 
> > UDFToLong(concat(UDFToString(day(_col0)), '0'))) + 10)) (type: bigint)
> >   sort order: +
> >   Map-reduce partition columns: abs(((- 
> > UDFToLong(concat(UDFToString(day(_col0)), '0'))) + 10)) (type: bigint)
> 758,762d759
> <   Reduce Output Operator
> < key expressions: abs(((- 
> UDFToLong(concat(UDFToString(day(_col0)), '0'))) + 10)) (type: bigint)
> < sort order: +
> < Map-reduce partition columns: abs(((- 
> UDFToLong(concat(UDFToString(day(_col0)), '0'))) + 10)) (type: bigint)
> < Statistics: Num rows: 2000 Data size: 21248 Basic 
> stats: COMPLETE Column stats: NONE
> 767c764
> <  
> Output was too long and had to be truncated...
> {code}
> I think it is caused by:
> HIVE-17148 - Incorrect result for Hive join query with COALESCE in WHERE 
> condition
> [~allgoodok]: Am I right? Is it an intended change and only the golden file 
> regeneration is needed?
> Thanks,
> Peter





[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-11 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.3.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, 
> HIVE-17148.3.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be caused by an incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all rows where 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.
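
For what it's worth, COALESCE(a1, b1) is non-null whenever *either* argument is non-null, so the strongest null-rejecting filter that can be soundly derived from the join key is (a1 is not null or b1 is not null); the conjunctive form in the plan is what drops the row. A small Python sketch of the semantics (illustrative, not Hive code):

```python
def coalesce(*args):
    """Return the first non-None argument, mirroring SQL COALESCE."""
    for a in args:
        if a is not None:
            return a
    return None

# Row from the reproduction: ct1 holds ('1', NULL), ct2 holds ('1',).
a1, b1, a2 = '1', None, '1'

# The join condition COALESCE(a1, b1) = a2 holds for this row:
assert coalesce(a1, b1) == a2

# The predicate pushed down in the plan, (a1 is not null and b1 is not
# null), wrongly rejects the row because b1 is NULL:
assert not (a1 is not None and b1 is not None)

# The sound null-rejecting inference is the disjunctive form:
assert a1 is not None or b1 is not None
```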





[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-09 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119531#comment-16119531
 ] 

Vlad Gudikov commented on HIVE-17148:
-

Uploaded a new patch with the tests fixed.

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be caused by an incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all rows where 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-09 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.2.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be caused by an incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all rows where 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-08 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119445#comment-16119445
 ] 

Vlad Gudikov commented on HIVE-17148:
-

[~ashutoshc] Today I will upload another patch with the related tests fixed.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be caused by an incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all rows where 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Open  (was: Patch Available)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be caused by an incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all rows where 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Patch Available  (was: Open)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be caused by an incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all rows where 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112242#comment-16112242
 ] 

Vlad Gudikov commented on HIVE-17148:
-

Added a patch with a test case.

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be caused by an incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all rows where 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Open  (was: Patch Available)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if the join is of inner type; otherwise HiveJoinAddNotNullRule, 
> which creates this problem, is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
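As an editorial aside, the reproduction steps and expected result quoted above can be sanity-checked outside Hive. A minimal sketch using Python's sqlite3 module (an illustration of the intended COALESCE join semantics only, not of Hive itself):

```python
import sqlite3

# Rebuild the reporter's tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table ct1 (a1 text, b1 text)")
cur.execute("create table ct2 (a2 text)")
cur.execute("insert into ct1 (a1, b1) values ('1', NULL)")
cur.execute("insert into ct2 (a2) values ('1')")

# COALESCE(a1, b1) evaluates to '1' even though b1 is NULL,
# so the row should survive the join.
rows = cur.execute(
    "select * from ct1 c1, ct2 c2 where COALESCE(a1, b1) = a2"
).fetchall()
print(rows)  # [('1', None, '1')] -- the expected "1  NULL  1" row
```

This matches the EXPECTED RESULT in the report; the Hive-2.1 plan discussed here returns nothing instead.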


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Patch Available  (was: Open)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-03 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.1.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104919#comment-16104919
 ] 

Vlad Gudikov edited comment on HIVE-17148 at 7/28/17 1:08 PM:
--

ROOT-CAUSE:
The problem was with the predicates that were created according to 
HiveJoinAddNotNullRule. This rule creates predicates from the fields that take 
part in the join filter, no matter whether these fields are used as parameters 
of functions or not.

SOLUTION:
Create the predicate based on the functions that take part in the filter as 
well as on the fields. The point is to check that the left part and the right 
part of the filter are not null, not just the fields that are part of the join 
filter. For example, given two tables *test1(a1 int, a2 int)* and *test2(b1)*, 
executing the query *select * from ct1 c1 inner join ct2 c2 on 
(COALESCE(a1,b1)=a2);* yields two predicates for the filter operator:
b1 is not null --- right part 
a1 is not null and a2 is not null -- left part

Applying the predicate to the left part of the join results in data loss, as we 
exclude rows with null fields. COALESCE is a good example of this case, as the 
main purpose of the COALESCE function is to get non-null values from tables. To 
fix the data loss we need to check that COALESCE won't bring us null values, as 
we can't join nulls. With my fix the left and right parts will look like:

b1 is not null -- right part (still checking fields for the null condition)
COALESCE(a1,a2) is not null -- checking that the whole function won't bring us 
null values

In the next patch I'm going to update the related failing tests with the fixed 
stage plans.
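The difference between the two predicate shapes described above can be sketched in plain Python (an illustration of the filtering behavior only, not the Hive/Calcite rule code):

```python
# The reporter's single-row table ct1: (a1, b1), with b1 NULL.
rows = [("1", None)]

def coalesce(*args):
    # First non-None argument, like SQL COALESCE.
    return next((a for a in args if a is not None), None)

# Old rule: per-column predicate "a1 is not null and b1 is not null"
# -- drops the row because b1 is NULL.
per_column = [r for r in rows if r[0] is not None and r[1] is not None]

# Fixed rule: whole-expression predicate "COALESCE(a1, b1) is not null"
# -- keeps the row, since COALESCE yields '1'.
whole_expr = [r for r in rows if coalesce(r[0], r[1]) is not None]

print(per_column)  # []
print(whole_expr)  # [('1', None)]
```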





> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>

[jira] [Comment Edited] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104919#comment-16104919
 ] 

Vlad Gudikov edited comment on HIVE-17148 at 7/28/17 1:07 PM:
--



> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>

[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104919#comment-16104919
 ] 

Vlad Gudikov commented on HIVE-17148:
-

ROOT-CAUSE:
The problem was with the predicates that were created according to 
HiveJoinAddNotNullRule. This rule creates predicates from the fields that take 
part in the join filter, no matter whether these fields are used as parameters 
of functions or not.

SOLUTION:
Create the predicate based on the functions that take part in the filter as 
well as on the fields. The point is to check that the left part and the right 
part of the filter are not null, not just the fields that are part of the join 
filter. For example, given two tables test1(a1 int, a2 int) and test2(b1), 
executing the query *select * from ct1 c1 inner join ct2 c2 on 
(COALESCE(a1,b1)=a2);* yields two predicates for the filter operator:
b1 is not null --- right part 
a1 is not null and a2 is not null -- left part

Applying the predicate to the left part of the join results in data loss, as we 
exclude rows with null fields. COALESCE is a good example of this case, as the 
main purpose of the COALESCE function is to get non-null values from tables. To 
fix the data loss we need to check that COALESCE won't bring us null values, as 
we can't join nulls. With my fix the left and right parts will look like:

b1 is not null -- right part (still checking fields on null condition)
COALESCE(a1,a2) is not null (checking that whole function won't bring us null 
values)

In the next patch I'm going to update the related failing tests with the fixed 
stage plans.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104534#comment-16104534
 ] 

Vlad Gudikov commented on HIVE-17148:
-

Related test failures are due to changes in the plan: we no longer create 
not-null conjunctions only for the fields that are in the filter, but for the 
expressions in the filter as well.

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-27 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Status: Patch Available  (was: Open)

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-27 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-27 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov reassigned HIVE-17148:
---

Assignee: Vlad Gudikov

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows is 
> any column mentioned in the COALESCE has null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.
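
To make the expected result concrete, here is a small Python sketch (an illustration, not Hive code) that simulates the inner join with the COALESCE predicate over the two one-row tables from the repro steps; it keeps the (a1='1', b1=NULL) row that the buggy plan filters out:

```python
# Simulates: select * from ct1 c1, ct2 c2 where COALESCE(a1, b1) = a2
# None stands in for SQL NULL.

def coalesce(*args):
    """Return the first non-NULL (non-None) argument, or None if all are NULL."""
    for arg in args:
        if arg is not None:
            return arg
    return None

ct1 = [("1", None)]  # (a1, b1) after: insert into table ct1 (a1) values ('1')
ct2 = [("1",)]       # (a2,)    after: insert into table ct2 (a2) values ('1')

# Inner join: a NULL comparison result (COALESCE(...) being NULL) must not
# match, hence the explicit "is not None" guard before the equality check.
result = [(a1, b1, a2)
          for (a1, b1) in ct1
          for (a2,) in ct2
          if coalesce(a1, b1) is not None and coalesce(a1, b1) == a2]

print(result)  # [('1', None, '1')], matching the EXPECTED RESULT above
```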



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-24 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Description: 
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL   1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows if 
any column mentioned in the COALESCE has a null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.

  was:
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL   1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows if 
any column mentioned in the COALESCE has a null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO 

[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-24 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Description: 
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL   1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows if 
any column mentioned in the COALESCE has a null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.

  was:
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL   1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows if 
any column mentioned in the COALESCE has a null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> 

[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-21 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096357#comment-16096357
 ] 

Vlad Gudikov commented on HIVE-17148:
-

While optimizing the query, HiveJoinAddNotRule adds a check that the values 
used in the filter are not null. COALESCE, however, is designed to handle null 
values, so tuples containing nulls are incorrectly omitted.
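
In other words, the not-null filter the rule derives is stronger than necessary. For COALESCE(a1, b1) = a2, the predicate only evaluates to NULL when *both* arguments are NULL, so the safe null-rejecting filter is an OR, not an AND. A minimal Python sketch (None standing in for SQL NULL):

```python
# The row from the repro: a1 = '1', b1 = NULL
a1, b1 = "1", None

# Filter the buggy plan pushes below the join (FIL_13 in the plan above):
and_filter = (a1 is not None) and (b1 is not None)

# Weakest filter that is still safe for COALESCE(a1, b1) = a2:
# COALESCE yields NULL only when every one of its arguments is NULL.
or_filter = (a1 is not None) or (b1 is not None)

print(and_filter)  # False: the row is wrongly dropped
print(or_filter)   # True: the row survives, as it should
```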

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL   1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-21 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Environment: (was: {color:red}colored text{color})

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL   1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.





[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-21 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Description: 
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL   1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows if 
any column mentioned in the COALESCE has a null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.

  was:
The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL   1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows if 
any column mentioned in the COALESCE has a null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
creates this problem is skipped.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO 

[jira] [Updated] (HIVE-16775) Fix HiveFilterAggregateTransposeRule when filter is always false

2017-07-20 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16775:

Description: 
query4.q,query74.q
{code}
[7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
{code}

  was:
query4.q,query74.q
{code}
[7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
{code}


> Fix HiveFilterAggregateTransposeRule when filter is always false
> 
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 3.0.0
>
> Attachments: HIVE-16775.01.patch, HIVE-16775.02.patch, 
> HIVE-16775.03.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}





[jira] [Commented] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error

2017-07-13 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085664#comment-16085664
 ] 

Vlad Gudikov commented on HIVE-16983:
-

Maybe it is reasonable to leave version 2.8.1 as is. I hadn't noticed that the 
master branch already has joda-time 2.8.1. My fault, I thought it was still 
2.5.

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-branch-2.1.patch
>
>
> I've followed various published documentation on integrating Apache Hive 
> 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` 
> and 
> `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and 
> `hive/conf/hive-site.xml`.
> I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` 
> to work properly (it returns s3 ls of that bucket). So I know my creds, 
> bucket access, and overall Hadoop setup is valid. 
> hdfs dfs -ls s3a://[bucket-name]/
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files
> ...etc. 
> hdfs dfs -ls s3a://[bucket-name]/files
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files/my-csv.csv
> However, when I attempt to access the same s3 resources from hive, e.g. run 
> any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION 
> 's3a://[bucket-name]/files/'`, it fails. 
> for example:
> >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, 
> >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED 
> >BY ',' LOCATION 's3a://[bucket-name]/files/';
> I keep getting this error:
> >FAILED: Execution Error, return code 1 from 
> >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus 
> >on s3a://[bucket-name]/files: 
> >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: 
> >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
> >C9CF3F9C50EF08D1), S3 Extended Request ID: 
> >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=)
> This makes no sense. I have access to the bucket as one can see in the hdfs 
> test. And I've added the proper creds to hive-site.xml. 
> Anyone have any idea what's missing from this equation?





[jira] [Commented] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error

2017-07-11 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081846#comment-16081846
 ] 

Vlad Gudikov commented on HIVE-16983:
-

Well, I've tested this one with s3a:// using Impala and Hive (storing the keys 
in core-site.xml). I also tested it with hadoop commands, passing the keys 
directly on the command line.
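
For anyone reproducing this, the two configuration routes mentioned above look roughly like the following; the property names are the standard s3a ones, and the values are placeholders, not real credentials. In core-site.xml (or hive-site.xml):

```xml
<!-- s3a credentials; placeholder values, substitute your own -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```

The per-command form should be something like `hadoop fs -D fs.s3a.access.key=... -D fs.s3a.secret.key=... -ls s3a://<bucket>/`.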

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-branch-2.1.patch
>
>
> I've followed various published documentation on integrating Apache Hive 
> 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` 
> and 
> `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and 
> `hive/conf/hive-site.xml`.
> I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` 
> to work properly (it returns s3 ls of that bucket). So I know my creds, 
> bucket access, and overall Hadoop setup is valid. 
> hdfs dfs -ls s3a://[bucket-name]/
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files
> ...etc. 
> hdfs dfs -ls s3a://[bucket-name]/files
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files/my-csv.csv
> However, when I attempt to access the same s3 resources from hive, e.g. run 
> any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION 
> 's3a://[bucket-name]/files/'`, it fails. 
> for example:
> >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, 
> >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED 
> >BY ',' LOCATION 's3a://[bucket-name]/files/';
> I keep getting this error:
> >FAILED: Execution Error, return code 1 from 
> >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus 
> >on s3a://[bucket-name]/files: 
> >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: 
> >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
> >C9CF3F9C50EF08D1), S3 Extended Request ID: 
> >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=)
> This makes no sense. I have access to the bucket as one can see in the hdfs 
> test. And I've added the proper creds to hive-site.xml. 
> Anyone have any idea what's missing from this equation?





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-10 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Status: Patch Available  (was: Open)

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-branch-2.1.patch
>
>
> I've followed various published documentation on integrating Apache Hive 
> 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` 
> and 
> `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and 
> `hive/conf/hive-site.xml`.
> I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` 
> to work properly (it returns s3 ls of that bucket). So I know my creds, 
> bucket access, and overall Hadoop setup is valid. 
> hdfs dfs -ls s3a://[bucket-name]/
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files
> ...etc. 
> hdfs dfs -ls s3a://[bucket-name]/files
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files/my-csv.csv
> However, when I attempt to access the same s3 resources from hive, e.g. run 
> any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION 
> 's3a://[bucket-name]/files/'`, it fails. 
> for example:
> >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, 
> >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED 
> >BY ',' LOCATION 's3a://[bucket-name]/files/';
> I keep getting this error:
> >FAILED: Execution Error, return code 1 from 
> >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus 
> >on s3a://[bucket-name]/files: 
> >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: 
> >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
> >C9CF3F9C50EF08D1), S3 Extended Request ID: 
> >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=)
> This makes no sense. I have access to the bucket as one can see in the hdfs 
> test. And I've added the proper creds to hive-site.xml. 
> Anyone have any idea what's missing from this equation?





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-10 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Status: Open  (was: Patch Available)

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-branch-2.1.patch
>
>
> I've followed various published documentation on integrating Apache Hive 
> 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` 
> and 
> `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and 
> `hive/conf/hive-site.xml`.
> I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` 
> to work properly (it returns an s3 ls of that bucket), so I know my creds, 
> bucket access, and overall Hadoop setup are valid. 
> hdfs dfs -ls s3a://[bucket-name]/
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files
> ...etc. 
> hdfs dfs -ls s3a://[bucket-name]/files
> 
> drwxrwxrwx   - hdfs hdfs  0 2017-06-27 22:43 
> s3a://[bucket-name]/files/my-csv.csv
> However, when I attempt to access the same s3 resources from hive, e.g. run 
> any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION 
> 's3a://[bucket-name]/files/'`, it fails. 
> for example:
> >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, 
> >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED 
> >BY ',' LOCATION 's3a://[bucket-name]/files/';
> I keep getting this error:
> >FAILED: Execution Error, return code 1 from 
> >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus 
> >on s3a://[bucket-name]/files: 
> >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: 
> >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
> >C9CF3F9C50EF08D1), S3 Extended Request ID: 
> >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=)
> This makes no sense. I have access to the bucket as one can see in the hdfs 
> test. And I've added the proper creds to hive-site.xml. 
> Anyone have any idea what's missing from this equation?
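For reference, the s3a credential properties the report mentions are typically set like this in core-site.xml (and mirrored in hive-site.xml); the values below are placeholders, not working credentials:

```xml
<!-- Sketch of the fs.s3a credential settings referenced above. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
```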





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-10 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Attachment: (was: HIVE-16983-brach-2.1.patch)

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
>





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-10 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Attachment: HIVE-16983-branch-2.1.patch

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-branch-2.1.patch
>
>





[jira] [Issue Comment Deleted] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code

2017-07-07 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Comment: was deleted

(was: Updating joda-time version to 2.9.9)

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-brach-2.1.patch
>
>





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-07 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Status: Patch Available  (was: Open)

The solution is to update the joda-time version from 2.5 to 2.9.9. Older 
joda-time releases format the GMT timestamp incorrectly on recent JDK 8 builds, 
so the AWS request signature is computed over a malformed Date header and S3 
rejects the call with 403 Forbidden.
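The change can be sketched as a Maven dependency pin. This is a hypothetical pom.xml fragment (it assumes the affected module declares or manages joda-time directly; the coordinates are the standard joda-time ones):

```xml
<!-- Pin joda-time to a release whose timestamp formatting
     works on current JDK 8 builds. -->
<dependency>
  <groupId>joda-time</groupId>
  <artifactId>joda-time</artifactId>
  <version>2.9.9</version>
</dependency>
```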

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-brach-2.1.patch
>
>





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-07 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Attachment: HIVE-16983-brach-2.1.patch

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
> Attachments: HIVE-16983-brach-2.1.patch
>
>





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-07 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

Status: Open  (was: Patch Available)

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
>





[jira] [Updated] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error C

2017-07-07 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16983:

   Fix Version/s: 2.1.1
Target Version/s: 2.1.1
  Status: Patch Available  (was: Open)

Updating the joda-time version to 2.9.9.

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
> Fix For: 2.1.1
>
>





[jira] [Assigned] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error

2017-07-07 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov reassigned HIVE-16983:
---

Assignee: Vlad Gudikov

> getFileStatus on accessible s3a://[bucket-name]/folder: throws 
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon 
> S3; Status Code: 403; Error Code: 403 Forbidden;
> -
>
> Key: HIVE-16983
> URL: https://issues.apache.org/jira/browse/HIVE-16983
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.1
> Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to 
> S3 using s3a:// protocol
>Reporter: Alex Baretto
>Assignee: Vlad Gudikov
>





[jira] [Commented] (HIVE-17014) Password File Encryption for HiveServer2 Client

2017-07-06 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076563#comment-16076563
 ] 

Vlad Gudikov commented on HIVE-17014:
-

Attached a document with possible ways to implement this feature. 

As [~lmccay] commented in [HIVE-17014]: "we may want to consider the use of 
the CredentialProvider API that will be committed soon.
See [HADOOP-10607]. This isn't mutually exclusive with the password file 
approach as there are plans to fallback to existing password files in certain 
components. However, the abstraction of the API is best realized through the 
new Configuration.getPassword(String name) method. This will allow you to ask 
for a configuration item that you know is a password and it will check for an 
aliased credential based on the name through the CredentialProvider API. If the 
name is not resolved into a credential from a provider then it falls back to 
the config file."

Would be happy to discuss this approach with other members.
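To make the lookup order described in the quote concrete, here is a minimal, self-contained Java sketch. The two maps and the property name are stand-ins (hypothetical) for the real CredentialProvider chain and hive-site.xml; the actual API is Hadoop's `Configuration.getPassword(String)`, which resolves an aliased credential first and only then falls back to the plain config value.

```java
import java.util.HashMap;
import java.util.Map;

public class PasswordLookupSketch {
    // Stand-in for credentials aliased through the CredentialProvider API.
    static final Map<String, char[]> PROVIDER_ALIASES = new HashMap<>();
    // Stand-in for plain-text values in a config file such as hive-site.xml.
    static final Map<String, String> CONFIG_FILE = new HashMap<>();

    // Mirrors the order described above: aliased credential wins,
    // otherwise fall back to the plain configuration value.
    static char[] getPassword(String name) {
        char[] fromProvider = PROVIDER_ALIASES.get(name);
        if (fromProvider != null) {
            return fromProvider;
        }
        String fallback = CONFIG_FILE.get(name);
        return fallback == null ? null : fallback.toCharArray();
    }

    public static void main(String[] args) {
        CONFIG_FILE.put("javax.jdo.option.ConnectionPassword", "plaintext");
        // No alias registered yet, so the config-file value is returned.
        System.out.println(new String(getPassword("javax.jdo.option.ConnectionPassword")));
        PROVIDER_ALIASES.put("javax.jdo.option.ConnectionPassword", "from-jceks".toCharArray());
        // The aliased credential now shadows the config-file value.
        System.out.println(new String(getPassword("javax.jdo.option.ConnectionPassword")));
    }
}
```

Running the sketch prints "plaintext" first and "from-jceks" second, illustrating why the two mechanisms are not mutually exclusive.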

> Password File Encryption for HiveServer2 Client
> ---
>
> Key: HIVE-17014
> URL: https://issues.apache.org/jira/browse/HIVE-17014
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Fix For: 2.1.2
>
> Attachments: PasswordFileEncryption.docx.pdf
>
>
> The main point of this issue is to encrypt the password file that is used for 
> a Beeline connection via the -w option. Any ideas or proposals would be great.





[jira] [Updated] (HIVE-17014) Password File Encryption for HiveServer2 Client

2017-07-06 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17014:

Attachment: PasswordFileEncryption.docx.pdf

This document describes possible ways of implementing the password file 
encryption feature.

> Password File Encryption for HiveServer2 Client
> ---
>
> Key: HIVE-17014
> URL: https://issues.apache.org/jira/browse/HIVE-17014
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Fix For: 2.1.2
>
> Attachments: PasswordFileEncryption.docx.pdf
>
>





[jira] [Updated] (HIVE-17014) Password File Encryption for HiveServer2 Client

2017-07-06 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17014:

Summary: Password File Encryption for HiveServer2 Client  (was: Password 
File Encryption)

> Password File Encryption for HiveServer2 Client
> ---
>
> Key: HIVE-17014
> URL: https://issues.apache.org/jira/browse/HIVE-17014
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vlad Gudikov
>





[jira] [Commented] (HIVE-16674) Hive metastore JVM dumps core

2017-05-23 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021278#comment-16021278
 ] 

Vlad Gudikov commented on HIVE-16674:
-

Are these test failures related to the fix?

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
> Attachments: HIVE-16674.1.patch, HIVE-16674.patch
>
>
> While trying to run a Hive query on 24 partitions of an external table with a 
> large number of partitions (4K+), I get an error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16674) Hive metastore JVM dumps core

2017-05-23 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16674:

Attachment: HIVE-16674.1.patch

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
> Attachments: HIVE-16674.1.patch, HIVE-16674.patch
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16674) Hive metastore JVM dumps core

2017-05-23 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16674:

Status: Patch Available  (was: In Progress)

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 2.3.0, 1.2.1
>
> Attachments: HIVE-16674.patch
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16674) Hive metastore JVM dumps core

2017-05-23 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-16674:

Attachment: HIVE-16674.patch

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
> Attachments: HIVE-16674.patch
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-16674) Hive metastore JVM dumps core

2017-05-23 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16674 started by Vlad Gudikov.
---
> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16674) Hive metastore JVM dumps core

2017-05-23 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov reassigned HIVE-16674:
---

Assignee: Vlad Gudikov

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16674) Hive metastore JVM dumps core

2017-05-18 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015781#comment-16015781
 ] 

Vlad Gudikov edited comment on HIVE-16674 at 5/18/17 2:00 PM:
--

Most of the RPC calls in the MetaStore carry fairly small payloads, but in this 
case we get more than 256 MB of data when calling the get_partitions method. 
This happens because we fetch all partition information, including column-level 
comments, which are duplicated for each partition. Do we actually need them 
when getting partitions? Here is the code where we fetch the column comments:

{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String) fields[2], (String) fields[3], (String) fields[1]));
    }
  });
}
{code}


was (Author: allgoodok):
Most of the rpc call in MetaStore are of fairly small payload. But in this case 
we get more than 256 mb of data  while calling get_partitions method. It is so 
due to getting all information about partitions including column level 
comments. Do we actually need them while getting partitions, because they are 
duplicated for each partition? Here is the code where we get column comments. 
Do we actuualy need them while getting information about partitions?

{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String) fields[2], (String) fields[3], (String) fields[1]));
    }
  });
}
{code}

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - 

[jira] [Comment Edited] (HIVE-16674) Hive metastore JVM dumps core

2017-05-18 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015781#comment-16015781
 ] 

Vlad Gudikov edited comment on HIVE-16674 at 5/18/17 1:57 PM:
--

Most of the RPC calls in the MetaStore carry fairly small payloads, but in this 
case we get more than 256 MB of data when calling the get_partitions method. 
This happens because we fetch all partition information, including column-level 
comments, which are duplicated for each partition. Do we actually need them 
when getting partitions? Here is the code where we fetch the column comments:

{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String) fields[2], (String) fields[3], (String) fields[1]));
    }
  });
}
{code}


was (Author: allgoodok):
Most of the rpc call in MetaStore are of fairly small payload. But in this case 
we get more than 256 mb of data  while calling get_partitions method. It is so 
due to getting all information about partitions including column level 
comments. Do we actually need them while getting partitions, because they are 
duplicated for each partition? Here is the code where we get column comments. 
Do we actuualy need them while getting information about partitions?

{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", {color:red}\"COMMENT\"{color}, \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String) fields[2], (String) fields[3], (String) fields[1]));
    }
  });
}
{code}

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - 

[jira] [Commented] (HIVE-16674) Hive metastore JVM dumps core

2017-05-18 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015781#comment-16015781
 ] 

Vlad Gudikov commented on HIVE-16674:
-

Most of the RPC calls in the MetaStore carry fairly small payloads, but in this 
case we get more than 256 MB of data when calling the get_partitions method. 
This happens because we fetch all partition information, including column-level 
comments, which are duplicated for each partition. Do we actually need them 
when getting partitions? Here is the code where we fetch the column comments:

{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", {color:red}\"COMMENT\"{color}, \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String) fields[2], (String) fields[3], (String) fields[1]));
    }
  });
}
{code}
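
As a back-of-the-envelope illustration of why repeating the column list (comments included) per partition blows up the response size — the class name, per-schema byte figure, and column count below are assumptions for illustration, not Hive code or measured values:

```java
public class PayloadEstimate {
    // Assumed average size of one serialized FieldSchema (name, type,
    // comment) on the wire; a rough illustrative figure, not measured.
    static final long BYTES_PER_FIELD_SCHEMA = 200;

    // Payload when every partition repeats the full column list,
    // comments included (the duplication described above).
    static long duplicated(int partitions, int columns) {
        return (long) partitions * columns * BYTES_PER_FIELD_SCHEMA;
    }

    // Payload if the column list were transferred once and shared
    // across partitions.
    static long shared(int columns) {
        return columns * BYTES_PER_FIELD_SCHEMA;
    }

    public static void main(String[] args) {
        int partitions = 4_000; // the "4K+" partitions from the report
        int columns = 500;      // assumed wide table
        System.out.println("duplicated: " + duplicated(partitions, columns) + " bytes");
        System.out.println("shared:     " + shared(columns) + " bytes");
    }
}
```

With these assumed numbers the duplicated payload is 400,000,000 bytes (~381 MB), already past the 256 MB figure quoted in the comment, while the shared column list would be only 100 KB.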

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 2.3.0
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)