[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847700#comment-13847700
 ] 

Xuefu Zhang commented on HIVE-6028:
---

If I understand correctly, the "workaround" mentioned is what a user is 
expected to do anyway. The partition key is of type string, so the constant 
values for the key should be string literals rather than integers. Further, Hive 
allows implicit conversions, such as the case demonstrated, which is common in 
many DBs. Disallowing that would be too restrictive and less usable.

In my opinion, users should be aware of the implicit conversion and its 
consequences. If not sure, match the type explicitly.

> Partition predicate literals are not interpreted correctly.
> ---
>
> Key: HIVE-6028
> URL: https://issues.apache.org/jira/browse/HIVE-6028
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Pala M Muthaia
> Attachments: Hive-6028-explain-plan.txt
>
>
> When parsing/analyzing a query, Hive treats the partition predicate value as 
> an int instead of a string. This leads to incorrect results when the 
> partition predicate value starts with 0, e.g. hour=00, hour=05 etc.
>
> The following repro illustrates the bug:
>
> -- create test table and partition, populate with some data
> create table test_partition_pred(col1 int) partitioned by (hour STRING);
> insert into table test_partition_pred partition (hour=00) select 21 FROM some_table limit 1;
> -- this query returns incorrect results, i.e. just an empty set.
> select * from test_partition_pred where hour=00;
> OK
> -- this query returns the correct result. Note the predicate value is a string literal
> select * from test_partition_pred where hour='00';
> OK
> 21      00
>
> The explain plan illustrates how the query was interpreted. In particular, the 
> partition predicate is pushed down as a regular filter clause, with hour=0 as 
> the predicate. See the attached explain plan file.
>
> Note:
> 1. The type of the partition column is defined as string, not int.
> 2. This is a regression in Hive 0.12. This used to work in Hive 0.11.
> 3. Not an issue when the partition value starts with a digit other than 0, 
> e.g. hour=10, hour=11 etc.
> 4. As seen above, the workaround is to use a string literal, hour='00' etc.
>
> This would not be too bad if, in the failing case, Hive complained that 
> partition hour=0 was not found, or that the literal type doesn't match the 
> column type. Instead, Hive silently pushes it down as a filter clause, and the 
> query succeeds with an empty set as the result.
>
> We found this out in our production tables partitioned by hour, only a few 
> days after it started occurring, when there were empty data sets for 
> partitions hour=00 to hour=09.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Pala M Muthaia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847924#comment-13847924
 ] 

Pala M Muthaia commented on HIVE-6028:
--

Xuefu, you are right that using string literal is the correct way. 

I suppose implicit conversion was previously supported in some sense, because 
the literal 01 was actually treated as a string. However, in 0.12, even though 
the hour partition column is STRING, when I specify hour=01, the literal 01 is 
not implicitly converted into a string, but is instead treated as an int.

I dug into this a bit more and suspect that this change in behavior is 
related to the commit for HIVE-2702, which adds support for integral partition 
columns. Previously, all partition filter value literals were always treated as 
strings. Now, both integral and string types are supported.
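
To make the suspected change concrete, here is a minimal Python sketch of the 
two ways of handling a filter literal described above. The helper names 
legacy_filter_value and typed_filter_value are hypothetical and are not Hive 
functions; the sketch only models the behavior as described in this thread, 
not the actual HIVE-2702 code.

```python
# Hypothetical helpers for illustration only; these are not Hive APIs.

def legacy_filter_value(token: str) -> str:
    """Pre-HIVE-2702 behavior as described: keep the literal's text as a string."""
    return token.strip("'")


def typed_filter_value(token: str) -> str:
    """Post-HIVE-2702 behavior as described: an unquoted numeric token is an int."""
    if token.startswith("'") and token.endswith("'"):
        return token[1:-1]          # quoted: stays a string
    return str(int(token))          # unquoted: parsed as int, leading zeros lost


# Partition values following the hour=00 .. hour=23 convention.
hours = [f"{h:02d}" for h in range(24)]

# Every hour survives the legacy handling unchanged.
assert all(legacy_filter_value(h) == h for h in hours)

# With typed handling, only the zero-padded hours change and so stop matching.
broken = [h for h in hours if typed_filter_value(h) != h]
print(broken)
```

Under these assumptions the affected values are exactly 00 through 09, which 
is consistent with the empty partitions reported in the issue description, 
while quoting the literal ('00') leaves the value intact.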

I think the best fix would be to support the implicit conversion behavior 
again. At the very least, a type check should be performed and an error thrown, 
so that when a user specifies hour=01, the query fails and the user can fix it, 
though this would be a more disruptive change for end users.





[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847992#comment-13847992
 ] 

Xuefu Zhang commented on HIVE-6028:
---

[~pala] Thanks for the clarification. It seems there is some issue with data 
conversion. Your partition column has a type of string, so, if implicit 
conversion is due, the integer literals should be converted to strings. It's 
wrong to treat your partition column as an integer type, if that's what's 
happening. Without further investigation, it's hard to draw a conclusion yet. 
Throwing a semantic error for a case like this is probably something Hive 
shouldn't do, as hr=02 is a valid expression because the data types can be 
implicitly converted.

This seems to be a bug somewhere, which needs more research. If you'd like to 
work on this, please feel free to submit a patch. Otherwise, as a workaround, 
you may just use string literals for now.



[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848003#comment-13848003
 ] 

Sergey Shelukhin commented on HIVE-6028:


Hmm, I cannot repro this on trunk; a Filter is not added to the plan for either 
"'00'" or just "00" (w/o quotes).
I think Filter might actually be the culprit, rather than partition filtering. 
Let me try something...





[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848009#comment-13848009
 ] 

Sergey Shelukhin commented on HIVE-6028:


No, the filter works (on trunk) with a non-partition string column. It has the 
same condition (expr: (col1 = 1)), but it matches 01.
[~pala] can you attach debug logs from the client and the metastore?



[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848079#comment-13848079
 ] 

Sergey Shelukhin commented on HIVE-6028:


Yeah, never mind, I built 0.12 and can repro. The root cause seems to be that 
0.12 does validation on the client and apparently doesn't check that the types 
are compatible, only that both are supported. Then it uses 
getPartitionsByFilter to query partitions, and the filter being sent 
(expression.getExprString) is changed to "hour = 0", which fails to match.
Of course, you could say that SemanticAnalyzer should have caught this earlier: 
what is 00? It doesn't look to me like it should be treated as a valid string.
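
The gap between "both types are supported" and "the types are compatible" can 
be sketched in a few lines of Python. Everything here is hypothetical: the 
function names, the set of supported types, and the exact-match lookup are 
stand-ins for illustration and do not reproduce the Hive client or metastore 
code.

```python
# Illustrative stand-ins only; none of these names exist in Hive.
SUPPORTED_TYPES = {"string", "int"}  # assumed set, for illustration


def both_supported(column_type: str, literal_type: str) -> bool:
    """The check as described for 0.12: each type is acceptable on its own."""
    return column_type in SUPPORTED_TYPES and literal_type in SUPPORTED_TYPES


def compatible(column_type: str, literal_type: str) -> bool:
    """A stricter check: the literal must have the column's type."""
    return column_type == literal_type


def partitions_matching(partition_values, filter_value):
    """Stand-in for the partition lookup: exact string match on the value."""
    return [p for p in partition_values if p == filter_value]


partitions = [f"{h:02d}" for h in range(24)]

# hour=00 against a string column: the literal is an int, sent as "0".
print(both_supported("string", "int"))       # True: the weak check passes
print(compatible("string", "int"))           # False: a strict check would object
print(partitions_matching(partitions, "0"))  # []: no partition is named "0"
print(partitions_matching(partitions, "00")) # ['00']: the quoted literal matches
```

In this simplified model the query is accepted by the weak check and then 
silently returns nothing, which mirrors the symptom reported in the issue.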

Trunk uses getPartitionsByExpr, and validates on the server that the column and 
value types are compatible. As far as I understand, Hive doesn't do 
sub-releases; how should this jira be handled, closed as already fixed in 0.13? 
Or do we want to change SemanticAnalyzer to be less lenient?





[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Pala M Muthaia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848120#comment-13848120
 ] 

Pala M Muthaia commented on HIVE-6028:
--

Thanks for following up, Sergey Shelukhin. Regarding 00, it's just a convention 
we follow in partition names, hour=00 to hour=23. 

Is there a commit in trunk that we could cherry pick and apply on Hive 0.12 to 
fix this issue?



[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848123#comment-13848123
 ] 

Sergey Shelukhin commented on HIVE-6028:


Wrt 00, my point is that 00 is not a valid string, so it should not work.
Depending on opinion, it should either give an error (string-int compare), or 
not match (as it doesn't in 0.12) because the column type is string, and 00 to 
me is the integer 0, so if the literal is coerced to the type of the column, 
'00' != '0'. 
Can you work around this by using properly quoted strings, e.g. '00'?
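
The outcome depends on which side of the comparison is coerced. The Python 
sketch below contrasts the two directions; the function names are 
hypothetical, and the assumption that a regular (non-partition) filter 
compares numerically is only one possible explanation for the earlier 
observation in this thread that (col1 = 1) matched 01. It is not confirmed 
here.

```python
def equal_as_strings(column_value: str, literal: int) -> bool:
    """Coerce the int literal to the column's type (string), then compare."""
    return column_value == str(literal)


def equal_as_numbers(column_value: str, literal: int) -> bool:
    """Coerce the string column value to a number, then compare (assumed behavior)."""
    try:
        return float(column_value) == float(literal)
    except ValueError:
        return False  # a non-numeric string cannot equal a number


# The literal 00 is the integer 0; the stored partition value is the string "00".
print(equal_as_strings("00", 0))  # False: "00" is not "0"
print(equal_as_numbers("00", 0))  # True: both are numerically zero
print(equal_as_numbers("01", 1))  # True: consistent with (col1 = 1) matching 01
```

Either direction is defensible on its own; the surprise comes from the 
partition path and the regular filter path not agreeing on one of them.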

As for the commit, that would be HIVE-4914, but it's quite a large one, and it 
may have some dependencies and bugfixes.




[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848125#comment-13848125
 ] 

Sergey Shelukhin commented on HIVE-6028:


E.g. postgres:

{noformat}
sergey=# select * from foo where s = 01;
ERROR:  operator does not exist: character varying = integer
LINE 1: select * from foo where s = 01;
                                  ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
{noformat}





[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-17 Thread Pala M Muthaia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850984#comment-13850984
 ] 

Pala M Muthaia commented on HIVE-6028:
--

Sergey, the same thing above works in Hive 0.12 for a regular string column (as 
opposed to a partition column). 

In any case, given the cost of the fix vs. the severity, we will avoid 
depending on type coercion and use proper literals.



[jira] [Commented] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851006#comment-13851006
 ] 

Sergey Shelukhin commented on HIVE-6028:


Yeah, I agree that this is a breakage in 0.12 compared to 0.11. Sorry for that. 
Good to know that the workaround works.

I will resolve this as a dup of HIVE-4914, as the fix is contained therein.
