[ 
https://issues.apache.org/jira/browse/KUDU-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451541#comment-16451541
 ] 

Grant Henke commented on KUDU-2416:
-----------------------------------

A quick look at PartialRow.setMin shows it is used in 3 places:
 * *_PartialRow.decodeRangePartitionKey_*
 ** When the buffer being decoded doesn't contain the column
 ** This is used in _Partition.formatRangePartition_, which is only used by 
Impala and should always be given a buffer that contains all the columns
 * *_PartitionPruner.pushPredsIntoLowerBoundRangeKey_* and 
*_PartitionPruner.pushPredsIntoUpperBoundRangeKey_*
 ** When the column is part of the range partition but not a part of the 
predicate
 ** These are only used in _PartitionPruner.create_, which is used in the 
scanner APIs

In all of these cases, when the DECIMAL case falls through and addStringUtf8 
is called, the check of the expected column type throws an 
IllegalArgumentException. This is not great, but it's better than returning 
the wrong rows. 
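
To make the fallthrough concrete, here is a minimal sketch of the control flow. It does not use Kudu's classes: FallthroughDemo, its Type enum, and the recorded call names are hypothetical stand-ins that only log which "add" method a setMin-style switch would invoke.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the switch in PartialRow.setMin; not Kudu code.
class FallthroughDemo {
    enum Type { DECIMAL, STRING }

    // Mirrors the reported shape: the DECIMAL case has no break.
    static List<String> buggySetMin(Type type) {
        List<String> calls = new ArrayList<>();
        switch (type) {
            case DECIMAL:
                calls.add("addDecimal");
                // No break here, so execution continues into the STRING case.
            case STRING:
                calls.add("addStringUtf8");
                break;
        }
        return calls;
    }

    // Same switch with the break added after the DECIMAL case.
    static List<String> fixedSetMin(Type type) {
        List<String> calls = new ArrayList<>();
        switch (type) {
            case DECIMAL:
                calls.add("addDecimal");
                break;
            case STRING:
                calls.add("addStringUtf8");
                break;
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(buggySetMin(Type.DECIMAL)); // [addDecimal, addStringUtf8]
        System.out.println(fixedSetMin(Type.DECIMAL)); // [addDecimal]
    }
}
```

In the sketch the second call is merely recorded. In the real code that second call is the addStringUtf8 on a DECIMAL column, which is where the type check throws.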

However, when adding a unit test for PartialRow.setMin I found that since 1.0.0 
([4fd0572|https://github.com/apache/kudu/commit/4fd0572]) we have been setting 
Integer.MIN_VALUE on LONG and UNIXTIME_MICROS/TIMESTAMP columns. This is likely 
a much bigger issue, given that it could result in silently pruning the wrong 
partitions when a LONG column is part of the range partition but not part of 
the predicate. I need to look at the _PartitionPruner.create_ code more to 
understand the true impact. 
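
A small sketch of why Integer.MIN_VALUE is not a safe minimum for a 64-bit column follows. It shows only the arithmetic, not the actual behavior of the pruner, which still needs investigation. MinValueDemo and belowLowerBound are hypothetical names used for illustration.

```java
// Hypothetical illustration; does not call into Kudu.
class MinValueDemo {
    // True if a row value sorts below the lower bound, i.e. outside a range
    // that is assumed to start at that bound.
    static boolean belowLowerBound(long value, long lowerBound) {
        return value < lowerBound;
    }

    public static void main(String[] args) {
        long wrongMin = Integer.MIN_VALUE; // widened to -2147483648L
        long rightMin = Long.MIN_VALUE;    // true minimum of a 64-bit column
        long rowValue = -3_000_000_000L;   // a valid 64-bit value

        System.out.println(belowLowerBound(rowValue, wrongMin)); // true
        System.out.println(belowLowerBound(rowValue, rightMin)); // false
    }
}
```

Any value in the interval [Long.MIN_VALUE, Integer.MIN_VALUE) sorts below the value that was meant to be the minimum, which is the condition under which a lower bound built from it could exclude valid rows.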

I will post this patch with this test and then investigate further.

 

> Incorrect fallthrough in Java PartialRow.setMin for DECIMAL types
> -----------------------------------------------------------------
>
>                 Key: KUDU-2416
>                 URL: https://issues.apache.org/jira/browse/KUDU-2416
>             Project: Kudu
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.0
>            Reporter: Todd Lipcon
>            Assignee: Grant Henke
>            Priority: Critical
>
> There's a missing 'break' statement in the following code:
> {code:java}
>       case DECIMAL:
>         ColumnTypeAttributes typeAttributes = column.getTypeAttributes();
>         addDecimal(index,
>             DecimalUtil.minValue(typeAttributes.getPrecision(), typeAttributes.getScale()));
>       case STRING:
>         addStringUtf8(index, AsyncKuduClient.EMPTY_ARRAY);
>         break;
> {code}
> which I think could cause incorrect results for range partition pruning on 
> decimal columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
