[jira] [Comment Edited] (PHOENIX-4324) Immutable indexes are invalid with null fixed width indexed column values

James Taylor (JIRA) Tue, 14 Nov 2017 10:28:29 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251880#comment-16251880
 ]


James Taylor edited comment on PHOENIX-4324 at 11/14/17 6:27 PM:
-----------------------------------------------------------------

This has the potential to break functionality that uses the meta data in the 
SYSTEM.CATALOG to interpret the row key. The fundamental issue is that the data 
type use by the IndexMaintainer does not match the "source of truth" type in 
the system catalog. The IndexMaintainer should translate the data type using 
IndexUtil.getIndexColumnDataType(). It's unclear why this is happening for 
mutable but not immutable indexes. 

The reason we do this translation from fixed width type to variable width type 
is so that we can handle null correctly. When a KeyValue column is indexed, it 
can be absent in the data table row and thus have a null value for the column 
in the row key for the index row. Fixed width types do not have a way to 
represent null, while variable length types do - any empty byte represents null 
for all variable length types.

An example would be as follows:
{code}
CREATE IMMUTABLE TABLE T (K VARCHAR PRIMARY KEY, V1 SMALLINT);
CREATE INDEX IDX ON T(V1);
{code}
The expected row key structure of the index would be DECIMAL+VARCHAR, but 
instead the IndexMaintainer ends up writing the data as SMALLINT+VARCHAR. Since 
DECIMAL is variable length, the index row key can lead with a \0 byte for rows 
in the data table that don't have a V1 value. As a SMALLINT, I'm not entirely 
sure what would be used - maybe 2 \0 bytes.

A simple query that uses the index would still work because we always coerce 
the index data type to the data table type. In this case, due to this bug, the 
coerce would be a noop and it appears to work. 

I suspect the following types of things would not function correctly:
- queries that use IS NULL will not always work, depending on the value of V1. 
I suspect negative numbers would be problematic. Definitely a value of 
Short.MIN_VALUE would be a problem.
- queries that compute some expression with V1 instead of just selecting it 
would probably not work. For example, SELECT V1 + 5 FROM T WHERE V1 > 100.
- query more functionality that uses the index may not function correctly (as I 
believe query more relies on the data type from the index table row in system 
catalog).

There are two ways I can think of to fix the problem (and both need to be done 
IMHO):
# Support declaring non PK columns as NOT NULL for immutable tables 
(PHOENIX-2566) and modify the index meta data to use the same type as the data 
table column type. If a column is NOT NULL, then we can keep the same data type 
between the index and data table. This solution is attractive because we do not 
need to recreate the index as it's purely modifying the meta data.
# Fix the IndexMaintainer code to do the correct data type adjustment for 
immutable tables and disable (or asynchronously rebuild) all immutable indexes. 
We need to do this in a b/w compatible manner, though, so we'd need to continue 
to generate the "incorrect" row key schema in the IndexMaintainer until the 
index gets rebuilt. We could use a dynamic column in the index table header row 
in system catalog to track this.


was (Author: jamestaylor):
This has the potential to break functionality that uses the meta data in the 
SYSTEM.CATALOG to interpret the row key. The fundamental issue is that the data 
type use by the IndexMaintainer does not match the "source of truth" type in 
the system catalog. The IndexMaintainer should translate the data type using 
IndexUtil.getIndexColumnDataType(). It's unclear why this is happening for 
mutable but not immutable indexes. 

The reason we do this translation from fixed width type to variable width type 
is so that we can handle null correctly. When a KeyValue column is indexed, it 
can be absent in the data table row and thus have a null value for the column 
in the row key for the index row. Fixed width types do not have a way to 
represent null, while variable length types do - any empty byte represents null 
for all variable length types.

An example would be as follows:
{code}
CREATE IMMUTABLE TABLE T (K VARCHAR PRIMARY KEY, V1 SMALLINT);
CREATE INDEX IDX ON T(V1);
{code}
The expected row key structure of the index would be DECIMAL+VARCHAR, but 
instead the IndexMaintainer ends up writing the data as SMALLINT+VARCHAR. Since 
DECIMAL is variable length, the index row key can lead with a \0 byte for rows 
in the data table that don't have a V1 value. As a SMALLINT, I'm not entirely 
sure what would be used - maybe 2 \0 bytes.

A simple query that uses the index would still work because we always coerce 
the index data type to the data table type. In this case, due to this bug, the 
coerce would be a noop and it appears to work. 

I suspect the following types of things would not function correctly:
- queries that use IS NULL will not always work, depending on the value of V1. 
I suspect negative numbers would be problematic. Definitely a value of 
Short.MIN_VALUE would be a problem.
- queries that compute some expression with V1 instead of just selecting it 
would probably not work. For example, SELECT V1 + 5 FROM T WHERE V1 > 100.
- query more functionality that uses the index may not function correctly (as I 
believe query more relies on the data type from the index table row in system 
catalog).

> Immutable indexes are invalid with null fixed width indexed column values
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-4324
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4324
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> The data conversion for fixed width column types is not being done for 
> immutable indexes leading to erroneous results when indexed columns are 
> absent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Comment Edited] (PHOENIX-4324) Immutable indexes are invalid with null fixed width indexed column values

Reply via email to