Gabriel Reid created PHOENIX-1227:
-------------------------------------

             Summary: Upsert select of binary data doesn't always correctly coerce data into correct format
                 Key: PHOENIX-1227
                 URL: https://issues.apache.org/jira/browse/PHOENIX-1227
             Project: Phoenix
          Issue Type: Bug
            Reporter: Gabriel Reid


If you run an upsert select statement that selects a binary value and writes it 
into a numeric column (and probably columns of other types as well), you can end 
up with invalid binary values stored in HBase.

For example, in a statement like the following, where v is an {{INTEGER}} column:
{code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code}

the raw 16-byte binary values returned by the MD5 function are written verbatim 
into column v.

This is a serious problem if v is the primary key column, as it can even lead to 
multiple rows whose keys appear to have the same value. This happens when several 
(invalid) row keys begin with the same 4 bytes: only the first 4 bytes of the key 
are shown when selecting data from the column, but the differing full-length row 
keys still produce separate records.
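To illustrate the duplicate-key effect, here is a minimal Python sketch. It assumes that an {{INTEGER}} is stored as 4 big-endian bytes with the sign bit flipped (our understanding of Phoenix's encoding) and that the reader decodes only the first 4 bytes of whatever is stored; {{encode_int}} and {{decode_int}} are hypothetical helpers, not Phoenix APIs.

{code:python}
import hashlib
import struct

INT_WIDTH = 4  # an INTEGER value occupies 4 bytes

def encode_int(value: int) -> bytes:
    # Assumed encoding: 4-byte big-endian two's complement with the sign
    # bit flipped, so unsigned byte order matches numeric order.
    raw = bytearray(struct.pack(">i", value))
    raw[0] ^= 0x80
    return bytes(raw)

def decode_int(stored: bytes) -> int:
    # Reads only the first 4 bytes; any trailing bytes are silently ignored,
    # matching the behavior described for oversized values.
    raw = bytearray(stored[:INT_WIDTH])
    raw[0] ^= 0x80
    return struct.unpack(">i", bytes(raw))[0]

# A 16-byte MD5 digest is four times wider than an INTEGER.
digest = hashlib.md5(b"example").digest()
assert len(digest) == 16

# Two invalid 16-byte row keys that share their first 4 bytes.
key_a = encode_int(1) + b"\x00" * 12
key_b = encode_int(1) + b"\xff" * 12

print(key_a != key_b)                        # True: two distinct rows
print(decode_int(key_a), decode_int(key_b))  # 1 1: both show the same v
{code}

The two keys differ as raw bytes, so HBase keeps two separate rows, yet both decode to the same value of v.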

Somewhat related to this, a statement like the following (with a constant 
binary value) fails immediately with a datatype mismatch:
{code}UPSERT INTO MYTABLE (v) SELECT MD5(1) FROM MYTABLE{code}

It seems that the first statement above should probably fail in the same way 
as the statement with the constant binary value (or neither of them should 
fail). Either way, invalid values should never be written into HBase.
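One way to get consistent behavior is to make the type check depend only on the declared type of the selected expression, never on whether it is constant. The Python sketch below is hypothetical: {{Expr}}, {{COERCIBLE}}, {{check_upsert_select}} and the coercion table are illustrative stand-ins rather than Phoenix's actual compiler, and it assumes MD5 returns a {{BINARY}} value.

{code:python}
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    sql: str           # expression text, e.g. "MD5(v)"
    data_type: str     # declared result type, e.g. "BINARY"
    is_constant: bool  # True for MD5(1), False for MD5(v)

# Hypothetical coercion rules: target column type -> accepted source types.
COERCIBLE = {
    "INTEGER": {"INTEGER"},
    "BIGINT": {"INTEGER", "BIGINT"},
    "BINARY": {"BINARY"},
}

class TypeMismatchError(Exception):
    pass

def check_upsert_select(expr: Expr, target_type: str) -> None:
    # Only the declared type is consulted; expr.is_constant is intentionally
    # ignored so constant and non-constant expressions behave the same.
    if expr.data_type not in COERCIBLE.get(target_type, set()):
        raise TypeMismatchError(
            f"cannot write {expr.data_type} expression {expr.sql} "
            f"into {target_type} column")

def is_rejected(expr: Expr, target_type: str) -> bool:
    # Convenience wrapper: True if the check raises a type mismatch.
    try:
        check_upsert_select(expr, target_type)
    except TypeMismatchError:
        return True
    return False

print(is_rejected(Expr("MD5(v)", "BINARY", False), "INTEGER"))  # True
print(is_rejected(Expr("MD5(1)", "BINARY", True), "INTEGER"))   # True
print(is_rejected(Expr("v", "INTEGER", False), "INTEGER"))      # False
{code}

Under this rule both statements from the description are rejected up front, which corresponds to the first of the two consistent options.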



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
