Gabriel Reid created PHOENIX-1227:
-------------------------------------
Summary: Upsert select of binary data doesn't always correctly
coerce data into correct format
Key: PHOENIX-1227
URL: https://issues.apache.org/jira/browse/PHOENIX-1227
Project: Phoenix
Issue Type: Bug
Reporter: Gabriel Reid
If you run an UPSERT SELECT statement that selects a binary value and writes it
into a numeric column (and probably columns of other types as well), you can end
up with invalid binary values stored in HBase.
For example, if v is an {{INTEGER}} column, in a statement like this:
{code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code}
the literal 16-byte binary values from the MD5 function will be added verbatim
into the field v.
This is especially serious if v is the key field, as it can lead to multiple
rows whose keys appear to have the same value. This happens when multiple
(invalid) row keys begin with the same 4 bytes: only the first 4 bytes of the
key are shown when selecting data from the column, but the differing
full-length row keys are stored as separate records.
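The duplicate-looking keys can be illustrated outside of Phoenix with a small Python sketch. It assumes an {{INTEGER}} is stored as 4 big-endian bytes with the sign bit flipped, and that a reader decodes only the first 4 bytes of whatever is stored. The helper name, the sample keys, and the dict standing in for an HBase table are all hypothetical:

```python
import struct

def decode_as_integer(stored: bytes) -> int:
    """Decode only the first 4 bytes of a stored value as a signed 32-bit int.

    Assumption: big-endian layout with the sign bit flipped; any bytes
    beyond the first 4 are silently ignored by the reader.
    """
    (raw,) = struct.unpack(">I", stored[:4])
    value = raw ^ 0x80000000  # undo the sign-bit flip
    # Reinterpret the unsigned result as a signed 32-bit integer.
    return value - (1 << 32) if value >= (1 << 31) else value

# Two hypothetical 16-byte row keys (the size of an MD5 digest) that share
# the same first 4 bytes but differ in the remaining 12.
prefix = bytes.fromhex("80000001")
key_a = prefix + bytes(12)
key_b = prefix + b"\xff" * 12

# A dict stands in for the table: distinct byte keys are distinct rows.
rows = {key_a: "row A", key_b: "row B"}
displayed = [decode_as_integer(key) for key in rows]

print(len(rows), displayed)  # prints: 2 [1, 1]
```

Both rows display the key value 1 even though their stored row keys differ, which matches the symptom described above.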
Somewhat related to this, a statement like the following (with a constant
binary value) will fail immediately due to datatype mismatch:
{code}UPSERT INTO MYTABLE (v) SELECT MD5(1) FROM MYTABLE{code}
It seems that the first statement above should probably fail in the same way
as the statement with the constant binary value (or neither of them should
fail). In either case, invalid values should never be written into HBase.