[ 
https://issues.apache.org/jira/browse/PHOENIX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117344#comment-14117344
 ] 

Gabriel Reid commented on PHOENIX-1227:
---------------------------------------

[~jamestaylor] do you have an opinion on the best way to approach this? I've 
looked at it from a few different angles, and to me the approach that makes 
the most sense is simply to disallow {code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM 
MYTABLE{code} with a datatype mismatch error.

> Upsert select of binary data doesn't always correctly coerce data into 
> correct format
> -------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1227
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1227
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>
> If you run an upsert select statement that selects a binary value and writes 
> it into a numeric column (and probably columns of other types as well), you 
> can end up with invalid binary values stored in HBase.
> For example, in a statement like the following, where v is an {{INTEGER}} column:
> {code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code}
> the literal 16-byte binary values from the MD5 function will be written 
> verbatim into the field v.
> This is a serious problem if v is the key field, as it can even lead to 
> multiple keys that appear to have the same value. This happens when there 
> are multiple (invalid) row keys that begin with the same 4 bytes: only the 
> first 4 bytes of the key are shown when selecting data from the column, 
> but the differing full-length values of the row keys lead to multiple 
> records.
> Somewhat related to this, a statement like the following (with a constant 
> binary value) will fail immediately due to datatype mismatch:
> {code}UPSERT INTO MYTABLE (v) SELECT MD5(1) FROM MYTABLE{code}
> It seems that the first statement above should probably fail in the same way 
> as the statement with the constant binary value (or neither of them should 
> fail). In either case, no invalid values should be written into 
> HBase.
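
The duplicate-looking keys described above can be reproduced outside of Phoenix with a small Python sketch. It is only an illustration under stated assumptions: it assumes an INTEGER is serialized as 4 big-endian bytes with the sign bit flipped, the helper names (decode_int_prefix, is_valid_int_bytes) are hypothetical and not part of Phoenix, and the MD5 input is an arbitrary byte string rather than whatever Phoenix feeds to MD5(v).

```python
import hashlib
import struct

INT_WIDTH = 4  # assumed serialized width of an INTEGER column, in bytes


def decode_int_prefix(raw: bytes) -> int:
    """Decode only the first 4 bytes, as a fixed-width INTEGER reader would.

    Assumption: big-endian encoding with the sign bit flipped.
    """
    (unsigned,) = struct.unpack(">I", raw[:INT_WIDTH])
    flipped = unsigned ^ 0x80000000
    # Reinterpret the 32-bit pattern as a signed value.
    return flipped - (1 << 32) if flipped >= (1 << 31) else flipped


def is_valid_int_bytes(raw: bytes) -> bool:
    """Hypothetical guard: a value is only a valid INTEGER if it is 4 bytes."""
    return len(raw) == INT_WIDTH


# A 16-byte MD5 digest standing in for the output of MD5(v).
digest = hashlib.md5(b"1").digest()

# Two different 16-byte "row keys" that share the same first 4 bytes.
key_a = digest
key_b = digest[:INT_WIDTH] + bytes(12)

print(key_a != key_b)                                        # distinct keys
print(decode_int_prefix(key_a) == decode_int_prefix(key_b))  # same displayed value
print(is_valid_int_bytes(key_a))                             # not a valid INTEGER
```

Both keys decode to the same integer even though they are distinct byte strings, which is the "multiple keys with the same value" symptom. A length check like the hypothetical guard would be one way to reject such values before they are written.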



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
