[jira] [Commented] (PHOENIX-4283) Group By statement truncating BIGINTs

Ethan Wang (JIRA) Sun, 15 Oct 2017 01:49:28 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205073#comment-16205073
 ]


Ethan Wang commented on PHOENIX-4283:
-------------------------------------

Thanks [~jamestaylor] for the direction. As you suspected, between rs.next() 
and rs.getValue() a byte array operation mistakenly cuts a serialized DECIMAL 
ImmutableBytesWritable from 11B to 8B, dropping the trailing 3B, which is 
responsible for the loss of the last 6 digits. The cut-off happens in 
PLong.coerceBytes() when it tries to "Decrease size of TIMESTAMP to size of 
LONG and continue coerce".

Here are the details as I understand them (correct me if I'm wrong). In the 
case of a nested GROUP BY, the subquery (SELECT a, c FROM table GROUP BY a, c) 
is first returned from the server as a whole, as a tuple. Because "A" is a 
CoercedExpression, it is first converted to its child, 
ProjectedColumnExpression. This converts A from BIGINT to DECIMAL, which is 
then serialized into a byte array ("so that it is binary comparable"). That's 
when the 8B BIGINT becomes an 11B byte array.

This byte array is passed along until the user calls rs.getValue(), when "A" 
is evaluated as a RowKeyColumnExpression. That's where the serialized DECIMAL 
is converted back to BIGINT. For this conversion, PLong overrides 
coerceBytes() and introduces a cut-off: anything larger than 
Bytes.SIZEOF_LONG (which is 8B) is reduced to its first 8B. Therefore the 
last 3B are lost. In summary:

Server -> BIGINT 8B ->  BIGDECIMAL byte[] 11B -> BIGINT 8B
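
To illustrate why dropping 3B costs exactly 6 digits, here is a minimal 
sketch. It is NOT Phoenix's code: the encoding below (one header byte plus 
one byte per pair of decimal digits) is a simplified assumption that only 
mimics the sizes described above, and the names TruncationSketch, encode and 
decode are hypothetical. It handles non-negative values only.

```java
import java.util.Arrays;

public class TruncationSketch {
    // Hypothetical encoding (an assumption, not Phoenix's real format):
    // byte 0 holds the number of base-100 digits, followed by one byte per
    // base-100 digit (two decimal digits each), most significant first.
    static byte[] encode(long value) {
        String s = Long.toString(value);
        if (s.length() % 2 != 0) {
            s = "0" + s; // pad to a whole number of digit pairs
        }
        int pairs = s.length() / 2;
        byte[] out = new byte[1 + pairs];
        out[0] = (byte) pairs;
        for (int i = 0; i < pairs; i++) {
            out[1 + i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2));
        }
        return out;
    }

    // Decode, treating digit pairs missing from a truncated array as zeros.
    static long decode(byte[] bytes) {
        int pairs = bytes[0];
        long result = 0;
        for (int i = 0; i < pairs; i++) {
            int digit = (1 + i < bytes.length) ? bytes[1 + i] : 0;
            result = result * 100 + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        long a = 4444444444444444444L;
        byte[] full = encode(a);                      // 1 + 10 = 11 bytes
        byte[] cut = Arrays.copyOf(full, Long.BYTES); // keep only the first 8 bytes
        System.out.println(full.length);  // 11
        System.out.println(decode(full)); // 4444444444444444444
        System.out.println(decode(cut));  // 4444444444444000000
    }
}
```

Under this assumed layout, each dropped byte carries two decimal digits, so 
cutting 11B down to 8B zeroes the last 6 digits, which matches the result in 
the report.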

Let me know your thoughts, [~jamestaylor]; then I'd like to discuss an idea 
for a patch.

> Group By statement truncating BIGINTs
> -------------------------------------
>
>                 Key: PHOENIX-4283
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4283
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>            Reporter: Steven Sadowski
>            Assignee: Ethan Wang
>             Fix For: 4.12.1
>
>
> *Versions:*
> Phoenix 4.11.0
> HBase: 1.3.1
> (Amazon EMR: 5.8.0)
> *Steps to reproduce:*
> 1. From the `sqlline-thin.py` client, set up the following table and run the query:
> {code:sql}
> CREATE TABLE test_table (
>     a BIGINT NOT NULL, 
>     c BIGINT NOT NULL
>     CONSTRAINT PK PRIMARY KEY (a, c)
> );
> UPSERT INTO test_table(a,c) VALUES(4444444444444444444, 5555555555555555555);
> SELECT a FROM (SELECT a, c FROM test_table GROUP BY a, c) GROUP BY a, c;
> {code}
> *Expected Result:*
> {code:sql}
> +----------------------+
> |          A           |
> +----------------------+
> | 4444444444444444444  |
> +----------------------+
> {code}
> *Actual Result:*
> {code:sql}
> +----------------------+
> |          A           |
> +----------------------+
> | 4444444444444000000  |
> +----------------------+
> {code}
> *Comments:*
> Having the two Group By statements together seems to truncate the last 6 or 
> so digits of the final result. Removing the outer (or either) group by will 
> produce the correct result.
> Please fix the Group by statement to not truncate the outer result's value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
