[
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787222#comment-15787222
]
chenglei edited comment on PHOENIX-3453 at 12/30/16 1:24 PM:
-------------------------------------------------------------
I wrote following test case to make this problem can be reproduced under 4.9.0,
simplifying the original test case by removing the index table and change the
type from CHAR(15) to Integer, which is more easier to debug:
{code:borderStyle=solid}
CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID
DESC,SCORE DESC)
)
UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)
select DISTINCT entity_id, score from ( select entity_id, score from
GROUPBY3453_INT limit 1)
{code}
the expecting result is :
{code:borderStyle=solid}
1 1
{code}
but the actual result is:
{code:borderStyle=solid}
-104 1
{code}
This problem can only be reproduced when the SQL has a SubQuery.
When I debug into the source code,I find the cause of the problem is the
distinct(or group by) statement in the outer query.By the following code in
GroupByCompiler.GroupBy.compile method, the "entity" column in GroupBy's
expressions is ProjectedColumnExpression,but in line 245, the "entity" column
in GroupBy's keyExpressions is CoerceExpression wrapping
ProjectedColumnExpression ,which convert the the ProjectedColumnExpression
from PInteger to PDecimal:
{code:borderStyle=solid}
232 for (int i = expressions.size()-2; i >= 0; i--) {
233 Expression expression = expressions.get(i);
234 PDataType keyType = getGroupByDataType(expression);
235 if (keyType == expression.getDataType()) {
236 continue;
237 }
238 // Copy expressions only when keyExpressions will be
different than expressions
239 if (keyExpressions == expressions) {
240 keyExpressions = new ArrayList<Expression>(expressions);
241 }
242 // Wrap expression in an expression that coerces the
expression to the required type..
243 // This is done so that we have a way of expressing null as
an empty key when more
244 // than one fixed and nullable types are used in a group by
clause
245 keyExpressions.set(i, CoerceExpression.create(expression,
keyType));
246 }
{code}
When I look into CoerceExpression.create method,in line 68 of the following
code I observe that the SortOder of the CoerceExpression is
SortOrder.getDefault(),which is SortOrder.ASC, but it ought to be
SortOrder.DESC,because the SortOder of ProjectedColumnExpression is
SortOrder.DESC:
{code:borderStyle=solid}
46 public static Expression create(Expression expression, PDataType toType)
throws SQLException {
47 if (toType == expression.getDataType()) {
48 return expression;
49 }
50 return new CoerceExpression(expression, toType);
51 }
......
66 //Package protected for tests
67 CoerceExpression(Expression expression, PDataType toType) {
68 this(expression, toType, SortOrder.getDefault(), null, true);
69 }
{code}
So when we get the query results, in
ClientGroupedAggregatingResultIterator.getGroupingKey method, we invoke the
following PDecimal.coerceBytes method to get the coerceBytes of PDecimal,
noticed that actualType parameter is PInteger,actualModifier parameter is
SortOrder.DESC,and expectedModifier parameter is SortOrder.ASC, so in line
842,we get the
PInteger "1" from the ptr which is got from the HBase RegionServer, and in
line 845,we convert the PInteger "1" to PDecimal "1", last in line 846, we
encode the PDecimal "1" to bytes, but because the expectedModifier parameter
is SortOrder.ASC, so the PDecimal "1" is encoded by SortOrder.ASC.That is to
say,the SortOrder of the groupBy key got from
ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.
{code:borderStyle=solid}
826 public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType
actualType, Integer actualMaxLength,
827 Integer actualScale, SortOrder actualModifier, Integer
desiredMaxLength, Integer desiredScale,
828 SortOrder expectedModifier) {
......
840 // Optimization for cases in which we already have the object around
841 if (o == null) {
842 o = actualType.toObject(ptr, actualType, actualModifier);
843 }
844
845 o = toObject(o, actualType);
846 byte[] b = toBytes(o, expectedModifier);
847 ptr.set(b);
848 }
{code}
Unfortunately, finally in following PhoenixResult.getObject method, when we
invoke the ColumnProjector.getValue method in line 524, the ColumnProjector's
Expression is RowKeyColumnExpression,which thinks the SortOrder of groupBy key
got from the above-mentioned
ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.DESC,
so it decodes the bytes by SortOrder.DESC,however,actually the SortOrder of
bytes is SortOrder.ASC,so the error arises.
{code:borderStyle=solid}
521 public Object getObject(int columnIndex) throws SQLException {
522 checkCursorState();
523 ColumnProjector projector =
rowProjector.getColumnProjector(columnIndex-1);
524 Object value = projector.getValue(currentRow,
projector.getExpression().getDataType(), ptr);
525 wasNull = (value == null);
526 return value;
527 }
{code}
was (Author: comnetwork):
I wrote following test case to make this problem can be reproduced under 4.9.0,
simplifying the original test case by removing the index table and change the
type from CHAR(15) to Integer, which is more easier to debug:
{code:borderStyle=solid}
CREATE TABLE GROUPBY3453_INT (
ENTITY_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (ENTITY_ID DESC,CONTAINER_ID
DESC,SCORE DESC)
)
UPSERT INTO GROUPBY3453_INT VALUES (1,1,1)
select DISTINCT entity_id, score from ( select entity_id, score from
GROUPBY3453_INT limit 1)
{code}
the expecting result is :
{code:borderStyle=solid}
1 1
{code}
but the actual result is:
{code:borderStyle=solid}
-104 1
{code}
This problem can only be reproduced when the SQL has a SubQuery.
When I debug into the source code,I find the cause of the problem is the
distinct(or group by) statement in the outer query.By the following code in
GroupByCompiler.GroupBy.compile method, the "entity" column in GroupBy's
expressions is ProjectedColumnExpression,but in line 245, the "entity" column
in GroupBy's keyExpressions is CoerceExpression wrapping
ProjectedColumnExpression ,which convert the the ProjectedColumnExpression
from PInteger to PDecimal:
{code:borderStyle=solid}
232 for (int i = expressions.size()-2; i >= 0; i--) {
233 Expression expression = expressions.get(i);
234 PDataType keyType = getGroupByDataType(expression);
235 if (keyType == expression.getDataType()) {
236 continue;
237 }
238 // Copy expressions only when keyExpressions will be
different than expressions
239 if (keyExpressions == expressions) {
240 keyExpressions = new ArrayList<Expression>(expressions);
241 }
242 // Wrap expression in an expression that coerces the
expression to the required type..
243 // This is done so that we have a way of expressing null as
an empty key when more
244 // than one fixed and nullable types are used in a group by
clause
245 keyExpressions.set(i, CoerceExpression.create(expression,
keyType));
246 }
{code}
When I look into CoerceExpression.create method,in line 68 of the following
code I observe that the SortOder of the CoerceExpression is SortOrder.ASC, but
it ought to be SortOrder.DESC,because the SortOder of ProjectedColumnExpression
is SortOrder.DESC:
{code:borderStyle=solid}
46 public static Expression create(Expression expression, PDataType toType)
throws SQLException {
47 if (toType == expression.getDataType()) {
48 return expression;
49 }
50 return new CoerceExpression(expression, toType);
51 }
......
66 //Package protected for tests
67 CoerceExpression(Expression expression, PDataType toType) {
68 this(expression, toType, SortOrder.getDefault(), null, true);
69 }
{code}
So when we get the query results, in
ClientGroupedAggregatingResultIterator.getGroupingKey method, we invoke the
following PDecimal.coerceBytes method to get the coerceBytes of PDecimal,
noticed that actualType parameter is PInteger,actualModifier parameter is
SortOrder.DESC,and expectedModifier parameter is SortOrder.ASC, so in line
842,we get the
PInteger "1" from the ptr which is got from the HBase RegionServer, and in
line 845,we convert the PInteger "1" to PDecimal "1", last in line 846, we
encode the PDecimal "1" to bytes, but because the expectedModifier parameter
is SortOrder.ASC, so the PDecimal "1" is encoded by SortOrder.ASC.That is to
say,the SortOrder of the groupBy key got from
ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.ASC.
{code:borderStyle=solid}
826 public void coerceBytes(ImmutableBytesWritable ptr, Object o, PDataType
actualType, Integer actualMaxLength,
827 Integer actualScale, SortOrder actualModifier, Integer
desiredMaxLength, Integer desiredScale,
828 SortOrder expectedModifier) {
......
840 // Optimization for cases in which we already have the object around
841 if (o == null) {
842 o = actualType.toObject(ptr, actualType, actualModifier);
843 }
844
845 o = toObject(o, actualType);
846 byte[] b = toBytes(o, expectedModifier);
847 ptr.set(b);
848 }
{code}
Unfortunately, finally in following PhoenixResult.getObject method, when we
invoke the ColumnProjector.getValue method in line 524, the ColumnProjector's
Expression is RowKeyColumnExpression,which thinks the SortOrder of groupBy key
got from the above-mentioned
ClientGroupedAggregatingResultIterator.getGroupingKey method is SortOrder.DESC,
so it decodes the bytes by SortOrder.DESC,however,actually the SortOrder of
bytes is SortOrder.ASC,so the error arises.
{code:borderStyle=solid}
521 public Object getObject(int columnIndex) throws SQLException {
522 checkCursorState();
523 ColumnProjector projector =
rowProjector.getColumnProjector(columnIndex-1);
524 Object value = projector.getValue(currentRow,
projector.getExpression().getDataType(), ptr);
525 wasNull = (value == null);
526 return value;
527 }
{code}
> Secondary index and query using distinct: Outer query results in ERROR 201
> (22000): Illegal data. CHAR types may only contain single byte characters
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0, 4.9.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ��������������� 1.1
> If you run it in SQuirreL it results in the entity_id column getting the
> above error value. Notice that if you remove the secondary index or DISTINCT
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types
> may only contain single byte characters (????????????)
> at
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)