[ https://issues.apache.org/jira/browse/PHOENIX-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wuyang updated PHOENIX-1334:
----------------------------
Description:
When I use a LIKE expression in a SELECT query, it works well when I put *Chinese* characters in the LIKE pattern on non-primary-key columns. But when I use them in a LIKE expression on the *PRIMARY KEY* column, an exception is thrown:

select * from "test3" where PK like '中%';

||COLUMN_NAME||DATA_TYPE||TYPE_NAME||
|PK|12|VARCHAR|
|VAL|12|VARCHAR|

{quote}
org.apache.phoenix.schema.IllegalDataException: CHAR types may only contain single byte characters (中)
	at org.apache.phoenix.schema.PDataType$2.toBytes(PDataType.java:216)
	at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:829)
	at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:349)
	at org.apache.phoenix.expression.LikeExpression.accept(LikeExpression.java:269)
	at ....
{quote}

Both the PRIMARY KEY column and the non-primary-key column are of type VARCHAR. The relevant source code is:

{code}
byte[] b = VARCHAR.toBytes(object);
if (b.length != ((String) object).length()) {
    throw new IllegalDataException("CHAR types may only contain single byte characters (" + object + ")");
}
{code}

Chinese (or other non-Latin) characters will never satisfy the condition b.length == ((String) object).length(): the default encoding is UTF-8, in which each such character takes more than one byte.

Use the following statements to reproduce the issue:

create table "test_c" ( pk varchar primary key , val varchar);
upsert into "test_c" values ('中文','中文');
select * from "test_c" where VAL like '中%';
_// works fine so far_
select * from "test_c" where PK like '中%';
_// oops..._

was:
When I use a LIKE expression in a SELECT query, it works well when I put *Chinese* characters in the LIKE pattern on non-primary-key columns.
But when I use them in a LIKE expression on the *PRIMARY KEY* column, an exception is thrown:

select * from "test3" where PK like '中%';

{quote}
org.apache.phoenix.schema.IllegalDataException: CHAR types may only contain single byte characters (中)
	at org.apache.phoenix.schema.PDataType$2.toBytes(PDataType.java:216)
	at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:829)
	at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:349)
	at org.apache.phoenix.expression.LikeExpression.accept(LikeExpression.java:269)
	at ....
{quote}

Both the PRIMARY KEY column and the non-primary-key column are of type VARCHAR. The relevant source code is:

{code}
byte[] b = VARCHAR.toBytes(object);
if (b.length != ((String) object).length()) {
    throw new IllegalDataException("CHAR types may only contain single byte characters (" + object + ")");
}
{code}

Chinese (or other non-Latin) characters will never satisfy the condition b.length == ((String) object).length(): the default encoding is UTF-8, in which each such character takes more than one byte.

> Issue when LIKE expression contains Chinese characters on Key column
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1334
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1334
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.1
>         Environment: jdk 1.8 linux
>            Reporter: wuyang
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
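To illustrate why the quoted check rejects the pattern, here is a small standalone sketch (not Phoenix code). It assumes, as the report states, that VARCHAR.toBytes encodes strings as UTF-8, and it uses String.getBytes(StandardCharsets.UTF_8) in its place. The class Utf8LengthCheck and the helpers utf8Length and passesCharCheck are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthCheck {

    // Number of bytes the string occupies when encoded as UTF-8.
    // Assumption: this stands in for VARCHAR.toBytes(object), which the
    // report says uses UTF-8 by default.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper mirroring the quoted condition
    // b.length == ((String) object).length(): it only holds when every
    // char of the string encodes to exactly one byte.
    static boolean passesCharCheck(String s) {
        return utf8Length(s) == s.length();
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file encoding:
        // \u4E2D is the character in the LIKE pattern, \u6587 the second
        // character of the value upserted in the reproduction steps.
        String[][] samples = {
            {"ascii", "abc"},
            {"one CJK char", "\u4E2D"},
            {"two CJK chars", "\u4E2D\u6587"},
        };
        for (String[] sample : samples) {
            String label = sample[0];
            String s = sample[1];
            System.out.printf("%s: length()=%d, UTF-8 bytes=%d, passes check=%b%n",
                    label, s.length(), utf8Length(s), passesCharCheck(s));
        }
    }
}
```

Running main shows that only the ASCII sample passes: the single character in the pattern '中%' is one Java char but three UTF-8 bytes, so the byte count can never equal the String length and the IllegalDataException from the stack trace is thrown.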