benj created DRILL-7493:
---------------------------

Summary: convert_fromJSON and unicode
Key: DRILL-7493
URL: https://issues.apache.org/jira/browse/DRILL-7493
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.16.0
Reporter: benj

Transform a JSON string (containing a \uxxxx character) into a JSON struct:
{code:sql}
apache drill> SELECT x_str, convert_fromJSON(x_str) AS x_array
FROM (SELECT '["test=\u0014=test"]' x_str);
+----------------------+----------------------+
| x_str                | x_array              |
+----------------------+----------------------+
| ["test=\u0014=test"] | ["test=\u0014=test"] |
+----------------------+----------------------+
{code}

Use the JSON struct:
{code:sql}
apache drill> SELECT x_str
, x_array
, x_array[0] AS x_array0
FROM (SELECT x_str, convert_fromJSON(x_str) AS x_array
FROM (SELECT '["test=\u0014=test"]' x_str));
+----------------------+----------------------+-------------+
| x_str                | x_array              | x_array0    |
+----------------------+----------------------+-------------+
| ["test=\u0014=test"] | ["test=\u0014=test"] | test==test  |
+----------------------+----------------------+-------------+
{code}

Note that the \u0014 character is interpreted in x_array0.

If the split function is applied to x_array0, the resulting array is built with the non-interpreted \uxxxx:
{code:sql}
apache drill> SELECT x_str
, x_array
, x_array[0] AS x_array0
, split(x_array[0],',') AS x_array0_split
FROM (SELECT x_str, convert_fromJSON(x_str) AS x_array
FROM (SELECT '["test=\u0014=test"]' x_str));
+----------------------+----------------------+-------------+----------------------+
| x_str                | x_array              | x_array0    | x_array0_split       |
+----------------------+----------------------+-------------+----------------------+
| ["test=\u0014=test"] | ["test=\u0014=test"] | test==test  | ["test=\u0014=test"] |
+----------------------+----------------------+-------------+----------------------+
{code}

It is not possible to use convert_fromJSON on the interpreted \uxxxx:
{code:sql}
SELECT x_str
, x_array
, x_array[0] AS x_array0
, split(x_array[0],',') AS x_array0_split
, convert_fromJSON('["' || x_array[0] || '"]') AS convertJSONerror
FROM (SELECT x_str, convert_fromJSON(x_str) AS x_array
FROM (SELECT '["test=\u0014=test"]' x_str));

Error: DATA_READ ERROR: Illegal unquoted character ((CTRL-CHAR, code 20)): has to be escaped using backslash to be included in string value at
[Source: (org.apache.drill.exec.vector.complex.fn.DrillBufInputStream); line: 1, column: 9]
{code}
This does not work: the rebuilt string looks the same as the original, but the \uxxxx has unfortunately already been interpreted.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
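
For anyone triaging this, the DATA_READ error is consistent with the general JSON rule that a control character must be escaped inside a string value. The sketch below reproduces the same sequence outside Drill, using Python's standard json module as a stand-in for Drill's JSON reader. That substitution is an assumption made for illustration only, and the variable names are hypothetical; it says nothing about how Drill handles the value internally.

```python
import json

# The literal from the report: backslash-u-0014 is still an escape sequence here.
escaped = r'["test=\u0014=test"]'

# Parsing decodes the escape into the real control character U+0014 (decimal 20).
parsed = json.loads(escaped)
element = parsed[0]

# Rebuilding the JSON text by plain concatenation, as the failing query does,
# places the raw control character inside the string value.
rebuilt = '["' + element + '"]'
try:
    json.loads(rebuilt)
    rebuilt_ok = True
except json.JSONDecodeError:
    # A strict parser rejects unescaped control characters in strings.
    rebuilt_ok = False

# Re-escaping the element (here with json.dumps) yields parseable text again.
reescaped = json.dumps([element])
roundtrip = json.loads(reescaped)

print(rebuilt_ok)
print(reescaped == escaped)
```

In this sketch the concatenated text fails to parse while the re-escaped text round-trips, which suggests the query would need the control character escaped again before the value is passed back to convert_fromJSON.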