[
https://issues.apache.org/jira/browse/DRILL-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aditya Kishore updated DRILL-1195:
----------------------------------
Component/s: (was: Storage - JSON)
Execution - Data Types
Description: Incorrect result are returned while querying nested
array elements (see comment below for illustrated examples). (was: If a Json
obejct has an array which is present at the 1st(uppermost) level it can be
queried in Drill successfully as follows
{code}
{"rownum":1, "arrayval": [ "a1", "a2", "a3" ]}
{"rownum":2, "arrayval": [ "b1", "b2", "b3" ]}
{"rownum":3, "arrayval": [ "c1", "c2", "c3" ]}
{"rownum":4, "arrayval": [ "d1", "d2", "d3" ]}
{"rownum":5, "arrayval": [ "e1", "e2", "e3" ]}
select tbl.arrayval[0] from `nested_working.json` tbl;
+------------+
| EXPR$0 |
+------------+
| a1 |
| b1 |
| c1 |
| d1 |
| e1 |
+------------+
5 rows selected (0.157 seconds)
{code}
However if the array is present in the 2nd level or below (inner level),
directly querying the array's elements throws only null values.
{code}
{"rownum":1, "a":{"arrayval": [ "a1", "a2", "a3" ]}}
{"rownum":2, "a":{"arrayval": [ "b1", "b2", "b3" ]}}
{"rownum":3, "a":{"arrayval": [ "c1", "c2", "c3" ]}}
{"rownum":4, "a":{"arrayval": [ "d1", "d2", "d3" ]}}
{"rownum":5, "a":{"arrayval": [ "e1", "e2", "e3" ]}}
select tbl.a.arrayval[0] from `nested_failing.json` tbl;
+------------+
| EXPR$0 |
+------------+
| null |
| null |
| null |
| null |
| null |
+------------+
5 rows selected (0.144 seconds)
{code}
Select * as well as selecting the whole array works fine though.
{code}
select tbl.a.arrayval from `/user/root/nested_failing.json` tbl;
+------------+
| EXPR$0 |
+------------+
| ["a1","a2","a3"] |
| ["b1","b2","b3"] |
| ["c1","c2","c3"] |
| ["d1","d2","d3"] |
| ["e1","e2","e3"] |
+------------+
5 rows selected (0.151 seconds)
{code}
On the other hand if the array's elements are json objects again, we are able
to successfully query them. The issue seems to be centric only to single
element objects.
{code}
{"rownum":1, "a":{"arrayval": [ {"val1":"a1"}, {"val2":"a2"}, {"val3":"a3"} ]}}
{"rownum":2, "a":{"arrayval": [ {"val1":"b1"}, {"val2":"b2"}, {"val3":"b3"} ]}}
{"rownum":3, "a":{"arrayval": [ {"val1":"c1"}, {"val2":"c2"}, {"val3":"c3"} ]}}
{"rownum":4, "a":{"arrayval": [ {"val1":"d1"}, {"val2":"d2"}, {"val3":"d3"} ]}}
{"rownum":5, "a":{"arrayval": [ {"val1":"e1"}, {"val2":"e2"}, {"val3":"e3"} ]}}
select tbl.a.arrayval[0] from `nested_working2.json` tbl;
+------------+
| EXPR$0 |
+------------+
| {"val1":"a1"} |
| {"val1":"b1"} |
| {"val1":"c1"} |
| {"val1":"d1"} |
| {"val1":"e1"} |
+------------+
5 rows selected (0.186 seconds)
select tbl.a.arrayval[0].val1 from `nested_working2.json` tbl;
+------------+
| EXPR$0 |
+------------+
| a1 |
| b1 |
| c1 |
| d1 |
| e1 |
+------------+
5 rows selected (0.157 seconds)
{code})
Target Version/s: 0.4.0
Affects Version/s: 0.4.0
If a Json obejct has an array which is present at the 1st(uppermost) level it
can be queried in Drill successfully as follows
{code}
{"rownum":1, "arrayval": [ "a1", "a2", "a3" ]}
{"rownum":2, "arrayval": [ "b1", "b2", "b3" ]}
{"rownum":3, "arrayval": [ "c1", "c2", "c3" ]}
{"rownum":4, "arrayval": [ "d1", "d2", "d3" ]}
{"rownum":5, "arrayval": [ "e1", "e2", "e3" ]}
select tbl.arrayval[0] from `nested_working.json` tbl;
+------------+
| EXPR$0 |
+------------+
| a1 |
| b1 |
| c1 |
| d1 |
| e1 |
+------------+
5 rows selected (0.157 seconds)
{code}
However if the array is present in the 2nd level or below (inner level),
directly querying the array's elements throws only null values.
{code}
{"rownum":1, "a":{"arrayval": [ "a1", "a2", "a3" ]}}
{"rownum":2, "a":{"arrayval": [ "b1", "b2", "b3" ]}}
{"rownum":3, "a":{"arrayval": [ "c1", "c2", "c3" ]}}
{"rownum":4, "a":{"arrayval": [ "d1", "d2", "d3" ]}}
{"rownum":5, "a":{"arrayval": [ "e1", "e2", "e3" ]}}
select tbl.a.arrayval[0] from `nested_failing.json` tbl;
+------------+
| EXPR$0 |
+------------+
| null |
| null |
| null |
| null |
| null |
+------------+
5 rows selected (0.144 seconds)
{code}
Select * as well as selecting the whole array works fine though.
{code}
select tbl.a.arrayval from `/user/root/nested_failing.json` tbl;
+------------+
| EXPR$0 |
+------------+
| ["a1","a2","a3"] |
| ["b1","b2","b3"] |
| ["c1","c2","c3"] |
| ["d1","d2","d3"] |
| ["e1","e2","e3"] |
+------------+
5 rows selected (0.151 seconds)
{code}
On the other hand if the array's elements are json objects again, we are able
to successfully query them. The issue seems to be centric only to single
element objects.
{code}
{"rownum":1, "a":{"arrayval": [ {"val1":"a1"}, {"val2":"a2"}, {"val3":"a3"} ]}}
{"rownum":2, "a":{"arrayval": [ {"val1":"b1"}, {"val2":"b2"}, {"val3":"b3"} ]}}
{"rownum":3, "a":{"arrayval": [ {"val1":"c1"}, {"val2":"c2"}, {"val3":"c3"} ]}}
{"rownum":4, "a":{"arrayval": [ {"val1":"d1"}, {"val2":"d2"}, {"val3":"d3"} ]}}
{"rownum":5, "a":{"arrayval": [ {"val1":"e1"}, {"val2":"e2"}, {"val3":"e3"} ]}}
select tbl.a.arrayval[0] from `nested_working2.json` tbl;
+------------+
| EXPR$0 |
+------------+
| {"val1":"a1"} |
| {"val1":"b1"} |
| {"val1":"c1"} |
| {"val1":"d1"} |
| {"val1":"e1"} |
+------------+
5 rows selected (0.186 seconds)
select tbl.a.arrayval[0].val1 from `nested_working2.json` tbl;
+------------+
| EXPR$0 |
+------------+
| a1 |
| b1 |
| c1 |
| d1 |
| e1 |
+------------+
5 rows selected (0.157 seconds)
{code}
> Querying nested array elements in Json returns only null values
> ---------------------------------------------------------------
>
> Key: DRILL-1195
> URL: https://issues.apache.org/jira/browse/DRILL-1195
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Affects Versions: 0.4.0
> Reporter: Amit Katti
> Assignee: Aditya Kishore
> Fix For: 0.4.0
>
> Attachments:
> DRILL-1195-Querying-nested-array-elements-in-JSON-re.patch
>
>
> Incorrect result are returned while querying nested array elements (see
> comment below for illustrated examples).
--
This message was sent by Atlassian JIRA
(v6.2#6252)