[ 
https://issues.apache.org/jira/browse/DRILL-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Kishore updated DRILL-1195:
----------------------------------

          Component/s:     (was: Storage - JSON)
                       Execution - Data Types
          Description: Incorrect result are returned while querying nested 
array elements (see comment below for illustrated examples).  (was: If a Json 
obejct has an array which is present at the 1st(uppermost) level it can be 
queried in Drill successfully as follows

{code}
{"rownum":1, "arrayval": [ "a1", "a2", "a3" ]}
{"rownum":2, "arrayval": [ "b1", "b2", "b3" ]}
{"rownum":3, "arrayval": [ "c1", "c2", "c3" ]}
{"rownum":4, "arrayval": [ "d1", "d2", "d3" ]}
{"rownum":5, "arrayval": [ "e1", "e2", "e3" ]}

select tbl.arrayval[0] from `nested_working.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| a1         |
| b1         |
| c1         |
| d1         |
| e1         |
+------------+
5 rows selected (0.157 seconds)
{code}

However if the array is present in the 2nd level or below (inner level), 
directly querying the array's elements throws only null values. 

{code}
{"rownum":1, "a":{"arrayval": [ "a1", "a2", "a3" ]}}
{"rownum":2, "a":{"arrayval": [ "b1", "b2", "b3" ]}}
{"rownum":3, "a":{"arrayval": [ "c1", "c2", "c3" ]}}
{"rownum":4, "a":{"arrayval": [ "d1", "d2", "d3" ]}}
{"rownum":5, "a":{"arrayval": [ "e1", "e2", "e3" ]}}

select tbl.a.arrayval[0] from `nested_failing.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| null       |
| null       |
| null       |
| null       |
| null       |
+------------+
5 rows selected (0.144 seconds)
{code}

Select * as well as selecting the whole array works fine though.

{code}
select tbl.a.arrayval from `/user/root/nested_failing.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| ["a1","a2","a3"] |
| ["b1","b2","b3"] |
| ["c1","c2","c3"] |
| ["d1","d2","d3"] |
| ["e1","e2","e3"] |
+------------+
5 rows selected (0.151 seconds)
{code}

On the other hand if the array's elements are json objects again, we are able 
to successfully query them. The issue seems to be centric only to single 
element objects.

{code}
{"rownum":1, "a":{"arrayval": [ {"val1":"a1"}, {"val2":"a2"}, {"val3":"a3"} ]}}
{"rownum":2, "a":{"arrayval": [ {"val1":"b1"}, {"val2":"b2"}, {"val3":"b3"} ]}}
{"rownum":3, "a":{"arrayval": [ {"val1":"c1"}, {"val2":"c2"}, {"val3":"c3"} ]}}
{"rownum":4, "a":{"arrayval": [ {"val1":"d1"}, {"val2":"d2"}, {"val3":"d3"} ]}}
{"rownum":5, "a":{"arrayval": [ {"val1":"e1"}, {"val2":"e2"}, {"val3":"e3"} ]}}

select tbl.a.arrayval[0] from `nested_working2.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| {"val1":"a1"} |
| {"val1":"b1"} |
| {"val1":"c1"} |
| {"val1":"d1"} |
| {"val1":"e1"} |
+------------+
5 rows selected (0.186 seconds)

select tbl.a.arrayval[0].val1 from `nested_working2.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| a1         |
| b1         |
| c1         |
| d1         |
| e1         |
+------------+
5 rows selected (0.157 seconds)
{code})
     Target Version/s: 0.4.0
    Affects Version/s: 0.4.0

If a Json obejct has an array which is present at the 1st(uppermost) level it 
can be queried in Drill successfully as follows

{code}
{"rownum":1, "arrayval": [ "a1", "a2", "a3" ]}
{"rownum":2, "arrayval": [ "b1", "b2", "b3" ]}
{"rownum":3, "arrayval": [ "c1", "c2", "c3" ]}
{"rownum":4, "arrayval": [ "d1", "d2", "d3" ]}
{"rownum":5, "arrayval": [ "e1", "e2", "e3" ]}

select tbl.arrayval[0] from `nested_working.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| a1         |
| b1         |
| c1         |
| d1         |
| e1         |
+------------+
5 rows selected (0.157 seconds)
{code}

However if the array is present in the 2nd level or below (inner level), 
directly querying the array's elements throws only null values. 

{code}
{"rownum":1, "a":{"arrayval": [ "a1", "a2", "a3" ]}}
{"rownum":2, "a":{"arrayval": [ "b1", "b2", "b3" ]}}
{"rownum":3, "a":{"arrayval": [ "c1", "c2", "c3" ]}}
{"rownum":4, "a":{"arrayval": [ "d1", "d2", "d3" ]}}
{"rownum":5, "a":{"arrayval": [ "e1", "e2", "e3" ]}}

select tbl.a.arrayval[0] from `nested_failing.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| null       |
| null       |
| null       |
| null       |
| null       |
+------------+
5 rows selected (0.144 seconds)
{code}

Select * as well as selecting the whole array works fine though.

{code}
select tbl.a.arrayval from `/user/root/nested_failing.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| ["a1","a2","a3"] |
| ["b1","b2","b3"] |
| ["c1","c2","c3"] |
| ["d1","d2","d3"] |
| ["e1","e2","e3"] |
+------------+
5 rows selected (0.151 seconds)
{code}

On the other hand if the array's elements are json objects again, we are able 
to successfully query them. The issue seems to be centric only to single 
element objects.

{code}
{"rownum":1, "a":{"arrayval": [ {"val1":"a1"}, {"val2":"a2"}, {"val3":"a3"} ]}}
{"rownum":2, "a":{"arrayval": [ {"val1":"b1"}, {"val2":"b2"}, {"val3":"b3"} ]}}
{"rownum":3, "a":{"arrayval": [ {"val1":"c1"}, {"val2":"c2"}, {"val3":"c3"} ]}}
{"rownum":4, "a":{"arrayval": [ {"val1":"d1"}, {"val2":"d2"}, {"val3":"d3"} ]}}
{"rownum":5, "a":{"arrayval": [ {"val1":"e1"}, {"val2":"e2"}, {"val3":"e3"} ]}}

select tbl.a.arrayval[0] from `nested_working2.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| {"val1":"a1"} |
| {"val1":"b1"} |
| {"val1":"c1"} |
| {"val1":"d1"} |
| {"val1":"e1"} |
+------------+
5 rows selected (0.186 seconds)

select tbl.a.arrayval[0].val1 from `nested_working2.json` tbl;
+------------+
|   EXPR$0   |
+------------+
| a1         |
| b1         |
| c1         |
| d1         |
| e1         |
+------------+
5 rows selected (0.157 seconds)
{code}

> Querying nested array elements in Json returns only null values
> ---------------------------------------------------------------
>
>                 Key: DRILL-1195
>                 URL: https://issues.apache.org/jira/browse/DRILL-1195
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 0.4.0
>            Reporter: Amit Katti
>            Assignee: Aditya Kishore
>             Fix For: 0.4.0
>
>         Attachments: 
> DRILL-1195-Querying-nested-array-elements-in-JSON-re.patch
>
>
> Incorrect result are returned while querying nested array elements (see 
> comment below for illustrated examples).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to