Hanifi Gunes created DRILL-3577:
-----------------------------------

             Summary: Counting nested fields on CTAS-created-parquet file/s reports inaccurate results
                 Key: DRILL-3577
                 URL: https://issues.apache.org/jira/browse/DRILL-3577
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.1.0
            Reporter: Hanifi Gunes
            Assignee: Mehant Baid
            Priority: Critical

I have not tried this at a smaller scale, nor on the JSON file directly, but the following seems to reproduce the issue:

1. Create an input file as follows:

20K rows with the following -
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}

200 rows with the following -
{"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}

2. CTAS as follows
{code:sql}
CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
{code}
This should report
{code}
Fragment    Number of records written
0_0         20200
{code}

3. Count on nested fields via
{code:sql}
select count(t.others.additional) from dfs.`tmp`.`tp` t
OR
select count(t.others.other) from dfs.`tmp`.`tp` t
{code}
Either query reports a count of 0, where the input data implies 200 and 20200 respectively:
{code}
EXPR$0
0
{code}
While
{code:sql}
select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not null
{code}
reports the expected 200 rows
{code}
EXPR$0
200
{code}
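
For anyone trying to reproduce this, the input file from step 1 could be generated with something like the sketch below. It is not part of the original report: the function names `write_input` and `expected_counts` are made up for illustration, and it assumes the file is newline-delimited JSON with the 200 rows carrying `additional` placed after the 20K base rows. The second helper counts the nested fields straight from the JSON, giving a Drill-independent baseline to compare the step 3 results against.

```python
import json

# Row shapes copied from step 1 of the report.
BASE_ROW = {"some": "yes",
            "others": {"other": "true", "all": "false", "sometimes": "yes"}}
EXTRA_ROW = {"some": "yes",
             "others": {"other": "true", "all": "false", "sometimes": "yes",
                        "additional": "last entries only"}}


def write_input(path, base_rows=20000, extra_rows=200):
    """Write one JSON object per line: base rows first, then rows with the extra field.

    The ordering (extra rows last) is an assumption, not stated in the report.
    Returns the total number of rows written.
    """
    with open(path, "w") as f:
        for row, count in ((BASE_ROW, base_rows), (EXTRA_ROW, extra_rows)):
            # Compact separators reproduce the exact text shown in the report.
            line = json.dumps(row, separators=(",", ":"))
            for _ in range(count):
                f.write(line + "\n")
    return base_rows + extra_rows


def expected_counts(path):
    """Count non-null values of the nested fields directly from the JSON file."""
    counts = {"others.other": 0, "others.additional": 0}
    with open(path) as f:
        for line in f:
            others = json.loads(line).get("others", {})
            for key in ("other", "additional"):
                if others.get(key) is not None:
                    counts["others." + key] += 1
    return counts


if __name__ == "__main__":
    total = write_input("data.json")
    print(total, expected_counts("data.json"))
```

Running the script writes `data.json` to the current directory; the file then needs to be placed wherever the `dfs` storage plugin resolves `data.json`, which depends on the local Drill configuration.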