[ https://issues.apache.org/jira/browse/DRILL-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka reassigned DRILL-3577:
--------------------------------------

    Assignee: Vitalii Diravka  (was: Mehant Baid)

> Counting nested fields on CTAS-created-parquet file/s reports inaccurate results
> --------------------------------------------------------------------------------
>
>                 Key: DRILL-3577
>                 URL: https://issues.apache.org/jira/browse/DRILL-3577
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.1.0
>            Reporter: Hanifi Gunes
>            Assignee: Vitalii Diravka
>            Priority: Critical
>             Fix For: 1.7.0
>
>
> I have not tried this at a smaller scale, nor on a JSON file directly, but the following seems to reproduce the issue:
> 1. Create an input file as follows
> 20K rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following -
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> This should report
> {code}
> Fragment Number of records written
> 0_0 20200
> {code}
> 3. Count on nested fields via
> {code:sql}
> select count(t.others.additional) from dfs.`tmp`.`tp` t
> -- or
> select count(t.others.other) from dfs.`tmp`.`tp` t
> {code}
> reports a count of 0 as follows
> {code}
> EXPR$0
> 0
> {code}
> While
> {code:sql}
> select count(t.`some`) from dfs.`tmp`.`tp` t where t.others.additional is not null
> {code}
> reports the expected 200 rows
> {code}
> EXPR$0
> 200
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
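
The input file from step 1 of the report can be generated with a short script. This is a minimal sketch and not part of the original report: the function name `write_repro_input` and the output location are assumptions, and where the file has to live for dfs.`data.json` to resolve depends on the local Drill storage plugin configuration.

```python
import json
import os
import tempfile


def write_repro_input(path, base_rows=20000, extra_rows=200):
    """Write the JSON input described in step 1, one record per line.

    The first `base_rows` records lack the nested `additional` field;
    the last `extra_rows` records include it. Returns the total row count.
    (Hypothetical helper, named here for illustration only.)
    """
    base = {
        "some": "yes",
        "others": {"other": "true", "all": "false", "sometimes": "yes"},
    }
    extra = {
        "some": "yes",
        "others": {**base["others"], "additional": "last entries only"},
    }
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(base_rows):
            f.write(json.dumps(base, separators=(",", ":")) + "\n")
        for _ in range(extra_rows):
            f.write(json.dumps(extra, separators=(",", ":")) + "\n")
    return base_rows + extra_rows


if __name__ == "__main__":
    # Assumed location; move or symlink the file to wherever the dfs
    # workspace used in the CTAS statement actually points.
    out = os.path.join(tempfile.gettempdir(), "data.json")
    total = write_repro_input(out)
    print(total, "rows written to", out)
```

With the defaults the file holds 20200 records, which matches the record count the CTAS step is expected to report.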