[ https://issues.apache.org/jira/browse/DRILL-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
benj updated DRILL-7444:
------------------------
    Summary: JSON blank result on SELECT when too much byte in multiple files on Drill embedded  (was: JSON blank result on SELECT when too much byte in multiple files on embedded)

JSON blank result on SELECT when too much byte in multiple files on Drill embedded
----------------------------------------------------------------------------------

                Key: DRILL-7444
                URL: https://issues.apache.org/jira/browse/DRILL-7444
            Project: Apache Drill
         Issue Type: Bug
         Components: Storage - JSON
   Affects Versions: 1.17.0
           Reporter: benj
           Priority: Major

Two files (a.json and b.json) and the concatenation of these two files (ab.json) produce different results on a simple _SELECT_ when using +Drill embedded+.
The problem appears once the total input reaches a certain number of bytes (about 102 400 000 in my case).

{code:bash}
#!/bin/bash
# script gen.sh to reproduce the problem
# usage: gen.sh N > file.json
# writes N JSON lines of 10 000 bytes each (7 + 999*10 + 2 + newline)
for ((i=1;i<=$1;++i));
do
  echo -n '{"At":"'
  for j in {1..999};
  do
    echo -n 'aaaaabbbbb'
  done
  echo '"}'
done
{code}

{noformat}
== I ==
$ gen.sh 10000 > a.json
$ gen.sh 239 > b.json
$ wc -c *.json
100000000 a.json
  2390000 b.json
102390000 total
$ bash drill-embedded
apache drill> SELECT * FROM dfs.tmp.`*.json` LIMIT 1;
+--------------------+
|         At         |
+--------------------+
| aaaaabbbbaaaaab... |
+--------------------+
=> All is fine here

== II ==
$ gen.sh 10000 > a.json
$ gen.sh 240 > b.json
$ wc -c *.json
100000000 a.json
  2400000 b.json
102400000 total
$ bash drill-embedded
apache drill> SELECT * FROM dfs.tmp.`*.json` LIMIT 1;
+--------------------+
|         At         |
+--------------------+
|                    |
+--------------------+
=> Surprisingly, the field `At` is empty

== III ==
$ gen.sh 10240 > ab.json
$ wc -c *.json
102400000 ab.json
$ bash drill-embedded
apache drill> SELECT * FROM dfs.tmp.`ab.json` LIMIT 1;
+--------------------+
|         At         |
+--------------------+
| aaaaabbbbaaaaab... |
+--------------------+
=> All is fine here, even though the number of lines (and bytes) is the same as in case II
{noformat}

The Drill 1.17 build tested here is the latest available as of 2019-11-13.
This problem does not appear with Drill embedded 1.16.
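The byte counts in the three cases follow from the line layout of gen.sh: each line is 7 bytes of `{"At":"`, 999 × 10 bytes of `aaaaabbbbb`, 2 bytes of `"}` and 1 newline byte, i.e. 10 000 bytes. N lines therefore give N × 10 000 bytes, and case III (10 240 lines) holds exactly the same bytes as case II (10 000 + 240 lines). The following is a minimal sketch, assuming bash and standard coreutils, that checks this without involving Drill. `gen_fast` is a hypothetical helper, not part of the report: it builds the line once with `printf` instead of issuing 999 `echo` calls per line. Line counts default to small values so the check runs quickly.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash + coreutils): validates the repro inputs, does not run Drill.
set -euo pipefail

# Hypothetical helper: faster equivalent of gen.sh.
# Writes N lines of the form {"At":"aaaaabbbbb...(999 times)"} to stdout.
gen_fast() {
  local n=$1 chunk line i
  # Build the 9 990-byte payload once: 999 copies of 'aaaaabbbbb'.
  # '%.0s' consumes each seq argument while printing nothing for it.
  chunk=$(printf 'aaaaabbbbb%.0s' $(seq 1 999))
  line="{\"At\":\"${chunk}\"}"
  for ((i = 1; i <= n; ++i)); do
    printf '%s\n' "$line"
  done
}

# Line counts for the two case II files; small defaults for a quick check.
a_lines=${1:-100}
b_lines=${2:-3}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

gen_fast "$a_lines" > "$dir/a.json"
gen_fast "$b_lines" > "$dir/b.json"
gen_fast $((a_lines + b_lines)) > "$dir/ab.json"

# Each line is 7 + 999*10 + 2 + 1 = 10000 bytes.
expected=$(( (a_lines + b_lines) * 10000 ))
actual=$(cat "$dir/a.json" "$dir/b.json" | wc -c | tr -d ' ')
echo "total bytes: $actual (expected $expected)"

# The single-file input (case III) should equal a.json followed by b.json.
if cat "$dir/a.json" "$dir/b.json" | cmp -s - "$dir/ab.json"; then
  echo "ab.json is identical to a.json + b.json"
else
  echo "ab.json differs from a.json + b.json"
fi
```

With no arguments it should print `total bytes: 1030000 (expected 1030000)` followed by `ab.json is identical to a.json + b.json`. Passing `10000 240` reproduces the case II and III sizes (about 205 MB of temporary files, deleted on exit); the script only validates the inputs and does not run the Drill query.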