[ https://issues.apache.org/jira/browse/DRILL-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144475#comment-16144475 ]
Paul Rogers commented on DRILL-5747:
------------------------------------

This change is already being done as part of the revised {{ScanBatch}} for the project to limit Drill batch sizes.

> Drill should put directory name field in same sequence w.r.t regular column
> for select * query
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5747
>                 URL: https://issues.apache.org/jira/browse/DRILL-5747
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> Today, star column * in Drill would expand into a list of regular columns,
> and the directory name field such as dir0, dir1. However, Drill does not put
> the directory name field with respect to regular field in a consistent way.
> For instance, for parquet files, dir0 is put behind the list of regular
> columns.
> {code}
> select * from dfs.tmp.parquetTbl where dir0 = 1990;
> +--------------+--------------+--------------+--------------+-------+
> | N_NATIONKEY  | N_NAME       | N_REGIONKEY  | N_COMMENT    | dir0  |
> +--------------+--------------+--------------+--------------+-------+
> | 0            | [B@5527446   | 0            | [B@684fa264  | 1990  |
> | 1            | [B@442e88bc  | 1            | [B@4b13119c  | 1990  |
> | 2            | [B@50e93f45  | 1            | [B@138f483   | 1990  |
> | 3            | [B@423cc515  | 1            | [B@23af07ac  | 1990  |
> | 4            | [B@3820bf81  | 4            | [B@6dfccaf0  | 1990  |
> | 5            | [B@6f6f8af9  | 0            | [B@40d1a97   | 1990  |
> | 6            | [B@784cb194  | 3            | [B@731ea93f  | 1990  |
> | 7            | [B@61f9a224  | 3            | [B@4c041bbc  | 1990  |
> | 8            | [B@21b8faa1  | 2            | [B@774e7152  | 1990  |
> | 9            | [B@3ef1fbaf  | 2            | [B@c2be72    | 1990  |
> | 10           | [B@71652ec1  | 4            | [B@29e0bb10  | 1990  |
> | 11           | [B@61192cea  | 4            | [B@3bd3e873  | 1990  |
> | 12           | [B@5541f4b4  | 2            | [B@5d288126  | 1990  |
> | 13           | [B@e371592   | 4            | [B@42692b88  | 1990  |
> | 14           | [B@6a90fc8   | 0            | [B@454b16e2  | 1990  |
> | 15           | [B@44cb72f8  | 0            | [B@8e91b11   | 1990  |
> | 16           | [B@7feffda8  | 0            | [B@64f66236  | 1990  |
> | 17           | [B@6ba9fb02  | 1            | [B@649e7786  | 1990  |
> | 18           | [B@5fb93205  | 2            | [B@7783175b  | 1990  |
> | 19           | [B@3f7294a9  | 3            | [B@7b7e03c9  | 1990  |
> | 20           | [B@e2ac076   | 4            | [B@18c18a3e  | 1990  |
> | 21           | [B@4a5af924  | 2            | [B@1a9ad09f  | 1990  |
> | 22           | [B@29f6845e  | 3            | [B@776c4cd7  | 1990  |
> | 23           | [B@6728f481  | 3            | [B@31cc7610  | 1990  |
> | 24           | [B@665b2dfa  | 1            | [B@6c27ac95  | 1990  |
> +--------------+--------------+--------------+--------------+-------+
> {code}
> Notice in the above output, dir0 = 1990 is the last column.
> However, for JSON, dir0 is put in front of the list of regular columns.
> {code}
> select * from dfs.tmp.jsonTbl where dir0 = 1990;
> +-------+------+
> | dir0  | a    |
> +-------+------+
> | 1990  | 100  |
> | 1990  | 200  |
> +-------+------+
> {code}
> It would be good to present the directory name field in the same sequence
> regardless of file format, storage plugin. IMHO, it makes sense to put the
> directory name field in front of the list of regular columns ( the behavior
> that JSON format present today).
> This ticket is opened to modify Drill's ScanBatch code for the above
> explained purpose.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
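
For illustration only, the ordering proposed in the issue (directory name fields first, then the regular columns in their original order) could be sketched roughly as below. This is not Drill's ScanBatch code. The class {{StarColumnOrder}} and the method {{dirColumnsFirst}} are hypothetical names, and recognizing directory columns by the name pattern dir0, dir1, ... is an assumption made to keep the example small; a real table column that happens to carry such a name would be misclassified by this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: reorders an expanded star-column list.
class StarColumnOrder {
    // Assumption: directory name fields are recognizable as dir0, dir1, ...
    private static final Pattern DIR_COLUMN = Pattern.compile("dir\\d+");

    // Returns the directory columns first, then the regular columns,
    // preserving the relative order inside each group.
    public static List<String> dirColumnsFirst(List<String> columns) {
        List<String> dirs = new ArrayList<>();
        List<String> regular = new ArrayList<>();
        for (String name : columns) {
            if (DIR_COLUMN.matcher(name).matches()) {
                dirs.add(name);
            } else {
                regular.add(name);
            }
        }
        List<String> ordered = new ArrayList<>(dirs);
        ordered.addAll(regular);
        return ordered;
    }

    public static void main(String[] args) {
        // Column order as returned for the parquet table in the issue.
        System.out.println(dirColumnsFirst(Arrays.asList(
                "N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT", "dir0")));
        // Column order as returned for the JSON table in the issue.
        System.out.println(dirColumnsFirst(Arrays.asList("dir0", "a")));
    }
}
```

With this ordering, the parquet column list from the issue would come out as dir0 followed by the four N_ columns, and the JSON column list would stay as dir0, a, so both formats would present the directory name field in the same position.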