[ 
https://issues.apache.org/jira/browse/DRILL-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144475#comment-16144475
 ] 

Paul Rogers commented on DRILL-5747:
------------------------------------

This change is already being done as part of the revised {{ScanBatch}} for the 
project to limit Drill batch sizes.

> Drill should put directory name field in same sequence w.r.t regular column 
> for select * query
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5747
>                 URL: https://issues.apache.org/jira/browse/DRILL-5747
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> Today, the star column * in Drill expands into a list of regular columns 
> plus the directory name fields such as dir0 and dir1. However, Drill does not 
> place the directory name fields relative to the regular columns in a 
> consistent way. For instance, for Parquet files, dir0 is placed after the 
> list of regular columns.
> {code}
> select * from dfs.tmp.parquetTbl where dir0 = 1990;
> +--------------+--------------+--------------+--------------+-------+
> | N_NATIONKEY  |    N_NAME    | N_REGIONKEY  |  N_COMMENT   | dir0  |
> +--------------+--------------+--------------+--------------+-------+
> | 0            | [B@5527446   | 0            | [B@684fa264  | 1990  |
> | 1            | [B@442e88bc  | 1            | [B@4b13119c  | 1990  |
> | 2            | [B@50e93f45  | 1            | [B@138f483   | 1990  |
> | 3            | [B@423cc515  | 1            | [B@23af07ac  | 1990  |
> | 4            | [B@3820bf81  | 4            | [B@6dfccaf0  | 1990  |
> | 5            | [B@6f6f8af9  | 0            | [B@40d1a97   | 1990  |
> | 6            | [B@784cb194  | 3            | [B@731ea93f  | 1990  |
> | 7            | [B@61f9a224  | 3            | [B@4c041bbc  | 1990  |
> | 8            | [B@21b8faa1  | 2            | [B@774e7152  | 1990  |
> | 9            | [B@3ef1fbaf  | 2            | [B@c2be72    | 1990  |
> | 10           | [B@71652ec1  | 4            | [B@29e0bb10  | 1990  |
> | 11           | [B@61192cea  | 4            | [B@3bd3e873  | 1990  |
> | 12           | [B@5541f4b4  | 2            | [B@5d288126  | 1990  |
> | 13           | [B@e371592   | 4            | [B@42692b88  | 1990  |
> | 14           | [B@6a90fc8   | 0            | [B@454b16e2  | 1990  |
> | 15           | [B@44cb72f8  | 0            | [B@8e91b11   | 1990  |
> | 16           | [B@7feffda8  | 0            | [B@64f66236  | 1990  |
> | 17           | [B@6ba9fb02  | 1            | [B@649e7786  | 1990  |
> | 18           | [B@5fb93205  | 2            | [B@7783175b  | 1990  |
> | 19           | [B@3f7294a9  | 3            | [B@7b7e03c9  | 1990  |
> | 20           | [B@e2ac076   | 4            | [B@18c18a3e  | 1990  |
> | 21           | [B@4a5af924  | 2            | [B@1a9ad09f  | 1990  |
> | 22           | [B@29f6845e  | 3            | [B@776c4cd7  | 1990  |
> | 23           | [B@6728f481  | 3            | [B@31cc7610  | 1990  |
> | 24           | [B@665b2dfa  | 1            | [B@6c27ac95  | 1990  |
> +--------------+--------------+--------------+--------------+-------+
> {code}
> Notice that in the above output, dir0 (value 1990) is the last column.
> However, for JSON, dir0 is placed in front of the list of regular columns.
> {code}
> select * from dfs.tmp.jsonTbl where dir0 = 1990;
> +-------+------+
> | dir0  |  a   |
> +-------+------+
> | 1990  | 100  |
> | 1990  | 200  |
> +-------+------+
> {code}
> It would be good to present the directory name fields in the same position 
> regardless of file format or storage plugin. IMHO, it makes sense to put the 
> directory name fields in front of the list of regular columns (the behavior 
> that the JSON format exhibits today).
> This ticket is opened to modify Drill's ScanBatch code for the purpose 
> explained above.
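
The ordering proposed in the ticket can be sketched as a stable partition of the expanded star-column list: directory columns first, then regular columns, each group keeping its original relative order. This is a hypothetical illustration, not Drill's actual {{ScanBatch}} code. The class and method names are made up, and the sketch assumes directory columns can be recognized by names of the form dir0, dir1, ... A real implementation would more likely track which columns are implicit than match on names, since a regular column could itself be named dir0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DirColumnOrder {
  // Assumption: directory columns are named dir0, dir1, ... (name-based detection is illustrative only).
  private static final Pattern DIR_COLUMN = Pattern.compile("dir\\d+");

  // Returns a new list with directory columns first and regular columns after them.
  // Both groups keep the relative order they had in the input list.
  public static List<String> reorder(List<String> columns) {
    List<String> dirs = new ArrayList<>();
    List<String> regular = new ArrayList<>();
    for (String column : columns) {
      if (DIR_COLUMN.matcher(column).matches()) {
        dirs.add(column);
      } else {
        regular.add(column);
      }
    }
    List<String> result = new ArrayList<>(dirs);
    result.addAll(regular);
    return result;
  }

  public static void main(String[] args) {
    // Column order produced for the Parquet example in the ticket.
    List<String> parquetOrder =
        List.of("N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT", "dir0");
    System.out.println(reorder(parquetOrder));
  }
}
```

Running {{main}} prints [dir0, N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT], i.e. the Parquet column list with dir0 moved to the front. The JSON list (dir0, a) is already in the proposed order and comes back unchanged.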



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
