[ https://issues.apache.org/jira/browse/DRILL-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875498#comment-15875498 ]

Khurram Faraaz commented on DRILL-5278:
---------------------------------------

The header is the reason for the extra record in the COUNT.
Shouldn't the header be ignored by default in this case? (I know we have an
option to ignore the CSV header.)
I ask because an application that reads a temporary table created with
store.format=csv could fail because of the schema difference, and/or get one
extra record if it performed a count.

The CSV header should be ignored by default when reading a temporary table
created with store.format=csv.

The header is written into the temporary table's physical file, as this query shows:
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from dfs.tmp.temp_tbl limit 2;
+---------+
| columns |
+---------+
| ["col_int","col_chr","col_vrchr1","col_vrchr2","col_dt","col_tim","col_tmstmp","col_flt","col_intrvl_yr","col_intrvl_day","col_bln"] |
| ["45436","WV","John Mcginity","Rhbf6VFLJguvH9ejrWNkY1CDO8QqumTZAGjwa9cHfjBnLmNIWvo9YfcGObxbeXwa1NkemW9ULxsq5293wEA2v5FFCduwt03D7ysI3RlH8b4B0XAPKY","2011-11-04T00:00:00.000Z","1970-01-01T18:02:26.000Z","1988-09-23T16:58:42.000Z","10.193293","P314M","P26DT27386S","false"] |
+---------+
2 rows selected (0.232 seconds)
{noformat}
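A minimal Python sketch of why the count comes out one higher, assuming (as the output above suggests) that the writer puts a single header line at the top of the file, followed by the data rows. The functions write_csv_with_header, count_naive and count_header_aware are hypothetical illustrations, not Drill code; the 105 rows simply mirror the repro.

```python
import csv
import io


def write_csv_with_header(columns, rows):
    """Hypothetical model of the writer: one header line, then the data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)  # header line at the top of the file
    writer.writerows(rows)    # data records
    return buf.getvalue()


def count_naive(text):
    """Treat every line as a record, so the header line is counted as data."""
    return sum(1 for _ in csv.reader(io.StringIO(text)))


def count_header_aware(text):
    """Consume the first line as a header before counting records."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header if the file is not empty
    return sum(1 for _ in reader)


columns = ["col_int", "col_chr"]
rows = [[str(i), "WV"] for i in range(105)]
text = write_csv_with_header(columns, rows)

print(count_naive(text))         # 106, like COUNT(*) on the csv table
print(count_header_aware(text))  # 105, like COUNT(*) on the parquet table
```

The naive reader also hands the header row to the consumer as if it were data, which is where the schema mismatch for an application reading the table would come from.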

> CTTAS store.format=csv, returns one extra record
> ------------------------------------------------
>
>                 Key: DRILL-5278
>                 URL: https://issues.apache.org/jira/browse/DRILL-5278
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Khurram Faraaz
>
> When store.format = csv, incorrect results are returned: COUNT(*) over a
> temporary table created in Drill returns one extra record compared to the
> same table created with store.format = parquet.
> Drill 1.10.0 git commit ID: 300e9349
> Steps to reproduce the problem:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> reset all;
> +-------+---------------+
> |  ok   |    summary    |
> +-------+---------------+
> | true  | ALL updated.  |
> +-------+---------------+
> 1 row selected (0.279 seconds)
> 0: jdbc:drill:schema=dfs.tmp> ALTER SESSION SET `store.format`='csv';
> +-------+------------------------+
> |  ok   |        summary         |
> +-------+------------------------+
> | true  | store.format updated.  |
> +-------+------------------------+
> 1 row selected (0.21 seconds)
> 0: jdbc:drill:schema=dfs.tmp> CREATE TEMPORARY TABLE dfs.tmp.temp_tbl
> . . . . . . . . . . . . . . > AS
> . . . . . . . . . . . . . . > SELECT * FROM typeall_l;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 105                        |
> +-----------+----------------------------+
> 1 row selected (0.233 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT COUNT(*) FROM dfs.tmp.temp_tbl;
> +---------+
> | EXPR$0  |
> +---------+
> | 106     |
> +---------+
> 1 row selected (0.189 seconds)
> 0: jdbc:drill:schema=dfs.tmp> drop table dfs.tmp.temp_tbl;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | Temporary table [temp_tbl] dropped  |
> +-------+-------------------------------------+
> 1 row selected (0.186 seconds)
> 0: jdbc:drill:schema=dfs.tmp> ALTER SESSION SET `store.format`='parquet';
> +-------+------------------------+
> |  ok   |        summary         |
> +-------+------------------------+
> | true  | store.format updated.  |
> +-------+------------------------+
> 1 row selected (0.196 seconds)
> 0: jdbc:drill:schema=dfs.tmp> CREATE TEMPORARY TABLE dfs.tmp.temp_tbl
> . . . . . . . . . . . . . . > AS
> . . . . . . . . . . . . . . > SELECT * FROM typeall_l;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 105                        |
> +-----------+----------------------------+
> 1 row selected (0.263 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT COUNT(*) FROM dfs.tmp.temp_tbl;
> +---------+
> | EXPR$0  |
> +---------+
> | 105     |
> +---------+
> 1 row selected (0.169 seconds)
> 0: jdbc:drill:schema=dfs.tmp> drop table dfs.tmp.temp_tbl;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | Temporary table [temp_tbl] dropped  |
> +-------+-------------------------------------+
> 1 row selected (0.165 seconds)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
