Khurram Faraaz created DRILL-2760:
-------------------------------------
Summary: Quoted strings from CSV file appear in query output in
different forms
Key: DRILL-2760
URL: https://issues.apache.org/jira/browse/DRILL-2760
Project: Apache Drill
Issue Type: Bug
Components: Storage - Text & CSV
Affects Versions: 0.9.0
Environment: | 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580:
Exit early from HashJoinBatch if build side is empty | 26.03.2015 @ 16:13:53 EDT
4 node cluster on CentOS
Reporter: Khurram Faraaz
Assignee: Steven Phillips
Quoted strings appear in query output in different forms, as shown in the
section below.
Quotes should NOT appear in query output. Strings must be stripped of their
leading and prevailing quotes. (I am referring to this character - " )
{code}
Snippet of data from airports.cv file, first three lines, the first line has
header information.
[root@centos-01 airport_CSV_data]# head -3 airports.csv
"id","ident","type","name","latitude_deg","longitude_deg","elevation_ft","continent","iso_country","iso_region","municipality","scheduled_service","gps_code","iata_code","local_code","home_link","wikipedia_link","keywords"
6523,"00A","heliport","Total Rf
Heliport",40.07080078125,-74.9336013793945,11,"NA","US","US-PA","Bensalem","no","00A",,"00A",,,
6524,"00AK","small_airport","Lowell
Field",59.94919968,-151.695999146,450,"NA","US","US-AK","Anchor
Point","no","00AK",,"00AK",,,
case 1) In this case quotes are not escaped, they appear in the output as is.
0: jdbc:drill:> select columns[0] id,columns[1] ident,columns[2]
type,columns[3] name,columns[4] latitude_deg,columns[5]
longitude_deg,columns[6] elevation_ft,columns[7] continent,columns[8]
iso_country,columns[9] iso_region,columns[10] municipality,columns[11]
scheduled_service,columns[12] gps_code,columns[13] iata_code, columns[14]
local_code,columns[15] home_link,columns[16] wikipedia_link,columns[17]
keywords from `airports.csv` limit 3;
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
| id | ident | type | name | latitude_deg |
longitude_deg | elevation_ft | continent | iso_country | iso_region |
municipality | scheduled_service | gps_code | iata_code | local_code |
home_link | wikipedia_link | keywords |
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
| "id" | "ident" | "type" | "name" | "latitude_deg" |
"longitude_deg" | "elevation_ft" | "continent" | "iso_country" | "iso_region" |
"municipality" | "scheduled_service" | "gps_code" | "iata_code" | "local_code"
| "home_link" | "wikipedia_link" | "keywords" |
| 6523 | "00A" | "heliport" | "Total Rf Heliport" | 40.07080078125 |
-74.9336013793945 | 11 | "NA" | "US" | "US-PA" |
"Bensalem" | "no" | "00A" | | "00A" |
| | null |
| 6524 | "00AK" | "small_airport" | "Lowell Field" | 59.94919968 |
-151.695999146 | 450 | "NA" | "US" | "US-AK" | "Anchor
Point" | "no" | "00AK" | | "00AK" |
| | null |
+------------+------------+------------+------------+--------------+---------------+--------------+------------+-------------+------------+--------------+-------------------+------------+------------+------------+------------+----------------+------------+
3 rows selected (0.155 seconds)
In this case quotes appear in the query output but they are escaped with
backslash character in the output.
0: jdbc:drill:> select * from `airports.csv` limit 3;
+------------+
| columns |
+------------+
|
["\"id\"","\"ident\"","\"type\"","\"name\"","\"latitude_deg\"","\"longitude_deg\"","\"elevation_ft\"","\"continent\"","\"iso_country\"","\"iso_region\"","\"municipality\"","\"scheduled_service\"","\"gps_code\"","\"iata_code\"","\"local_code\"","\"home_link\"","\"wikipedia_link\"","\"keywords\""]
|
| ["6523","\"00A\"","\"heliport\"","\"Total Rf
Heliport\"","40.07080078125","-74.9336013793945","11","\"NA\"","\"US\"","\"US-PA\"","\"Bensalem\"","\"no\"","\"00A\"","","\"00A\"","",""]
|
| ["6524","\"00AK\"","\"small_airport\"","\"Lowell
Field\"","59.94919968","-151.695999146","450","\"NA\"","\"US\"","\"US-AK\"","\"Anchor
Point\"","\"no\"","\"00AK\"","","\"00AK\"","",""] |
+------------+
3 rows selected (0.097 seconds)
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)