[ 
https://issues.apache.org/jira/browse/DRILL-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498842#comment-14498842
 ] 

Victoria Markman commented on DRILL-2806:
-----------------------------------------

The fact that Drill did not throw an error is bad: it gives the user the 
impression that we support .tgz, while we actually return wrong results.
Is it possible to throw an error on an attempt to read from a file with an 
unsupported extension?
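
To illustrate the kind of check being asked for, here is a minimal sketch in 
Python. It is not Drill code and does not reflect how Drill resolves formats 
internally; the allow-lists and the names resolve_reader and 
UnsupportedFileExtensionError are hypothetical, chosen only for this example. 
The idea is to fail fast when the extension maps to no known reader instead 
of falling back to parsing raw bytes as text.

```python
import os

# Hypothetical allow-lists for illustration only. The extensions a real
# system accepts depend on its configured format plugins and codecs.
SUPPORTED_TEXT_EXTENSIONS = {".csv", ".tsv", ".psv"}
SUPPORTED_COMPRESSION_EXTENSIONS = {".gz", ".bz2"}


class UnsupportedFileExtensionError(ValueError):
    """Raised when a file's extension maps to no known reader."""


def resolve_reader(path):
    """Return (format_ext, compression_ext or None), or raise if unknown."""
    base, ext = os.path.splitext(path.lower())
    compression = None
    # Peel off one recognized compression suffix, e.g. "data.csv.gz".
    if ext in SUPPORTED_COMPRESSION_EXTENSIONS:
        compression = ext
        base, ext = os.path.splitext(base)
    # Anything left that is not a known text format is rejected up front.
    if ext not in SUPPORTED_TEXT_EXTENSIONS:
        shown = ext if ext else "(none)"
        raise UnsupportedFileExtensionError(
            "cannot read %r: unsupported extension %s" % (path, shown)
        )
    return ext, compression
```

With a check like this, the .csv path from the report would resolve 
normally, while the .tgz path would raise an error before any rows are 
produced.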

> Querying data from compressed csv file returns nulls and unreadable data
> ------------------------------------------------------------------------
>
>                 Key: DRILL-2806
>                 URL: https://issues.apache.org/jira/browse/DRILL-2806
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 0.9.0
>         Environment: 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580: 
> Exit early from HashJoinBatch if build side is empty | 26.03.2015
>            Reporter: Khurram Faraaz
>            Assignee: Steven Phillips
>
> Projecting columns from a compressed CSV data file returns unreadable data 
> and nulls in the query results. When the same CSV file is queried in 
> uncompressed format, the query returns correct results: readable data and 
> no nulls. The test was performed on a 4-node cluster on CentOS.
> {code}
> 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], 
> columns[4], columns[5], columns[6], columns[7] from 
> `deletions-00000-of-00020.tgz` limit 10;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> |   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   
> |   EXPR$6   |   EXPR$7   |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 0U[ˮȑ|axaR)ﺫ=鲍i̊HDJ|?3̑$%Q$%
>                                                 TdfD8'2i$E^/Y}C'>|/7
>                                                                               
>     H1o0! | 0g TMUܸW`ʙ&T
>                                                                               
>                                                   \uXپN|2I~Y 0RAX6UaXe+ow*]s 
> | null       | null       | null       | null       | null       | null       
> |
> | oM.ڻU/ | ̼\
>                            )qwda7((
>                                                       y[) | 
> 9>^0>WM[{r]iE$ze&!EküIfa | null       | null       | null       | null       
> | null       |
> | SRΠ     | null       | null       | null       | null       | null       | 
> null       | null       |
> | 
> 6imJ\f_dYڿ]%ln3IaE*BGA-a$j:M!Uc)ﶘD~wUx0ɼgme]ӘcQ*pk$%\2ER-)(ÈxTn?SϓxeҜݠºI|'(Cni
>      s | null       | null       | null       | null       | null       | 
> null       | null       |
> | bxΜkr4ü_nIxl_s`vN   
> ó.$OL7Eބyڗia;Pu$M!AoCӦnlS-`ۢ+o~>%wzcgwtMge7"lMgZ=WྃgMRX1"a | X=Rd.fab{t{
>                                                                               
>                                                                               
>                            A!t
>                                                                               
>                                                                               
>                                    1$ڧw-0EXURg
>                                                                               
>                                                                               
>                                                           p       
> #qzߤ΢gWMem{=z{
>                                                                               
>                                                                               
>                                                                               
>           eiA]^ | null       | null       | null       | null       | null    
>    | null       |
> | ֌        | null       | null       | null       | null       | null       | 
> null       | null       |
> | !{1H*m71`˰]oZ | 𾳔] &f4Z)4SP7Rm4^5WWXȧ<p.́3L
>                                                                               
>         q%|WL-p[ | null       | null       | null       | null       | null   
>     | null       |
> | dqyd\K#"ԁ@ | null       | null       | null       | null       | null       
> | null       | null       |
> | [GԊKFlɢ(ZK8h#D/[(U=_8ΏE%
>                                                            [;
>                                                               w}Fr`#Xk
>                                                                               
> lT'15:y
>                                                                               
>                  ņPz(-ȓ񆹞Cs)1v    | null       | null       | null       | 
> null       | null       | null       | null       |
> | LyPO|Ώ(+n+H]
>                          Ņ2?糩s/_ l
>                                             +ӯb        | null       | null    
>    | null       | null       | null       | null       | null       |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 10 rows selected (0.176 seconds)
> 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], 
> columns[4], columns[5], columns[6], columns[7] from 
> `deletions/deletions-00000-of-00020.csv` limit 10;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> |   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |   EXPR$4   |   EXPR$5   
> |   EXPR$6   |   EXPR$7   |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 1354980518007 | /user/mwcl_musicbrainz | 1356247116000 | 
> /user/google_gardener | /m/0nj707g | /music/track_contribution/contributor | 
> /m/09xmq3  | en         |
> | 1359609261000 | /user/ahsan2002us | 1359697206000 | /user/mjsigua | 
> /m/0q47ym9 | /common/topic/description | Afrosheen CEO is the fictional 
> character from the 2003 film The Watermelon Heist. | en         |
> | 1258294630005 | /user/book_bot | 1260214155000 | /user/book_bot | 
> /m/08g19rh | /book/book_edition/book | /m/04sty07 | en         |
> | 1260232964000 | /user/book_bot | 1360880749000 | /user/turtlewax_bot | 
> /m/0872_f2 | /book/book_edition/book | /m/069_gyc | en         |
> | 1320298552000 | /user/gardening_bot | 1358083965004 | /user/googlebot | 
> /m/01dy3t2 | /type/object/type | /music/single | en         |
> | 1360430129006 | /user/mwcl_musicbrainz | 1362830875001 | 
> /user/mwcl_musicbrainz | /m/0qm1x62 | /music/release_track/release | 
> /m/0ql38vr | en         |
> | 1269251105000 | /user/mwcl_images | 1336539194001 | /user/gardening_bot | 
> /m/06w7yw7 | /common/topic/image | /m/0bcncxt | en         |
> | 1225386250001 | /user/mwcl_images | 1336080683003 | /user/gardening_bot | 
> /m/04sb526 | /common/licensed_object/license | /m/02x6b   | en         |
> | 1286991487000 | /user/mw_template_bot | 1362532733000 | 
> /user/wikipedia_facts | /m/0dgs170 | /people/person/date_of_birth | 1975      
>  | en         |
> | 1258986090000 | /user/book_bot | 1260138587000 | /user/book_bot | 
> /m/08r_m33 | /book/book_edition/book | /m/04sty07 | en         |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 10 rows selected (0.25 seconds)
> Details of the files (compressed and uncompressed)
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions-00000-of-00020.tgz
> -rwxr-xr-x   3 root root  111364147 2015-04-16 20:35 
> /tmp/deletions-00000-of-00020.tgz
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions/deletions-00000-of-00020.csv
> -rwxr-xr-x   3 root root  395624293 2015-04-14 18:10 
> /tmp/deletions/deletions-00000-of-00020.csv
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
