[ https://issues.apache.org/jira/browse/DRILL-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498854#comment-14498854 ]
Victoria Markman commented on DRILL-2806:
-----------------------------------------

Steven, thanks for the explanation and your suggestion sounds like the right thing to do.

> Querying data from compressed csv file returns nulls and unreadable data
> ------------------------------------------------------------------------
>
> Key: DRILL-2806
> URL: https://issues.apache.org/jira/browse/DRILL-2806
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 0.9.0
> Environment: 9d92b8e319f2d46e8659d903d355450e15946533 | DRILL-2580:
> Exit early from HashJoinBatch if build side is empty | 26.03.2015
> Reporter: Khurram Faraaz
> Assignee: Steven Phillips
>
> Project columns from a compressed CSV data file returns unreadable data and
> nulls in the query results. Querying the same CSV file in uncompressed
> format, the query returns correct results, readable data and no nulls. Test
> was performed on 4 node cluster on CentOS.
> {code}
> 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3],
> columns[4], columns[5], columns[6], columns[7] from
> `deletions-00000-of-00020.tgz` limit 10;
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5
> | EXPR$6 | EXPR$7 |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> | 0U[ˮȑ|axaR)ﺫ=鲍i̊HDJ|?3̑$%Q$% > TdfD8'2i$E^/Y}C'>|/7 > > H1o0! 
| 0g TMUܸW`ʙ&T > > \uXپN|2I~Y 0RAX6UaXe+ow*]s > | null | null | null | null | null | null > | > | oM.ڻU/ | ̼\ > )qwda7(( > y[) | > 9>^0>WM[{r]iE$ze&!EküIfa | null | null | null | null > | null | > | SR | null | null | null | null | null | > null | null | > | > 6imJ\f_dYڿ]%ln3IaE*BGA-a$j:M!Uc)ﶘD~wUx0ɼgme]ӘcQ*pk$%\2ER-)(ÈxTn?SϓxeҜݠºI|'(Cni > s | null | null | null | null | null | > null | null | > | bxΜkr4ü_nIxl_s`vN > ó.$OL7Eބyڗia;Pu$M!AoCӦnlS-`ۢ+o~>%wzcgwtMge7"lMgZ=WྃgMRX1"a | X=Rd.fab{t{ > > > A!t > > > 1$ڧw-0EXURg > > > p > #qzߤgWMem{=z{ > > > > eiA]^ | null | null | null | null | null > | null | > | | null | null | null | null | null | > null | null | > | !{1H*m71`˰]oZ | ] &f4Z)4SP7Rm4^5WWXȧ<p.́3L > > q%|WL-p[ | null | null | null | null | null > | null | > | dqyd\K#"ԁ@ | null | null | null | null | null > | null | null | > | [GԊKFlɢ(ZK8h#D/[(U=_8ΏE% > [; > w}Fr`#Xk > > lT'15:y > > ņPz(-ȓCs)1v | null | null | null | > null | null | null | null | > | LyPO|Ώ(+n+H] > Ņ2?糩s/_ l > +ӯb | null | null > | null | null | null | null | null | > +------------+------------+------------+------------+------------+------------+------------+------------+ > 10 rows selected (0.176 seconds) > 0: jdbc:drill:> select columns[0], columns[1], columns[2], columns[3], > columns[4], columns[5], columns[6], columns[7] from > `deletions/deletions-00000-of-00020.csv` limit 10; > +------------+------------+------------+------------+------------+------------+------------+------------+ > | EXPR$0 | EXPR$1 | EXPR$2 | EXPR$3 | EXPR$4 | EXPR$5 > | EXPR$6 | EXPR$7 | > +------------+------------+------------+------------+------------+------------+------------+------------+ > | 1354980518007 | /user/mwcl_musicbrainz | 1356247116000 | > /user/google_gardener | /m/0nj707g | /music/track_contribution/contributor | > /m/09xmq3 | en | > | 1359609261000 | /user/ahsan2002us | 1359697206000 | /user/mjsigua | > /m/0q47ym9 | /common/topic/description | Afrosheen CEO is the fictional > character from the 2003 
film The Watermelon Heist. | en |
> | 1258294630005 | /user/book_bot | 1260214155000 | /user/book_bot |
> /m/08g19rh | /book/book_edition/book | /m/04sty07 | en |
> | 1260232964000 | /user/book_bot | 1360880749000 | /user/turtlewax_bot |
> /m/0872_f2 | /book/book_edition/book | /m/069_gyc | en |
> | 1320298552000 | /user/gardening_bot | 1358083965004 | /user/googlebot |
> /m/01dy3t2 | /type/object/type | /music/single | en |
> | 1360430129006 | /user/mwcl_musicbrainz | 1362830875001 |
> /user/mwcl_musicbrainz | /m/0qm1x62 | /music/release_track/release |
> /m/0ql38vr | en |
> | 1269251105000 | /user/mwcl_images | 1336539194001 | /user/gardening_bot |
> /m/06w7yw7 | /common/topic/image | /m/0bcncxt | en |
> | 1225386250001 | /user/mwcl_images | 1336080683003 | /user/gardening_bot |
> /m/04sb526 | /common/licensed_object/license | /m/02x6b | en |
> | 1286991487000 | /user/mw_template_bot | 1362532733000 |
> /user/wikipedia_facts | /m/0dgs170 | /people/person/date_of_birth | 1975
> | en |
> | 1258986090000 | /user/book_bot | 1260138587000 | /user/book_bot |
> /m/08r_m33 | /book/book_edition/book | /m/04sty07 | en |
> +------------+------------+------------+------------+------------+------------+------------+------------+
> 10 rows selected (0.25 seconds)
> Details of the files (compressed and uncompressed)
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions-00000-of-00020.tgz
> -rwxr-xr-x 3 root root 111364147 2015-04-16 20:35
> /tmp/deletions-00000-of-00020.tgz
> [root@centos-01 ~]# hadoop fs -ls /tmp/deletions/deletions-00000-of-00020.csv
> -rwxr-xr-x 3 root root 395624293 2015-04-14 18:10
> /tmp/deletions/deletions-00000-of-00020.csv
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
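
This message does not restate Steven's explanation, so the following is only a sketch of one plausible cause, not a description of what Drill actually does. A .tgz file is conventionally a gzip-compressed tar archive, so a reader that treats it as plain text sees compressed bytes, and even a reader that undoes only the gzip layer sees a tar header before any CSV data. The sketch builds a small in-memory archive (the file name `deletions.csv` and the three-column sample rows are made up for illustration) and shows that unpacking both layers yields readable rows.

```python
import csv
import gzip
import io
import tarfile


def read_rows_from_tgz(tgz_bytes, limit=10):
    """Return up to `limit` CSV rows from the first regular file in a .tgz."""
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                raw = archive.extractfile(member)
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text)
                return [row for _, row in zip(range(limit), reader)]
    return []


def build_sample_tgz(name, content):
    """Build an in-memory .tgz with one file (hypothetical stand-in data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        archive.addfile(info, io.BytesIO(content))
    return buf.getvalue()


# Made-up sample rows loosely modelled on the values shown in the issue.
sample = (
    b"1354980518007,/user/mwcl_musicbrainz,en\n"
    b"1258294630005,/user/book_bot,en\n"
)
tgz = build_sample_tgz("deletions.csv", sample)

# Read as-is, the file starts with the gzip magic bytes, not CSV text.
looks_like_gzip = tgz[:2] == b"\x1f\x8b"

# Undoing only the gzip layer exposes the tar container: the stream begins
# with the tar header (member name), not with the first CSV field.
gunzipped = gzip.decompress(tgz)
starts_with_tar_header = gunzipped.startswith(b"deletions.csv")

# Undoing both layers gives the readable rows.
rows = read_rows_from_tgz(tgz)
```

If this assumption holds for the file in the issue, extracting the CSV from the archive (or compressing the CSV with gzip alone) before querying would be a reasonable workaround to try. The reported sizes, 111364147 bytes for the .tgz against 395624293 bytes for the .csv, are at least consistent with the .tgz holding compressed data.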