[ 
https://issues.apache.org/jira/browse/DRILL-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956971#comment-16956971
 ] 

ASF GitHub Bot commented on DRILL-5674:
---------------------------------------

arina-ielchiieva commented on pull request #1879: DRILL-5674: Support ZIP 
compression
URL: https://github.com/apache/drill/pull/1879#discussion_r337457219
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/pcapng/PcapngFormatPlugin.java
 ##########
 @@ -47,7 +47,7 @@ public PcapngFormatPlugin(String name, DrillbitContext context, Configuration fs
 
   public PcapngFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig config, PcapngFormatConfig formatPluginConfig) {
     super(name, context, fsConf, config, formatPluginConfig, true,
-        false, true, false,
+        false, true, true,
 
 Review comment:
   1. Drill uses `BlockMapBuilder` to split a file into blocks where possible. 
According to its code, it splits a file only if `blockSplittable` is set to 
true and the file is NOT compressed. So even if a format is block splittable, 
a file that arrives compressed won't be split.
   
   
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java#L115
   
   It looks like most compressed formats are not splittable, which is why Drill 
treats any compressed file as not splittable: 
https://i.stack.imgur.com/jpprr.jpg
   
   2. Regarding `blockSplittable` for the Pcapng format: you are right, this 
format is not splittable, and neither is Pcap. I have updated the value of 
`blockSplittable` to `false` for both formats.
   
   https://blog.marouni.fr/pcap2seq/
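
As a rough illustration of point 1, the split decision can be sketched as below. This is not Drill's actual `BlockMapBuilder` code: the class `SplitDecisionSketch`, the method `planBlocks`, and all of its parameters are hypothetical names chosen for this example, and the real implementation works with file system block locations rather than plain offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill's API: models the rule that a file is
// split into blocks only when the format is block splittable AND the
// file is not compressed.
final class SplitDecisionSketch {

  // Returns a list of {start, length} pairs describing the planned blocks.
  static List<long[]> planBlocks(long fileLength, long blockSize,
                                 boolean blockSplittable, boolean compressed) {
    List<long[]> blocks = new ArrayList<>();

    // A non-splittable format or a compressed file is read as one block.
    if (!blockSplittable || compressed) {
      blocks.add(new long[] {0L, fileLength});
      return blocks;
    }

    // Otherwise cut the file into fixed-size blocks; the last may be shorter.
    for (long start = 0; start < fileLength; start += blockSize) {
      long length = Math.min(blockSize, fileLength - start);
      blocks.add(new long[] {start, length});
    }
    return blocks;
  }
}
```

Under these assumptions, a 300-byte file with a 128-byte block size yields three blocks when the format is splittable and the file is uncompressed, and a single block covering the whole file once either condition fails.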
 


> Drill should support .zip compression
> -------------------------------------
>
>                 Key: DRILL-5674
>                 URL: https://issues.apache.org/jira/browse/DRILL-5674
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.17.0
>
>
> Zip is a very common compression format. Create a compressed CSV file with 
> column headers: data.csv.zip.
> Define a storage plugin config for the file, call it "dfs.myws", set 
> delimiter = ",", extract header = true, skip header = false.
> Run a simple query:
> SELECT * FROM dfs.myws.`data.csv.zip`
> The result is garbage as the CSV reader is trying to parse Zipped data as if 
> it were text.
> DRILL-5506 asks how to do this; the responder said to add a library to the 
> path. Better would be to simply support zip out-of-the-box as a default 
> format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
