[ 
https://issues.apache.org/jira/browse/FLINK-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178113#comment-16178113
 ] 

ASF GitHub Bot commented on FLINK-5944:
---------------------------------------

Github user haohui commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4683#discussion_r140649772
  
    --- Diff: flink-core/pom.xml ---
    @@ -52,6 +52,12 @@ under the License.
                        <artifactId>flink-shaded-asm</artifactId>
                </dependency>
     
    +           <dependency>
    +                   <groupId>org.apache.flink</groupId>
    +                   <artifactId>flink-shaded-hadoop2</artifactId>
    +                   <version>${project.version}</version>
    +           </dependency>
    --- End diff --
    
    Including Hadoop as a dependency in flink-core can be problematic for a 
number of downstream projects.
    
    What exactly is the difference between the Hadoop Snappy codec and the 
vanilla one? Is it only that the Hadoop codec adds extra framing around the 
compressed data?
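
    To make the framing question concrete, here is a minimal sketch of how 
Hadoop-style block framing could be unwrapped without depending on Hadoop. It 
assumes the layout written by Hadoop's BlockCompressorStream as I understand 
it: each block starts with a 4-byte big-endian uncompressed length, followed 
by one or more chunks, each prefixed with a 4-byte big-endian compressed 
length. The class name HadoopBlockDeframer and the RawDecoder hook are 
hypothetical and not part of the pull request; with snappy-java, a raw decoder 
such as Snappy.uncompress(byte[]) would be plugged in as the hook.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: strips Hadoop-style block framing (assumed layout, see above). */
public class HadoopBlockDeframer {

    /** Hypothetical hook that decodes one raw compressed chunk, e.g. raw Snappy. */
    public interface RawDecoder {
        byte[] decode(byte[] chunk) throws IOException;
    }

    /** Reads every framed block from the stream and returns the decoded bytes. */
    public static byte[] readAll(InputStream in, RawDecoder decoder) throws IOException {
        DataInputStream data = new DataInputStream(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            int first = data.read();
            if (first < 0) {
                break; // clean end of stream between blocks
            }
            // Remaining three bytes of the big-endian uncompressed block length.
            int blockLength = (first << 24)
                    | (data.readUnsignedByte() << 16)
                    | (data.readUnsignedByte() << 8)
                    | data.readUnsignedByte();
            if (blockLength < 0) {
                throw new IOException("Negative block length: " + blockLength);
            }
            int produced = 0;
            // A block may be split into several length-prefixed chunks.
            while (produced < blockLength) {
                int chunkLength = data.readInt();
                if (chunkLength < 0) {
                    throw new IOException("Negative chunk length: " + chunkLength);
                }
                byte[] chunk = new byte[chunkLength];
                data.readFully(chunk);
                byte[] plain = decoder.decode(chunk);
                out.write(plain, 0, plain.length);
                produced += plain.length;
            }
            if (produced != blockLength) {
                throw new IOException("Block decoded to " + produced
                        + " bytes, expected " + blockLength);
            }
        }
        return out.toByteArray();
    }
}
```

    If the only difference is this kind of framing, a small wrapper like the 
above around the raw codec might avoid pulling Hadoop into flink-core, though 
that would need to be checked against files actually written by Hadoop.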
    
    



> Flink should support reading Snappy Files
> -----------------------------------------
>
>                 Key: FLINK-5944
>                 URL: https://issues.apache.org/jira/browse/FLINK-5944
>             Project: Flink
>          Issue Type: New Feature
>          Components: Batch Connectors and Input/Output Formats
>            Reporter: Ilya Ganelin
>            Assignee: Mikhail Lipkovich
>              Labels: features
>
> Snappy is a widely used compression format that offers fast compression and 
> decompression. 
> Support can be implemented by creating a SnappyInflaterInputStreamFactory 
> and updating initDefaultInflateInputStreamFactories in FileInputFormat.
> Flink already includes the Snappy dependency in the project. 
> There is one gotcha. If we wish to use this with Hadoop, then we 
> must provide two separate implementations, since Hadoop uses a different 
> version of the Snappy format than snappy-java (the xerial/snappy library 
> included in Flink). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
