On Wed, Apr 22, 2020 at 11:06 AM Jeff Klukas <jklu...@mozilla.com> wrote:

> Beam is able to infer compression from file extensions for a variety of
> formats, but snappy is not among them currently:
>
>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
>
> ParquetIO and AvroIO each look to have support for snappy, though.
>
> So as best I can tell, there is no current built-in support for reading
> text files compressed via snappy. I think you would need to use FileIO to
> match files, and then implement a custom DoFn that takes the file object,
> streams the contents through a snappy decompressor, and outputs one record
> per line.
>
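
For reference, a rough sketch of that FileIO + custom DoFn approach might
look like the following (untested; it assumes the files were written in the
stream format that snappy-java's SnappyInputStream understands, e.g. by
SnappyOutputStream, rather than Hadoop's snappy block codec, and
ReadSnappyLinesFn is just an illustrative name):

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.xerial.snappy.SnappyInputStream;

/** Emits one String per line of each snappy-compressed file it receives. */
class ReadSnappyLinesFn extends DoFn<FileIO.ReadableFile, String> {
  @ProcessElement
  public void processElement(@Element FileIO.ReadableFile file, OutputReceiver<String> out)
      throws Exception {
    // open() yields the raw, still-compressed bytes here, because Compression.AUTO
    // does not recognize the .snappy extension and so applies no decompression.
    try (InputStream raw = Channels.newInputStream(file.open());
        // Assumption: snappy-java stream framing; swap in a different decompressor
        // (e.g. a Hadoop codec) if the files were written another way.
        InputStream decompressed = new SnappyInputStream(raw);
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(decompressed, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        out.output(line);
      }
    }
  }
}

Wired into the pipeline from the original question, that would replace the
SnappyCoder/TextIO.readFiles() steps:

p.apply("MatchFiles", FileIO.match().filepattern(options.getInputFilePattern()))
    .apply("ReadMatches", FileIO.readMatches())
    .apply("ReadSnappyLines", ParDo.of(new ReadSnappyLinesFn()))
    .apply(ParDo.of(new TransformRecord()));
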
> I imagine a PR to add snappy as a supported format in Compression.java
> would be welcome.
>

+1, and probably not that difficult either.
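
For anyone who picks that up, the new value would presumably look much like
the existing entries (GZIP, BZIP2, DEFLATE, ...). A sketch of what it might
be, using snappy-java; the exact constructor arguments and override
signatures should be checked against the current Compression.java:

// Hypothetical SNAPPY entry for the Compression enum, modeled on the existing
// values; relies on org.xerial.snappy.SnappyInputStream / SnappyOutputStream.
SNAPPY(".snappy") {
  @Override
  public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
    return Channels.newChannel(new SnappyInputStream(Channels.newInputStream(channel)));
  }

  @Override
  public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
    return Channels.newChannel(new SnappyOutputStream(Channels.newOutputStream(channel)));
  }
},

One design choice to settle would be which snappy framing to support, since
raw block snappy and the framed/stream formats are not interchangeable.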


>
> On Wed, Apr 22, 2020 at 1:16 PM Christopher Larsen <chlar...@google.com>
> wrote:
>
>> Hi devs,
>>
>> We are trying to build a pipeline to read snappy compressed text files
>> that contain one record per line using the Java SDK.
>>
>> We have tried the following to read the files:
>>
>> p.apply("ReadLines",
>>         FileIO.match().filepattern(options.getInputFilePattern()))
>>     .apply(FileIO.readMatches())
>>     .setCoder(SnappyCoder.of(ReadableFileCoder.of()))
>>     .apply(TextIO.readFiles())
>>     .apply(ParDo.of(new TransformRecord()));
>>
>> Is there a recommended way to decompress and read Snappy files with Beam?
>>
>> Thanks,
>> Chris
>>
>
