[ https://issues.apache.org/jira/browse/BEAM-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360092#comment-15360092 ]
ASF GitHub Bot commented on BEAM-422:
-------------------------------------

GitHub user dhalperi opened a pull request:

    https://github.com/apache/incubator-beam/pull/583

    [BEAM-422] AvroSource: use a 64K buffer size for Snappy codec

    commons-compress defaults to a 32K buffer size for Snappy. However,
    Avro uses xerial.snappy to write, which has a 64K buffer size. When
    the buffer size is too small, decoding data from Snappy can cause an
    EOF exception rather than finishing data.

    This fixes BEAM-422.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhalperi/incubator-beam avro-source-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/583.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #583

----
commit 7bbec5108aa64e97a4e7a2cda54882a369f94ebd
Author: Dan Halperin <dhalp...@google.com>
Date:   2016-07-02T09:18:07Z

    AvroSource: use a 64K buffer size for Snappy codec

    commons-compress defaults to a 32K buffer size for Snappy. However,
    Avro uses xerial.snappy to write, which has a 64K buffer size. When
    the buffer size is too small, decoding data from Snappy can cause an
    EOF exception rather than finishing data.

    This fixes BEAM-422.
----

> AvroSource hits various IOException when using Snappy codec
> -----------------------------------------------------------
>
>                 Key: BEAM-422
>                 URL: https://issues.apache.org/jira/browse/BEAM-422
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 0.1.0-incubating, 0.2.0-incubating
>            Reporter: Daniel Halperin
>            Assignee: Daniel Halperin
>
> Example:
> {code}
> Caused by: java.io.IOException: Offset is larger than block size
>     at org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.expandCopy(SnappyCompressorInputStream.java:338)
>     at org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.fill(SnappyCompressorInputStream.java:209)
>     at org.apache.commons.compress.compressors.snappy.SnappyCompressorInputStream.read(SnappyCompressorInputStream.java:134)
>     at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw(BinaryDecoder.java:839)
>     at org.apache.avro.io.BinaryDecoder$ByteSource.compactAndFill(BinaryDecoder.java:692)
>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:471)
>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> {code}
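
The mismatch described in the pull request can be illustrated with a small model. This is a simplified sketch, not the commons-compress or Beam code: the class name `SnappyWindowSketch`, its methods, and the 40K back-reference are hypothetical. It assumes only what the pull request and the stack trace state: the reader defaults to a 32K buffer, the writer uses 64K, and the decoder rejects a copy whose offset exceeds its block size.

```java
import java.io.IOException;

/** Simplified model of a Snappy-style back-reference check (hypothetical, for illustration only). */
final class SnappyWindowSketch {
    /** 32K: the commons-compress default buffer size named in the pull request. */
    public static final int DEFAULT_DECODER_BLOCK_SIZE = 32 * 1024;
    /** 64K: the xerial.snappy writer buffer size named in the pull request. */
    public static final int WRITER_BLOCK_SIZE = 64 * 1024;

    /** Rejects a copy element that reaches further back than the decoder's buffer. */
    public static void checkCopyOffset(int offset, int decoderBlockSize) throws IOException {
        if (offset > decoderBlockSize) {
            // Same message as the first line of the stack trace in the issue.
            throw new IOException("Offset is larger than block size");
        }
    }

    /** Returns true when a back-reference of the given offset fits in the decoder's buffer. */
    public static boolean canDecode(int offset, int decoderBlockSize) {
        try {
            checkCopyOffset(offset, decoderBlockSize);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical back-reference that a writer with a 64K buffer could emit.
        int offset = 40 * 1024;
        System.out.println("32K decoder: " + canDecode(offset, DEFAULT_DECODER_BLOCK_SIZE));
        System.out.println("64K decoder: " + canDecode(offset, WRITER_BLOCK_SIZE));
    }
}
```

With the decoder buffer raised to match the writer's 64K, the same back-reference resolves, which is the change the pull request title describes. commons-compress exposes a `SnappyCompressorInputStream(InputStream, int)` constructor that takes the block size; whether the patch uses exactly that constructor is not stated in this message.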