[ https://issues.apache.org/jira/browse/CRUNCH-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Olson resolved CRUNCH-698.
---------------------------------
Fix Version/s: 1.1.0
Resolution: Fixed
Pull request has been merged.
> Avro DataFileReader creation can hang
> -------------------------------------
>
> Key: CRUNCH-698
> URL: https://issues.apache.org/jira/browse/CRUNCH-698
> Project: Crunch
> Issue Type: Bug
> Components: Core, IO
> Reporter: Andrew Olson
> Assignee: Josh Wills
> Priority: Major
> Fix For: 1.1.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> A severe Avro bug [AVRO-2944|https://issues.apache.org/jira/browse/AVRO-2944]
> was recently found in the static method for creating a DataFileReader
> instance, where it can get stuck in an infinite loop while trying to read the
> 4-byte "magic" header of the file.
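The report does not quote the faulty loop, so the following is a sketch based on our reading of AVRO-2944, not Avro's code: we assume the pre-1.10.1 loop overwrites its byte counter with each `read()` result instead of adding to it. If so, once any read returns fewer than 4 bytes, every later request asks for fewer than 4 bytes, and the counter can never reach 4. `MagicLoopDemo` and `ThrottledInputStream` are hypothetical; they use plain `java.io` streams instead of Avro's `SeekableInput`, and cap the number of reads so the demo terminates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical demo of a read loop that overwrites its counter (not Avro's actual class). */
final class MagicLoopDemo {

  /** Returns at most maxChunk bytes per bulk read, like a throttled network stream. */
  static class ThrottledInputStream extends ByteArrayInputStream {
    private final int maxChunk;

    ThrottledInputStream(byte[] data, int maxChunk) {
      super(data);
      this.maxChunk = maxChunk;
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) {
      return super.read(b, off, Math.min(len, maxChunk));
    }
  }

  /**
   * ASSUMED shape of the pre-1.10.1 loop: c is overwritten by each read() result.
   * Returns the number of read() calls made, or -1 once maxReads is reached ("hangs").
   */
  static int buggyReadMagic(InputStream in, byte[] magic, int maxReads) {
    int reads = 0;
    try {
      for (int c = 0; c < magic.length; c = in.read(magic, c, magic.length - c)) {
        if (reads == maxReads) {
          return -1; // without the cap the loop keeps reading and never reaches c == 4
        }
        reads++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return reads;
  }

  public static void main(String[] args) {
    byte[] file = new byte[1024];
    file[0] = 'O';
    file[1] = 'b';
    file[2] = 'j';
    file[3] = 1;
    // Full reads: a single read() call fills the 4-byte header.
    System.out.println(buggyReadMagic(new ByteArrayInputStream(file), new byte[4], 100)); // 1
    // 2-byte partial reads: c is set to 2 on every pass and never reaches 4.
    System.out.println(buggyReadMagic(new ThrottledInputStream(file, 2), new byte[4], 100)); // -1
  }
}
```

The large buffer keeps the throttled stream from hitting EOF during the demo; on a short file the counter would instead become -1, and in this sketch the next `read()` call would throw `IndexOutOfBoundsException` because of the negative offset.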
>
> This was fixed in Avro 1.10.1 but has not yet been backported to any other
> Avro versions. The issue has existed since Avro 1.5, although we have only
> encountered it recently. It does not happen under normal circumstances; it
> takes some very unusual input stream behavior (a partial/throttled read, or an
> unexpected EOF) to trigger it. We've only seen it with the S3AFileSystem's
> S3AInputStream, where it suddenly started a few days ago for no apparent
> reason. Even now it is sporadic, happening only a small percentage of the time
> in job tasks that read many S3 files, but often enough to be problematic. An
> AWS support case is open to try to find out what could have caused this.
>
> To avoid depending on a particular Avro version for this fix, we can probably
> just patch it locally in Crunch: it's only one static method, and apart from
> one legacy constant, everything we need from the Avro code is public.
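A local patch could replace the header loop with one that adds up partial reads and fails fast at end of stream. This is a minimal sketch under stated assumptions, not Crunch's actual change: `MagicHeaderSketch`, `readMagic`, `detect` and `Format` are hypothetical names; it reads from a `java.io.InputStream` rather than Avro's `SeekableInput` (a real patch would also need to seek back to position 0 before constructing the `DataFileReader`); and the legacy Avro 1.2 magic value is an assumption, so copy the real constant from `DataFileReader12` rather than trusting this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical helper: a hang-free read of the 4-byte Avro container magic. */
final class MagicHeaderSketch {

  /** Current container magic: 'O' 'b' 'j' + version 1 (same bytes as Avro's DataFileConstants.MAGIC). */
  static final byte[] MAGIC = {(byte) 'O', (byte) 'b', (byte) 'j', 1};

  /** ASSUMED legacy Avro 1.2 magic ('O' 'b' 'j' + version 0); verify against DataFileReader12. */
  static final byte[] LEGACY_MAGIC_1_2 = {(byte) 'O', (byte) 'b', (byte) 'j', 0};

  enum Format { CURRENT, LEGACY_1_2, NOT_AVRO }

  /** Reads exactly MAGIC.length bytes, adding up partial reads and failing fast on EOF. */
  static byte[] readMagic(InputStream in) {
    byte[] magic = new byte[MAGIC.length];
    try {
      int c = 0;
      while (c < magic.length) {
        int n = in.read(magic, c, magic.length - c);
        if (n < 0) {
          throw new EOFException("EOF after " + c + " of " + magic.length + " magic bytes");
        }
        c += n; // accumulate, so a short read makes progress instead of resetting c
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return magic;
  }

  /** Classifies the stream by its first four bytes. */
  static Format detect(InputStream in) {
    byte[] magic = readMagic(in);
    if (Arrays.equals(MAGIC, magic)) {
      return Format.CURRENT;
    }
    if (Arrays.equals(LEGACY_MAGIC_1_2, magic)) {
      return Format.LEGACY_1_2;
    }
    return Format.NOT_AVRO;
  }

  public static void main(String[] args) {
    byte[] file = {(byte) 'O', (byte) 'b', (byte) 'j', 1, 42, 42};
    // Simulate a throttled stream that hands back one byte per read() call.
    InputStream oneByteAtATime = new ByteArrayInputStream(file) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 1));
      }
    };
    System.out.println(detect(oneByteAtATime)); // CURRENT
  }
}
```

Throwing on EOF is a deliberate choice: a truncated file then fails immediately with an `EOFException` (wrapped in `UncheckedIOException` here only so the sketch's methods need no `throws` clause) instead of looping or reading with a bad offset.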
--
This message was sent by Atlassian Jira
(v8.3.4#803005)