[ https://issues.apache.org/jira/browse/AVRO-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated AVRO-160:
------------------------------
Attachment: AVRO-160.patch
Philip> pastSync doesn't seem to do any reading, so this might be out of date.
Yes, that comment was confusing. I have updated it.
Philip> this throws EOFException on a two-byte file, whereas "Not a data file" would be clearer.
I fixed that.
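For illustration, here is a minimal Java sketch of a header check that reports a truncated file as "Not a data file." instead of letting an EOFException escape. The class name `HeaderCheck` and the 4-byte magic value are assumptions made for this sketch, not taken from the patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical header check; not the code in AVRO-160.patch. */
class HeaderCheck {
  // Assumed 4-byte magic for illustration: 'O', 'b', 'j', format version.
  static final byte[] MAGIC = {'O', 'b', 'j', 1};

  /** Reads the magic bytes, reporting short or mismatched input as "Not a data file." */
  static void checkMagic(InputStream in) throws IOException {
    byte[] magic = new byte[MAGIC.length];
    int read = 0;
    // Loop because a single read() may return fewer bytes than requested.
    while (read < magic.length) {
      int n = in.read(magic, read, magic.length - read);
      if (n < 0) break; // end of stream before the full header was read
      read += n;
    }
    // A two-byte file ends up here rather than surfacing as an EOFException.
    if (read < magic.length || !Arrays.equals(magic, MAGIC)) {
      throw new IOException("Not a data file.");
    }
  }
}
```

Reading the header with `read()` and checking the count, rather than using a "read fully" call that throws EOFException, is what lets the short-file case produce the clearer message.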
Philip> DataFileStream: synchronization of hasNext(), next(D), close.
Philip> Do these need to be synchronized for Hadoop compatibility, too?
It's not so much Hadoop compatibility as consistency: the API should either be
thread-safe or not. If you feel that thread safety is not useful here and has
a performance penalty, then synchronization could be moved to the to-be-written
InputFormat implementation that will use this. Would you prefer that?
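As a sketch of the alternative raised above, assuming hypothetical classes `PlainStream` and `SynchronizedStream` (neither is part of Avro): the reader itself stays unsynchronized, and only a caller that actually shares it across threads, such as the InputFormat mentioned above, pays for locking.

```java
import java.util.Iterator;

/** Hypothetical unsynchronized reader: cheap, but not safe to share across threads. */
class PlainStream<D> {
  private final Iterator<D> source;
  private boolean closed;

  PlainStream(Iterator<D> source) { this.source = source; }

  boolean hasNext() { return !closed && source.hasNext(); }

  D next() {
    if (closed) throw new IllegalStateException("closed");
    return source.next();
  }

  void close() { closed = true; }
}

/** Hypothetical wrapper a caller could use when it needs thread safety: every call locks the same monitor. */
class SynchronizedStream<D> {
  private final PlainStream<D> delegate;

  SynchronizedStream(PlainStream<D> delegate) { this.delegate = delegate; }

  synchronized boolean hasNext() { return delegate.hasNext(); }

  synchronized D next() { return delegate.next(); }

  synchronized void close() { delegate.close(); }

  /** Checks and reads under one lock, so two threads cannot race between hasNext() and next(). */
  synchronized D nextOrNull() { return delegate.hasNext() ? delegate.next() : null; }
}
```

Note that synchronizing each method separately does not make a hasNext()/next() pair atomic: two threads can both see hasNext() return true for the last entry. A combined call like the hypothetical nextOrNull() avoids that race, which may be a reason to let the caller that knows its access pattern own the locking.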
Philip> TestDataFile: readFile()
Philip> I think the reuse parameter is unused here now.
Removed.
I also fixed a bug in the sync handling code.
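Avro data files separate blocks with a 16-byte sync marker, so a reader that starts at an arbitrary offset can scan forward to the next marker and resume at a block boundary. A minimal sketch of that scan follows; `SyncScan.nextBlockStart` is a hypothetical helper over an in-memory byte array, not the sync handling code changed in this patch.

```java
import java.util.Arrays;

/** Hypothetical helper: find the first block boundary at or after a position. */
class SyncScan {
  static final int SYNC_SIZE = 16; // size of an Avro data file sync marker

  /**
   * Returns the offset just past the first occurrence of {@code sync} that
   * starts at or after {@code position}, or -1 if there is none.
   */
  static int nextBlockStart(byte[] file, byte[] sync, int position) {
    if (sync.length != SYNC_SIZE) throw new IllegalArgumentException("bad sync size");
    // Naive byte-by-byte scan; each candidate window is compared against the marker.
    for (int i = Math.max(0, position); i + SYNC_SIZE <= file.length; i++) {
      if (Arrays.equals(file, i, i + SYNC_SIZE, sync, 0, SYNC_SIZE)) {
        return i + SYNC_SIZE; // the next block begins right after the marker
      }
    }
    return -1;
  }
}
```

A real reader would scan a buffered stream rather than a whole array, but the boundary rule is the same: a marker counts only if it starts at or after the requested position.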
> file format should be friendly to streaming
> -------------------------------------------
>
> Key: AVRO-160
> URL: https://issues.apache.org/jira/browse/AVRO-160
> Project: Avro
> Issue Type: Improvement
> Components: spec
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Attachments: AVRO-160-python.patch, AVRO-160.patch, AVRO-160.patch,
> AVRO-160.patch
>
>
> It should be possible to stream through an Avro data file without seeking to
> the end.
> Currently the interpretation is that schemas written to the file apply to all
> entries before them. If this were changed so that they instead apply to all
> entries that follow, and the initial schema is written at the start of the
> file, then streaming could be supported.
> Note that the only change permitted to a schema as a file is written is,
> if it is a union, to add new branches at the end of that union. If it is not
> a union, no changes may be made. So it is still the case that the final
> schema in a file can read every entry in the file and thus may be used to
> randomly access the file.
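A toy Java model of the proposal, assuming a simplified hypothetical encoding that is not Avro's: every schema is modeled as the ordered branch list of a union, the initial schema is written at the start of the stream, and a later schema record applies to the entries that follow it. The reader therefore never seeks, and it rejects any schema change other than appending branches. The non-union case (no changes allowed) is not modeled.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Toy model: a "schema" is the ordered list of a union's branch names. */
class StreamingUnionFile {
  static final int ENTRY = 0, SCHEMA = 1; // hypothetical record tags

  /** The only permitted change: the new union keeps the old branches as a prefix. */
  static boolean isAllowedChange(List<String> oldBranches, List<String> newBranches) {
    return newBranches.size() >= oldBranches.size()
        && newBranches.subList(0, oldBranches.size()).equals(oldBranches);
  }

  static void writeSchema(DataOutputStream out, List<String> branches) throws IOException {
    out.writeInt(branches.size());
    for (String b : branches) out.writeUTF(b);
  }

  static List<String> readSchema(DataInputStream in) throws IOException {
    int n = in.readInt();
    List<String> branches = new ArrayList<>();
    for (int i = 0; i < n; i++) branches.add(in.readUTF());
    return branches;
  }

  /** Reads front to back without seeking; returns each entry as "branch:value". */
  static List<String> readAll(InputStream raw) throws IOException {
    DataInputStream in = new DataInputStream(raw);
    List<String> current = readSchema(in); // initial schema at the start of the file
    List<String> entries = new ArrayList<>();
    int tag;
    while ((tag = in.read()) != -1) {
      if (tag == SCHEMA) {
        // A new schema applies only to the entries that follow it.
        List<String> next = readSchema(in);
        if (!isAllowedChange(current, next)) throw new IOException("illegal schema change");
        current = next;
      } else if (tag == ENTRY) {
        int branch = in.readInt(); // index into the current union's branches
        entries.add(current.get(branch) + ":" + in.readUTF());
      } else {
        throw new IOException("unknown record tag " + tag);
      }
    }
    return entries;
  }
}
```

Because appended branches never shift the indices of existing ones, every entry written under an earlier schema resolves to the same branch under the final schema, which is why the final schema can still be used to randomly access the file.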
--
This message is automatically generated by JIRA.