[ 
https://issues.apache.org/jira/browse/AVRO-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793421#action_12793421
 ] 

Philip Zeyliger commented on AVRO-160:
--------------------------------------

Thanks for addressing my comments.  Some minor notes
below, but I'm comfortable with this being committed. +1.

bq. org.apache.avro.file.Header (from the spec)

Cool.

bq.  Things used in Hadoop InputFormats should be thread safe to make them easy 
to use from multi-threaded mappers. SequenceFile is thread-safe for this 
reason, and we want this to be a drop-in replacement for SequenceFile.

It might be handy to note in DataFileReader's javadoc
that it is thread-safe.  Avro could later add
a non-thread-safe version, if that proves faster.

bq. To read up to that synchoronization point, call pastSync(long)

pastSync doesn't seem to do any reading, so this might be out of date.
Also, "synchronization" is misspelled there.

bq. DataFileStream: vin.readFixed(magic);

Hate to waffle on you here, but this throws EOFException on
a two-byte file, whereas "Not a data file" would be clearer.
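
A minimal sketch of what I mean, using only java.io: read the magic with
readFully and turn a premature EOFException into the clearer message.  The
class name MagicCheck, the method checkMagic, and the magic bytes shown here
are illustrative assumptions, not code from the patch.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MagicCheck {
  // Illustrative magic bytes; an assumption, not taken from the patch.
  static final byte[] MAGIC = { 'O', 'b', 'j', 1 };

  /** Reads the leading magic bytes, rejecting short or mismatched input. */
  static void checkMagic(InputStream in) throws IOException {
    byte[] magic = new byte[MAGIC.length];
    try {
      // readFully throws EOFException if fewer than MAGIC.length bytes remain.
      new DataInputStream(in).readFully(magic);
    } catch (EOFException e) {
      // A file too short to hold the magic cannot be a data file.
      throw new IOException("Not a data file.");
    }
    if (!Arrays.equals(MAGIC, magic)) {
      throw new IOException("Not a data file.");
    }
  }
}
```

With this shape, a two-byte file and a file with the wrong magic both report
the same error.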

bq. DataFileStream: synchronization of hasNext(), next(D), close.

Do these need to be synchronized for Hadoop compatibility, too?
If so, I think it's appropriate to note in the javadoc
for DataFileStream that multiple threads can use it concurrently,
though they must not use the underlying InputStream directly.
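
To illustrate the contract such a javadoc note would describe, here is a
rough sketch of a stream whose hasNext(), next() and close() are all
synchronized on the same object.  SyncByteStream is a hypothetical stand-in
that hands out single bytes; it is not DataFileStream and does not reflect
how the patch is written.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a record stream; not Avro's DataFileStream. */
class SyncByteStream implements Closeable {
  private static final int EMPTY = -2;  // nothing buffered yet
  private static final int END = -1;    // underlying stream is exhausted

  private final InputStream in;         // callers must not touch this directly
  private int peeked = EMPTY;

  SyncByteStream(InputStream in) { this.in = in; }

  /** Buffers one item if needed; safe to call from several threads. */
  public synchronized boolean hasNext() throws IOException {
    if (peeked == EMPTY) peeked = in.read();
    return peeked != END;
  }

  /** Returns the next item; the check and the read happen under one lock. */
  public synchronized int next() throws IOException {
    if (!hasNext()) throw new NoSuchElementException();
    int result = peeked;
    peeked = EMPTY;
    return result;
  }

  public synchronized void close() throws IOException { in.close(); }
}
```

Note that a separate hasNext() followed by next() is still a check-then-act
sequence across threads, so concurrent callers would either call next() alone
and handle NoSuchElementException, or hold the stream's lock around both.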

bq. //System.out.println("sync = "+ 
bq. //System.out.println("start = "+start+" end = "+end);

You may want to delete these two before checkin.

bq. TestDataFile: readFile()

I think the reuse parameter is unused here now.


> file format should be friendly to streaming
> -------------------------------------------
>
>                 Key: AVRO-160
>                 URL: https://issues.apache.org/jira/browse/AVRO-160
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-160.patch, AVRO-160.patch
>
>
> It should be possible to stream through an Avro data file without seeking to 
> the end.
> Currently the interpretation is that schemas written to the file apply to all 
> entries before them.  If this were changed so that they instead apply to all 
> entries that follow, and the initial schema is written at the start of the 
> file, then streaming could be supported.
> Note that the only change permitted to a schema as a file is written is, 
> if it is a union, to add new branches at the end of that union.  If it is not 
> a union, no changes may be made.  So it is still the case that the final 
> schema in a file can read every entry in the file and thus may be used to 
> randomly access the file.
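
The append-only rule in the description above can be sketched as a prefix
check.  This is only an illustration: UnionGrowth and isAppendOnly are
hypothetical names, and union branches are modelled as plain type-name
strings rather than real schema objects.

```java
import java.util.List;

class UnionGrowth {
  /**
   * True if 'after' is 'before' with zero or more branches appended,
   * i.e. the old branch list is an exact prefix of the new one.
   */
  static boolean isAppendOnly(List<String> before, List<String> after) {
    return after.size() >= before.size()
        && after.subList(0, before.size()).equals(before);
  }
}
```

Because existing branches keep their positions, entries written under an
earlier version of the union stay readable with the final schema.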

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
