[ 
https://issues.apache.org/jira/browse/AVRO-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795387#action_12795387
 ] 

Philip Zeyliger commented on AVRO-160:
--------------------------------------

Looked over the patch again.  Looks good.  Synchronization issue is still open.

bq. It's not so much Hadoop compatibility as consistency: The API should either 
be thread-safe or not. If you feel that thread safety is not useful here and 
has a performance penalty then synchronization could be moved to the 
to-be-written InputFormat implementation that will use this. Would you prefer 
that?

My preference is against thread-safety in the basic container object.  (Just 
imagine me with big signs in a protest march... "Say No to Thread Safety"... 
Oy.)  I don't actually have any good numbers on how much synchronized blocks 
cost us.  Java has certainly moved towards ArrayList (and away from Vector), 
and I think that's not a crazy parallel.  Waxing more philosophical, half the 
time I find thread-safe containers don't buy you much: if you're using two of 
them and they need to be modified atomically, you still have to do your own 
synchronization work.
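
A minimal sketch of the two-container point, assuming a hypothetical class 
(TwoContainerTransfer and its method names are invented for illustration and 
are not part of Avro or the patch).  Moving an item between two lists has to 
be atomic, so a lock outside both lists is needed, and once that lock exists 
plain ArrayLists are enough:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of Avro or the AVRO-160 patch.
class TwoContainerTransfer {
    // Plain, unsynchronized lists. Making each one thread-safe on its own
    // would not make a move between them atomic.
    private final List<String> pending = new ArrayList<>();
    private final List<String> done = new ArrayList<>();

    // One external lock guards every operation that touches either list.
    private final Object transferLock = new Object();

    void add(String item) {
        synchronized (transferLock) {
            pending.add(item);
        }
    }

    // Moves one item from pending to done as a single atomic step.
    // Returns false if the item was not pending.
    boolean complete(String item) {
        synchronized (transferLock) {
            if (!pending.remove(item)) {
                return false;
            }
            done.add(item);
            return true;
        }
    }

    // Consistent only because it takes the same lock as complete():
    // no item is ever observed in neither list or in both.
    int total() {
        synchronized (transferLock) {
            return pending.size() + done.size();
        }
    }
}
```

The caller-side lock does all the work here; per-container synchronization 
would only add a second, redundant layer of locking.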

hasNext() and next(), btw, are methods that make very little sense to 
synchronize individually.  You can only call next() when hasNext() is true, 
but who's to say another thread hasn't gone in and advanced the pointer 
between the two calls while you weren't looking...
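
A rough sketch of the hasNext()/next() problem, using a plain 
java.util.Iterator rather than the reader from the patch.  The class and 
method names (SharedIteratorDrain, pollNext, drain) are hypothetical, and the 
sketch assumes the iterator never yields null.  The check and the advance are 
done under one caller-held lock, which is what per-method synchronization 
cannot provide:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; not part of Avro or the AVRO-160 patch.
class SharedIteratorDrain {
    // Returns the next element, or null when the iterator is exhausted.
    // hasNext() and next() run under the same lock, so no other thread
    // can advance the iterator between the check and the call.
    static <T> T pollNext(Iterator<T> it) {
        synchronized (it) {
            return it.hasNext() ? it.next() : null;
        }
    }

    // Drains 'source' with several threads sharing one iterator and
    // returns everything they saw, in no particular order.
    static List<Integer> drain(List<Integer> source, int threads) {
        Iterator<Integer> it = source.iterator();
        List<Integer> seen = new ArrayList<>();
        Runnable worker = () -> {
            Integer value;
            while ((value = pollNext(it)) != null) {
                synchronized (seen) {
                    seen.add(value);
                }
            }
        };
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(worker);
            pool.add(t);
            t.start();
        }
        try {
            for (Thread t : pool) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return seen;
    }
}
```

If hasNext() and next() were each synchronized but called separately, two 
threads could both see hasNext() return true for the last element, and one of 
them would then fail in next().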

-- Philip

> file format should be friendly to streaming
> -------------------------------------------
>
>                 Key: AVRO-160
>                 URL: https://issues.apache.org/jira/browse/AVRO-160
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>         Attachments: AVRO-160-python.patch, AVRO-160.patch, AVRO-160.patch, 
> AVRO-160.patch
>
>
> It should be possible to stream through an Avro data file without seeking to 
> the end.
> Currently the interpretation is that schemas written to the file apply to all 
> entries before them.  If this were changed so that they instead apply to all 
> entries that follow, and the initial schema is written at the start of the 
> file, then streaming could be supported.
> Note that the only change permitted to a schema as a file is written is, 
> if it is a union, to add new branches at the end of that union.  If it is not 
> a union, no changes may be made.  So it is still the case that the final 
> schema in a file can read every entry in the file and thus may be used to 
> randomly access the file.
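
The union rule in the description above can be sketched as a simple prefix 
check.  This is only an illustration: union branches are modelled as a list of 
type names, and UnionAppendCheck.isAppendOnly is a hypothetical helper, not an 
Avro API.  It covers only the union case; per the description, a non-union 
schema may not change at all.

```java
import java.util.List;

// Hypothetical helper; not part of Avro or the AVRO-160 patch.
class UnionAppendCheck {
    // Returns true if 'updated' keeps every branch of 'original' in the
    // same order and only adds new branches at the end.
    static boolean isAppendOnly(List<String> original, List<String> updated) {
        return updated.size() >= original.size()
            && updated.subList(0, original.size()).equals(original);
    }
}
```

Because the earlier branches keep their positions, the final union still 
describes every entry written under any earlier version of it.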

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
