[ 
https://issues.apache.org/jira/browse/AVRO-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800735#action_12800735
 ] 

Thiruvalluvan M. G. commented on AVRO-327:
------------------------------------------

bq. Are there important use cases where buffering is a problem?

If a stream carries both metadata and data (like our DataFile) and the 
metadata is encoded in a non-Avro format, buffering causes trouble: a buffered 
decoder reads ahead and may consume bytes that belong to the non-Avro part. 
One reason users may want non-Avro metadata is that the stream mixes Avro and 
non-Avro data.

bq. ... call Scott's implementation BufferedBinaryDecoder and encourage folks 
to use that unless buffering is a problem. Could that work?

+1 for this proposal. We can add a reset() method to the Decoder interface. 
This method would throw away the remaining contents of the buffer in 
BufferedBinaryDecoder and do nothing in BinaryDecoder. Other decoders, like 
ValidatingDecoder, would pass the call on to their underlying Decoders. 
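
A minimal sketch of how the reset() idea might look. This is not Avro's actual API: SimpleDecoder, UnbufferedDecoder, BufferedDecoder and ForwardingDecoder are hypothetical stand-ins for Decoder, BinaryDecoder, BufferedBinaryDecoder and ValidatingDecoder, and readByte() stands in for the real read methods.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ResetSketch {
    /** Hypothetical slice of a decoder interface; only reset() matters here. */
    interface SimpleDecoder {
        int readByte();   // next byte as 0..255, or -1 at end of stream
        void reset();     // discard any bytes read ahead but not yet consumed
    }

    /** Unbuffered: reads straight from the stream, so reset() is a no-op. */
    static final class UnbufferedDecoder implements SimpleDecoder {
        private final InputStream in;
        UnbufferedDecoder(InputStream in) { this.in = in; }
        public int readByte() {
            try { return in.read(); }
            catch (IOException e) { throw new UncheckedIOException(e); }
        }
        public void reset() { /* nothing buffered, nothing to discard */ }
    }

    /** Buffered: reads ahead in chunks; reset() drops the unread remainder. */
    static final class BufferedDecoder implements SimpleDecoder {
        private final InputStream in;
        private final byte[] buf;
        private int pos = 0;
        private int limit = 0;
        BufferedDecoder(InputStream in, int size) {
            this.in = in;
            this.buf = new byte[size];
        }
        public int readByte() {
            if (pos == limit) {
                try { limit = in.read(buf, 0, buf.length); }
                catch (IOException e) { throw new UncheckedIOException(e); }
                pos = 0;
                if (limit <= 0) { limit = 0; return -1; }
            }
            return buf[pos++] & 0xFF;
        }
        public void reset() { pos = 0; limit = 0; }
    }

    /** Wrapping decoder: holds no buffer of its own, so it forwards reset(). */
    static final class ForwardingDecoder implements SimpleDecoder {
        private final SimpleDecoder inner;
        ForwardingDecoder(SimpleDecoder inner) { this.inner = inner; }
        public int readByte() { return inner.readByte(); }
        public void reset() { inner.reset(); }
    }
}
```

Note that in this sketch the bytes dropped by reset() have already been taken from the underlying stream, so they are lost to any other reader of that stream.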

The main problem with BinaryDecoder is that it does single byte reads. One 
solution for that problem could be to encode the count of bytes for a number in 
the first byte itself. For example, a bit prefix 0 may indicate that it's a 
7-bit number, a prefix of 10 may indicate it's a 14-bit number (6 remaining bits 
of the first byte and 8 bits of the second byte) and so on. This would be as 
space-efficient as the current encoding scheme, and we'd need at most two 
reads per encoded number. Performance would not match the buffered decoder, 
but it would be better than BinaryDecoder. The big drawback of this scheme is 
that it is an incompatible change, so I don't think it's worthwhile to 
implement.
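
To make the length-prefix idea concrete, here is a rough sketch. It is not Avro's encoding. The class and method names are made up, and three details are my own assumptions: values are treated as unsigned, the remaining bytes are big-endian, and a first byte of 0xFF means eight more bytes follow. The count of leading 1 bits in the first byte gives the number of extra bytes, so the decoder needs one read for the first byte and one bulk read for the rest.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class PrefixLengthSketch {
    /** Encodes v (treated as unsigned) with its byte count in the first byte. */
    static byte[] encode(long v) {
        int bits = Math.max(1, 64 - Long.numberOfLeadingZeros(v));
        // n extra bytes carry 7 * (n + 1) payload bits for n = 0..7;
        // anything wider than 56 bits uses the 0xFF marker plus 8 full bytes.
        int extra = bits > 56 ? 8 : (bits - 1) / 7;
        byte[] out = new byte[1 + extra];
        int prefix = (0xFF << (8 - extra)) & 0xFF;      // 'extra' leading 1 bits
        long high = extra == 8 ? 0 : v >>> (8 * extra); // payload bits of byte 0
        out[0] = (byte) (prefix | high);
        for (int i = 1; i <= extra; i++) {
            out[i] = (byte) (v >>> (8 * (extra - i)));  // big-endian remainder
        }
        return out;
    }

    /** Decodes one value using at most two reads on the stream. */
    static long decode(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int first = data.readUnsignedByte();        // read 1: length + high bits
            int extra = Integer.numberOfLeadingZeros(~first & 0xFF) - 24;
            long value = first & (0xFF >>> (extra + 1));
            byte[] rest = new byte[extra];
            data.readFully(rest);                       // read 2: remaining bytes
            for (byte b : rest) {
                value = (value << 8) | (b & 0xFF);
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, values up to 127 take one byte and values up to 16383 take two, matching the 7-bit and 14-bit cases above. A signed long would need a mapping such as zig-zag applied first, which the sketch leaves out.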



> Performance improvements to BinaryDecoder.readLong()
> ----------------------------------------------------
>
>                 Key: AVRO-327
>                 URL: https://issues.apache.org/jira/browse/AVRO-327
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Thiruvalluvan M. G.
>            Assignee: Thiruvalluvan M. G.
>
> AVRO-315 proposed performance improvements to readLong(), readFloat() and 
> readDouble(). readLong() did not improve performance well for all. Scott 
> proposed a better method (but requires a change in semantics and API). We'll 
> carry on the discussion on that proposal here. AVRO-315 will be committed 
> with changes for readFloat() and readDouble().

