[ https://issues.apache.org/jira/browse/AVRO-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800735#action_12800735 ]
Thiruvalluvan M. G. commented on AVRO-327:
------------------------------------------

bq. Are there important use cases where buffering is a problem?

If we have a stream with metadata and data (like our DataFile) and the metadata is encoded in a non-Avro format, we'll have trouble with buffering. One reason users may want to encode the metadata in a non-Avro format is that the stream contains both Avro and non-Avro data.

bq. ... call Scott's implementation BufferedBinaryDecoder and encourage folks to use that unless buffering is a problem. Could that work?

+1 for this proposal. We can add a method reset() to the Decoder interface. This method would throw away the remaining contents of the buffer in BufferedBinaryDecoder and do nothing in BinaryDecoder. Other decoders like ValidatingDecoder would pass this call on to their underlying Decoders.

The main problem with BinaryDecoder is that it does single-byte reads. One solution to that problem could be to encode the count of bytes for a number in the first byte itself. For example, a bit prefix of 0 may indicate that it's a 7-bit number, a prefix of 10 may indicate that it's a 14-bit number (the 6 remaining bits of the first byte and the 8 bits of the second byte), and so on. This would be as space-efficient as the current encoding scheme. We'd need at most two reads per encoded number. The performance would not be as good as the buffered decoder's, but it would be better than BinaryDecoder's. The big drawback of this scheme is that it is an incompatible change. I don't think it's worthwhile to implement it.

> Performance improvements to BinaryDecoder.readLong()
> ----------------------------------------------------
>
>                 Key: AVRO-327
>                 URL: https://issues.apache.org/jira/browse/AVRO-327
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Thiruvalluvan M. G.
>            Assignee: Thiruvalluvan M. G.
>
> AVRO-315 proposed performance improvements to readLong(), readFloat() and
> readDouble(). readLong() did not improve performance well for all. Scott
> proposed a better method (but requires a change in semantics and API). We'll
> carry on the discussion on that proposal here. AVRO-315 will be committed
> with changes for readFloat() and readDouble().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
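
For readers following the thread, here is a rough Java sketch of the length-prefixed encoding floated in the comment (prefix 0 for a 7-bit number, 10 for a 14-bit number, and so on). It is only an illustration under assumptions: the class name PrefixVarint and its methods are hypothetical and not part of Avro, the value is treated as unsigned (the zig-zag step Avro applies to signed values is left out), and the rule that a first byte of all 1 bits is followed by a full 8-byte value is my own choice for covering 64 bits. The decoder issues one call for the first byte and one bulk call for the rest, which is the "at most two reads" property.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical length-prefixed integer encoding; an illustration only, not Avro's format. */
final class PrefixVarint {

    /** Encodes an unsigned 64-bit value; the number of leading 1 bits in byte 0 is the count of bytes that follow. */
    static byte[] encode(long value) {
        // Smallest n in 0..8 such that the value fits in 7 * (n + 1) bits (n == 8 carries all 64 bits).
        int extra = 0;
        while (extra < 8 && (value >>> (7 * (extra + 1))) != 0) {
            extra++;
        }
        byte[] out = new byte[extra + 1];
        long v = value;
        for (int i = extra; i >= 1; i--) { // trailing bytes hold the low-order bits, big-endian
            out[i] = (byte) v;
            v >>>= 8;
        }
        int prefix = (0xFF << (8 - extra)) & 0xFF; // 'extra' leading 1 bits, then a 0 bit
        out[0] = (byte) (prefix | (int) v);        // remaining high-order bits share the first byte
        return out;
    }

    /** Decodes one value with two calls on the stream: the first byte, then all remaining bytes. */
    static long decode(DataInputStream in) throws IOException {
        int first = in.readUnsignedByte();                            // call 1: prefix plus high bits
        int extra = Integer.numberOfLeadingZeros(~first & 0xFF) - 24; // leading 1 bits in 'first'
        long value = first & (0x7F >>> extra);                        // strip the prefix bits
        if (extra > 0) {
            byte[] rest = new byte[extra];
            in.readFully(rest);                                       // call 2: the rest in one request
            for (byte b : rest) {
                value = (value << 8) | (b & 0xFF);
            }
        }
        return value;
    }

    /** Convenience helper for experiments: encode, then decode from an in-memory stream. */
    static long roundTrip(long value) {
        try {
            return decode(new DataInputStream(new ByteArrayInputStream(encode(value))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch values below 128 take one byte and values below 16384 take two, matching the 7-bit and 14-bit cases in the comment. As the comment notes, any such layout is incompatible with data written in the current encoding.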