[jira] Commented: (AVRO-327) Performance improvements to BinaryDecoder.readLong()

Scott Carey (JIRA) Fri, 15 Jan 2010 16:27:17 -0800

    [ 
https://issues.apache.org/jira/browse/AVRO-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801028#action_12801028
 ]


Scott Carey commented on AVRO-327:
----------------------------------

Some bigger context:

InputStream and OutputStream are slow and should be avoided as much as possible 
when the chunks copied to/from them average less than a couple hundred bytes.  
The performance difference isn't small: it's a factor of 2.5 on the previous 
test for things that aren't fully optimized yet -- including the readDouble() 
that read 8 bytes at a time.  This magnifies the effect of other 
improvements.  An improvement that currently helps by 10% will help by 25% 
after this change.  If one wants to be able to have one thread decode/encode at 
gigabit ethernet wire speed, avoiding InputStream.read() and 
OutputStream.write(int b) is mandatory -- even if you use a 
BufferedInputStream.
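
As a rough illustration of the two access patterns being compared (this is not 
the benchmark referred to above, and the class and method names are 
hypothetical), the sketch below consumes the same stream once with one 
InputStream.read() call per byte and once with bulk reads into a byte[] 
followed by plain array indexing:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of Avro.
final class ChunkedReadSketch {

    // Per-byte access: one virtual InputStream.read() call for every byte.
    static long sumPerByte(InputStream in) {
        try {
            long sum = 0;
            int b;
            while ((b = in.read()) != -1) {
                sum += b;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Chunked access: one bulk read per 8 KiB, then direct array indexing.
    static long sumChunked(InputStream in) {
        try {
            byte[] buf = new byte[8192];
            long sum = 0;
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                for (int i = 0; i < n; i++) {
                    sum += buf[i] & 0xff;
                }
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds deterministic sample data for the comparison.
    static byte[] sample(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        return data;
    }

    static boolean sameResult(int size) {
        byte[] data = sample(size);
        return sumPerByte(new ByteArrayInputStream(data))
            == sumChunked(new ByteArrayInputStream(data));
    }
}
```

Both methods return the same value; only the number of stream calls differs. 
Actual timings depend on the JVM and the stream implementation, so none are 
claimed here.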

This is not just for the decoder/encoder, but also for various other places 
where the assumed "pass data around" method is via InputStream and 
OutputStream.  ByteBuffer, byte[], and Channel are good options for various 
use cases and perform much better for small reads/writes than an equivalent 
Input/Output stream.
There will be more to change than just BinaryDecoder eventually, and a holistic 
approach is better than a patchwork one.

To address Thiru's concerns, I think that it can be made even simpler:

{code}
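void f(InputStream in) {
   // From here on, "in" belongs to the decoder and must not be read directly.
   BinaryDecoder bin = new BinaryDecoder(in);
   AvroObject o = readAvro(bin);
   // Non-Avro data is read through the decoder's own stream view (proposed API).
   NonAvroObject no = readNonAvro(bin.inputStream());
}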
{code}

BinaryDecoder can construct a specialized InputStream inner class on demand 
(and cache it).
The contract would be that once an InputStream is given to a decoder, it should 
not be accessed directly -- no different from what happens when you wrap 
an input stream with a buffered input stream.  
Alternatively, Decoder could extend BufferedInputStream itself -- but that 
would force that on all implementations.
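
A minimal sketch of that idea, under stated assumptions: SketchDecoder is a 
hypothetical stand-in and not the actual BinaryDecoder or the eventual patch. 
It buffers reads from the wrapped stream, decodes zigzag varint longs from the 
buffer, and lazily creates (then caches) an inner InputStream that first 
drains the decoder's buffer and only then falls through to the wrapped stream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a buffering decoder; not Avro's BinaryDecoder.
final class SketchDecoder {
    private final InputStream source;
    private final byte[] buf = new byte[8192];
    private int pos = 0;
    private int limit = 0;
    private InputStream view; // created on demand, then cached

    SketchDecoder(InputStream source) {
        this.source = source;
    }

    // Refills the buffer; returns false at end of stream.
    private boolean fill() throws IOException {
        int n = source.read(buf, 0, buf.length);
        if (n <= 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    // Returns the next byte as 0..255, or -1 at end of stream.
    int readByte() throws IOException {
        if (pos == limit && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xff;
    }

    // Decodes a zigzag-encoded variable-length long (at most 10 bytes).
    long readLong() throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = readByte();
            if (b < 0) {
                throw new EOFException();
            }
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0 && shift < 70);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Stream view that shares the decoder's buffer and position.
    InputStream inputStream() {
        if (view == null) {
            view = new InputStream() {
                @Override
                public int read() throws IOException {
                    return readByte();
                }

                @Override
                public int read(byte[] b, int off, int len) throws IOException {
                    if (len == 0) {
                        return 0;
                    }
                    if (pos == limit) {
                        // Buffer drained: large reads can bypass it entirely.
                        return source.read(b, off, len);
                    }
                    int n = Math.min(len, limit - pos);
                    System.arraycopy(buf, pos, b, off, n);
                    pos += n;
                    return n;
                }
            };
        }
        return view;
    }

    // Two varint longs (3 and -1) followed by two non-Avro bytes ("hi").
    static String demo() {
        byte[] data = {0x06, 0x01, 'h', 'i'};
        try {
            SketchDecoder dec = new SketchDecoder(new ByteArrayInputStream(data));
            long a = dec.readLong();
            long b = dec.readLong();
            InputStream rest = dec.inputStream();
            char c1 = (char) rest.read();
            char c2 = (char) rest.read();
            return a + "," + b + "," + c1 + c2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the bytes that the decoder has already buffered are not lost 
to the non-Avro reader, which is the point of handing out the stream from the 
decoder rather than reusing the original one.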

If two readers need to read ahead and buffer the same data, a different API 
will be needed (ByteBuffers? something else? read methods that don't advance 
the position?).
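
To make the ByteBuffer option concrete, here is a small sketch of what 
java.nio already offers; it is only an illustration of the idea, not a 
proposed Avro API, and the class name is hypothetical. ByteBuffer.duplicate() 
gives each reader its own position over the same bytes, and the absolute 
get(int) reads without advancing the position.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of Avro.
final class SharedBufferSketch {

    static int[] demo() {
        ByteBuffer shared = ByteBuffer.wrap(new byte[] {10, 20, 30, 40});

        // Each duplicate shares the content but has an independent position.
        ByteBuffer readerA = shared.duplicate();
        ByteBuffer readerB = shared.duplicate();

        // Relative gets advance only readerA.
        readerA.get();
        readerA.get();

        // Absolute get: peeks one byte ahead without moving readerB.
        int peek = readerB.get(readerB.position() + 1);

        return new int[] {
            readerA.position(), readerB.position(), peek, shared.position()
        };
    }
}
```

Here demo() returns {2, 0, 20, 0}: readerA has advanced by two bytes, while 
readerB and the shared buffer are still at position 0 after the peek.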


I can produce a patch next week for review.



> Performance improvements to BinaryDecoder.readLong()
> ----------------------------------------------------
>
>                 Key: AVRO-327
>                 URL: https://issues.apache.org/jira/browse/AVRO-327
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Thiruvalluvan M. G.
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-327.patch
>
>
> AVRO-315 proposed performance improvements to readLong(), readFloat() and 
> readDouble(). readLong() did not improve performance well for all. Scott 
> proposed a better method (but requires a change in semantics and API). We'll 
> carry on the discussion on that proposal here. AVRO-315 will be committed 
> with changes for readFloat() and readDouble().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
