[ https://issues.apache.org/jira/browse/AVRO-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830353#action_12830353 ]
Scott Carey commented on AVRO-392: ----------------------------------

I am unsure about splitting BinaryDecoder in two. Should the old version implement all of the same constructors/initializers, or only keep the InputStream one? One option is a "DirectBinaryDecoder" class with the same capabilities as the old one that supports only InputStream construction. The new implementation is not buffered when given a byte[]; buffering is optional and depends on the ByteSource, so BufferedBinaryDecoder might not be the right name. With this split, clients that cannot allow any read-ahead can switch to DirectBinaryDecoder, while all others get the new implementation by default.

A second class also raises some important testing questions. Do we need to duplicate a lot of tests to run against both? How many clients actually need a guarantee that the decoder reads no more bytes than absolutely necessary? Can anyone come up with a use case where inputStream(), combined with remaining() to detect how far ahead it read and a means to retrieve those bytes, is not sufficient?

Splitting it up with an abstract parent may also hurt performance a bit: enough implementations of a method eventually trigger the JIT to de-optimize and prevent method inlining. For most things this doesn't matter, but for something in a tight loop like readInt() it might. Another problem is that there will likely be features going forward that depend on the buffering, or are only needed by it, which will further branch the two implementations. I would also prefer not to maintain two versions of every readType() method if possible.

What use cases are there where inputStream() is not sufficient? Can we extend the new BinaryDecoder to address those rather than keep another variation?

*Issues*

{quote} I think ByteSource.read(byte[], int, int) does not conform to the InputStream spec under a specific condition.
If the buffer has fewer bytes than the number desired, it copies that data into the destination buffer and then calls tryReadRaw(). If tryReadRaw() returns 0, it returns -1. It should return the number of bytes copied so far. {quote}

Good catch. Fixed to add the result of tryReadRaw() to the remaining count, and to check that total for 0 before returning -1.

{quote} If the underlying stream has fewer bytes than the desired length, readRaw would throw EOFException. Wouldn't it be useful to somehow let the client know the actual number of bytes read into the buffer passed? {quote}

That is what tryReadRaw() is for. It should always return how many bytes were actually read, up to MIN(maximum available, requested). A client that must know how much data is copied should use tryReadRaw() and treat a 0 return as EOF. I tried removing readRaw() and having only tryReadRaw(), but that hurt performance since clients had to put more conditionals in 'common path' code; most of the time there is no EOF, so checking for it on every read hurts tight loops. This is essentially what is wrong with InputStream: checks for EOF happen on both the client and internal sides of the read() methods all the time, even though the common case is not EOF.

{quote} Is there a need to have mark() with an empty body in ByteSource? InputStream already has an identical implementation. {quote}

Removed.

{quote} ByteSource.skipSourceBytes() catches EOFException and does something. The only method called in that function is InputStream.skip(), and InputStream.skip() never throws EOFException; it indicates EOF through a return value of 0. A similar observation applies to InputStream.read() within InputStreamByteSource.tryReadRaw(). {quote}

That was the first implementation, but ByteBufferInputStream and InflaterInputStream both fail to adhere to the InputStream contract and throw EOFException instead of returning -1 in some cases.
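The defensive handling looks roughly like the following. This is an illustrative standalone sketch, not the actual Avro code; the class and method names are hypothetical:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch of a defensive skip: some InputStream implementations throw
// EOFException at end-of-stream instead of signalling it via the return
// value, so the catch translates that into a short count. Hypothetical
// names; not the actual Avro source.
class DefensiveSkipSketch {
    /** Skips up to n bytes; returns how many were actually skipped. */
    static long trySkip(InputStream in, long n) throws IOException {
        long skipped = 0;
        try {
            while (skipped < n) {
                long k = in.skip(n - skipped);
                if (k <= 0) break; // conforming streams signal EOF this way
                skipped += k;
            }
        } catch (EOFException e) {
            // non-conforming streams throw instead of returning 0 or -1
        }
        return skipped;
    }
}
```

With the catch in place, a stream that throws after yielding some bytes still produces a correct partial count rather than propagating EOFException to the caller.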
If you remove the catch, tests do not pass (in fact, it's worse: the SocketServer test spins in an infinite loop). For example:

{noformat}
java.io.EOFException
    at org.apache.avro.ipc.ByteBufferInputStream.getBuffer(ByteBufferInputStream.java:84)
    at org.apache.avro.ipc.ByteBufferInputStream.read(ByteBufferInputStream.java:46)
    at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw(BinaryDecoder.java:683)
{noformat}

Because the declared contract of skip() and read() is that they can throw IOException, I have to be defensive and catch the EOF subtype EOFException, since the InputStream javadoc contract is not enforced by the compiler and trySkipBytes() and tryReadRaw() must never throw EOFException.

{quote} In BinaryDecoder.readFloat() and readDouble(), we check whether we have enough bytes in the buffer after converting the bytes into ints. It would be better to check before converting. The current logic is sound for readInt() and readLong(), since we don't know the number of bytes needed there; but for readFloat() and readDouble() we know the number of bytes needed, so we can check early. {quote}

This started out as an optimization that was worth about 10% in decoding performance when _pos_ was incremented with each read. By removing the false dependency between the check and the read, it doesn't matter as much as in the past. Placing the most common case first and the unusual case (EOF) second tends to help performance, especially for smaller buffer sizes or less aggressive compilers. The check and the load are not dependent: although at first glance one part looks 'first' and one part 'second', the compiler and CPU are free to reorder them and execute speculatively, since one operation does not depend on the other.
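As a standalone illustration of this load-then-check shape (hypothetical code, not the Avro source; it assumes the backing array has at least 4 bytes of slack past the valid data, and uses Avro's little-endian float layout):

```java
import java.io.EOFException;

// Sketch of the load-then-check ordering: the 4-byte load and the bounds
// check are independent, so the CPU/JIT may begin the load speculatively
// rather than waiting on the branch. Hypothetical class; assumes
// buf.length >= pos + 4 so the load itself is always in bounds.
class ReadFloatSketch {
    private final byte[] buf; // backing array with slack past limit
    private int pos;          // next byte to consume
    private final int limit;  // end of valid data

    ReadFloatSketch(byte[] buf, int limit) {
        this.buf = buf;
        this.limit = limit;
    }

    float readFloat() throws EOFException {
        // common case first: assemble the little-endian int from the buffer
        int n = (buf[pos] & 0xff)
            | ((buf[pos + 1] & 0xff) << 8)
            | ((buf[pos + 2] & 0xff) << 16)
            | ((buf[pos + 3] & 0xff) << 24);
        // bounds check second; it does not depend on the load above
        if (limit - pos < 4) {
            throw new EOFException();
        }
        pos += 4;
        return Float.intBitsToFloat(n);
    }
}
```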
Placing the conditional second is a sort of compiler 'hint' that we want this to be a speculative load: load the bytes and deal with the consequences of the conditional simultaneously if possible, rather than waiting on the conditional before initiating the load.

{quote} I think minPos is not used correctly in BinaryDecoder.ensureBounds(). The remaining bytes in the buffer are moved to the location starting at minPos, but pos is set to 0. It should be set to minPos, right? {quote}

Good catch! I enhanced a test in TestBinaryDecoder to detect this bug; it reliably breaks unless minPos and limit are set properly. It also found an additional bug one line below: limit did not account for minPos.

{quote}
# Indentation for continuation lines is the same as the first line in a small number of places, e.g. the definition of the method ByteSource.skipSourceBytes().
# Some documentation around the ByteSource and BufferAccessor classes would be useful.
# In InputStreamByteSource.readRaw(), the variable read is only used within the while loop, so the declaration can move inside the loop.
{quote}

I'll clean those up.

{quote}
# There are many occasions of if (c) return x; else return y;. I feel it would be more readable as return c ? x : y. Of course, it is a matter of taste.
# There are many occasions of pos += x; setPos(pos); return xx;. Would it be better to write setPos(pos + x); return xx;? Again, it's a matter of taste.
{quote}

Yeah, it's a matter of taste. I prefer separating complex statements into multiple lines since it is easier to debug when stepping through code line by line and watching the variables change state. Additionally, if the code changes, less condensed statements produce a diff that better represents the change.
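For reference, the ensureBounds() compaction bug discussed above comes down to the following shape (a hypothetical sketch, not the patched Avro code; field names mirror the discussion):

```java
// Sketch of buffer compaction that preserves the region below minPos.
// The bug was setting pos = 0 and limit = remaining after moving the
// unconsumed bytes to start at minPos; both must account for minPos.
class EnsureBoundsSketch {
    byte[] buf;
    int pos;    // next byte to consume
    int limit;  // end of valid data
    int minPos; // bytes below this index must be preserved

    EnsureBoundsSketch(byte[] buf, int pos, int limit, int minPos) {
        this.buf = buf;
        this.pos = pos;
        this.limit = limit;
        this.minPos = minPos;
    }

    /** Compacts the remaining bytes down to minPos before a refill. */
    void compact() {
        int remaining = limit - pos;
        System.arraycopy(buf, pos, buf, minPos, remaining);
        pos = minPos;               // the bug was: pos = 0
        limit = minPos + remaining; // the bug was: limit = remaining
    }
}
```

With pos = 0 and limit = remaining, the next reads would re-consume the preserved region and stop short of the moved data, which is the failure the enhanced TestBinaryDecoder case catches.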
> Binary Decoder Performance and flexibility overhaul
> ---------------------------------------------------
>
>                 Key: AVRO-392
>                 URL: https://issues.apache.org/jira/browse/AVRO-392
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.3.0
>
>         Attachments: AVRO-392-preview.patch, AVRO-392.patch, AVRO-392.patch
>
>
> BinaryDecoder has room for significant performance improvement. [AVRO-327|https://issues.apache.org/jira/browse/AVRO-327] has some preliminary work here, but in order to satisfy some use cases there is much more work to do.
> I am opening a new ticket because the scope of the changes needed to do this the right way is larger.
> I have done a large bulk of a new implementation that abstracts a 'ByteSource' from the BinaryDecoder. Currently BinaryDecoder is tightly coupled to InputStream. The ByteSource can wrap an InputStream, FileChannel, or byte[] in this version, but could be extended to support other channel types, sockets, etc. This abstraction allows the BinaryDecoder to buffer data from various sources while supporting interleaved access to the underlying data, and gives greater flexibility going forward.
> The performance of this abstraction has been heavily tuned so that maximum performance can be achieved even for slower ByteSource implementations.
> For readers that must interleave reads on a stream with the decoder, this includes a
> {code}
> public InputStream inputStream();
> {code}
> method on the decoder that can serve interleaved reads.
> Additionally, it will be necessary to have a constructor on BinaryDecoder that allows two BinaryDecoders to share a stream (and buffer).
> Performance results on this new version are better than previous prototypes:
> *current trunk BinaryDecoder*
> {noformat}
> ReadInt: 983 ms, 30.497877855999185 million entries/sec
> ReadLongSmall: 1058 ms, 28.336666040111496 million entries/sec
> ReadLong: 1518 ms, 19.75179889508437 million entries/sec
> ReadFloat: 657 ms, 45.61031157924184 million entries/sec
> ReadDouble: 761 ms, 39.387756709704355 million entries/sec
> ReadBoolean: 331 ms, 90.4268145647456 million entries/sec
> RepeaterTest: 7718 ms, 3.886725782038378 million entries/sec
> NestedRecordTest: 1884 ms, 15.91964611687992 million entries/sec
> ResolverTest: 8296 ms, 3.616055866616717 million entries/sec
> MigrationTest: 21216 ms, 1.4139999570144013 million entries/sec
> {noformat}
> *buffering BinaryDecoder*
> {noformat}
> ReadInt: 187 ms, 160.22131904871262 million entries/sec
> ReadLongSmall: 372 ms, 80.4863521975457 million entries/sec
> ReadLong: 613 ms, 48.882385721129246 million entries/sec
> ReadFloat: 253 ms, 118.16606270679061 million entries/sec
> ReadDouble: 275 ms, 108.94314257389068 million entries/sec
> ReadBoolean: 222 ms, 134.85327963176064 million entries/sec
> RepeaterTest: 3335 ms, 8.993007936329503 million entries/sec
> NestedRecordTest: 1152 ms, 26.0256943004597 million entries/sec
> ResolverTest: 4213 ms, 7.120659335077578 million entries/sec
> MigrationTest: 15310 ms, 1.9594884898992941 million entries/sec
> {noformat}
> Performance is 2x to 5x the throughput of trunk on most tests.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.