[ https://issues.apache.org/jira/browse/AVRO-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830353#action_12830353 ]
Scott Carey commented on AVRO-392: ----------------------------------

I am unsure about splitting BinaryDecoder in two. Should the old version implement all of the same constructors/initializers, or only keep the InputStream one? One option is a "DirectBinaryDecoder" class with the same capabilities as the old one that supports only InputStream construction. The new implementation is not buffered when given a byte[]; buffering is optional and depends on the ByteSource, so BufferedBinaryDecoder might not be the right name. With this split, clients that cannot allow any read-ahead can switch to DirectBinaryDecoder, while all others get the new implementation by default.

A second class also raises some important testing questions. Do we need to duplicate a lot of tests to run against both? How many clients actually need a guarantee that the decoder reads no more bytes than absolutely necessary? Can anyone come up with a use case where inputStream(), combined with remaining() to detect how far ahead it read and a means to retrieve those bytes, is not sufficient?

Splitting it up with an abstract parent may also hurt performance a bit: enough implementations of a method eventually trigger the JIT to de-optimize and prevent method inlining. For most things this doesn't matter, but for something in a tight loop like readInt() it might. Another problem is that there will likely be features going forward that depend on the buffering, or are only needed by it, which will further branch the two implementations. I would also prefer not to maintain two versions of every readType() method if possible.

What use cases are there where inputStream() is not sufficient? Can we extend the new BinaryDecoder to address those rather than keep another variation?

*Issues*

{quote} I think ByteSource.read(byte[], int, int) does not conform to the InputStream spec under a specific condition.
If the buffer has fewer bytes than the number desired, it copies that data into the destination buffer and then calls tryReadRaw(). If tryReadRaw() returns 0, it returns -1. It should return the number of bytes copied so far. {quote}

Good catch. Fixed to add the result of tryReadRaw() to the remaining count, and to check that total for 0 before returning -1.

{quote} If the underlying stream has fewer bytes than the desired length, readRaw would throw EOFException. Wouldn't it be useful to somehow let the client know the actual number of bytes read into the buffer passed? {quote}

That is what tryReadRaw() is for. It should always return how many bytes were actually read, up to MIN(maximum available, requested). A client that must know how much data is copied should use tryReadRaw() and treat a 0 return as EOF. I tried removing readRaw() and having only tryReadRaw(), but that hurt performance since clients had to put more conditionals in 'common path' code; most of the time there is no EOF, so checking for it on every read hurts tight loops. This is essentially what is wrong with InputStream: checks for EOF happen on both the client and internal sides of the read() methods all the time, even though the common case is not EOF.

{quote} Is there a need to have mark() with an empty body in ByteSource? InputStream already has an identical implementation. {quote}

Removed.

{quote} ByteSource.skipSourceBytes() catches EOFException and does something. The only method called in that function is InputStream.skip(), and InputStream.skip() never throws EOFException; it indicates EOF through a return value of 0. A similar observation applies to InputStream.read() within InputStreamByteSource.tryReadRaw(). {quote}

That was the first implementation, but ByteBufferInputStream and InflaterInputStream both fail to adhere to the InputStream contract and throw EOFException instead of returning -1 in some cases.
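The defensive handling looks roughly like the following. This is an illustrative standalone sketch, not the actual Avro code; the class and method names are hypothetical:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch of a defensive skip: some InputStream implementations throw
// EOFException at end-of-stream instead of signalling it via the return
// value, so the catch translates that into a short count. Hypothetical
// names; not the actual Avro source.
class DefensiveSkipSketch {
    /** Skips up to n bytes; returns how many were actually skipped. */
    static long trySkip(InputStream in, long n) throws IOException {
        long skipped = 0;
        try {
            while (skipped < n) {
                long k = in.skip(n - skipped);
                if (k <= 0) break; // conforming streams signal EOF this way
                skipped += k;
            }
        } catch (EOFException e) {
            // non-conforming streams throw instead of returning 0 or -1
        }
        return skipped;
    }
}
```

With the catch in place, a stream that throws after yielding some bytes still produces a correct partial count rather than propagating EOFException to the caller.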
If you remove the catch, tests do not pass (in fact, it's worse: the SocketServer test spins in an infinite loop). For example:

{noformat}
java.io.EOFException
    at org.apache.avro.ipc.ByteBufferInputStream.getBuffer(ByteBufferInputStream.java:84)
    at org.apache.avro.ipc.ByteBufferInputStream.read(ByteBufferInputStream.java:46)
    at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.tryReadRaw(BinaryDecoder.java:683)
{noformat}

Because the declared contract of skip() and read() is that they can throw IOException, I have to be defensive and catch the EOF subtype EOFException, since the InputStream javadoc contract is not enforced by the compiler and trySkipBytes() and tryReadRaw() must never throw EOFException.

{quote} In BinaryDecoder.readFloat() and readDouble(), we check whether we have enough bytes in the buffer after converting the bytes into ints. It would be better to check before converting. The current logic is sound for readInt() and readLong(), since we don't know the number of bytes needed there; but for readFloat() and readDouble() we know the number of bytes needed, so we can check early. {quote}

This started out as an optimization that was worth about 10% in decoding performance when _pos_ was incremented with each read. By removing the false dependency between the check and the read, it doesn't matter as much as in the past. Placing the most common case first and the unusual case (EOF) second tends to help performance, especially for smaller buffer sizes or less aggressive compilers. The check and the load are not dependent: although at first glance one part looks 'first' and one part 'second', the compiler and CPU are free to reorder them and execute speculatively, since one operation does not depend on the other.
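As a standalone illustration of this load-then-check shape (hypothetical code, not the Avro source; it assumes the backing array has at least 4 bytes of slack past the valid data, and uses Avro's little-endian float layout):

```java
import java.io.EOFException;

// Sketch of the load-then-check ordering: the 4-byte load and the bounds
// check are independent, so the CPU/JIT may begin the load speculatively
// rather than waiting on the branch. Hypothetical class; assumes
// buf.length >= pos + 4 so the load itself is always in bounds.
class ReadFloatSketch {
    private final byte[] buf; // backing array with slack past limit
    private int pos;          // next byte to consume
    private final int limit;  // end of valid data

    ReadFloatSketch(byte[] buf, int limit) {
        this.buf = buf;
        this.limit = limit;
    }

    float readFloat() throws EOFException {
        // common case first: assemble the little-endian int from the buffer
        int n = (buf[pos] & 0xff)
            | ((buf[pos + 1] & 0xff) << 8)
            | ((buf[pos + 2] & 0xff) << 16)
            | ((buf[pos + 3] & 0xff) << 24);
        // bounds check second; it does not depend on the load above
        if (limit - pos < 4) {
            throw new EOFException();
        }
        pos += 4;
        return Float.intBitsToFloat(n);
    }
}
```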
Placing the conditional second is a sort of compiler 'hint' that we want this to be a speculative load: load the bytes and deal with the consequences of the conditional simultaneously if possible, rather than waiting on the conditional before initiating the load.

{quote} I think minPos is not used correctly in BinaryDecoder.ensureBounds(). The remaining bytes in the buffer are moved to the location starting at minPos, but pos is set to 0. It should be set to minPos, right? {quote}

Good catch! I enhanced a test in TestBinaryDecoder to detect this bug; it reliably breaks unless minPos and limit are set properly. It also found an additional bug one line below: limit did not account for minPos.

{quote}
# Indentation for continuation lines is the same as the first line in a small number of places, e.g. the definition of the method ByteSource.skipSourceBytes().
# Some documentation around the ByteSource and BufferAccessor classes would be useful.
# In InputStreamByteSource.readRaw(), the variable read is only used within the while loop, so the declaration can move inside the loop.
{quote}

I'll clean those up.

{quote}
# There are many occasions of if (c) return x; else return y;. I feel it would be more readable as return c ? x : y. Of course, it is a matter of taste.
# There are many occasions of pos += x; setPos(pos); return xx;. Would it be better to write setPos(pos + x); return xx;? Again, it's a matter of taste.
{quote}

Yeah, it's a matter of taste. I prefer separating complex statements into multiple lines since it is easier to debug when stepping through code line by line and watching the variables change state. Additionally, if the code changes, less condensed statements produce a diff that better represents the change.
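For reference, the ensureBounds() compaction bug discussed above comes down to the following shape (a hypothetical sketch, not the patched Avro code; field names mirror the discussion):

```java
// Sketch of buffer compaction that preserves the region below minPos.
// The bug was setting pos = 0 and limit = remaining after moving the
// unconsumed bytes to start at minPos; both must account for minPos.
class EnsureBoundsSketch {
    byte[] buf;
    int pos;    // next byte to consume
    int limit;  // end of valid data
    int minPos; // bytes below this index must be preserved

    EnsureBoundsSketch(byte[] buf, int pos, int limit, int minPos) {
        this.buf = buf;
        this.pos = pos;
        this.limit = limit;
        this.minPos = minPos;
    }

    /** Compacts the remaining bytes down to minPos before a refill. */
    void compact() {
        int remaining = limit - pos;
        System.arraycopy(buf, pos, buf, minPos, remaining);
        pos = minPos;               // the bug was: pos = 0
        limit = minPos + remaining; // the bug was: limit = remaining
    }
}
```

With pos = 0 and limit = remaining, the next reads would re-consume the preserved region and stop short of the moved data, which is the failure the enhanced TestBinaryDecoder case catches.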
> Binary Decoder Performance and flexibility overhaul
> ---------------------------------------------------
>
>                 Key: AVRO-392
>                 URL: https://issues.apache.org/jira/browse/AVRO-392
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.3.0
>
>         Attachments: AVRO-392-preview.patch, AVRO-392.patch, AVRO-392.patch
>
>
> BinaryDecoder has room for significant performance improvement. [AVRO-327|https://issues.apache.org/jira/browse/AVRO-327] has some preliminary work here, but in order to satisfy some use cases there is much more work to do.
> I am opening a new ticket because the scope of the changes needed to do this the right way is larger.
> I have done a large bulk of a new implementation that abstracts a 'ByteSource' from the BinaryDecoder. Currently BinaryDecoder is tightly coupled to InputStream. The ByteSource can wrap an InputStream, FileChannel, or byte[] in this version, but could be extended to support other channel types, sockets, etc. This abstraction allows the BinaryDecoder to buffer data from various sources while supporting interleaved access to the underlying data, and gives greater flexibility going forward.
> The performance of this abstraction has been heavily tuned so that maximum performance can be achieved even for slower ByteSource implementations.
> For readers that must interleave reads on a stream with the decoder, this includes a
> {code}
> public InputStream inputStream();
> {code}
> method on the decoder that can serve interleaved reads.
> Additionally, it will be necessary to have a constructor on BinaryDecoder that allows two BinaryDecoders to share a stream (and buffer).
> Performance results on this new version are better than previous prototypes:
> *current trunk BinaryDecoder*
> {noformat}
> ReadInt: 983 ms, 30.497877855999185 million entries/sec
> ReadLongSmall: 1058 ms, 28.336666040111496 million entries/sec
> ReadLong: 1518 ms, 19.75179889508437 million entries/sec
> ReadFloat: 657 ms, 45.61031157924184 million entries/sec
> ReadDouble: 761 ms, 39.387756709704355 million entries/sec
> ReadBoolean: 331 ms, 90.4268145647456 million entries/sec
> RepeaterTest: 7718 ms, 3.886725782038378 million entries/sec
> NestedRecordTest: 1884 ms, 15.91964611687992 million entries/sec
> ResolverTest: 8296 ms, 3.616055866616717 million entries/sec
> MigrationTest: 21216 ms, 1.4139999570144013 million entries/sec
> {noformat}
> *buffering BinaryDecoder*
> {noformat}
> ReadInt: 187 ms, 160.22131904871262 million entries/sec
> ReadLongSmall: 372 ms, 80.4863521975457 million entries/sec
> ReadLong: 613 ms, 48.882385721129246 million entries/sec
> ReadFloat: 253 ms, 118.16606270679061 million entries/sec
> ReadDouble: 275 ms, 108.94314257389068 million entries/sec
> ReadBoolean: 222 ms, 134.85327963176064 million entries/sec
> RepeaterTest: 3335 ms, 8.993007936329503 million entries/sec
> NestedRecordTest: 1152 ms, 26.0256943004597 million entries/sec
> ResolverTest: 4213 ms, 7.120659335077578 million entries/sec
> MigrationTest: 15310 ms, 1.9594884898992941 million entries/sec
> {noformat}
> Performance is 2x to 5x the throughput of trunk on most tests.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.