[jira] Commented: (AVRO-392) Binary Decoder Performance and flexibility overhaul

Scott Carey (JIRA) Wed, 10 Feb 2010 12:13:41 -0800

    [ 
https://issues.apache.org/jira/browse/AVRO-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832166#action_12832166
 ]


Scott Carey commented on AVRO-392:
----------------------------------

bq. Do we distinguish between malformed data (a varint that hasn't finished) 
and a genuine EOF? I think we don't need to (that's the job of the level high 
up), but just want to make sure.

We might want to introduce something like a 'BinaryFormatException' to deal 
with the "this can't be an encoded int" -like cases. but we can do that in 
another ticket.

bq. Why is BufferAccessor static? It seems that it attaches itself only ever to 
a BinaryDecoder object. I don't know that it makes a difference; might be just 
a style preference.
2 things.  
* Style -- Since a non-static private class has a hidden private member 'this' 
that points to the parent object, it has more overhead.  So by default I start 
with static and remove it if needed.
* Flexibility -- I wasn't sure if I'd want to share a source or 'reattach' one. 
 For example, two decoders could share one source.  Once it was coded to work 
as static, I left it that way.  With the current use cases, it doesn't matter 
much.

{quote}I'm a bit concerned that we haven't tested all of the edges cases here. 
Introducing buffering into the Decoder has certainly increased the code 
complexity. We could run a code coverage tool to see if all the new code is 
touched by the existing TestBinaryDecoder code. I didn't see (though I only had 
a chance to do a quick pass) tests that detect EOFExceptions on the tricky 
ensureBounds() codepath. Likewise, I'd be more comfortable if the two variants 
of ByteSource had their own unit tests.{quote}

I've used EclEmma in eclipse to generate code coverage reports.  The main parts 
missing (counting coverage from TestBinaryDecoder and TestDataFile) are:
* Invalid Int/Long encoding.
* Most skipXX methods.
* BufferAccessor when detached.
* .inputStream() read and skip methods.

I'll add some tests to cover these.  Several of the above are covered in other 
tests (for example, the skipXXX methods).


> Binary Decoder Performance and flexibility overhaul
> ---------------------------------------------------
>
>                 Key: AVRO-392
>                 URL: https://issues.apache.org/jira/browse/AVRO-392
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.3.0
>
>         Attachments: AVRO-392-preview.patch, AVRO-392.patch, AVRO-392.patch, 
> AVRO-392.patch
>
>
> BinaryDecoder has room for significant performance improvement.  
> [AVRO-327|https://issues.apache.org/jira/browse/AVRO-327] has some 
> preliminary work here, but in order to satisfy some use cases there is much 
> more work to do.
> I am opening a new ticket because the scope of the changes needed to do this 
> the right way are larger.
> I have done a large bulk of a new implementation that abstracts a 
> 'ByteSource' from the BinaryDecoder.  Currently BinaryDecoder is tightly 
> coupled to InputStream.  The ByteSource can wrap an InputStream, FileChannel, 
> or byte[] in this version, but could be extended to support other channel 
> types, sockets, etc.  This abstraction allows the BinaryDecoder to buffer 
> data from various sources while supporting interleaved access to the 
> underlying data and greater flexibility going forward.
> The performance of this abstraction has been heavily tuned so that maximum 
> performance can be achieved even for slower ByteSource implementations.
> For readers that must interleave reads on a stream with the decoder, this 
> includes a
> {code}
> public InputStream inputStream();
> {code}
> method on the decoder that can serve interleaved reads.  
> Additionally it will be necessary to have a constructor on BinaryDecoder that 
> allows two BinaryDecoders to share a stream (and buffer).
> Performance results on this new version is better than previous prototypes:
> *current trunk BinaryDecoder*
> {noformat}
> ReadInt: 983 ms, 30.497877855999185 million entries/sec
> ReadLongSmall: 1058 ms, 28.336666040111496 million entries/sec
> ReadLong: 1518 ms, 19.75179889508437 million entries/sec
> ReadFloat: 657 ms, 45.61031157924184 million entries/sec
> ReadDouble: 761 ms, 39.387756709704355 million entries/sec
> ReadBoolean: 331 ms, 90.4268145647456 million entries/sec
> RepeaterTest: 7718 ms, 3.886725782038378 million entries/sec
> NestedRecordTest: 1884 ms, 15.91964611687992 million entries/sec
> ResolverTest: 8296 ms, 3.616055866616717 million entries/sec
> MigrationTest: 21216 ms, 1.4139999570144013 million entries/sec
> {noformat}
> *buffering BinaryDecoder*
> {noformat}
> ReadInt: 187 ms, 160.22131904871262 million entries/sec
> ReadLongSmall: 372 ms, 80.4863521975457 million entries/sec
> ReadLong: 613 ms, 48.882385721129246 million entries/sec
> ReadFloat: 253 ms, 118.16606270679061 million entries/sec
> ReadDouble: 275 ms, 108.94314257389068 million entries/sec
> ReadBoolean: 222 ms, 134.85327963176064 million entries/sec
> RepeaterTest: 3335 ms, 8.993007936329503 million entries/sec
> NestedRecordTest: 1152 ms, 26.0256943004597 million entries/sec
> ResolverTest: 4213 ms, 7.120659335077578 million entries/sec
> MigrationTest: 15310 ms, 1.9594884898992941 million entries/sec
> {noformat}
> Performance is 2x to 5x the throughput of trunk on most tests.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (AVRO-392) Binary Decoder Performance and flexibility overhaul

Reply via email to