[ 
https://issues.apache.org/jira/browse/AVRO-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834899#action_12834899
 ] 

Scott Carey commented on AVRO-392:
----------------------------------

bq. Instead of having a state for the factory, can we have a boolean argument 
to the factory method that takes an InputStream? If we have the state, we need 
to have another method returns the state etc.

I'm concerned about the future flexibility of a boolean flag.  It is 
essentially exposing the implementation detail that the current BinaryDecoder 
is not 'direct' for InputStreams, but is for the other stuff.

If the mapping of which factory methods return different instances for 'direct' 
and non-direct changes for some reason, it affects the API.
For example, maybe in the future, we add support for a factory method that 
operates on an Iterator<ByteBuffer>.  The first implementation might 
'read-ahead' past the minimum ByteBuffer necessary and thus not be applicable 
when 'direct' is used.
But, a later implementation might fix this and ensure that the iterator never 
advances past the minimum necessary position and become applicable for 'direct' 
needs.  With a state flag on the factory, this all remains hidden.  With a flag 
on the factory method, it is exposed.

Maybe the flexibility is a bit of overkill for this one use case, but if we end 
up with more and more configuration state flags over time, the factory methods 
don't have to change.  If we push all that to the method signatures, it is 
harder to evolve.
We currently have one state variable in the factory -- the buffer size.  It is 
only applicable to the InputStream factory at the moment.  
The factory could be static if we put that on the factory method, but that 
would have the same drawbacks -- bufferSize may be applicable to some 
implementations, and not others, and which implementations are returned from 
each factory may change too.


bq. But instead of adding supported() methods, can we have these methods throw 
OperationNotSupportedException(). If the clients encounter, for example 
isEndSupported() == false, they have to somehow manage the situation, They can 
now handle the exception to achieve the same. at least it makes the interface 
simpler. What do you think?

I think that implementations that don't support these operations should 
definitely throw OperationNotSupportedException (or equivalent).  But a client 
should be able to test for support proactively.  
I suppose that can be done with a try / catch instead of an if(), but I guess I 
belong to the "exceptions are for exceptional situations" school.  Its not 
exceptional for a user to want to know if a method is going to work, so I feel 
that 

{code}
if (!isEndSupported()) {
  logger.error("Decoder must support isEnd());
  throw ... // or return, or whatever the client needs to do
}
{code}
is cleaner than
{code}
try {
  isEnd();
} catch (OperationNotSupportedException e) {
  logger.error("Decoder must support isEnd());
  throw e;  // or return, or whatever the client needs to do
}
{code}

Also, there may be cases where a client could have
{code}
if (isEndSupported()) {
  // do one thing
} else {
  // do something else
}
{code}
Which is much uglier (and significantly slower) in exception form.

But that is definitely a style thing.  I can go either way here.

We agree on the core of all this so I'll start working on that right now.  What 
is left will be a decision on whether we should remove the supportsXX on 
Decoder and whether to put the 'direct' part of the factory as a method flag or 
a state variable.  Some input/opinions from some others would help resolve that 
quickly.

> Binary Decoder Performance and flexibility overhaul
> ---------------------------------------------------
>
>                 Key: AVRO-392
>                 URL: https://issues.apache.org/jira/browse/AVRO-392
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>            Priority: Critical
>             Fix For: 1.3.0
>
>         Attachments: AVRO-392-preview.patch, 
> AVRO-392-with_DirectBinaryDecoder-2.patch, 
> AVRO-392-with_DirectBinaryDecoder.patch, AVRO-392.patch, AVRO-392.patch, 
> AVRO-392.patch, AVRO-392.patch
>
>
> BinaryDecoder has room for significant performance improvement.  
> [AVRO-327|https://issues.apache.org/jira/browse/AVRO-327] has some 
> preliminary work here, but in order to satisfy some use cases there is much 
> more work to do.
> I am opening a new ticket because the scope of the changes needed to do this 
> the right way are larger.
> I have done a large bulk of a new implementation that abstracts a 
> 'ByteSource' from the BinaryDecoder.  Currently BinaryDecoder is tightly 
> coupled to InputStream.  The ByteSource can wrap an InputStream, FileChannel, 
> or byte[] in this version, but could be extended to support other channel 
> types, sockets, etc.  This abstraction allows the BinaryDecoder to buffer 
> data from various sources while supporting interleaved access to the 
> underlying data and greater flexibility going forward.
> The performance of this abstraction has been heavily tuned so that maximum 
> performance can be achieved even for slower ByteSource implementations.
> For readers that must interleave reads on a stream with the decoder, this 
> includes a
> {code}
> public InputStream inputStream();
> {code}
> method on the decoder that can serve interleaved reads.  
> Additionally it will be necessary to have a constructor on BinaryDecoder that 
> allows two BinaryDecoders to share a stream (and buffer).
> Performance results on this new version is better than previous prototypes:
> *current trunk BinaryDecoder*
> {noformat}
> ReadInt: 983 ms, 30.497877855999185 million entries/sec
> ReadLongSmall: 1058 ms, 28.336666040111496 million entries/sec
> ReadLong: 1518 ms, 19.75179889508437 million entries/sec
> ReadFloat: 657 ms, 45.61031157924184 million entries/sec
> ReadDouble: 761 ms, 39.387756709704355 million entries/sec
> ReadBoolean: 331 ms, 90.4268145647456 million entries/sec
> RepeaterTest: 7718 ms, 3.886725782038378 million entries/sec
> NestedRecordTest: 1884 ms, 15.91964611687992 million entries/sec
> ResolverTest: 8296 ms, 3.616055866616717 million entries/sec
> MigrationTest: 21216 ms, 1.4139999570144013 million entries/sec
> {noformat}
> *buffering BinaryDecoder*
> {noformat}
> ReadInt: 187 ms, 160.22131904871262 million entries/sec
> ReadLongSmall: 372 ms, 80.4863521975457 million entries/sec
> ReadLong: 613 ms, 48.882385721129246 million entries/sec
> ReadFloat: 253 ms, 118.16606270679061 million entries/sec
> ReadDouble: 275 ms, 108.94314257389068 million entries/sec
> ReadBoolean: 222 ms, 134.85327963176064 million entries/sec
> RepeaterTest: 3335 ms, 8.993007936329503 million entries/sec
> NestedRecordTest: 1152 ms, 26.0256943004597 million entries/sec
> ResolverTest: 4213 ms, 7.120659335077578 million entries/sec
> MigrationTest: 15310 ms, 1.9594884898992941 million entries/sec
> {noformat}
> Performance is 2x to 5x the throughput of trunk on most tests.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to