Hi all,

We're building an application that requires a high-throughput Kafka
consumer in C++ and must utilize Avro as its data transmission format.

Currently, memoryInputStream() allocates a new MemoryInputStream2 on the heap
on every call.

For context, we're decoding in a tight loop, so that's a new/delete per
message for an object that holds just three scalars. To avoid that, we wrote
an internal InputStream that reuses the memory buffer via a reset() method.
I thought it might be worth hoisting this into the Avro C++ implementation
itself.

I benchmarked the difference on my local machine and it had a pretty
significant impact, although it's a local, hacky benchmark, so take it with
a grain of salt.

  Full decode (3-field record, binaryDecoder, release build):

  - Stock memoryInputStream(): 111 ns/decode (~9M decodes/sec)

  - Reusable stream with reset(): 74 ns/decode (~13.5M decodes/sec)

  Happy to put together a patch if there's interest. Any thoughts on the
right approach, or on whether this is something the project would want?

Thanks,
Robbie
