StephanEwen edited a comment on pull request #17520: URL: https://github.com/apache/flink/pull/17520#issuecomment-970256639
@tsreaper Thanks for doing the benchmark. I am curious to understand what the difference is between "bulk format + array list" and "stream format", because the "stream format" also puts deserialized records into an ArrayList. But something must be different, otherwise there would not be such a big performance difference. Can we try to identify that, and maybe improve the StreamFormatAdapter accordingly?

I would also be curious to understand where the performance difference between block sizes comes from in the StreamFormat. The stream format counts the batch size in bytes after decompression, and it should be independent of Avro's blocks and sync markers, so I am puzzled why the block size has an impact.
