Dear All,

I need to produce some data from Samza to Kafka and then write it to Parquet files.
I was asked why I chose Avro rather than Protocol Buffers as the output format from Samza to Kafka, since all of our data on Kafka is currently encoded as Protocol Buffers messages. I explained that Avro has several advantages: the encoded size is smaller, no extra code generation or compile step is needed, it is easier to implement, it is fast to serialize and deserialize, and it supports many languages.

However, some people believe that an encoded Avro message takes about as much space as a Protocol Buffers message, and that once the schema is included the size could be much bigger.

Are there any other advantages that would make you choose Avro as your message type? And how do you think about data size for Avro vs. Protocol Buffers?

Sincerely,
Selina

References:
1. https://issues.apache.org/jira/browse/SAMZA-317
2. http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
3. https://avro.apache.org/docs/1.7.7/gettingstartedjava.html
4. https://www.igvita.com/2011/08/01/protocol-buffers-avro-thrift-messagepack/
5. http://tech.puredanger.com/2011/05/27/serialization-comparison/
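
To make the size question concrete, here is a rough sketch that hand-encodes one small record in both wire formats and compares the byte counts. It does not use the real Avro or Protocol Buffers libraries; the helper functions are hypothetical and only follow the published encoding rules (varints, zigzag longs, length-prefixed strings). The record and schema are assumed for illustration, modeled on the example in reference 2, and are not our actual data.

```python
import json

def varint(n: int) -> bytes:
    """Unsigned base-128 varint, the building block of both encodings."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# --- Avro binary encoding (no field tags; field order comes from the schema) ---
def avro_long(n: int) -> bytes:
    return varint((n << 1) ^ (n >> 63))  # zigzag, then varint

def avro_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return avro_long(len(data)) + data

def avro_record(name: str, number: int, interests: list) -> bytes:
    out = avro_string(name)
    out += avro_long(1) + avro_long(number)  # union ["null", "long"], branch 1
    if interests:
        out += avro_long(len(interests))     # one array block
        out += b"".join(avro_string(i) for i in interests)
    return out + avro_long(0)                # end-of-array marker

# --- Protocol Buffers encoding (every field carries a tag byte) ---
def pb_tag(field: int, wire_type: int) -> bytes:
    return varint((field << 3) | wire_type)

def pb_string(field: int, s: str) -> bytes:
    data = s.encode("utf-8")
    return pb_tag(field, 2) + varint(len(data)) + data

def pb_record(name: str, number: int, interests: list) -> bytes:
    out = pb_string(1, name)
    out += pb_tag(2, 0) + varint(number & 0xFFFFFFFFFFFFFFFF)  # int64 field
    out += b"".join(pb_string(3, i) for i in interests)        # repeated string
    return out

# Assumed example schema, written as Avro JSON.
SCHEMA = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favouriteNumber", "type": ["null", "long"]},
        {"name": "interests", "type": {"type": "array", "items": "string"}},
    ],
}

record = ("Martin", 1337, ["daydreaming", "hacking"])
avro = avro_record(*record)
pb = pb_record(*record)
schema_size = len(json.dumps(SCHEMA, separators=(",", ":")).encode("utf-8"))

print("Protocol Buffers bytes:", len(pb))
print("Avro bytes (no schema):", len(avro))
print("Avro schema JSON bytes:", schema_size)
```

For this record the two payloads come out within one byte of each other (33 bytes for Protocol Buffers, 32 for Avro), while the schema JSON alone is several times larger than either. So both views in the discussion hold: the encoded data is about the same size, and shipping the schema with every message would dominate. An Avro container file writes the schema once in its header rather than per record, so for the Kafka case the real question is whether the schema travels with each message or is shared some other way.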