Dear All:

      I need to use Samza to generate some data, send it to Kafka, and
then write it to Parquet files.

      I was asked why I chose Avro rather than Protocol Buffers as the
format of my Samza output to Kafka, since all of our current data on Kafka
are Protocol Buffers messages.

      I explained that Avro-encoded messages have several advantages:
smaller encoded size, no extra code-generation step, easier implementation,
fast serialization and deserialization, and support for many languages.

      However, some people believe that an encoded Avro message takes
about as much space as a Protocol Buffers message, and that once the schema
is included, it could be much bigger.

      I am wondering whether there are any other advantages that made you
choose Avro as your message type. How do you weigh the data size of Avro
against Protocol Buffers?
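
      To make the size question concrete, here is a minimal sketch in
Python that hand-encodes one small record using the Avro binary encoding and
the Protocol Buffers wire format, then compares the byte counts. It uses no
Avro or Protobuf library. The record (a "User" with a long "id" and a string
"name"), the field numbers 1 and 2, and the sample values are my own
assumptions for illustration, not our real data.

```python
import json


def varint(n: int) -> bytes:
    """Base-128 varint of a non-negative integer (used by both formats)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag(n: int) -> int:
    """Zigzag mapping of a signed 64-bit value, as Avro uses for int/long."""
    return (n << 1) ^ (n >> 63)


def avro_long(n: int) -> bytes:
    return varint(zigzag(n))


def avro_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return avro_long(len(data)) + data  # length prefix, then UTF-8 bytes


def avro_record(user_id: int, name: str) -> bytes:
    # Avro writes field values back to back, with no per-field tags.
    return avro_long(user_id) + avro_string(name)


def proto_key(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)


def proto_record(user_id: int, name: str) -> bytes:
    # Protobuf prefixes every field with a key (field number + wire type).
    data = name.encode("utf-8")
    return (
        proto_key(1, 0) + varint(user_id & 0xFFFFFFFFFFFFFFFF)  # int64 varint
        + proto_key(2, 2) + varint(len(data)) + data  # length-delimited
    )


# Hypothetical schema for the hypothetical record above.
SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

avro_body = avro_record(12345, "selina")
proto_body = proto_record(12345, "selina")
schema_json = json.dumps(SCHEMA, separators=(",", ":")).encode("utf-8")

print("Avro body bytes:       ", len(avro_body))
print("Protobuf body bytes:   ", len(proto_body))
print("Avro body + schema:    ", len(avro_body) + len(schema_json))
```

      For this record the Avro body comes to 10 bytes and the Protocol
Buffers body to 11; the difference is the one-byte key that Protocol Buffers
writes before each field. The schema JSON alone is several times larger than
the body, so the objection holds if the schema travels with every message.
If the schema is stored once (as in the header of an Avro container file) or
each message carries only a short schema identifier, that cost is shared
across many messages.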

 Sincerely,
Selina


References:
1. https://issues.apache.org/jira/browse/SAMZA-317
2. http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
3. https://avro.apache.org/docs/1.7.7/gettingstartedjava.html
4. https://www.igvita.com/2011/08/01/protocol-buffers-avro-thrift-messagepack/
5. http://tech.puredanger.com/2011/05/27/serialization-comparison/
