Yeah, this reduced-overhead message format requires an Avro schema registry, so that you can look up the actual Avro schema via the schemaId.
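The "magic byte + schemaId + payload" framing discussed in this thread can be sketched roughly as follows. This is a hedged illustration, not the actual LinkedIn wire format: the magic byte value, the 4-byte schemaId width, and the in-memory `registry` dict are all assumptions made for the example.

```python
import struct

# Assumed magic byte value marking "this payload is Avro"; the real value
# depends on the deployment's conventions.
MAGIC_AVRO = 0x0

def encode_message(schema_id: int, payload: bytes) -> bytes:
    """Frame an already-serialized Avro payload as:
    1-byte magic, 4-byte big-endian schemaId, then the payload."""
    return struct.pack(">bI", MAGIC_AVRO, schema_id) + payload

def decode_message(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_AVRO:
        raise ValueError("not an Avro-framed message")
    return schema_id, message[5:]

# Toy stand-in for the schema registry lookup (id -> writer schema JSON).
registry = {42: '{"type": "string"}'}

framed = encode_message(42, b"\x06foo")      # b"\x06foo" is Avro-encoded "foo"
schema_id, payload = decode_message(framed)
writer_schema = registry[schema_id]          # schema the consumer deserializes with
```

The point of the registry is that only the small integer id travels with each message; the full schema is fetched once per id, which is why the per-message overhead stays low.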
On Wed, Nov 18, 2015 at 5:53 PM, Selina Tech <swucaree...@gmail.com> wrote:
> Hi, Yi:
>
> I think I got the answer, as below:
>
> "The Kafka message format starts with a magic byte indicating what kind of
> serialization is used for this message. If this byte indicates Avro, you
> can lay out your message as the schemaId followed by the message payload.
> Upon consumption, you first read the schemaId, query the registry for the
> schema given that id, and then use the schema to deserialize the message."
> -- http://grokbase.com/t/kafka/users/138mdm6tp3/avro-serialization
>
> Thanks again!
> Sincerely,
> Selina
>
> On Wed, Nov 18, 2015 at 5:43 PM, Selina Tech <swucaree...@gmail.com> wrote:
>
> > Hi, Yi:
> >     Thanks for your reply. Do you mean there is no advantage of Avro
> > messages over Protocol Buffer messages on Kafka except the Avro schema
> > registry?
> >
> >     BTW, do you know how Kafka implements the Avro message? Does each
> > Avro message include the schema or not? The size of the Avro message
> > is a big concern for me now.
> >
> > Sincerely,
> > Selina
> >
> > On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan <nickpa...@gmail.com> wrote:
> >
> >> Hi, Selina,
> >>
> >> Samza's producer/consumer is highly tunable. You can configure it to
> >> use a ProtocolBufferSerde class if your messages in Kafka are in
> >> Protocol Buffer format. The use of Avro in Kafka is LinkedIn's choice
> >> and does not necessarily fit others.
> >>
> >> As for why LinkedIn uses Avro, here is the biggest reason: LinkedIn
> >> uses an Avro schema registry to ensure that producers and consumers
> >> are using compatible Avro schema versions. It is a specific way of
> >> maintaining compatibility between producer and consumer at LinkedIn.
> >> ProtoBuf does not seem to have schema registry functionality and
> >> requires re-compilation to make sure producer and consumer are
> >> compatible on the wire format of the message.
> >>
> >> If you have other ways to maintain compatibility between producers
> >> and consumers using ProtoBuf, I don't see why you cannot use ProtoBuf
> >> in Samza.
> >>
> >> Best,
> >>
> >> -Yi
> >>
> >> On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <swucaree...@gmail.com>
> >> wrote:
> >>
> >> > Dear All:
> >> >
> >> >     I need to generate some data with Samza to Kafka and then write
> >> > it to a Parquet format file. I was asked why I chose Avro as my
> >> > Samza output type to Kafka instead of Protocol Buffer, since
> >> > currently our data on Kafka is all Protocol Buffer.
> >> >     I explained the advantages of Avro-encoded messages: the encoded
> >> > size is smaller, there is no extra code compilation, implementation
> >> > is easier, serialization/deserialization is fast, and many languages
> >> > are supported. However, some people believe that an encoded Avro
> >> > message takes as much space as Protocol Buffer, and that with the
> >> > schema included the size could be much bigger.
> >> >
> >> >     I am wondering if there are any other advantages that made you
> >> > choose Avro as your message type at Kafka?
> >> >
> >> > Sincerely,
> >> > Selina
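Yi's point that Samza's serde is pluggable can be sketched as a config fragment. This is a hypothetical sketch: the `ProtobufSerdeFactory` class is an assumed custom implementation (Samza does not ship one), while the `serializers.registry.<name>.class` and `systems.<system>.samza.msg.serde` key patterns follow Samza's configuration conventions.

```properties
# Register a serde named "protobuf" backed by a custom (hypothetical) factory
# that wraps your generated Protocol Buffer classes.
serializers.registry.protobuf.class=com.example.samza.ProtobufSerdeFactory

# Tell Samza to use that serde for message values on the "kafka" system.
systems.kafka.samza.msg.serde=protobuf
```

With a setup like this, the choice of Avro vs. Protocol Buffer is purely a serde decision; nothing in Samza itself requires Avro.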