As one of the people who spent too much time building Avro repositories, +1 on bringing serializer API back.
I think it will make the new producer easier to work with. Gwen On Mon, Nov 24, 2014 at 6:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote: > This is admittedly late in the release cycle to make a change. To add to > Jun's description the motivation was that we felt it would be better to > change that interface now rather than after the release if it needed to > change. > > The motivation for wanting to make a change was the ability to really be > able to develop support for Avro and other serialization formats. The > current status is pretty scattered--there is a schema repository on an Avro > JIRA and another fork of that on github, and a bunch of people we have > talked to have done similar things for other serialization systems. It > would be nice if these things could be packaged in such a way that it was > possible to just change a few configs in the producer and get rich metadata > support for messages. > > As we were thinking this through we realized that the new api we were about > to introduce was kind of not very compatable with this since it was just > byte[] oriented. > > You can always do this by adding some kind of wrapper api that wraps the > producer. But this puts us back in the position of trying to document and > support multiple interfaces. > > This also opens up the possibility of adding a MessageValidator or > MessageInterceptor plug-in transparently so that you can do other custom > validation on the messages you are sending which obviously requires access > to the original object not the byte array. > > This api doesn't prevent using byte[] by configuring the > ByteArraySerializer it works as it currently does. > > -Jay > > On Mon, Nov 24, 2014 at 5:58 PM, Jun Rao <jun...@gmail.com> wrote: > > > Hi, Everyone, > > > > I'd like to start a discussion on whether it makes sense to add the > > serializer api back to the new java producer. Currently, the new java > > producer takes a byte array for both the key and the value. While this > api > > is simple, it pushes the serialization logic into the application. This > > makes it hard to reason about what type of data is being sent to Kafka > and > > also makes it hard to share an implementation of the serializer. For > > example, to support Avro, the serialization logic could be quite involved > > since it might need to register the Avro schema in some remote registry > and > > maintain a schema cache locally, etc. Without a serialization api, it's > > impossible to share such an implementation so that people can easily > reuse. > > We sort of overlooked this implication during the initial discussion of > the > > producer api. > > > > So, I'd like to propose an api change to the new producer by adding back > > the serializer api similar to what we had in the old producer. Specially, > > the proposed api changes are the following. > > > > First, we change KafkaProducer to take generic types K and V for the key > > and the value, respectively. > > > > public class KafkaProducer<K,V> implements Producer<K,V> { > > > > public Future<RecordMetadata> send(ProducerRecord<K,V> record, > Callback > > callback); > > > > public Future<RecordMetadata> send(ProducerRecord<K,V> record); > > } > > > > Second, we add two new configs, one for the key serializer and another > for > > the value serializer. Both serializers will default to the byte array > > implementation. > > > > public class ProducerConfig extends AbstractConfig { > > > > .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS, > > "org.apache.kafka.clients.producer.ByteArraySerializer", Importance.HIGH, > > KEY_SERIALIZER_CLASS_DOC) > > .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS, > > "org.apache.kafka.clients.producer.ByteArraySerializer", Importance.HIGH, > > VALUE_SERIALIZER_CLASS_DOC); > > } > > > > Both serializers will implement the following interface. > > > > public interface Serializer<T> extends Configurable { > > public byte[] serialize(String topic, T data, boolean isKey); > > > > public void close(); > > } > > > > This is more or less the same as what's in the old producer. The slight > > differences are (1) the serializer now only requires a parameter-less > > constructor; (2) the serializer has a configure() and a close() method > for > > initialization and cleanup, respectively; (3) the serialize() method > > additionally takes the topic and an isKey indicator, both of which are > > useful for things like schema registration. > > > > The detailed changes are included in KAFKA-1797. For completeness, I also > > made the corresponding changes for the new java consumer api as well. > > > > Note that the proposed api changes are incompatible with what's in the > > 0.8.2 branch. However, if those api changes are beneficial, it's probably > > better to include them now in the 0.8.2 release, rather than later. > > > > I'd like to discuss mainly two things in this thread. > > 1. Do people feel that the proposed api changes are reasonable? > > 2. Are there any concerns of including the api changes in the 0.8.2 final > > release? > > > > Thanks, > > > > Jun > > >