As one of the people who spent too much time building Avro repositories, +1
on bringing the serializer API back.

I think it will make the new producer easier to work with.

Gwen

On Mon, Nov 24, 2014 at 6:13 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> This is admittedly late in the release cycle to make a change. To add to
> Jun's description, the motivation was that we felt it would be better to
> change that interface now, rather than after the release, if it needed to
> change.
>
> The motivation for wanting to make a change was to really be able to
> develop support for Avro and other serialization formats. The
> current status is pretty scattered--there is a schema repository on an Avro
> JIRA and another fork of that on github, and a bunch of people we have
> talked to have done similar things for other serialization systems. It
> would be nice if these things could be packaged in such a way that it was
> possible to just change a few configs in the producer and get rich metadata
> support for messages.
>
> As we were thinking this through we realized that the new api we were about
> to introduce was not really compatible with this since it was just
> byte[] oriented.
>
> You can always do this by adding some kind of wrapper api that wraps the
> producer. But this puts us back in the position of trying to document and
> support multiple interfaces.
>
> This also opens up the possibility of adding a MessageValidator or
> MessageInterceptor plug-in transparently, so that you can do other custom
> validation on the messages you are sending, which obviously requires access
> to the original object, not the byte array.
>
> This api doesn't prevent using byte[]: by configuring the
> ByteArraySerializer it works as it currently does.
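>
> As a rough sketch of that last point (illustrative only, assuming the
> usual producer Properties in props and a made-up topic name), leaving
> the serializer configs at their byte array defaults keeps today's
> byte[]-oriented usage:
>
>     // serializers default to the byte array implementation, so the
>     // producer still takes raw byte arrays
>     Producer<byte[], byte[]> producer = new KafkaProducer<byte[], byte[]>(props);
>     producer.send(new ProducerRecord<byte[], byte[]>("my-topic", "hello".getBytes()));
>     producer.close();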
>
> -Jay
>
> On Mon, Nov 24, 2014 at 5:58 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > Hi, Everyone,
> >
> > I'd like to start a discussion on whether it makes sense to add the
> > serializer api back to the new java producer. Currently, the new java
> > producer takes a byte array for both the key and the value. While this
> > api is simple, it pushes the serialization logic into the application.
> > This makes it hard to reason about what type of data is being sent to
> > Kafka and also makes it hard to share an implementation of the
> > serializer. For example, to support Avro, the serialization logic could
> > be quite involved since it might need to register the Avro schema in
> > some remote registry and maintain a schema cache locally, etc. Without
> > a serialization api, it's impossible to share such an implementation so
> > that people can easily reuse it. We sort of overlooked this implication
> > during the initial discussion of the producer api.
> >
> > So, I'd like to propose an api change to the new producer by adding back
> > the serializer api, similar to what we had in the old producer.
> > Specifically, the proposed api changes are the following.
> >
> > First, we change KafkaProducer to take generic types K and V for the key
> > and the value, respectively.
> >
> > public class KafkaProducer<K,V> implements Producer<K,V> {
> >
> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record,
> >                                        Callback callback);
> >
> >     public Future<RecordMetadata> send(ProducerRecord<K,V> record);
> > }
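> >
> > With concrete types for K and V, the application hands the producer its
> > own objects and the configured serializers produce the bytes. A minimal
> > sketch (illustrative only; it assumes string serializers have been set
> > up for both the key and the value as in the config example below, and
> > props is the producer's Properties):
> >
> >     Producer<String, String> producer = new KafkaProducer<String, String>(props);
> >     producer.send(new ProducerRecord<String, String>("page-views", "user-42", "/index.html"),
> >         new Callback() {
> >             public void onCompletion(RecordMetadata metadata, Exception exception) {
> >                 if (exception != null)
> >                     exception.printStackTrace();
> >             }
> >         });
> >     producer.close();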
> >
> > Second, we add two new configs, one for the key serializer and another
> > for the value serializer. Both serializers will default to the byte
> > array implementation.
> >
> > public class ProducerConfig extends AbstractConfig {
> >
> >     .define(KEY_SERIALIZER_CLASS_CONFIG, Type.CLASS,
> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
> >             Importance.HIGH, KEY_SERIALIZER_CLASS_DOC)
> >     .define(VALUE_SERIALIZER_CLASS_CONFIG, Type.CLASS,
> >             "org.apache.kafka.clients.producer.ByteArraySerializer",
> >             Importance.HIGH, VALUE_SERIALIZER_CLASS_DOC);
> > }
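> >
> > From the application's point of view, picking a serializer then becomes
> > a config-only change, e.g. (a sketch; com.example.SimpleStringSerializer
> > is a made-up class, sketched further down):
> >
> >     Properties props = new Properties();
> >     props.put("bootstrap.servers", "localhost:9092");
> >     props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
> >               "com.example.SimpleStringSerializer");
> >     props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
> >               "com.example.SimpleStringSerializer");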
> >
> > Both serializers will implement the following interface.
> >
> > public interface Serializer<T> extends Configurable {
> >     public byte[] serialize(String topic, T data, boolean isKey);
> >
> >     public void close();
> > }
> >
> > This is more or less the same as what's in the old producer. The slight
> > differences are (1) the serializer now only requires a parameter-less
> > constructor; (2) the serializer has a configure() and a close() method
> > for initialization and cleanup, respectively; (3) the serialize() method
> > additionally takes the topic and an isKey indicator, both of which are
> > useful for things like schema registration.
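> >
> > To make that concrete, a trivial string serializer against the proposed
> > interface could look like the following (just an illustration, not part
> > of the patch; an Avro serializer would instead use configure() for the
> > registry url and topic/isKey for schema registration):
> >
> >     import java.nio.charset.StandardCharsets;
> >     import java.util.Map;
> >
> >     public class SimpleStringSerializer implements Serializer<String> {
> >         public void configure(Map<String, ?> configs) {
> >             // nothing to configure for plain strings
> >         }
> >
> >         public byte[] serialize(String topic, String data, boolean isKey) {
> >             // topic and isKey are unused here; an Avro serializer could
> >             // use them to register or look up the schema for this topic
> >             return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
> >         }
> >
> >         public void close() {
> >             // no resources to release
> >         }
> >     }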
> >
> > The detailed changes are included in KAFKA-1797. For completeness, I also
> > made the corresponding changes for the new java consumer api.
> >
> > Note that the proposed api changes are incompatible with what's in the
> > 0.8.2 branch. However, if those api changes are beneficial, it's probably
> > better to include them now in the 0.8.2 release, rather than later.
> >
> > I'd like to discuss mainly two things in this thread.
> > 1. Do people feel that the proposed api changes are reasonable?
> > 2. Are there any concerns about including the api changes in the 0.8.2
> > final release?
> >
> > Thanks,
> >
> > Jun
> >
>
