[jira] [Commented] (KAFKA-544) Retain key in producer

Jay Kreps (JIRA) Fri, 05 Oct 2012 13:02:04 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470610#comment-13470610
 ]


Jay Kreps commented on KAFKA-544:
---------------------------------

After looking at the code I think there is a fair amount of work here. I 
recommend we put off the user-facing API change until 0.9. Instead I propose 
the following intermediate hack for 0.8:
1. Use the existing ProducerData object to get the key and value. This is 
slightly unnatural because it allows you to associate a key with many values.
2. Use option (2) from the issue description for the encoders, i.e. make the 
encoder/decoder types no longer refer to Message.

So specifically this means that the two interfaces would now be 
trait Encoder[T] {
  def toBytes(t: T): Array[Byte]
}
trait Decoder[T] {
  def fromBytes(b: Array[Byte]): T
}
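
As a rough illustration of how such type-parameterized interfaces could serve 
both the key and the value, here is a minimal sketch. The traits follow the 
shape described above; StringEncoder, StringDecoder and encodePair are 
hypothetical names made up for this example and are not taken from the Kafka 
code base.

```scala
// Sketch only: the two traits mirror the interfaces proposed in the comment.
trait Encoder[T] {
  def toBytes(t: T): Array[Byte]
}

trait Decoder[T] {
  def fromBytes(b: Array[Byte]): T
}

// Hypothetical example implementations for String payloads.
class StringEncoder extends Encoder[String] {
  def toBytes(t: String): Array[Byte] = t.getBytes("UTF-8")
}

class StringDecoder extends Decoder[String] {
  def fromBytes(b: Array[Byte]): String = new String(b, "UTF-8")
}

object EncoderSketch {
  // Hypothetical helper: encodes a key and a value independently, each
  // with its own encoder instance. Neither encoder knows about Message.
  def encodePair[K, V](keyEncoder: Encoder[K],
                       valueEncoder: Encoder[V],
                       key: K,
                       value: V): (Array[Byte], Array[Byte]) =
    (keyEncoder.toBytes(key), valueEncoder.toBytes(value))
}
```

Because neither trait mentions Message, the same implementation class can be 
instantiated once for the key and once for the value.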

There would now be two encoders, one for the key and one for the value. The 
value encoder would still be configured by the property "serializer.class", 
but we would add a new property "key.serializer.class", which would default 
to the same class as the value serializer.
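
The fallback described above could be resolved along these lines. This is 
only a sketch under stated assumptions: the two property names come from the 
comment, while SerializerConfigSketch, serializerClasses and the 
defaultValueSerializer parameter are hypothetical and stand in for whatever 
the producer config actually does.

```scala
import java.util.Properties

object SerializerConfigSketch {
  // Returns (keySerializerClass, valueSerializerClass).
  // defaultValueSerializer is a hypothetical parameter: the class name to
  // use when "serializer.class" is not set at all.
  def serializerClasses(props: Properties,
                        defaultValueSerializer: String): (String, String) = {
    val valueSerializer =
      props.getProperty("serializer.class", defaultValueSerializer)
    // If "key.serializer.class" is absent, fall back to the value serializer.
    val keySerializer =
      props.getProperty("key.serializer.class", valueSerializer)
    (keySerializer, valueSerializer)
  }
}
```

With this resolution order, existing configurations that set only 
"serializer.class" would use that class for keys as well.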

The plan would be to hold off on any changes to the consumer for now.
                
> Retain key in producer
> ----------------------
>
>                 Key: KAFKA-544
>                 URL: https://issues.apache.org/jira/browse/KAFKA-544
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>
> KAFKA-506 added support for retaining a key in the messages, however this 
> field is not yet set by the producer.
> The proposal for doing this is to change the producer api to change 
> ProducerData to allow only a single key/value pair so it has a one-to-one 
> mapping to Message. That is change from
>   ProducerData(topic: String, key: K, data: Seq[V])
> to
>   ProducerData(topic: String, key: K, data: V)
> The key itself needs to be encoded. There are several ways this could be 
> handled. A few of the options:
> 1. Change the Encoder and Decoder to be MessageEncoder and MessageDecoder and 
> have them take both a key and value.
> 2. Another option is to change the type of the encoder/decoder to not refer 
> to Message so it could be used for both the key and value.
> I favor the second option but am open to feedback.
> One concern with our current approach to serialization as well as both of 
> these proposals is that they are inefficient. We go from 
> Object=>byte[]=>Message=>MessageSet with a copy at each step. In the case of 
> compression there are a bunch of intermediate steps. We could theoretically 
> clean this up by instead having an interface for the encoder that was 
> something like
>    Encoder.writeTo(buffer: ByteBuffer, object: AnyRef)
> and
>    Decoder.readFrom(buffer:ByteBuffer): AnyRef
> However there are two problems with this. The first is that we don't actually 
> know the size of the data until it is serialized so we can't really allocate 
> the ByteBuffer properly and might need to resize it. The second is that in 
> the case of compression there is a whole other path to consider. Originally I 
> thought maybe it would be good to try to fix this, but now I think it should 
> be out-of-scope and we should revisit the efficiency issue in a future 
> release in conjunction with our internal handling of compression.

