[ https://issues.apache.org/jira/browse/KAFKA-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497680#comment-13497680 ]
Jun Rao commented on KAFKA-544:
-------------------------------

Thanks for patch v3. Looks good overall. Some minor comments:

30. Encoder: It seems that we require the constructor of Encoder and Partitioner to take a VerifiableProperties. It would be good if we could add a comment on that in the trait.
31. ConsumerConnector: Can we have a version of createMessageStreamsByFilter without the decoders?
32. ConsumerFetcherManager: no change is needed.
33. BlockingChannel: logger.debug() should be just debug().
34. ChecksumMessageFormatter: We probably can't remove it, since it may be used in our tests.

> Retain key in producer and expose it in the consumer
> ----------------------------------------------------
>
>                 Key: KAFKA-544
>                 URL: https://issues.apache.org/jira/browse/KAFKA-544
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>            Priority: Blocker
>              Labels: bugs
>         Attachments: KAFKA-544-v1.patch, KAFKA-544-v2.patch, KAFKA-544-v3.patch, KAFKA-544-v4.patch
>
>
> KAFKA-506 added support for retaining a key in the messages; however, this field is not yet set by the producer.
> The proposal for doing this is to change the producer API so that ProducerData allows only a single key/value pair, giving it a one-to-one mapping to Message. That is, change from
> ProducerData(topic: String, key: K, data: Seq[V])
> to
> ProducerData(topic: String, key: K, data: V)
> The key itself needs to be encoded. There are several ways this could be handled. A few of the options:
> 1. Change the Encoder and Decoder to MessageEncoder and MessageDecoder and have them take both a key and a value.
> 2. Change the type of the encoder/decoder so it does not refer to Message, so the same interface can be used for both the key and the value.
> I favor the second option but am open to feedback.
> One concern with our current approach to serialization, as well as with both of these proposals, is that they are inefficient.
> We go from Object => byte[] => Message => MessageSet, with a copy at each step. In the case of compression there are a bunch of intermediate steps. We could theoretically clean this up by instead having an interface for the encoder that was something like
> Encoder.writeTo(buffer: ByteBuffer, object: AnyRef)
> and
> Decoder.readFrom(buffer: ByteBuffer): AnyRef
> However, there are two problems with this. The first is that we don't actually know the size of the data until it is serialized, so we can't really allocate the ByteBuffer properly and might need to resize it. The second is that in the case of compression there is a whole other path to consider. Originally I thought it might be good to try to fix this, but now I think it should be out of scope, and we should revisit the efficiency issue in a future release in conjunction with our internal handling of compression.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
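The option-2 design quoted above (an encoder/decoder that does not refer to Message, shared by key and value) could be sketched as follows. This is a hypothetical Java illustration, not the actual patch: the real Kafka 0.8 API uses Scala traits, and the interface and class names here (Encoder, Decoder, StringEncoder, StringDecoder, the serializer.encoding property) are assumptions for the example. It also reflects comment 30: implementations expose a constructor taking the properties object.

```java
import java.io.UnsupportedEncodingException;
import java.util.Properties;

// Hypothetical sketch of option 2: a type-parameterized encoder/decoder
// that knows nothing about Message, so the same interface serializes
// both the key and the value.
interface Encoder<T> {
    byte[] toBytes(T t);
}

interface Decoder<T> {
    T fromBytes(byte[] bytes);
}

// Per comment 30 above, implementations are expected to expose a
// constructor taking the client properties (VerifiableProperties in
// the real Scala code; plain Properties here for the sketch).
class StringEncoder implements Encoder<String> {
    private final String encoding;

    public StringEncoder(Properties props) {
        this.encoding = props.getProperty("serializer.encoding", "UTF-8");
    }

    public byte[] toBytes(String s) {
        try {
            return s.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }
}

class StringDecoder implements Decoder<String> {
    private final String encoding;

    public StringDecoder(Properties props) {
        this.encoding = props.getProperty("serializer.encoding", "UTF-8");
    }

    public String fromBytes(byte[] bytes) {
        try {
            return new String(bytes, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the interface is parameterized only over the payload type, a producer can be configured with one Encoder for the key and another for the value without either knowing about Message.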
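The rejected writeTo/readFrom idea, and the first problem raised against it (the serialized size is unknown until serialization happens, so the buffer may need resizing), can also be sketched. Again this is an illustrative Java sketch under assumed names (StreamingEncoder, StreamingDecoder, encodeWithResize), not anything in the patch; the allocate-catch-overflow-and-double loop shows the resize cost the comment alludes to.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical sketch of the zero-copy interface discussed (and
// deferred) above: write directly into a ByteBuffer instead of
// producing an intermediate byte[].
interface StreamingEncoder<T> {
    // May throw BufferOverflowException if the buffer is too small.
    void writeTo(ByteBuffer buffer, T t);
}

interface StreamingDecoder<T> {
    T readFrom(ByteBuffer buffer);
}

class IntEncoder implements StreamingEncoder<Integer> {
    public void writeTo(ByteBuffer buffer, Integer i) {
        buffer.putInt(i);
    }
}

class IntDecoder implements StreamingDecoder<Integer> {
    public Integer readFrom(ByteBuffer buffer) {
        return buffer.getInt();
    }
}

class Buffers {
    // The problem: the caller cannot size the buffer up front, so one
    // coping strategy is to retry with a doubled buffer on overflow,
    // which reintroduces allocation and copying work.
    static <T> ByteBuffer encodeWithResize(StreamingEncoder<T> enc, T t,
                                           int initialCapacity) {
        int capacity = initialCapacity;
        while (true) {
            ByteBuffer buf = ByteBuffer.allocate(capacity);
            try {
                enc.writeTo(buf, t);
                buf.flip();
                return buf;
            } catch (BufferOverflowException e) {
                capacity *= 2; // grow and re-serialize from scratch
            }
        }
    }
}
```

The retry loop makes the trade-off concrete: either the encoder reports its size in advance (a second pass over the data) or writes speculatively and retries on overflow; neither is free, which supports deferring the change as the comment concludes.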