[ https://issues.apache.org/jira/browse/KAFKA-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879020#comment-15879020 ]
Jason Gustafson edited comment on KAFKA-1895 at 2/22/17 7:27 PM:
-----------------------------------------------------------------

[~original-brownbear] How would you propose to expose RawRecordIterator in the consumer API? The problem is that IO is currently driven through the poll() API, so introducing something else would be a bit awkward (though maybe there's a nice way to do it). I think that's why Jay was suggesting to push deserialization into the ConsumerRecords class: it wouldn't require an API change. That said, there are definitely use cases where lower-level access has been desired. MirrorMaker is one of them: it's unfortunate that we need to decompress and recompress, and to solve that problem we would need access to the raw record data (not just the key and value). However, I'm not sure that KafkaProducer and KafkaConsumer are the right place for this, nor am I sure that it's ultimately something we want to support as a user API.


> Investigate moving deserialization and decompression out of KafkaConsumer
> --------------------------------------------------------------------------
>
>          Key: KAFKA-1895
>          URL: https://issues.apache.org/jira/browse/KAFKA-1895
>      Project: Kafka
>   Issue Type: Sub-task
>   Components: consumer
>     Reporter: Jay Kreps
>
> The consumer implementation in KAFKA-1760 decompresses fetch responses and deserializes them into ConsumerRecords, which are then handed back as the result of poll().
> There are several downsides to this:
> 1. It is impossible to scale deserialization and decompression work beyond the single thread running the KafkaConsumer.
> 2. The results can come back during the processing of other calls such as commit(), which can result in caching these records a little longer.
> An alternative would be to have ConsumerRecords wrap the actual compressed, serialized MemoryRecords chunks and do the deserialization during iteration. This way the work could be scaled over a thread pool if needed.
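
Purely as an illustration of what that lazy approach could look like, here is a minimal sketch of a records wrapper that keeps the raw key/value bytes and only applies the Deserializers while it is being iterated, so the work runs on whichever thread walks the records rather than on the thread that called poll(). The RawRecord, LazyRecords, and Entry names are invented for the example and are not actual Kafka classes; only Deserializer and StringDeserializer come from the real public API, and this is not how ConsumerRecords is implemented today.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

// Hypothetical ConsumerRecords-style wrapper that defers deserialization to
// iteration time. Only Deserializer/StringDeserializer are real Kafka API.
public final class LazyRecords<K, V> implements Iterable<LazyRecords.Entry<K, V>> {

    // What the fetcher would hand over before any deserialization has
    // happened: key and value are still plain byte arrays.
    public static final class RawRecord {
        final String topic;
        final int partition;
        final long offset;
        final byte[] keyBytes;
        final byte[] valueBytes;

        public RawRecord(String topic, int partition, long offset, byte[] keyBytes, byte[] valueBytes) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.keyBytes = keyBytes;
            this.valueBytes = valueBytes;
        }
    }

    // Fully deserialized view of a single record.
    public static final class Entry<K, V> {
        public final String topic;
        public final int partition;
        public final long offset;
        public final K key;
        public final V value;

        Entry(String topic, int partition, long offset, K key, V value) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    private final List<RawRecord> raw;
    private final Deserializer<K> keyDeserializer;
    private final Deserializer<V> valueDeserializer;

    public LazyRecords(List<RawRecord> raw, Deserializer<K> keyDeserializer, Deserializer<V> valueDeserializer) {
        this.raw = raw;
        this.keyDeserializer = keyDeserializer;
        this.valueDeserializer = valueDeserializer;
    }

    @Override
    public Iterator<Entry<K, V>> iterator() {
        Iterator<RawRecord> it = raw.iterator();
        return new Iterator<Entry<K, V>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public Entry<K, V> next() {
                RawRecord r = it.next();
                // Deserialization happens here, on whichever thread iterates,
                // rather than on the thread that fetched the data.
                K key = r.keyBytes == null ? null : keyDeserializer.deserialize(r.topic, r.keyBytes);
                V value = r.valueBytes == null ? null : valueDeserializer.deserialize(r.topic, r.valueBytes);
                return new Entry<>(r.topic, r.partition, r.offset, key, value);
            }
        };
    }

    public static void main(String[] args) {
        List<RawRecord> fetched = Arrays.asList(
                new RawRecord("test", 0, 42L,
                        "k".getBytes(StandardCharsets.UTF_8),
                        "v".getBytes(StandardCharsets.UTF_8)));
        LazyRecords<String, String> records =
                new LazyRecords<>(fetched, new StringDeserializer(), new StringDeserializer());
        for (Entry<String, String> e : records)
            System.out.printf("%s-%d@%d: %s=%s%n", e.topic, e.partition, e.offset, e.key, e.value);
    }
}
{code}

Splitting the raw list into chunks and handing each chunk's iteration to a worker thread would then spread deserialization over a pool, which is roughly the scaling argument in the description above. Avoiding the decompress/recompress cycle entirely (the MirrorMaker case) would still require access to the underlying compressed MemoryRecords bytes, which this sketch does not cover.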