[ https://issues.apache.org/jira/browse/KAFKA-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500347#comment-16500347 ]

Chia-Ping Tsai commented on KAFKA-5761:
---------------------------------------

+1 (non-binding) to this idea (if ByteBuffer is overkill, we can enhance 
org.apache.kafka.common.utils.Bytes to accept an offset and length). Our 
application also requires this feature. Currently, the serializer forces us to 
do a deep copy from our custom off-heap pool in order to return a "byte[]"...
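To illustrate the kind of reuse a ByteBuffer-returning serializer would enable, here is a minimal sketch. The `ByteBufferSerializer` interface and `ReusableStringSerializer` class are hypothetical names, not part of the actual Kafka API; the point is that one buffer is reused across calls, with `flip()` marking the end of the valid bytes, and that this is only safe when serialization and consumption of the returned buffer happen on the same thread, as the issue notes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical ByteBuffer-returning serializer interface (sketch only,
// not the actual org.apache.kafka.common.serialization.Serializer).
interface ByteBufferSerializer<T> {
    ByteBuffer serialize(String topic, T data);
}

// A String serializer that reuses one buffer across calls. Safe only if
// serialize() and consumption of the returned buffer happen on the same
// thread (requirement 1 in the issue).
class ReusableStringSerializer implements ByteBufferSerializer<String> {
    private ByteBuffer buffer = ByteBuffer.allocate(64);

    @Override
    public ByteBuffer serialize(String topic, String data) {
        // getBytes() still allocates here; a real implementation would
        // encode directly into the buffer. This sketch only shows the
        // buffer-reuse pattern.
        byte[] encoded = data.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > buffer.capacity()) {
            buffer = ByteBuffer.allocate(encoded.length); // grow only when needed
        }
        buffer.clear();
        buffer.put(encoded);
        buffer.flip(); // limit now marks the end of the valid bytes (requirement 2)
        return buffer;
    }
}
```

The `flip()` call is what solves the issue's second requirement: the buffer's limit, rather than the array's length, tells the producer internals how many bytes are valid.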

> Serializer API should support ByteBuffer
> ----------------------------------------
>
>                 Key: KAFKA-5761
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5761
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 0.11.0.0
>            Reporter: Bhaskar Gollapudi
>            Priority: Major
>              Labels: features, performance
>
> Consider the Serializer : Its main method is :
> byte[] serialize(String topic, T data);
> Producer applications create an implementation that takes an instance
> (of T) and converts it to a byte[]. This byte array is allocated anew for
> each message. The byte array is then handed over to the Kafka Producer API
> internals, which write the bytes to a buffer/network socket. When the next
> message arrives, the serializer, instead of creating a new byte[], should
> try to reuse the existing byte[] for the new message. This requires two
> things:
> 1. The process of handing off the bytes to the buffer/socket and reusing
> the byte[] must happen on the same thread.
> 2. There should be a way to mark the end of the available bytes in the
> byte[].
> The first is reasonably simple to understand. If this does not happen, and
> without other necessary synchronization, the byte[] gets corrupted, and so
> does the message written to the buffer/socket. However, this requirement is
> easy for a producer application to meet, because it controls the threads on
> which the serializer is invoked.
> The second is where the problem lies with the current API. It does not
> allow a variable number of bytes to be read from a container; it is limited
> by the byte[]'s length. This forces the producer to
> 1. either create a new byte[] for a message that is bigger than the
> previous one,
> OR
> 2. decide on a max size and use padding.
> Both are cumbersome and error-prone, and may waste network
> bandwidth.
> Instead, if there were a Serializer with this method:
> ByteBuffer serialize(String topic, T data);
> it would help implement a reusable bytes container, letting clients avoid
> allocations for each message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
