My vague recollection was that those two cases were about a 15% perf
difference in the producer in the small message case. For this and
metrics changes we should quantify the degradation (if any) if we're
making changes.

-Jay

On Tue, Dec 22, 2015 at 4:24 AM, Rajini Sivaram
<[email protected]> wrote:
> Thank you, Ismael. I will switch to ConcurrentHashMap later today if there
> are no objections.
>
> Regards,
>
> Rajini
>
> On Tue, Dec 22, 2015 at 11:57 AM, Ismael Juma <[email protected]> wrote:
>
>> Hi Rajini,
>>
>> Looking at git annotate, the choice of CopyOnWriteMap was done in the
>> original commit that introduced the Producer. For cases where writes are
>> very rare, it can be a win, but it's problematic as soon as this assumption
>> changes (particularly if the map is large). We switched from CopyOnWriteMap
>> to ConcurrentHashMap in the Metrics class, as a result (see KAFKA-2664 for
>> details). It sounds like we may want to change it here too.
>>
>> Best,
>> Ismael
>>
>> On Tue, Dec 22, 2015 at 11:41 AM, Rajini Sivaram <
>> [email protected]> wrote:
>>
>> > I was looking at removing unused partitions from
>> > org.apache.kafka.clients.producer.internals.RecordAccumulator#batches to
>> > avoid the map growing indefinitely, especially in the REST service. The
>> PR
>> > under https://issues.apache.org/jira/browse/KAFKA-2948 has the details.
>> > With CopyOnWriteMap, removing entries requires synchronization, and there
>> > are places in the code like dequeFor() which relies on partitions never
>> > being removed from the map. I wasn't sure why CopyOnWriteMap was chosen
>> for
>> > this map, instead of perhaps ConcurrentHashMap. This would avoid large
>> > copies of maps under a lock when there are large number of partitions,
>> and
>> > also make it easier to remove entries from the map. But I imagine there
>> > must have been a reason why CopyOnWriteMap was preferred choice.
>> >
>> > Thoughts?
>> >
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>>

Reply via email to