My vague recollection was that those two cases were about a 15% perf difference in the producer in the small message case. For this and metrics changes we should quantify the degradation (if any) if we're making changes.
-Jay On Tue, Dec 22, 2015 at 4:24 AM, Rajini Sivaram <[email protected]> wrote: > Thank you, Ismael. I will switch to ConcurrentHashMap later today if there > are no objections. > > Regards, > > Rajini > > On Tue, Dec 22, 2015 at 11:57 AM, Ismael Juma <[email protected]> wrote: > >> Hi Rajini, >> >> Looking at git annotate, the choice of CopyOnWriteMap was done in the >> original commit that introduced the Producer. For cases where writes are >> very rare, it can be a win, but it's problematic as soon as this assumption >> changes (particularly if the map is large). We switched from CopyOnWriteMap >> to ConcurrentHashMap in the Metrics class, as a result (see KAFKA-2664 for >> details). It sounds like we may want to change it here too. >> >> Best, >> Ismael >> >> On Tue, Dec 22, 2015 at 11:41 AM, Rajini Sivaram < >> [email protected]> wrote: >> >> > I was looking at removing unused partitions from >> > org.apache.kafka.clients.producer.internals.RecordAccumulator#batches to >> > avoid the map growing indefinitely, especially in the REST service. The >> PR >> > under https://issues.apache.org/jira/browse/KAFKA-2948 has the details. >> > With CopyOnWriteMap, removing entries requires synchronization, and there >> > are places in the code like dequeFor() which relies on partitions never >> > being removed from the map. I wasn't sure why CopyOnWriteMap was chosen >> for >> > this map, instead of perhaps ConcurrentHashMap. This would avoid large >> > copies of maps under a lock when there are large number of partitions, >> and >> > also make it easier to remove entries from the map. But I imagine there >> > must have been a reason why CopyOnWriteMap was preferred choice. >> > >> > Thoughts? >> > >> > >> > Regards, >> > >> > Rajini >> > >>
