Re: Use of CopyOnWriteMap in RecordAccumulator.java

Rajini Sivaram Tue, 22 Dec 2015 14:58:03 -0800

Jay, Ismael,

Thank you, I will run some performance tests first. I will try and get this
done tomorrow, otherwise, I will get onto it first thing in the new year.



Thank you...

Regards,

Rajini

On Tue, Dec 22, 2015 at 5:04 PM, Ismael Juma <[email protected]> wrote:

> Yes, definitely. We should also have automated alerts in case of
> regressions. We already run the existing performance tools on a nightly
> basis, which is a start:
>
> http://testing.confluent.io/kafka/latest/
>
> One thing to keep in mind is that there have been significant changes to
> the implementation of HashMap and ConcurrentHashMap over the last couple of
> years (in JDK 7 and 8), so (1) we need to check that performance doesn't
> degrade with newer JDKs and (2) we should be aware that some older tests
> may produce different results today.
>
> With regards to KAFKA-2664, here's a link to a discussion between Joel and
> Aditya with some performance numbers:
>
> https://github.com/apache/kafka/pull/369/files#r43207947
>
> It looks like the focus was on create sensor performance (which is where
> the problem was), but I don't see any mention of a more complex test (like
> ProducerPerformance).
>
> Ismael
> On 22 Dec 2015 16:14, "Jay Kreps" <[email protected]> wrote:
>
> > My vague recollection was that those two cases were about a 15% perf
> > difference in the producer in the small message case. For this and
> > metrics changes we should quantify the degradation (if any) if we're
> > making changes.
> >
> > -Jay
> >
> > On Tue, Dec 22, 2015 at 4:24 AM, Rajini Sivaram
> > <[email protected]> wrote:
> > > Thank you, Ismael. I will switch to ConcurrentHashMap later today if
> > there
> > > are no objections.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > > On Tue, Dec 22, 2015 at 11:57 AM, Ismael Juma <[email protected]>
> wrote:
> > >
> > >> Hi Rajini,
> > >>
> > >> Looking at git annotate, the choice of CopyOnWriteMap was done in the
> > >> original commit that introduced the Producer. For cases where writes
> are
> > >> very rare, it can be a win, but it's problematic as soon as this
> > assumption
> > >> changes (particularly if the map is large). We switched from
> > CopyOnWriteMap
> > >> to ConcurrentHashMap in the Metrics class, as a result (see KAFKA-2664
> > for
> > >> details). It sounds like we may want to change it here too.
> > >>
> > >> Best,
> > >> Ismael
> > >>
> > >> On Tue, Dec 22, 2015 at 11:41 AM, Rajini Sivaram <
> > >> [email protected]> wrote:
> > >>
> > >> > I was looking at removing unused partitions from
> > >> >
> org.apache.kafka.clients.producer.internals.RecordAccumulator#batches
> > to
> > >> > avoid the map growing indefinitely, especially in the REST service.
> > The
> > >> PR
> > >> > under https://issues.apache.org/jira/browse/KAFKA-2948 has the
> > details.
> > >> > With CopyOnWriteMap, removing entries requires synchronization, and
> > there
> > >> > are places in the code like dequeFor() which relies on partitions
> > never
> > >> > being removed from the map. I wasn't sure why CopyOnWriteMap was
> > chosen
> > >> for
> > >> > this map, instead of perhaps ConcurrentHashMap. This would avoid
> large
> > >> > copies of maps under a lock when there are large number of
> partitions,
> > >> and
> > >> > also make it easier to remove entries from the map. But I imagine
> > there
> > >> > must have been a reason why CopyOnWriteMap was preferred choice.
> > >> >
> > >> > Thoughts?
> > >> >
> > >> >
> > >> > Regards,
> > >> >
> > >> > Rajini
> > >> >
> > >>
> >
>

Re: Use of CopyOnWriteMap in RecordAccumulator.java

Reply via email to