Yes, definitely. We should also have automated alerts in case of regressions. We already run the existing performance tools on a nightly basis, which is a start:
http://testing.confluent.io/kafka/latest/ One thing to keep in mind is that there have been significant changes to the implementation of HashMap and ConcurrentHashMap over the last couple of years (in JDK 7 and 8), so (1) we need to check that performance doesn't degrade with newer JDKs and (2) we should be aware that some older tests may produce different results today. With regards to KAFKA-2664, here's a link to a discussion between Joel and Aditya with some performance numbers: https://github.com/apache/kafka/pull/369/files#r43207947 It looks like the focus was on create sensor performance (which is where the problem was), but I don't see any mention of a more complex test (like ProducerPerformance). Ismael On 22 Dec 2015 16:14, "Jay Kreps" <[email protected]> wrote: > My vague recollection was that those two cases were about a 15% perf > difference in the producer in the small message case. For this and > metrics changes we should quantify the degradation (if any) if we're > making changes. > > -Jay > > On Tue, Dec 22, 2015 at 4:24 AM, Rajini Sivaram > <[email protected]> wrote: > > Thank you, Ismael. I will switch to ConcurrentHashMap later today if > there > > are no objections. > > > > Regards, > > > > Rajini > > > > On Tue, Dec 22, 2015 at 11:57 AM, Ismael Juma <[email protected]> wrote: > > > >> Hi Rajini, > >> > >> Looking at git annotate, the choice of CopyOnWriteMap was done in the > >> original commit that introduced the Producer. For cases where writes are > >> very rare, it can be a win, but it's problematic as soon as this > assumption > >> changes (particularly if the map is large). We switched from > CopyOnWriteMap > >> to ConcurrentHashMap in the Metrics class, as a result (see KAFKA-2664 > for > >> details). It sounds like we may want to change it here too. > >> > >> Best, > >> Ismael > >> > >> On Tue, Dec 22, 2015 at 11:41 AM, Rajini Sivaram < > >> [email protected]> wrote: > >> > >> > I was looking at removing unused partitions from > >> > org.apache.kafka.clients.producer.internals.RecordAccumulator#batches > to > >> > avoid the map growing indefinitely, especially in the REST service. > The > >> PR > >> > under https://issues.apache.org/jira/browse/KAFKA-2948 has the > details. > >> > With CopyOnWriteMap, removing entries requires synchronization, and > there > >> > are places in the code like dequeFor() which relies on partitions > never > >> > being removed from the map. I wasn't sure why CopyOnWriteMap was > chosen > >> for > >> > this map, instead of perhaps ConcurrentHashMap. This would avoid large > >> > copies of maps under a lock when there are large number of partitions, > >> and > >> > also make it easier to remove entries from the map. But I imagine > there > >> > must have been a reason why CopyOnWriteMap was preferred choice. > >> > > >> > Thoughts? > >> > > >> > > >> > Regards, > >> > > >> > Rajini > >> > > >> >
