[ https://issues.apache.org/jira/browse/KAFKA-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489252#comment-16489252 ]
John Roesler commented on KAFKA-6925:
-------------------------------------

I read the ticket and took another look at parentSensors in 1.0. I think my trunk change would fix it, since we no longer have a parentSensors map at all, but it might be tricky to make that change surgically without winding up with another 1k+ diff to review and merge.

Alternatively, looking at parentSensors, it seems we only ever add to the map. I think removeSensor should also remove the sensor from the parentSensors map after removing it from the registry:
{noformat}
public void removeSensor(Sensor sensor) {
    Objects.requireNonNull(sensor, "Sensor is null");
    metrics.removeSensor(sensor.name());
    final Sensor parent = parentSensors.get(sensor);
    if (parent != null) {
        metrics.removeSensor(parent.name());
    }
}{noformat}
Since parentSensors is keyed and valued by Sensor instances, it retains those sensors even after they have been otherwise unloaded. Other memory improvements here would be to store the names instead of the whole Sensor (since we only need the name to remove them), or to use weak references in the map so that the map alone cannot keep the objects alive.

All in all, rather than "backporting" my change (which would really be a re-implementation of one aspect of it), I'd recommend making parentSensors a Map<String, String> (childName to parentName) and removing from the map during removeSensor.

> Memory leak in
> org.apache.kafka.streams.processor.internals.StreamThread$StreamsMetricsThreadImpl
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6925
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6925
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.1
>            Reporter: Marcin Kuthan
>            Priority: Major
>
> The retained heap of
> org.apache.kafka.streams.processor.internals.StreamThread$StreamsMetricsThreadImpl
> is surprisingly high for a long-running job.
> Over 100MB of heap for every stream after a week of uptime, when for the
> same application the heap takes 2MB a few hours after start.
> For the problematic instance, the majority of StreamsMetricsThreadImpl's
> memory is occupied by hash map entries in parentSensors: over 8000
> elements of 100+kB each. For a fresh instance there are fewer than 200
> elements.
> Below you can find a retained-set report generated from Eclipse MAT, but
> I'm not fully sure about its correctness due to the complex object graph
> in the metrics-related code. Number of objects in a single
> StreamThread$StreamsMetricsThreadImpl instance:
>
> {code:java}
> Class Name                                                                         | Objects | Shallow Heap
> -----------------------------------------------------------------------------------------------------------
> org.apache.kafka.common.metrics.KafkaMetric                                        | 140,476 | 4,495,232
> org.apache.kafka.common.MetricName                                                 | 140,476 | 4,495,232
> org.apache.kafka.common.metrics.stats.SampledStat$Sample                           |  73,599 | 3,532,752
> org.apache.kafka.common.metrics.stats.Meter                                        |  42,104 | 1,347,328
> org.apache.kafka.common.metrics.stats.Count                                        |  42,104 | 1,347,328
> org.apache.kafka.common.metrics.stats.Rate                                         |  42,104 | 1,010,496
> org.apache.kafka.common.metrics.stats.Total                                        |  42,104 | 1,010,496
> org.apache.kafka.common.metrics.stats.Max                                          |  28,134 |   900,288
> org.apache.kafka.common.metrics.stats.Avg                                          |  28,134 |   900,288
> org.apache.kafka.common.metrics.Sensor                                             |   3,164 |   202,496
> org.apache.kafka.common.metrics.Sensor[]                                           |   3,164 |    71,088
> org.apache.kafka.streams.processor.internals.StreamThread$StreamsMetricsThreadImpl |       1 |        56
> -----------------------------------------------------------------------------------------------------------
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
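[Editor's note] The recommendation in the comment above, a parentSensors map of childName to parentName whose entries are evicted in removeSensor, could be sketched as follows. This is a simplified illustration with hypothetical stand-in types (a plain map standing in for the Kafka metrics registry), not the real org.apache.kafka.common.metrics API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of the proposed fix: parentSensors maps child sensor name ->
// parent sensor name, so the map no longer pins Sensor instances in
// memory, and removeSensor() also drops the mapping itself.
class ParentSensorsSketch {
    // Stand-in for the metrics registry: sensor name -> sensor object.
    private final Map<String, Object> registry = new HashMap<>();
    // childName -> parentName, per the recommendation in the comment.
    private final Map<String, String> parentSensors = new HashMap<>();

    void addSensor(final String name, final String parentName) {
        registry.put(name, new Object());
        if (parentName != null) {
            registry.putIfAbsent(parentName, new Object());
            parentSensors.put(name, parentName);
        }
    }

    void removeSensor(final String name) {
        Objects.requireNonNull(name, "Sensor name is null");
        registry.remove(name);
        // Remove the mapping itself, not just the registry entries, so
        // parentSensors cannot grow without bound over the job's lifetime.
        final String parentName = parentSensors.remove(name);
        if (parentName != null) {
            registry.remove(parentName);
        }
    }

    int trackedMappings() {
        return parentSensors.size();
    }

    public static void main(final String[] args) {
        final ParentSensorsSketch metrics = new ParentSensorsSketch();
        metrics.addSensor("task-1-child", "thread-parent");
        metrics.removeSensor("task-1-child");
        // After removal, no stale mapping remains to leak memory.
        System.out.println("tracked mappings: " + metrics.trackedMappings());
    }
}
```

Because keys and values are plain Strings, removing a sensor leaves nothing in the map that could keep a Sensor instance reachable, which addresses the retention pattern reported in this ticket.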