[ https://issues.apache.org/jira/browse/SAMZA-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marouane RAJI updated SAMZA-2265:
---------------------------------
Environment:
was:
```
job.container.count=110
yarn.container.memory.mb=4000
yarn.container.cpu.cores=8
yarn.am.container.cpu.cores=8
yarn.am.container.memory.mb=1024
task.opts=-Xmx2800M
task.checkpoint.replication.factor=2
```
> Memory leak potentially due to Kafka Checkpoint Management
> ----------------------------------------------------------
>
> Key: SAMZA-2265
> URL: https://issues.apache.org/jira/browse/SAMZA-2265
> Project: Samza
> Issue Type: Bug
> Affects Versions: 1.0, 1.1
> Environment:
>
> Reporter: Marouane RAJI
> Priority: Major
> Attachments: image-2019-07-01-09-47-11-241.png,
> image-2019-07-01-09-48-45-876.png, image-2019-07-01-09-50-04-693.png
>
>
> Hi,
> We recently upgraded one of our high-throughput Samza jobs from 0.13.1 to 1.0,
> then to 1.1. Both later versions appear to have a memory leak: memory usage
> keeps increasing until the containers fail and YARN restarts them.
> It is worth noting that we upgraded other, smaller Samza jobs (smaller in
> container specs and throughput) without any issues.
> Job specs:
> * reading ~70k msg/sec
> * 211 input topics, including one broadcast topic (2 msg/day, used for
> config updates)
> * 1 output topic.
> Below is the memory consumption of one container in both versions:
> !image-2019-07-01-09-47-11-241.png!
>
> Heap dump comparison:
> !image-2019-07-01-09-48-45-876.png!
>
> The difference between the two versions keeps increasing slowly; the main
> cause is the growth in byte[].
> In versions 1.0 and 1.1, the main reference holding these bytes seems to be
> KafkaCheckpointManager:
> !image-2019-07-01-09-50-04-693.png!
>
> Could this PR solve the issue [https://github.com/apache/samza/pull/993],
> since we would then be releasing the KafkaConsumer used for checkpointing?
> Thanks.
>
>
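The question in the report, whether releasing the consumer used for checkpointing would stop the byte[] growth, can be illustrated with a minimal sketch. It assumes the consumer is only needed to read checkpoints once at startup. `BufferingConsumer` and `readCheckpointsAndRelease` are hypothetical names invented for illustration; they are not Samza or Kafka APIs, and the sketch makes no claim about what the linked PR actually changes.

```java
import java.util.HashMap;
import java.util.Map;

class CheckpointConsumerSketch {

    /** Hypothetical stand-in for a consumer that retains fetch buffers; NOT the Kafka or Samza API. */
    static final class BufferingConsumer implements AutoCloseable {
        // Simulates the byte[] retained for as long as the consumer stays referenced and open.
        private byte[] fetchBuffer = new byte[1024 * 1024];
        private boolean closed = false;

        /** Returns made-up checkpoint data (task name -> offset) for illustration only. */
        Map<String, String> readAll() {
            if (closed) {
                throw new IllegalStateException("consumer is closed");
            }
            Map<String, String> checkpoints = new HashMap<>();
            checkpoints.put("task-0", "offset-42");
            return checkpoints;
        }

        boolean isClosed() {
            return closed;
        }

        long retainedBytes() {
            return fetchBuffer == null ? 0 : fetchBuffer.length;
        }

        @Override
        public void close() {
            closed = true;
            fetchBuffer = null; // drop the buffer so it becomes eligible for garbage collection
        }
    }

    /** Reads checkpoints once, copies them, and closes the consumer before returning. */
    static Map<String, String> readCheckpointsAndRelease(BufferingConsumer consumer) {
        try (BufferingConsumer c = consumer) {
            return new HashMap<>(c.readAll());
        }
    }

    public static void main(String[] args) {
        BufferingConsumer consumer = new BufferingConsumer();
        Map<String, String> checkpoints = readCheckpointsAndRelease(consumer);
        System.out.println("task-0 -> " + checkpoints.get("task-0"));
        System.out.println("bytes still retained: " + consumer.retainedBytes());
    }
}
```

The try-with-resources block guarantees `close()` runs even if the read throws, so the simulated buffer is released as soon as the checkpoints have been copied. Whether the real KafkaCheckpointManager behaves this way would need to be confirmed against the Samza source and the heap dumps attached to the issue.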
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)