[jira] [Commented] (KAFKA-8367) Non-heap memory leak in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854633#comment-16854633 ]

Pavel Savov commented on KAFKA-8367:

[~guozhang] Sure, please see the following gist: https://gist.github.com/pavelsavov/ce54be506be9f277ca47901faabe0407
[jira] [Commented] (KAFKA-8367) Non-heap memory leak in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847768#comment-16847768 ]

Pavel Savov commented on KAFKA-8367:

Below are dumps of our topologies:

TOPOLOGY 1:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-00 (topics: [pl.allegro.analytics.page_view_raw])
      --> KSTREAM-MAP-01
    Processor: KSTREAM-MAP-01 (stores: [])
      --> KSTREAM-TRANSFORM-02
      <-- KSTREAM-SOURCE-00
    Processor: KSTREAM-TRANSFORM-02 (stores: [page_view_raw_deduplication_store])
      --> KSTREAM-SINK-03
      <-- KSTREAM-MAP-01
    Sink: KSTREAM-SINK-03 (topic: pl.allegro.analytics.page_view_raw_by_pv_id)
      <-- KSTREAM-TRANSFORM-02
{noformat}
1 state store (window size: 1 minute, window retention: 2 minutes); topic pl.allegro.analytics.page_view_raw: 64 partitions, retention: 72 hours; topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 hours

TOPOLOGY 2:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-00 (topics: [pl.allegro.analytics.event_raw])
      --> KSTREAM-MAP-01
    Processor: KSTREAM-MAP-01 (stores: [])
      --> KSTREAM-TRANSFORM-02
      <-- KSTREAM-SOURCE-00
    Processor: KSTREAM-TRANSFORM-02 (stores: [event_raw_deduplication_store])
      --> KSTREAM-SINK-03
      <-- KSTREAM-MAP-01
    Sink: KSTREAM-SINK-03 (topic: pl.allegro.analytics.event_raw_by_pv_id_local_pavel.savov)
      <-- KSTREAM-TRANSFORM-02
{noformat}
1 state store (window size: 1 minute, window retention: 2 minutes); topic pl.allegro.analytics.event_raw: 64 partitions, retention: 72 hours; topic pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours

TOPOLOGY 3:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-00 (topics: [pl.allegro.analytics.event_raw_by_pv_id])
      --> KSTREAM-FILTER-02
    Processor: KSTREAM-FILTER-02 (stores: [])
      --> KSTREAM-TRANSFORM-04
      <-- KSTREAM-SOURCE-00
    Processor: KSTREAM-TRANSFORM-04 (stores: [performance_windowed_store])
      --> KSTREAM-MAPVALUES-05
      <-- KSTREAM-FILTER-02
    Processor: KSTREAM-MAPVALUES-05 (stores: [])
      --> KSTREAM-FOREACH-06
      <-- KSTREAM-TRANSFORM-04
    Source: KSTREAM-SOURCE-01 (topics: [pl.allegro.analytics.page_view_raw_by_pv_id])
      --> KSTREAM-PROCESSOR-03
    Processor: KSTREAM-FOREACH-06 (stores: [])
      --> none
      <-- KSTREAM-MAPVALUES-05
    Processor: KSTREAM-PROCESSOR-03 (stores: [performance_windowed_store])
      --> none
      <-- KSTREAM-SOURCE-01
{noformat}
1 state store (window size: 10 minutes, window retention: 11 minutes); topic pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours; topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 hours

TOPOLOGY 4:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-00 (topics: [pl.allegro.analytics.event_raw_by_pv_id])
      --> KSTREAM-FILTER-02
    Processor: KSTREAM-FILTER-02 (stores: [])
      --> KSTREAM-TRANSFORM-04
      <-- KSTREAM-SOURCE-00
    Processor: KSTREAM-TRANSFORM-04 (stores: [opbox_boxes_windowed_store])
      --> KSTREAM-MAPVALUES-05
      <-- KSTREAM-FILTER-02
    Processor: KSTREAM-MAPVALUES-05 (stores: [])
      --> KSTREAM-FOREACH-06
      <-- KSTREAM-TRANSFORM-04
    Source: KSTREAM-SOURCE-01 (topics: [pl.allegro.analytics.page_view_raw_by_pv_id])
      --> KSTREAM-PROCESSOR-03
    Processor: KSTREAM-FOREACH-06 (stores: [])
      --> none
      <-- KSTREAM-MAPVALUES-05
    Processor: KSTREAM-PROCESSOR-03 (stores: [opbox_boxes_windowed_store])
      --> none
      <-- KSTREAM-SOURCE-01
{noformat}
1 state store (window size: 10 minutes, window retention: 11 minutes); topic pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours; topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 hours

Please let me know if there is anything else that could help you with the investigation. Thanks!
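For orientation, all four topologies share the same shape: a source, an optional map/filter, a transform backed by a persistent windowed store, and a sink. Below is a minimal sketch of how something like Topology 1 could be wired up with the Kafka Streams 2.2 Java API from Scala. It is illustrative only: the DedupTopologySketch object, the String serdes, the dedup logic, and the omission of the KSTREAM-MAP step are assumptions for the example, not our actual implementation; only the topic, store, window, and retention names/values come from the dump above.

{code:java}
import java.time.Duration

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KeyValue, StreamsBuilder}
import org.apache.kafka.streams.kstream.{Consumed, Produced, Transformer, TransformerSupplier}
import org.apache.kafka.streams.processor.ProcessorContext
import org.apache.kafka.streams.state.{Stores, WindowStore}

object DedupTopologySketch {

  val SourceTopic = "pl.allegro.analytics.page_view_raw"
  val SinkTopic   = "pl.allegro.analytics.page_view_raw_by_pv_id"
  val StoreName   = "page_view_raw_deduplication_store"

  def build(): StreamsBuilder = {
    val builder = new StreamsBuilder()

    // Persistent windowed store: 1-minute windows, 2-minute retention (matching the dump above).
    builder.addStateStore(
      Stores.windowStoreBuilder(
        Stores.persistentWindowStore(StoreName, Duration.ofMinutes(2), Duration.ofMinutes(1), false),
        Serdes.String(), Serdes.String()))

    val dedup = new TransformerSupplier[String, String, KeyValue[String, String]] {
      override def get(): Transformer[String, String, KeyValue[String, String]] =
        new Transformer[String, String, KeyValue[String, String]] {
          private var ctx: ProcessorContext = _
          private var store: WindowStore[String, String] = _

          override def init(context: ProcessorContext): Unit = {
            ctx = context
            store = context.getStateStore(StoreName).asInstanceOf[WindowStore[String, String]]
          }

          // Forward a record only the first time its key is seen within the 1-minute window.
          override def transform(key: String, value: String): KeyValue[String, String] = {
            val windowMs = Duration.ofMinutes(1).toMillis
            val seen = store.fetch(key, ctx.timestamp() - windowMs, ctx.timestamp())
            val duplicate = seen.hasNext
            seen.close()
            if (duplicate) {
              null // returning null drops the record
            } else {
              store.put(key, value)
              KeyValue.pair(key, value)
            }
          }

          override def close(): Unit = ()
        }
    }

    // The real topology has a KSTREAM-MAP step between source and transform; omitted here for brevity.
    builder
      .stream(SourceTopic, Consumed.`with`(Serdes.String(), Serdes.String()))
      .transform(dedup, StoreName)
      .to(SinkTopic, Produced.`with`(Serdes.String(), Serdes.String()))

    builder
  }
}
{code}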
[jira] [Commented] (KAFKA-8367) Non-heap memory leak in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844958#comment-16844958 ]

Pavel Savov commented on KAFKA-8367:

[~guozhang] Yes, the commit you mentioned was included in the build.

[~ableegoldman] I tried with Kafka 2.1.0 and the leak is gone, so it seems to have been introduced in 2.2. Our 2.0.1 app was the same as the 2.2 one.
[jira] [Commented] (KAFKA-8367) Non-heap memory leak in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842266#comment-16842266 ]

Pavel Savov commented on KAFKA-8367:

Still no joy, I'm afraid. I built from that branch but the leak is still there. None of the settings in our RocksDBConfigSetter changed when we upgraded to 2.2.0.

I'll also try downgrading Kafka Streams to 2.1.0 and let you know how it goes. Thanks!
[jira] [Commented] (KAFKA-8367) Non-heap memory leak in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841452#comment-16841452 ]

Pavel Savov commented on KAFKA-8367:

Hi [~vvcephei] and [~ableegoldman],

Thank you for your suggestions. Yes, we are using a RocksDB Config Setter (and had used it before upgrading to 2.2.0 too). The only object we are creating in that setter is an org.rocksdb.BlockBasedTableConfig instance:

{code:java}
val tableConfig = new org.rocksdb.BlockBasedTableConfig()
tableConfig.setBlockCacheSize(blockCacheSize) // block_cache_size (fetch cache)
tableConfig.setBlockSize(DefaultBlockSize)
tableConfig.setCacheIndexAndFilterBlocks(DefaultCacheIndexAndFilterBlocks)
options.setTableFormatConfig(tableConfig)
{code}

I tried building from the latest trunk but I'm afraid it didn't fix the leak.

Please let me know if there is any info I could provide you with that could help narrow down the issue. Thanks!
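To put that snippet in context, the table config above is built inside our RocksDBConfigSetter. Below is a minimal sketch of what such a setter looks like against the 2.2 API; the class name and constant names are illustrative, and the values simply mirror the ones listed in the issue description, so this is not necessarily our exact code.

{code:java}
import java.util

import org.apache.kafka.streams.state.RocksDBConfigSetter
import org.rocksdb.{BlockBasedTableConfig, Options}

// Illustrative sketch of a RocksDBConfigSetter wrapping the snippet above.
class CustomRocksDBConfig extends RocksDBConfigSetter {

  private val BlockCacheSize                   = 4L * 1024 * 1024 // 4 MiB
  private val DefaultBlockSize                 = 16L * 1024       // 16 KiB
  private val DefaultCacheIndexAndFilterBlocks = true

  override def setConfig(storeName: String,
                         options: Options,
                         configs: util.Map[String, AnyRef]): Unit = {
    val tableConfig = new BlockBasedTableConfig()
    tableConfig.setBlockCacheSize(BlockCacheSize) // block_cache_size (fetch cache)
    tableConfig.setBlockSize(DefaultBlockSize)
    tableConfig.setCacheIndexAndFilterBlocks(DefaultCacheIndexAndFilterBlocks)
    options.setTableFormatConfig(tableConfig)

    // Remaining values as listed in the issue description.
    options.setWriteBufferSize(2L * 1024 * 1024)     // 2 MiB
    options.setMaxWriteBufferNumber(3)
    options.setManifestPreallocationSize(64L * 1024) // 64 KiB
    options.setMaxOpenFiles(6144)
  }
}
{code}

The setter is registered through the usual Streams property, StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG ("rocksdb.config.setter").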
[jira] [Updated] (KAFKA-8367) Non-heap memory leak in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Savov updated KAFKA-8367:
-------------------------------

Description:

We have been observing a non-heap memory leak after upgrading to Kafka Streams 2.2.0 from 2.0.1. We suspect the source to be around RocksDB as the leak only happens when we enable stateful stream operations (utilizing stores). We are aware of *KAFKA-8323* and have created our own fork of 2.2.0 and ported the fix scheduled for release in 2.2.1 to our fork. It did not stop the leak, however.

We are having this memory leak in our production environment, where the consumer group is auto-scaled in and out in response to changes in traffic volume, and in our test environment, where we have two consumers, no autoscaling and relatively constant traffic.

Below is some information I'm hoping will be of help:

* RocksDB Config:
Block cache size: 4 MiB
Write buffer size: 2 MiB
Block size: 16 KiB
Cache index and filter blocks: true
Manifest preallocation size: 64 KiB
Max write buffer number: 3
Max open files: 6144

* Memory usage in production
The attached graph (memory-prod.png) shows memory consumption for each instance as a separate line. The horizontal red line at 6 GiB is the memory limit.
As illustrated on the attached graph from production, memory consumption in running instances goes up around autoscaling events (scaling the consumer group either in or out) and the associated rebalancing. It stabilizes until the next autoscaling event but never goes back down.
An example of scaling out can be seen from around 21:00 hrs, when three new instances are started in response to a traffic spike.
Just after midnight traffic drops and some instances are shut down. Memory consumption in the remaining running instances goes up.
Memory consumption climbs again from around 6:00 AM due to increased traffic, and new instances keep being started until around 10:30 AM. Memory consumption never drops until the cluster is restarted around 12:30.

* Memory usage in test
As illustrated by the attached graph (memory-test.png), we have a fixed number of two instances in our test environment and no autoscaling. Memory consumption rises linearly until it reaches the limit (around 2:00 AM on 5/13) and Mesos restarts the offending instances, or we restart the cluster manually.

* No heap leaks observed
* Window retention: 2 or 11 minutes (depending on operation type)
* Issue not present in Kafka Streams 2.0.1
* No memory leak for stateless stream operations (when no RocksDB stores are used)
[jira] [Updated] (KAFKA-8367) Non-heap memory leak in Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Savov updated KAFKA-8367:
-------------------------------

Description:

We have been observing a non-heap memory leak after upgrading to Kafka Streams 2.2.0 from 2.0.1. We suspect the source to be around RocksDB as the leak only happens when we enable stateful stream operations (utilizing stores). We are aware of *KAFKA-8323* and have created our own fork of 2.2.0 and ported the fix scheduled for release in 2.2.1 to our fork. It did not stop the leak, however.

We are having this memory leak in our production environment, where the consumer group is auto-scaled in and out in response to changes in traffic volume, and in our test environment, where we have two consumers, no autoscaling and relatively constant traffic.

Below is some information I'm hoping will be of help:

# RocksDB Config:
Block cache size: 4 MiB
Write buffer size: 2 MiB
Block size: 16 KiB
Cache index and filter blocks: true
Manifest preallocation size: 64 KiB
Max write buffer number: 3
Max open files: 6144

# Memory usage in production
The attached graph (memory-prod.png) shows memory consumption for each instance as a separate line. The horizontal red line at 6 GiB is the memory limit.
As illustrated on the attached graph from production, memory consumption in running instances goes up around autoscaling events (scaling the consumer group either in or out) and the associated rebalancing. It stabilizes until the next autoscaling event but never goes back down.
An example of scaling out can be seen from around 21:00 hrs, when three new instances are started in response to a traffic spike.
Just after midnight traffic drops and some instances are shut down. Memory consumption in the remaining running instances goes up.
Memory consumption climbs again from around 6:00 AM due to increased traffic, and new instances keep being started until around 10:30 AM. Memory consumption never drops until the cluster is restarted around 12:30.

# Memory usage in test
As illustrated by the attached graph (memory-test.png), we have a fixed number of two instances in our test environment and no autoscaling. Memory consumption rises linearly until it reaches the limit (around 2:00 AM on 5/13) and Mesos restarts the offending instances, or we restart the cluster manually.

# Window retention: 2 or 11 minutes (depending on operation type)
# Issue not present in Kafka Streams 2.0.1
# No heap leaks observed
# No memory leak for stateless stream operations (when no RocksDB stores are used)
[jira] [Created] (KAFKA-8367) Non-heap memory leak in Kafka Streams
Pavel Savov created KAFKA-8367:
-----------------------------------

             Summary: Non-heap memory leak in Kafka Streams
                 Key: KAFKA-8367
                 URL: https://issues.apache.org/jira/browse/KAFKA-8367
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.2.0
            Reporter: Pavel Savov
         Attachments: memory-prod.png, memory-test.png

We have been observing a non-heap memory leak after upgrading to Kafka Streams 2.2.0 from 2.0.1. We suspect the source to be around RocksDB as the leak only happens when we enable stateful stream operations (utilizing stores). We are aware of *KAFKA-8323* and have created our own fork of 2.2.0 and ported the fix scheduled for release in 2.2.1 to our fork. It did not stop the leak, however.

We are having this memory leak in our production environment, where the consumer group is auto-scaled in and out in response to changes in traffic volume, and in our test environment, where we have two consumers, no autoscaling and relatively constant traffic.

Below is some information I'm hoping will be of help:

# RocksDB Config:
Block cache size: 4 MiB
Write buffer size: 2 MiB
Block size: 16 KiB
Cache index and filter blocks: true
Manifest preallocation size: 64 KiB
Max write buffer number: 3
Max open files: 6144

# Memory usage in production
The attached graph (memory-prod.png) shows memory consumption for each instance as a separate line. The horizontal red line at 6 GiB is the memory limit.
As illustrated on the attached graph from production, memory consumption in running instances goes up around autoscaling events (scaling the consumer group either in or out) and the associated rebalancing. It stabilizes until the next autoscaling event but never goes back down.
An example of scaling out can be seen from around 21:00 hrs, when three new instances are started in response to a traffic spike.
Just after midnight traffic drops and some instances are shut down. Memory consumption in the remaining running instances goes up.
Memory consumption climbs again from around 6:00 AM due to increased traffic, and new instances keep being started until around 10:30 AM. Memory consumption never drops until the cluster is restarted around 12:30.

# Memory usage in test
As illustrated by the attached graph (memory-test.png), we have a fixed number of two instances in our test environment and no autoscaling. Memory consumption rises linearly until it reaches the limit (around 2:00 AM on 5/13) and Mesos restarts the offending instances, or we restart the cluster manually.

# Window retention: 2 or 11 minutes (depending on operation type)
# Issue not present in Kafka Streams 2.0.1
# No heap leaks observed
# No memory leak for stateless stream operations (when no RocksDB stores are used)
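For what it's worth, a back-of-the-envelope estimate from the settings above suggests the configured RocksDB ceiling is far below the multi-GiB growth we observe. This sketch assumes that each RocksDB instance allocates its own block cache and write buffers, and that a windowed store opens one RocksDB instance per segment; both are assumptions for illustration, not verified against the 2.2 internals, and the task/segment counts are hypothetical.

{code:java}
// Rough per-instance native footprint implied by the configuration above.
object RocksDbFootprintEstimate {
  val BlockCacheBytes      = 4L * 1024 * 1024 // 4 MiB
  val WriteBufferBytes     = 2L * 1024 * 1024 // 2 MiB
  val MaxWriteBufferNumber = 3

  // Block cache + memtables for a single RocksDB instance: ~10 MiB.
  val perInstanceBytes: Long = BlockCacheBytes + WriteBufferBytes * MaxWriteBufferNumber

  def main(args: Array[String]): Unit = {
    // Hypothetical sizing: 16 partitions of one windowed store, 3 segments per store.
    val tasks            = 16
    val segmentsPerStore = 3 // assumption
    val totalMiB = perInstanceBytes * tasks * segmentsPerStore / (1024 * 1024)
    println(s"Expected ceiling: ~$totalMiB MiB") // ~480 MiB, nowhere near the 6 GiB limit we hit
  }
}
{code}

If that arithmetic is roughly right, the observed growth is consistent with native objects that are never freed rather than with the configured caches and buffers simply filling up.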