Re: High DirectByteBuffer Usage

2021-07-15 Thread Smile
Hi,
Are you sure that the growing memory comes from DirectByteBuffer? What about
metaspace?
Flink 1.9 may leak metaspace after a full restart or a fine-grained restart;
see [1] and [2] for more details. If you didn't set a maximum via
-XX:MaxMetaspaceSize, metaspace will grow indefinitely and eventually cause an
OOM kill.

[1]. https://issues.apache.org/jira/browse/FLINK-16225
[2]. https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code
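
If it helps, the JVM exposes both pools through the standard management beans.
A rough standalone sketch like the one below (class name and output format are
just for illustration), run inside the task manager JVM, will tell you which
one is actually growing:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Rough diagnostic sketch: prints metaspace usage and the direct/mapped
// ByteBuffer pools so you can see which one is actually growing.
public class NativeMemoryCheck {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                System.out.printf("Metaspace used: %d MB%n",
                        pool.getUsage().getUsed() / (1024 * 1024));
            }
        }
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s buffer pool: count=%d, used=%d MB%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed() / (1024 * 1024));
        }
    }
}

The same values are also available remotely over JMX (under
java.lang:type=MemoryPool,name=Metaspace and java.nio:type=BufferPool,name=direct),
so you can watch them without changing the job.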

Regards
Smile



Re: High DirectByteBuffer Usage

2021-07-15 Thread bat man
I am not using the Kafka SSL port.



Re: High DirectByteBuffer Usage

2021-07-15 Thread Alexey Trenikhun
Just in case, make sure that you are not using the Kafka SSL port without
setting the security protocol; see [1].

[1] https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-4090
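
For reference, this is roughly what I mean on the consumer side (broker
address, group id, topic name, and truststore values below are placeholders,
not your actual settings):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class SecureKafkaSource {
    public static FlinkKafkaConsumer<String> create() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker-1:9093"); // placeholder SSL listener
        props.setProperty("group.id", "my-flink-job");           // placeholder group id
        // Without this line the client speaks plaintext to the SSL port and can
        // misinterpret the TLS handshake bytes as a huge record size (KAFKA-4090),
        // which can show up as large direct-memory allocations or OOM.
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.truststore.location", "/path/to/truststore.jks"); // placeholder
        props.setProperty("ssl.truststore.password", "changeit");                // placeholder
        return new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props); // placeholder topic
    }
}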




Re: High DirectByteBuffer Usage

2021-07-14 Thread bat man
Hi Timo,

I am looking at these options.
However, I had a couple of questions -
1. The off-heap usage grows over time. My job does not do any off-heap
operations, so I don't think there is a leak there. Even after GC it keeps
adding a few MBs after hours of running.
2. I am seeing that the off-heap usage grows as the incoming record volume
increases. What is the reason for this?

I am using 1.9. Is there any known bug which could be causing this issue?

Thanks,
Hemant



Re: High DirectByteBuffer Usage

2021-07-14 Thread Timo Walther

Hi Hemant,

Did you check out the dedicated pages on memory configuration and
troubleshooting:


https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-direct-buffer-memory

https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#container-memory-exceeded

It is likely that the high number of output streams is causing your issues.

Regards,
Timo




On 14.07.21 08:46, bat man wrote:

Hi,
I have a job which reads different streams from 5 Kafka topics. It
filters the data, which is then streamed to different operators for
processing. This step involves data shuffling.


The data is then enriched in 4 join (KeyedCoProcessFunction) operators.
After joining, the data is written to different Kafka topics. There are
a total of 16 different output streams, written to 4 topics.
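
To make the shape of the job clearer, here is a heavily simplified sketch of
the topology (topic names, keys, and the join logic are made up; the real job
has 5 sources, 4 joins, and 16 output streams):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class TopologySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "broker-1:9092"); // placeholder

        DataStream<String> events = env
                .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), kafkaProps))
                .filter(v -> !v.isEmpty());                      // the filtering step
        DataStream<String> reference = env
                .addSource(new FlinkKafkaConsumer<>("reference", new SimpleStringSchema(), kafkaProps));

        // Side-output tag for one of the extra output streams.
        final OutputTag<String> alerts = new OutputTag<String>("alerts") {};

        SingleOutputStreamOperator<String> enriched = events
                .keyBy(v -> v.split(",")[0])                     // shuffle to the join operator
                .connect(reference.keyBy(v -> v.split(",")[0]))
                .process(new KeyedCoProcessFunction<String, String, String, String>() {
                    @Override
                    public void processElement1(String value, Context ctx, Collector<String> out) {
                        out.collect(value);                      // main "enriched" stream
                        ctx.output(alerts, value);               // one of the side-output streams
                    }
                    @Override
                    public void processElement2(String value, Context ctx, Collector<String> out) {
                        // would normally buffer reference data in keyed state
                    }
                });

        // Each output stream gets its own Kafka producer / topic.
        enriched.addSink(new FlinkKafkaProducer<>("enriched", new SimpleStringSchema(), kafkaProps));
        enriched.getSideOutput(alerts)
                .addSink(new FlinkKafkaProducer<>("alerts", new SimpleStringSchema(), kafkaProps));

        env.execute("topology sketch");
    }
}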


I have been facing issues with YARN killing containers. I took a heap dump
and ran it through JXray [1]. Heap usage is not high. One thing that stands
out is the off-heap usage, which is very high. My guess is that this is what
is killing the containers as the data inflow increases.


[Attached screenshot: Screenshot 2021-07-14 at 11.52.41 AM.png]


From the stack in the screenshot, is this usage high because of the many
output streams being written to Kafka topics? The stack shows RecordWriter
holding on to these DirectByteBuffers. I have assigned 1 GB of network memory,
and -XX:MaxDirectMemorySize also shows ~1 GB for the task managers.


From [2] I found that setting -Djdk.nio.maxCachedBufferSize=262144 limits the
JDK's temporary direct buffer cache. Will it help in this case?
The JVM version used is OpenJDK 64-Bit Server VM (Red Hat, Inc.),
1.8/25.282-b08.
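
For context, the cache in question is the JDK's per-thread temporary direct
buffer cache: when a heap ByteBuffer is written through an NIO channel, the
JDK copies it into a temporary direct buffer and keeps that buffer cached on
the writing thread. Below is a small standalone sketch (plain JDK code,
nothing Flink-specific; names are illustrative) that makes the cache visible:

import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TempDirectBufferDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio-demo", ".bin");
        printDirectPool("before");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            // Writing a *heap* buffer through an NIO channel forces the JDK to
            // copy it into a temporary direct buffer, cached on this thread.
            ByteBuffer heap = ByteBuffer.allocate(8 * 1024 * 1024); // 8 MB heap buffer
            ch.write(heap);
        }
        printDirectPool("after");
        Files.deleteIfExists(tmp);
    }

    private static void printDirectPool(String label) {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.out.printf("%s: direct buffers=%d, used=%d bytes%n",
                        label, pool.getCount(), pool.getMemoryUsed());
            }
        }
    }
}

Running it with and without -Djdk.nio.maxCachedBufferSize=262144 should show
the difference: without the flag the 8 MB temporary buffer stays cached on the
thread after the write, with the flag buffers above the limit are freed
instead of cached. So the flag caps what each writer thread keeps around, but
it does not reduce the peak allocation during a single large write.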


[1] - https://jxray.com 
[2] - https://dzone.com/articles/troubleshooting-problems-with-native-off-heap-memo



Thanks,
Hemant