This appears to be a recent issue that others have also reported (e.g.
https://github.com/apache/beam/issues/28142), and it is being actively
investigated. It is therefore unlikely that memory fragmentation is the
issue here.

On Tue, Aug 22, 2023 at 5:21 PM Valentyn Tymofieiev <valen...@google.com>
wrote:

> Hi, thanks for reaching out.
>
> I'd be curious to see whether the memory consumption patterns you observe
> change if you switch the memory allocator library.
>
> For example, you could try to use a custom container, install jemalloc and
> enable it. See: https://beam.apache.org/documentation/runtime/environments
> , https://cloud.google.com/dataflow/docs/guides/using-custom-containers
>
> Your Dockerfile might look like the following:
>
> FROM apache/beam_python3.10_sdk:2.49.0
>
> # Install jemalloc.
> RUN apt-get update \
>   && apt-get install -y libjemalloc-dev
>
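> # Preload jemalloc so it replaces the default glibc allocator for the SDK
> # worker processes.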
> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
>
> # Set the entrypoint to the Apache Beam SDK launcher.
> ENTRYPOINT ["/opt/apache/beam/boot"]
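>
> After building the image and pushing it to a container registry, you can
> point the Dataflow job at it with the --sdk_container_image pipeline
> option. A minimal sketch of the launch code, with placeholder project,
> bucket, topic, and image names:
>
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
>
> # All values below are placeholders; substitute your own project, bucket,
> # topic, and the URI of the image built from the Dockerfile above.
> options = PipelineOptions([
>     "--runner=DataflowRunner",
>     "--project=YOUR_PROJECT",
>     "--region=us-central1",
>     "--temp_location=gs://YOUR_BUCKET/tmp",
>     "--streaming",
>     "--sdk_container_image=us-central1-docker.pkg.dev/YOUR_PROJECT/beam/jemalloc-sdk:latest",
> ])
>
> # Minimal streaming read; the point is only that the workers now run the
> # custom container with jemalloc preloaded.
> with beam.Pipeline(options=options) as p:
>     _ = p | beam.io.ReadFromPubSub(
>         topic="projects/YOUR_PROJECT/topics/YOUR_TOPIC")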
>
>
> On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee <le...@allium.so> wrote:
>
>> Hello!
>>
>> I'm an avid Apache Beam user (on Dataflow); we use Beam to stream
>> blockchain data to various sinks. I recently noticed some memory issues
>> across all our pipelines but have not been able to find the root cause,
>> and was hoping someone on your team might be able to help. If this isn't
>> the right avenue for it, please let me know how I should reach out.
>>
>> The details are here on Stack Overflow:
>>
>>
>> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>>
>> Thanks,
>> Chenghan
>> CTO | Allium
>>
>
