Micah Wylde created BEAM-5640:
---------------------------------

             Summary: Portable python sdk worker leaks memory when PyOpenSSL 
package is present
                 Key: BEAM-5640
                 URL: https://issues.apache.org/jira/browse/BEAM-5640
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-harness
            Reporter: Micah Wylde
            Assignee: Robert Bradshaw


When the PyOpenSSL package is installed on a system (e.g., in a virtualenv), the 
Python sdk_worker process leaks memory. I've validated this with the 
Flink portable runner in streaming mode, but it may occur in other 
configurations as well. The leak is significant, amounting to tens of 
MB/s.
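
Since the leak depends on whether PyOpenSSL is present, it can help to confirm that the package is importable in the same environment the sdk_worker runs in. The following is a minimal sketch, not part of the reproduction branch; the helper names are hypothetical, and it relies only on the fact that PyOpenSSL is imported under the module name OpenSSL.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def pyopenssl_installed():
    """PyOpenSSL is imported as ``OpenSSL``, so look for that module name."""
    return module_available("OpenSSL")


if __name__ == "__main__":
    # Run this with the same interpreter (or virtualenv) as the sdk_worker.
    print("PyOpenSSL importable:", pyopenssl_installed())
```

Running it inside and outside the virtualenv should show whether the two environments actually differ in this respect.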

I've put together a reproduction of the issue 
[here|https://github.com/mwylde/beam/tree/micah_memory_leak]. That branch 
includes a Flink streaming data source that generates data, as well as a Python 
pipeline that demonstrates the issue.

To reproduce:
{code:java}
# check out the branch:
$ git clone g...@github.com:mwylde/beam.git
$ cd beam
$ git checkout micah_memory_leak

# build the python docker container with pyopenssl installed:
$ ./gradlew :beam-sdks-python-container:docker

# start the job server with embedded flink cluster:
$ ./gradlew runShadow

# run the pipeline:
$ ./gradlew :beam-sdks-python:streamingLeak
{code}
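
While the pipeline runs, the growth can be quantified by sampling the resident set size of the sdk_worker process. The sketch below is an illustration under stated assumptions and is not part of the reproduction branch: it assumes a Linux host where /proc/<pid>/status is readable, the worker's pid is found separately (e.g., with ps), and all function names are hypothetical.

```python
import time


def read_rss_bytes(pid):
    """Read the resident set size of a process from /proc (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The kernel reports this value in kB.
                return int(line.split()[1]) * 1024
    raise RuntimeError("VmRSS not found for pid %d" % pid)


def sample_rss(pid, count=10, interval=1.0):
    """Collect (timestamp_seconds, rss_bytes) pairs at a fixed interval."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_rss_bytes(pid)))
        if i < count - 1:
            time.sleep(interval)
    return samples


def growth_rate_mb_per_sec(samples):
    """Average RSS growth between the first and last sample, in MB/s."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (r1 - r0) / (t1 - t0) / (1024.0 * 1024.0)
```

For example, growth_rate_mb_per_sec(sample_rss(worker_pid)) averages the growth over roughly ten seconds, where worker_pid is a placeholder for the pid of the sdk_worker process.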

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
