EDigioacchinoTBL opened a new issue, #35943:
URL: https://github.com/apache/beam/issues/35943

   ### What would you like to happen?
   
   Hello and good evening;
   
   I've been trying to reverse engineer a pure docker of the Flink cluster (and 
workers) for portable Python Beam. I say reverse engineer because all of the 
documents I've come across deploy fully in Kubernetes or partially in Docker 
(kafka in docker with local Flink cluster, or similar). 
   
   For a bit more context I'm not bothering with streaming (yet) so I'm not too 
worried there. Just wanted to interact with S3 and output to a custom DB sink. 
Pretty basic and well documented (thanks for that btw!).
   
   It's taken me a while and I was getting close. Using DOCKER environment type 
doesn't work in a docker deployed cluster which makes sense. So, I went the 
external route sending jobs to a python worker. Since that worker (`-p 
50000:50000 apache/beam_python3.11_sdk:latest --worker_pool`) lives in docker 
and is referable by the job server, and job manager, it cannot be referred to 
as localhost. And there's no evident way to change the control host.
   
   
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py#L629
   
   
   In this issue the following occurs:
   
   <img width="818" height="383" alt="Image" 
src="https://github.com/user-attachments/assets/0edae1f2-2dae-4d89-85c0-079f2927a147";
 />
   
   When given this prompt - which is legal according to [SDK Harness 
Docs](https://beam.apache.org/documentation/runtime/sdk-harness-config/) and 
[Flink Runner 
Docs](https://beam.apache.org/documentation/runners/flink/#executing-a-beam-pipeline-on-a-flink-cluster)
   `python script-beam-bench.py --runner FlinkRunner --flink_master 
jobmanager:8081 --environment_type EXTERNAL --environment_config 
beamworker:50000`
   
   In the event this issue is actually not, please keep me apprised. I've 
attached my cluster to this issue. As well as the pipeline I'm testing with.
   
   
[script-beam-bench.py](https://github.com/user-attachments/files/21944008/script-beam-bench.py)
   
   
[portable-flink-cluster.docker-compose.yaml](https://github.com/user-attachments/files/21943935/portable-flink-cluster.docker-compose.yaml)
   
   ### Issue Priority
   
   Priority: 2 (default / most feature requests should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [x] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [x] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [x] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to