Hi folks,

I am running an experiment to execute a simple "hello world" Python
pipeline on a remote Spark cluster on Hadoop. So far I have run the
SparkJobServerDriver on the YARN application master and managed to submit a
Python pipeline to it. SparkPipelineRunner was able to run the portable
pipeline and spawn some containers for it. However, on the containers
themselves I don't see sdk_worker.py being executed, so the
ExecutableStage code throws gRPC IO exceptions. Is there a way for the
Spark runner to run the Python worker inside the YARN cluster's
containers? I don't see any existing code for it, and it seems the ports
allocated for the bundle factory are also arbitrary. Any thoughts?
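
For reference, the pipeline I'm submitting is roughly the minimal sketch
below (the job endpoint, port, and environment_type here are placeholders
rather than my exact settings):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder endpoint: <job-server-host>:8099 stands in for wherever
    # SparkJobServerDriver is listening on the YARN application master.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=<job-server-host>:8099",
        "--environment_type=DOCKER",  # SDK harness environment, placeholder
    ])

    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(["hello world"])
         | "Print" >> beam.Map(print))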

Thanks,
Xinyu
