Hi folks,

I am running an experiment to execute a simple "hello world" Python
pipeline on a remote Spark cluster on Hadoop. So far I have run the
SparkJobServerDriver on the YARN application master and managed to submit a
Python pipeline to it. SparkPipelineRunner was able to run the portable
pipeline and spawn some containers for it. However, on the containers
themselves I don't see sdk_worker.py being executed, so the
ExecutableStage code throws gRPC IO exceptions. Is there a way for the
Spark runner to run the Python worker inside the YARN cluster's
containers? I don't see any existing code for it, and it seems the ports
allocated for the bundle factory are also arbitrary. Any thoughts?
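
For reference, the pipeline I'm submitting is roughly the minimal sketch
below (the job endpoint, port, and environment_type here are placeholders
rather than my exact settings):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder endpoint: <job-server-host>:8099 stands in for wherever
    # SparkJobServerDriver is listening on the YARN application master.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=<job-server-host>:8099",
        "--environment_type=DOCKER",  # SDK harness environment, placeholder
    ])

    with beam.Pipeline(options=options) as p:
        (p
         | "Create" >> beam.Create(["hello world"])
         | "Print" >> beam.Map(print))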

Thanks,
Xinyu
