Hi folks,

I am running an experiment to execute a simple "hello world" Python pipeline on a remote Spark cluster on Hadoop. So far I have run the SparkJobServerDriver on the YARN application master and managed to submit a Python pipeline to it. SparkPipelineRunner was able to run the portable pipeline and spawn some containers for it. However, on the containers themselves I don't see sdk_worker.py getting executed, so the ExecutableStage code throws gRPC IO exceptions.

Is there a way for the Spark runner to run the Python worker inside the YARN containers? I don't see any existing code for it, and it also seems the ports allocated for the bundle factory are arbitrary. Any thoughts?
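For context, the submission looks roughly like the sketch below. The endpoint address and environment_type are placeholders rather than my exact settings; I am only showing the general shape of how the pipeline reaches the job server.

```python
# Minimal sketch of submitting a portable Python pipeline to a remote job server.
# The host/port and environment_type here are assumptions for illustration only.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    # hypothetical endpoint exposed by SparkJobServerDriver on the application master
    "--job_endpoint=<app-master-host>:8099",
    # LOOPBACK runs the SDK worker in the submitting process; for workers
    # inside the YARN containers, DOCKER or PROCESS would be needed instead
    "--environment_type=LOOPBACK",
])

with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create(["hello world"])
     | "Print" >> beam.Map(print))
```

The question above is essentially about what environment_type (if any) would let the SDK harness start inside the YARN containers themselves.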
Thanks,
Xinyu
