[
https://issues.apache.org/jira/browse/BEAM-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548958#comment-17548958
]
Danny McCormick commented on BEAM-11378:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/20568
> Cannot run Python PortableRunner on EMR cluster
> -----------------------------------------------
>
> Key: BEAM-11378
> URL: https://issues.apache.org/jira/browse/BEAM-11378
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Ratul Ray
> Priority: P3
>
> I have been trying to run the Python word-count example on an [AWS EMR|https://aws.amazon.com/emr/] cluster, and it does not work.
> Things I have tried:
> * Running with
> {code:bash}
> python3 py_codes/word_count_beam.py --output word_count_output \
>   --runner=SparkRunner
> {code}
> This implicitly runs with {{--spark-master-url local[4]}}, which defeats the
> purpose of running on a cluster.
> * Tried
> {code:bash}
> python3 py_codes/word_count_beam.py --output word_count_output \
>   --runner=SparkRunner --spark-master-url=yarn
> {code}
> This still uses the local master.
> * Could not use the method described in
> [https://beam.apache.org/documentation/runners/spark/] under "Running on a
> pre-deployed Spark cluster", because on YARN the master is not exposed at a
> URL like localhost:7077.
> * Tried
> {code:bash}
> python3 py_codes/word_count_beam.py --output word_count_output \
>   --runner=SparkRunner --output_executable_path=jars/beam_word_count.jar
> {code}
> as described in https://issues.apache.org/jira/browse/BEAM-8970.
> This creates a jar file, but when I submit the jar with spark-submit I get a
> Docker "permission denied" exception, possibly related to
> https://issues.apache.org/jira/browse/BEAM-6020.
> *So, is there no way to run Python Beam code on a YARN Spark cluster?*
> That would also mean there is no way to run TFX code (which uses Beam) on a YARN cluster.
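One possible explanation for the {{--spark-master-url=yarn}} attempt still using the local master, offered as an assumption rather than a confirmed diagnosis: Beam's Python SDK appears to declare its Spark option with underscores ({{--spark_master_url}}, default {{local[4]}}), while {{--spark-master-url}} is the spelling used by the Java Spark job server. Beam is also assumed to drop flags it cannot parse (with a warning) instead of failing. The sketch below uses only the standard-library argparse module and a hypothetical stand-in parser, not Beam's actual options class, to show how a hyphenated flag goes unrecognized and leaves the default in place.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for Beam's Python Spark options.
    # Assumption: the master option is declared as --spark_master_url
    # (underscores) with a default of local[4].
    parser = argparse.ArgumentParser()
    parser.add_argument("--runner")
    parser.add_argument("--output")
    parser.add_argument("--spark_master_url", default="local[4]")
    return parser


def resolve_master(argv: List[str]) -> Tuple[str, List[str]]:
    """Return the master URL that would be used and any unrecognized flags."""
    # parse_known_args sets unknown flags aside instead of raising an error,
    # so a misspelled option silently leaves the default value in place.
    known, unknown = build_parser().parse_known_args(argv)
    return known.spark_master_url, unknown


if __name__ == "__main__":
    hyphenated = ["--output", "word_count_output", "--runner=SparkRunner",
                  "--spark-master-url=yarn"]
    underscored = ["--output", "word_count_output", "--runner=SparkRunner",
                   "--spark_master_url=yarn"]
    print(resolve_master(hyphenated))   # ('local[4]', ['--spark-master-url=yarn'])
    print(resolve_master(underscored))  # ('yarn', [])
```

This only illustrates the parsing behavior; it does not show whether a correctly spelled {{--spark_master_url=yarn}} would make the pipeline run on YARN.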
--
This message was sent by Atlassian Jira
(v8.20.7#820007)