Ratul Ray created BEAM-11378:
--------------------------------

             Summary: Cannot run Python PortableRunner on EMR cluster
                 Key: BEAM-11378
                 URL: https://issues.apache.org/jira/browse/BEAM-11378
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
            Reporter: Ratul Ray


I have been trying to run the Python word-count example on an [AWS EMR|https://aws.amazon.com/emr/] cluster, and it does not work.

Things I have tried:
 * Running with 
{code:bash}
python3 py_codes/word_count_beam.py --output word_count_output \
  --runner=SparkRunner
{code}
This implicitly runs with {{--spark-master-url local[4]}}, which defeats the purpose of running on a cluster.

 * Tried
{code:bash}
python3 py_codes/word_count_beam.py --output word_count_output \
  --runner=SparkRunner --spark-master-url=yarn
{code}
This still uses the local master.

 * Could not use the method described under "Running on a pre-deployed Spark cluster" in [https://beam.apache.org/documentation/runners/spark/], because on YARN the master is not exposed at a URL like {{localhost:7077}}.

 * Tried
{code:bash}
python3 py_codes/word_count_beam.py --output word_count_output \
  --runner=SparkRunner --output_executable_path=jars/beam_word_count.jar
{code}
as described in https://issues.apache.org/jira/browse/BEAM-8970. This creates a jar file, but when I submit the jar with {{spark-submit}} I get a Docker "permission denied" exception, possibly related to https://issues.apache.org/jira/browse/BEAM-6020.

*So is there no way to run Python Beam code on a YARN Spark cluster?*
This also means there is no way to run TFX code (which uses Beam) on a YARN cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)