Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)
Please take a look at the pull request with the actual fix; that will explain why it's the same issue.

On Thu, Jun 25, 2015 at 12:51 PM, Elkhan Dadashov elkhan8...@gmail.com wrote:
> Thanks Marcelo. But my case is different. My mypython/libs/numpy-1.9.2.zip is in a *local directory* (I can also put it in HDFS), but the job still fails. SPARK-5479 (https://issues.apache.org/jira/browse/SPARK-5479), by contrast, is about PySpark on YARN needing to support *non-local* Python files. The job fails only when I try to include a 3rd-party dependency from the local machine with --py-files (in Spark 1.4).
>
> Both of these commands succeed:
>
>   ./bin/spark-submit --master yarn-cluster --verbose hdfs:///pi.py
>   ./bin/spark-submit --master yarn-cluster --deploy-mode cluster --verbose examples/src/main/python/pi.py
>
> But this particular example with the 3rd-party numpy module fails:
>
>   ./bin/spark-submit --verbose --master yarn-cluster --py-files mypython/libs/numpy-1.9.2.zip --deploy-mode cluster mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>
> Of these files, mypython/libs/numpy-1.9.2.zip and mypython/scripts/kmeans.py are local; kmeans_data.txt is in HDFS. Thanks.

--
Marcelo
Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)
Thanks Marcelo. But my case is different. My mypython/libs/numpy-1.9.2.zip is in a *local directory* (I can also put it in HDFS), but the job still fails. SPARK-5479 (https://issues.apache.org/jira/browse/SPARK-5479), by contrast, is about PySpark on YARN needing to support *non-local* Python files. The job fails only when I try to include a 3rd-party dependency from the local machine with --py-files (in Spark 1.4).

Both of these commands succeed:

  ./bin/spark-submit --master yarn-cluster --verbose hdfs:///pi.py
  ./bin/spark-submit --master yarn-cluster --deploy-mode cluster --verbose examples/src/main/python/pi.py

But this particular example with the 3rd-party numpy module fails:

  ./bin/spark-submit --verbose --master yarn-cluster --py-files mypython/libs/numpy-1.9.2.zip --deploy-mode cluster mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0

Of these files, mypython/libs/numpy-1.9.2.zip and mypython/scripts/kmeans.py are local; kmeans_data.txt is in HDFS. Thanks.

On Thu, Jun 25, 2015 at 12:22 PM, Marcelo Vanzin van...@cloudera.com wrote:
> That sounds like SPARK-5479, which is not in 1.4...

--
Best regards,
Elkhan Dadashov
Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)
Hi Marcelo, quick question. I am using Spark 1.3 in YARN client mode. It works well, provided I manually pip-install all the 3rd-party libraries (numpy etc.) on the executor nodes. Will the SPARK-5479 fix in 1.5 that you mentioned address this as well? Thanks.

On Thu, Jun 25, 2015 at 2:22 PM, Marcelo Vanzin van...@cloudera.com wrote:
> That sounds like SPARK-5479, which is not in 1.4...
> --
> Marcelo
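Since the workaround relies on pip-installing libraries on every executor node, it can help to verify per node that the install actually took. Below is a small stand-in check (plain Python, not a Spark API) one could run on each node; the module names are just examples:

```python
# Check whether a module is importable in this interpreter without
# actually importing it. Run this on each executor node (e.g. via
# ssh) to confirm pip-installed libraries like numpy are present.
import importlib.util

def has_module(name: str) -> bool:
    """True if `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None

print(has_module("json"))                # stdlib module: True everywhere
print(has_module("definitely_missing"))  # not installed: False
```

On a node where `pip install numpy` succeeded, `has_module("numpy")` would return True; on a bare node it returns False, which matches the ImportError seen on the executors.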
Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)
In addition to my previous emails: when I try to execute this command from the command line:

  ./bin/spark-submit --verbose --master yarn-cluster --py-files mypython/libs/numpy-1.9.2.zip --deploy-mode cluster mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0

- numpy-1.9.2.zip is the downloaded numpy package
- kmeans.py is the default example that comes with Spark 1.4
- kmeans_data.txt is the default data file that comes with Spark 1.4

it fails, saying that it could not find numpy:

  File "kmeans.py", line 31, in <module>
    import numpy
  ImportError: No module named numpy

Has anyone run a Python Spark application in yarn-cluster mode which has 3rd-party Python modules to be shipped with it? What configuration or installation needs to be done before running a Python Spark job with 3rd-party dependencies on a YARN cluster? Thanks in advance.

On Thu, Jun 25, 2015 at 12:09 PM, Elkhan Dadashov elkhan8...@gmail.com wrote:
> Hi all,
>
> Does Spark 1.4 support Python applications on yarn-cluster (--master yarn-cluster)? Does Spark 1.4 support Python applications with --deploy-mode cluster? How can we ship 3rd-party Python dependencies with a Python Spark job running on a YARN cluster? Thanks.
>
> On Wed, Jun 24, 2015 at 3:13 PM, Elkhan Dadashov elkhan8...@gmail.com wrote:
>> Hi all,
>>
>> I'm trying to run the kmeans.py Spark example in yarn-cluster mode on Spark 1.4.0, passing numpy-1.9.2.zip with the --py-files flag. Here is the command I'm trying to execute, but it fails:
>>
>>   ./bin/spark-submit --master yarn-cluster --verbose --py-files mypython/libs/numpy-1.9.2.zip mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>>
>> (kmeans_data.txt is in the HDFS root directory.) I receive this error:
>>
>>   ...
>>   15/06/24 15:08:21 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
>>   15/06/24 15:08:21 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
>>   15/06/24 15:08:21 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1435182120590_0009
>>   15/06/24 15:08:22 INFO util.Utils: Shutdown hook called
>>
>>   stdout:
>>   Traceback (most recent call last):
>>     File "kmeans.py", line 31, in <module>
>>       import numpy as np
>>   ImportError: No module named numpy
>>   ...
>>
>> Any idea why it cannot import numpy from numpy-1.9.2.zip while running the kmeans.py example provided with Spark? How can we run a Python script that depends on other 3rd-party Python modules on yarn-cluster?
>>
>> Thanks.

--
Best regards,
Elkhan Dadashov
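For context on why the zip approach can work for some packages but not numpy: --py-files ships the archive to the executors and puts it on the Python path, so pure-Python code inside the zip becomes importable. The sketch below (with made-up names like "purepkg" and "deps.zip") demonstrates that mechanism in miniature; note that compiled extension modules (.so files, which numpy is built on) cannot be loaded from inside a zip by Python's default import machinery, which is likely why the zipped numpy is not found. (numpy-1.9.2.zip from PyPI is also a source distribution rather than an installed package, which may compound the problem.)

```python
# Minimal sketch of what --py-files does on the executor side:
# the zip lands on sys.path and zipimport loads pure-Python
# modules from it.
import os
import sys
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "deps.zip")

# A pure-Python package inside the zip imports fine...
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("purepkg/__init__.py", "VALUE = 42\n")

sys.path.insert(0, zip_path)  # roughly what --py-files arranges
import purepkg
print(purepkg.VALUE)  # -> 42

# ...but extension modules compiled to .so files cannot be imported
# from inside a zip this way, so a zipped numpy fails regardless.
```

This is consistent with the observation in the thread that pure-Python scripts (pi.py) run fine in yarn-cluster mode while the numpy-dependent kmeans.py does not.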
Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)
That sounds like SPARK-5479, which is not in 1.4...

On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com wrote:
> In addition to my previous emails: when I try to execute this command from the command line:
>
>   ./bin/spark-submit --verbose --master yarn-cluster --py-files mypython/libs/numpy-1.9.2.zip --deploy-mode cluster mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>
> it fails with "ImportError: No module named numpy" when kmeans.py imports numpy. What configuration or installation needs to be done before running a Python Spark job with 3rd-party dependencies on a YARN cluster?

--
Marcelo
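For readers unfamiliar with the ticket: SPARK-5479 concerns *non-local* Python files, i.e. ones addressed by a remote scheme rather than a path on the submitting machine. A rough sketch of that distinction (this is an illustration, not Spark's actual resolution logic):

```python
# Distinguish local paths from remote URIs by their scheme.
# No scheme (or file:) means the submitting machine's filesystem;
# hdfs: and similar schemes mean remote storage.
from urllib.parse import urlparse

def is_local_path(path: str) -> bool:
    return urlparse(path).scheme in ("", "file")

print(is_local_path("mypython/libs/numpy-1.9.2.zip"))  # True
print(is_local_path("hdfs:///pi.py"))                  # False
```

Under this reading, the failing --py-files archive in the thread is local, which is why Elkhan argues his case is different from the ticket.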