Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Marcelo Vanzin
Please take a look at the pull request with the actual fix; that will
explain why it's the same issue.

On Thu, Jun 25, 2015 at 12:51 PM, Elkhan Dadashov elkhan8...@gmail.com
wrote:

 Thanks Marcelo.

 But my case is different: my mypython/libs/numpy-1.9.2.zip is in a *local
 directory* (it can also be put in HDFS), but the job still fails.

 SPARK-5479 https://issues.apache.org/jira/browse/SPARK-5479, however, is about
 PySpark on yarn mode needing to support *non-local* python files.

 The job fails only when I try to include a 3rd-party dependency from the
 local machine with --py-files (in Spark 1.4).

 Both of these commands succeed:

 ./bin/spark-submit --master yarn-cluster --verbose hdfs:///pi.py
 ./bin/spark-submit --master yarn-cluster --deploy-mode cluster  --verbose
 examples/src/main/python/pi.py

 But in this particular example with 3rd party numpy module:

 ./bin/spark-submit --verbose --master yarn-cluster --py-files
  mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
 mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


 All these files :

 mypython/libs/numpy-1.9.2.zip,  mypython/scripts/kmeans.py are local
 files, kmeans_data.txt is in HDFS.


 Thanks.


 On Thu, Jun 25, 2015 at 12:22 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 That sounds like SPARK-5479 which is not in 1.4...

 On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com
 wrote:

 In addition to my previous emails, when I try to execute this command from
 the command line:

 ./bin/spark-submit --verbose --master yarn-cluster --py-files
  mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
 mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


 - numpy-1.9.2.zip is the downloaded numpy package
 - kmeans.py is the default example which comes with Spark 1.4
 - kmeans_data.txt is the default data file which comes with Spark 1.4


 It fails saying that it could not find numpy:

 File "kmeans.py", line 31, in <module>
     import numpy
 ImportError: No module named numpy

 Has anyone run a Python Spark application in yarn-cluster mode, with
 3rd-party Python modules shipped along with it?

 What configuration or installation steps need to be done before running a
 Python Spark job with 3rd-party dependencies on a Yarn cluster?

 Thanks in advance.


 --
 Marcelo




 --

 Best regards,
 Elkhan Dadashov




-- 
Marcelo


Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Elkhan Dadashov
Thanks Marcelo.

But my case is different: my mypython/libs/numpy-1.9.2.zip is in a *local
directory* (it can also be put in HDFS), but the job still fails.

SPARK-5479 https://issues.apache.org/jira/browse/SPARK-5479, however, is about
PySpark on yarn mode needing to support *non-local* python files.

The job fails only when I try to include a 3rd-party dependency from the
local machine with --py-files (in Spark 1.4).

Both of these commands succeed:

./bin/spark-submit --master yarn-cluster --verbose hdfs:///pi.py
./bin/spark-submit --master yarn-cluster --deploy-mode cluster  --verbose
examples/src/main/python/pi.py

But in this particular example with 3rd party numpy module:

./bin/spark-submit --verbose --master yarn-cluster --py-files
 mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


All these files :

mypython/libs/numpy-1.9.2.zip,  mypython/scripts/kmeans.py are local files,
kmeans_data.txt is in HDFS.


Thanks.
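For what it's worth, here is a sketch of one way to build a --py-files zip for a
*pure-Python* dependency (numpy is a harder case because it consists mostly of
compiled C extensions). The helper name and paths below are hypothetical, not
from this thread: the key point is that the package directory must sit at the
root of the archive for `import <pkg>` to work on the executors.

```python
import os
import zipfile

def zip_package(pkg_dir, zip_path):
    """Zip an installed package directory (e.g. .../site-packages/mypkg)
    so entries are stored as 'mypkg/...', i.e. relative to the directory
    CONTAINING the package -- otherwise 'import mypkg' fails on executors."""
    base = os.path.dirname(os.path.abspath(pkg_dir).rstrip(os.sep))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(pkg_dir):
            for name in files:
                if name.endswith((".pyc", ".pyo")):
                    continue  # skip bytecode; it gets recompiled anyway
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, base))
    return zip_path
```

The resulting zip could then be passed as `--py-files deps.zip`, which Spark
adds to the Python path on the executors.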


On Thu, Jun 25, 2015 at 12:22 PM, Marcelo Vanzin van...@cloudera.com
wrote:

 That sounds like SPARK-5479 which is not in 1.4...

 On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com
 wrote:

 In addition to my previous emails, when I try to execute this command from
 the command line:

 ./bin/spark-submit --verbose --master yarn-cluster --py-files
  mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
 mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


 - numpy-1.9.2.zip is the downloaded numpy package
 - kmeans.py is the default example which comes with Spark 1.4
 - kmeans_data.txt is the default data file which comes with Spark 1.4


 It fails saying that it could not find numpy:

 File "kmeans.py", line 31, in <module>
     import numpy
 ImportError: No module named numpy

 Has anyone run a Python Spark application in yarn-cluster mode, with
 3rd-party Python modules shipped along with it?

 What configuration or installation steps need to be done before running a
 Python Spark job with 3rd-party dependencies on a Yarn cluster?

 Thanks in advance.


 --
 Marcelo




-- 

Best regards,
Elkhan Dadashov


Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Naveen Madhire
Hi Marcelo, quick question.

I am using Spark 1.3 in Yarn client mode. It works well, provided I
manually pip-install all the 3rd-party libraries, like numpy, on the
executor nodes.



So does the SPARK-5479 fix in 1.5 which you mentioned address this as well?
Thanks.


On Thu, Jun 25, 2015 at 2:22 PM, Marcelo Vanzin van...@cloudera.com wrote:

 That sounds like SPARK-5479 which is not in 1.4...

 On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com
 wrote:

 In addition to my previous emails, when I try to execute this command from
 the command line:

 ./bin/spark-submit --verbose --master yarn-cluster --py-files
  mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
 mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


 - numpy-1.9.2.zip is the downloaded numpy package
 - kmeans.py is the default example which comes with Spark 1.4
 - kmeans_data.txt is the default data file which comes with Spark 1.4


 It fails saying that it could not find numpy:

 File "kmeans.py", line 31, in <module>
     import numpy
 ImportError: No module named numpy

 Has anyone run a Python Spark application in yarn-cluster mode, with
 3rd-party Python modules shipped along with it?

 What configuration or installation steps need to be done before running a
 Python Spark job with 3rd-party dependencies on a Yarn cluster?

 Thanks in advance.


 --
 Marcelo



Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Elkhan Dadashov
In addition to my previous emails, when I try to execute this command from
the command line:

./bin/spark-submit --verbose --master yarn-cluster --py-files
 mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


- numpy-1.9.2.zip is the downloaded numpy package
- kmeans.py is the default example which comes with Spark 1.4
- kmeans_data.txt is the default data file which comes with Spark 1.4


It fails saying that it could not find numpy:

File "kmeans.py", line 31, in <module>
    import numpy
ImportError: No module named numpy

Has anyone run a Python Spark application in yarn-cluster mode, with
3rd-party Python modules shipped along with it?

What configuration or installation steps need to be done before running a
Python Spark job with 3rd-party dependencies on a Yarn cluster?

Thanks in advance.
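A plausible cause of the ImportError above, sketched as a self-contained
experiment: --py-files entries are added to the Python path on the executors,
and Python's zipimport mechanism can only load pure-Python modules from a zip;
numpy's compiled extension modules (.so files) cannot be imported from inside
numpy-1.9.2.zip. The module name below is hypothetical, not from the thread:

```python
import os
import sys
import tempfile
import zipfile

# Build a zip containing a tiny pure-Python module (hypothetical name "mymod").
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mymod.py", "def answer():\n    return 42\n")

# Spark effectively does this for every --py-files entry on the executors.
sys.path.insert(0, zip_path)

import mymod  # zipimport loads the pure-Python module from the zip
print(mymod.answer())
```

This import succeeds because mymod.py is pure Python; a package built on
compiled extensions would fail at import time in the same setup, which is why
installing numpy on the executor nodes (e.g. via pip) is the usual workaround.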

On Thu, Jun 25, 2015 at 12:09 PM, Elkhan Dadashov elkhan8...@gmail.com
wrote:

 Hi all,

 Does Spark 1.4 support Python applications in yarn-cluster mode?
 (--master yarn-cluster)

 Does Spark 1.4 support Python applications with deploy-mode
 cluster? (--deploy-mode cluster)

 How can we ship 3rd-party Python dependencies with a Python Spark job?
 (running on a Yarn cluster)

 Thanks.






 On Wed, Jun 24, 2015 at 3:13 PM, Elkhan Dadashov elkhan8...@gmail.com
 wrote:

 Hi all,

 I'm trying to run the kmeans.py Spark example in yarn-cluster mode. I'm using
 Spark 1.4.0.

 I'm passing numpy-1.9.2.zip with --py-files flag.

 Here is the command I'm trying to execute but it fails:

 ./bin/spark-submit --master yarn-cluster --verbose  --py-files
mypython/libs/numpy-1.9.2.zip mypython/scripts/kmeans.py
 /kmeans_data.txt 5 1.0


 - I have kmeans_data.txt in the HDFS root directory (/).


 I receive this error:

 
 ...
 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Final app status:
 SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status
 was reported.)
 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Unregistering
 ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before
 final status was reported.)
 15/06/24 15:08:21 INFO yarn.ApplicationMaster: Deleting staging directory
 .sparkStaging/application_1435182120590_0009
 15/06/24 15:08:22 INFO util.Utils: Shutdown hook called
 Traceback (most recent call last):
   File "kmeans.py", line 31, in <module>
     import numpy as np
 ImportError: No module named numpy
 ...

 

 Any idea why it cannot import numpy from numpy-1.9.2.zip while running the
 kmeans.py example provided with Spark?

 How can we run a python script which depends on other 3rd-party python
 modules on yarn-cluster?

 Thanks.




 --

 Best regards,
 Elkhan Dadashov



Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Marcelo Vanzin
That sounds like SPARK-5479 which is not in 1.4...

On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov elkhan8...@gmail.com
wrote:

 In addition to my previous emails, when I try to execute this command from
 the command line:

 ./bin/spark-submit --verbose --master yarn-cluster --py-files
  mypython/libs/numpy-1.9.2.zip --deploy-mode cluster
 mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0


 - numpy-1.9.2.zip is the downloaded numpy package
 - kmeans.py is the default example which comes with Spark 1.4
 - kmeans_data.txt is the default data file which comes with Spark 1.4


 It fails saying that it could not find numpy:

 File "kmeans.py", line 31, in <module>
     import numpy
 ImportError: No module named numpy

 Has anyone run a Python Spark application in yarn-cluster mode, with
 3rd-party Python modules shipped along with it?

 What configuration or installation steps need to be done before running a
 Python Spark job with 3rd-party dependencies on a Yarn cluster?

 Thanks in advance.


-- 
Marcelo