[ 
https://issues.apache.org/jira/browse/SPARK-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071254#comment-16071254
 ] 

Jeff Zhang commented on SPARK-21186:
------------------------------------

I think this is due to how spark-deep-learning distributes its jar and Python 
code. --packages is meant for jars, not for Python code, so Spark won't 
put it on PYTHONPATH.
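
As a rough illustration of the workaround described in the issue below, here is a minimal sketch that appends the resolved jars to sys.path. It only helps because a jar is a zip archive and Python can import modules from zip archives listed on sys.path. The helper name add_ivy_jars_to_sys_path is hypothetical, and the default directory ~/.ivy2/jars is an assumption taken from the report; this is not something Spark itself provides.

```python
import glob
import os
import sys
import zipfile


def add_ivy_jars_to_sys_path(ivy_jars_dir=None):
    """Append jars from the Ivy cache to sys.path and return the ones added.

    Hypothetical helper, not part of Spark. The default location
    (~/.ivy2/jars) is assumed from the issue report.
    """
    if ivy_jars_dir is None:
        ivy_jars_dir = os.path.join(os.path.expanduser("~"), ".ivy2", "jars")

    added = []
    for jar in sorted(glob.glob(os.path.join(ivy_jars_dir, "*.jar"))):
        # Skip anything that is not a readable zip archive or is already listed.
        if zipfile.is_zipfile(jar) and jar not in sys.path:
            sys.path.append(jar)
            added.append(jar)
    return added
```

Calling add_ivy_jars_to_sys_path() in the PySpark shell before "import sparkdl" should have the same effect as the manual sys.path.extend call in the report. It only changes the driver's interpreter, not the Python workers.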

> PySpark with --packages fails to import library due to lack of pythonpath to 
> .ivy2/jars/*.jar
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21186
>                 URL: https://issues.apache.org/jira/browse/SPARK-21186
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>         Environment: Spark is downloaded and compiled by myself.
> Spark: 2.2.0-SNAPSHOT
> Python: Anaconda Python2 (on client and workers)
>            Reporter: HanCheol Cho
>            Priority: Minor
>
> I got an "ImportError: No module named sparkdl" exception while trying 
> to use Databricks' spark-deep-learning (sparkdl) in PySpark.
> The package was included with the --packages option as shown below.
> {code}
> $ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
> >>> import sparkdl
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ImportError: No module named sparkdl
> {code}
> The problem is that PySpark does not pick up this package's jar files located 
> in the .ivy2/jars/ directory.
> I could work around this issue by manually adding these jars to the Python 
> path (sys.path) after launching PySpark as follows.
> {code}
> $ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
> >>> import sys, glob, os
> >>> sys.path.extend(glob.glob(os.path.join(os.path.expanduser("~"),
> ...                                        ".ivy2/jars/*.jar")))
> >>>
> >>> import sparkdl
> Using TensorFlow backend.
> >>> my_images = sparkdl.readImages("data/flower_photos/daisy/*.jpg")
> >>> my_images.show()
> +--------------------+--------------------+
> |            filePath|               image|
> +--------------------+--------------------+
> |hdfs://mycluster/...|[RGB,263,320,3,[B...|
> |hdfs://mycluster/...|[RGB,313,500,3,[B...|
> |hdfs://mycluster/...|[RGB,215,320,3,[B...|
> ...
> {code}
> I think that it may be better to add the jar files under .ivy2/jars to 
> PYTHONPATH while launching PySpark.



