[ https://issues.apache.org/jira/browse/SPARK-21186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071254#comment-16071254 ]
Jeff Zhang commented on SPARK-21186:
------------------------------------

I think this is due to how spark-deep-learning distributes its jar and Python code. --packages is intended for jars, not for Python code, so Spark won't put it on the PYTHONPATH.

> PySpark with --packages fails to import library due to lack of pythonpath to
> .ivy2/jars/*.jar
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21186
>                 URL: https://issues.apache.org/jira/browse/SPARK-21186
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>         Environment: Spark is downloaded and compiled by myself.
> Spark: 2.2.0-SNAPSHOT
> Python: Anaconda Python2 (on client and workers)
>            Reporter: HanCheol Cho
>            Priority: Minor
>
> I hit an "ImportError: No module named sparkdl" exception while trying
> to use Databricks' spark-deep-learning (sparkdl) package in PySpark.
> The package was included with the --packages option as below.
> {code}
> $ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
> >>> import sparkdl
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ImportError: No module named sparkdl
> {code}
> The problem is that PySpark fails to detect this package's jar files located
> in the .ivy2/jars/ directory.
> I could circumvent the issue by manually adding this path to PYTHONPATH
> after launching PySpark, as follows.
> {code}
> $ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
> >>> import sys, glob, os
> >>> sys.path.extend(glob.glob(os.path.join(os.path.expanduser("~"), ".ivy2/jars/*.jar")))
> >>>
> >>> import sparkdl
> Using TensorFlow backend.
> >>> my_images = sparkdl.readImages("data/flower_photos/daisy/*.jpg")
> >>> my_images.show()
> +--------------------+--------------------+
> |            filePath|               image|
> +--------------------+--------------------+
> |hdfs://mycluster/...|[RGB,263,320,3,[B...|
> |hdfs://mycluster/...|[RGB,313,500,3,[B...|
> |hdfs://mycluster/...|[RGB,215,320,3,[B...|
> ...
> {code}
> I think it may be better to add the .ivy2/jars/ directory path to the PYTHONPATH
> while launching PySpark.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
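
For reference, the reporter's workaround can be wrapped in a small helper. This is only an illustrative sketch, not part of the ticket: the helper name `add_ivy_jars_to_path` and its default directory argument are my own. It appends every jar under ~/.ivy2/jars to sys.path, which works because a jar is a zip archive and Python's zipimport machinery can import pure-Python modules bundled inside it.

```python
import glob
import os
import sys


def add_ivy_jars_to_path(ivy_jars_dir=None):
    """Append all jars under the given directory (default: ~/.ivy2/jars)
    to sys.path, so that pure-Python modules bundled inside those jars
    become importable via Python's zipimport. Returns the jars added."""
    if ivy_jars_dir is None:
        ivy_jars_dir = os.path.join(os.path.expanduser("~"), ".ivy2", "jars")
    jars = sorted(glob.glob(os.path.join(ivy_jars_dir, "*.jar")))
    sys.path.extend(jars)
    return jars
```

Calling `add_ivy_jars_to_path()` after starting the PySpark shell with --packages should make `import sparkdl` succeed on the driver; note it does not fix the PYTHONPATH on the workers, which is why having Spark do this at launch (as the ticket suggests) would be the cleaner fix.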