HanCheol Cho created SPARK-21186:
------------------------------------

             Summary: PySpark with --packages fails to import library due to lack of pythonpath to .ivy2/jars/*.jar
                 Key: SPARK-21186
                 URL: https://issues.apache.org/jira/browse/SPARK-21186
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.2.0
         Environment: Spark is downloaded and compiled by myself.
Spark: 2.2.0-SNAPSHOT
Python: Anaconda Python2 (on client and workers)
            Reporter: HanCheol Cho
            Priority: Minor


I got an "ImportError: No module named sparkdl" exception while trying to use Databricks' spark-deep-learning (sparkdl) package in PySpark. The package was included with the --packages option as shown below.

{code}
$ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
{code}

The problem is that PySpark does not put the package's jar files, which are downloaded into the .ivy2/jars/ directory, on the Python path. I could work around the issue by manually adding those jars to sys.path after launching PySpark, as follows.

{code}
>>> import sys, glob, os
>>> sys.path.extend(glob.glob(os.path.join(os.path.expanduser("~"), ".ivy2/jars/*.jar")))
>>>
>>> import sparkdl
Using TensorFlow backend.
>>> my_images = sparkdl.readImages("data/flower_photos/daisy/*.jpg")
>>> my_images.show()
+--------------------+--------------------+
|            filePath|               image|
+--------------------+--------------------+
|hdfs://mycluster/...|[RGB,263,320,3,[B...|
|hdfs://mycluster/...|[RGB,313,500,3,[B...|
|hdfs://mycluster/...|[RGB,215,320,3,[B...|
...
{code}

I think it would be better if PySpark added the .ivy2/jars/ directory path to PYTHONPATH when launching.
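For reference, the same effect can be obtained at launch time by putting the cached jars on PYTHONPATH from the shell before starting PySpark. This is only a rough sketch, assuming a bash shell and the default ~/.ivy2/jars/ cache location:

{code}
$ export PYTHONPATH="$PYTHONPATH:$(echo ~/.ivy2/jars/*.jar | tr ' ' ':')"
$ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
{code}

With this set, the sparkdl jar should be importable from the driver without touching sys.path inside the session; the executors may still need an equivalent setting (e.g. via spark.executorEnv.PYTHONPATH).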