HanCheol Cho created SPARK-21186:
------------------------------------

             Summary: PySpark with --packages fails to import library due to lack of pythonpath to .ivy2/jars/*.jar
                 Key: SPARK-21186
                 URL: https://issues.apache.org/jira/browse/SPARK-21186
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.2.0
         Environment: Spark is downloaded and compiled by myself.
Spark: 2.2.0-SNAPSHOT
Python: Anaconda Python2 (on client and workers)
            Reporter: HanCheol Cho
            Priority: Minor


I got an "ImportError: No module named sparkdl" exception while trying to use Databricks' spark-deep-learning (sparkdl) package in PySpark. The package was included with the --packages option as shown below.

{code}
$ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
{code}

The problem is that PySpark does not put the package's jar files, which are downloaded into the .ivy2/jars/ directory, on the Python path. I could work around the issue by manually adding those jars to sys.path after launching PySpark, as follows.

{code}
>>> import sys, glob, os
>>> sys.path.extend(glob.glob(os.path.join(os.path.expanduser("~"), ".ivy2/jars/*.jar")))
>>>
>>> import sparkdl
Using TensorFlow backend.
>>> my_images = sparkdl.readImages("data/flower_photos/daisy/*.jpg")
>>> my_images.show()
+--------------------+--------------------+
|            filePath|               image|
+--------------------+--------------------+
|hdfs://mycluster/...|[RGB,263,320,3,[B...|
|hdfs://mycluster/...|[RGB,313,500,3,[B...|
|hdfs://mycluster/...|[RGB,215,320,3,[B...|
...
{code}

I think it would be better if PySpark added the .ivy2/jars/ directory path to PYTHONPATH when launching.
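For reference, the same effect can be obtained at launch time by putting the cached jars on PYTHONPATH from the shell before starting PySpark. This is only a rough sketch, assuming a bash shell and the default ~/.ivy2/jars/ cache location:

{code}
$ export PYTHONPATH="$PYTHONPATH:$(echo ~/.ivy2/jars/*.jar | tr ' ' ':')"
$ pyspark --packages databricks:spark-deep-learning:0.1.0-spark2.1-s_2.11
{code}

With this set, the sparkdl jar should be importable from the driver without touching sys.path inside the session; the executors may still need an equivalent setting (e.g. via spark.executorEnv.PYTHONPATH).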