Hi,

I'm trying to figure out how to use a third-party jar inside a Python
program that I'm running via PyCharm so that I can debug it. I can normally
run Spark code in Python, such as this:

    from pyspark import SparkConf, SparkContext

    spark_conf = SparkConf().setMaster('local').setAppName('test')
    sc = SparkContext(conf=spark_conf)
    cars = sc.textFile('c:/cars.csv')
    print(cars.count())
    sc.stop()

The code I'm trying to run is below; it uses the Databricks spark-csv package.
I can get it working fine in the pyspark shell using the --packages argument,
but I can't figure out how to get it to work via PyCharm.

    from pyspark.sql import SQLContext
    from pyspark import SparkConf, SparkContext

    spark_conf = SparkConf().setMaster('local').setAppName('test')
    sc = SparkContext(conf=spark_conf)

    sqlContext = SQLContext(sc)
    df = sqlContext.load(source="com.databricks.spark.csv", header="true",
                         path="c:/cars.csv", delimiter='\t')
    df.select("year").show()

The error message I'm getting is:

    py4j.protocol.Py4JJavaError: An error occurred while calling o20.load.
    : java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
            at scala.sys.package$.error(package.scala:27)
            at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:194)
            at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:205)
            at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
            at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:483)
            at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
            at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
            at py4j.Gateway.invoke(Gateway.java:259)
            at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
            at py4j.commands.CallCommand.execute(CallCommand.java:79)
            at py4j.GatewayConnection.run(GatewayConnection.java:207)
            at java.lang.Thread.run(Thread.java:745)


I presume I need to set the Spark classpath somehow, but I'm not sure of the
right way to do it. Any advice/guidance would be appreciated.
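For reference, here is a sketch of one commonly suggested approach, untested here. When PyCharm runs the script with a plain Python interpreter, the --packages flag used with the pyspark shell never reaches the JVM. PySpark reads extra spark-submit flags from the PYSPARK_SUBMIT_ARGS environment variable at the moment the first SparkContext launches the JVM, so setting it before creating the SparkContext should act like passing --packages on the command line. The helper build_submit_args is hypothetical, the spark-csv coordinates are an assumption (use whatever worked with --packages in the shell), and the trailing pyspark-shell token is what Spark 1.4 and later expect; older releases may handle this variable differently.

```python
import os

# Assumed Maven coordinates for spark-csv; substitute the value you
# pass to --packages when starting the pyspark shell.
SPARK_CSV_PACKAGE = "com.databricks:spark-csv_2.10:1.0.3"


def build_submit_args(packages=(), jars=()):
    """Hypothetical helper: build a PYSPARK_SUBMIT_ARGS value.

    packages: Maven coordinates, as given to spark-submit --packages
    jars:     local jar paths, as given to spark-submit --jars
    """
    args = []
    if packages:
        args += ["--packages", ",".join(packages)]
    if jars:
        args += ["--jars", ",".join(jars)]
    # Assumption: Spark 1.4+ expects the value to end with "pyspark-shell".
    args.append("pyspark-shell")
    return " ".join(args)


# This must happen before the first SparkContext is created, because that
# is when PySpark starts the JVM and reads this variable.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(packages=[SPARK_CSV_PACKAGE])
```

With this at the top of the script, the SparkConf/SparkContext code above would stay unchanged. The same variable could instead be set under Environment variables in the PyCharm run configuration, which keeps launcher details out of the script itself.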

Thanks,

Mark.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-jars-to-standalone-pyspark-program-tp22685.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
