It’s jTDS 1.3.1; http://sourceforge.net/projects/jtds/files/jtds/1.3.1/

I put that jar in /tmp on the driver machine I’m running spark-shell from.

Then I ran with ./bin/spark-shell --jars /tmp/jtds-1.3.1.jar --master 
yarn-client

So I’m guessing that --jars doesn’t set the class path for the primordial class 
loader, and that the jar only ends up on the class path in ‘user land’.

Thinking a workaround would be to merge the jTDS driver into my Spark assembly 
jar… but that seems like a hack. The other thing I notice is there’s --files, 
which lets me pass files around with the YARN distributed cache, so I’m thinking 
I can somehow use that if --jars doesn’t work.

Really I need to understand how the Spark class path is set when running on 
YARN.
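
If --jars really does only affect the ‘user land’ loader, maybe something like 
this would get the jar onto the primordial class path instead (untested on our 
cluster; assumes the jar is still at /tmp/jtds-1.3.1.jar on the driver and that 
--jars ships it into each executor’s working directory):

./bin/spark-shell --master yarn-client \
  --jars /tmp/jtds-1.3.1.jar \
  --driver-class-path /tmp/jtds-1.3.1.jar \
  --conf spark.executor.extraClassPath=jtds-1.3.1.jar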


From: "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com<mailto:deepuj...@gmail.com>>
Date: Thursday, 16 April 2015 3:02 pm
To: Nathan 
<nathan.mccar...@quantium.com.au<mailto:nathan.mccar...@quantium.com.au>>
Cc: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Re: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Can you provide the JDBC connector jar version? Possibly also the full JAR name 
and the full command you ran Spark with?

On Wed, Apr 15, 2015 at 11:27 AM, Nathan McCarthy 
<nathan.mccar...@quantium.com.au> wrote:
Just an update, tried with the old JdbcRDD and that worked fine.

From: Nathan <nathan.mccar...@quantium.com.au>
Date: Wednesday, 15 April 2015 1:57 pm
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Hi guys,

Trying to use a Spark SQL context’s .load("jdbc", …) method to create a 
DataFrame from a JDBC data source. All seems to work well locally 
(master = local[*]), however as soon as we try to run on YARN we have problems.

We seem to be running into problems with the class path and loading up the JDBC 
driver. I’m using the jTDS 1.3.1 driver, net.sourceforge.jtds.jdbc.Driver.

./bin/spark-shell --jars /tmp/jtds-1.3.1.jar --master yarn-client

When trying to run I get an exception;

scala> sqlContext.load("jdbc", Map("url" -> 
  "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd", 
  "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))

java.sql.SQLException: No suitable driver found for 
jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd

Thinking maybe we need to force-load the driver, so if I supply "driver" -> 
"net.sourceforge.jtds.jdbc.Driver" to .load we get;

scala> sqlContext.load("jdbc", Map("url" -> 
  "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd", 
  "driver" -> "net.sourceforge.jtds.jdbc.Driver", 
  "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))

java.lang.ClassNotFoundException: net.sourceforge.jtds.jdbc.Driver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:97)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:21)

Yet if I run a Class.forName() just from the shell;

scala> Class.forName("net.sourceforge.jtds.jdbc.Driver")
res1: Class[_] = class net.sourceforge.jtds.jdbc.Driver

No problem finding the driver class. I’ve tried both in the shell and with 
spark-submit (packaging the driver into my application as a fat JAR). Nothing 
seems to work.
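
Next thing I might try is registering a shim driver from the shell. As I 
understand it, DriverManager won’t hand out a driver whose defining class 
loader isn’t visible to the calling code, so wrapping the real driver in a 
class defined somewhere DriverManager will accept might get around that. Rough 
sketch, completely untested:

import java.sql.{Connection, Driver, DriverManager, DriverPropertyInfo, SQLFeatureNotSupportedException}
import java.util.Properties

// Delegates everything to the real jTDS driver; the shim itself is defined
// by a class loader that DriverManager's visibility check will accept.
class DriverShim(d: Driver) extends Driver {
  def connect(url: String, props: Properties): Connection = d.connect(url, props)
  def acceptsURL(url: String): Boolean = d.acceptsURL(url)
  def getPropertyInfo(url: String, props: Properties): Array[DriverPropertyInfo] =
    d.getPropertyInfo(url, props)
  def getMajorVersion: Int = d.getMajorVersion
  def getMinorVersion: Int = d.getMinorVersion
  def jdbcCompliant(): Boolean = d.jdbcCompliant()
  def getParentLogger: java.util.logging.Logger =
    throw new SQLFeatureNotSupportedException()
}

DriverManager.registerDriver(new DriverShim(
  Class.forName("net.sourceforge.jtds.jdbc.Driver").newInstance().asInstanceOf[Driver]))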

I can also get a connection in the driver/shell no problem;

scala> import java.sql.DriverManager
import java.sql.DriverManager
scala> DriverManager.getConnection("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")
res3: java.sql.Connection = net.sourceforge.jtds.jdbc.JtdsConnection@2a67ecd0

I’m probably missing some class path setting here. In 
jdbc.DefaultSource.createRelation it looks like the call to Class.forName 
doesn’t specify a class loader, so it just falls back to the default Java 
behaviour of reflectively picking up the caller’s class loader. It almost 
feels like it’s using a different class loader.
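
If that’s right, forcing the loader with the three-argument Class.forName 
should show the difference; the first call below uses the loader that defined 
Spark’s own classes (roughly what the call inside DefaultSource would end up 
with), the second uses the thread context class loader (where the --jars 
classes seem to land in the REPL):

scala> Class.forName("net.sourceforge.jtds.jdbc.Driver", true, classOf[org.apache.spark.sql.SQLContext].getClassLoader)

scala> Class.forName("net.sourceforge.jtds.jdbc.Driver", true, Thread.currentThread.getContextClassLoader)

If the first throws ClassNotFoundException while the second succeeds, that 
would confirm the two calls see different loaders.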

I also tried seeing if the class path was there on all my executors by running;

import scala.collection.JavaConverters._
sc.parallelize(Seq(1,2,3,4)).flatMap(_ =>
  java.sql.DriverManager.getDrivers().asScala.map(d =>
    s"$d | ${d.acceptsURL("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")}")).collect().foreach(println)

This successfully returns;

15/04/15 01:07:37 INFO scheduler.DAGScheduler: Job 0 finished: collect at 
Main.scala:46, took 1.495597 s
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true
org.apache.derby.jdbc.AutoloadedDriver40 | false
com.mysql.jdbc.Driver | false
net.sourceforge.jtds.jdbc.Driver | true

As a final test we tried with the Postgres driver and had the same problem. 
Any ideas?

Cheers,
Nathan



--
Deepak
