It's jTDS 1.3.1; http://sourceforge.net/projects/jtds/files/jtds/1.3.1/
I put that jar in /tmp on the driver/machine I'm running spark-shell from. Then I ran with:

    ./bin/spark-shell --jars /tmp/jtds-1.3.1.jar --master yarn-client

So I'm guessing that --jars doesn't set the class path for the primordial class loader, and that's why the jar is only visible in 'user land'.

I'm thinking a workaround would be to merge the jTDS driver into my Spark assembly jar, but that seems like a hack. The other thing I notice is --files, which lets me distribute files via the YARN distributed cache, so I'm thinking I could somehow use that if --jars doesn't work. Really I need to understand how the Spark class path is set when running on YARN.

From: "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com>
Date: Thursday, 16 April 2015 3:02 pm
To: Nathan <nathan.mccar...@quantium.com.au>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Can you provide the JDBC connector jar version? Possibly the full JAR name and the full command you ran Spark with?

On Wed, Apr 15, 2015 at 11:27 AM, Nathan McCarthy <nathan.mccar...@quantium.com.au> wrote:

Just an update: tried with the old JdbcRDD and that worked fine.

From: Nathan <nathan.mccar...@quantium.com.au>
Date: Wednesday, 15 April 2015 1:57 pm
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0

Hi guys,

Trying to use a Spark SQL context's .load("jdbc", ...) method to create a DataFrame from a JDBC data source. All seems to work well locally (master = local[*]), but as soon as we try to run on YARN we have problems. We seem to be running into problems with the class path and loading up the JDBC driver.
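As a generic way to check which loader actually defined a class (this is plain JVM behaviour, not Spark-specific; the class name and demo names below are mine): walk the class-loader parent chain. Running the equivalent in spark-shell against net.sourceforge.jtds.jdbc.Driver would show whether the --jars entry landed on the application class loader or on some child loader.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: list the class-loader chain that defined a class.
// An empty chain means the class came from the bootstrap ("primordial") loader.
public class ClassLoaderChainDemo {
    static List<String> loaderChain(Class<?> cls) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader l = cls.getClassLoader(); l != null; l = l.getParent()) {
            chain.add(l.getClass().getName());
        }
        return chain;
    }

    public static void main(String[] args) {
        // java.lang.String is always defined by the bootstrap loader => empty chain
        System.out.println("String: " + loaderChain(String.class));
        // This class is defined by the application loader (and its parents)
        System.out.println("Demo:   " + loaderChain(ClassLoaderChainDemo.class));
    }
}
```

In spark-shell the same check would be one line of Scala against the driver class instead of `ClassLoaderChainDemo.class`.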
I'm using the jTDS 1.3.1 driver, net.sourceforge.jtds.jdbc.Driver.

    ./bin/spark-shell --jars /tmp/jtds-1.3.1.jar --master yarn-client

When trying to run I get an exception:

    scala> sqlContext.load("jdbc", Map("url" -> "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd", "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))
    java.sql.SQLException: No suitable driver found for jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd

Thinking maybe we need to force-load the driver, if I supply "driver" -> "net.sourceforge.jtds.jdbc.Driver" to .load we get:

    scala> sqlContext.load("jdbc", Map("url" -> "jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd", "driver" -> "net.sourceforge.jtds.jdbc.Driver", "dbtable" -> "CUBE.DIM_SUPER_STORE_TBL"))
    java.lang.ClassNotFoundException: net.sourceforge.jtds.jdbc.Driver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:97)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:21)

Yet if I run a Class.forName() just from the shell:

    scala> Class.forName("net.sourceforge.jtds.jdbc.Driver")
    res1: Class[_] = class net.sourceforge.jtds.jdbc.Driver

No problem finding the JAR. I've tried in both the shell and running with spark-submit (packaging the driver in with my application as a fat JAR). Nothing seems to work.
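For the "No suitable driver found" case specifically: java.sql.DriverManager only hands back drivers whose class is visible to the class loader of the calling code. A common workaround for that, when the real driver sits on a different loader than the framework code, is a thin delegating wrapper registered from the application loader. This is not from the thread; it is a sketch of that well-known pattern, with the jTDS class name used only as an illustration:

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Delegating "driver shim": DriverManager checks that a driver's class is
// visible to the caller's loader, so we register a wrapper defined by the
// application loader and forward every call to the real driver instance.
public class DriverShim implements Driver {
    private final Driver delegate;

    public DriverShim(Driver delegate) {
        this.delegate = delegate;
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        return delegate.connect(url, info);
    }

    @Override
    public boolean acceptsURL(String url) { // narrowed: drops the checked exception
        try {
            return delegate.acceptsURL(url);
        } catch (SQLException e) {
            return false;
        }
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) throws SQLException {
        return delegate.getPropertyInfo(url, info);
    }

    @Override
    public int getMajorVersion() { return delegate.getMajorVersion(); }

    @Override
    public int getMinorVersion() { return delegate.getMinorVersion(); }

    @Override
    public boolean jdbcCompliant() { return delegate.jdbcCompliant(); }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException();
    }
}
```

Hypothetical usage (loader obtained however the jar is visible in your setup): load the real driver with `Class.forName("net.sourceforge.jtds.jdbc.Driver", true, loader)`, instantiate it, then `DriverManager.registerDriver(new DriverShim(realDriver))`.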
I can also get a connection in the driver/shell no problem:

    scala> import java.sql.DriverManager
    import java.sql.DriverManager

    scala> DriverManager.getConnection("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")
    res3: java.sql.Connection = net.sourceforge.jtds.jdbc.JtdsConnection@2a67ecd0

I'm probably missing some class path setting here. In jdbc.DefaultSource.createRelation it looks like the call to Class.forName doesn't specify a class loader, so it just uses the default Java behaviour of reflectively resolving against the caller's class loader. It almost feels like it's using a different class loader.

I also tried seeing if the class path was there on all my executors by running:

    import scala.collection.JavaConverters._
    sc.parallelize(Seq(1,2,3,4)).flatMap(_ => java.sql.DriverManager.getDrivers().asScala.map(d => s"$d | ${d.acceptsURL("jdbc:jtds:sqlserver://blah:1433/MyDB;user=usr;password=pwd")}")).collect().foreach(println)

This successfully returns:

    15/04/15 01:07:37 INFO scheduler.DAGScheduler: Job 0 finished: collect at Main.scala:46, took 1.495597 s
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true
    org.apache.derby.jdbc.AutoloadedDriver40 | false
    com.mysql.jdbc.Driver | false
    net.sourceforge.jtds.jdbc.Driver | true

As a final test we tried the Postgres driver and had the same problem.

Any ideas?

Cheers,
Nathan

--
Deepak
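On the point about DefaultSource.createRelation not specifying a class loader: the two-arg Class.forName(name) resolves against the defining loader of the calling class, while the three-arg form lets the caller pass a loader explicitly, e.g. the thread context class loader. That explicit form is the usual way to bridge a framework/user-jar class-loader split. A minimal sketch of the generic JVM behaviour (this is not Spark's code; the class and method names here are mine):

```java
// Demonstrates the three-arg Class.forName form, which resolves against an
// explicitly supplied loader instead of the caller's defining loader.
public class ForNameDemo {
    static Class<?> loadWithContextLoader(String name) throws ClassNotFoundException {
        // Passing the context loader means classes visible only to "user land"
        // loaders (e.g. jars added at runtime) can still be found.
        return Class.forName(name, true, Thread.currentThread().getContextClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // In a plain JVM both forms behave the same; the difference only shows
        // up when caller and context loaders diverge, as they can under YARN.
        System.out.println(loadWithContextLoader("java.util.ArrayList"));
    }
}
```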