Thanks, Cody. Yes, I originally started by looking at that, but I get a
compile error if I try to use that approach: "constructor JdbcRDD in class
JdbcRDD<T> cannot be applied to given types". Not to mention that
JavaJdbcRDDSuite somehow manages to not pass in the class tag (the last
argument).

Wonder if it's a JDK version issue; I'm using JDK 1.7.

So I've got this, which doesn't compile:

JdbcRDD<Row> jdbcRDD = new JdbcRDD<Row>(
    new SparkContext(conf),
    // javac rejects this argument: the Scala constructor expects a
    // scala.Function0<Connection> here, not a JdbcRDD.ConnectionFactory
    new JdbcRDD.ConnectionFactory() {
        public Connection getConnection() throws SQLException {
            Connection conn = null;
            try {
                Class.forName(JDBC_DRIVER);
                conn = DriverManager.getConnection(JDBC_URL, JDBC_USER, JDBC_PASSWORD);
            } catch (ClassNotFoundException ex) {
                throw new RuntimeException("Error while loading JDBC driver.", ex);
            }
            return conn;
        }
    },
    "SELECT * FROM EMPLOYEES",
    0L,    // lowerBound
    1000L, // upperBound
    10,    // numPartitions
    new Function<ResultSet, Row>() {
        public Row call(ResultSet r) throws Exception {
            return null; // have some actual logic here...
        }
    },
    // and the constructor demands an explicit class tag as the last argument
    scala.reflect.ClassManifestFactory$.MODULE$.fromClass(Row.class));
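
For comparison, JavaJdbcRDDSuite appears to sidestep both problems by going
through the static JdbcRDD.create factory instead of the constructor: create
takes a JavaSparkContext plus the ConnectionFactory directly, and supplies
the class tag itself. If I've read it right, a sketch for my case would look
something like the following (untested; note that JdbcRDD expects two '?'
placeholders in the query for the partition bounds, and the ID column is
just an assumption about the EMPLOYEES table):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.rdd.JdbcRDD;

JavaSparkContext sc = new JavaSparkContext(conf);

JavaRDD<Row> rows = JdbcRDD.create(
    sc,
    new JdbcRDD.ConnectionFactory() {
        public Connection getConnection() throws SQLException {
            try {
                Class.forName(JDBC_DRIVER);
            } catch (ClassNotFoundException ex) {
                throw new RuntimeException("Error while loading JDBC driver.", ex);
            }
            return DriverManager.getConnection(JDBC_URL, JDBC_USER, JDBC_PASSWORD);
        }
    },
    // the two '?' marks get each partition's lower/upper bound substituted in
    "SELECT * FROM EMPLOYEES WHERE ID >= ? AND ID <= ?",
    0L,    // lowerBound
    1000L, // upperBound
    10,    // numPartitions
    new Function<ResultSet, Row>() {
        public Row call(ResultSet r) throws Exception {
            return null; // have some actual logic here...
        }
    });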

The other approach was mimicking the DbConnection class from this post:
http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/.
That got around the compilation issues, but then I got the runtime error
quoted below, where Spark wouldn't recognize the db connection class as a
scala.Function0.
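
For reference, my version of that connection class looks roughly like this
(a minimal sketch of the approach from the post; I've factored the
driver/URL details into constructor arguments, which the original didn't
necessarily do):

import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;

import scala.runtime.AbstractFunction0;

public class GetDbConnection extends AbstractFunction0<Connection>
        implements Serializable {

    private final String driver;
    private final String url;
    private final String user;
    private final String password;

    public GetDbConnection(String driver, String url, String user, String password) {
        this.driver = driver;
        this.url = url;
        this.user = user;
        this.password = password;
    }

    @Override
    public Connection apply() {
        try {
            // load the driver, then hand a live connection back to JdbcRDD
            Class.forName(driver);
            return DriverManager.getConnection(url, user, password);
        } catch (Exception ex) {
            throw new RuntimeException("Error while obtaining a JDBC connection.", ex);
        }
    }
}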



On Wed, Feb 18, 2015 at 12:37 PM, Cody Koeninger <c...@koeninger.org> wrote:

> Take a look at
>
>
> https://github.com/apache/spark/blob/v1.2.1/core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java
>
>
>
> On Wed, Feb 18, 2015 at 11:14 AM, dgoldenberg <dgoldenberg...@gmail.com>
> wrote:
>
>> I'm reading data from a database using JdbcRDD, in Java, and I have an
>> implementation of Function0<Connection> whose instance I supply as the
>> 'getConnection' parameter into the JdbcRDD constructor. Compiles fine.
>>
>> The definition of the class/function is as follows:
>>
>>   public class GetDbConnection extends AbstractFunction0<Connection>
>> implements Serializable
>>
>> where scala.runtime.AbstractFunction0 extends scala.Function0.
>>
>> At runtime, I get an exception as below. Does anyone have an idea as to
>> how
>> to resolve this/work around it? Thanks.
>>
>> I'm running Spark 1.2.1 built for Hadoop 2.4.
>>
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due
>> to stage failure: Task 3 in stage 0.0 failed 1 times, most recent failure:
>> Lost task 3.0 in stage 0.0 (TID 3, localhost):
>> java.lang.ClassCastException:
>> cannot assign instance of com.kona.motivis.spark.proto.GetDbConnection to
>> field
>> org.apache.spark.rdd.JdbcRDD.org$apache$spark$rdd$JdbcRDD$$getConnection
>> of
>> type scala.Function0 in instance of org.apache.spark.rdd.JdbcRDD
>>         at
>>
>> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>         at
>> java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>         at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at
>>
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>         at
>>
>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>         at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:744)
>>
>> Driver stacktrace:
>>         at
>> org.apache.spark.scheduler.DAGScheduler.org
>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
>>         at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
>>         at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
>>         at
>>
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>         at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>         at
>>
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
>>         at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>         at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>         at scala.Option.foreach(Option.scala:236)
>>         at
>>
>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
>>         at
>>
>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>         at
>>
>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>>         at
>>
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>>         at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at
>>
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>         at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at
>>
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>
