Hi,

While using a reference inside a JdbcRDD, it throws a serialization exception.
Does JdbcRDD not accept references from other parts of the code?
             val confMap = ConfFactory.getConf(ParquetStreaming)

              val jdbcRDD = new JdbcRDD(sc, () => {
                Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
                DriverManager.getConnection(confMap(PHOENIX_URL)) // throws the exception below
                // DriverManager.getConnection(ConfFactory.getConf(ParquetStreaming)(PHOENIX_URL)) // this works
              },
                s"SELECT tenant_id, data_source_id, mne_id, device_type1_key " +
                s" FROM XYZ_TYPE1_TEST WHERE DEVICE_TYPE1_KEY >= ? AND DEVICE_TYPE1_KEY <= ? AND TENANT_ID IN ($tenantIds) " +
                s" AND DATA_SOURCE_ID IN ($dataSourceIds) AND ISDELETED = false",
                minKey, maxKey, 10,
                row => DeviceDel(row.getString(1), row.getString(2), row.getLong(3), row.getLong(4))
              ).cache()

At runtime it fails with the exception below. However, hard-coding the URL,
DriverManager.getConnection("jdbc:phoenix:10.20.87.1:2181"), works fine.
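
For reference, the variant I am considering (a sketch, not tested: phoenixUrl is
a hypothetical local val, and I am assuming confMap is a plain
Map[String, String]) copies the URL into a local String on the driver, so the
() => Connection closure captures only that String instead of the enclosing
scope that holds sc:

              // Hypothetical workaround: capture only a serializable String,
              // not the outer scope that holds the SparkContext.
              val phoenixUrl = confMap(PHOENIX_URL) // plain String, serializable
              val jdbcRDD = new JdbcRDD(sc, () => {
                Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
                DriverManager.getConnection(phoenixUrl) // no reference back to confMap or sc
              },
                s"SELECT tenant_id, data_source_id, mne_id, device_type1_key " +
                s" FROM XYZ_TYPE1_TEST WHERE DEVICE_TYPE1_KEY >= ? AND DEVICE_TYPE1_KEY <= ? AND TENANT_ID IN ($tenantIds) " +
                s" AND DATA_SOURCE_ID IN ($dataSourceIds) AND ISDELETED = false",
                minKey, maxKey, 10,
                row => DeviceDel(row.getString(1), row.getString(2), row.getLong(3), row.getLong(4))
              ).cache()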

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
                - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@5bb273b4)
                - field (class: advance_reporting.transformations.DeviceDelETL$$anonfun$main$1, name: sc$1, type: class org.apache.spark.SparkContext)
                - object (class advance_reporting.transformations.DeviceDelETL$$anonfun$main$1, <function1>)
                - field (class: advance_reporting.transformations.DeviceDelETL$$anonfun$main$1$$anonfun$6, name: $outer, type: class $$anonfun$main$1)
                - object (class advance_reporting.transformations.DeviceDelETL$$anonfun$main$1$$anonfun$6, <function0>)
                - field (class: org.apache.spark.rdd.JdbcRDD, name: org$apache$spark$rdd$JdbcRDD$$getConnection, type: interface scala.Function0)
                - object (class org.apache.spark.rdd.JdbcRDD, JdbcRDD[15] at JdbcRDD at DeviceDelETL.scala:91)
                - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
                - object (class scala.Tuple2, (JdbcRDD[15] at JdbcRDD at DeviceDelETL.scala:91,<function2>))
                at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
                at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
                at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
                at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
                at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
                at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
                at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:878)
                at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
                at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
                at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1426)
                at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
                at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Sep 18, 2015 12:22:59 PM INFO: parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5

From the serialization stack it looks like the connection closure is capturing its enclosing $anonfun$main$1, which holds the SparkContext (field sc$1). Any idea why referencing confMap pulls that in, while the inline ConfFactory call does not?
