Hi All,

PFB the sample code:

val df = spark.read.parquet(....)
df.createOrReplaceTempView("df")  // registerTempTable is deprecated since Spark 2.0
val zip = df.select("zip_code").distinct().as[String].rdd

def comp(zipcode: String): Unit = {
  // build the query with string interpolation instead of replace()
  val zipval = s"SELECT * FROM df WHERE zip_code='$zipcode'"
  val data = spark.sql(zipval) // throws NullPointerException when called inside an RDD transformation
  data.write.parquet(......)
}

val sam = zip.map(x => comp(x))
sam.count

But when I instead do val zip =
df.select("zip_code").distinct().as[String].rdd.collect and call the
function on the collected array, the data does get computed, but in
sequential order.
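For reference, a minimal sketch of that working sequential variant (same df temp view as above; the parquet paths are elided and a live SparkSession is assumed):

```scala
// Working but sequential: collect the distinct zip codes to the driver,
// then loop over them there, so spark.sql only ever runs on the driver.
val zips = df.select("zip_code").distinct().as[String].rdd.collect()

zips.foreach { zipcode =>
  val query = s"SELECT * FROM df WHERE zip_code='$zipcode'"
  spark.sql(query).write.parquet(......) // one output per zip code, written one at a time
}
```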

I would like to know why I get the null pointer exception when running
map over the RDD, and whether there is a way to compute the comp function
for each zip code in parallel, i.e. process multiple zip codes at the
same time.

Any clue or inputs are appreciated.

Regards.
