Hi all, please find below the sample code:
    val df = spark.read.parquet(....)
    df.registerTempTable("df")

    val zip = df.select("zip_code").distinct().as[String].rdd

    def comp(zipcode: String): Unit = {
      val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode)
      val data = spark.sql(zipval)  // throwing NullPointerException when run inside the RDD map
      data.write.parquet(......)
    }

    val sam = zip.map(x => comp(x))
    sam.count

But when I instead do

    val zip = df.select("zip_code").distinct().as[String].rdd.collect

and call the function on the collected values, the data does get computed, only in sequential order.

I would like to know why I get a NullPointerException when running the map over the RDD, and whether there is a way to compute the comp function for each zip code in parallel, i.e. process multiple zip codes at the same time.

Any clues or inputs are appreciated.

Regards.
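P.S. For what it's worth, one alternative I have been considering (an untested sketch; the output path below is just a placeholder) is to skip the per-zip-code queries entirely and do a single partitioned write, letting Spark parallelise across zip codes in one job:

```scala
// Write the whole DataFrame once, partitioned by zip_code.
// Spark creates one sub-directory per distinct value,
// e.g. /output/path/zip_code=90210/ (path is hypothetical).
df.write
  .partitionBy("zip_code")
  .parquet("/output/path")
```

Would that be the idiomatic way to get one parquet output per zip code?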