Hi,

val df = spark.read.parquet(....)
df.registerTempTable("df")
val zip = df.select("zip_code").distinct().as[String].rdd
def comp(zipcode: String): Unit = {
  val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode)
  val data = spark.sql(zipval)
  data.write.parquet(......)
}

val sam = zip.map(x => comp(x))
sam.count

The whole idea is to run the comp method in parallel for multiple zip codes on the cluster. But since I have to collect() the zip codes and apply the map on the driver instead, I end up calling comp for a single zip code at a time, executing comp sequentially for each zip code.

Regards.

On Tue, Dec 20, 2016 at 5:46 PM, Liang-Chi Hsieh <vii...@gmail.com> wrote:

> Hi,
>
> You can't invoke any RDD actions/transformations inside another
> transformation. They must be invoked by the driver.
>
> If I understand your purpose correctly, you can partition your data
> (i.e., `partitionBy`) when writing out to parquet files.
>
> -----
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/
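For reference, a minimal sketch of the approach Liang-Chi suggests, assuming the same `df` as in the original code; the output path and the zip code value below are placeholders. Instead of issuing one SQL query and one write per zip code, a single distributed write splits the output by zip_code:

    // One driver-launched job writes every zip code at once; Spark lays
    // the files out as /output/path/zip_code=<value>/part-*.parquet
    df.write
      .partitionBy("zip_code")
      .parquet("/output/path")   // placeholder path

    // Reading one zip code back later only scans its partition directory,
    // thanks to partition pruning ('12345' stands in for any zip code):
    val one = spark.read.parquet("/output/path")
      .where("zip_code = '12345'")

This keeps all the work inside jobs launched from the driver, so no RDD action runs inside a transformation, and the per-zip-code parallelism comes from Spark itself rather than from a loop over comp.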