Hi,

val df = spark.read.parquet(....)
df.registerTempTable("df")
import spark.implicits._  // needed for the .as[String] encoder
val zip = df.select("zip_code").distinct().as[String].rdd


def comp(zipcode: String): Unit = {
  val zipval = s"SELECT * FROM df WHERE zip_code='$zipcode'"
  val data = spark.sql(zipval)
  data.write.parquet(......)
}

val sam = zip.map(x => comp(x))  // the whole idea is to run the comp method
sam.count                        // in parallel for multiple zip codes on the cluster

But because I have to collect() and then apply the map method on the driver,
I end up calling comp for a single zip code at a time, executing comp for
each zip code sequentially.
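
For reference, a minimal sketch of that sequential fallback (same comp as
above, nothing new assumed):

// driver-side loop: pull the zip codes to the driver, run comp one at a time
zip.collect().foreach(zipcode => comp(zipcode))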

Regards.

On Tue, Dec 20, 2016 at 5:46 PM, Liang-Chi Hsieh <vii...@gmail.com> wrote:

>
> Hi,
>
> You can't invoke RDD actions/transformations inside another
> transformation. They must be invoked by the driver.
>
> If I understand your purpose correctly, you can partition your data (i.e.,
> `partitionBy`) when writing out to parquet files.
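>
> A minimal sketch of that approach (the output path is a placeholder; the
> column name is taken from your snippet):
>
>   // one distributed write; Spark creates a sub-directory per zip_code value
>   df.write
>     .partitionBy("zip_code")
>     .parquet("/output/path")  // placeholder path
>
> Each zip_code=... directory can then be read back independently, and the
> whole write runs as a single parallel job instead of one query per zip code.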
>
>
>
> -----
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/
