Re: UDF on multiple columns

Meeraj Kunnumpurath Wed, 12 Oct 2016 11:02:44 -0700

This is what I do at the moment:

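import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

def build(path: String, spark: SparkSession) = {
  import spark.implicits._ // enables the 'column symbol syntax used below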
  val toDouble = udf((x: String) => x.toDouble)
  val df = spark.read.
    option("header", "true").
    csv(path).
    withColumn("sqft_living", toDouble('sqft_living)).
    withColumn("price", toDouble('price)).
    withColumn("bedrooms", toDouble('bedrooms)).
    withColumn("bathrooms", toDouble('bathrooms)).
    withColumn("lat", toDouble('lat)).
    withColumn("long", toDouble('long))
  df.createOrReplaceTempView("sales")
  spark.sql("select bedrooms * bedrooms, bedrooms * bathrooms, lat + long, " +
    "log(sqft_living), price from sales")
}
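
For the original question, a UDF over two columns is declared like the
one-column toDouble above, just with a two-argument function. A minimal
sketch, assuming Spark 2.x and a local SparkSession; the names TwoColumnUdf,
withProduct and bed_bath, and the sample rows, are made up for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object TwoColumnUdf {
  // UDF taking two Double columns and returning their product.
  val product = udf((a: Double, b: Double) => a * b)

  // Hypothetical helper: appends a "bed_bath" column computed by the UDF.
  def withProduct(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", product(col("bedrooms"), col("bathrooms")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("two-column-udf")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data; column names follow the code earlier in the thread.
    val df = Seq((3.0, 2.0), (4.0, 1.5)).toDF("bedrooms", "bathrooms")

    withProduct(df).show()
    spark.stop()
  }
}
```

For something as simple as a product, the plain column expression
col("bedrooms") * col("bathrooms") should give the same values without a UDF,
and built-in expressions are generally easier for Spark to optimize.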



On Wed, Oct 12, 2016 at 9:56 PM, Meeraj Kunnumpurath <
mee...@servicesymphony.com> wrote:

> Hello,
>
> How do I write a UDF that operates on two columns? For example, how do I
> introduce a new column that is the product of two columns already in the
> dataframe?
>
> Many thanks
> Meeraj
>



-- 
*Meeraj Kunnumpurath*


*Director and Executive Principal*
*Service Symphony Ltd*
*00 44 7702 693597*
*00 971 50 409 0169*
*mee...@servicesymphony.com*
