This is what I do at the moment:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    def build(path: String, spark: SparkSession) = {
      // Needed for the 'column symbol syntax used below
      import spark.implicits._

      // The CSV reader returns every column as a string, so convert to double
      val toDouble = udf((x: String) => x.toDouble)

      val df = spark.read
        .option("header", "true")
        .csv(path)
        .withColumn("sqft_living", toDouble('sqft_living))
        .withColumn("price", toDouble('price))
        .withColumn("bedrooms", toDouble('bedrooms))
        .withColumn("bathrooms", toDouble('bathrooms))
        .withColumn("lat", toDouble('lat))
        .withColumn("long", toDouble('long))

      // Derive the combined columns through SQL instead of a UDF
      df.createOrReplaceTempView("sales")
      spark.sql("select bedrooms * bedrooms, bedrooms * bathrooms, lat + long, log(sqft_living), price from sales")
    }
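For comparison, here is a minimal sketch of a UDF that takes two columns, which is what the question below asks about. It assumes the columns have already been converted to double, as in the code above. The object name `TwoColumnUdf` and the output column name `bed_bath` are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object TwoColumnUdf {
  // Plain Scala function of two arguments, kept separate so it can be
  // checked without a running SparkSession.
  val product: (Double, Double) => Double = (a, b) => a * b

  // Wrapping a two-argument function with udf gives a UDF that accepts two columns.
  val productUdf = udf(product)

  // Adds a hypothetical "bed_bath" column computed by the UDF.
  // Assumes "bedrooms" and "bathrooms" are already double columns.
  def withProduct(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", productUdf(col("bedrooms"), col("bathrooms")))

  // For a plain product the built-in Column operator does the same job
  // without a UDF.
  def withProductBuiltin(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", col("bedrooms") * col("bathrooms"))
}
```

With `df` being the converted dataframe from `build`, `TwoColumnUdf.withProduct(df)` would return it with the extra column. For simple arithmetic like this the built-in operator version is probably the better choice, since Spark can optimise column expressions but treats a UDF as a black box.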
On Wed, Oct 12, 2016 at 9:56 PM, Meeraj Kunnumpurath <mee...@servicesymphony.com> wrote:

> Hello,
>
> How do I write a UDF that operates on two columns? For example, how do I
> introduce a new column that is the product of two columns already on the
> dataframe?
>
> Many thanks
> Meeraj
>

--
*Meeraj Kunnumpurath*
*Director and Executive Principal*
*Service Symphony Ltd*
*00 44 7702 693597*
*00 971 50 409 0169*
*mee...@servicesymphony.com <mee...@servicesymphony.com>*