import org.apache.spark.sql.functions._

val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")
rows.show
+---+------+
|age|gender|
+---+------+
|90s|     1|
|80s|     2|
|80s|     3|
+---+------+

val modifiedRows = rows.select(
  substring('age, 0, 2) as "age",
  when('gender === 1, "male")
    .otherwise(when('gender === 2, "female")
    .otherwise("unknown")) as "gender"
)
modifiedRows.show

+---+-------+
|age| gender|
+---+-------+
| 90|   male|
| 80| female|
| 80|unknown|
+---+-------+

On Thu, Nov 17, 2016 at 3:37 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
> Could you give me an example of how to use Column functions?
> Thanks very much.
>
> On Thu, Nov 17, 2016 at 12:23 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> You can use the Column functions provided by the Spark API:
>>
>> https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html
>>
>> Hope this helps.
>>
>> Thanks,
>> Divya
>>
>> On 17 November 2016 at 12:08, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
>>>
>>> Hi,
>>> I have a sample like:
>>> +---+------+--------------------+
>>> |age|gender|             city_id|
>>> +---+------+--------------------+
>>> |   |     1|1042015:city_2044...|
>>> |90s|     2|1042015:city_2035...|
>>> |80s|     2|1042015:city_2061...|
>>> +---+------+--------------------+
>>>
>>> and the expectation is:
>>> "age": 90s -> 90, 80s -> 80
>>> "gender": 1 -> "male", 2 -> "female"
>>>
>>> I have two solutions:
>>> 1. Handle each column separately, and then join all by index:
>>> val age = input.select("age").map(...)
>>> val gender = input.select("gender").map(...)
>>> val result = ...
>>>
>>> 2. Write a UDF for each column, and then use them together:
>>> val result = input.select(ageUDF($"age"), genderUDF($"gender"))
>>>
>>> However, both are awkward.
>>>
>>> Does anyone have a better workflow?
>>> Write some custom Transformers and use a Pipeline?
>>>
>>> Thanks.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
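For the UDF approach mentioned in the quoted thread, the per-column logic can also be kept as plain Scala functions and wrapped with udf(...). This is only a minimal sketch: the names parseAge and parseGender are illustrative (the thread's ageUDF/genderUDF bodies were never shown), and the SparkSession setup is omitted.

// Pure mapping functions, testable without a Spark cluster.
// parseAge keeps the first two characters ("90s" -> "90"),
// parseGender maps the integer code to a label.
def parseAge(age: String): String =
  if (age != null && age.length >= 2) age.substring(0, 2) else ""

def parseGender(gender: Int): String = gender match {
  case 1 => "male"
  case 2 => "female"
  case _ => "unknown"
}

// In a Spark session these could then be registered and applied, e.g.:
// import org.apache.spark.sql.functions.udf
// val ageUDF = udf(parseAge _)
// val genderUDF = udf(parseGender _)
// val result = input.select(ageUDF($"age") as "age",
//                           genderUDF($"gender") as "gender")

Keeping the logic in ordinary functions makes it easy to unit-test the transforms before wiring them into a DataFrame, which the built-in Column expressions above do not offer.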