Sorry. Small typo. That last part should be:

val modifiedRows = rows
  .select(
    substring('age, 0, 2) as "age",
    when('gender === 1, "male").otherwise(when('gender === 2, "female").otherwise("unknown")) as "gender"
  )
modifiedRows.show
+---+-------+
|age| gender|
+---+-------+
| 90|   male|
| 80| female|
| 80|unknown|
+---+-------+

On Thu, Nov 17, 2016 at 8:57 AM, Stuart White <stuart.whi...@gmail.com> wrote:
> import org.apache.spark.sql.functions._
>
> val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")
> rows.show
>
> +---+------+
> |age|gender|
> +---+------+
> |90s|     1|
> |80s|     2|
> |80s|     3|
> +---+------+
>
> val modifiedRows
>   .select(
>     substring('age, 0, 2) as "age",
>     when('gender === 1, "male").otherwise(when('gender === 2, "female").otherwise("unknown")) as "gender"
>   )
> modifiedRows.show
>
> +---+-------+
> |age| gender|
> +---+-------+
> | 90|   male|
> | 80| female|
> | 80|unknown|
> +---+-------+
>
> On Thu, Nov 17, 2016 at 3:37 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
>> Could you give me an example of how to use the Column functions?
>> Thanks very much.
>>
>> On Thu, Nov 17, 2016 at 12:23 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> You can use the Column functions provided by the Spark API:
>>>
>>> https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html
>>>
>>> Hope this helps.
>>>
>>> Thanks,
>>> Divya
>>>
>>> On 17 November 2016 at 12:08, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I have a sample like:
>>>> +---+------+--------------------+
>>>> |age|gender|             city_id|
>>>> +---+------+--------------------+
>>>> |   |     1|1042015:city_2044...|
>>>> |90s|     2|1042015:city_2035...|
>>>> |80s|     2|1042015:city_2061...|
>>>> +---+------+--------------------+
>>>>
>>>> and the expectation is:
>>>> "age": 90s -> 90, 80s -> 80
>>>> "gender": 1 -> "male", 2 -> "female"
>>>>
>>>> I have two solutions:
>>>> 1. Handle each column separately, and then join all by index.
>>>>    val age = input.select("age").map(...)
>>>>    val gender = input.select("gender").map(...)
>>>>    val result = ...
>>>>
>>>> 2.
>>>> Write a UDF for each column, and then use them together:
>>>>    val result = input.select(ageUDF($"age"), genderUDF($"gender"))
>>>>
>>>> However, both are awkward.
>>>>
>>>> Does anyone have a better workflow?
>>>> Write some custom Transformers and use a Pipeline?
>>>>
>>>> Thanks.
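For reference, the per-column mappings discussed in the thread can be sketched as plain Scala functions and then lifted into UDFs. This is a minimal sketch, not from the thread itself: parseAge and parseGender are hypothetical helper names, and the Spark wiring is shown only in comments since it needs a SparkSession.

```scala
// Plain Scala versions of the two column transforms:
// "90s" -> "90", and gender code 1/2 -> "male"/"female", else "unknown".
def parseAge(age: String): String =
  age.stripSuffix("s")

def parseGender(code: Int): String = code match {
  case 1 => "male"
  case 2 => "female"
  case _ => "unknown"
}

// In Spark these could be wrapped as UDFs (hypothetical usage, assuming
// an implicit SparkSession and its implicits are in scope):
//   import org.apache.spark.sql.functions.udf
//   val ageUDF    = udf(parseAge _)
//   val genderUDF = udf(parseGender _)
//   val result = input.select(ageUDF($"age") as "age",
//                             genderUDF($"gender") as "gender")
```

Keeping the mapping logic in ordinary functions makes it unit-testable without a Spark cluster; the built-in Column functions (substring, when/otherwise) shown earlier in the thread avoid UDF serialization overhead, so they are usually preferable when they can express the transform.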