Re: Best practice for preprocessing feature with DataFrame

2016-11-22 Thread Yan Facai
Thanks, White.

On Thu, Nov 17, 2016 at 11:15 PM, Stuart White wrote:
> Sorry. Small typo. That last part should be:
>
> val modifiedRows = rows
> .select(
> substring('age, 0, 2) as "age",
> when('gender === 1, "male").otherwise(when('gender === 2,
>

Re: Best practice for preprocessing feature with DataFrame

2016-11-17 Thread Stuart White
Sorry. Small typo. That last part should be:

val modifiedRows = rows
  .select(
    substring('age, 0, 2) as "age",
    when('gender === 1, "male").otherwise(when('gender === 2, "female").otherwise("unknown")) as "gender"
  )
modifiedRows.show
+---+-------+
|age| gender|
+---+-------+
| 90|
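The nested otherwise(when(...)) above can also be written as one flat chain, because Column itself has a when method. The sketch below is a minimal, self-contained version of the corrected snippet under a few assumptions: Spark 2.0 or later (SparkSession), spark-sql on the classpath or pasted into spark-shell, and col("...") used instead of the 'age symbol syntax. The app name is arbitrary. Spark's substring treats position 0 the same as 1, so the sketch uses the documented 1-based form.

```scala
// Assumes Spark 2.0+ with spark-sql available; in spark-shell the session already exists.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring, when}

val spark = SparkSession.builder().master("local[*]").appName("gender-mapping").getOrCreate()
import spark.implicits._

// Same sample data as in the thread.
val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")

val modifiedRows = rows.select(
  // substring positions are 1-based: keep the first two characters ("90s" -> "90").
  substring(col("age"), 1, 2).as("age"),
  // One flat CASE WHEN chain instead of nesting otherwise(when(...)).
  when(col("gender") === 1, "male")
    .when(col("gender") === 2, "female")
    .otherwise("unknown")
    .as("gender")
)

modifiedRows.show()

// Collect to the driver so the result can be inspected (fine for a tiny sample).
val result = modifiedRows.as[(String, String)].collect().toSeq
```

Both forms compile to the same CASE WHEN expression; the flat chain just avoids the growing nesting when more gender codes are added.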

Re: Best practice for preprocessing feature with DataFrame

2016-11-17 Thread Stuart White
import org.apache.spark.sql.functions._

val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")
rows.show
+---+------+
|age|gender|
+---+------+
|90s|     1|
|80s|     2|
|80s|     3|
+---+------+

val modifiedRows
  .select(
    substring('age, 0, 2) as "age",
    when('gender
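The snippet above refers to columns as 'age and 'gender. Outside spark-shell, that syntax and toDF only work after importing the session's implicits, and symbol literals such as 'age are deprecated as of Scala 2.13. The following sketch, assuming Spark 2.0+ and a local SparkSession with an arbitrary app name, shows three equivalent ways to reference the same column.

```scala
// Assumes Spark 2.0+ with spark-sql available.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("column-refs").getOrCreate()
// Provides toDF, the $"..." interpolator and the Symbol -> Column conversion behind 'age.
import spark.implicits._

val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")

// Three equivalent references to the "age" column.
val viaSymbol = rows.select(Symbol("age")) // what 'age desugars to
val viaDollar = rows.select($"age")
val viaCol    = rows.select(col("age"))    // needs only functions.col, no implicits

// Collect each result so they can be compared.
val ages = Seq(viaSymbol, viaDollar, viaCol).map(_.as[String].collect().toSeq)
```

col("...") is the most portable choice, since it does not depend on the implicits import or on symbol literal support in the Scala version.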

Re: Best practice for preprocessing feature with DataFrame

2016-11-17 Thread Yan Facai
Could you give me an example of how to use the Column functions? Thanks very much.

On Thu, Nov 17, 2016 at 12:23 PM, Divya Gehlot wrote:
> Hi,
>
> You can use the Column functions provided by Spark API
>
> https://spark.apache.org/docs/1.6.2/api/java/org/apache/
>

Re: Best practice for preprocessing feature with DataFrame

2016-11-16 Thread Divya Gehlot
Hi,

You can use the Column functions provided by the Spark API:
https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html

Hope this helps.

Thanks,
Divya

On 17 November 2016 at 12:08, 颜发才(Yan Facai) wrote:
> Hi,
> I have a sample, like:
>
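As one more illustration of the built-in functions in org.apache.spark.sql.functions, the sketch below uses regexp_extract to pull the leading digits out of values like "90s" and cast them to an integer, rather than assuming the decade is exactly two characters. It assumes Spark 2.0+; the column name age_decade is hypothetical, chosen for this example.

```scala
// Assumes Spark 2.0+ with spark-sql available.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_extract}

val spark = SparkSession.builder().master("local[*]").appName("age-decade").getOrCreate()
import spark.implicits._

val rows = Seq(("90s", 1), ("80s", 2), ("80s", 3)).toDF("age", "gender")

// Capture group 1 holds the leading digits ("90s" -> "90"); cast turns it into an int.
// "age_decade" is a hypothetical column name for this sketch.
val withDecade = rows.withColumn(
  "age_decade",
  regexp_extract(col("age"), "^(\\d+)", 1).cast("int")
)

withDecade.show()

val decades = withDecade.select("age_decade").as[Int].collect().toSeq
```

Because these are built-in expressions rather than a UDF, Spark's optimizer can see and optimize them.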

Best practice for preprocessing feature with DataFrame

2016-11-16 Thread Yan Facai
Hi,

I have a sample, like:

+---+------+--------------------+
|age|gender|             city_id|
+---+------+--------------------+
|   |     1|1042015:city_2044...|
|90s|     2|1042015:city_2035...|
|80s|     2|1042015:city_2061...|
+---+------+--------------------+

and the expectation is:
"age": 90s
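The expected output above is cut off, so the following is only a hedged sketch of two steps the sample seems to call for. It treats the blank age in the first row as an explicit "unknown" category and splits city_id on ':'. Both are assumptions about the intended preprocessing. The city_id values are hypothetical stand-ins built from the visible prefixes, because show() truncates the real values. The sketch assumes Spark 2.0+, and the output column name city is also hypothetical.

```scala
// Assumes Spark 2.0+ with spark-sql available.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split, trim, when}

val spark = SparkSession.builder().master("local[*]").appName("sample-cleanup").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins: the real city_id values are truncated in the sample.
val sample = Seq(
  ("", 1, "1042015:city_2044"),
  ("90s", 2, "1042015:city_2035"),
  ("80s", 2, "1042015:city_2061")
).toDF("age", "gender", "city_id")

val cleaned = sample.select(
  // Assumption: a blank age should become its own "unknown" category.
  when(trim(col("age")) === "", "unknown").otherwise(col("age")).as("age"),
  col("gender"),
  // Split "prefix:city_xxx" on ':' and keep the part after the colon.
  split(col("city_id"), ":").getItem(1).as("city")
)

cleaned.show()

val out = cleaned.as[(String, Int, String)].collect().toSeq
```

If the blank age should instead be treated as missing, replace the "unknown" literal with lit(null) so that later stages such as an imputer can handle it.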