Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN
Hello, I want to add a new String column to the dataframe based on the values of an existing column:

    from pyspark.sql.functions import lit
    df.withColumn('strReplaced', lit(df.str.replace("a", "b").replace("c", "d")))

So basically I want to add a new column named "strReplaced", that is the same as the

Re: Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread ndjido
Hi Viktor,

Try to create a UDF. It's quite simple!

Ardo.

> On 10 Feb 2016, at 10:34, Viktor ARDELEAN wrote:
>
> Hello,
>
> I want to add a new String column to the dataframe based on an existing
> column values:
>
> from pyspark.sql.functions import lit
>

Re: Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN
I figured it out. Here is how it's done:

    from pyspark.sql.functions import udf
    replaceFunction = udf(lambda columnValue: columnValue.replace("\n", " ").replace('\r', " "))
    df.withColumn('strReplaced', replaceFunction(df["str"]))

On 10 February 2016 at 13:04, wrote:
> Hi
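
For readers following the thread, here is a slightly more defensive sketch of the same UDF approach. It assumes the column is named "str" as in the messages above. The helper names (replace_newlines, add_str_replaced, add_str_replaced_builtin) are made up for illustration and do not come from the thread. Two things it tries to show: a Python UDF receives None for null cells, so the function guards against that, and withColumn returns a new DataFrame rather than changing df in place. The last function is an alternative, not the poster's method: the built-in regexp_replace can do the same substitution without a Python UDF.

```python
def replace_newlines(value):
    """Replace newline and carriage-return characters with spaces.

    Null cells arrive in a Python UDF as None, so pass them through
    instead of calling .replace() on them.
    """
    if value is None:
        return None
    return value.replace("\n", " ").replace("\r", " ")


def add_str_replaced(df):
    """Return a new DataFrame with a 'strReplaced' column built from 'str'.

    Assumes df has a string column named 'str' (as in the thread).
    """
    # Imported inside the function so replace_newlines can be tried
    # out on plain strings without a Spark installation.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    replace_udf = udf(replace_newlines, StringType())
    # withColumn does not modify df; keep the returned DataFrame.
    return df.withColumn("strReplaced", replace_udf(df["str"]))


def add_str_replaced_builtin(df):
    """Alternative sketch: same result using the built-in regexp_replace."""
    from pyspark.sql.functions import regexp_replace

    # The character class matches either a carriage return or a newline.
    return df.withColumn("strReplaced", regexp_replace(df["str"], r"[\r\n]", " "))
```

Usage would look like df2 = add_str_replaced(df) followed by df2.show(). The plain function can be checked on its own first, for example replace_newlines("a\nb") gives "a b".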