Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN

Hello,

I want to add a new String column to the dataframe based on the values of
an existing column:

from pyspark.sql.functions import lit

df.withColumn('strReplaced', lit(df.str.replace("a", "b").replace("c", "d")))

Basically, I want to add a new column named "strReplaced" that is
the same as the "str" column, but with the character "a" replaced by
"b" and "c" replaced by "d".

When I try the code above I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Column' object has no attribute 'replace'


So what I need is some way to get at the string values of the column
df.str in order to call replace on them.

Any ideas how to do this?
-- 
Viktor ARDELEAN

*P*   Don't print this email, unless it's really necessary. Take care of
the environment.

Re: Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread ndjido

Hi Viktor,

Try to create a UDF. It's quite simple!

Ardo.
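
To make the UDF suggestion concrete, here is a minimal sketch for the a/b and c/d replacement from the question. It is not code from the thread: the names replace_chars and add_str_replaced are made up for illustration, and the None check is an added assumption about how NULL values should be treated. A Column is an expression that Spark evaluates, not a Python string, so the string method has to be wrapped in a function that Spark applies to each row's value.

```python
def replace_chars(value):
    """Replace 'a' with 'b' and 'c' with 'd'; pass None (SQL NULL) through."""
    if value is None:
        return None
    return value.replace("a", "b").replace("c", "d")


def add_str_replaced(df):
    """Return a new DataFrame with a 'strReplaced' column derived from 'str'."""
    # Imported here so replace_chars can be tried out without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Wrap the plain Python function so Spark can call it on each row's value.
    replace_udf = udf(replace_chars, StringType())
    return df.withColumn("strReplaced", replace_udf(df["str"]))
```

With an existing DataFrame df that has a string column "str", this would be called as df2 = add_str_replaced(df).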


Re: Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread Viktor ARDELEAN

I figured it out.
Here is how it's done:

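from pyspark.sql.functions import udf

# Replace newlines and carriage returns with spaces.
replaceFunction = udf(lambda columnValue: columnValue.replace("\n", " ").replace("\r", " "))

# withColumn returns a new DataFrame, so keep the result.
df = df.withColumn('strReplaced', replaceFunction(df["str"]))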
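
For simple substitutions like these, a UDF may not be needed. Assuming Spark 1.5 or later, pyspark.sql.functions provides regexp_replace and translate, which do the work inside Spark instead of calling Python for every row. The sketch below is an alternative, not what the thread settled on, and all function and constant names in it are hypothetical.

```python
import re

# Character class matching a carriage return or a line feed.
NEWLINE_PATTERN = r"[\r\n]"


def preview_newline_fix(value):
    """Plain-Python preview of the substitution, handy for checking the pattern."""
    return re.sub(NEWLINE_PATTERN, " ", value)


def add_str_replaced_builtin(df):
    """Return df plus a 'strReplaced' column with newlines turned into spaces."""
    from pyspark.sql.functions import regexp_replace

    return df.withColumn("strReplaced", regexp_replace(df["str"], NEWLINE_PATTERN, " "))


def add_str_translated(df):
    """Variant for the original question: map 'a' -> 'b' and 'c' -> 'd'."""
    from pyspark.sql.functions import translate

    # translate maps each character in "ac" to the character at the same
    # position in "bd".
    return df.withColumn("strReplaced", translate(df["str"], "ac", "bd"))
```

Both Spark functions return NULL for NULL input, so no explicit None handling is needed here.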


-- 
Viktor ARDELEAN
