Re: Add column sum as new column in PySpark dataframe

2016-08-05 Thread Nicholas Chammas
I think this is what you need:

    import pyspark.sql.functions as sqlf

    df.withColumn('total', sqlf.sum(df.columns))

Nic
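One caveat with the snippet above: in current PySpark, pyspark.sql.functions.sum is an aggregate (per-column) function, so a per-row total is usually built by chaining + across the columns instead. A minimal sketch of that pattern, using a made-up dataframe with only numeric columns:

    from functools import reduce
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Made-up example dataframe; all columns are numeric.
    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a", "b", "c"])

    # Build col("a") + col("b") + col("c") programmatically from df.columns.
    row_total = reduce(lambda x, y: x + y, [col(c) for c in df.columns])
    df.withColumn("total", row_total).show()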

Re: Add column sum as new column in PySpark dataframe

2016-08-04 Thread Mike Metzger
This is a little ugly, but it may do what you're after:

    from pyspark.sql.functions import expr

    df.withColumn('total', expr("+".join([col for col in df.columns])))

I believe this will handle null values ok, but it will likely error if there are any string columns present.

Mike
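For reference, a self-contained sketch of the same expr-based idea, with the import it relies on and coalesce() wrapped around each column in case null cells should count as 0 rather than nulling out the whole total. The dataframe here is made up and this variant is not from the thread:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.getOrCreate()

    # Made-up dataframe; column "b" has a null in the second row.
    df = spark.createDataFrame([(1, 2, 3), (4, None, 6)], ["a", "b", "c"])

    # Builds the SQL string "coalesce(a, 0) + coalesce(b, 0) + coalesce(c, 0)"
    # so null cells count as 0 instead of turning the whole total into null.
    sum_expr = " + ".join("coalesce({0}, 0)".format(c) for c in df.columns)
    df.withColumn("total", expr(sum_expr)).show()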

Re: Add column sum as new column in PySpark dataframe

2016-08-04 Thread Mich Talebzadeh
Sorry, do you want the sum for each row or the sum for each column, assuming all the columns are numeric?

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
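To illustrate the distinction being asked about, a small made-up example contrasting a per-row total with a per-column aggregate:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 10), (2, 20)], ["a", "b"])

    # Sum for each row: adds a new column computed across that row's values.
    df.withColumn("row_total", F.col("a") + F.col("b")).show()

    # Sum for each column: collapses the dataframe to a single aggregated row.
    df.select(F.sum("a").alias("sum_a"), F.sum("b").alias("sum_b")).show()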

Add column sum as new column in PySpark dataframe

2016-08-04 Thread Javier Rey
Hi everybody,

Sorry, the last message I sent was incomplete; this is the complete version:

I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a column that is the sum of all the other columns. Suppose my dataframe had columns "a", "b", and "c". I know I can do this:
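The message is cut off after "I know I can do this:", but the explicit per-column version it presumably refers to looks something like the following sketch; the withColumn line is an assumption, not quoted from the thread:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ["a", "b", "c"])

    # Assumed continuation of the question: an explicit sum over named columns,
    # which works but does not scale to many columns.
    df.withColumn("total", df.a + df.b + df.c).show()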

Add column sum as new column in PySpark dataframe

2016-08-04 Thread Javier Rey
I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a column that is the sum of all the other columns. Suppose my dataframe had columns "a", "b", and "c". I know I can do this: