I think this is what you need:

import pyspark.sql.functions as sqlf
from functools import reduce
from operator import add

df.withColumn('total', reduce(add, [df[c] for c in df.columns]))

About the error: '_get_object_id' usually shows up when the name sum no
longer refers to Python's built-in but to pyspark.sql.functions.sum
(for example after "from pyspark.sql.functions import *"). That sum is
an aggregate function, so it passes your generator straight through to
py4j, which then fails. Using functools.reduce avoids the ambiguity.

To sum only non-null values, wrap each column in coalesce() so NULLs
count as 0 instead of nulling out the whole row:

df.withColumn('total', reduce(add, [sqlf.coalesce(df[c], sqlf.lit(0)) for c in df.columns]))

Nic

On Thu, Aug 4, 2016 at 9:41 AM Javier Rey jre...@gmail.com wrote:

> Hi everybody,
>
> Sorry, my last message was incomplete; here it is in full:
>
> I'm using PySpark and I have a Spark dataframe with a bunch of numeric
> columns. I want to add a column that is the sum of all the other columns.
>
> Suppose my dataframe had columns "a", "b", and "c". I know I can do this:
>
> df.withColumn('total_col', df.a + df.b + df.c)
>
> The problem is that I don't want to type out each column individually and
> add them, especially if I have a lot of columns. I want to be able to do
> this automatically or by specifying a list of column names that I want to
> add. Is there another way to do this?
>
> I found this solution:
>
> df.withColumn('total', sum(df[col] for col in df.columns))
>
> But I get this error:
>
> "AttributeError: 'generator' object has no attribute '_get_object_id'"
>
> Additionally, I want to sum only non-null values.
>
> Thanks in advance,
>
> Samir