This is a little ugly, but it may do what you're after -

df.withColumn('total', expr("+".join([col for col in df.columns])))

I believe this will handle null values ok, but will likely error if there
are any string columns present.


Mike



On Thu, Aug 4, 2016 at 8:41 AM, Javier Rey <jre...@gmail.com> wrote:

> Hi everybody,
>
> Sorry, I sent last mesage it was imcomplete this is complete:
>
> I'm using PySpark and I have a Spark dataframe with a bunch of numeric
> columns. I want to add a column that is the sum of all the other columns.
>
> Suppose my dataframe had columns "a", "b", and "c". I know I can do this:
>
> df.withColumn('total_col', df.a + df.b + df.c)
>
> The problem is that I don't want to type out each column individually and
> add them, especially if I have a lot of columns. I want to be able to do
> this automatically or by specifying a list of column names that I want to
> add. Is there another way to do this?
>
> I find this solution:
>
> df.withColumn('total', sum(df[col] for col in df.columns))
>
> But I get this error:
>
> "AttributeError: 'generator' object has no attribute '_get_object_id"
>
> Additionally I want to sum onlt not nulls values.
>
> Thanks in advance,
>
> Samir
>

Reply via email to