Nice
From: Alexander Krasnukhin
Date: Tuesday, March 29, 2016 at 10:42 AM
To: Andrew Davidson
Cc: "user @spark"
Subject: Re: looking for an easy way to find the max value of a column in a data frame
You can even use the fact that pyspark has dynamic properties:
from pyspark.sql.functions import max

rows = idDF2.select(max("col[id]").alias("max")).collect()
firstRow = rows[0]
# Row fields are exposed as attributes, so the alias is reachable directly
max = firstRow.max
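If you only need the single Row, first() avoids collecting a list at all;
a minimal sketch of the same query:
from pyspark.sql.functions import max
# first() returns one Row directly, so the rows[0] step goes away
max_value = idDF2.select(max("col[id]").alias("max")).first().max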
On Tue, Mar 29, 2016 at 7:14 PM, Alexander Krasnukhin wrote:
You should be able to index columns directly either by index or column name
i.e.
from pyspark.sql.functions import max
rows = idDF2.select(max("col[id]")).collect()
firstRow = rows[0]
# by index
max = firstRow[0]
# by column name
max = firstRow["max(col[id])"]
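If the auto-generated "max(col[id])" name is awkward to type, alias() gives
the output column a stable name; a small sketch (the "max_id" name is just
an example):
from pyspark.sql.functions import max
# alias() fixes the output column name, so the Row lookup no longer
# depends on Spark's generated "max(col[id])"
rows = idDF2.select(max("col[id]").alias("max_id")).collect()
max_value = rows[0]["max_id"]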
On Tue, Mar 29, 2016 at 6:58 PM, Andy Davidson wrote:
Hi Alexander,
Many thanks. I think the key was that I needed to import the max function. It
turns out you do not need to use col:
df.select(max("foo")).show()
To get the actual value of the max you still need to write more code than I
would expect. I wonder if there is an easier way to work with Rows?
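For example, pulling the scalar out currently takes something like this
(a sketch; "foo" is just a stand-in column name):
from pyspark.sql.functions import max
# collect a list of Rows, take the first Row, then index into it
rows = df.select(max("foo")).collect()
max_value = rows[0][0]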
e.g. select max value for column "foo":
from pyspark.sql.functions import max, col
df.select(max(col("foo"))).show()
On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
I am using pyspark 1.6.1 and python3.
Given:
idDF2 = idDF.select(idDF.id, idDF.col.id)
idDF2.printSchema()
idDF2.show()
root
 |-- id: string (nullable = true)
 |-- col[id]: long (nullable = true)

+----------+---------+
|        id|  col[id]|
+----------+---------+
|1008930924|534494917|
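For anyone trying this end to end, a self-contained sketch against the schema
above (local mode; the single data row is the one shown in the output, and the
app name is arbitrary):
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import max

sc = SparkContext("local", "max-example")
sqlContext = SQLContext(sc)

# one row matching the id / col[id] schema printed above
df = sqlContext.createDataFrame([("1008930924", 534494917)],
                                ["id", "col[id]"])
print(df.select(max("col[id]")).first()[0])  # 534494917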