Re: looking for an easy way to find the max value of a column in a data frame

2016-03-29 Thread Andy Davidson
Nice

Re: looking for an easy way to find the max value of a column in a data frame

2016-03-29 Thread Alexander Krasnukhin
You can even use the fact that pyspark has dynamic properties:

  from pyspark.sql.functions import max

  rows = idDF2.select(max("col[id]").alias("max")).collect()
  firstRow = rows[0]
  maxValue = firstRow.max  # the aliased column is exposed as a Row attribute
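For reference, the same trick collapses to a single expression; a minimal sketch, assuming the idDF2 frame from the original question:

  from pyspark.sql.functions import max

  # collect() returns a list of Rows; the alias "m" is exposed as an attribute
  maxValue = idDF2.select(max("col[id]").alias("m")).collect()[0].m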

Re: looking for an easy way to find the max value of a column in a data frame

2016-03-29 Thread Alexander Krasnukhin
You should be able to index columns directly, either by index or by column name, i.e.:

  from pyspark.sql.functions import max

  rows = idDF2.select(max("col[id]")).collect()
  firstRow = rows[0]

  # by index
  maxValue = firstRow[0]

  # by column name (Spark names the column after the expression)
  maxValue = firstRow["max(col[id])"]
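Rows behave the same way outside a query; a minimal standalone sketch (the field names here are made up for illustration):

  from pyspark.sql import Row

  row = Row(id="1008930924", total=534494917)
  print(row[1])        # by position -> 534494917
  print(row["total"])  # by field name -> 534494917
  print(row.total)     # as a dynamic attribute -> 534494917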

Re: looking for an easy way to find the max value of a column in a data frame

2016-03-29 Thread Andy Davidson
Hi Alexander

Many thanks. I think the key was that I needed to import the max function. It turns out you do not need to use col:

  df.select(max("foo")).show()

To get the actual value of the max you still need to write more code than I would expect. I wonder if there is an easier way to work with Rows?
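One alternative not shown in the thread is to fold the collect-and-index steps into a single expression; a minimal sketch, assuming the same idDF2 frame:

  from pyspark.sql import functions as F

  # first() returns the first Row; [0] pulls out the single aggregate value
  maxValue = idDF2.agg(F.max("col[id]")).first()[0]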

Re: looking for an easy way to find the max value of a column in a data frame

2016-03-28 Thread Alexander Krasnukhin
e.g. select the max value for column "foo":

  from pyspark.sql.functions import max, col

  df.select(max(col("foo"))).show()
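The same aggregate can also be written with the dictionary form of agg(); a minimal sketch using the df and "foo" from the example above:

  # maps a column name to the name of a built-in aggregate function
  df.agg({"foo": "max"}).show()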

looking for an easy way to find the max value of a column in a data frame

2016-03-28 Thread Andy Davidson
I am using pyspark 1.6.1 and python3.

Given:

  idDF2 = idDF.select(idDF.id, idDF.col.id)
  idDF2.printSchema()
  idDF2.show()

  root
   |-- id: string (nullable = true)
   |-- col[id]: long (nullable = true)

  +----------+---------+
  |        id|  col[id]|
  +----------+---------+
  |1008930924|534494917|
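For anyone who wants to reproduce the setup, a minimal sketch of a frame with the same shape (the literal values are taken from the output above; sqlContext is the standard pyspark 1.6 entry point):

  # two-column frame matching the schema in the question
  idDF2 = sqlContext.createDataFrame(
      [("1008930924", 534494917)],
      ["id", "col[id]"])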