Hi Alexander,

Many thanks. I think the key was that I needed to import the max function. It turns out you do not need to use col:

df.select(max("foo")).show()
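For example, both forms behave the same (a minimal sketch reusing the df/foo names from your mail; as far as I can tell, max() accepts either a column name string or a Column expression):

from pyspark.sql.functions import col, max

# passing the column name directly
df.select(max("foo")).show()
# equivalent, wrapping the name in col()
df.select(max(col("foo"))).show()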
To get the actual value of max, you still need to write more code than I would expect. I wonder if there is an easier way to work with Rows?

In [19]: from pyspark.sql.functions import max
    ...: maxRow = idDF2.select(max("col[id]")).collect()
    ...: # renamed from 'max': that assignment would shadow the imported function
    ...: maxVal = maxRow[0].asDict()['max(col[id])']
    ...: maxVal
Out[19]: 713912692155621376

From: Alexander Krasnukhin <the.malk...@gmail.com>
Date: Monday, March 28, 2016 at 5:55 PM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: looking for an easy way to find the max value of a column in a data frame

> e.g. select max value for column "foo":
>
> from pyspark.sql.functions import max, col
> df.select(max(col("foo"))).show()
>
> On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
>> I am using pyspark 1.6.1 and python3.
>>
>> Given:
>>
>> idDF2 = idDF.select(idDF.id, idDF.col.id)
>> idDF2.printSchema()
>> idDF2.show()
>>
>> root
>>  |-- id: string (nullable = true)
>>  |-- col[id]: long (nullable = true)
>>
>> +----------+----------+
>> |        id|   col[id]|
>> +----------+----------+
>> |1008930924| 534494917|
>> |1008930924| 442237496|
>> |1008930924|  98069752|
>> |1008930924|2790311425|
>> |1008930924|3300869821|
>> +----------+----------+
>>
>> I have to do a lot of work to get the max value:
>>
>> rows = idDF2.select("col[id]").describe().collect()
>> hack = [s for s in rows if s.summary == 'max']
>> print(hack)
>> print(hack[0].summary)
>> print(type(hack[0]))
>> print(hack[0].asDict()['col[id]'])
>> maxStr = hack[0].asDict()['col[id]']
>> ttt = int(maxStr)
>> numDimensions = 1 + ttt
>> print(numDimensions)
>>
>> Is there an easier way?
>>
>> Kind regards
>>
>> Andy
>
> --
> Regards,
> Alexander
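On the Rows question: a Row is a tuple subclass, so the asDict() step is not strictly needed; positional or name-based indexing works directly. A minimal sketch against the same idDF2 frame (the maxId alias and the max_ import rename are my own additions, to give the aggregate column a predictable name and to avoid shadowing the builtin max):

from pyspark.sql.functions import max as max_

# aggregate to a single-row frame and pull the scalar out of the first Row
row = idDF2.agg(max_("col[id]").alias("maxId")).first()
maxId = row["maxId"]  # name-based access
maxId = row[0]        # positional access also works, since Row extends tuple

first() only brings back the single aggregate row, so there is no need to collect() the whole result.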