Hello

May I know from which version of Spark the RDD syntax can be shortened like this?

>>> rdd.groupByKey().mapValues(lambda x: len(x)).collect()
[('b', 2), ('d', 1), ('a', 2)]
>>> rdd.groupByKey().mapValues(len).collect()
[('b', 2), ('d', 1), ('a', 2)]
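
For reference, here is a minimal self-contained sketch of what I ran. The input pairs are only my assumption, chosen so that the grouped counts match the output above:

    from pyspark import SparkContext

    sc = SparkContext("local", "mapvalues-len-example")

    # Hypothetical input pairs; two 'a's, two 'b's, one 'd' to match the counts shown.
    rdd = sc.parallelize([("a", 1), ("a", 2), ("b", 3), ("b", 4), ("d", 5)])

    # Explicit lambda form: count the values grouped under each key.
    print(rdd.groupByKey().mapValues(lambda x: len(x)).collect())

    # Shorter form: pass the built-in len directly as the callable.
    print(rdd.groupByKey().mapValues(len).collect())

    sc.stop()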

I know that in Scala the syntax xxx(x => x.len) can be written as xxx(_.len).
But I never knew that in PySpark even the placeholder can be dropped altogether.
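
To illustrate what I mean by dropping the placeholder, here is a plain-Python comparison (the list of words is just an example I made up), with no Spark involved:

    words = ["spark", "rdd", "mapValues"]

    # Explicit lambda form, with x as the placeholder.
    print(list(map(lambda x: len(x), words)))   # [5, 3, 9]

    # Passing the function object directly, with no placeholder at all.
    print(list(map(len, words)))                # [5, 3, 9]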

Thank you.
