Please see my test: https://blog.cloudcache.net/computing-performance-comparison-for-words-statistics/
Don't use Python RDDs; use DataFrames instead.

Regards

On Fri, Feb 4, 2022 at 5:02 PM Hinko Kocevar <hinko.koce...@ess.eu.invalid> wrote:

> I'm looking into using the Python interface with Spark and came across this
> chart [1] showing a performance hit when going with Python RDDs. The data is
> about 7 years old and for an older version of Spark. Is this still the case
> with more recent Spark releases?
>
> I'm trying to understand what to expect from Python and Spark, and under
> what conditions.
>
> [1]
> https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
>
> Thanks,
> //hinko