Please see this test of mine:
https://blog.cloudcache.net/computing-performance-comparison-for-words-statistics/

Don’t use Python RDDs; use DataFrames instead.

Regards

On Fri, Feb 4, 2022 at 5:02 PM Hinko Kocevar <hinko.koce...@ess.eu.invalid>
wrote:

> I'm looking into using the Python interface with Spark and came across this
> chart [1] showing a performance hit when going with Python RDDs. The data is
> about 7 years old and for an older version of Spark. Is this still the case
> with more recent Spark releases?
>
> I'm trying to understand what to expect from Python and Spark and under
> what conditions.
>
> [1]
> https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
>
> Thanks,
> //hinko
