Re: Question regarding Spark 3.X performance

2023-01-26 Thread Mich Talebzadeh
You have given some stats, 5-10 sec vs 60 sec. Were the set-up and systematics the same for both tests? Let us assume that with 3.3.1 we see an average time of under 10 sec versus 60 sec with the older Spark 2.x. That gives us (60 - 10) * 100 / 60 ~ 83% gain. However, that would not tell us why the 3.3.1
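The percentage above can be made explicit. The sketch below applies the same formula, (old - new) / old * 100, to the 5-10 sec range quoted for 3.3.1 against 60 sec for 2.x. It also prints the equivalent speedup factor (old / new), because "~83% less time" and "6x faster" describe the same change and are easy to confuse. The object and function names are illustrative only.

```scala
// Relative reduction in average batch time and the equivalent speedup factor.
object LatencyGain {
  /** Percentage reduction in time: (old - new) / old * 100. */
  def percentReduction(oldSec: Double, newSec: Double): Double =
    (oldSec - newSec) / oldSec * 100.0

  /** Speedup factor: how many times faster the new run is. */
  def speedup(oldSec: Double, newSec: Double): Double =
    oldSec / newSec

  def main(args: Array[String]): Unit = {
    val oldSec = 60.0 // Spark 2.x average quoted in the thread
    for (newSec <- Seq(10.0, 5.0)) { // ends of the 5-10 sec range quoted for 3.3.1
      println(f"$newSec%.0f s vs $oldSec%.0f s: " +
        f"${percentReduction(oldSec, newSec)}%.1f%% less time, " +
        f"${speedup(oldSec, newSec)}%.1fx faster")
    }
  }
}
```

At 10 sec this gives about 83.3% less time (6x faster); at 5 sec, about 91.7% less time (12x faster).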

Re: Question regarding Spark 3.X performance

2023-01-26 Thread Mich Talebzadeh
Please qualify what you mean by *extreme improvements*. What metric are you using? HTH
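One way to answer "what metric" is to report the distribution of per-batch times rather than a single average. The sketch below computes the mean, median and 95th percentile (nearest-rank method) over a list of per-batch durations. The sample values are hypothetical, not measurements from this thread. In a DStream application such durations could be collected, for example, from the Streaming tab of the Spark UI or from `BatchInfo.processingDelay` (milliseconds) in a `StreamingListener`'s `onBatchCompleted` callback.

```scala
// Summarize per-batch processing times so two Spark versions can be compared
// on more than a single average.
object BatchTimeSummary {
  /** Nearest-rank percentile (0 < p <= 100) of a non-empty sample. */
  def percentile(samples: Seq[Double], p: Double): Double = {
    require(samples.nonEmpty && p > 0 && p <= 100, "need samples and 0 < p <= 100")
    val sorted = samples.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt // 1-based nearest rank
    sorted(rank - 1)
  }

  def mean(samples: Seq[Double]): Double = samples.sum / samples.size

  def main(args: Array[String]): Unit = {
    // Hypothetical per-batch durations in seconds, NOT data from the thread.
    val batches = Seq(6.0, 7.5, 5.2, 9.8, 6.1, 30.0, 5.9, 6.4, 7.0, 5.5)
    println(f"mean=${mean(batches)}%.2f s, " +
      f"p50=${percentile(batches, 50)}%.2f s, p95=${percentile(batches, 95)}%.2f s")
  }
}
```

With these made-up values the single 30 sec outlier pulls the mean up to 8.94 sec while the median stays at 6.1 sec, which is why an average alone can overstate or hide a difference between versions.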

Question regarding Spark 3.X performance

2023-01-26 Thread Athanasios Kordelas
Hi all, I'm running some tests on Spark Streaming (DStreams, not Structured Streaming) for my PhD, and I'm seeing an extreme improvement when using Spark/Kafka 3.3.1 versus Spark 2.4.8/Kafka 2.7.0. My (Scala) application code is as follows: *KafkaStream* => foreachRDD => mapPartitions => repartition =>