Re: Question regarding Spark 3.X performance

2023-01-27 Thread Athanasios Kordelas
Hi Mich, I'll gather them and send them to you :) Many thanks, Thanasis On Fri, 27 Jan 2023 at 11:40 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Hi Athanasios > Thanks for the details. Since I believe this is Spark streaming, the all-important indicator is the

Re: Question regarding Spark 3.X performance

2023-01-27 Thread Mich Talebzadeh
Hi Athanasios, Thanks for the details. Since I believe this is Spark streaming, the all-important indicator is the Processing Time, defined by the Spark GUI as the time taken to process all jobs of a batch, versus the batch interval. The Scheduling Delay and the Total Delay are additional indicators of
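The comparison described in this message (Processing Time against the batch interval, with Scheduling Delay and Total Delay as further indicators) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the `BatchStats` class and the sample numbers are hypothetical and were not posted in the thread, and the relation Total Delay = Scheduling Delay + Processing Time is the definition used by the Spark Streaming UI.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    """Hypothetical container for the per-batch figures shown in the Spark Streaming UI."""
    batch_interval_s: float    # configured batch interval, in seconds
    processing_time_s: float   # time taken to process all jobs of the batch
    scheduling_delay_s: float  # time the batch waited before processing started

    @property
    def total_delay_s(self) -> float:
        # Total Delay as reported by the UI: waiting time plus processing time.
        return self.scheduling_delay_s + self.processing_time_s

    def keeps_up(self) -> bool:
        # A stream keeps up when each batch finishes within one batch interval;
        # otherwise later batches queue up and the scheduling delay grows.
        return self.processing_time_s <= self.batch_interval_s


# Made-up numbers, for illustration only.
healthy = BatchStats(batch_interval_s=10.0, processing_time_s=6.0, scheduling_delay_s=0.5)
lagging = BatchStats(batch_interval_s=10.0, processing_time_s=12.0, scheduling_delay_s=3.0)

for name, stats in (("healthy", healthy), ("lagging", lagging)):
    print(name, "keeps up:", stats.keeps_up(), "total delay (s):", stats.total_delay_s)
```

Comparing these figures for the same workload on both Spark versions would show whether the difference sits in the processing itself or in batches queuing behind each other.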

Re: Question regarding Spark 3.X performance

2023-01-26 Thread Mich Talebzadeh
You have given some stats, 5-10 sec vs 60 sec, with set-up and systematics being the same for both tests? So let us assume we see an average time of about 10 sec with 3.3.1 versus 60 sec with the older Spark 2.x. That gives us (60-10)*100/60 ≈ 83% gain. However, that would not tell us why the 3.3.1
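The gain estimated in this message can be checked as a relative reduction in time, which lands in the same ballpark as the figure quoted above. The sketch below is a minimal illustration: the function names are hypothetical, and the inputs are the 60 sec and 5-10 sec figures mentioned in the thread.

```python
def percent_reduction(old_seconds: float, new_seconds: float) -> float:
    """Reduction in time, expressed as a percentage of the old time."""
    if old_seconds <= 0:
        raise ValueError("old_seconds must be positive")
    return (old_seconds - new_seconds) * 100.0 / old_seconds


def speedup_factor(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the old one."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return old_seconds / new_seconds


# Figures from the thread: about 60 sec on Spark 2.x, 5-10 sec on 3.3.1.
for new_time in (10.0, 5.0):
    print(
        f"60 sec -> {new_time:g} sec: "
        f"{percent_reduction(60.0, new_time):.1f}% less time, "
        f"{speedup_factor(60.0, new_time):g}x faster"
    )
```

Stating the result both as a percentage reduction and as a speedup factor avoids ambiguity about which baseline the percentage refers to.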

Re: Question regarding Spark 3.X performance

2023-01-26 Thread Mich Talebzadeh
Please qualify what you mean by *extreme improvements*. What metrics are you using? HTH

Question regarding Spark 3.X performance

2023-01-26 Thread Athanasios Kordelas
Hi all, I'm running some tests on Spark Streaming (not Structured Streaming) for my PhD, and I'm seeing an extreme improvement when using Spark/Kafka 3.3.1 versus Spark 2.4.8/Kafka 2.7.0. My (Scala) application code is as follows: *KafkaStream* => foreachRDD => mapPartitions => repartition =>