Re: Question related to parallelism using structed streaming parallelism

2023-03-21 Thread Mich Talebzadeh
or download it from here https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited view my Linkedin profile

Re: Question related to parallelism using structed streaming parallelism

2023-03-21 Thread Sean Owen
Yes more specifically, you can't ask for executors once the app starts, in SparkConf like that. You set this when you launch it against a Spark cluster in spark-submit or otherwise. On Tue, Mar 21, 2023 at 4:23 AM Mich Talebzadeh wrote: > Hi Emmanouil, > > This means that your job is running on

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
In spark structured streaming we cannot perform repartition() without stopping the streaming process unless otherwise. Admittedly, It is not a parameter that I have played around with. I still think Spark GUI should provide some insight. view my Linkedin profile

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Sean Owen
That's incorrect, it's spark.default.parallelism, but as the name suggests, that is merely a default. You control partitioning directly with .repartition() On Tue, Mar 14, 2023 at 11:37 AM Mich Talebzadeh wrote: > Check this link > > >

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
Check this link https://sparkbyexamples.com/spark/difference-between-spark-sql-shuffle-partitions-and-spark-default-parallelism/ You can set it spark.conf.set("sparkDefaultParallelism", value]) Have a look at Streaming statistics in Spark GUI, especially *Processing Tim*e, defined by

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Sean Owen
Are you just looking for DataFrame.repartition()? On Tue, Mar 14, 2023 at 10:57 AM Emmanouil Kritharakis < kritharakismano...@gmail.com> wrote: > Hello, > > I hope this email finds you well! > > I have a simple dataflow in which I read from a kafka topic, perform a map > transformation and then

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
What benefits are you going with increasing parallelism? Better througput view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss,

Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Emmanouil Kritharakis
Hello, I hope this email finds you well! I have a simple dataflow in which I read from a kafka topic, perform a map transformation and then I write the result to another topic. Based on your documentation here