Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
In Spark Structured Streaming we cannot perform repartition() without stopping the streaming process. Admittedly, it is not a parameter that I have played around with. I still think the Spark GUI should provide some insight.

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Sean Owen
That's incorrect; it's spark.default.parallelism, but as the name suggests, that is merely a default. You control partitioning directly with .repartition().
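A rough PySpark-style sketch of what calling .repartition() in a Kafka-to-Kafka streaming job could look like. The function name build_pipeline, its parameters, and the upper-casing "map" step are hypothetical and only there for illustration; the original question is about Java and does not show its code. It assumes the Spark Kafka connector is available and that the caller passes in an existing SparkSession.

```python
def build_pipeline(spark, bootstrap_servers, in_topic, out_topic,
                   num_partitions, checkpoint_dir):
    """Read from Kafka, repartition, apply a map-like step, write to Kafka.

    All argument names are illustrative assumptions, not taken from the thread.
    """
    source = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", in_topic)
        .load()
    )

    # Explicitly control how many partitions (and so tasks) the map step uses,
    # instead of relying on a default such as spark.default.parallelism.
    mapped = source.repartition(num_partitions).selectExpr(
        "CAST(key AS STRING) AS key",
        "upper(CAST(value AS STRING)) AS value",  # placeholder transformation
    )

    # The Kafka sink needs a target topic and a checkpoint location.
    return (
        mapped.writeStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", out_topic)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

The partition count is fixed when the query is defined, so changing it means restarting the query with a new value.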

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
Check this link https://sparkbyexamples.com/spark/difference-between-spark-sql-shuffle-partitions-and-spark-default-parallelism/ You can set it with spark.conf.set("sparkDefaultParallelism", value). Have a look at the Streaming statistics in the Spark GUI, especially *Processing Time*, defined by

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Sean Owen
Are you just looking for DataFrame.repartition()?

Question related to asynchronously map transformation using java spark structured streaming

2023-03-14 Thread Emmanouil Kritharakis
Hello, I hope this email finds you well! I have a simple dataflow in which I read from a Kafka topic, perform a map transformation, and then write the result to another topic. Based on your documentation here

Re: Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Mich Talebzadeh
What benefits are you going for with increasing parallelism? Better throughput?

Question related to parallelism using structed streaming parallelism

2023-03-14 Thread Emmanouil Kritharakis
Hello, I hope this email finds you well! I have a simple dataflow in which I read from a Kafka topic, perform a map transformation, and then write the result to another topic. Based on your documentation here

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Mich Talebzadeh
Hi Denny, That Apache Spark LinkedIn page https://www.linkedin.com/company/apachespark/ looks fine. It also allows a wider audience to benefit from it. +1 from me.

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-14 Thread Gary Liu
Hi Mich, The y-axis is the number of executors. The code ran on Dataproc Serverless with Spark 3.3.2. I tried disabling autoscaling by setting the following: spark.dynamicAllocation.enabled=false and spark.executor.instances=60. I still got the FetchFailedException error. I wonder why it can run

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Denny Lee
In the past, we've been using the Apache Spark LinkedIn page and group to broadcast these types of events, if you're cool with this? Or we could go through the process of submitting and updating the current https://spark.apache.org or request to

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Joris Billen
This is a very good idea; I would love to read such a Confluence page. Adding a section “common mistakes/misconceptions” might be useful for many of these sections. It would describe the undesired behaviour or errors one would get when not following some best practices.

Re: Spark 3.3.2 not running with Antlr4 runtime latest version

2023-03-14 Thread yangjie01
From the release notes of ANTLR4, there are two key changes in ANTLR4 4.10: 1. 4.10-generated parsers are incompatible with previous runtimes. 2. The minimum Java version is increased to Java 11. So I personally think it is temporarily impossible for Spark to upgrade to the ANTLR4 version above

Re: Spark 3.3.2 not running with Antlr4 runtime latest version

2023-03-14 Thread Sean Owen
You want Antlr 3 and Spark is on 4? No, I don't think Spark would downgrade. You can shade your app's dependencies, maybe.

Spark 3.3.2 not running with Antlr4 runtime latest version

2023-03-14 Thread Sahu, Karuna
Hi Team, We are upgrading a legacy application using Spring Boot, Spark and Hibernate. While upgrading Hibernate to version 6.1.6.Final, there is a mismatch in the antlr4 runtime jar between Hibernate and the latest Spark version. Details for the issue are posted on StackOverflow as well: Issue in

Re: spark on k8s daemonset collect log

2023-03-14 Thread Cheng Pan
Filebeat supports multiline matching; here is an example [1]. BTW, I’m working on External Log Service integration [2]; it may be useful in your case, so feel free to review and leave comments. [1] https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html#multiline [2]
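To make the idea of multiline matching concrete, here is a minimal Python sketch of the merging logic itself, not of Filebeat's configuration. It assumes each input line is a Docker JSON log record with the message under the "log" key, and that continuation lines look like Java stack-trace lines (indented "at ..." / "... N more" lines, or "Caused by:"). The function name merge_multiline and the pattern are illustrative assumptions.

```python
import json
import re

# Lines that continue a previous event: indented "at ..." or "... N more"
# frames, or a "Caused by:" line (assumed Java stack-trace layout).
CONTINUATION = re.compile(r"^\s+(at|\.{3})\s+\b|^Caused by:")


def merge_multiline(json_lines):
    """Yield one merged message per logical event from Docker JSON log lines."""
    current = None
    for raw in json_lines:
        text = json.loads(raw)["log"].rstrip("\n")
        if current is not None and CONTINUATION.match(text):
            # Part of the event already being collected.
            current.append(text)
        else:
            # A new event starts; emit the previous one first.
            if current is not None:
                yield "\n".join(current)
            current = [text]
    if current is not None:
        yield "\n".join(current)
```

An exception header followed by its stack frames comes out as a single message, while ordinary single-line log entries pass through unchanged.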

spark on k8s daemonset collect log

2023-03-14 Thread 404
Hi all, Spark runs on k8s and uses a Filebeat DaemonSet to collect logs and write them to Elasticsearch. The Docker logs are in JSON format, and each line is a JSON string. How can multi-line exceptions be merged?