StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-25 Thread karan alang
Hello All, I'm running a Structured Streaming program on GCP Dataproc, which reads data from Kafka, does some processing, and writes the processed data back into Kafka. The program was running fine until I killed it (to make minor changes) and restarted it. It is now giving me the error -

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
Ahh, ok. So, Kafka 3.1 is supported for Spark 3.2.1. Thank you very much. From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, February 25, 2022 2:50 PM To: Michael Williams (SSI) Cc: user@spark.apache.org Subject: Re: Spark Kafka Integration these are the old and new ones

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
Thank you, that is good to know. From: Sean Owen [mailto:sro...@gmail.com] Sent: Friday, February 25, 2022 2:46 PM To: Michael Williams (SSI) Cc: Mich Talebzadeh ; user@spark.apache.org Subject: Re: Spark Kafka Integration Spark 3.2.1 is compiled vs Kafka 2.8.0; the forthcoming Spark 3.3

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
These are the old and new ones. For Spark 3.1.1 I needed these jar files to make it work: kafka-clients-2.7.0.jar --> kafka-clients-3.1.0.jar; commons-pool2-2.9.0.jar --> commons-pool2-2.11.1.jar

Re: Spark Kafka Integration

2022-02-25 Thread Sean Owen
Spark 3.2.1 is compiled vs Kafka 2.8.0; the forthcoming Spark 3.3 against Kafka 3.1.0. It may well be mutually compatible though. On Fri, Feb 25, 2022 at 2:40 PM Michael Williams (SSI) < michael.willi...@ssigroup.com> wrote: > I believe it is 3.1, but if there is a different version that “works

Re: Help With unstructured text file with spark scala

2022-02-25 Thread Danilo Sousa
Rafael Mendes, Are you from ? Thanks. > On 21 Feb 2022, at 15:33, Danilo Sousa wrote: > > Yes, this is only a single file. > > Thanks Rafael Mendes. > >> On 13 Feb 2022, at 07:13, Rafael Mendes > > wrote: >> >> Hi, Danilo. >> Do you have a single large file,

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
Thank you From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, February 25, 2022 2:35 PM To: Michael Williams (SSI) Cc: Sean Owen ; user@spark.apache.org Subject: Re: Spark Kafka Integration please see my earlier reply for 3.1.1 tested and worked in Google Dataproc

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
I believe it is 3.1, but if there is a different version that “works better” with spark, any advice would be appreciated. Our entire team is totally new to spark and kafka (this is a poc trial). From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, February 25, 2022 2:30 PM

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
Please see my earlier reply for 3.1.1, tested and worked in the Google Dataproc environment. Also, this article of mine may be useful: Processing Change Data Capture with Spark Structured Streaming. HTH

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
And what version of Kafka do you have, 2.7? For Spark 3.1.1 I needed these jar files to make it work: kafka-clients-2.7.0.jar, commons-pool2-2.9.0.jar, spark-streaming_2.12-3.1.1.jar, spark-sql-kafka-0-10_2.12-3.1.0.jar. HTH
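
If the jars are downloaded by hand as listed above, they can be handed to spark-submit through its --jars option, which takes a comma-separated list of paths. A minimal sketch, assuming the files sit in a hypothetical local directory /opt/jars; the helper name jars_option is illustrative and not part of Spark:

```python
import posixpath
from typing import List

# Jar file names taken from the message above (Spark 3.1.1 setup).
KAFKA_JARS: List[str] = [
    "kafka-clients-2.7.0.jar",
    "commons-pool2-2.9.0.jar",
    "spark-streaming_2.12-3.1.1.jar",
    "spark-sql-kafka-0-10_2.12-3.1.0.jar",
]


def jars_option(jar_dir: str, jar_names: List[str]) -> str:
    """Join jar paths into the comma-separated value that --jars expects."""
    return ",".join(posixpath.join(jar_dir, name) for name in jar_names)


if __name__ == "__main__":
    # /opt/jars is an assumed location; adjust to wherever the jars were saved.
    print("--jars", jars_option("/opt/jars", KAFKA_JARS))
```

The printed value can be pasted after spark-submit on the command line; keeping the list in one place makes it easier to swap versions later.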

RE: Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
The use case is for Spark Structured Streaming (a Spark app will be launched by a worker service that monitors the Kafka topic for new messages; once the messages are consumed, the Spark app will terminate), but if there is a hitch here, it is that the Spark environment includes the MS dotnet

Re: Spark Kafka Integration

2022-02-25 Thread Sean Owen
That .jar is available on Maven, though typically you depend on it in your app, and compile an uber JAR which will contain it and all its dependencies. You can, I suppose, manage to compile an uber JAR from that dependency itself with tools if needed. On Fri, Feb 25, 2022 at 1:37 PM Michael

Re: Spark Kafka Integration

2022-02-25 Thread Mich Talebzadeh
What is the use case? Is this for Spark Structured Streaming? HTH

Spark Kafka Integration

2022-02-25 Thread Michael Williams (SSI)
Spark's Kafka Integration guide indicates that spark-sql-kafka-0-10_2.12-3.2.1.jar and its dependencies are needed for Spark 3.2.1 (+ Scala 2.12) to work with Kafka. Can anybody clarify the cleanest, most repeatable (reliable) way to acquire these jars for including in a
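
One repeatable way to acquire the connector together with its transitive dependencies is to let spark-submit resolve them from Maven with --packages instead of copying jars by hand. A minimal sketch, assuming the usual Maven coordinate layout group:artifact_scalaVersion:version; the helper names kafka_connector_coordinate and spark_submit_args and the script name app.py are hypothetical:

```python
from typing import List


def kafka_connector_coordinate(spark_version: str, scala_version: str = "2.12") -> str:
    """Build the Maven coordinate of Spark's Kafka connector for Structured Streaming."""
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


def spark_submit_args(app: str, spark_version: str, scala_version: str = "2.12") -> List[str]:
    """Assemble a spark-submit command that resolves the connector from Maven."""
    coordinate = kafka_connector_coordinate(spark_version, scala_version)
    return ["spark-submit", "--packages", coordinate, app]


if __name__ == "__main__":
    # app.py is a placeholder for the actual application script.
    print(" ".join(spark_submit_args("app.py", "3.2.1")))
```

Keeping the connector version equal to the Spark version, as the sketch does, avoids mixing connector and runtime releases; whether this suits an environment without Maven access is a separate question.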

Re: Non-Partition based Workload Distribution

2022-02-25 Thread Gourav Sengupta
Hi, not quite sure here, but can you please share your code? Regards, Gourav Sengupta On Thu, Feb 24, 2022 at 8:25 PM Artemis User wrote: > We got a Spark program that iterates through a while loop on the same > input DataFrame and produces different results per iteration. I see > through

Re: Structured Streaming + UDF - logic based on checking if a column is present in the Dataframe

2022-02-25 Thread Gourav Sengupta
Hi, can you please let us know the following: 1. the spark version 2. a few samples of input data 3. a few samples of what is the expected output that you want Regards, Gourav Sengupta On Wed, Feb 23, 2022 at 8:43 PM karan alang wrote: > Hello All, > > I'm using StructuredStreaming, and am