Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-03-01 Thread Mich Talebzadeh
I checked this process of gracefully terminating the topic when the flag is set to terminate it. In this case the topic is called md (market data). The first two batches and then you set the termination flag on Topic market data => md, batchId is 236, at 2022-03-01 20:52:00.099259
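A graceful stop is commonly sketched as a loop that polls the query's status and calls `stop()` only when no micro-batch is in flight, so the current batch can finish and commit to the checkpoint before shutdown. The sketch below assumes `query` behaves like a PySpark `StreamingQuery` (`isActive`, `status`, `stop()`, `awaitTermination()`), and that something external has already decided termination is due; the flag check on the md topic is not shown. The helper name `stop_stream_query` is hypothetical.

```python
import time


def stop_stream_query(query, wait_time_secs=5, poll_secs=0.5):
    """Hypothetical helper: stop a streaming query between micro-batches.

    `query` is expected to behave like pyspark.sql.streaming.StreamingQuery:
    an `isActive` property, a `status` dict, `stop()` and `awaitTermination()`.
    """
    while query.isActive:
        status = query.status
        # Stop only when no trigger is running and no new data is pending,
        # so the in-flight batch is not cut off halfway through.
        if (not status["isDataAvailable"]
                and not status["isTriggerActive"]
                and status["message"] != "Initializing sources"):
            query.stop()
        else:
            time.sleep(poll_secs)
    # In PySpark, awaitTermination takes a timeout in seconds.
    query.awaitTermination(wait_time_secs)
```

The polling approach trades a little shutdown latency for not interrupting a batch between writing its offsets and committing it.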

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-28 Thread Gourav Sengupta
Hi Karan, If you are running with at-least-once semantics, then you can restart the failed job with a new checkpoint location; you will end up with duplicates in your target, but the job will run fine. Since you are using stateful operations, if your keys are too large to manage in a state try to use
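Restarting an at-least-once job with a fresh checkpoint means some records already written may be processed again. One way to absorb those duplicates is an idempotent write keyed on each record's Kafka coordinates (topic, partition, offset), which the Kafka source exposes as columns. The sketch below is plain Python with a hypothetical in-memory `IdempotentSink` standing in for the real target, and assumes the coordinates are carried through processing; in a real job the same idea would sit inside a `foreachBatch` function or be delegated to an upsert in the target store.

```python
from typing import Dict, Iterable, Tuple

# A record's identity in Kafka: (topic, partition, offset).
KafkaCoord = Tuple[str, int, int]


class IdempotentSink:
    """Hypothetical in-memory sink that ignores records it has already stored."""

    def __init__(self) -> None:
        self.rows: Dict[KafkaCoord, bytes] = {}

    def write_batch(self, batch_id: int, records: Iterable[dict]) -> int:
        """Write one micro-batch and return how many records were new.

        batch_id is accepted for parity with foreachBatch but is NOT used for
        deduplication: with a new checkpoint, batch IDs start again from 0.
        """
        written = 0
        for rec in records:
            coord = (rec["topic"], rec["partition"], rec["offset"])
            if coord not in self.rows:
                self.rows[coord] = rec["value"]
                written += 1
        return written
```

Keying on Kafka coordinates rather than on `batch_id` is the design choice that matters here, since batch IDs do not survive a checkpoint reset.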

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-27 Thread karan alang
Hi Gourav, Pls see my responses below: Can you please let us know: 1. the SPARK version, and the kind of streaming query that you are running? KA: Apache Spark 3.1.2 - on Dataproc using Ubuntu 18.04 (the highest Spark version supported on Dataproc is 3.1.2), 2. whether you are using at

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-27 Thread karan alang
Hi Mich, thnx .. i'll check the thread you forwarded, and revert back. regds, Karan Alang On Sat, Feb 26, 2022 at 2:44 AM Mich Talebzadeh wrote: > Check the thread I forwarded on how to gracefully shutdown spark > structured streaming > > HTH > > On Fri, 25 Feb 2022 at 22:31, karan alang

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-27 Thread karan alang
Hi Gabor, i just responded to your comment on stackoverflow. regds, Karan Alang On Sat, Feb 26, 2022 at 3:06 PM Gabor Somogyi wrote: > Hi Karan, > > Plz have a look at the stackoverflow comment I've had 2 days ago > > G > > On Fri, 25 Feb 2022, 23:31 karan alang, wrote: > >> Hello All, >>

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Thread Gourav Sengupta
Hi, Maybe the purpose of the article is different, but: instead of: sources (trail files) --> kafka --> flume --> write to cloud storage -->> SSS a much simpler solution is: sources (trail files) --> write to cloud storage -->> SSS Adding components and hops does sound a bit

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Thread Gabor Somogyi
Hi Karan, Plz have a look at the stackoverflow comment I've had 2 days ago G On Fri, 25 Feb 2022, 23:31 karan alang, wrote: > Hello All, > I'm running a StructuredStreaming program on GCP Dataproc, which reads > data from Kafka, does some processing and puts processed data back into > Kafka.

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Thread Mich Talebzadeh
Besides, is the structure of your checkpoint as in this article of mine? Processing Change Data Capture with Spark Structured Streaming Section on "The
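When a restart fails with "batch N doesn't exist", it can help to compare what the checkpoint has recorded. A Structured Streaming checkpoint directory typically holds an `offsets/` log and a `commits/` log with one file per batch ID (alongside `metadata`, `sources/` and, for stateful queries, `state/`). As far as I can tell, this message is raised when the offsets log lacks the entry for the batch before the latest one. The sketch below assumes the checkpoint is on a locally readable filesystem (on Dataproc it would usually be on GCS or HDFS, so listing would go through the matching client); `batch_ids` and `inspect_checkpoint` are hypothetical names.

```python
import os
from typing import List, Optional


def batch_ids(log_dir: str) -> List[int]:
    """Return sorted batch IDs in a checkpoint log directory.

    Batch files are named by their integer ID; checksum files such as
    '.44.crc' and other non-numeric names are skipped.
    """
    if not os.path.isdir(log_dir):
        return []
    return sorted(int(name) for name in os.listdir(log_dir) if name.isdigit())


def inspect_checkpoint(checkpoint_dir: str) -> dict:
    """Summarise the offsets/commits logs of a local checkpoint directory."""
    offsets = batch_ids(os.path.join(checkpoint_dir, "offsets"))
    commits = batch_ids(os.path.join(checkpoint_dir, "commits"))
    latest_offset: Optional[int] = offsets[-1] if offsets else None
    latest_commit: Optional[int] = commits[-1] if commits else None
    # Spark purges old log entries, so gaps are only meaningful between the
    # oldest retained offsets entry and the latest one.
    missing: List[int] = []
    if offsets:
        present = set(offsets)
        missing = [b for b in range(offsets[0], offsets[-1] + 1) if b not in present]
    return {
        "latest_offset": latest_offset,
        "latest_commit": latest_commit,
        "missing_offsets": missing,
    }
```

If `missing_offsets` contains `latest_offset - 1`, that would line up with the error in this thread's subject; a manually edited or partially copied checkpoint is one plausible way to get there.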

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Thread Gourav Sengupta
Hi, Can you please let us know: 1. the SPARK version, and the kind of streaming query that you are running? 2. whether you are using at-least-once, at-most-once, or exactly-once semantics? 3. any additional details that you can provide, regarding the storage duration in Kafka, etc? 4. are you running

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Thread Mich Talebzadeh
Check the thread I forwarded on how to gracefully shutdown spark structured streaming HTH On Fri, 25 Feb 2022 at 22:31, karan alang wrote: > Hello All, > I'm running a StructuredStreaming program on GCP Dataproc, which reads > data from Kafka, does some processing and puts processed data back

StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-25 Thread karan alang
Hello All, I'm running a StructuredStreaming program on GCP Dataproc, which reads data from Kafka, does some processing and puts processed data back into Kafka. The program was running fine when I killed it (to make minor changes), and then re-started it. It is giving me the error -