Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Gourav Sengupta
Hi, maybe the purpose of the article is different, but instead of:
sources (trail files) --> Kafka --> Flume --> write to cloud storage --> SSS
a much simpler solution is:
sources (trail files) --> write to cloud storage --> SSS
Putting additional components and hops just does sound a bit

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Gabor Somogyi
Hi Karan, please have a look at the Stack Overflow comment I posted 2 days ago.
G

On Fri, 25 Feb 2022, 23:31 karan alang wrote:
> Hello All,
> I'm running a StructuredStreaming program on GCP Dataproc, which reads data from Kafka, does some processing and puts processed data back into Kafka.

Re: Issue while creating spark app

2022-02-26 Sean Owen
I don't think any of that is related, no. How are your dependencies set up: manually with IJ, or in a build file (Maven, Gradle)? Normally you do the latter and dependencies are taken care of for you, but your app would definitely have to express a dependency on the Scala libs.

On Sat, Feb 26, 2022 at
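In sbt, for example, the `scalaVersion` setting is what pulls in the Scala library, and Spark artifacts carry the Scala binary version as a suffix (`spark-sql_2.12`). As a hedged illustration that is not from the thread (it uses sbt rather than the Maven or Gradle Sean mentions, and the function names are hypothetical), this Python sketch reads `scalaVersion` from a build.sbt text and flags any explicitly suffixed artifact whose suffix disagrees with it. The regexes only cover the simple `scalaVersion := "..."` form.

```python
import re


def scala_binary_version(full_version: str) -> str:
    """Return the Scala binary version, e.g. "2.12.15" -> "2.12"."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def check_sbt_scala_versions(build_sbt: str) -> list[str]:
    """Return mismatch messages for a build.sbt text; an empty list means OK."""
    match = re.search(r'scalaVersion\s*:=\s*"([^"]+)"', build_sbt)
    if match is None:
        return ["no scalaVersion setting found"]
    binary = scala_binary_version(match.group(1))
    problems = []
    # Artifacts written with an explicit suffix, e.g. "spark-sql_2.12".
    # Dependencies declared with %% get the suffix from sbt and are not matched here.
    for artifact, suffix in re.findall(r'"([\w.-]+?)_(\d+\.\d+)"', build_sbt):
        if suffix != binary:
            problems.append(f"{artifact}_{suffix} does not match Scala {binary}")
    return problems
```

Usage: `check_sbt_scala_versions(open("build.sbt").read())`. Lines written with `%%`, such as `"org.apache.spark" %% "spark-sql" % "3.2.1"`, get the suffix appended by sbt and are skipped by this check.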

Re: Issue while creating spark app

2022-02-26 Bitfox
Is the Java SDK installed?

On Sun, Feb 27, 2022 at 5:39 AM Sachit Murarka wrote:
> Hello,
> Thanks for replying. I installed the Scala plugin in IntelliJ first, but it still gives the same error:
> Cannot find project Scala library 2.12.12 for module SparkSimpleApp
> Thanks
> Rajat
> On Sun,

Re: Issue while creating spark app

2022-02-26 Sachit Murarka
Hello, thanks for replying. I installed the Scala plugin in IntelliJ first, but it still gives the same error:
Cannot find project Scala library 2.12.12 for module SparkSimpleApp
Thanks
Rajat

On Sun, Feb 27, 2022, 00:52 Bitfox wrote:
> You need to install scala first, the current version for

Re: Issue while creating spark app

2022-02-26 Bitfox
You need to install Scala first; the Scala version used by the current Spark release is 2.12.15. I would suggest installing Scala with sdk (SDKMAN), which works great.
Thanks

On Sun, Feb 27, 2022 at 12:10 AM rajat kumar wrote:
> Hello Users,
> I am trying to create a Spark application using Scala (IntelliJ).
> I have installed

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Mich Talebzadeh
Besides, is the structure of your checkpoint as described in this article of mine? "Processing Change Data Capture with Spark Structured Streaming", section on "The
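As a hedged aid for this question (not taken from Mich's article): a Structured Streaming checkpoint directory normally contains a `metadata` file plus `offsets/` and `commits/` logs whose entries are files named by batch id. As far as I know, a "batch N doesn't exist" error on restart can come from a gap in the `offsets/` log, so the stdlib-only sketch below reports the latest offset and commit ids and any missing ids. The function names are hypothetical, and it assumes a local copy of the checkpoint (a gs:// checkpoint on Dataproc would have to be copied down first). It only looks for gaps between the lowest and highest id present, because Spark purges old entries (see `spark.sql.streaming.minBatchesToRetain`).

```python
from pathlib import Path


def batch_ids(log_dir: Path) -> list[int]:
    """Sorted batch ids in a checkpoint log directory such as offsets/ or commits/."""
    if not log_dir.is_dir():
        return []
    # Entries are files named 0, 1, 2, ...; skip checksum or temp files like ".45.crc".
    return sorted(int(p.name) for p in log_dir.iterdir() if p.is_file() and p.name.isdecimal())


def missing_batches(ids: list[int]) -> list[int]:
    """Ids absent between the lowest and highest id present (older ids may be purged)."""
    if not ids:
        return []
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]


def inspect_checkpoint(checkpoint: str) -> dict:
    """Summarise a local copy of a Structured Streaming checkpoint directory."""
    root = Path(checkpoint)
    offsets = batch_ids(root / "offsets")
    commits = batch_ids(root / "commits")
    return {
        "has_metadata": (root / "metadata").is_file(),
        "latest_offset": offsets[-1] if offsets else None,
        "latest_commit": commits[-1] if commits else None,
        "missing_offsets": missing_batches(offsets),
        "missing_commits": missing_batches(commits),
    }
```

A non-empty `missing_offsets` list, or a `latest_commit` far behind `latest_offset`, would be a reason to look more closely at how the checkpoint was written or copied.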

Issue while creating spark app

2022-02-26 rajat kumar
Hello Users, I am trying to create a Spark application using Scala (IntelliJ). I have installed the Scala plugin in IntelliJ but am still getting the error below:
Cannot find project Scala library 2.12.12 for module SparkSimpleApp
Could anyone please help me figure out what I am doing wrong?
Thanks
Rajat

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Gourav Sengupta
Hi, can you please let us know:
1. the Spark version, and the kind of streaming query that you are running?
2. whether you are using at-least-once, at-most-once, or exactly-once semantics?
3. any additional details that you can provide, regarding the retention period in Kafka, etc.?
4. are you running
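On point 2: the Structured Streaming programming guide notes that `foreachBatch` gives at-least-once output by default, and that the `batch_id` passed to the function can be used to deduplicate output for an exactly-once result. Below is a minimal, hypothetical sketch of that idea in plain Python; a list stands in for the micro-batch DataFrame so it runs without Spark, and the class name is illustrative only.

```python
class IdempotentBatchSink:
    """Hypothetical sink that applies each micro-batch at most once, keyed by batch_id.

    In PySpark, a callable like this could be passed to
    df.writeStream.foreachBatch(...), which calls it with (batch_df, batch_id).
    """

    def __init__(self):
        # Assumption: in a real job this set lives in the target store, not in memory.
        self.committed_ids = set()
        self.rows = []

    def __call__(self, batch, batch_id: int) -> bool:
        if batch_id in self.committed_ids:
            # Batch replayed after a restart: skip it so no duplicates are written.
            return False
        self.rows.extend(batch)
        self.committed_ids.add(batch_id)
        return True
```

In practice the write and the recording of `batch_id` would need to happen in one transaction in the target store; otherwise a crash between the two steps reopens the window for duplicates.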

Re: How to gracefully shutdown Spark Structured Streaming

2022-02-26 Gourav Sengupta
Dear Mich, a super duper note of thanks, I had to spend around two weeks to figure this out :)
Regards,
Gourav Sengupta

On Sat, Feb 26, 2022 at 10:43 AM Mich Talebzadeh wrote:
> On Mon, 26 Apr 2021 at 10:21, Mich Talebzadeh wrote:
>> Spark Structured Streaming AKA SSS is a very

Re: StructuredStreaming error - pyspark.sql.utils.StreamingQueryException: batch 44 doesn't exist

2022-02-26 Mich Talebzadeh
Check the thread I forwarded on how to gracefully shut down Spark Structured Streaming.
HTH

On Fri, 25 Feb 2022 at 22:31, karan alang wrote:
> Hello All,
> I'm running a StructuredStreaming program on GCP Dataproc, which reads data from Kafka, does some processing and puts processed data back

Re: How to gracefully shutdown Spark Structured Streaming

2022-02-26 Mich Talebzadeh
On Mon, 26 Apr 2021 at 10:21, Mich Talebzadeh wrote:
> Spark Structured Streaming AKA SSS is a very useful tool in dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when
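One common way to act on a "stop" event is to stop the query only between micro-batches. The sketch below is illustrative and not necessarily the approach in Mich's note: `should_stop` is a hypothetical callback (for example, one that checks a marker file or a control topic), and `query` is assumed to be anything exposing the `pyspark.sql.streaming.StreamingQuery` members used here (`isActive`, `status`, `stop()`, `awaitTermination(timeout)`).

```python
from typing import Callable


def stop_gracefully(query, should_stop: Callable[[], bool], poll_seconds: float = 5.0) -> bool:
    """Stop a streaming query between micro-batches once should_stop() returns True.

    Returns True if this function stopped the query, False if it ended on its own.
    """
    while query.isActive:
        if should_stop():
            status = query.status  # dict with 'isDataAvailable' and 'isTriggerActive'
            # Only stop when no trigger is running and no new data is waiting,
            # so the current micro-batch is not interrupted midway.
            if not status["isDataAvailable"] and not status["isTriggerActive"]:
                query.stop()
                return True
        # Wait up to poll_seconds; returns early if the query terminates.
        query.awaitTermination(poll_seconds)
    return False
```

In PySpark this would be called as `stop_gracefully(query, should_stop)` on the object returned by `writeStream...start()`. If data arrives continuously, `isDataAvailable` may rarely be false, so a production version would probably also need an overall timeout.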

can dataframe API deal with subquery

2022-02-26 capitnfrakass
such as this table definition:
desc people;
+----------+-----------+---------+
| col_name | data_type | comment |
+----------+-----------+---------+
| name     | string    |         |
| born     | date