Re: Only one Active task in Spark Structured Streaming application

2021-01-21 Thread Lalwani, Jayesh
If you are doing aggregations, you need to watermark the data. Depending on which aggregations you are doing, state might keep accumulating until the application fails.
From: Eric Beabes Date: Thursday, January 21, 2021 at 12:19 PM To: Sean Owen Cc: spark-user Subject: RE: [EXTERNAL] Only one Active task
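To make the point about watermarks concrete, here is a toy, pure-Python model (not Spark's implementation) of a windowed count with a watermark. In Spark itself the watermark is declared on the streaming DataFrame, e.g. `events.withWatermark("eventTime", "10 minutes").groupBy(window("eventTime", "5 minutes"), "key").count()`, where `eventTime` and `key` are hypothetical column names. The model assumes the watermark is recomputed after every event (Spark advances it once per micro-batch) and that events arriving for an already-closed window are dropped. It only illustrates why state stays bounded once windows older than the watermark can be evicted.

```python
from collections import defaultdict


def windowed_counts_with_watermark(events, window_secs, delay_secs):
    """Toy model of a tumbling-window count with a watermark.

    events: iterable of (event_time_secs, key) in arrival order.
    Returns (finalized, open_state, max_state_size), where finalized holds
    windows that were evicted from state once the watermark passed their end.
    Simplification (assumption): the watermark is recomputed after every
    event, whereas Spark advances it once per micro-batch.
    """
    state = defaultdict(int)  # (window_start, key) -> running count
    finalized = {}            # windows emitted and removed from state
    max_event_time = float("-inf")
    max_state_size = 0

    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - delay_secs
        window_start = event_time - event_time % window_secs

        if window_start + window_secs <= watermark:
            continue  # late event: its window already ends before the watermark

        state[(window_start, key)] += 1

        # Evict every window that ends at or before the watermark; without
        # this step the state dict would grow for as long as the query runs.
        for wk in [wk for wk in state if wk[0] + window_secs <= watermark]:
            finalized[wk] = state.pop(wk)

        max_state_size = max(max_state_size, len(state))

    return finalized, dict(state), max_state_size


if __name__ == "__main__":
    demo = [(0, "a"), (5, "a"), (12, "b"), (25, "a"), (3, "a"), (40, "b")]
    done, still_open, peak = windowed_counts_with_watermark(demo, 10, 5)
    print(done)        # windows closed by the watermark
    print(still_open)  # windows still held in state
    print(peak)        # largest state size seen
```

Dropping the eviction loop turns this into the unbounded case described above: every (window, key) pair ever seen stays in `state`.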

Re: Only one Active task in Spark Structured Streaming application

2021-01-21 Thread Eric Beabes
Yes. For this particular use case the state size could be big, but I doubt there's a leak. Maybe adding more memory would help.
On Thu, Jan 21, 2021 at 5:55 PM Sean Owen wrote: > Is your app accumulating a lot of streaming state? that's one reason > something could slow down after a long

Re: Only one Active task in Spark Structured Streaming application

2021-01-21 Thread Sean Owen
Is your app accumulating a lot of streaming state? That's one reason something could slow down after running for a long time. A memory leak in your app that puts GC/memory pressure on the JVM could also do it.
On Thu, Jan 21, 2021 at 5:13 AM Eric Beabes wrote: > Hello, > > My Spark Structured Streaming
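One way to answer Sean's first question is to watch the state metrics that Structured Streaming reports for each micro-batch. In PySpark 3.x, `query.recentProgress` returns recent progress reports as dicts, and each report's `stateOperators` entries carry `numRowsTotal` and `memoryUsedBytes`. The helpers below are a sketch that assumes that dict shape; the names `state_growth` and `state_keeps_growing` are hypothetical. A state total that only ever rises across batches suggests old state is never being evicted.

```python
def state_growth(progress_history):
    """Summarize state size per micro-batch.

    progress_history: list of progress dicts, e.g. query.recentProgress in
    PySpark 3.x. Returns [(batchId, total_state_rows, total_state_bytes), ...].
    """
    summary = []
    for progress in progress_history:
        ops = progress.get("stateOperators", [])  # empty for stateless queries
        summary.append((
            progress["batchId"],
            sum(op.get("numRowsTotal", 0) for op in ops),
            sum(op.get("memoryUsedBytes", 0) for op in ops),
        ))
    return summary


def state_keeps_growing(progress_history):
    """True if total state rows never shrink and end higher than they started."""
    totals = [rows for _, rows, _ in state_growth(progress_history)]
    if len(totals) < 2:
        return False
    return totals[-1] > totals[0] and all(b >= a for a, b in zip(totals, totals[1:]))


# Usage against a running query (assumes `query` is a StreamingQuery):
#   for batch_id, rows, mem in state_growth(query.recentProgress):
#       print(batch_id, rows, mem)
#   print(state_keeps_growing(query.recentProgress))
```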

Re: Only one Active task in Spark Structured Streaming application

2021-01-21 Thread Eric Beabes
I see a lot of messages such as this in the driver log, even though this is not the first batch. The job has been running for more than 3 days.
Jan 21, 2021 @ 17:09:42.484
21/01/21 11:39:34 WARN state.HDFSBackedStateStoreProvider: The state for version 43405 doesn't exist in loadedMaps. Reading
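To see how often this happens, one could count the distinct state versions that triggered the warning in a log file. When a version is not in the provider's in-memory map, `HDFSBackedStateStoreProvider` reloads that state from its checkpoint files, so if nearly every batch's version shows up, each batch is likely paying that extra I/O. The sketch below assumes log lines in the format quoted above; the helper name `state_cache_misses` and the log path in the usage comment are hypothetical.

```python
import re

# Pattern for the HDFSBackedStateStoreProvider warning quoted in the thread.
STATE_MISS = re.compile(
    r"HDFSBackedStateStoreProvider: The state for version (\d+) "
    r"doesn't exist in loadedMaps"
)


def state_cache_misses(lines):
    """Return the sorted, distinct state versions that triggered the warning."""
    versions = set()
    for line in lines:
        match = STATE_MISS.search(line)
        if match:
            versions.add(int(match.group(1)))
    return sorted(versions)


# Usage (the log path is a placeholder):
#   with open("driver.log") as log:
#       misses = state_cache_misses(log)
#   print(len(misses), misses[:10])
```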

Re: Only one Active task in Spark Structured Streaming application

2021-01-21 Thread Jungtaek Lim
I'm not sure how many people could even guess at possible reasons; I'd say there isn't enough information. There are no driver/executor logs, no job/stage/executor information and no code.
On Thu, Jan 21, 2021 at 8:25 PM Jacek Laskowski wrote: > Hi, > > I'd look at stages and jobs as it's possible that the

Re: Only one Active task in Spark Structured Streaming application

2021-01-21 Thread Jacek Laskowski
Hi, I'd look at stages and jobs, as it's possible that the only running task is the last missing one in a stage of a job. Just guessing...
Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
"The Internals Of" Online Books
Follow me on
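Jacek's suggestion can also be checked from the driver. PySpark's `SparkContext.statusTracker()` exposes `getActiveStageIds()` and `getStageInfo(stageId)`, which returns a `SparkStageInfo` with task counts, or `None` if the stage information is no longer available. The sketch below flags active stages in which every unfinished task is already running and at most `max_active` are left, i.e. a stage waiting on its last task. The helper `tail_stages` and the local `StageSnapshot` tuple (which mirrors `SparkStageInfo`'s fields so the helper can be tried without a cluster) are hypothetical.

```python
from collections import namedtuple

# Same field names as pyspark.status.SparkStageInfo; used here only so the
# helper can be tried without a running SparkContext.
StageSnapshot = namedtuple(
    "StageSnapshot",
    "stageId currentAttemptId name numTasks numActiveTasks "
    "numCompletedTasks numFailedTasks",
)


def tail_stages(stage_infos, max_active=1):
    """Return (stageId, name, active, total) for stages stuck on their last tasks.

    A stage qualifies when 1..max_active tasks are running and no other task
    is still pending, i.e. everything else has already completed.
    """
    hits = []
    for info in stage_infos:
        if info is None:  # getStageInfo() returns None for unknown stages
            continue
        pending = info.numTasks - info.numCompletedTasks - info.numActiveTasks
        if 0 < info.numActiveTasks <= max_active and pending <= 0:
            hits.append((info.stageId, info.name, info.numActiveTasks, info.numTasks))
    return hits


# Usage on the driver (assumes an active SparkSession named `spark`):
#   tracker = spark.sparkContext.statusTracker()
#   infos = (tracker.getStageInfo(s) for s in tracker.getActiveStageIds())
#   print(tail_stages(infos))
```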