You may want to check "where" the job is stuck by taking a thread dump - it
could be in the Kafka consumer, in the Spark codebase, etc. Without that
information it's hard to say.
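For example, `jstack <executor pid>` against the stuck executor process, or
the Thread Dump link on the Executors tab of the Spark UI, will show which
call the task threads are blocked in. A rough programmatic equivalent (just a
sketch, run inside whichever JVM you suspect, e.g. pasted into spark-shell):

    import scala.collection.JavaConverters._

    // Print the stack of every live thread in this JVM, to see where
    // execution is blocked (Kafka consumer poll, Spark code, etc.)
    Thread.getAllStackTraces.asScala.foreach { case (thread, frames) =>
      println(s"--- ${thread.getName} (${thread.getState}) ---")
      frames.foreach(frame => println(s"    at $frame"))
    }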
On Thu, Apr 16, 2020 at 4:22 PM Ruijing Li wrote:
Thanks Jungtaek, that makes sense.
I tried Burak’s solution of just setting failOnDataLoss to false, but
instead of failing, the job gets stuck. I’m guessing that the offsets are
being deleted faster than the job can process them, so it will stay stuck
unless I increase resources? Or does once the
I think Spark is trying to ensure that it reads the input "continuously"
without missing anything. Technically it may be valid to say the situation is
a kind of "data-loss", as the query couldn't process the offsets which are
being thrown out, and the owner of the query needs to be careful as it affects
I see, I wasn’t sure if that would work as expected. The docs seem to
suggest being careful before turning off that option, and I’m not sure why
failOnDataLoss is true by default.
On Tue, Apr 14, 2020 at 5:16 PM Burak Yavuz wrote:
Just set `failOnDataLoss=false` as an option in readStream?
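For reference, that would look roughly like this on the Kafka source (the
broker address and topic name below are placeholders, not from this thread):

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
      .option("subscribe", "some-topic")                   // placeholder
      .option("failOnDataLoss", "false")  // don't fail the query on missing/aged-out offsets
      .load()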
On Tue, Apr 14, 2020 at 4:33 PM Ruijing Li wrote:
Hi all,
I have a Spark Structured Streaming app that is consuming from a Kafka
topic with retention set up. Sometimes I face an issue where my query has
not finished processing a message but the retention kicks in and deletes
the offset, which since I use the default setting of