Hi all,

I have a Spark Structured Streaming app that consumes from a Kafka topic
with retention configured. Sometimes the query has not finished
processing a message before retention kicks in and deletes the data at
that offset. Because I use the default setting of “failOnDataLoss=true”,
the query then fails. My current fix is manual: I delete the offsets
directory and rerun.
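
Roughly, the setup looks like the sketch below (PySpark). The broker
address, topic name, and the two helper function names are placeholders
for illustration, not the real values; failOnDataLoss is shown
explicitly even though "true" is already the default.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the Kafka source options described above (placeholder helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Default value, written out: the query fails if offsets it still
        # needs have already been removed by topic retention.
        "failOnDataLoss": "true",
    }


def build_stream(spark, options):
    """Create the streaming DataFrame; `spark` is an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Placeholder values, for illustration only.
options = kafka_source_options("broker:9092", "events")
```

With a SparkSession available, build_stream(spark, options) would return
the streaming DataFrame that the query reads from.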

I would instead like Spark to fall back automatically to the earliest
available offset. The solutions I have seen recommend setting
auto.offset.reset=earliest, but Structured Streaming does not let you
set that option. How do I do this in Structured Streaming?

Thanks!
-- 
Cheers,
Ruijing Li
