Just set `failOnDataLoss=false` as an option in readStream?

On Tue, Apr 14, 2020 at 4:33 PM Ruijing Li <liruijin...@gmail.com> wrote:
> Hi all,
>
> I have a Spark Structured Streaming app that is consuming from a Kafka
> topic with retention set up. Sometimes I face an issue where my query has
> not finished processing a message but the retention kicks in and deletes
> the offset, which, since I use the default setting of “failOnDataLoss=true”,
> causes my query to fail. The solution I currently have is manual: deleting
> the offsets directory and rerunning.
>
> I would instead like to have Spark automatically fall back to the earliest
> offset available. The solutions I saw recommend setting auto.offset.reset =
> earliest, but for Structured Streaming, you cannot set that. How do I do
> this for Structured Streaming?
>
> Thanks!
> --
> Cheers,
> Ruijing Li
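
For reference, here is a minimal PySpark sketch of the suggestion at the top of this thread. The helper names `kafka_source_options` and `read_kafka_stream`, the broker address, and the topic name are hypothetical and only for illustration; `spark` is assumed to be an existing SparkSession with the Kafka source package on the classpath. As far as I understand it, with `failOnDataLoss` set to `false` the query logs a warning and continues from the offsets that are still available instead of failing, and `startingOffsets` only applies when the query starts without an existing checkpoint.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the option map for the Kafka source of a streaming query."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only consulted when the query starts without a checkpoint.
        "startingOffsets": "earliest",
        # Do not fail the query when retention has removed needed offsets.
        "failOnDataLoss": "false",
    }


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame; `spark` is an existing SparkSession."""
    return (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
```

Usage would look something like `df = read_kafka_stream(spark, "localhost:9092", "events")`. Note that records deleted by retention before they were processed are skipped, so this trades a hard failure for possible silent data loss.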