Hi Ramya,

Unfortunately your images are blocked. Could you upload them somewhere and post the links here?

Also I think that the TaskManager logs may be able to help a bit more. Could you please provide them here?

Cheers,
Kostas

On Tue, Sep 22, 2020 at 8:58 AM Ramya Ramamurthy <hair...@gmail.com> wrote:

> Hi,
>
> We are seeing an issue with Flink in our production environment. The
> version we use is 1.7.
> We started seeing sudden lag on Kafka, and the consumers were no longer
> working/accepting messages. On trying to enable debug mode, the below
> errors were seen:
> [image: image.jpeg]
>
> I am not sure why this occurs every day. When it happens, I can see that
> the remaining workers aren't able to handle the load. Unless I restart my
> jobs, I am unable to start processing again. This way, there is data loss
> as well.
>
> On the below graph, there is a slight dip in consumption before 5:30. That
> is when this incident happens, and it correlates with the logs.
>
> [image: image.jpeg]
>
> Any pointers/suggestions would be appreciated.
>
> Thanks.