+1 For current fix. Code is clean and understandable. I suppose that the
current fix is a correct variant to update heartbeatTs.

вт, 20 июл. 2021 г. в 16:13, Alex Plehanov <plehanov.a...@gmail.com>:

> Hello, Ilya
>
> > But anyway, I propose to remove the update of the heartbeat from other
> threads altogether and wrap the call to listeners in a blockingSection.
> I don't quite understand your proposal. Which call to listeners do you
> mean? If we wrap the listener into the blocking section the result will be
> the same.
> Alternatively, I think we can wrap awaitPendingTasksFinished into the
> blocking section, this will also solve the problem, but in this case we can
> never detect lack of progress by async executor threads and checkpointer
> thread hangs due to this reason.
> What's wrong with the current fix? It solves the current problem and I hope
> will protect us from problems like this in the future.
>
> вт, 20 июл. 2021 г. в 15:28, Ilya Kazakov <kazakov.i...@gmail.com>:
>
> > Hi Igniters, hi Alexey.
> >
> > I want to discuss this issue:
> > https://issues.apache.org/jira/browse/IGNITE-15099. I have caught it
> too.
> >
> > I was able to determine where there is a race.
> >
> > The update of the heartbeat happens asynchronously into the listener
> code.
> > But we always wait in the checkpoint thread for all pending async
> > tasks. And this is reasonable.
> >
> > for (CheckpointListener lsnr : dbLsnrs)
> >   lsnr.beforeCheckpointBegin(ctx0);
> >
> > ctx0.awaitPendingTasksFinished();
> >
> > The race was because of inappropriate order of future registration. In
> > CheckpointContextImpl.executor () (inside listeners execution)
> >
> > GridFutureAdapter<?> res = new GridFutureAdapter<>();
> > res.listen(fut -> heartbeatUpdater.updateHeartbeat());
> > asyncRunner.execute(U.wrapIgniteFuture(cmd, res));
> > pendingTaskFuture.add(res);
> >
> > Here we create a task, submit a task to the executor, and only after this
> > do we register the task. Thus we got a situation where checkpointer
> thread
> > was moving on after ctx0.awaitPendingTasksFinished(); and still,
> > the unregistered asyncRunner task was moving on in parallel.
> >
> > But anyway, I propose to remove the update of the heartbeat from other
> > threads altogether and wrap the call to listeners in a blockingSection.
> >
> > As I understand heartbeat was designed just to indicate self-progress by
> a
> > worker. If a worker can not indicate self-progress we should wrap such
> code
> > into blockingSections. In case of listeners, worker can not indicate
> > self-progress, thus let's wrap it into blockingSection.
> >
> > Guys, what do you think about this?
> >
> > -------------
> > Ilya Kazakov
> >
>


-- 
Sincerely yours, Ivan Daschinskiy

Reply via email to