+1 For current fix. Code is clean and understandable. I suppose that the current fix is a correct variant to update heartbeatTs.
вт, 20 июл. 2021 г. в 16:13, Alex Plehanov <plehanov.a...@gmail.com>: > Hello, Ilya > > > But anyway, I propose to remove the update of the heartbeat from other > threads altogether and wrap the call to listeners in a blockingSection. > I don't quite understand your proposal. Which call to listeners do you > mean? If we wrap the listener into the blocking section the result will be > the same. > Alternatively, I think we can wrap awaitPendingTasksFinished into the > blocking section, this will also solve the problem, but in this case we can > never detect lack of progress by async executor threads and checkpointer > thread hangs due to this reason. > What's wrong with the current fix? It solves the current problem and I hope > will protect us from problems like this in the future. > > вт, 20 июл. 2021 г. в 15:28, Ilya Kazakov <kazakov.i...@gmail.com>: > > > Hi Igniters, hi Alexey. > > > > I want to discuss this issue: > > https://issues.apache.org/jira/browse/IGNITE-15099. I have caught it > too. > > > > I was able to determine where there is a race. > > > > The update of the heartbeat happens asynchronously into the listener > code. > > But we always wait in the checkpoint thread for all pending async > > tasks. And this is reasonable. > > > > for (CheckpointListener lsnr : dbLsnrs) > > lsnr.beforeCheckpointBegin(ctx0); > > > > ctx0.awaitPendingTasksFinished(); > > > > The race was because of inappropriate order of future registration. In > > CheckpointContextImpl.executor () (inside listeners execution) > > > > GridFutureAdapter<?> res = new GridFutureAdapter<>(); > > res.listen(fut -> heartbeatUpdater.updateHeartbeat()); > > asyncRunner.execute(U.wrapIgniteFuture(cmd, res)); > > pendingTaskFuture.add(res); > > > > Here we create a task, submit a task to the executor, and only after this > > do we register the task. Thus we got a situation where checkpointer > thread > > was moving on after ctx0.awaitPendingTasksFinished(); and still, > > the unregistered asyncRunner task was moving on in parallel. > > > > But anyway, I propose to remove the update of the heartbeat from other > > threads altogether and wrap the call to listeners in a blockingSection. > > > > As I understand heartbeat was designed just to indicate self-progress by > a > > worker. If a worker can not indicate self-progress we should wrap such > code > > into blockingSections. In case of listeners, worker can not indicate > > self-progress, thus let's wrap it into blockingSection. > > > > Guys, what do you think about this? > > > > ------------- > > Ilya Kazakov > > > -- Sincerely yours, Ivan Daschinskiy