gianm commented on PR #18766:
URL: https://github.com/apache/druid/pull/18766#issuecomment-3722064540
I believe the Docker test failure is due to a new race in
`IngestionSmokeTest#test_streamLogs_ofCancelledTask`. This code now fails
because the `streamOptional` is empty:
```java
eventCollector.latchableEmitter().waitForEvent(
event -> event.hasMetricName("task/run/time")
.hasDimension(DruidMetrics.TASK_ID, taskId)
.hasDimension(DruidMetrics.TASK_STATUS, "FAILED")
);
final Optional<InputStream> streamOptional =
overlord.bindings()
.getInstance(TaskLogStreamer.class)
.streamTaskLog(taskId, 0);
Assertions.assertTrue(streamOptional.isPresent());
```
It happens because, with this patch, the `task/run/time` metric is emitted
in `notifyStatus`, after the metadata store is updated, the task runner is
notified to stop the task, and locks are released. For a canceled task, this
happens before it actually stops running. Formerly, the metric in this scenario
would be emitted when the task runner reports the task as finished and fires
its callback. So, now there's a window between when `task/run/time` is emitted
and when the log is written to S3.
I believe this little window where logs are not available shortly after
tasks are canceled has always existed. I have seen it myself in production:
sometimes you get a 404 on the task log for a recently-canceled task. But now
it's visible to the test due to the timing of metric emission.
I do believe that emitting metrics in `notifyStatus`, as this patch does, is
better than what we were doing before. `notifyStatus` is the method that
updates the metadata store, which is canonical, so tying metrics to that makes
the metrics more accurate. (That was the bug this patch is fixing.)
@kfaraz I'm wondering if you have a better idea to deal with the timing
issue, beyond adding a sleep here. That's the best thing I can think of right
now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]