bharanic-dev commented on a change in pull request #13130: URL: https://github.com/apache/pulsar/pull/13130#discussion_r763194732
########## File path: pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java ########## @@ -317,7 +346,33 @@ public void invalidateAll() { */ protected void execute(Runnable task, CompletableFuture<?> future) { try { - executor.execute(task); + // Wrap the original task, so we can record the thread on which it is running + TaskWrapper taskWrapper = new TaskWrapper(task); + executorWatchDog.execute(() -> { Review comment: w.r.t your comment: "even if a write operation takes 30seconds, the thread wouldn't normally get blocked for such long time". Just want to clarify my own understanding. Today, the metadata-store callbacks are invoked by the call to execute method: https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L318 And the executor is a single-threaded executor. So, the callbacks can only be executed one at a time. Like you point out there could be 10s or 100s of such tasks. With my changes, the additional overhead is that the task being submitted to the watchdog thread, which in turn delegates to the metadata-store executor. So, there is a "wrapper task creation overhead" and additional latency. But that sholuld not be too bad, compared to the current code? The write operation could take as long as it does, but the watchdog executor is not blocked. It only gets blocked if the callback task gets blocked (which is the case in existing code as well). Is my understanding correct? And regarding this comment: "When you get the stack trace for the task that is blocked, you're only seeing the stack trace of a healthy code path, while you won't see the other threads that have caused the deadlock." While I can't comment on all possible code paths, the specific one in the issue https://github.com/apache/pulsar/issues/12929 The metadata-store thread stack trace is the most relevant one and that is the only one that gets logged in the code I added. If you still think that "submitting a dummy task periodically" is the way to go, I will go work on those changes. But please help clarify the above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org