bharanic-dev commented on a change in pull request #13130:
URL: https://github.com/apache/pulsar/pull/13130#discussion_r763194732



##########
File path: 
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java
##########
@@ -317,7 +346,33 @@ public void invalidateAll() {
      */
     protected void execute(Runnable task, CompletableFuture<?> future) {
         try {
-            executor.execute(task);
+            // Wrap the original task, so we can record the thread on which it 
is running
+            TaskWrapper taskWrapper = new TaskWrapper(task);
+            executorWatchDog.execute(() -> {

Review comment:
       w.r.t your comment: "even if a write operation takes 30seconds, the 
thread wouldn't normally get blocked for such long time".
   
   Just want to clarify my own understanding. Today, the metadata-store 
callbacks are invoked by the call to execute method:
   
   
https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L318
   
   And the executor is a single-threaded executor. So, the callbacks can only 
be executed one at a time. Like you point out there could be 10s or 100s of 
such tasks.
   
   With my changes, the additional overhead is that the task being submitted to 
the watchdog thread, which in turn delegates to the metadata-store executor. 
So, there is a "wrapper task creation overhead" and additional latency. But 
that sholuld not be too bad, compared to the current code? The write operation 
could take as long as it does, but the watchdog executor is not blocked. It 
only gets blocked if the callback task gets blocked (which is the case in 
existing code as well). Is my understanding correct?
   
   And regarding this comment: "When you get the stack trace for the task that 
is blocked, you're only seeing the stack trace of a healthy code path, while 
you won't see the other threads that have caused the deadlock."
   
   While I can't comment on all possible code paths, the specific one in the 
issue 
   
   https://github.com/apache/pulsar/issues/12929
   
   The metadata-store thread stack trace is the most relevant one and that is 
the only one that gets logged in the code I added.
   
   If you still think that "submitting a dummy task periodically" is the way to 
go, I will go work on those changes. But please help clarify the above.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to