apurtell edited a comment on pull request #2574: URL: https://github.com/apache/hbase/pull/2574#issuecomment-718875966
@Apache9 I am having trouble responding to your comment above. > After reading this discussion several times, I think the reason we do not want to niterrupte a WAL sync is that it may lead to a region server abort? > I would say this is not the case here. I checked the code again, the actual sync is done in the disruptor thread, in the rpc thread we just block on a SyncFuture(as Andrew mentioned above), the interruption on the rpc thread will just lead to an IOException tp client, the actual sync operation will not be interrupted so we are safe. > So I do not think we need to disable interrupts here? You are correct about the SyncFuture. Initially my thinking was the same as yours. At some point I became concerned about this case, though: - We start the WAL append - SyncFuture is interrupted, client gets an exception, client thinks the mutation failed - WALedit is actually applied to the WAL by the disruptor, so the mutation is included in the WAL, and so we are at risk of the failed from client perspective mutation being applied during WAL replay for some reason So we have a case where the client's understanding of what happened is incorrect. What happens in the WAL and the client's understanding of what happend should be in sync. If the WAL commit fails the client should get an exception. If the WAL commit succeeds the client should see success. Whether we interrupt or not is our option. disableInterrupt/enableInterrupt here is consideration for the client. I can add the above as a code comment next to the disableInterrupt() call, would that help? Here is another case of concern: - We start the WAL append - SyncFuture is interrupted, memstore is not updated, client gets an exception. Unless the WAL is replayed the local cluster A will not have the mutation applied. - WALedit is actually applied to the WAL by the disruptor, so the mutation is included in the WAL, and so it is shipped to the remote cluster, and is applied at the remote cluster B. - Now the data in cluster A and B are out of sync. The disableInterrupt/enableInterrupt pair combines the WAL append and memstore update into a single protected operation with respect to this proposed change. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org