On 17.08.2023 14:57, Alexander Motin wrote:
On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:
Mateusz Guzik <mjgu...@gmail.com> writes:
Going through the list may or may not reveal other threads doing
something in the area and it very well may be they are deadlocked,
which then results in other processes hanging on them.

Just like in your case, the process reported as hung is a random victim,
and the real culprit, whatever it is, lies deeper.

We already know the real culprit, see upthread.

Dag, I looked through the thread once more, and while I thank you for tracing it, you never went beyond txg_wait_synced() in the `zfs revert` thread.  If you are saying that this thread is holding the lock, then the question is why the transaction commit is stuck.  I need to see stacks for the ZFS sync threads, or better, all kernel stacks, just in case.  Without that information I can only speculate.
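
To illustrate the kind of dump meant here, a minimal sketch, assuming a FreeBSD system where `procstat -kk -a` can be run as root.  The helper names and the keyword list are hypothetical, chosen for illustration only, and are not part of any ZFS tooling.  The unfiltered output is still the more useful thing to send; the filter only helps to spot the sync threads quickly.

```python
import subprocess

# Substrings that usually appear in ZFS sync/ZIL stack frames.
# This list is an assumption for illustration; adjust as needed.
KEYWORDS = ("txg_", "zil_", "spa_sync")


def filter_zfs_stacks(procstat_output, keywords=KEYWORDS):
    """Return the lines of procstat output that mention any keyword."""
    return [
        line
        for line in procstat_output.splitlines()
        if any(k in line for k in keywords)
    ]


def collect_kernel_stacks():
    """Run `procstat -kk -a` (kernel stacks of all threads) and return its text."""
    result = subprocess.run(
        ["procstat", "-kk", "-a"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print only the threads whose kernel stack touches the ZFS sync path.
    for line in filter_zfs_stacks(collect_kernel_stacks()):
        print(line)
```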

Trying to run your test (so far without reproduction), I see it producing a substantial amount of ZIL writes.  The range of commits you have narrowed the scope to so far includes my ZIL locking refactoring, where I know for sure there are some deadlocks.  I have been waiting for 3 weeks now for reviews and tests of the PR that should fix them: https://github.com/openzfs/zfs/pull/15122 .  It would be good if you could test it, though it seems to depend on a few more earlier patches not merged to FreeBSD yet.

Ah, it appears that the pool I tested first has sync=always set from earlier tests.  That explains the high amount of ZIL traffic I saw, so it may be irrelevant.  But I still wonder what the sync threads are doing in your case.

--
Alexander Motin
