On 07/03/2026 13:01, Michael Kelly wrote:
On 07/03/2026 11:21, Samuel Thibault wrote:
> Locked? Then that's the issue. Having a lock while waiting for memory
> would indeed be a sure path to deadlock. pmap_enter notably releases its
> PVH and pmap precisely to avoid such deadlocks.
>
> I however didn't see where vm_fault/pmap_enter locks the map?

I made a mistake with that conclusion: I can't see how it could have the map locked now either. It does seem like some thread holds the lock, though, as there are 9 threads awaiting a write lock and 1 awaiting a read. I can't prove that they are all waiting for the same lock, although it seems likely, and I should have tried to find that out at the time. Similarly, it might also have helped to record the bit flags associated with the lock.

Anyway, I can run it again to try and supply more detail.

I made more than one mistake with the initial analysis. It is so far off that I wonder whether I was even examining the correct virtual machine. Please ignore everything that has gone before; I must have been hallucinating too. Here is what is actually happening.

During sbuilds of Haskell packages, dependencies with a large installed size get installed (ghc-doc, for example, is ~700M). Often, during the write of this data, the system enters a blocked state. Normal page allocation is suspended, so non-vm-privileged tasks, including the ext2fs servers, soon block if they require more memory. Any process accessing file storage is then likely to block on pagein from the stalled servers, so even the console becomes unresponsive.

The system is not actually totally stuck: pageout processing continues at a low level. There is no default pager running, so only external pages can be considered for pageout. Appropriate memory_object_data_return requests are issued to external pagers at a rate of approximately 100 per second. The CPU load is so low that the virtual machine's 'CPU usage' graph superficially looks like zero. None of these m_o_d_r messages can be handled, and so free pages steadily decline.

I added some debugging to log every 100th pageout attempt from when vm_page_alloc_paused becomes set. In one example, free pages steadily drop from ~67500 to ~32000 over a period of ~22 minutes. Then suddenly the pageout processing comes across a large series of pages (~38000) that can be trivially reclaimed, which is sufficient to terminate the pageout activity and resume normal page allocation. The system becomes usable again.

Might it be that boralus is also behaving this way without it being noticed? The use of sync=5 might reduce the likelihood of this occurring, I'd guess, but I have also seen this scenario occur with sync=5 myself.

The fundamental problem is that ext2fs, being unprivileged, cannot allocate memory in order to allow other memory to be released. This is well known, I believe, but we need to do something to reduce the likelihood of this scenario, as there could be cases in which the system does not recover at all. For example, if internal memory usage were dominant and a large write quickly consumed the remaining pages (before unprivileged allocation was suspended, and before sync could process the written pages), there might be too few pages available to page out at all.

Regards,

Mike.

