On Wed, Jul 18, 2018 at 08:05:51AM +0800, Qu Wenruo wrote:
> No OOM triggers? That's a little strange.
> Maybe it's related to how kernel handles memory over-commit?

Yes, I think you are correct.
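A quick first check on the over-commit question is which policy the kernel is actually running with. Below is a minimal Python sketch. The helper names are illustrative, not from any tool. Only the /proc/sys/vm files it reads are real kernel interfaces, and the mode descriptions paraphrase the kernel's overcommit-accounting documentation.

```python
"""Report the kernel's memory overcommit policy (Linux only).

Minimal sketch: helper names are illustrative; only the
/proc/sys/vm files it reads are real kernel interfaces.
"""
from pathlib import Path

# Values of vm.overcommit_memory, paraphrasing the kernel's
# overcommit-accounting documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(value: str) -> str:
    """Translate raw sysctl contents such as '2' into words."""
    mode = int(value.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit(root: str = "/proc/sys/vm") -> dict:
    """Read the overcommit mode and ratio from /proc/sys/vm."""
    base = Path(root)
    mode = describe_overcommit((base / "overcommit_memory").read_text())
    ratio = int((base / "overcommit_ratio").read_text())
    return {"mode": mode, "overcommit_ratio_percent": ratio}


if __name__ == "__main__":
    print(read_overcommit())
```

overcommit_ratio only matters in mode 2, where it sets what share of RAM (on top of swap) may be committed.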
> And for the hang, I think it's related to some memory allocation failure
> and error handler just didn't handle it well, so it's causing deadlock
> for certain page.

That indeed matches what I'm seeing.

> ENOMEM handling is pretty common but hardly verified, so it's not that
> strange, but we must locate the problem.

I seem to be getting deadlocks in the kernel, so I'm hoping that it's at
least checked there, but maybe not?

> In my system, at least I'm not using btrfs as root fs, and for the
> memory eating program I normally ensure it's eating all the memory +
> swap, so OOM killer is always triggered, maybe that's the cause.
>
> So in your case, maybe it's btrfs not really taking up all memory, thus
> OOM killer not triggered.

Correct, the swap is not used.

> Any kernel dmesg about OOM killer triggered?

Nothing at all. It never gets triggered.

> > Here is my system when it virtually died:
> > USER       PID %CPU %MEM      VSZ      RSS TTY    STAT START TIME COMMAND
> > root     31006 21.2 90.7 29639020 29623180 pts/19 D+   13:49 1:35 ./btrfs check /dev/mapper/dshelf2

See how btrfs was taking 29GB in that ps output (that's before it takes
everything and I can't even type ps anymore).
Note that VSZ is almost equal to RSS. Nothing gets swapped.

Then see the free output:

> >              total       used       free     shared    buffers     cached
> > Mem:      32643788   32180100     463688          0      44664     119508
> > -/+ buffers/cache:   32015928     627860
> > Swap:     15616764     443676   15173088
>
> For swap, it looks like only some other program's memory is swapped out,
> not btrfs'.

That's exactly correct. btrfs check never goes to swap; I'm not sure why.
And because there is still virtual memory (swap) free, maybe that's why
the OOM killer does not trigger?

So I guess I can probably "fix" my problem by removing swap, but
ultimately it would be useful to know why memory taken by btrfs check
does not end up in swap.

> And unfortunately, I'm not so familiar with OOM/MM code outside of
> filesystem.
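To confirm from the kernel's own per-process accounting, rather than from ps, that none of btrfs check's memory is in swap, /proc/<pid>/status can be read directly. Below is a minimal Python sketch. The function names are illustrative. VmRSS, VmSwap and VmLck are real status fields. A non-zero VmLck would mean pages are locked with mlock() and cannot be swapped. That is only one thing to rule out, not a diagnosis.

```python
"""Show how much of a process's memory is resident, swapped, or locked.

Minimal sketch: function names are illustrative; VmRSS, VmSwap and
VmLck are real fields of /proc/<pid>/status on reasonably recent kernels.
"""
import sys
from pathlib import Path


def parse_status(text: str) -> dict:
    """Return the kB-valued Vm* fields of a /proc/<pid>/status dump."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if key.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields


def format_report(pid: int, vm: dict) -> str:
    """Summarise resident, swapped and mlock()ed memory in kB."""
    rss = vm.get("VmRSS", 0)
    swap = vm.get("VmSwap", 0)  # field is missing on very old kernels
    locked = vm.get("VmLck", 0)  # pages pinned with mlock() can't be swapped
    return f"pid {pid}: VmRSS={rss} kB VmSwap={swap} kB VmLck={locked} kB"


if __name__ == "__main__":
    pid = int(sys.argv[1])
    status = Path(f"/proc/{pid}/status").read_text()
    print(format_report(pid, parse_status(status)))
```

Run it against the btrfs check PID while it grows, e.g. `python3 vmstatus.py 31006` (the script name is hypothetical). A VmSwap that stays at 0 kB next to a large VmRSS would confirm what the ps output suggests.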
> Any help from other experienced developers would definitely help to
> solve why memory of 'btrfs check' is not swapped out or why OOM killer
> is not triggered.

Do you have someone from linux-vm you might be able to ask, or should we
Cc this thread there?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/