On Mon, May 30, 2016 at 7:48 PM, Chris Johnson <hittingsm...@gmail.com> wrote:
> I have a RAID6 array that had a failed HDD. The drive failed
> completely and has been removed from the system. I'm running a 'device
> replace' operation with a new disk. The array is ~20TB so this will
> take a few days.
>
> Yesterday the system crashed hard with OOM errors about 24 hours into
> the replace. Rebooting after the crash and remounting the array
> automatically resumed the replace where it left off.
>
> Today I kept a close eye on it and have watched the memory usage creep
> up slowly.
>
> htop says this is user process memory (green bar) but shows no user
> processes using this much memory.
>
> free says this is almost entirely cached/buffered memory that is
> taking up the space.
>
> slabtop reveals that there is a highly unusual amount of SLAB going to
> 'bio' which has to do with block allocation apparently. slabtop output
> is attached.
>
> 'sync && echo 3 > /proc/sys/vm/drop_caches' clears the high usage
> (~4GB) from dentry but 'bio' does not release any (11GB) memory and
> continues to grow slowly.

You are probably hitting a memory leak that was fixed recently. At the
moment the fix is available only in the 4.7-rc1 kernel:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4673272f43ae790ab9ec04e38a7542f82bb8f020
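
If you want to confirm that the 'bio' slab is what keeps growing, and
later check whether a kernel with the fix behaves differently, something
like the following could log it over time. This is only a rough sketch:
slab_bytes is a hypothetical helper name, it assumes the usual
/proc/slabinfo version 2.1 column layout (name, active_objs, num_objs,
objsize, ...), and it estimates memory as num_objs * objsize, which
ignores per-slab overhead.

```python
def slab_bytes(slabinfo_text, prefix="bio"):
    """Return {cache_name: approximate bytes} for slab caches whose
    name starts with `prefix`.

    Assumes the /proc/slabinfo 2.1 layout, where the first four
    whitespace-separated columns are:
        name  active_objs  num_objs  objsize
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version line and the column-header comment line.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(prefix):
            continue
        num_objs = int(fields[2])
        objsize = int(fields[3])
        # Rough estimate: allocated objects times object size.
        usage[fields[0]] = num_objs * objsize
    return usage
```

Reading /proc/slabinfo normally requires root. Calling
slab_bytes(open("/proc/slabinfo").read()) every minute or so and
appending the result to a file should show whether the matching caches
grow steadily while the replace runs.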

>
> This is running the Rockstor distro based on CentOS. The system has
> 16GB of RAM.
>
> Kernel: 4.4.5-1.el7.elrepo.x86_64
> btrfs-progs: 4.4.1
>
> Kernel messages aren't showing anything of note during the replace
> until it starts throwing out OOM errors.
>
> I would like to collect enough information for a useful bug report
> here, but I also can't babysit this rebuild during the work week and
> reboot it once a day for OOM crashes. Should I cancel the replace
> operation and use 'dev delete missing' instead? Will using 'delete
> missing' cause any problem if it's done after a partially completed
> and canceled replace?



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."