Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage) with check/repair/sync
Hello again... So I spent all weekend doing further tests, since this issue is really bugging me for obvious reasons. I thought it would be beneficial if I created a bug report that summarized and centralized everything in one place rather than having everything spread across several lists and posts. Here is the bug report I created: https://bugzilla.kernel.org/show_bug.cgi?id=135481 If anyone has any suggestions, ideas or wants me to do further tests, please just let me know. There is not much more I can do at this point without further help/guidance. Thanks, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On 2016/07/13 22:47, Michal Hocko wrote: > On Wed 13-07-16 15:18:11, Matthias Dahl wrote: >> I tried to figure this out myself but >> couldn't find anything -- what does the number "-3" state? It is the >> position in some chain or has it a different meaning? > > $ git grep "kmem_cache_create.*bio" > block/bio-integrity.c: bip_slab = kmem_cache_create("bio_integrity_payload", > > so there doesn't seem to be any cache like that in the vanilla kernel. > It is snprintf(bslab->name, sizeof(bslab->name), "bio-%d", entry); line in bio_find_or_create_slab() in block/bio.c. I think you can identify who is creating it by printing backtrace at that line.
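To make the naming concrete: as far as I can tell, the "-3" suffix is nothing more than the index of the slot that bio_find_or_create_slab() picked in the kernel's internal bio_slabs table for a given front_pad size; it is not a position in any request chain. A trivial sketch mirroring the snprintf above (the function name here is made up for illustration):

```shell
# Mirrors snprintf(bslab->name, sizeof(bslab->name), "bio-%d", entry)
# from block/bio.c: "entry" is simply the slot index in the bio_slabs
# table, so "bio-3" just means the 4th dynamically created bio slab.
bio_slab_name() {
    printf 'bio-%d' "$1"
}

bio_slab_name 3    # -> bio-3
```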
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage) with check/repair/sync
Hello... I am rather persistent (stubborn?) when it comes to tracking down bugs, if somehow possible... and it seems it paid off... somewhat. ;-) So I did quite a lot of further tests and came up with something very interesting: As long as the RAID is in sync (as-in: sync_action=idle), I cannot for the life of me trigger this issue -- the used memory still explodes to most of the RAM but it oscillates back and forth. I did very stupid things to stress the machine while dd was running as usual on the dm-crypt device. I opened a second dd instance with the same parameters on the dm-crypt device. I wrote a simple program that allocated random amounts of memory (up to 10 GiB), memset them and after a random amount of time released it again -- in a continuous loop. I put heavy network stress on the machine... whatever I could think of. No matter what, the issue did not trigger. And I repeated said tests quite a few times over extended time periods (usually an hour or so). Everything worked beautifully with nice speeds and no noticeable system slow-downs/lag. As soon as I issued a "check" to sync_action of the RAID device, it was just a matter of a second until the OOM killer kicked in and all hell broke loose again. And basically all of my tests were done while the RAID device was syncing -- due to a very unfortunate series of events. I tried to repeat that same test with an external (USB3) connected disk with a Linux s/w RAID10 over two partitions... but unfortunately that behaves rather differently. I assume it is because it is connected through USB and not SATA. While doing those tests on my RAID10 with the 4 internal SATA3 disks, you can see w/ free that the "used memory" does explode to most of the RAM and then oscillates back and forth. With the same test on the external disk though, that does not happen at all. The used memory stays pretty much constant and only the buffers vary... but most of the memory is still free in that case. 
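For reference, the trigger described above boils down to writing "check" into the array's md sysfs sync_action attribute. A minimal hedged helper (the /sys/block/md126/md path is this thread's device and will differ elsewhere; the function name is made up):

```shell
# Start a RAID scrub by writing "check" to the md sync_action attribute,
# which is what reliably triggered the OOM in the tests above. Takes the
# md sysfs directory as an argument so the path is not hard-coded.
start_raid_check() {
    md_dir="$1"
    echo check > "$md_dir/sync_action"
}

# Usage on the system from this thread (needs root):
#   start_raid_check /sys/block/md126/md
#   cat /sys/block/md126/md/sync_action   # "check" while scrubbing, "idle" when done
```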
I hope my persistence on the matter is not annoying and finally leads us somewhere where the real issue hides. Any suggestions, opinions and ideas are greatly appreciated as I have pretty much exhausted mine at this time. Last but not least: I switched my testing to an openSUSE Tumbleweed Live system (x86_64 w/ kernel 4.6.3) as Rawhide w/ 4.7.0rcX behaves rather strangely and is unstable at times. Thanks, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Hello Ondrej... On 2016-07-13 18:24, Ondrej Kozina wrote: One step after another. Sorry, it was not meant to be rude or anything... more frustration, since I cannot be of more help and I really would like to jump in head-first and help fixing it... but lack the necessary insight into the kernel internals. But who knows, I started reading Robert Love's book... so, in a good decade or so. ;-)) https://marc.info/?l=linux-mm&m=146825178722612&w=2 Thanks for that link. I have to read those more closely tomorrow, since there are some nice insights into dm-crypt there. :) Still, you have to admit, it is also rather frustrating/scary if such a crucial subsystem can have bugs over several major versions that do result in complete hangs (and can thus cause corruption) and are quite easily triggerable. It does not instill too much confidence that said subsystem really is as intensively used/tested as assumed. That's at least how I feel about it... So long, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On 07/13/2016 05:32 PM, Matthias Dahl wrote: No matter what, I have no clue how to further diagnose this issue. And given that I already had unsolvable issues with dm-crypt a couple of months ago with my old machine where the system simply hung itself or went OOM when the swap was encrypted and just a few kilobytes needed to be swapped out, I am not so sure anymore I can trust dm-crypt with a full disk encryption to the point where I feel "safe"... as-in, nothing bad will happen or the system won't suddenly hang itself due to it. Or if a bug is introduced, that it will actually be possible to diagnose it and help fix it or that it will even be eventually fixed. Which is really a pity, since I would really have liked to help solve this. With the swap issue, I did git bisects, tests, narrowed it down to kernel versions when said bug was introduced... but in the end, the bug is still present as far as I know. :( One step after another. Matthias, your original report was not forgotten, it's just not so easy to find the real culprit and fix it without causing yet another regression. See the https://marc.info/?l=linux-mm&m=146825178722612&w=2 thread... Not to mention that on current 4.7-rc7 kernels it behaves yet slightly differently (yet far from ideally). Regards O.
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Hello... On 2016-07-13 15:47, Michal Hocko wrote: This is getting out of my area of expertise so I am not sure I can help you much more, I am afraid. That's okay. Thank you so much for investing the time. For what it is worth, I did some further tests and here is what I came up with: If I create the plain dm-crypt device with --perf-submit_from_crypt_cpus, I can run the tests for as long as I want but the memory problem never occurs, meaning buffer/cache increase accordingly and thus free memory decreases but used memory stays constantly low. Yet the problem here is, the system becomes sluggish and throughput is severely impacted. ksoftirqd is hovering at 100% the whole time. Somehow my guess is that normally dm-crypt simply takes every request, encrypts it and queues it internally by itself. And that queue is then slowly emptied to the underlying device kernel queue. That is why I am seeing the exploding increase in used memory (rather than in buffer/cache) which in the end causes an OOM situation. But that is just my guess. And IMHO that is not the right thing to do (tm), as can be seen in this case. No matter what, I have no clue how to further diagnose this issue. And given that I already had unsolvable issues with dm-crypt a couple of months ago with my old machine where the system simply hung itself or went OOM when the swap was encrypted and just a few kilobytes needed to be swapped out, I am not so sure anymore I can trust dm-crypt with a full disk encryption to the point where I feel "safe"... as-in, nothing bad will happen or the system won't suddenly hang itself due to it. Or if a bug is introduced, that it will actually be possible to diagnose it and help fix it or that it will even be eventually fixed. Which is really a pity, since I would really have liked to help solve this. With the swap issue, I did git bisects, tests, narrowed it down to kernel versions when said bug was introduced... but in the end, the bug is still present as far as I know. 
:( I will probably look again into ext4 fs encryption. My whole point is just that in case any of the disks goes faulty and needs to be replaced or sent in for warranty, I don't have to worry about mails, personal or business data still being left on the device (e.g. if it is no longer accessible or has reallocated sectors or whatever) in a readable form. Oh well. Pity, really. Thanks again, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On Wed 13-07-16 15:18:11, Matthias Dahl wrote: > Hello Michal, > > many thanks for all your time and help on this issue. It is very much > appreciated and I hope we can track this down somehow. > > On 2016-07-13 14:18, Michal Hocko wrote: > > > So it seems we are accumulating bios and 256B objects. Buffer heads as > > well but not so much. Having over 4G worth of bios sounds really suspicious. > > Note that they pin pages to be written so this might be consuming the > > rest of the unaccounted memory! So the main question is why those bios > > do not get dispatched or finished. > > Ok. It is the Block IOs that do not get completed. I do get it right > that those bio-3 are already the encrypted data that should be written > out but do not for some reason? Hard to tell. Maybe they are just allocated and waiting for encryption. But this is just wild guessing. > I tried to figure this out myself but > couldn't find anything -- what does the number "-3" state? It is the > position in some chain or has it a different meaning? $ git grep "kmem_cache_create.*bio" block/bio-integrity.c: bip_slab = kmem_cache_create("bio_integrity_payload", so there doesn't seem to be any cache like that in the vanilla kernel. > Do you think a trace like you mentioned would help shed some more light > on this? Or would you recommend something else? Dunno. Seeing who is allocating those bios might be helpful but it won't tell much about what has happened to them after allocation. The tracing would be more helpful for a mem leak situation which doesn't seem to be the case here. This is getting out of my area of expertise so I am not sure I can help you much more, I am afraid. -- Michal Hocko SUSE Labs
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Hello Michal, many thanks for all your time and help on this issue. It is very much appreciated and I hope we can track this down somehow. On 2016-07-13 14:18, Michal Hocko wrote: So it seems we are accumulating bios and 256B objects. Buffer heads as well but not so much. Having over 4G worth of bios sounds really suspicious. Note that they pin pages to be written so this might be consuming the rest of the unaccounted memory! So the main question is why those bios do not get dispatched or finished. Ok. It is the Block IOs that do not get completed. I do get it right that those bio-3 are already the encrypted data that should be written out but do not for some reason? I tried to figure this out myself but couldn't find anything -- what does the number "-3" state? It is the position in some chain or has it a different meaning? Do you think a trace like you mentioned would help shed some more light on this? Or would you recommend something else? I have also cc'd Mike Snitzer who commented on this issue before, maybe he can see some pattern here as well. Pity that Neil Brown is no longer available as I think this is also somehow related to it being an Intel Rapid Storage RAID10... since it is the only way I can reproduce it. :( Thanks, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On Wed 13-07-16 13:21:26, Michal Hocko wrote: > On Tue 12-07-16 16:56:32, Matthias Dahl wrote: [...] > > If that support is baked into the Fedora provided kernel that is. If > > you could give me a few hints or pointers, how to properly do an allocator > > trace point and get some decent data out of it, that would be nice. > > You need to have a kernel with CONFIG_TRACEPOINTS and then enable them > via debugfs. You are interested in kmalloc tracepoint and specify a size > as a filter to only see those that are really interesting. I haven't > checked your slabinfo yet - hope to get to it later today. The largest contributors seem to be

$ zcat slabinfo.txt.gz | awk '{printf "%s %d\n" , $1, $6*$15}' | head_and_tail.sh 133 | paste-with-diff.sh | sort -n -k3

                      initial  diff [#pages]
 radix_tree_node      3444     2592
 debug_objects_cache  388      46159
 file_lock_ctx        114      138570
 buffer_head          5616     238704
 kmalloc-256          328      573164
 bio-3                24       1118984

So it seems we are accumulating bios and 256B objects. Buffer heads as well but not so much. Having over 4G worth of bios sounds really suspicious. Note that they pin pages to be written so this might be consuming the rest of the unaccounted memory! So the main question is why those bios do not get dispatched or finished. -- Michal Hocko SUSE Labs
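The helper scripts in that pipeline (head_and_tail.sh, paste-with-diff.sh) are not public; a self-contained equivalent for ranking a single snapshot is sketched below. In /proc/slabinfo, field 6 is pagesperslab and field 15 is num_slabs, so $6 * $15 approximates each cache's page footprint (the function name is made up for illustration):

```shell
# Rank slab caches in one /proc/slabinfo snapshot by page footprint.
# Column 6 = pagesperslab, column 15 = num_slabs; the two header lines
# are skipped. Largest consumers end up at the bottom of the output.
slab_pages() {
    awk 'NR > 2 { printf "%s %d\n", $1, $6 * $15 }' "$1" | sort -n -k2
}

# slab_pages /proc/slabinfo | tail   # top consumers, e.g. bio-3, kmalloc-256
```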
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On Tue 12-07-16 16:56:32, Matthias Dahl wrote: > Hello Michal... > > On 2016-07-12 16:07, Michal Hocko wrote: > > > /proc/slabinfo could at least point on who is eating that memory. > > Thanks. I have made another test (and thus again put the RAID10 out of > sync for the 100th time, sigh) and made regular snapshots of slabinfo > which I have attached to this mail. > > > Direct IO doesn't get throttled like buffered IO. > > Is buffered i/o not used in both cases if I don't explicitly request > direct i/o? > > dd if=/dev/zero of=/dev/md126p5 bs=512K > and dd if=/dev/zero of=/dev/mapper/test-device bs=512K OK, I misunderstood your question though. You were mentioning the direct IO earlier so I thought you were referring to it here as well. > Given that the test-device is dm-crypt on md125p5. Aren't both using > buffered i/o? Yes they are. > > the number of pages under writeback was more or less same throughout > > the time but there are some local fluctuations when some pages do get > > completed. > > The pages under writeback are those directly destined for the disk, so > after dm-crypt had done its encryption? Those are submitted for the IO. dm-crypt will allocate a "shadow" page for each of them to perform the encryption and only then submit the IO to the storage underneath, see http://lkml.kernel.org/r/alpine.lrh.2.02.1607121907160.24...@file01.intranet.prod.int.rdu2.redhat.com > > If not you can enable allocator trace point for a particular object > > size (or range of sizes) and see who is requesting them. > > If that support is baked into the Fedora provided kernel that is. If > you could give me a few hints or pointers, how to properly do an allocator > trace point and get some decent data out of it, that would be nice. You need to have a kernel with CONFIG_TRACEPOINTS and then enable them via debugfs. You are interested in kmalloc tracepoint and specify a size as a filter to only see those that are really interesting. 
I haven't checked your slabinfo yet - hope to get to it later today. -- Michal Hocko SUSE Labs
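The debugfs steps Michal outlines look roughly like the following hedged sketch. Assumptions: tracefs is mounted at the usual /sys/kernel/debug/tracing, root privileges, and a bytes_req filter of 256 to match the suspicious kmalloc-256 cache; the function name is made up. Written over the tracing directory so it is not hard-coded:

```shell
# Enable the kmem:kmalloc tracepoint with a size filter so only
# allocations of the suspicious size get logged, as suggested above.
# $1 = tracing directory (normally /sys/kernel/debug/tracing),
# $2 = filter expression, e.g. 'bytes_req == 256'.
enable_kmalloc_trace() {
    tracing_dir="$1"
    filter="$2"
    echo "$filter" > "$tracing_dir/events/kmem/kmalloc/filter"
    echo 1 > "$tracing_dir/events/kmem/kmalloc/enable"
}

# Usage (needs root), collecting while the dd test reproduces the issue:
#   enable_kmalloc_trace /sys/kernel/debug/tracing 'bytes_req == 256'
#   cat /sys/kernel/debug/tracing/trace_pipe > /tmp/kmalloc.log &
```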
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Hello Michal... On 2016-07-12 16:07, Michal Hocko wrote: /proc/slabinfo could at least point on who is eating that memory. Thanks. I have made another test (and thus again put the RAID10 out of sync for the 100th time, sigh) and made regular snapshots of slabinfo which I have attached to this mail. Direct IO doesn't get throttled like buffered IO. Is buffered i/o not used in both cases if I don't explicitly request direct i/o? dd if=/dev/zero of=/dev/md126p5 bs=512K and dd if=/dev/zero of=/dev/mapper/test-device bs=512K Given that the test-device is dm-crypt on md125p5. Aren't both using buffered i/o? the number of pages under writeback was more or less same throughout the time but there are some local fluctuations when some pages do get completed. The pages under writeback are those directly destined for the disk, so after dm-crypt had done its encryption? If not you can enable allocator trace point for a particular object size (or range of sizes) and see who is requesting them. If that support is baked into the Fedora provided kernel that is. If you could give me a few hints or pointers, how to properly do an allocator trace point and get some decent data out of it, that would be nice. Thanks, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration
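A simple way to get the periodic snapshots described above (hedged sketch; reading /proc/slabinfo typically needs root, and the function name is made up):

```shell
# Periodically snapshot a procfs file (e.g. /proc/slabinfo) into
# numbered files for later diffing. $1 = source file, $2 = output dir,
# $3 = number of snapshots, $4 = interval in seconds.
snapshot_file() {
    src="$1"; out="$2"; count="$3"; interval="$4"
    mkdir -p "$out"
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$src" > "$out/snap.$i"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage while the dd test runs (about one minute of 500 ms snapshots):
#   snapshot_file /proc/slabinfo /tmp/slabs 120 0.5
```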
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On Tue 12-07-16 14:42:12, Matthias Dahl wrote: > Hello Michal... > > On 2016-07-12 13:49, Michal Hocko wrote: > > > I am not a storage expert (not even mention dm-crypt). But what those > > counters say is that the IO completion doesn't trigger so the > > PageWriteback flag is still set. Such a page is not reclaimable > > obviously. So I would check the IO delivery path and focus on the > > potential dm-crypt involvement if you suspect this is a contributing > > factor. > > Sounds reasonable... except that I have no clue how to trace that with > the limited means I have at my disposal right now and with the limited > knowledge I have of the kernel internals. ;-) I guess dm or block layer experts would be much better placed to advise here. > > Who is consuming those objects? Where is the rest 70% of memory hiding? > > Is there any way to get a more detailed listing of where the memory is > spent while dd is running? Something I could pipe every 500ms or so for > later analysis or so? /proc/slabinfo could at least point on who is eating that memory. > > Writer will get throttled but the concurrent memory consumer will not > > normally. So you can end up in this situation. > > Hm, okay. I am still confused though: If I, for example, let dd do the > exact same thing on a raw partition on the RAID10, nothing like that > happens. Wouldn't we have the same race and problem then too...? Direct IO doesn't get throttled like buffered IO. > It is > only with dm-crypt in-between that all of this shows itself. But I do > somehow suspect the RAID10 Intel Rapid Storage to be the cause or at > least partially. Well, there are many allocation failures for GFP_ATOMIC requests from scsi_request_fn path. AFAIU the code, the request is deferred and retried later. I cannot find any path which would do regular __GFP_DIRECT_RECLAIM fallback allocation. 
So while GFP_ATOMIC is ___GFP_KSWAPD_RECLAIM so it would kick kswapd which should reclaim some memory, there is no guarantee for a forward progress. Anyway, I believe even __GFP_DIRECT_RECLAIM allocation wouldn't help much in this particular case. There is not much of a reclaimable memory left - most of it is dirty/writeback - so we are close to a deadlock state. But we do not seem to be stuck completely:

unevictable:12341 dirty:90458 writeback:651272 unstable:0
unevictable:12341 dirty:90458 writeback:651272 unstable:0
unevictable:12341 dirty:90458 writeback:651272 unstable:0
unevictable:12341 dirty:90222 writeback:651252 unstable:0
unevictable:12341 dirty:90222 writeback:651231 unstable:0
unevictable:12341 dirty:89321 writeback:651905 unstable:0
unevictable:12341 dirty:89212 writeback:652014 unstable:0
unevictable:12341 dirty:89212 writeback:651993 unstable:0
unevictable:12341 dirty:89212 writeback:651993 unstable:0
unevictable:12488 dirty:42892 writeback:656597 unstable:0
unevictable:12488 dirty:42783 writeback:656597 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12556 dirty:54778 writeback:648616 unstable:0
unevictable:12556 dirty:54168 writeback:648919 unstable:0
unevictable:12556 dirty:54168 writeback:648919 unstable:0
unevictable:12556 dirty:53237 writeback:649506 unstable:0
unevictable:12556 dirty:53237 writeback:649506 unstable:0
unevictable:12556 dirty:53128 writeback:649615 unstable:0
unevictable:12556 dirty:53128 writeback:649615 unstable:0
unevictable:12556 dirty:52256 writeback:650159 unstable:0
unevictable:12556 dirty:52256 writeback:650159 unstable:0
unevictable:12556 dirty:52256 writeback:650138 unstable:0
unevictable:12635 dirty:49929 writeback:650724 unstable:0
unevictable:12635 dirty:49820 writeback:650833 unstable:0
unevictable:12635 dirty:49820 writeback:650833 unstable:0
unevictable:12635 dirty:49820 writeback:650833 unstable:0
unevictable:13001 dirty:167859 writeback:651864 unstable:0
unevictable:13001 dirty:167672 writeback:652038 unstable:0

the number of pages under writeback was more or less the same throughout the time but there are some local fluctuations when some pages do get completed. That being said, I believe that IO is stuck due to lack of memory which is caused by some memory leak or excessive memory consumption. Finding out who that might be would be the first step. /proc/slabinfo should show us which slab cache is backing so many unreclaimable objects.
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On Tue 12-07-16 13:49:20, Michal Hocko wrote: > On Tue 12-07-16 13:28:12, Matthias Dahl wrote: > > Hello Michal... > > > > On 2016-07-12 11:50, Michal Hocko wrote: > > > > > This smells like file pages are stuck in the writeback somewhere and the > > > anon memory is not reclaimable because you do not have any swap device. > > > > Not having a swap device shouldn't be a problem -- and in this case, it > > would cause even more trouble as in disk i/o. > > > > What could cause the file pages to get stuck or stopped from being written > > to the disk? And more importantly, what is so unique/special about the > > Intel Rapid Storage that it happens (seemingly) exclusively with that > > and not the normal Linux s/w raid support? > > I am not a storage expert (not even mention dm-crypt). But what those > counters say is that the IO completion doesn't trigger so the > PageWriteback flag is still set. Such a page is not reclaimable > obviously. So I would check the IO delivery path and focus on the > potential dm-crypt involvement if you suspect this is a contributing > factor. > > > Also, if the pages are not written to disk, shouldn't something error > > out or slow dd down? > > Writers are normally throttled when we hit the dirty limit. You seem to have > dirty_ratio set to 20% which is quite a lot considering how much memory > you have. And just to clarify. dirty_ratio refers to dirtyable memory which is free_pages+file_lru pages. In your case you have only 9% of the total memory size dirty/writeback but that is 90% of dirtyable memory. This is quite possible if somebody consumes free_pages racing with the writer. Writer will get throttled but the concurrent memory consumer will not normally. So you can end up in this situation. 
> If you get back to the memory info from the OOM killer report: > [18907.592209] active_anon:110314 inactive_anon:295 isolated_anon:0 > active_file:27534 inactive_file:819673 isolated_file:160 > unevictable:13001 dirty:167859 writeback:651864 unstable:0 > slab_reclaimable:177477 slab_unreclaimable:1817501 > mapped:934 shmem:588 pagetables:7109 bounce:0 > free:49928 free_pcp:45 free_cma:0 > > The dirty+writeback is ~9%. What is more interesting, though, LRU > pages are negligible compared to the memory size (~11%). Note the number of > unreclaimable slab pages (~20%). Who is consuming those objects? > Where is the rest 70% of memory hiding? -- Michal Hocko SUSE Labs
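Michal's two percentages can be reproduced from the page counts in the quoted report. The only number not present in the excerpt is the machine's total memory; 32 GiB (8388608 4-KiB pages) is assumed below and only affects the "~9%" figure:

```shell
# Back-of-envelope check of the ~9% vs ~90% figures, in 4-KiB pages,
# using the counts from the OOM report quoted above.
dirty=167859; writeback=651864
free=49928; active_file=27534; inactive_file=819673
total=8388608            # assumption: 32 GiB RAM / 4 KiB pages

dw=$((dirty + writeback))
dirtyable=$((free + active_file + inactive_file))   # free_pages + file LRU

echo "share of total memory:     $((dw * 100 / total))%"       # ~9%
echo "share of dirtyable memory: $((dw * 100 / dirtyable))%"   # ~91%
```

So the writer was in fact sitting at its dirty limit; it is the rest of the (unaccounted, mostly slab) memory that pushed the system over the edge.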
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On Tue 12-07-16 13:28:12, Matthias Dahl wrote: > Hello Michal... > > On 2016-07-12 11:50, Michal Hocko wrote: > > > This smells like file pages are stuck in the writeback somewhere and the > > anon memory is not reclaimable because you do not have any swap device. > > Not having a swap device shouldn't be a problem -- and in this case, it > would cause even more trouble as in disk i/o. > > What could cause the file pages to get stuck or stopped from being written > to the disk? And more importantly, what is so unique/special about the > Intel Rapid Storage that it happens (seemingly) exclusively with that > and not the normal Linux s/w raid support? I am not a storage expert (not even mention dm-crypt). But what those counters say is that the IO completion doesn't trigger so the PageWriteback flag is still set. Such a page is not reclaimable obviously. So I would check the IO delivery path and focus on the potential dm-crypt involvement if you suspect this is a contributing factor. > Also, if the pages are not written to disk, shouldn't something error > out or slow dd down? Writers are normally throttled when we hit the dirty limit. You seem to have dirty_ratio set to 20% which is quite a lot considering how much memory you have. If you get back to the memory info from the OOM killer report: [18907.592209] active_anon:110314 inactive_anon:295 isolated_anon:0 active_file:27534 inactive_file:819673 isolated_file:160 unevictable:13001 dirty:167859 writeback:651864 unstable:0 slab_reclaimable:177477 slab_unreclaimable:1817501 mapped:934 shmem:588 pagetables:7109 bounce:0 free:49928 free_pcp:45 free_cma:0 The dirty+writeback is ~9%. What is more interesting, though, LRU pages are negligible compared to the memory size (~11%). Note the number of unreclaimable slab pages (~20%). Who is consuming those objects? Where is the rest 70% of memory hiding? -- Michal Hocko SUSE Labs
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Hello Michal... On 2016-07-12 13:49, Michal Hocko wrote: I am not a storage expert (not even mention dm-crypt). But what those counters say is that the IO completion doesn't trigger so the PageWriteback flag is still set. Such a page is not reclaimable obviously. So I would check the IO delivery path and focus on the potential dm-crypt involvement if you suspect this is a contributing factor. Sounds reasonable... except that I have no clue how to trace that with the limited means I have at my disposal right now and with the limited knowledge I have of the kernel internals. ;-) Who is consuming those objects? Where is the rest 70% of memory hiding? Is there any way to get a more detailed listing of where the memory is spent while dd is running? Something I could pipe every 500ms or so for later analysis or so? Writer will get throttled but the concurrent memory consumer will not normally. So you can end up in this situation. Hm, okay. I am still confused though: If I, for example, let dd do the exact same thing on a raw partition on the RAID10, nothing like that happens. Wouldn't we have the same race and problem then too...? It is only with dm-crypt in-between that all of this shows itself. But I do somehow suspect the RAID10 Intel Rapid Storage to be the cause or at least partially. Like I said, if you have any pointers how I could further trace this or figure out who is exactly consuming what memory, that would be very helpful... Thanks. So long, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
On Tue 12-07-16 10:27:37, Matthias Dahl wrote: > Hello, > > I posted this issue already on linux-mm, linux-kernel and dm-devel a > few days ago and after further investigation it seems like this > issue is somehow related to the fact that I am using an Intel Rapid > Storage RAID10, so I am summarizing everything again in this mail > and include linux-raid in my post. Sorry for the noise... :( > > I am currently setting up a new machine (since my old one broke down) > and I ran into a lot of "Unable to allocate memory on node -1" warnings > while using dm-crypt. I have attached as much of the full log as I could > recover. > > The encrypted device is sitting on a RAID10 (software raid, Intel Rapid > Storage). I am currently limited to testing via Linux live images since > the machine is not yet properly setup but I did my tests across several > of those. > > Steps to reproduce are: > > 1) > cryptsetup -s 512 -d /dev/urandom -c aes-xts-plain64 open --type plain > /dev/md126p5 test-device > > 2) > dd if=/dev/zero of=/dev/mapper/test-device status=progress bs=512K > > While running and monitoring the memory usage with free, it can be seen > that the used memory increases rapidly and after just a few seconds, the > system is out of memory and page allocation failures start to be issued > as well as the OOM killer gets involved. 
Here are two instances of the oom killer Mem-Info:

[18907.592206] Mem-Info:
[18907.592209] active_anon:110314 inactive_anon:295 isolated_anon:0
 active_file:27534 inactive_file:819673 isolated_file:160
 unevictable:13001 dirty:167859 writeback:651864 unstable:0
 slab_reclaimable:177477 slab_unreclaimable:1817501
 mapped:934 shmem:588 pagetables:7109 bounce:0
 free:49928 free_pcp:45 free_cma:0

[18908.976349] Mem-Info:
[18908.976352] active_anon:109647 inactive_anon:295 isolated_anon:0
 active_file:27535 inactive_file:819602 isolated_file:128
 unevictable:13001 dirty:167672 writeback:652038 unstable:0
 slab_reclaimable:177477 slab_unreclaimable:1817828
 mapped:934 shmem:588 pagetables:7109 bounce:0
 free:50252 free_pcp:91 free_cma:0

This smells like file pages are stuck in the writeback somewhere and the anon memory is not reclaimable because you do not have any swap device. -- Michal Hocko SUSE Labs
Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Hello Michal... On 2016-07-12 11:50, Michal Hocko wrote: This smells like file pages are stuck in the writeback somewhere and the anon memory is not reclaimable because you do not have any swap device. Not having a swap device shouldn't be a problem -- and in this case, it would cause even more trouble as in disk i/o. What could cause the file pages to get stuck or stopped from being written to the disk? And more importantly, what is so unique/special about the Intel Rapid Storage that it happens (seemingly) exclusively with that and not the normal Linux s/w raid support? Also, if the pages are not written to disk, shouldn't something error out or slow dd down? Obviously dd is capable of copying zeros a lot faster than they could ever be written to disk -- and still, it works just fine without dm-crypt in-between. It is only when dm-crypt /is/ involved, that the memory gets filled up and things get out of control. Thanks, Matthias -- Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu services: custom software [desktop, mobile, web], server administration