Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage) with check/repair/sync

2016-07-18 Thread Matthias Dahl

Hello again...

So I spent all weekend doing further tests, since this issue is
really bugging me for obvious reasons.

I thought it would be beneficial if I created a bug report that
summarized and centralized everything in one place rather than
having everything spread across several lists and posts.

Here is the bug report I created:
https://bugzilla.kernel.org/show_bug.cgi?id=135481

If anyone has any suggestions, ideas or wants me to do further tests,
please just let me know. There is not much more I can do at this point
without further help/guidance.

Thanks,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-15 Thread Tetsuo Handa
On 2016/07/13 22:47, Michal Hocko wrote:
> On Wed 13-07-16 15:18:11, Matthias Dahl wrote:
>> I tried to figure this out myself but
>> couldn't find anything -- what does the number "-3" state? It is the
>> position in some chain or has it a different meaning?
> 
> $ git grep "kmem_cache_create.*bio"
> block/bio-integrity.c:  bip_slab = kmem_cache_create("bio_integrity_payload",
> 
> so there doesn't seem to be any cache like that in the vanilla kernel.
> 
It comes from the

  snprintf(bslab->name, sizeof(bslab->name), "bio-%d", entry);

line in bio_find_or_create_slab() in block/bio.c.
I think you can identify who is creating it by printing a backtrace at that line.



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage) with check/repair/sync

2016-07-15 Thread Matthias Dahl

Hello...

I am rather persistent (stubborn?) when it comes to tracking down bugs,
if somehow possible... and it seems it paid off... somewhat. ;-)

So I did quite a few more tests and came up with something very
interesting: As long as the RAID is in sync (as in: sync_action=idle),
I cannot for the life of me trigger this issue -- the used memory
still explodes to most of the RAM but it oscillates back and forth.

I did very stupid things to stress the machine while dd was running as
usual on the dm-crypt device. I opened a second dd instance with the
same parameters on the dm-crypt device. I wrote a simple program that
allocated random amounts of memory (up to 10 GiB), memset them and after
a random amount of time released it again -- in a continuous loop. I
put heavy network stress on the machine... whatever I could think of.
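
For reference, the memory stress program described above could look roughly like the following. This is a minimal sketch, not the program that was actually used: the function names, the 4 KiB page size and the parameters are assumptions made for illustration.

```python
import random
import time


def stress_once(max_bytes, max_hold_s, rng=random):
    """Allocate a random-sized buffer, touch every page, hold it, release it.

    Returns the number of bytes that were allocated.
    """
    size = rng.randint(1, max_bytes)
    buf = bytearray(size)
    # Write one byte per 4 KiB page so the kernel has to back the allocation
    # with real memory (the 4 KiB page size is an assumption).
    touched = len(range(0, size, 4096))
    buf[::4096] = b"\xa5" * touched
    time.sleep(rng.uniform(0, max_hold_s))
    del buf
    return size


def stress_loop(iterations, max_bytes, max_hold_s):
    """Run stress_once() repeatedly and return the total bytes allocated."""
    return sum(stress_once(max_bytes, max_hold_s) for _ in range(iterations))
```

Calling stress_loop() with max_bytes set to 10 * 2**30 would approximate the "up to 10 GiB" case; smaller values are safer for a first try.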

No matter what, the issue did not trigger. And I repeated said tests
quite a few times over extended time periods (usually an hour or so).
Everything worked beautifully with nice speeds and no noticeable system
slow-downs/lag.

As soon as I issued a "check" to sync_action of the RAID device, it was
just a matter of a second until the OOM killer kicked in and all hell
broke loose again. And basically all of my tests were done while the
RAID device was syncing -- due to a very unfortunate series of events.
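
Switching the array between idle and check can be scripted. The sketch below assumes the md sysfs layout /sys/block/<md>/md/sync_action; the helper names and the restricted set of actions are choices made for this example, and writing to the real file needs root.

```python
from pathlib import Path


def sync_action_path(md_name, sysfs_root="/sys"):
    """Location of the sync_action attribute for an md array such as 'md126'."""
    return Path(sysfs_root) / "block" / md_name / "md" / "sync_action"


def get_sync_action(md_name, sysfs_root="/sys"):
    """Return the current action, e.g. 'idle' or 'check'."""
    return sync_action_path(md_name, sysfs_root).read_text().strip()


def set_sync_action(md_name, action, sysfs_root="/sys"):
    """Request a new action; only the ones relevant to this thread are allowed."""
    if action not in {"idle", "check", "repair"}:
        raise ValueError(f"unsupported action for this sketch: {action!r}")
    sync_action_path(md_name, sysfs_root).write_text(action + "\n")
```

The sysfs_root parameter only exists so the helpers can be tried against a scratch directory before touching a real array.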

I tried to repeat that same test with an external (USB3) connected disk
with a Linux s/w RAID10 over two partitions... but unfortunately that
behaves rather differently. I assume it is because it is connected
through USB and not SATA. While doing those tests on my RAID10 with the
4 internal SATA3 disks, you can see w/ free that the "used memory" does
explode to most of the RAM and then oscillates back and forth. With the
same test on the external disk though, that does not happen at all. The
used memory stays pretty much constant and only the buffers vary... but
most of the memory is still free in that case.

I hope my persistence on the matter is not annoying and finally leads us
somewhere where the real issue hides.

Any suggestions, opinions and ideas are greatly appreciated as I have
pretty much exhausted mine at this time.

Last but not least: I switched my testing to an OpenSuSE Tumbleweed Live
system (x86_64 w/ kernel 4.6.3) as Rawhide w/ 4.7.0rcX behaves rather
strangely and is unstable at times.

Thanks,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Matthias Dahl

Hello Ondrej...

On 2016-07-13 18:24, Ondrej Kozina wrote:

> One step after another.

Sorry, it was not meant to be rude or anything... more frustration
since I cannot be of more help and I really would like to jump in
head-first and help fix it... but I lack the necessary insight into
the kernel internals. But who knows, I started reading Robert Love's
book... so, in a good decade or so. ;-))


> https://marc.info/?l=linux-mm&m=146825178722612&w=2


Thanks for that link. I have to read those more closely tomorrow, since
there are some nice insights into dm-crypt there. :)

Still, you have to admit, it is also rather frustrating/scary if such
a crucial subsystem can have bugs over several major versions that do
result in complete hangs (and can thus cause corruption) and are quite
easily triggerable. It does not instill too much confidence that said
subsystem is so intensively used/tested after all. That's at least how
I feel about it...

So long,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Ondrej Kozina

On 07/13/2016 05:32 PM, Matthias Dahl wrote:

> No matter what, I have no clue how to further diagnose this issue. And
> given that I already had unsolvable issues with dm-crypt a couple of
> months ago with my old machine where the system simply hung itself or
> went OOM when the swap was encrypted and just a few kilobytes needed to
> be swapped out, I am not so sure anymore I can trust dm-crypt with a
> full disk encryption to the point where I feel "safe"... as-in, nothing
> bad will happen or the system won't suddenly hang itself due to it. Or
> if a bug is introduced, that it will actually be possible to diagnose it
> and help fix it or that it will even be eventually fixed. Which is really
> a pity, since I would really have liked to help solve this. With the
> swap issue, I did git bisects, tests, narrowed it down to kernel versions
> when said bug was introduced... but in the end, the bug is still present
> as far as I know. :(



One step after another. Matthias, your original report was not forgotten,
it's just not so easy to find the real culprit and fix it without
causing yet another regression. See the
https://marc.info/?l=linux-mm&m=146825178722612&w=2 thread...

Not to mention that on current 4.7-rc7 kernels it behaves slightly
differently again (yet still far from ideal).


Regards O.



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Matthias Dahl

Hello...

On 2016-07-13 15:47, Michal Hocko wrote:

> This is getting out of my area of expertise so I am not sure I can help
> you much more, I am afraid.


That's okay. Thank you so much for investing the time.

For what it is worth, I did some further tests and here is what I came
up with:

If I create the plain dm-crypt device with --perf-submit_from_crypt_cpus,
I can run the tests for as long as I want but the memory problem never
occurs, meaning buffer/cache increase accordingly and thus free memory
decreases but used mem stays pretty much constantly low. Yet the problem
here is, the system becomes sluggish and throughput is severely impacted.
ksoftirqd is hovering at 100% the whole time.

Somehow my guess is that normally dm-crypt simply takes every request,
encrypts it and queues it internally by itself. And that queue is then
slowly emptied to the underlying device kernel queue. That is why I am
seeing the exploding increase in used memory (rather than in buffer/cache)
which in the end causes an OOM situation. But that is just my guess. And
IMHO that is not the right thing to do (tm), as can be seen in this case.


No matter what, I have no clue how to further diagnose this issue. And
given that I already had unsolvable issues with dm-crypt a couple of
months ago with my old machine where the system simply hung itself or
went OOM when the swap was encrypted and just a few kilobytes needed to
be swapped out, I am not so sure anymore I can trust dm-crypt with a
full disk encryption to the point where I feel "safe"... as-in, nothing
bad will happen or the system won't suddenly hang itself due to it. Or
if a bug is introduced, that it will actually be possible to diagnose it
and help fix it or that it will even be eventually fixed. Which is really
a pity, since I would really have liked to help solve this. With the
swap issue, I did git bisects, tests, narrowed it down to kernel versions
when said bug was introduced... but in the end, the bug is still present
as far as I know. :(

I will probably look again into ext4 fs encryption. My whole point is
just that in case any of the disks goes faulty and needs to be replaced or
sent in for warranty, I don't have to worry about mails, personal or
business data still being left on the device (e.g. if it is no longer
accessible or has reallocated sectors or whatever) in a readable form.

Oh well. Pity, really.

Thanks again,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Michal Hocko
On Wed 13-07-16 15:18:11, Matthias Dahl wrote:
> Hello Michal,
> 
> many thanks for all your time and help on this issue. It is very much
> appreciated and I hope we can track this down somehow.
> 
> On 2016-07-13 14:18, Michal Hocko wrote:
> 
> > So it seems we are accumulating bios and 256B objects. Buffer heads as
> > well but not so much. Having over 4G worth of bios sounds really suspicious.
> > Note that they pin pages to be written so this might be consuming the
> > rest of the unaccounted memory! So the main question is why those bios
> > do not get dispatched or finished.
> 
> Ok. It is the Block IOs that do not get completed. I do get it right
> that those bio-3 are already the encrypted data that should be written
> out but do not for some reason?

Hard to tell. Maybe they are just allocated and waiting for encryption.
But this is just a wild guess.


> I tried to figure this out myself but
> couldn't find anything -- what does the number "-3" state? It is the
> position in some chain or has it a different meaning?

$ git grep "kmem_cache_create.*bio"
block/bio-integrity.c:  bip_slab = kmem_cache_create("bio_integrity_payload",

so there doesn't seem to be any cache like that in the vanilla kernel.

> Do you think a trace like you mentioned would help shed some more light
> on this? Or would you recommend something else?

Dunno. Seeing who is allocating those bios might be helpful but it won't
tell much about what has happened to them after allocation. The tracing
would be more helpful for a mem leak situation which doesn't seem to be
the case here.

This is getting out of my area of expertise so I am not sure I can help
you much more, I am afraid.
-- 
Michal Hocko
SUSE Labs



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Matthias Dahl

Hello Michal,

many thanks for all your time and help on this issue. It is very much
appreciated and I hope we can track this down somehow.

On 2016-07-13 14:18, Michal Hocko wrote:

> So it seems we are accumulating bios and 256B objects. Buffer heads as
> well but not so much. Having over 4G worth of bios sounds really suspicious.
> Note that they pin pages to be written so this might be consuming the
> rest of the unaccounted memory! So the main question is why those bios
> do not get dispatched or finished.


Ok. It is the Block IOs that do not get completed. I do get it right
that those bio-3 are already the encrypted data that should be written
out but do not for some reason? I tried to figure this out myself but
couldn't find anything -- what does the number "-3" state? It is the
position in some chain or has it a different meaning?

Do you think a trace like you mentioned would help shed some more light
on this? Or would you recommend something else?

I have also cc'ed Mike Snitzer who commented on this issue before, maybe
he can see some pattern here as well. Pity that Neil Brown is no longer
available as I think this is also somehow related to it being an Intel
Rapid Storage RAID10... since it is the only way I can reproduce it. :(

Thanks,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Michal Hocko
On Wed 13-07-16 13:21:26, Michal Hocko wrote:
> On Tue 12-07-16 16:56:32, Matthias Dahl wrote:
[...]
> > If that support is baked into the Fedora provided kernel that is. If
> > you could give me a few hints or pointers, how to properly do an allocator
> > trace point and get some decent data out of it, that would be nice.
> 
> You need to have a kernel with CONFIG_TRACEPOINTS and then enable them
> via debugfs. You are interested in the kmalloc tracepoint; specify a size
> as a filter to only see the allocations that are really interesting. I haven't
> checked your slabinfo yet - hope to get to it later today.

The largest contributors seem to be
$ zcat slabinfo.txt.gz | awk '{printf "%s %d\n" , $1, $6*$15}' | head_and_tail.sh 133 | paste-with-diff.sh | sort -n -k3
                     initial  diff [#pages]
radix_tree_node      34442592
debug_objects_cache  388      46159
file_lock_ctx        114      138570
buffer_head          5616     238704
kmalloc-256          328      573164
bio-3                24       1118984

So it seems we are accumulating bios and 256B objects. Buffer heads as
well but not so much. Having over 4G worth of bios sounds really suspicious.
Note that they pin pages to be written so this might be consuming the
rest of the unaccounted memory! So the main question is why those bios
do not get dispatched or finished.
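
The per-cache page count used above ($6*$15, that is pagesperslab times num_slabs) can also be computed without the helper scripts, which are not part of the thread. A minimal sketch, assuming the slabinfo 2.1 column layout; the function names are made up for this example, and it only approximates what the pipeline above does:

```python
def slab_pages(slabinfo_text):
    """Map cache name -> pages in use (pagesperslab * num_slabs)."""
    pages = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the version banner, the "# name ..." header and short lines.
        if len(fields) < 15 or fields[0] in ("#", "slabinfo"):
            continue
        # Field 6 is pagesperslab, field 15 is num_slabs (1-based, as in awk).
        pages[fields[0]] = int(fields[5]) * int(fields[14])
    return pages


def growth(before_text, after_text):
    """Return (name, initial_pages, diff_pages) rows sorted by diff."""
    before = slab_pages(before_text)
    after = slab_pages(after_text)
    rows = [
        (name, before.get(name, 0), count - before.get(name, 0))
        for name, count in after.items()
    ]
    return sorted(rows, key=lambda row: row[2])
```

Feeding it the first and the last snapshot of a run puts the caches that grew the most at the end of the list, like the sort -n -k3 above.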
-- 
Michal Hocko
SUSE Labs



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-13 Thread Michal Hocko
On Tue 12-07-16 16:56:32, Matthias Dahl wrote:
> Hello Michal...
> 
> On 2016-07-12 16:07, Michal Hocko wrote:
> 
> > /proc/slabinfo could at least point on who is eating that memory.
> 
> Thanks. I have made another test (and thus again put the RAID10 out of
> sync for the 100th time, sigh) and made regular snapshots of slabinfo
> which I have attached to this mail.
> 
> > Direct IO doesn't get throttled like buffered IO.
> 
> Is buffered i/o not used in both cases if I don't explicitly request
> direct i/o?
> 
> dd if=/dev/zero of=/dev/md126p5 bs=512K
> and dd if=/dev/zero of=/dev/mapper/test-device bs=512K

OK, I misunderstood your question then. You were mentioning direct
IO earlier so I thought you were referring to it here as well.
 
> Given that the test-device is dm-crypt on md125p5. Aren't both using
> buffered i/o?

Yes they are.

> > the number of pages under writeback was more or less same throughout
> > the time but there are some local fluctuations when some pages do get
> > completed.
> 
> The pages under writeback are those directly destined for the disk, so
> after dm-crypt had done its encryption?

Those are the pages submitted for IO. dm-crypt will allocate a "shadow" page
for each of them to perform the encryption and only then submit the IO
to the storage underneath; see
http://lkml.kernel.org/r/alpine.lrh.2.02.1607121907160.24...@file01.intranet.prod.int.rdu2.redhat.com

> > If not you can enable allocator trace point for a particular object
> > size (or range of sizes) and see who is requesting them.
> 
> If that support is baked into the Fedora provided kernel that is. If
> you could give me a few hints or pointers, how to properly do an allocator
> trace point and get some decent data out of it, that would be nice.

You need to have a kernel with CONFIG_TRACEPOINTS and then enable them
via debugfs. You are interested in the kmalloc tracepoint; specify a size
as a filter to only see the allocations that are really interesting. I haven't
checked your slabinfo yet - hope to get to it later today.
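
A minimal sketch of that tracepoint setup, assuming debugfs is mounted at /sys/kernel/debug and the kmem:kmalloc event exposes the bytes_alloc field. The function name and the choice to filter on bytes_alloc rather than bytes_req are assumptions of this example; writing to the real files needs root.

```python
from pathlib import Path


def enable_kmalloc_trace(min_bytes, max_bytes,
                         tracing_root="/sys/kernel/debug/tracing"):
    """Install a size filter on the kmalloc event, then enable the event.

    Returns the filter expression that was written.
    """
    event = Path(tracing_root) / "events" / "kmem" / "kmalloc"
    expression = f"bytes_alloc >= {min_bytes} && bytes_alloc <= {max_bytes}"
    # Set the filter first so that no unfiltered events are recorded.
    (event / "filter").write_text(expression + "\n")
    (event / "enable").write_text("1\n")
    return expression
```

For the 256B objects, enable_kmalloc_trace(256, 256) would be a starting point; the matching events, including their call sites, can then be read from the trace_pipe file under the same tracing directory.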
-- 
Michal Hocko
SUSE Labs



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Matthias Dahl

Hello Michal...

On 2016-07-12 16:07, Michal Hocko wrote:

> /proc/slabinfo could at least point on who is eating that memory.

Thanks. I have made another test (and thus again put the RAID10 out of
sync for the 100th time, sigh) and made regular snapshots of slabinfo
which I have attached to this mail.

> Direct IO doesn't get throttled like buffered IO.

Is buffered i/o not used in both cases if I don't explicitly request
direct i/o?

dd if=/dev/zero of=/dev/md126p5 bs=512K
and dd if=/dev/zero of=/dev/mapper/test-device bs=512K

Given that the test-device is dm-crypt on md125p5. Aren't both using
buffered i/o?

> the number of pages under writeback was more or less same throughout
> the time but there are some local fluctuations when some pages do get
> completed.

The pages under writeback are those directly destined for the disk, so
after dm-crypt had done its encryption?

> If not you can enable allocator trace point for a particular object
> size (or range of sizes) and see who is requesting them.

If that support is baked into the Fedora provided kernel that is. If
you could give me a few hints or pointers, how to properly do an allocator
trace point and get some decent data out of it, that would be nice.

Thanks,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration

slabinfo.txt.gz
Description: GNU Zip compressed data

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 14:42:12, Matthias Dahl wrote:
> Hello Michal...
> 
> On 2016-07-12 13:49, Michal Hocko wrote:
> 
> > I am not a storage expert (not even mention dm-crypt). But what those
> > counters say is that the IO completion doesn't trigger so the
> > PageWriteback flag is still set. Such a page is not reclaimable
> > obviously. So I would check the IO delivery path and focus on the
> > potential dm-crypt involvement if you suspect this is a contributing
> > factor.
> 
> Sounds reasonable... except that I have no clue how to trace that with
> the limited means I have at my disposal right now and with the limited
> knowledge I have of the kernel internals. ;-)

I guess dm resp. block layer experts would be much better to advise
here.
 
> > Who is consuming those objects? Where is the rest 70% of memory hiding?
> 
> Is there any way to get a more detailed listing of where the memory is
> spent while dd is running? Something I could pipe every 500ms or so for
> later analysis or so?

/proc/slabinfo could at least point on who is eating that memory.
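
Periodic snapshots for later analysis can be taken with a few lines of script. This is only a sketch: the function name, the output file naming and the default 500 ms interval are choices made for this example, and reading /proc/slabinfo usually requires root.

```python
import time
from pathlib import Path


def snapshot(source, out_dir, count, interval_s=0.5):
    """Copy `source` into `out_dir` `count` times, `interval_s` seconds apart.

    Returns the list of files written, oldest first.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index in range(count):
        target = out / f"{Path(source).name}.{index:05d}.txt"
        target.write_text(Path(source).read_text())
        written.append(target)
        if index + 1 < count:
            time.sleep(interval_s)
    return written
```

Something like snapshot("/proc/slabinfo", "slab-snapshots", count=600) started just before dd would cover roughly five minutes.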

> > Writer will get throttled but the concurrent memory consumer will not
> > normally. So you can end up in this situation.
> 
> Hm, okay. I am still confused though: If I, for example, let dd do the
> exact same thing on a raw partition on the RAID10, nothing like that
> happens. Wouldn't we have the same race and problem then too...?

Direct IO doesn't get throttled like buffered IO.

> It is
> only with dm-crypt in-between that all of this shows itself. But I do
> somehow suspect the RAID10 Intel Rapid Storage to be the cause or at
> least partially.

Well, there are many allocation failures for GFP_ATOMIC requests
from the scsi_request_fn path. AFAIU the code, the request is deferred
and retried later. I cannot find any path which would do a regular
__GFP_DIRECT_RECLAIM fallback allocation. So while GFP_ATOMIC includes
___GFP_KSWAPD_RECLAIM, and would therefore kick kswapd, which should reclaim
some memory, there is no guarantee of forward progress. Anyway, I believe
even a __GFP_DIRECT_RECLAIM allocation wouldn't help much in this
particular case. There is not much reclaimable memory left - most
of it is dirty/writeback - so we are close to a deadlock state.

But we do not seem to be stuck completely:
unevictable:12341 dirty:90458 writeback:651272 unstable:0
unevictable:12341 dirty:90458 writeback:651272 unstable:0
unevictable:12341 dirty:90458 writeback:651272 unstable:0
unevictable:12341 dirty:90222 writeback:651252 unstable:0
unevictable:12341 dirty:90222 writeback:651231 unstable:0
unevictable:12341 dirty:89321 writeback:651905 unstable:0
unevictable:12341 dirty:89212 writeback:652014 unstable:0
unevictable:12341 dirty:89212 writeback:651993 unstable:0
unevictable:12341 dirty:89212 writeback:651993 unstable:0
unevictable:12488 dirty:42892 writeback:656597 unstable:0
unevictable:12488 dirty:42783 writeback:656597 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12488 dirty:42125 writeback:657125 unstable:0
unevictable:12556 dirty:54778 writeback:648616 unstable:0
unevictable:12556 dirty:54168 writeback:648919 unstable:0
unevictable:12556 dirty:54168 writeback:648919 unstable:0
unevictable:12556 dirty:53237 writeback:649506 unstable:0
unevictable:12556 dirty:53237 writeback:649506 unstable:0
unevictable:12556 dirty:53128 writeback:649615 unstable:0
unevictable:12556 dirty:53128 writeback:649615 unstable:0
unevictable:12556 dirty:52256 writeback:650159 unstable:0
unevictable:12556 dirty:52256 writeback:650159 unstable:0
unevictable:12556 dirty:52256 writeback:650138 unstable:0
unevictable:12635 dirty:49929 writeback:650724 unstable:0
unevictable:12635 dirty:49820 writeback:650833 unstable:0
unevictable:12635 dirty:49820 writeback:650833 unstable:0
unevictable:12635 dirty:49820 writeback:650833 unstable:0
unevictable:13001 dirty:167859 writeback:651864 unstable:0
unevictable:13001 dirty:167672 writeback:652038 unstable:0

the number of pages under writeback was more or less same throughout
the time but there are some local fluctuations when some pages do get
completed.
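
The fluctuation in the counters above can be checked mechanically. A minimal sketch, assuming the Mem-Info lines keep the "dirty:N writeback:N" form shown in the log; the function names are invented for this example:

```python
import re

_COUNTERS = re.compile(r"dirty:(\d+) writeback:(\d+)")


def writeback_series(log_text):
    """Extract (dirty, writeback) page counts in order of appearance."""
    return [(int(d), int(w)) for d, w in _COUNTERS.findall(log_text)]


def summarize(series):
    """Report the writeback range and how often the value changed."""
    writeback = [w for _, w in series]
    changes = sum(1 for a, b in zip(writeback, writeback[1:]) if a != b)
    return {"min": min(writeback), "max": max(writeback), "changes": changes}
```

A non-zero "changes" count together with a narrow min/max range matches the picture described here: some pages complete, but the total under writeback hardly moves.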

That being said, I believe that IO is stuck due to lack of memory which
is caused by some memory leak or excessive memory consumption. Finding
out who that might be would be the first step. /proc/slabinfo should
show us which slab cache is backing so many unreclaimable o

Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 13:49:20, Michal Hocko wrote:
> On Tue 12-07-16 13:28:12, Matthias Dahl wrote:
> > Hello Michal...
> > 
> > On 2016-07-12 11:50, Michal Hocko wrote:
> > 
> > > This smells like file pages are stuck in the writeback somewhere and the
> > > anon memory is not reclaimable because you do not have any swap device.
> > 
> > Not having a swap device shouldn't be a problem -- and in this case, it
> > would cause even more trouble as in disk i/o.
> > 
> > What could cause the file pages to get stuck or stopped from being written
> > to the disk? And more importantly, what is so unique/special about the
> > Intel Rapid Storage that it happens (seemingly) exclusively with that
> > > and not the normal Linux s/w raid support?
> 
> I am not a storage expert (not even mention dm-crypt). But what those
> counters say is that the IO completion doesn't trigger so the
> PageWriteback flag is still set. Such a page is not reclaimable
> obviously. So I would check the IO delivery path and focus on the
> potential dm-crypt involvement if you suspect this is a contributing
> factor.
>  
> > Also, if the pages are not written to disk, shouldn't something error
> > out or slow dd down?
> 
> Writers are normally throttled when we hit the dirty limit. You seem to have
> dirty_ratio set to 20% which is quite a lot considering how much memory
> you have.

And just to clarify: dirty_ratio refers to dirtyable memory, which is
free_pages+file_lru pages. In your case you have only 9% of the total
memory size dirty/writeback but that is 90% of dirtyable memory. This is
quite possible if somebody consumes free_pages racing with the writer.
Writer will get throttled but the concurrent memory consumer will not
normally. So you can end up in this situation.
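
The 90% figure can be reproduced from the first OOM report quoted in this thread. The sketch below takes "dirtyable" as free + active_file + inactive_file, which is a simplification of what the kernel actually computes; the function name is made up for this example.

```python
def dirtyable_fraction(free, active_file, inactive_file, dirty, writeback):
    """Share of dirtyable memory (free + file LRU) that is dirty or in writeback."""
    dirtyable = free + active_file + inactive_file
    return (dirty + writeback) / dirtyable


# Page counts taken from the Mem-Info block at [18907.592209].
fraction = dirtyable_fraction(
    free=49928,
    active_file=27534,
    inactive_file=819673,
    dirty=167859,
    writeback=651864,
)
print(f"{fraction:.1%} of dirtyable memory is dirty or under writeback")
```

With those numbers the result is a little over 91%, in line with the rough 90% mentioned above.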

> If you get back to the memory info from the OOM killer report:
> [18907.592209] active_anon:110314 inactive_anon:295 isolated_anon:0
> active_file:27534 inactive_file:819673 isolated_file:160
> unevictable:13001 dirty:167859 writeback:651864 unstable:0
> slab_reclaimable:177477 slab_unreclaimable:1817501
> mapped:934 shmem:588 pagetables:7109 bounce:0
> free:49928 free_pcp:45 free_cma:0
> 
> The dirty+writeback is ~9%. What is more interesting, though, LRU
> pages are negligible relative to the memory size (~11%). Note the number of
> unreclaimable slab pages (~20%). Who is consuming those objects?
> Where is the rest 70% of memory hiding?


-- 
Michal Hocko
SUSE Labs



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 13:28:12, Matthias Dahl wrote:
> Hello Michal...
> 
> On 2016-07-12 11:50, Michal Hocko wrote:
> 
> > This smells like file pages are stuck in the writeback somewhere and the
> > anon memory is not reclaimable because you do not have any swap device.
> 
> Not having a swap device shouldn't be a problem -- and in this case, it
> would cause even more trouble as in disk i/o.
> 
> What could cause the file pages to get stuck or stopped from being written
> to the disk? And more importantly, what is so unique/special about the
> Intel Rapid Storage that it happens (seemingly) exclusively with that
> and not the normal Linux s/w raid support?

I am not a storage expert (not even mention dm-crypt). But what those
counters say is that the IO completion doesn't trigger so the
PageWriteback flag is still set. Such a page is not reclaimable
obviously. So I would check the IO delivery path and focus on the
potential dm-crypt involvement if you suspect this is a contributing
factor.
 
> Also, if the pages are not written to disk, shouldn't something error
> out or slow dd down?

Writers are normally throttled when we hit the dirty limit. You seem to have
dirty_ratio set to 20% which is quite a lot considering how much memory
you have. If you go back to the memory info from the OOM killer report:
[18907.592209] active_anon:110314 inactive_anon:295 isolated_anon:0
active_file:27534 inactive_file:819673 isolated_file:160
unevictable:13001 dirty:167859 writeback:651864 unstable:0
slab_reclaimable:177477 slab_unreclaimable:1817501
mapped:934 shmem:588 pagetables:7109 bounce:0
free:49928 free_pcp:45 free_cma:0

The dirty+writeback is ~9%. What is more interesting, though, LRU
pages are negligible relative to the memory size (~11%). Note the number of
unreclaimable slab pages (~20%). Who is consuming those objects?
Where is the rest 70% of memory hiding?
-- 
Michal Hocko
SUSE Labs



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Matthias Dahl

Hello Michal...

On 2016-07-12 13:49, Michal Hocko wrote:

> I am not a storage expert (not even mention dm-crypt). But what those
> counters say is that the IO completion doesn't trigger so the
> PageWriteback flag is still set. Such a page is not reclaimable
> obviously. So I would check the IO delivery path and focus on the
> potential dm-crypt involvement if you suspect this is a contributing
> factor.

Sounds reasonable... except that I have no clue how to trace that with
the limited means I have at my disposal right now and with the limited
knowledge I have of the kernel internals. ;-)

> Who is consuming those objects? Where is the rest 70% of memory hiding?

Is there any way to get a more detailed listing of where the memory is
spent while dd is running? Something I could pipe every 500ms or so for
later analysis or so?

> Writer will get throttled but the concurrent memory consumer will not
> normally. So you can end up in this situation.


Hm, okay. I am still confused though: If I, for example, let dd do the
exact same thing on a raw partition on the RAID10, nothing like that
happens. Wouldn't we have the same race and problem then too...? It is
only with dm-crypt in-between that all of this shows itself. But I do
somehow suspect the RAID10 Intel Rapid Storage to be the cause or at
least partially.

Like I said, if you have any pointers how I could further trace this
or figure out who is exactly consuming what memory, that would be very
helpful... Thanks.

So long,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Michal Hocko
On Tue 12-07-16 10:27:37, Matthias Dahl wrote:
> Hello,
> 
> I posted this issue already on linux-mm, linux-kernel and dm-devel a
> few days ago and after further investigation it seems that this
> issue is somehow related to the fact that I am using an Intel Rapid
> Storage RAID10, so I am summarizing everything again in this mail
> and including linux-raid in my post. Sorry for the noise... :(
> 
> I am currently setting up a new machine (since my old one broke down)
> and I ran into a lot of "Unable to allocate memory on node -1" warnings
> while using dm-crypt. I have attached as much of the full log as I could
> recover.
> 
> The encrypted device is sitting on a RAID10 (software raid, Intel Rapid
> Storage). I am currently limited to testing via Linux live images since
> the machine is not yet properly set up but I did my tests across several
> of those.
> 
> Steps to reproduce are:
> 
> 1)
> cryptsetup -s 512 -d /dev/urandom -c aes-xts-plain64 open --type plain
> /dev/md126p5 test-device
> 
> 2)
> dd if=/dev/zero of=/dev/mapper/test-device status=progress bs=512K
> 
> While running and monitoring the memory usage with free, it can be seen
> that the used memory increases rapidly and after just a few seconds, the
> system is out of memory and page allocation failures start to be issued
> as well as the OOM killer gets involved.

Here are two instances of the oom killer Mem-Info:

[18907.592206] Mem-Info:
[18907.592209] active_anon:110314 inactive_anon:295 isolated_anon:0
active_file:27534 inactive_file:819673 isolated_file:160
unevictable:13001 dirty:167859 writeback:651864 unstable:0
slab_reclaimable:177477 slab_unreclaimable:1817501
mapped:934 shmem:588 pagetables:7109 bounce:0
free:49928 free_pcp:45 free_cma:0

[18908.976349] Mem-Info:
[18908.976352] active_anon:109647 inactive_anon:295 isolated_anon:0
active_file:27535 inactive_file:819602 isolated_file:128
unevictable:13001 dirty:167672 writeback:652038 unstable:0
slab_reclaimable:177477 slab_unreclaimable:1817828
mapped:934 shmem:588 pagetables:7109 bounce:0
free:50252 free_pcp:91 free_cma:0

This smells like file pages are stuck in the writeback somewhere and the
anon memory is not reclaimable because you do not have any swap device.
-- 
Michal Hocko
SUSE Labs



Re: [dm-devel] Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

2016-07-12 Thread Matthias Dahl

Hello Michal...

On 2016-07-12 11:50, Michal Hocko wrote:

> This smells like file pages are stuck in the writeback somewhere and the
> anon memory is not reclaimable because you do not have any swap device.

Not having a swap device shouldn't be a problem -- and in this case, it
would cause even more trouble as in disk i/o.

What could cause the file pages to get stuck or stopped from being written
to the disk? And more importantly, what is so unique/special about the
Intel Rapid Storage that it happens (seemingly) exclusively with that
and not the normal Linux s/w raid support?

Also, if the pages are not written to disk, shouldn't something error
out or slow dd down? Obviously dd is capable of copying zeros a lot
faster than they could ever be written to disk -- and still, it works
just fine without dm-crypt in-between. It is only when dm-crypt /is/
involved, that the memory gets filled up and things get out of control.

Thanks,
Matthias

--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration
