Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-05-03 Thread Mikulas Patocka


On Tue, 1 May 2018, Andrew Morton wrote:

> On Tue, 24 Apr 2018 12:33:01 -0400 (EDT) Mikulas Patocka 
>  wrote:
> 
> > 
> > 
> > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > 
> > > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote:
> > > > 
> > > > 
> > > > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > > > 
> > > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> > > > > 
> > > > > > Fixing __vmalloc code 
> > > > > > is easy and it doesn't require cooperation with maintainers.
> > > > > 
> > > > > But it is a hack against the intention of the scope api.
> > > > 
> > > > It is not!
> > > 
> > > This discussion simply doesn't make much sense it seems. The scope API
> > > is to document the scope of the reclaim recursion critical section. That
> > > certainly is not a utility function like vmalloc.
> > 
> > That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel 
> > developer) from converting the code to the scope API. You make nonsensical 
> > excuses.
> > 
> 
> Fun thread!
> 
> Winding back to the original problem, I'd state it as
> 
> - Caller uses kvmalloc() but passes the address into vmalloc-naive
>   DMA API and
> 
> - Caller uses kvmalloc() but passes the address into kfree()
> 
> Yes?
> 
> If so, then...
> 
> Is there a way in which, in the kvmalloc-called-kmalloc path, we can
> tag the slab-allocated memory with a "this memory was allocated with
> kvmalloc()" flag?  I *think* there's extra per-object storage available
> with suitable slab/slub debugging options?  Perhaps we could steal one
> bit from the redzone, dunno.
> 
> If so then we can
> 
> a) set that flag in kvmalloc() if the kmalloc() call succeeded
> 
> b) check for that flag in the DMA code, WARN if it is set.
> 
> c) in kvfree(), clear that flag before calling kfree()
> 
> d) in kfree(), check for that flag and go WARN() if set.
> 
> So both potential bugs are detected all the time, dependent upon
> CONFIG_SLUB_DEBUG (and perhaps other slub config options).

Yes, it would be good. You also need to check it in virt_to_phys(), 
virt_to_pfn(), __pa() and maybe some others.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-05-01 Thread Andrew Morton
On Tue, 24 Apr 2018 12:33:01 -0400 (EDT) Mikulas Patocka  
wrote:

> 
> 
> On Tue, 24 Apr 2018, Michal Hocko wrote:
> 
> > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > > 
> > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> > > > 
> > > > > Fixing __vmalloc code 
> > > > > is easy and it doesn't require cooperation with maintainers.
> > > > 
> > > > But it is a hack against the intention of the scope api.
> > > 
> > > It is not!
> > 
> > This discussion simply doesn't make much sense it seems. The scope API
> > is to document the scope of the reclaim recursion critical section. That
> > certainly is not a utility function like vmalloc.
> 
> That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel 
> developer) from converting the code to the scope API. You make nonsensical 
> excuses.
> 

Fun thread!

Winding back to the original problem, I'd state it as

- Caller uses kvmalloc() but passes the address into vmalloc-naive
  DMA API and

- Caller uses kvmalloc() but passes the address into kfree()

Yes?

If so, then...

Is there a way in which, in the kvmalloc-called-kmalloc path, we can
tag the slab-allocated memory with a "this memory was allocated with
kvmalloc()" flag?  I *think* there's extra per-object storage available
with suitable slab/slub debugging options?  Perhaps we could steal one
bit from the redzone, dunno.

If so then we can

a) set that flag in kvmalloc() if the kmalloc() call succeeded

b) check for that flag in the DMA code, WARN if it is set.

c) in kvfree(), clear that flag before calling kfree()

d) in kfree(), check for that flag and go WARN() if set.

So both potential bugs are detected all the time, dependent upon
CONFIG_SLUB_DEBUG (and perhaps other slub config options).



Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-24 Thread Mikulas Patocka


On Tue, 24 Apr 2018, Michal Hocko wrote:

> On Tue 24-04-18 11:30:40, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > 
> > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> > > 
> > > > Fixing __vmalloc code 
> > > > is easy and it doesn't require cooperation with maintainers.
> > > 
> > > But it is a hack against the intention of the scope api.
> > 
> > It is not!
> 
> This discussion simply doesn't make much sense it seems. The scope API
> is to document the scope of the reclaim recursion critical section. That
> certainly is not a utility function like vmalloc.

That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel 
developer) from converting the code to the scope API. You make nonsensical 
excuses.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-24 Thread Michal Hocko
On Tue 24-04-18 10:12:42, Michal Hocko wrote:
> On Tue 24-04-18 11:30:40, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 24 Apr 2018, Michal Hocko wrote:
> > 
> > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> > > 
> > > > Fixing __vmalloc code 
> > > > is easy and it doesn't require cooperation with maintainers.
> > > 
> > > But it is a hack against the intention of the scope api.
> > 
> > It is not!
> 
> This discussion simply doesn't make much sense it seems. The scope API
> is to document the scope of the reclaim recursion critical section. That
> certainly is not a utility function like vmalloc.

http://lkml.kernel.org/r/20180424162712.gl17...@dhcp22.suse.cz

let's see how it rolls this time.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-24 Thread Michal Hocko
On Tue 24-04-18 11:30:40, Mikulas Patocka wrote:
> 
> 
> On Tue, 24 Apr 2018, Michal Hocko wrote:
> 
> > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> > 
> > > Fixing __vmalloc code 
> > > is easy and it doesn't require cooperation with maintainers.
> > 
> > But it is a hack against the intention of the scope api.
> 
> It is not!

This discussion simply doesn't make much sense it seems. The scope API
is to document the scope of the reclaim recursion critical section. That
certainly is not a utility function like vmalloc.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-24 Thread Mikulas Patocka


On Tue, 24 Apr 2018, Michal Hocko wrote:

> On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> 
> > Fixing __vmalloc code 
> > is easy and it doesn't require cooperation with maintainers.
> 
> But it is a hack against the intention of the scope api.

It is not! You can fix __vmalloc now and you can convert the kernel to the 
scope API in 4 years. It's not one way or the other.

> It also alows maintainers to not care about their broken code.

Most maintainers don't even know that it's broken. Out of 14 subsystems 
using __vmalloc with GFP_NOIO/NOFS, only 2 realized that its 
implementation is broken and implemented a workaround (me and the XFS 
developers).

Misimplementing a function in a subtle and hard-to-notice way won't drive 
developers away from using it.

> > > > He refuses 15-line patch to fix GFP_NOIO bug because he believes that 
> > > > in 4 
> > > > years, the kernel will be refactored and GFP_NOIO will be eliminated. 
> > > > Why 
> > > > does he have veto over this part of the code? I'd much rather argue 
> > > > with 
> > > > people who have constructive comments about fixing bugs than with him.
> > > 
> > > I didn't NACK the patch AFAIR. I've said it is not a good idea longterm.
> > > I would be much more willing to change my mind if you would back your
> > > patch by a real bug report. Hacks are acceptable when we have a real
> > > issue in hands. But if we want to fix potential issue then better make
> > > it properly.
> > 
> > Developers should fix bugs in advance, not to wait until a crash hapens, 
> > is analyzed and reported.
> 
> I agree. But are those existing users broken in the first place? I have
> seen so many GFP_NOFS abuses that I would dare to guess that most of
> those vmalloc NOFS abusers can be simply turned into GFP_KERNEL. Maybe
> that is the reason we haven't heard any complains in years.

alloc_pages reclaims clean pages and most hard work is done by kswapd, so 
GFP_KERNEL doesn't cause much issues with writeback. But cheating isn't 
justified if you can get away with it. Incorrect GFP flags cause real 
problems with shrinkers - because shrinkers are called from alloc_pages 
and they do respond to GFP flags.

I had reported deadlock due to GFP issues (9d28eb12447). And the worst 
thing about these bug reports is that they are totally unreproducible and 
I get nothing, but a stacktrace in bugzilla. I had to guess what happened 
and I couldn't even test if the patch fixed the bug.

I'm not really happy that you are deliberately leaving these issues behind 
and making excuses.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-24 Thread Michal Hocko
On Mon 23-04-18 20:25:15, Mikulas Patocka wrote:
> 
> 
> On Mon, 23 Apr 2018, Michal Hocko wrote:
> 
> > On Mon 23-04-18 10:06:08, Mikulas Patocka wrote:
> > 
> > > > > He didn't want to fix vmalloc(GFP_NOIO)
> > > > 
> > > > I don't remember that conversation, so I don't know whether I agree with
> > > > his reasoning or not.  But we are supposed to be moving away from 
> > > > GFP_NOIO
> > > > towards marking regions with memalloc_noio_save() / restore.  If you do
> > > > that, you won't need vmalloc(GFP_NOIO).
> > > 
> > > He said the same thing a year ago. And there was small progress. 6 out of 
> > > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in 
> > > infiniband and 1 in btrfs. (the whole discussion is here 
> > > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html )
> > 
> > Well this is not that easy. It requires a cooperation from maintainers.
> > I can only do as much. I've posted patches in the past and actively
> > bringing up this topic at LSFMM last two years...
> 
> You're right - but you have chosen the uneasy path.

Yes.

> Fixing __vmalloc code 
> is easy and it doesn't require cooperation with maintainers.

But it is a hack against the intention of the scope api. It also alows
maintainers to not care about their broken code.

> > > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 
> > > 4 
> > > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why 
> > > does he have veto over this part of the code? I'd much rather argue with 
> > > people who have constructive comments about fixing bugs than with him.
> > 
> > I didn't NACK the patch AFAIR. I've said it is not a good idea longterm.
> > I would be much more willing to change my mind if you would back your
> > patch by a real bug report. Hacks are acceptable when we have a real
> > issue in hands. But if we want to fix potential issue then better make
> > it properly.
> 
> Developers should fix bugs in advance, not to wait until a crash hapens, 
> is analyzed and reported.

I agree. But are those existing users broken in the first place? I have
seen so many GFP_NOFS abuses that I would dare to guess that most of
those vmalloc NOFS abusers can be simply turned into GFP_KERNEL. Maybe
that is the reason we haven't heard any complains in years.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-23 Thread Mikulas Patocka


On Mon, 23 Apr 2018, Michal Hocko wrote:

> On Mon 23-04-18 10:06:08, Mikulas Patocka wrote:
> 
> > > > He didn't want to fix vmalloc(GFP_NOIO)
> > > 
> > > I don't remember that conversation, so I don't know whether I agree with
> > > his reasoning or not.  But we are supposed to be moving away from GFP_NOIO
> > > towards marking regions with memalloc_noio_save() / restore.  If you do
> > > that, you won't need vmalloc(GFP_NOIO).
> > 
> > He said the same thing a year ago. And there was small progress. 6 out of 
> > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in 
> > infiniband and 1 in btrfs. (the whole discussion is here 
> > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html )
> 
> Well this is not that easy. It requires a cooperation from maintainers.
> I can only do as much. I've posted patches in the past and actively
> bringing up this topic at LSFMM last two years...

You're right - but you have chosen the uneasy path. Fixing __vmalloc code 
is easy and it doesn't require cooperation with maintainers.

> > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 
> > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why 
> > does he have veto over this part of the code? I'd much rather argue with 
> > people who have constructive comments about fixing bugs than with him.
> 
> I didn't NACK the patch AFAIR. I've said it is not a good idea longterm.
> I would be much more willing to change my mind if you would back your
> patch by a real bug report. Hacks are acceptable when we have a real
> issue in hands. But if we want to fix potential issue then better make
> it properly.

Developers should fix bugs in advance, not to wait until a crash hapens, 
is analyzed and reported.

What's the problem with 15-line hack? Is the problem that kernel 
developers would feel depressed when looking the source code? Other than 
harming developers' feelings, I don't see what kind of damange could that 
piece of code do.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-23 Thread Mikulas Patocka


On Mon, 23 Apr 2018, Michal Hocko wrote:

> On Mon 23-04-18 10:24:02, Mikulas Patocka wrote:
> 
> > > Really, we have a fault injection framework and this sounds like
> > > something to hook in there.
> > 
> > The testing people won't set it up. They install the "kernel-debug" 
> > package and run the tests in it.
> > 
> > If you introduce a hidden option that no one knows about, no one will use 
> > it.
> 
> then make sure people know about it. Fuzzers already do test fault
> injections.

I think that in the long term we can introduce a kernel parameter like 
"debug_level=1", "debug_level=2", "debug_level=3" that will turn on 
debugging features across all kernel subsystems and we can teach users to 
use it to diagnose problems.

But it won't work if every subsystem has different debug parameters. There 
are 192 distinct filenames in debugfs, if we add 193rd one, harly anyone 
notices it.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-23 Thread Michael S. Tsirkin
On Fri, Apr 20, 2018 at 08:20:23AM -0400, Mikulas Patocka wrote:
> 
> 
> On Fri, 20 Apr 2018, Matthew Wilcox wrote:
> 
> > On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote:
> > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> > > uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> > > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> > > code.
> > 
> > Maybe it's time to have the SG code handle vmalloced pages?  This is
> > becoming more and more common with vmapped stacks (and some of our
> > workarounds are hideous -- allocate 4 bytes with kmalloc because we can't
> > DMA onto the stack any more?).  We already have a few places which do
> > handle sgs of vmalloced addresses, such as the nx crypto driver:
> > 
> > if (is_vmalloc_addr(start_addr))
> > sg_addr = page_to_phys(vmalloc_to_page(start_addr))
> >   + offset_in_page(sg_addr);
> > else
> > sg_addr = __pa(sg_addr);
> > 
> > and videobuf:
> > 
> > pg = vmalloc_to_page(virt);
> > if (NULL == pg)
> > goto err;
> > BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT)));
> > sg_set_page([i], pg, PAGE_SIZE, 0);
> > 
> > Yes, there's the potential that we have to produce two SG entries for a
> > virtually contiguous region if it crosses a page boundary, and our APIs
> > aren't set up right to make it happen.  But this is something we should
> > consider fixing ... otherwise we'll end up with dozens of driver hacks.
> > The videobuf implementation was already copy-and-pasted into the saa7146
> > driver, for example.
> 
> What if the device requires physically contiguous area and the vmalloc 
> area crosses a page? Will you use a bounce buffer? Where do you allocate 
> the bounce buffer from? What if you run out of bounce buffers?
> 
> Mikulkas

I agree with Matthew here.

4 byte variables are typically size aligned so won't cross a boundary.

That's enough for virtio at least. People using structs can force
alignment.

We could wrap access in a macro (sizeof(x) >= alignof(x)) to help
guarantee that.

-- 
MST


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-23 Thread Michal Hocko
On Mon 23-04-18 10:06:08, Mikulas Patocka wrote:
> 
> 
> On Sat, 21 Apr 2018, Matthew Wilcox wrote:
> 
> > On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote:
> > > On Fri, 20 Apr 2018, Matthew Wilcox wrote:
> > > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote:
> > > > > On Fri, 20 Apr 2018, Michal Hocko wrote:
> > > > > > No way. This is just wrong! First of all, you will explode most 
> > > > > > likely
> > > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends 
> > > > > > to be
> > > > > > enabled quite often.
> > > > > 
> > > > > You're an evil person who doesn't want to fix bugs.
> > > > 
> > > > Steady on.  There's no need for that.  Michal isn't evil.  Please
> > > > apologise.
> > > 
> > > I see this attitude from Michal again and again.
> > 
> > Fine; then *say that*.  I also see Michal saying "No" a lot.  Sometimes
> > I agree with him, sometimes I don't.  I think he genuinely wants the best
> > code in the kernel, and saying "No" is part of it.
> > 
> > > He didn't want to fix vmalloc(GFP_NOIO)
> > 
> > I don't remember that conversation, so I don't know whether I agree with
> > his reasoning or not.  But we are supposed to be moving away from GFP_NOIO
> > towards marking regions with memalloc_noio_save() / restore.  If you do
> > that, you won't need vmalloc(GFP_NOIO).
> 
> He said the same thing a year ago. And there was small progress. 6 out of 
> 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in 
> infiniband and 1 in btrfs. (the whole discussion is here 
> http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html )

Well this is not that easy. It requires a cooperation from maintainers.
I can only do as much. I've posted patches in the past and actively
bringing up this topic at LSFMM last two years...

> He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 
> years, the kernel will be refactored and GFP_NOIO will be eliminated. Why 
> does he have veto over this part of the code? I'd much rather argue with 
> people who have constructive comments about fixing bugs than with him.

I didn't NACK the patch AFAIR. I've said it is not a good idea longterm.
I would be much more willing to change my mind if you would back your
patch by a real bug report. Hacks are acceptable when we have a real
issue in hands. But if we want to fix potential issue then better make
it properly.

[...]

> I sent the CONFIG_DEBUG_SG patch before (I wonder why he didn't repond to 
> it). I'll send a third version of the patch that actually randomly chooses 
> between kmalloc and vmalloc, because some abuses can only be detected with 
> kmalloc and we should test both.
> 
> For bisecting, it is better to always fallback to vmalloc, but for general 
> testing, it is better to test both branches.

Agreed!

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-23 Thread Michal Hocko
On Mon 23-04-18 10:24:02, Mikulas Patocka wrote:
[...]

I am not going to comment on your continuous accusations. We can discuss
patches but general rants make very limited sense.
 
> > > > I already said that we can change it from CONFIG_DEBUG_VM to 
> > > > CONFIG_DEBUG_SG - or to whatever other option you may want, just to 
> > > > make 
> > > > sure that it is enabled in distro debug kernels by default.
> > > 
> > > Yes, and I think that's the right idea.  So send a v2 and ignore the
> > > replies that are clearly relating to an earlier version of the patch.
> > > Not everybody reads every mail in the thread before responding to one they
> > > find interesting.  Yes, ideally, one would, but sometimes one doesn't.
> > 
> > And look here. This is yet another ad-hoc idea. We have many users of
> > kvmalloc which have no relation to SG, yet you are going to control
> > their behavior by CONFIG_DEBUG_SG? No way! (yeah evil again)
> 
> Why aren't you constructive and pick up pick up the CONFIG flag?

Because config doesn't make that much of a sense. You do not want a
permanent vmalloc fallback unconditionally. Debugging option which
changes the behavior completely is not useful IMNHO. Besides that who is
going to enable it?

> > Really, we have a fault injection framework and this sounds like
> > something to hook in there.
> 
> The testing people won't set it up. They install the "kernel-debug" 
> package and run the tests in it.
> 
> If you introduce a hidden option that no one knows about, no one will use 
> it.

then make sure people know about it. Fuzzers already do test fault
injections.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-23 Thread Mikulas Patocka


On Sun, 22 Apr 2018, Michal Hocko wrote:

> On Sat 21-04-18 07:47:57, Matthew Wilcox wrote:
> 
> > > He didn't want to fix vmalloc(GFP_NOIO)
> > 
> > I don't remember that conversation, so I don't know whether I agree with
> > his reasoning or not.  But we are supposed to be moving away from GFP_NOIO
> > towards marking regions with memalloc_noio_save() / restore.  If you do
> > that, you won't need vmalloc(GFP_NOIO).
> 
> It was basically to detect GFP_NOIO context _inside_ vmalloc and use the
> scope API to enforce it there. Does it solve potential problems? Yes it
> does. Does it solve any existing report, no I am not aware of any. Is
> it a good fix longterm? Absolutely no, because the scope API should be
> used _at the place_ where the scope starts rather than a random utility
> function. If we are going the easier way now, we will never teach users
> to use the API properly. And I am willing to risk to keep a broken
> code which we have for years rather than allow a random hack that will
> seemingly fix it.
> 
> Btw. I was pretty much explicit with this reasoning when rejecting the
> patch. Do you still call that evil?

You are making nonsensical excuses.

That patch doesn't prevent you from refactoring the kernel and eliminating 
GFP_NOIO in the long term. You can apply the patch and then continue 
working on GFP_NOIO refactoring as well as before.

> > > he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used.
> > 
> > The GFP flags are a mess, still.
> 
> I do not remember that one but __GFP_NORETRY is _allowed_ to sleep. And
> yes I do _agree_ gfp flags are a mess which is really hard to get fixed
> because they are lacking a good design from the very beginning. Fixing
> some of those issues today is a completely PITA.

It may sleep, but if it sleeps regularly, it slows down swapping (because 
the swapping code does mempool_alloc and mempool_alloc does __GFP_NORETRY 
allocation). And there were two INTENTIONAL sleeps with schedule_timeout. 
You removed one and left the other, claiming that __GFP_NORETRY allocation 
should sleep.

> > > I already said that we can change it from CONFIG_DEBUG_VM to 
> > > CONFIG_DEBUG_SG - or to whatever other option you may want, just to make 
> > > sure that it is enabled in distro debug kernels by default.
> > 
> > Yes, and I think that's the right idea.  So send a v2 and ignore the
> > replies that are clearly relating to an earlier version of the patch.
> > Not everybody reads every mail in the thread before responding to one they
> > find interesting.  Yes, ideally, one would, but sometimes one doesn't.
> 
> And look here. This is yet another ad-hoc idea. We have many users of
> kvmalloc which have no relation to SG, yet you are going to control
> their behavior by CONFIG_DEBUG_SG? No way! (yeah evil again)

Why aren't you constructive and pick up pick up the CONFIG flag?

> Really, we have a fault injection framework and this sounds like
> something to hook in there.

The testing people won't set it up. They install the "kernel-debug" 
package and run the tests in it.

If you introduce a hidden option that no one knows about, no one will use 
it.

> -- 
> Michal Hocko
> SUSE Labs

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-23 Thread Mikulas Patocka


On Sat, 21 Apr 2018, Matthew Wilcox wrote:

> On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote:
> > On Fri, 20 Apr 2018, Matthew Wilcox wrote:
> > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote:
> > > > On Fri, 20 Apr 2018, Michal Hocko wrote:
> > > > > No way. This is just wrong! First of all, you will explode most likely
> > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to 
> > > > > be
> > > > > enabled quite often.
> > > > 
> > > > You're an evil person who doesn't want to fix bugs.
> > > 
> > > Steady on.  There's no need for that.  Michal isn't evil.  Please
> > > apologise.
> > 
> > I see this attitude from Michal again and again.
> 
> Fine; then *say that*.  I also see Michal saying "No" a lot.  Sometimes
> I agree with him, sometimes I don't.  I think he genuinely wants the best
> code in the kernel, and saying "No" is part of it.
> 
> > He didn't want to fix vmalloc(GFP_NOIO)
> 
> I don't remember that conversation, so I don't know whether I agree with
> his reasoning or not.  But we are supposed to be moving away from GFP_NOIO
> towards marking regions with memalloc_noio_save() / restore.  If you do
> that, you won't need vmalloc(GFP_NOIO).

He said the same thing a year ago. And there was small progress. 6 out of 
27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in 
infiniband and 1 in btrfs. (the whole discussion is here 
http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html )

He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 
years, the kernel will be refactored and GFP_NOIO will be eliminated. Why 
does he have veto over this part of the code? I'd much rather argue with 
people who have constructive comments about fixing bugs than with him.

(note that even if the refactoring eventually succeeds, it will not be 
backported to stable branches. The small vmalloc patch could be 
backported)

> > he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used.
> 
> The GFP flags are a mess, still.

That's the problem - the flag doesn't have a clear contract and the 
developers change behavior ad hoc according to bug reports.

> > So what should I say? Fix them and you won't be evil :-)
>
> No, you should reserve calling somebody evil for truly evil things.

How would you call it? Michal falsely believes that a 15-line patch would 
prevent him from doing long-term refactoring work and so he refuses it.

> > I already said that we can change it from CONFIG_DEBUG_VM to 
> > CONFIG_DEBUG_SG - or to whatever other option you may want, just to make 
> > sure that it is enabled in distro debug kernels by default.
> 
> Yes, and I think that's the right idea.  So send a v2 and ignore the
> replies that are clearly relating to an earlier version of the patch.
> Not everybody reads every mail in the thread before responding to one they
> find interesting.  Yes, ideally, one would, but sometimes one doesn't.

I sent the CONFIG_DEBUG_SG patch before (I wonder why he didn't repond to 
it). I'll send a third version of the patch that actually randomly chooses 
between kmalloc and vmalloc, because some abuses can only be detected with 
kmalloc and we should test both.

For bisecting, it is better to always fallback to vmalloc, but for general 
testing, it is better to test both branches.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-22 Thread Michal Hocko
On Sat 21-04-18 07:47:57, Matthew Wilcox wrote:
> On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote:
> > On Fri, 20 Apr 2018, Matthew Wilcox wrote:
> > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote:
> > > > On Fri, 20 Apr 2018, Michal Hocko wrote:
> > > > > No way. This is just wrong! First of all, you will explode most likely
> > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to 
> > > > > be
> > > > > enabled quite often.
> > > > 
> > > > You're an evil person who doesn't want to fix bugs.
> > > 
> > > Steady on.  There's no need for that.  Michal isn't evil.  Please
> > > apologise.
> > 
> > I see this attitude from Michal again and again.
> 
> Fine; then *say that*.  I also see Michal saying "No" a lot.  Sometimes
> I agree with him, sometimes I don't.  I think he genuinely wants the best
> code in the kernel, and saying "No" is part of it.

Exactly! We have a lot of legacy herritage and random semantic because
we were too eager to merge stuff. I think it is time to stop that and
think twice before merging someothing. If you call that evil then I am
fine to be evil.

> > He didn't want to fix vmalloc(GFP_NOIO)
> 
> I don't remember that conversation, so I don't know whether I agree with
> his reasoning or not.  But we are supposed to be moving away from GFP_NOIO
> towards marking regions with memalloc_noio_save() / restore.  If you do
> that, you won't need vmalloc(GFP_NOIO).

It was basically to detect GFP_NOIO context _inside_ vmalloc and use the
scope API to enforce it there. Does it solve potential problems? Yes it
does. Does it solve any existing report, no I am not aware of any. Is
it a good fix longterm? Absolutely no, because the scope API should be
used _at the place_ where the scope starts rather than a random utility
function. If we are going the easier way now, we will never teach users
to use the API properly. And I am willing to risk to keep a broken
code which we have for years rather than allow a random hack that will
seemingly fix it.

Btw. I was pretty much explicit with this reasoning when rejecting the
patch. Do you still call that evil?

> > he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used.
> 
> The GFP flags are a mess, still.

I do not remember that one but __GFP_NORETRY is _allowed_ to sleep. And
yes I do _agree_ gfp flags are a mess which is really hard to get fixed
because they are lacking a good design from the very beginning. Fixing
some of those issues today is a completely PITA.

> > So what should I say? Fix them and 
> > you won't be evil :-)
> 
> No, you should reserve calling somebody evil for truly evil things.
> 
> > (he could also fix the oom killer, so that it is triggered when 
> > free_memory+cache+free_swap goes beyond a threshold and not when you loop 
> > too long in the allocator)
> 
> ... that also doesn't make somebody evil.

And again, it is way much more easier to claim that something will get
fixed when the reality is much more complicated. I've tried to explain
those risks as well.

> > I already said that we can change it from CONFIG_DEBUG_VM to 
> > CONFIG_DEBUG_SG - or to whatever other option you may want, just to make 
> > sure that it is enabled in distro debug kernels by default.
> 
> Yes, and I think that's the right idea.  So send a v2 and ignore the
> replies that are clearly relating to an earlier version of the patch.
> Not everybody reads every mail in the thread before responding to one they
> find interesting.  Yes, ideally, one would, but sometimes one doesn't.

And look here. This is yet another ad-hoc idea. We have many users of
kvmalloc which have no relation to SG, yet you are going to control
their behavior by CONFIG_DEBUG_SG? No way! (yeah evil again)

Really, we have a fault injection framework and this sounds like
something to hook in there.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-21 Thread Matthew Wilcox
On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote:
> On Fri, 20 Apr 2018, Matthew Wilcox wrote:
> > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote:
> > > On Fri, 20 Apr 2018, Michal Hocko wrote:
> > > > No way. This is just wrong! First of all, you will explode most likely
> > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> > > > enabled quite often.
> > > 
> > > You're an evil person who doesn't want to fix bugs.
> > 
> > Steady on.  There's no need for that.  Michal isn't evil.  Please
> > apologise.
> 
> I see this attitude from Michal again and again.

Fine; then *say that*.  I also see Michal saying "No" a lot.  Sometimes
I agree with him, sometimes I don't.  I think he genuinely wants the best
code in the kernel, and saying "No" is part of it.

> He didn't want to fix vmalloc(GFP_NOIO)

I don't remember that conversation, so I don't know whether I agree with
his reasoning or not.  But we are supposed to be moving away from GFP_NOIO
towards marking regions with memalloc_noio_save() / restore.  If you do
that, you won't need vmalloc(GFP_NOIO).

> he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used.

The GFP flags are a mess, still.

> So what should I say? Fix them and 
> you won't be evil :-)

No, you should reserve calling somebody evil for truly evil things.

> (he could also fix the oom killer, so that it is triggered when 
> free_memory+cache+free_swap goes beyond a threshold and not when you loop 
> too long in the allocator)

... that also doesn't make somebody evil.

> I already said that we can change it from CONFIG_DEBUG_VM to 
> CONFIG_DEBUG_SG - or to whatever other option you may want, just to make 
> sure that it is enabled in distro debug kernels by default.

Yes, and I think that's the right idea.  So send a v2 and ignore the
replies that are clearly relating to an earlier version of the patch.
Not everybody reads every mail in the thread before responding to one they
find interesting.  Yes, ideally, one would, but sometimes one doesn't.



Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Mikulas Patocka


On Fri, 20 Apr 2018, Matthew Wilcox wrote:

> On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote:
> > On Fri, 20 Apr 2018, Michal Hocko wrote:
> > > No way. This is just wrong! First of all, you will explode most likely
> > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> > > enabled quite often.
> > 
> > You're an evil person who doesn't want to fix bugs.
> 
> Steady on.  There's no need for that.  Michal isn't evil.  Please
> apologise.

I see this attitude from Michal again and again.

He didn't want to fix vmalloc(GFP_NOIO), he didn't want to fix alloc_pages 
sleeping when __GFP_NORETRY is used. So what should I say? Fix them and 
you won't be evil :-)

(he could also fix the oom killer, so that it is triggered when 
free_memory+cache+free_swap goes beyond a threshold and not when you loop 
too long in the allocator)

> > You refused to fix vmalloc(GFP_NOIO) misbehavior a year ago (did you make 
> > some progress with it since that time?) and you refuse to fix kvmalloc 
> > misuses.
> 
> I understand you're frustrated, but this is not the way to get the problems
> fixed.
> 
> > I tried this patch on text-only virtual machine and /proc/vmallocinfo 
> > shows 614kB more memory. I tried it on a desktop machine with the chrome 
> > browser open and /proc/vmallocinfo space is increased by 7MB. So no - this 
> > won't exhaust memory and kill the machine.
> 
> This is good data, thank you for providing it.
> 
> > Arguing that this increases memory consumption is as bogus as arguing that 
> > CONFIG_LOCKDEP increses memory consumption. No one is forcing you to 
> > enable CONFIG_LOCKDEP and no one is forcing you to enable this kvmalloc 
> > test too.
> 
> I think there's a real problem which is that CONFIG_DEBUG_VM is too broad.
> It inserts code in a *lot* of places, some of which is quite expensive.
> We would do better to split it into more granular pieces ... although
> an explosion of configuration options isn't great either.  Maybe just
> CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_EXPENSIVE.
> 
> Michal may be wrong, but he's not evil.

I already said that we can change it from CONFIG_DEBUG_VM to 
CONFIG_DEBUG_SG - or to whatever other option you may want, just to make 
sure that it is enabled in distro debug kernels by default.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Matthew Wilcox
On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote:
> On Fri, 20 Apr 2018, Michal Hocko wrote:
> > No way. This is just wrong! First of all, you will explode most likely
> > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> > enabled quite often.
> 
> You're an evil person who doesn't want to fix bugs.

Steady on.  There's no need for that.  Michal isn't evil.  Please
apologise.

> You refused to fix vmalloc(GFP_NOIO) misbehavior a year ago (did you make 
> some progress with it since that time?) and you refuse to fix kvmalloc 
> misuses.

I understand you're frustrated, but this is not the way to get the problems
fixed.

> I tried this patch on text-only virtual machine and /proc/vmallocinfo 
> shows 614kB more memory. I tried it on a desktop machine with the chrome 
> browser open and /proc/vmallocinfo space is increased by 7MB. So no - this 
> won't exhaust memory and kill the machine.

This is good data, thank you for providing it.

> Arguing that this increases memory consumption is as bogus as arguing that 
> CONFIG_LOCKDEP increses memory consumption. No one is forcing you to 
> enable CONFIG_LOCKDEP and no one is forcing you to enable this kvmalloc 
> test too.

I think there's a real problem which is that CONFIG_DEBUG_VM is too broad.
It inserts code in a *lot* of places, some of which is quite expensive.
We would do better to split it into more granular pieces ... although
an explosion of configuration options isn't great either.  Maybe just
CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_EXPENSIVE.

Michal may be wrong, but he's not evil.


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Mikulas Patocka


On Fri, 20 Apr 2018, Michal Hocko wrote:

> On Fri 20-04-18 06:41:36, Matthew Wilcox wrote:
> > On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote:
> > > > In order to detect these bugs reliably I submit this patch that changes
> > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> > > 
> > > No way. This is just wrong! First of all, you will explode most likely
> > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> > > enabled quite often.
> > 
> > I think it'll still suit Mikulas' debugging needs if we always use
> > vmalloc for sizes above PAGE_SIZE?
> 
> Even if that was the case then this doesn't sounds like CONFIG_DEBUG_VM
> material. We do not want a completely different behavior when the config
> 
> -- 
> Michal Hocko
> SUSE Labs

I'm not arguing that it must be turned on exactly by CONFIG_DEBUG_VM. It 
may be turned on some other option that is enabled in debug kernels 
(CONFIG_DEBUG_SG may be a better option, because you'll get meaningful 
stracktraces from DMA API then).

> is enabled. If we really need some better fallback testing coverage
> then the fault injection, as suggested by Vlastimil, sounds much more
> reasonable to me

People who test kernels will install the kernel-debug package, reboot to 
the debug kernel and run their testsuites. They won't turn on magic 
options in debugfs or use some hideous kernel commandline arguments. If 
the kvmalloc test isn't in the debug kernel, then the testing crew won't 
test it - that's it.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Mikulas Patocka


On Fri, 20 Apr 2018, Michal Hocko wrote:

> On Thu 19-04-18 12:12:38, Mikulas Patocka wrote:
> [...]
> > From: Mikulas Patocka 
> > Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
> > 
> > The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> > kmalloc fails.
> > 
> > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> > uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> > code.
> > 
> > These bugs are hard to reproduce because vmalloc falls back to kmalloc
> > only if memory is fragmented.
> > 
> > In order to detect these bugs reliably I submit this patch that changes
> > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> No way. This is just wrong! First of all, you will explode most likely
> on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> enabled quite often.

You're an evil person who doesn't want to fix bugs.

You refused to fix vmalloc(GFP_NOIO) misbehavior a year ago (did you make 
some progress with it since that time?) and you refuse to fix kvmalloc 
misuses.

I tried this patch on text-only virtual machine and /proc/vmallocinfo 
shows 614kB more memory. I tried it on a desktop machine with the chrome 
browser open and /proc/vmallocinfo space is increased by 7MB. So no - this 
won't exhaust memory and kill the machine.

Arguing that this increases memory consumption is as bogus as arguing that 
CONFIG_LOCKDEP increses memory consumption. No one is forcing you to 
enable CONFIG_LOCKDEP and no one is forcing you to enable this kvmalloc 
test too.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Michal Hocko
On Fri 20-04-18 06:41:36, Matthew Wilcox wrote:
> On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote:
> > > In order to detect these bugs reliably I submit this patch that changes
> > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> > 
> > No way. This is just wrong! First of all, you will explode most likely
> > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> > enabled quite often.
> 
> I think it'll still suit Mikulas' debugging needs if we always use
> vmalloc for sizes above PAGE_SIZE?

Even if that was the case then this doesn't sounds like CONFIG_DEBUG_VM
material. We do not want a completely different behavior when the config
is enabled. If we really need some better fallback testing coverage
then the fault injection, as suggested by Vlastimil, sounds much more
reasonable to me

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Matthew Wilcox
On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote:
> > In order to detect these bugs reliably I submit this patch that changes
> > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> No way. This is just wrong! First of all, you will explode most likely
> on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
> enabled quite often.

I think it'll still suit Mikulas' debugging needs if we always use
vmalloc for sizes above PAGE_SIZE?



Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Michal Hocko
On Thu 19-04-18 12:12:38, Mikulas Patocka wrote:
[...]
> From: Mikulas Patocka 
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
> 
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
> 
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.

No way. This is just wrong! First of all, you will explode most likely
on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be
enabled quite often.

> Signed-off-by: Mikulas Patocka 

Nacked-by: Michal Hocko 

> ---
>  mm/util.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/mm/util.c
> ===
> --- linux-2.6.orig/mm/util.c  2018-04-18 15:46:23.0 +0200
> +++ linux-2.6/mm/util.c   2018-04-18 16:00:43.0 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>   gfp_t kmalloc_flags = flags;
>   void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>*/
>   if (ret || size <= PAGE_SIZE)
>   return ret;
> +#endif
>  
>   return __vmalloc_node_flags_caller(size, node, flags,
>   __builtin_return_address(0));

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Mikulas Patocka


On Fri, 20 Apr 2018, Matthew Wilcox wrote:

> On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote:
> > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> > uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> > code.
> 
> Maybe it's time to have the SG code handle vmalloced pages?  This is
> becoming more and more common with vmapped stacks (and some of our
> workarounds are hideous -- allocate 4 bytes with kmalloc because we can't
> DMA onto the stack any more?).  We already have a few places which do
> handle sgs of vmalloced addresses, such as the nx crypto driver:
> 
> if (is_vmalloc_addr(start_addr))
> sg_addr = page_to_phys(vmalloc_to_page(start_addr))
>   + offset_in_page(sg_addr);
> else
> sg_addr = __pa(sg_addr);
> 
> and videobuf:
> 
> pg = vmalloc_to_page(virt);
> if (NULL == pg)
> goto err;
> BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT)));
> sg_set_page([i], pg, PAGE_SIZE, 0);
> 
> Yes, there's the potential that we have to produce two SG entries for a
> virtually contiguous region if it crosses a page boundary, and our APIs
> aren't set up right to make it happen.  But this is something we should
> consider fixing ... otherwise we'll end up with dozens of driver hacks.
> The videobuf implementation was already copy-and-pasted into the saa7146
> driver, for example.

What if the device requires physically contiguous area and the vmalloc 
area crosses a page? Will you use a bounce buffer? Where do you allocate 
the bounce buffer from? What if you run out of bounce buffers?

Mikulkas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Mikulas Patocka


On Thu, 19 Apr 2018, Andrew Morton wrote:

> On Thu, 19 Apr 2018 17:19:20 -0400 (EDT) Mikulas Patocka 
>  wrote:
> 
> > > > In order to detect these bugs reliably I submit this patch that changes
> > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> > > > 
> > > > ...
> > > >
> > > > --- linux-2.6.orig/mm/util.c2018-04-18 15:46:23.0 +0200
> > > > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200
> > > > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
> > > >   */
> > > >  void *kvmalloc_node(size_t size, gfp_t flags, int node)
> > > >  {
> > > > +#ifndef CONFIG_DEBUG_VM
> > > > gfp_t kmalloc_flags = flags;
> > > > void *ret;
> > > >  
> > > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> > > >  */
> > > > if (ret || size <= PAGE_SIZE)
> > > > return ret;
> > > > +#endif
> > > >  
> > > > return __vmalloc_node_flags_caller(size, node, flags,
> > > > __builtin_return_address(0));
> > > 
> > > Well, it doesn't have to be done at compile-time, does it?  We could
> > > add a knob (in debugfs, presumably) which enables this at runtime. 
> > > That's far more user-friendly.
> > 
> > But who will turn it on in debugfs?
> 
> But who will turn it on in Kconfig?  Just a handful of developers.  We

So, it won't receive much testing.

I've never played with those debugfs files (because I didn't need it), and 
most users also won't play with it. Having a debugfs option is like having 
no option at all.

> could add SONFIG_DEBUG_SG to the list in
> Documentation/process/submit-checklist.rst, but nobody reads that.
> 
> Also, a whole bunch of defconfigs set CONFIG_DEBUG_SG=y and some
> googling indicates that they aren't the only ones...
> 
> > It should be default for debugging 
> > kernels, so that users using them would report the error.
> 
> Well.  This isn't the first time we've wanted to enable expensive (or
> noisy) debugging things in debug kernels, by any means.
> 
> So how could we define a debug kernel in which it's OK to enable such
> things?

Debug kernel is what distributions distribute as debug kernel - i.e. RHEL 
or Fedora have debugging kernels. So it needs to be bound to an option 
that is turned on in these kernels - so that any user who boots the 
debugging kernel triggers the bug.

> - Could be "it's an -rc kernel".  But then we'd be enabling a bunch of
>   untested code when Linus cuts a release.
> 
> - Could be "it's an -rc kernel with SUBLEVEL <= 5".  But then we risk
>   unexpected things happening when Linux cuts -rc6, which still isn't
>   good.
> 
> - How about "it's an -rc kernel with odd-numbered SUBLEVEL and
>   SUBLEVEL <= 5".  That way everybody who runs -rc1, -rc3 and -rc5 will
>   have kvmalloc debugging enabled.  That's potentially nasty because
>   vmalloc is much slower than kmalloc.  But kvmalloc() is only used for
>   large and probably infrequent allocations, so it's probably OK.
> 
> I wonder how we get at SUBLEVEL from within .c.  

Don't bind it to rc level, bind it to some debugging configuration option.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-20 Thread Matthew Wilcox
On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote:
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.

Maybe it's time to have the SG code handle vmalloced pages?  This is
becoming more and more common with vmapped stacks (and some of our
workarounds are hideous -- allocate 4 bytes with kmalloc because we can't
DMA onto the stack any more?).  We already have a few places which do
handle sgs of vmalloced addresses, such as the nx crypto driver:

if (is_vmalloc_addr(start_addr))
sg_addr = page_to_phys(vmalloc_to_page(start_addr))
  + offset_in_page(sg_addr);
else
sg_addr = __pa(sg_addr);

and videobuf:

pg = vmalloc_to_page(virt);
if (NULL == pg)
goto err;
BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT)));
sg_set_page([i], pg, PAGE_SIZE, 0);

Yes, there's the potential that we have to produce two SG entries for a
virtually contiguous region if it crosses a page boundary, and our APIs
aren't set up right to make it happen.  But this is something we should
consider fixing ... otherwise we'll end up with dozens of driver hacks.
The videobuf implementation was already copy-and-pasted into the saa7146
driver, for example.



Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Andrew Morton
On Thu, 19 Apr 2018 17:19:20 -0400 (EDT) Mikulas Patocka  
wrote:

> > > In order to detect these bugs reliably I submit this patch that changes
> > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> > > 
> > > ...
> > >
> > > --- linux-2.6.orig/mm/util.c  2018-04-18 15:46:23.0 +0200
> > > +++ linux-2.6/mm/util.c   2018-04-18 16:00:43.0 +0200
> > > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
> > >   */
> > >  void *kvmalloc_node(size_t size, gfp_t flags, int node)
> > >  {
> > > +#ifndef CONFIG_DEBUG_VM
> > >   gfp_t kmalloc_flags = flags;
> > >   void *ret;
> > >  
> > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> > >*/
> > >   if (ret || size <= PAGE_SIZE)
> > >   return ret;
> > > +#endif
> > >  
> > >   return __vmalloc_node_flags_caller(size, node, flags,
> > >   __builtin_return_address(0));
> > 
> > Well, it doesn't have to be done at compile-time, does it?  We could
> > add a knob (in debugfs, presumably) which enables this at runtime. 
> > That's far more user-friendly.
> 
> But who will turn it on in debugfs?

But who will turn it on in Kconfig?  Just a handful of developers.  We
could add SONFIG_DEBUG_SG to the list in
Documentation/process/submit-checklist.rst, but nobody reads that.

Also, a whole bunch of defconfigs set CONFIG_DEBUG_SG=y and some
googling indicates that they aren't the only ones...

> It should be default for debugging 
> kernels, so that users using them would report the error.

Well.  This isn't the first time we've wanted to enable expensive (or
noisy) debugging things in debug kernels, by any means.

So how could we define a debug kernel in which it's OK to enable such
things?

- Could be "it's an -rc kernel".  But then we'd be enabling a bunch of
  untested code when Linus cuts a release.

- Could be "it's an -rc kernel with SUBLEVEL <= 5".  But then we risk
  unexpected things happening when Linux cuts -rc6, which still isn't
  good.

- How about "it's an -rc kernel with odd-numbered SUBLEVEL and
  SUBLEVEL <= 5".  That way everybody who runs -rc1, -rc3 and -rc5 will
  have kvmalloc debugging enabled.  That's potentially nasty because
  vmalloc is much slower than kmalloc.  But kvmalloc() is only used for
  large and probably infrequent allocations, so it's probably OK.

I wonder how we get at SUBLEVEL from within .c.  


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Mikulas Patocka


On Thu, 19 Apr 2018, Andrew Morton wrote:

> On Thu, 19 Apr 2018 12:12:38 -0400 (EDT) Mikulas Patocka 
>  wrote:
> 
> > The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> > kmalloc fails.
> > 
> > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> > uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> > code.
> > 
> > These bugs are hard to reproduce because vmalloc falls back to kmalloc
> > only if memory is fragmented.
> 
> Yes, that's nasty.
> 
> > In order to detect these bugs reliably I submit this patch that changes
> > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> > 
> > ...
> >
> > --- linux-2.6.orig/mm/util.c2018-04-18 15:46:23.0 +0200
> > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200
> > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
> >   */
> >  void *kvmalloc_node(size_t size, gfp_t flags, int node)
> >  {
> > +#ifndef CONFIG_DEBUG_VM
> > gfp_t kmalloc_flags = flags;
> > void *ret;
> >  
> > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
> >  */
> > if (ret || size <= PAGE_SIZE)
> > return ret;
> > +#endif
> >  
> > return __vmalloc_node_flags_caller(size, node, flags,
> > __builtin_return_address(0));
> 
> Well, it doesn't have to be done at compile-time, does it?  We could
> add a knob (in debugfs, presumably) which enables this at runtime. 
> That's far more user-friendly.

But who will turn it on in debugfs? It should be default for debugging 
kernels, so that users using them would report the error.

Conditioning it on CONFIG_DEBUG_SG is better than CONFIG_DEBUG_VM, it will 
print a stacktrace where the incorrect use happened.

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Andrew Morton
On Thu, 19 Apr 2018 12:12:38 -0400 (EDT) Mikulas Patocka  
wrote:

> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.

Yes, that's nasty.

> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> ...
>
> --- linux-2.6.orig/mm/util.c  2018-04-18 15:46:23.0 +0200
> +++ linux-2.6/mm/util.c   2018-04-18 16:00:43.0 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>   gfp_t kmalloc_flags = flags;
>   void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>*/
>   if (ret || size <= PAGE_SIZE)
>   return ret;
> +#endif
>  
>   return __vmalloc_node_flags_caller(size, node, flags,
>   __builtin_return_address(0));

Well, it doesn't have to be done at compile-time, does it?  We could
add a knob (in debugfs, presumably) which enables this at runtime. 
That's far more user-friendly.


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Vlastimil Babka
On 04/19/2018 06:12 PM, Mikulas Patocka wrote:
> From: Mikulas Patocka 
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
> 
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
> 
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> Signed-off-by: Mikulas Patocka 

Hmm AFAIK Fedora uses CONFIG_DEBUG_VM in their kernels. Sure you want to
impose this on all users? Seems too much for DEBUG_VM to me. Maybe it
should be hidden under some error injection config?

> ---
>  mm/util.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/mm/util.c
> ===
> --- linux-2.6.orig/mm/util.c  2018-04-18 15:46:23.0 +0200
> +++ linux-2.6/mm/util.c   2018-04-18 16:00:43.0 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>   gfp_t kmalloc_flags = flags;
>   void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>*/
>   if (ret || size <= PAGE_SIZE)
>   return ret;
> +#endif

Did you verify that vmalloc does the right thing for sub-page sizes?
Shouldn't those be exempted?

>   return __vmalloc_node_flags_caller(size, node, flags,
>   __builtin_return_address(0));
> 



Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Michael S. Tsirkin
On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote:
> 
> 
> On Wed, 18 Apr 2018, Mikulas Patocka wrote:
> 
> > 
> > 
> > On Wed, 18 Apr 2018, David Miller wrote:
> > 
> > > From: Mikulas Patocka 
> > > Date: Wed, 18 Apr 2018 12:44:25 -0400 (EDT)
> > > 
> > > > The structure net_device is followed by arbitrary driver-specific data 
> > > > (accessible with the function netdev_priv). And for virtio-net, these 
> > > > driver-specific data must be in DMA memory.
> > > 
> > > And we are saying that this assumption is wrong and needs to be
> > > corrected.
> > 
> > So, try to find all the networking drivers that to DMA to the private 
> > area.
> > 
> > The problem here is that kvzalloc usually returns DMA-able area, but it 
> > may return non-DMA area rarely, if the memory is too fragmented. So, we 
> > are in a situation, where some networking drivers will randomly fail. Go 
> > and find them.
> > 
> > Mikulas
> 
> Her I submit a patch that makes kvmalloc always use vmalloc if 
> CONFIG_DEBUG_VM is defined.
> 
> 
> 
> 
> From: Mikulas Patocka 
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
> 
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
> 
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> Signed-off-by: Mikulas Patocka 

Maybe make it conditional on CONFIG_DEBUG_SG too?
Otherwise I think you just trigger a hard to debug memory corruption.


> ---
>  mm/util.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/mm/util.c
> ===
> --- linux-2.6.orig/mm/util.c  2018-04-18 15:46:23.0 +0200
> +++ linux-2.6/mm/util.c   2018-04-18 16:00:43.0 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>   gfp_t kmalloc_flags = flags;
>   void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>*/
>   if (ret || size <= PAGE_SIZE)
>   return ret;
> +#endif
>  
>   return __vmalloc_node_flags_caller(size, node, flags,
>   __builtin_return_address(0));


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Mikulas Patocka


On Thu, 19 Apr 2018, Eric Dumazet wrote:

> 
> 
> On 04/19/2018 09:12 AM, Mikulas Patocka wrote:
> > 
> > 
> > These bugs are hard to reproduce because vmalloc falls back to kmalloc
> > only if memory is fragmented.
> > 
> 
> This sentence is wrong.
> 
>  because kvmalloc() falls back to vmalloc() ...

Yes. There should be "falls back to vmalloc()".

Mikulas


Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Eric Dumazet


On 04/19/2018 09:12 AM, Mikulas Patocka wrote:
> 
> 
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
> 

This sentence is wrong.

 because kvmalloc() falls back to vmalloc() ...