Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Tue, 1 May 2018, Andrew Morton wrote: > On Tue, 24 Apr 2018 12:33:01 -0400 (EDT) Mikulas Patocka >wrote: > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote: > > > > > > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > > > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > > > > > > > > > Fixing __vmalloc code > > > > > > is easy and it doesn't require cooperation with maintainers. > > > > > > > > > > But it is a hack against the intention of the scope api. > > > > > > > > It is not! > > > > > > This discussion simply doesn't make much sense it seems. The scope API > > > is to document the scope of the reclaim recursion critical section. That > > > certainly is not a utility function like vmalloc. > > > > That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel > > developer) from converting the code to the scope API. You make nonsensical > > excuses. > > > > Fun thread! > > Winding back to the original problem, I'd state it as > > - Caller uses kvmalloc() but passes the address into vmalloc-naive > DMA API and > > - Caller uses kvmalloc() but passes the address into kfree() > > Yes? > > If so, then... > > Is there a way in which, in the kvmalloc-called-kmalloc path, we can > tag the slab-allocated memory with a "this memory was allocated with > kvmalloc()" flag? I *think* there's extra per-object storage available > with suitable slab/slub debugging options? Perhaps we could steal one > bit from the redzone, dunno. > > If so then we can > > a) set that flag in kvmalloc() if the kmalloc() call succeeded > > b) check for that flag in the DMA code, WARN if it is set. > > c) in kvfree(), clear that flag before calling kfree() > > d) in kfree(), check for that flag and go WARN() if set. > > So both potential bugs are detected all the time, dependent upon > CONFIG_SLUB_DEBUG (and perhaps other slub config options). Yes, it would be good. You also need to check it in virt_to_phys(), virt_to_pfn(), __pa() and maybe some others. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Tue, 24 Apr 2018 12:33:01 -0400 (EDT) Mikulas Patockawrote: > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote: > > > > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > > > > > > > Fixing __vmalloc code > > > > > is easy and it doesn't require cooperation with maintainers. > > > > > > > > But it is a hack against the intention of the scope api. > > > > > > It is not! > > > > This discussion simply doesn't make much sense it seems. The scope API > > is to document the scope of the reclaim recursion critical section. That > > certainly is not a utility function like vmalloc. > > That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel > developer) from converting the code to the scope API. You make nonsensical > excuses. > Fun thread! Winding back to the original problem, I'd state it as - Caller uses kvmalloc() but passes the address into vmalloc-naive DMA API and - Caller uses kvmalloc() but passes the address into kfree() Yes? If so, then... Is there a way in which, in the kvmalloc-called-kmalloc path, we can tag the slab-allocated memory with a "this memory was allocated with kvmalloc()" flag? I *think* there's extra per-object storage available with suitable slab/slub debugging options? Perhaps we could steal one bit from the redzone, dunno. If so then we can a) set that flag in kvmalloc() if the kmalloc() call succeeded b) check for that flag in the DMA code, WARN if it is set. c) in kvfree(), clear that flag before calling kfree() d) in kfree(), check for that flag and go WARN() if set. So both potential bugs are detected all the time, dependent upon CONFIG_SLUB_DEBUG (and perhaps other slub config options).
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Tue, 24 Apr 2018, Michal Hocko wrote: > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote: > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > > > > > Fixing __vmalloc code > > > > is easy and it doesn't require cooperation with maintainers. > > > > > > But it is a hack against the intention of the scope api. > > > > It is not! > > This discussion simply doesn't make much sense it seems. The scope API > is to document the scope of the reclaim recursion critical section. That > certainly is not a utility function like vmalloc. That 15-line __vmalloc bugfix doesn't prevent you (or any other kernel developer) from converting the code to the scope API. You make nonsensical excuses. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Tue 24-04-18 10:12:42, Michal Hocko wrote: > On Tue 24-04-18 11:30:40, Mikulas Patocka wrote: > > > > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > > > > > Fixing __vmalloc code > > > > is easy and it doesn't require cooperation with maintainers. > > > > > > But it is a hack against the intention of the scope api. > > > > It is not! > > This discussion simply doesn't make much sense it seems. The scope API > is to document the scope of the reclaim recursion critical section. That > certainly is not a utility function like vmalloc. http://lkml.kernel.org/r/20180424162712.gl17...@dhcp22.suse.cz let's see how it rolls this time. -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Tue 24-04-18 11:30:40, Mikulas Patocka wrote: > > > On Tue, 24 Apr 2018, Michal Hocko wrote: > > > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > > > Fixing __vmalloc code > > > is easy and it doesn't require cooperation with maintainers. > > > > But it is a hack against the intention of the scope api. > > It is not! This discussion simply doesn't make much sense it seems. The scope API is to document the scope of the reclaim recursion critical section. That certainly is not a utility function like vmalloc. -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Tue, 24 Apr 2018, Michal Hocko wrote: > On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > Fixing __vmalloc code > > is easy and it doesn't require cooperation with maintainers. > > But it is a hack against the intention of the scope api. It is not! You can fix __vmalloc now and you can convert the kernel to the scope API in 4 years. It's not one way or the other. > It also alows maintainers to not care about their broken code. Most maintainers don't even know that it's broken. Out of 14 subsystems using __vmalloc with GFP_NOIO/NOFS, only 2 realized that its implementation is broken and implemented a workaround (me and the XFS developers). Misimplementing a function in a subtle and hard-to-notice way won't drive developers away from using it. > > > > He refuses 15-line patch to fix GFP_NOIO bug because he believes that > > > > in 4 > > > > years, the kernel will be refactored and GFP_NOIO will be eliminated. > > > > Why > > > > does he have veto over this part of the code? I'd much rather argue > > > > with > > > > people who have constructive comments about fixing bugs than with him. > > > > > > I didn't NACK the patch AFAIR. I've said it is not a good idea longterm. > > > I would be much more willing to change my mind if you would back your > > > patch by a real bug report. Hacks are acceptable when we have a real > > > issue in hands. But if we want to fix potential issue then better make > > > it properly. > > > > Developers should fix bugs in advance, not to wait until a crash hapens, > > is analyzed and reported. > > I agree. But are those existing users broken in the first place? I have > seen so many GFP_NOFS abuses that I would dare to guess that most of > those vmalloc NOFS abusers can be simply turned into GFP_KERNEL. Maybe > that is the reason we haven't heard any complains in years. alloc_pages reclaims clean pages and most hard work is done by kswapd, so GFP_KERNEL doesn't cause much issues with writeback. But cheating isn't justified if you can get away with it. Incorrect GFP flags cause real problems with shrinkers - because shrinkers are called from alloc_pages and they do respond to GFP flags. I had reported deadlock due to GFP issues (9d28eb12447). And the worst thing about these bug reports is that they are totally unreproducible and I get nothing, but a stacktrace in bugzilla. I had to guess what happened and I couldn't even test if the patch fixed the bug. I'm not really happy that you are deliberately leaving these issues behind and making excuses. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon 23-04-18 20:25:15, Mikulas Patocka wrote: > > > On Mon, 23 Apr 2018, Michal Hocko wrote: > > > On Mon 23-04-18 10:06:08, Mikulas Patocka wrote: > > > > > > > He didn't want to fix vmalloc(GFP_NOIO) > > > > > > > > I don't remember that conversation, so I don't know whether I agree with > > > > his reasoning or not. But we are supposed to be moving away from > > > > GFP_NOIO > > > > towards marking regions with memalloc_noio_save() / restore. If you do > > > > that, you won't need vmalloc(GFP_NOIO). > > > > > > He said the same thing a year ago. And there was small progress. 6 out of > > > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in > > > infiniband and 1 in btrfs. (the whole discussion is here > > > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html ) > > > > Well this is not that easy. It requires a cooperation from maintainers. > > I can only do as much. I've posted patches in the past and actively > > bringing up this topic at LSFMM last two years... > > You're right - but you have chosen the uneasy path. Yes. > Fixing __vmalloc code > is easy and it doesn't require cooperation with maintainers. But it is a hack against the intention of the scope api. It also alows maintainers to not care about their broken code. > > > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in > > > 4 > > > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why > > > does he have veto over this part of the code? I'd much rather argue with > > > people who have constructive comments about fixing bugs than with him. > > > > I didn't NACK the patch AFAIR. I've said it is not a good idea longterm. > > I would be much more willing to change my mind if you would back your > > patch by a real bug report. Hacks are acceptable when we have a real > > issue in hands. But if we want to fix potential issue then better make > > it properly. > > Developers should fix bugs in advance, not to wait until a crash hapens, > is analyzed and reported. I agree. But are those existing users broken in the first place? I have seen so many GFP_NOFS abuses that I would dare to guess that most of those vmalloc NOFS abusers can be simply turned into GFP_KERNEL. Maybe that is the reason we haven't heard any complains in years. -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon, 23 Apr 2018, Michal Hocko wrote: > On Mon 23-04-18 10:06:08, Mikulas Patocka wrote: > > > > > He didn't want to fix vmalloc(GFP_NOIO) > > > > > > I don't remember that conversation, so I don't know whether I agree with > > > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > > > towards marking regions with memalloc_noio_save() / restore. If you do > > > that, you won't need vmalloc(GFP_NOIO). > > > > He said the same thing a year ago. And there was small progress. 6 out of > > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in > > infiniband and 1 in btrfs. (the whole discussion is here > > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html ) > > Well this is not that easy. It requires a cooperation from maintainers. > I can only do as much. I've posted patches in the past and actively > bringing up this topic at LSFMM last two years... You're right - but you have chosen the uneasy path. Fixing __vmalloc code is easy and it doesn't require cooperation with maintainers. > > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 > > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why > > does he have veto over this part of the code? I'd much rather argue with > > people who have constructive comments about fixing bugs than with him. > > I didn't NACK the patch AFAIR. I've said it is not a good idea longterm. > I would be much more willing to change my mind if you would back your > patch by a real bug report. Hacks are acceptable when we have a real > issue in hands. But if we want to fix potential issue then better make > it properly. Developers should fix bugs in advance, not to wait until a crash hapens, is analyzed and reported. What's the problem with 15-line hack? Is the problem that kernel developers would feel depressed when looking the source code? Other than harming developers' feelings, I don't see what kind of damange could that piece of code do. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon, 23 Apr 2018, Michal Hocko wrote: > On Mon 23-04-18 10:24:02, Mikulas Patocka wrote: > > > > Really, we have a fault injection framework and this sounds like > > > something to hook in there. > > > > The testing people won't set it up. They install the "kernel-debug" > > package and run the tests in it. > > > > If you introduce a hidden option that no one knows about, no one will use > > it. > > then make sure people know about it. Fuzzers already do test fault > injections. I think that in the long term we can introduce a kernel parameter like "debug_level=1", "debug_level=2", "debug_level=3" that will turn on debugging features across all kernel subsystems and we can teach users to use it to diagnose problems. But it won't work if every subsystem has different debug parameters. There are 192 distinct filenames in debugfs, if we add 193rd one, harly anyone notices it. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, Apr 20, 2018 at 08:20:23AM -0400, Mikulas Patocka wrote: > > > On Fri, 20 Apr 2018, Matthew Wilcox wrote: > > > On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote: > > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > > > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > > > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > > > code. > > > > Maybe it's time to have the SG code handle vmalloced pages? This is > > becoming more and more common with vmapped stacks (and some of our > > workarounds are hideous -- allocate 4 bytes with kmalloc because we can't > > DMA onto the stack any more?). We already have a few places which do > > handle sgs of vmalloced addresses, such as the nx crypto driver: > > > > if (is_vmalloc_addr(start_addr)) > > sg_addr = page_to_phys(vmalloc_to_page(start_addr)) > > + offset_in_page(sg_addr); > > else > > sg_addr = __pa(sg_addr); > > > > and videobuf: > > > > pg = vmalloc_to_page(virt); > > if (NULL == pg) > > goto err; > > BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT))); > > sg_set_page([i], pg, PAGE_SIZE, 0); > > > > Yes, there's the potential that we have to produce two SG entries for a > > virtually contiguous region if it crosses a page boundary, and our APIs > > aren't set up right to make it happen. But this is something we should > > consider fixing ... otherwise we'll end up with dozens of driver hacks. > > The videobuf implementation was already copy-and-pasted into the saa7146 > > driver, for example. > > What if the device requires physically contiguous area and the vmalloc > area crosses a page? Will you use a bounce buffer? Where do you allocate > the bounce buffer from? What if you run out of bounce buffers? > > Mikulkas I agree with Matthew here. 4 byte variables are typically size aligned so won't cross a boundary. That's enough for virtio at least. People using structs can force alignment. We could wrap access in a macro (sizeof(x) >= alignof(x)) to help guarantee that. -- MST
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon 23-04-18 10:06:08, Mikulas Patocka wrote: > > > On Sat, 21 Apr 2018, Matthew Wilcox wrote: > > > On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote: > > > On Fri, 20 Apr 2018, Matthew Wilcox wrote: > > > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote: > > > > > On Fri, 20 Apr 2018, Michal Hocko wrote: > > > > > > No way. This is just wrong! First of all, you will explode most > > > > > > likely > > > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends > > > > > > to be > > > > > > enabled quite often. > > > > > > > > > > You're an evil person who doesn't want to fix bugs. > > > > > > > > Steady on. There's no need for that. Michal isn't evil. Please > > > > apologise. > > > > > > I see this attitude from Michal again and again. > > > > Fine; then *say that*. I also see Michal saying "No" a lot. Sometimes > > I agree with him, sometimes I don't. I think he genuinely wants the best > > code in the kernel, and saying "No" is part of it. > > > > > He didn't want to fix vmalloc(GFP_NOIO) > > > > I don't remember that conversation, so I don't know whether I agree with > > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > > towards marking regions with memalloc_noio_save() / restore. If you do > > that, you won't need vmalloc(GFP_NOIO). > > He said the same thing a year ago. And there was small progress. 6 out of > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in > infiniband and 1 in btrfs. (the whole discussion is here > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html ) Well this is not that easy. It requires a cooperation from maintainers. I can only do as much. I've posted patches in the past and actively bringing up this topic at LSFMM last two years... > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why > does he have veto over this part of the code? I'd much rather argue with > people who have constructive comments about fixing bugs than with him. I didn't NACK the patch AFAIR. I've said it is not a good idea longterm. I would be much more willing to change my mind if you would back your patch by a real bug report. Hacks are acceptable when we have a real issue in hands. But if we want to fix potential issue then better make it properly. [...] > I sent the CONFIG_DEBUG_SG patch before (I wonder why he didn't repond to > it). I'll send a third version of the patch that actually randomly chooses > between kmalloc and vmalloc, because some abuses can only be detected with > kmalloc and we should test both. > > For bisecting, it is better to always fallback to vmalloc, but for general > testing, it is better to test both branches. Agreed! -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Mon 23-04-18 10:24:02, Mikulas Patocka wrote: [...] I am not going to comment on your continuous accusations. We can discuss patches but general rants make very limited sense. > > > > I already said that we can change it from CONFIG_DEBUG_VM to > > > > CONFIG_DEBUG_SG - or to whatever other option you may want, just to > > > > make > > > > sure that it is enabled in distro debug kernels by default. > > > > > > Yes, and I think that's the right idea. So send a v2 and ignore the > > > replies that are clearly relating to an earlier version of the patch. > > > Not everybody reads every mail in the thread before responding to one they > > > find interesting. Yes, ideally, one would, but sometimes one doesn't. > > > > And look here. This is yet another ad-hoc idea. We have many users of > > kvmalloc which have no relation to SG, yet you are going to control > > their behavior by CONFIG_DEBUG_SG? No way! (yeah evil again) > > Why aren't you constructive and pick up pick up the CONFIG flag? Because config doesn't make that much of a sense. You do not want a permanent vmalloc fallback unconditionally. Debugging option which changes the behavior completely is not useful IMNHO. Besides that who is going to enable it? > > Really, we have a fault injection framework and this sounds like > > something to hook in there. > > The testing people won't set it up. They install the "kernel-debug" > package and run the tests in it. > > If you introduce a hidden option that no one knows about, no one will use > it. then make sure people know about it. Fuzzers already do test fault injections. -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Sun, 22 Apr 2018, Michal Hocko wrote: > On Sat 21-04-18 07:47:57, Matthew Wilcox wrote: > > > > He didn't want to fix vmalloc(GFP_NOIO) > > > > I don't remember that conversation, so I don't know whether I agree with > > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > > towards marking regions with memalloc_noio_save() / restore. If you do > > that, you won't need vmalloc(GFP_NOIO). > > It was basically to detect GFP_NOIO context _inside_ vmalloc and use the > scope API to enforce it there. Does it solve potential problems? Yes it > does. Does it solve any existing report, no I am not aware of any. Is > it a good fix longterm? Absolutely no, because the scope API should be > used _at the place_ where the scope starts rather than a random utility > function. If we are going the easier way now, we will never teach users > to use the API properly. And I am willing to risk to keep a broken > code which we have for years rather than allow a random hack that will > seemingly fix it. > > Btw. I was pretty much explicit with this reasoning when rejecting the > patch. Do you still call that evil? You are making nonsensical excuses. That patch doesn't prevent you from refactoring the kernel and eliminating GFP_NOIO in the long term. You can apply the patch and then continue working on GFP_NOIO refactoring as well as before. > > > he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used. > > > > The GFP flags are a mess, still. > > I do not remember that one but __GFP_NORETRY is _allowed_ to sleep. And > yes I do _agree_ gfp flags are a mess which is really hard to get fixed > because they are lacking a good design from the very beginning. Fixing > some of those issues today is a completely PITA. It may sleep, but if it sleeps regularly, it slows down swapping (because the swapping code does mempool_alloc and mempool_alloc does __GFP_NORETRY allocation). And there were two INTENTIONAL sleeps with schedule_timeout. You removed one and left the other, claiming that __GFP_NORETRY allocation should sleep. > > > I already said that we can change it from CONFIG_DEBUG_VM to > > > CONFIG_DEBUG_SG - or to whatever other option you may want, just to make > > > sure that it is enabled in distro debug kernels by default. > > > > Yes, and I think that's the right idea. So send a v2 and ignore the > > replies that are clearly relating to an earlier version of the patch. > > Not everybody reads every mail in the thread before responding to one they > > find interesting. Yes, ideally, one would, but sometimes one doesn't. > > And look here. This is yet another ad-hoc idea. We have many users of > kvmalloc which have no relation to SG, yet you are going to control > their behavior by CONFIG_DEBUG_SG? No way! (yeah evil again) Why aren't you constructive and pick up pick up the CONFIG flag? > Really, we have a fault injection framework and this sounds like > something to hook in there. The testing people won't set it up. They install the "kernel-debug" package and run the tests in it. If you introduce a hidden option that no one knows about, no one will use it. > -- > Michal Hocko > SUSE Labs Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Sat, 21 Apr 2018, Matthew Wilcox wrote: > On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote: > > On Fri, 20 Apr 2018, Matthew Wilcox wrote: > > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote: > > > > On Fri, 20 Apr 2018, Michal Hocko wrote: > > > > > No way. This is just wrong! First of all, you will explode most likely > > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to > > > > > be > > > > > enabled quite often. > > > > > > > > You're an evil person who doesn't want to fix bugs. > > > > > > Steady on. There's no need for that. Michal isn't evil. Please > > > apologise. > > > > I see this attitude from Michal again and again. > > Fine; then *say that*. I also see Michal saying "No" a lot. Sometimes > I agree with him, sometimes I don't. I think he genuinely wants the best > code in the kernel, and saying "No" is part of it. > > > He didn't want to fix vmalloc(GFP_NOIO) > > I don't remember that conversation, so I don't know whether I agree with > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > towards marking regions with memalloc_noio_save() / restore. If you do > that, you won't need vmalloc(GFP_NOIO). He said the same thing a year ago. And there was small progress. 6 out of 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in infiniband and 1 in btrfs. (the whole discussion is here http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html ) He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 years, the kernel will be refactored and GFP_NOIO will be eliminated. Why does he have veto over this part of the code? I'd much rather argue with people who have constructive comments about fixing bugs than with him. (note that even if the refactoring eventually succeeds, it will not be backported to stable branches. The small vmalloc patch could be backported) > > he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used. > > The GFP flags are a mess, still. That's the problem - the flag doesn't have a clear contract and the developers change behavior ad hoc according to bug reports. > > So what should I say? Fix them and you won't be evil :-) > > No, you should reserve calling somebody evil for truly evil things. How would you call it? Michal falsely believes that a 15-line patch would prevent him from doing long-term refactoring work and so he refuses it. > > I already said that we can change it from CONFIG_DEBUG_VM to > > CONFIG_DEBUG_SG - or to whatever other option you may want, just to make > > sure that it is enabled in distro debug kernels by default. > > Yes, and I think that's the right idea. So send a v2 and ignore the > replies that are clearly relating to an earlier version of the patch. > Not everybody reads every mail in the thread before responding to one they > find interesting. Yes, ideally, one would, but sometimes one doesn't. I sent the CONFIG_DEBUG_SG patch before (I wonder why he didn't repond to it). I'll send a third version of the patch that actually randomly chooses between kmalloc and vmalloc, because some abuses can only be detected with kmalloc and we should test both. For bisecting, it is better to always fallback to vmalloc, but for general testing, it is better to test both branches. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Sat 21-04-18 07:47:57, Matthew Wilcox wrote: > On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote: > > On Fri, 20 Apr 2018, Matthew Wilcox wrote: > > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote: > > > > On Fri, 20 Apr 2018, Michal Hocko wrote: > > > > > No way. This is just wrong! First of all, you will explode most likely > > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to > > > > > be > > > > > enabled quite often. > > > > > > > > You're an evil person who doesn't want to fix bugs. > > > > > > Steady on. There's no need for that. Michal isn't evil. Please > > > apologise. > > > > I see this attitude from Michal again and again. > > Fine; then *say that*. I also see Michal saying "No" a lot. Sometimes > I agree with him, sometimes I don't. I think he genuinely wants the best > code in the kernel, and saying "No" is part of it. Exactly! We have a lot of legacy herritage and random semantic because we were too eager to merge stuff. I think it is time to stop that and think twice before merging someothing. If you call that evil then I am fine to be evil. > > He didn't want to fix vmalloc(GFP_NOIO) > > I don't remember that conversation, so I don't know whether I agree with > his reasoning or not. But we are supposed to be moving away from GFP_NOIO > towards marking regions with memalloc_noio_save() / restore. If you do > that, you won't need vmalloc(GFP_NOIO). It was basically to detect GFP_NOIO context _inside_ vmalloc and use the scope API to enforce it there. Does it solve potential problems? Yes it does. Does it solve any existing report, no I am not aware of any. Is it a good fix longterm? Absolutely no, because the scope API should be used _at the place_ where the scope starts rather than a random utility function. If we are going the easier way now, we will never teach users to use the API properly. And I am willing to risk to keep a broken code which we have for years rather than allow a random hack that will seemingly fix it. Btw. I was pretty much explicit with this reasoning when rejecting the patch. Do you still call that evil? > > he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used. > > The GFP flags are a mess, still. I do not remember that one but __GFP_NORETRY is _allowed_ to sleep. And yes I do _agree_ gfp flags are a mess which is really hard to get fixed because they are lacking a good design from the very beginning. Fixing some of those issues today is a completely PITA. > > So what should I say? Fix them and > > you won't be evil :-) > > No, you should reserve calling somebody evil for truly evil things. > > > (he could also fix the oom killer, so that it is triggered when > > free_memory+cache+free_swap goes beyond a threshold and not when you loop > > too long in the allocator) > > ... that also doesn't make somebody evil. And again, it is way much more easier to claim that something will get fixed when the reality is much more complicated. I've tried to explain those risks as well. > > I already said that we can change it from CONFIG_DEBUG_VM to > > CONFIG_DEBUG_SG - or to whatever other option you may want, just to make > > sure that it is enabled in distro debug kernels by default. > > Yes, and I think that's the right idea. So send a v2 and ignore the > replies that are clearly relating to an earlier version of the patch. > Not everybody reads every mail in the thread before responding to one they > find interesting. Yes, ideally, one would, but sometimes one doesn't. And look here. This is yet another ad-hoc idea. We have many users of kvmalloc which have no relation to SG, yet you are going to control their behavior by CONFIG_DEBUG_SG? No way! (yeah evil again) Really, we have a fault injection framework and this sounds like something to hook in there. -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, Apr 20, 2018 at 05:21:26PM -0400, Mikulas Patocka wrote: > On Fri, 20 Apr 2018, Matthew Wilcox wrote: > > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote: > > > On Fri, 20 Apr 2018, Michal Hocko wrote: > > > > No way. This is just wrong! First of all, you will explode most likely > > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > > > > enabled quite often. > > > > > > You're an evil person who doesn't want to fix bugs. > > > > Steady on. There's no need for that. Michal isn't evil. Please > > apologise. > > I see this attitude from Michal again and again. Fine; then *say that*. I also see Michal saying "No" a lot. Sometimes I agree with him, sometimes I don't. I think he genuinely wants the best code in the kernel, and saying "No" is part of it. > He didn't want to fix vmalloc(GFP_NOIO) I don't remember that conversation, so I don't know whether I agree with his reasoning or not. But we are supposed to be moving away from GFP_NOIO towards marking regions with memalloc_noio_save() / restore. If you do that, you won't need vmalloc(GFP_NOIO). > he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used. The GFP flags are a mess, still. > So what should I say? Fix them and > you won't be evil :-) No, you should reserve calling somebody evil for truly evil things. > (he could also fix the oom killer, so that it is triggered when > free_memory+cache+free_swap goes beyond a threshold and not when you loop > too long in the allocator) ... that also doesn't make somebody evil. > I already said that we can change it from CONFIG_DEBUG_VM to > CONFIG_DEBUG_SG - or to whatever other option you may want, just to make > sure that it is enabled in distro debug kernels by default. Yes, and I think that's the right idea. So send a v2 and ignore the replies that are clearly relating to an earlier version of the patch. Not everybody reads every mail in the thread before responding to one they find interesting. Yes, ideally, one would, but sometimes one doesn't.
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, 20 Apr 2018, Matthew Wilcox wrote: > On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote: > > On Fri, 20 Apr 2018, Michal Hocko wrote: > > > No way. This is just wrong! First of all, you will explode most likely > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > > > enabled quite often. > > > > You're an evil person who doesn't want to fix bugs. > > Steady on. There's no need for that. Michal isn't evil. Please > apologise. I see this attitude from Michal again and again. He didn't want to fix vmalloc(GFP_NOIO), he didn't want to fix alloc_pages sleeping when __GFP_NORETRY is used. So what should I say? Fix them and you won't be evil :-) (he could also fix the oom killer, so that it is triggered when free_memory+cache+free_swap goes beyond a threshold and not when you loop too long in the allocator) > > You refused to fix vmalloc(GFP_NOIO) misbehavior a year ago (did you make > > some progress with it since that time?) and you refuse to fix kvmalloc > > misuses. > > I understand you're frustrated, but this is not the way to get the problems > fixed. > > > I tried this patch on text-only virtual machine and /proc/vmallocinfo > > shows 614kB more memory. I tried it on a desktop machine with the chrome > > browser open and /proc/vmallocinfo space is increased by 7MB. So no - this > > won't exhaust memory and kill the machine. > > This is good data, thank you for providing it. > > > Arguing that this increases memory consumption is as bogus as arguing that > > CONFIG_LOCKDEP increses memory consumption. No one is forcing you to > > enable CONFIG_LOCKDEP and no one is forcing you to enable this kvmalloc > > test too. > > I think there's a real problem which is that CONFIG_DEBUG_VM is too broad. > It inserts code in a *lot* of places, some of which is quite expensive. > We would do better to split it into more granular pieces ... although > an explosion of configuration options isn't great either. Maybe just > CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_EXPENSIVE. > > Michal may be wrong, but he's not evil. I already said that we can change it from CONFIG_DEBUG_VM to CONFIG_DEBUG_SG - or to whatever other option you may want, just to make sure that it is enabled in distro debug kernels by default. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, Apr 20, 2018 at 04:54:53PM -0400, Mikulas Patocka wrote: > On Fri, 20 Apr 2018, Michal Hocko wrote: > > No way. This is just wrong! First of all, you will explode most likely > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > > enabled quite often. > > You're an evil person who doesn't want to fix bugs. Steady on. There's no need for that. Michal isn't evil. Please apologise. > You refused to fix vmalloc(GFP_NOIO) misbehavior a year ago (did you make > some progress with it since that time?) and you refuse to fix kvmalloc > misuses. I understand you're frustrated, but this is not the way to get the problems fixed. > I tried this patch on text-only virtual machine and /proc/vmallocinfo > shows 614kB more memory. I tried it on a desktop machine with the chrome > browser open and /proc/vmallocinfo space is increased by 7MB. So no - this > won't exhaust memory and kill the machine. This is good data, thank you for providing it. > Arguing that this increases memory consumption is as bogus as arguing that > CONFIG_LOCKDEP increses memory consumption. No one is forcing you to > enable CONFIG_LOCKDEP and no one is forcing you to enable this kvmalloc > test too. I think there's a real problem which is that CONFIG_DEBUG_VM is too broad. It inserts code in a *lot* of places, some of which is quite expensive. We would do better to split it into more granular pieces ... although an explosion of configuration options isn't great either. Maybe just CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_EXPENSIVE. Michal may be wrong, but he's not evil.
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, 20 Apr 2018, Michal Hocko wrote: > On Fri 20-04-18 06:41:36, Matthew Wilcox wrote: > > On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote: > > > > In order to detect these bugs reliably I submit this patch that changes > > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > > > > > No way. This is just wrong! First of all, you will explode most likely > > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > > > enabled quite often. > > > > I think it'll still suit Mikulas' debugging needs if we always use > > vmalloc for sizes above PAGE_SIZE? > > Even if that was the case then this doesn't sounds like CONFIG_DEBUG_VM > material. We do not want a completely different behavior when the config > > -- > Michal Hocko > SUSE Labs I'm not arguing that it must be turned on exactly by CONFIG_DEBUG_VM. It may be turned on some other option that is enabled in debug kernels (CONFIG_DEBUG_SG may be a better option, because you'll get meaningful stracktraces from DMA API then). > is enabled. If we really need some better fallback testing coverage > then the fault injection, as suggested by Vlastimil, sounds much more > reasonable to me People who test kernels will install the kernel-debug package, reboot to the debug kernel and run their testsuites. They won't turn on magic options in debugfs or use some hideous kernel commandline arguments. If the kvmalloc test isn't in the debug kernel, then the testing crew won't test it - that's it. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, 20 Apr 2018, Michal Hocko wrote: > On Thu 19-04-18 12:12:38, Mikulas Patocka wrote: > [...] > > From: Mikulas Patocka> > Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM > > > > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > > kmalloc fails. > > > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > > code. > > > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > > only if memory is fragmented. > > > > In order to detect these bugs reliably I submit this patch that changes > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > No way. This is just wrong! First of all, you will explode most likely > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > enabled quite often. You're an evil person who doesn't want to fix bugs. You refused to fix vmalloc(GFP_NOIO) misbehavior a year ago (did you make some progress with it since that time?) and you refuse to fix kvmalloc misuses. I tried this patch on text-only virtual machine and /proc/vmallocinfo shows 614kB more memory. I tried it on a desktop machine with the chrome browser open and /proc/vmallocinfo space is increased by 7MB. So no - this won't exhaust memory and kill the machine. Arguing that this increases memory consumption is as bogus as arguing that CONFIG_LOCKDEP increses memory consumption. No one is forcing you to enable CONFIG_LOCKDEP and no one is forcing you to enable this kvmalloc test too. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri 20-04-18 06:41:36, Matthew Wilcox wrote: > On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote: > > > In order to detect these bugs reliably I submit this patch that changes > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > > > No way. This is just wrong! First of all, you will explode most likely > > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > > enabled quite often. > > I think it'll still suit Mikulas' debugging needs if we always use > vmalloc for sizes above PAGE_SIZE? Even if that was the case then this doesn't sounds like CONFIG_DEBUG_VM material. We do not want a completely different behavior when the config is enabled. If we really need some better fallback testing coverage then the fault injection, as suggested by Vlastimil, sounds much more reasonable to me -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, Apr 20, 2018 at 03:08:52PM +0200, Michal Hocko wrote: > > In order to detect these bugs reliably I submit this patch that changes > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > No way. This is just wrong! First of all, you will explode most likely > on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be > enabled quite often. I think it'll still suit Mikulas' debugging needs if we always use vmalloc for sizes above PAGE_SIZE?
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu 19-04-18 12:12:38, Mikulas Patocka wrote: [...] > From: Mikulas Patocka> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM > > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > kmalloc fails. > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > code. > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. > > In order to detect these bugs reliably I submit this patch that changes > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. No way. This is just wrong! First of all, you will explode most likely on many allocations of small sizes. Second, CONFIG_DEBUG_VM tends to be enabled quite often. > Signed-off-by: Mikulas Patocka Nacked-by: Michal Hocko > --- > mm/util.c |2 ++ > 1 file changed, 2 insertions(+) > > Index: linux-2.6/mm/util.c > === > --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.0 +0200 > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > */ > void *kvmalloc_node(size_t size, gfp_t flags, int node) > { > +#ifndef CONFIG_DEBUG_VM > gfp_t kmalloc_flags = flags; > void *ret; > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f >*/ > if (ret || size <= PAGE_SIZE) > return ret; > +#endif > > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0)); -- Michal Hocko SUSE Labs
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Fri, 20 Apr 2018, Matthew Wilcox wrote: > On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote: > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > > code. > > Maybe it's time to have the SG code handle vmalloced pages? This is > becoming more and more common with vmapped stacks (and some of our > workarounds are hideous -- allocate 4 bytes with kmalloc because we can't > DMA onto the stack any more?). We already have a few places which do > handle sgs of vmalloced addresses, such as the nx crypto driver: > > if (is_vmalloc_addr(start_addr)) > sg_addr = page_to_phys(vmalloc_to_page(start_addr)) > + offset_in_page(sg_addr); > else > sg_addr = __pa(sg_addr); > > and videobuf: > > pg = vmalloc_to_page(virt); > if (NULL == pg) > goto err; > BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT))); > sg_set_page([i], pg, PAGE_SIZE, 0); > > Yes, there's the potential that we have to produce two SG entries for a > virtually contiguous region if it crosses a page boundary, and our APIs > aren't set up right to make it happen. But this is something we should > consider fixing ... otherwise we'll end up with dozens of driver hacks. > The videobuf implementation was already copy-and-pasted into the saa7146 > driver, for example. What if the device requires physically contiguous area and the vmalloc area crosses a page? Will you use a bounce buffer? Where do you allocate the bounce buffer from? What if you run out of bounce buffers? Mikulkas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, 19 Apr 2018, Andrew Morton wrote: > On Thu, 19 Apr 2018 17:19:20 -0400 (EDT) Mikulas Patocka >wrote: > > > > > In order to detect these bugs reliably I submit this patch that changes > > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > > > > > > > ... > > > > > > > > --- linux-2.6.orig/mm/util.c2018-04-18 15:46:23.0 +0200 > > > > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > > > > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > > > > */ > > > > void *kvmalloc_node(size_t size, gfp_t flags, int node) > > > > { > > > > +#ifndef CONFIG_DEBUG_VM > > > > gfp_t kmalloc_flags = flags; > > > > void *ret; > > > > > > > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f > > > > */ > > > > if (ret || size <= PAGE_SIZE) > > > > return ret; > > > > +#endif > > > > > > > > return __vmalloc_node_flags_caller(size, node, flags, > > > > __builtin_return_address(0)); > > > > > > Well, it doesn't have to be done at compile-time, does it? We could > > > add a knob (in debugfs, presumably) which enables this at runtime. > > > That's far more user-friendly. > > > > But who will turn it on in debugfs? > > But who will turn it on in Kconfig? Just a handful of developers. We So, it won't receive much testing. I've never played with those debugfs files (because I didn't need it), and most users also won't play with it. Having a debugfs option is like having no option at all. > could add SONFIG_DEBUG_SG to the list in > Documentation/process/submit-checklist.rst, but nobody reads that. > > Also, a whole bunch of defconfigs set CONFIG_DEBUG_SG=y and some > googling indicates that they aren't the only ones... > > > It should be default for debugging > > kernels, so that users using them would report the error. > > Well. This isn't the first time we've wanted to enable expensive (or > noisy) debugging things in debug kernels, by any means. > > So how could we define a debug kernel in which it's OK to enable such > things? Debug kernel is what distributions distribute as debug kernel - i.e. RHEL or Fedora have debugging kernels. So it needs to be bound to an option that is turned on in these kernels - so that any user who boots the debugging kernel triggers the bug. > - Could be "it's an -rc kernel". But then we'd be enabling a bunch of > untested code when Linus cuts a release. > > - Could be "it's an -rc kernel with SUBLEVEL <= 5". But then we risk > unexpected things happening when Linux cuts -rc6, which still isn't > good. > > - How about "it's an -rc kernel with odd-numbered SUBLEVEL and > SUBLEVEL <= 5". That way everybody who runs -rc1, -rc3 and -rc5 will > have kvmalloc debugging enabled. That's potentially nasty because > vmalloc is much slower than kmalloc. But kvmalloc() is only used for > large and probably infrequent allocations, so it's probably OK. > > I wonder how we get at SUBLEVEL from within .c. Don't bind it to rc level, bind it to some debugging configuration option. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote: > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > code. Maybe it's time to have the SG code handle vmalloced pages? This is becoming more and more common with vmapped stacks (and some of our workarounds are hideous -- allocate 4 bytes with kmalloc because we can't DMA onto the stack any more?). We already have a few places which do handle sgs of vmalloced addresses, such as the nx crypto driver: if (is_vmalloc_addr(start_addr)) sg_addr = page_to_phys(vmalloc_to_page(start_addr)) + offset_in_page(sg_addr); else sg_addr = __pa(sg_addr); and videobuf: pg = vmalloc_to_page(virt); if (NULL == pg) goto err; BUG_ON(page_to_pfn(pg) >= (1 << (32 - PAGE_SHIFT))); sg_set_page([i], pg, PAGE_SIZE, 0); Yes, there's the potential that we have to produce two SG entries for a virtually contiguous region if it crosses a page boundary, and our APIs aren't set up right to make it happen. But this is something we should consider fixing ... otherwise we'll end up with dozens of driver hacks. The videobuf implementation was already copy-and-pasted into the saa7146 driver, for example.
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, 19 Apr 2018 17:19:20 -0400 (EDT) Mikulas Patockawrote: > > > In order to detect these bugs reliably I submit this patch that changes > > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > > > > > ... > > > > > > --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.0 +0200 > > > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > > > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > > > */ > > > void *kvmalloc_node(size_t size, gfp_t flags, int node) > > > { > > > +#ifndef CONFIG_DEBUG_VM > > > gfp_t kmalloc_flags = flags; > > > void *ret; > > > > > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f > > >*/ > > > if (ret || size <= PAGE_SIZE) > > > return ret; > > > +#endif > > > > > > return __vmalloc_node_flags_caller(size, node, flags, > > > __builtin_return_address(0)); > > > > Well, it doesn't have to be done at compile-time, does it? We could > > add a knob (in debugfs, presumably) which enables this at runtime. > > That's far more user-friendly. > > But who will turn it on in debugfs? But who will turn it on in Kconfig? Just a handful of developers. We could add SONFIG_DEBUG_SG to the list in Documentation/process/submit-checklist.rst, but nobody reads that. Also, a whole bunch of defconfigs set CONFIG_DEBUG_SG=y and some googling indicates that they aren't the only ones... > It should be default for debugging > kernels, so that users using them would report the error. Well. This isn't the first time we've wanted to enable expensive (or noisy) debugging things in debug kernels, by any means. So how could we define a debug kernel in which it's OK to enable such things? - Could be "it's an -rc kernel". But then we'd be enabling a bunch of untested code when Linus cuts a release. - Could be "it's an -rc kernel with SUBLEVEL <= 5". But then we risk unexpected things happening when Linux cuts -rc6, which still isn't good. - How about "it's an -rc kernel with odd-numbered SUBLEVEL and SUBLEVEL <= 5". That way everybody who runs -rc1, -rc3 and -rc5 will have kvmalloc debugging enabled. That's potentially nasty because vmalloc is much slower than kmalloc. But kvmalloc() is only used for large and probably infrequent allocations, so it's probably OK. I wonder how we get at SUBLEVEL from within .c.
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, 19 Apr 2018, Andrew Morton wrote: > On Thu, 19 Apr 2018 12:12:38 -0400 (EDT) Mikulas Patocka >wrote: > > > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > > kmalloc fails. > > > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > > code. > > > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > > only if memory is fragmented. > > Yes, that's nasty. > > > In order to detect these bugs reliably I submit this patch that changes > > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > > > ... > > > > --- linux-2.6.orig/mm/util.c2018-04-18 15:46:23.0 +0200 > > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > > */ > > void *kvmalloc_node(size_t size, gfp_t flags, int node) > > { > > +#ifndef CONFIG_DEBUG_VM > > gfp_t kmalloc_flags = flags; > > void *ret; > > > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f > > */ > > if (ret || size <= PAGE_SIZE) > > return ret; > > +#endif > > > > return __vmalloc_node_flags_caller(size, node, flags, > > __builtin_return_address(0)); > > Well, it doesn't have to be done at compile-time, does it? We could > add a knob (in debugfs, presumably) which enables this at runtime. > That's far more user-friendly. But who will turn it on in debugfs? It should be default for debugging kernels, so that users using them would report the error. Conditioning it on CONFIG_DEBUG_SG is better than CONFIG_DEBUG_VM, it will print a stacktrace where the incorrect use happened. Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, 19 Apr 2018 12:12:38 -0400 (EDT) Mikulas Patockawrote: > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > kmalloc fails. > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > code. > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. Yes, that's nasty. > In order to detect these bugs reliably I submit this patch that changes > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > ... > > --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.0 +0200 > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > */ > void *kvmalloc_node(size_t size, gfp_t flags, int node) > { > +#ifndef CONFIG_DEBUG_VM > gfp_t kmalloc_flags = flags; > void *ret; > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f >*/ > if (ret || size <= PAGE_SIZE) > return ret; > +#endif > > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0)); Well, it doesn't have to be done at compile-time, does it? We could add a knob (in debugfs, presumably) which enables this at runtime. That's far more user-friendly.
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On 04/19/2018 06:12 PM, Mikulas Patocka wrote: > From: Mikulas Patocka> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM > > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > kmalloc fails. > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > code. > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. > > In order to detect these bugs reliably I submit this patch that changes > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > Signed-off-by: Mikulas Patocka Hmm AFAIK Fedora uses CONFIG_DEBUG_VM in their kernels. Sure you want to impose this on all users? Seems too much for DEBUG_VM to me. Maybe it should be hidden under some error injection config? > --- > mm/util.c |2 ++ > 1 file changed, 2 insertions(+) > > Index: linux-2.6/mm/util.c > === > --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.0 +0200 > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > */ > void *kvmalloc_node(size_t size, gfp_t flags, int node) > { > +#ifndef CONFIG_DEBUG_VM > gfp_t kmalloc_flags = flags; > void *ret; > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f >*/ > if (ret || size <= PAGE_SIZE) > return ret; > +#endif Did you verify that vmalloc does the right thing for sub-page sizes? Shouldn't those be exempted? > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0)); >
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, Apr 19, 2018 at 12:12:38PM -0400, Mikulas Patocka wrote: > > > On Wed, 18 Apr 2018, Mikulas Patocka wrote: > > > > > > > On Wed, 18 Apr 2018, David Miller wrote: > > > > > From: Mikulas Patocka> > > Date: Wed, 18 Apr 2018 12:44:25 -0400 (EDT) > > > > > > > The structure net_device is followed by arbitrary driver-specific data > > > > (accessible with the function netdev_priv). And for virtio-net, these > > > > driver-specific data must be in DMA memory. > > > > > > And we are saying that this assumption is wrong and needs to be > > > corrected. > > > > So, try to find all the networking drivers that to DMA to the private > > area. > > > > The problem here is that kvzalloc usually returns DMA-able area, but it > > may return non-DMA area rarely, if the memory is too fragmented. So, we > > are in a situation, where some networking drivers will randomly fail. Go > > and find them. > > > > Mikulas > > Her I submit a patch that makes kvmalloc always use vmalloc if > CONFIG_DEBUG_VM is defined. > > > > > From: Mikulas Patocka > Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM > > The kvmalloc function tries to use kmalloc and falls back to vmalloc if > kmalloc fails. > > Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then > uses DMA-API on the returned memory or frees it with kfree. Such bugs were > found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific > code. > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. > > In order to detect these bugs reliably I submit this patch that changes > kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on. > > Signed-off-by: Mikulas Patocka Maybe make it conditional on CONFIG_DEBUG_SG too? Otherwise I think you just trigger a hard to debug memory corruption. > --- > mm/util.c |2 ++ > 1 file changed, 2 insertions(+) > > Index: linux-2.6/mm/util.c > === > --- linux-2.6.orig/mm/util.c 2018-04-18 15:46:23.0 +0200 > +++ linux-2.6/mm/util.c 2018-04-18 16:00:43.0 +0200 > @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap); > */ > void *kvmalloc_node(size_t size, gfp_t flags, int node) > { > +#ifndef CONFIG_DEBUG_VM > gfp_t kmalloc_flags = flags; > void *ret; > > @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f >*/ > if (ret || size <= PAGE_SIZE) > return ret; > +#endif > > return __vmalloc_node_flags_caller(size, node, flags, > __builtin_return_address(0));
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On Thu, 19 Apr 2018, Eric Dumazet wrote: > > > On 04/19/2018 09:12 AM, Mikulas Patocka wrote: > > > > > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > > only if memory is fragmented. > > > > This sentence is wrong. > > because kvmalloc() falls back to vmalloc() ... Yes. There should be "falls back to vmalloc()". Mikulas
Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
On 04/19/2018 09:12 AM, Mikulas Patocka wrote: > > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. > This sentence is wrong. because kvmalloc() falls back to vmalloc() ...