Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Hugh Dickins wrote: > Please reread mails of the last 20 hours. Your "ARM fix" may or may not > be a good thing, I don't know. But it had nothing to do with this oops, > which (we believe) came from a kmalloc in drivers/mmc/core/sd.c. There > are likely to be many other drivers which pass down kmalloc'ed buffers > to routines which handle both kmalloc'ed buffers and page buffers. > Though very few of those needing to flush_dcache_page. If that is so then those functions may want to check PageSlab before calling flush_dcache_page. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Christoph Lameter wrote: > On Mon, 25 Jun 2007, Hugh Dickins wrote: > > > > The "sometimes we have kmalloced buffers" locations need to be fixed. > > > > I've said enough, I'd better leave it to others to deter you or not > > from fiddling around pointlessly here. > > Are there any locations left after the two fixes to pa-risc and arm? > > If the ARM fix solves the issue then we may not need that special casing > anymore. Maybe the sometimes has become never? Please reread mails of the last 20 hours. Your "ARM fix" may or may not be a good thing, I don't know. But it had nothing to do with this oops, which (we believe) came from a kmalloc in drivers/mmc/core/sd.c. There are likely to be many other drivers which pass down kmalloc'ed buffers to routines which handle both kmalloc'ed buffers and page buffers. Though very few of those needing to flush_dcache_page. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Hugh Dickins wrote: > > The "sometimes we have kmalloced buffers" locations need to be fixed. > > I've said enough, I'd better leave it to others to deter you or not > from fiddling around pointlessly here. Are there any locations left after the two fixes to pa-risc and arm? If the ARM fix solves the issue then we may not need that special casing anymore. Maybe the sometimes has become never? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Christoph Lameter wrote: > On Mon, 25 Jun 2007, Hugh Dickins wrote: > > > > I didn't claim that flush_dcache_page(virt_to_page(virt)) is not expected > > to work. I claim that flush_dcache_page is expected to be a noop rather > > than an oops on a kmalloced page. > > There are no kmalloced pages. There is only kmalloced memory. You allocate > pages from the page allocator. Its a layering violation to expect a page > struct operation on a slab object to work. Oh, thanks so much for resolving my confusion on that. > > It is not okay to use a page cache function on a slab object. The slab > object does not fulfill the alignment requirements of a page cache page > nor does it have a compatible page struct content as a page cache page. > This can only cause trouble. > > The problem is that the code is allocating some slab memory and then > determines the page struct pointer and then hands it back to the DMA > layer? > > > > Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit > > > uneasy with that given that its in such a broadly used function while its > > > only use is to enable flush_dcache_page to work. But we need the general > > > issue taken care of after 2.6.22. > > > > What general issue? > > How to flush slab objects in a reliable way. > > > Please see Documentation/cachetlb.txt: flush_dcache_page is about > > pagecache pages mapped into userspace. We don't use kmalloc for those, > > but we do sometimes need to flush_dcache_page in places which commonly > > deal with pagecache pages, but sometimes handle kmalloc'ed buffers too. > > Luckily we don't have to deal with buffers in which the first page is > > kmalloced and the next comes from pagecache. > > This gets crazier and crazier. flush_dcache_page is for pages not for > allocated buffers via kmalloc. So this has nothing to do with any need > to flush slab objects? > > Sometimes we do this and then we do that? So someone played loose > ball with the slab, was successful and that makes it right now? > > We need the VM_BUG_ON in include/linux/mm.h to stop this nonsense. > > The "sometimes we have kmalloced buffers" locations need to be fixed. I've said enough, I'd better leave it to others to deter you or not from fiddling around pointlessly here. But Andrew, please cancel my Acked-by to mm's add-vm_bug_on-in-case-someone-uses-page_mapping-on-a-slab-page.patch Thanks, Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Hugh Dickins wrote: > > > PageSlab to avoid oopsing on page->mapping. > > > > It is definitely intended to work. Otherwise we would not have code > > like this: > > > > [EMAIL PROTECTED]:~/linux-2.6$ find . -name "*.c" | xargs grep > > "flush_dcache_page"|grep virt > > ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev)); > > ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev)); > > I didn't claim that flush_dcache_page(virt_to_page(virt)) is not expected > to work. I claim that flush_dcache_page is expected to be a noop rather > than an oops on a kmalloced page. There are no kmalloced pages. There is only kmalloced memory. You allocate pages from the page allocator. Its a layering violation to expect a page struct operation on a slab object to work. It is not okay to use a page cache function on a slab object. The slab object does not fulfill the alignment requirements of a page cache page nor does it have a compatible page struct content as a page cache page. This can only cause trouble. The problem is that the code is allocating some slab memory and then determines the page struct pointer and then hands it back to the DMA layer? > > Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit > > uneasy with that given that its in such a broadly used function while its > > only use is to enable flush_dcache_page to work. But we need the general > > issue taken care of after 2.6.22. > > What general issue? How to flush slab objects in a reliable way. > Please see Documentation/cachetlb.txt: flush_dcache_page is about > pagecache pages mapped into userspace. We don't use kmalloc for those, > but we do sometimes need to flush_dcache_page in places which commonly > deal with pagecache pages, but sometimes handle kmalloc'ed buffers too. > Luckily we don't have to deal with buffers in which the first page is > kmalloced and the next comes from pagecache. This gets crazier and crazier. flush_dcache_page is for pages not for allocated buffers via kmalloc. So this has nothing to do with any need to flush slab objects? Sometimes we do this and then we do that? So someone played loose ball with the slab, was successful and that makes it right now? We need the VM_BUG_ON in include/linux/mm.h to stop this nonsense. The "sometimes we have kmalloced buffers" locations need to be fixed. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Christoph Lameter wrote: > On Mon, 25 Jun 2007, Hugh Dickins wrote: > > > > In many situations the page struct passed to flush_dcache_page is > > > simply used to calculate the virtual address. So its mostly harmless. > > > Trouble starts when page attributes like the mapping is used. > > > > Mostly harmless indeed. I don't understand why you insist on trying > > to complicate the situation. flush_dcache_page is only expected to > > do something on pages mapped into userspace (correct me if I'm wrong > > there), it's expected to do nothing on kmalloc'ed pages. It's > > been working that way for years, and will continue to work that way > > with slub, providing either page_mapping or flush_dcache_page checks > > PageSlab to avoid oopsing on page->mapping. > > It is definitely intended to work. Otherwise we would not have code > like this: > > [EMAIL PROTECTED]:~/linux-2.6$ find . -name "*.c" | xargs grep > "flush_dcache_page"|grep virt > ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev)); > ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev)); I didn't claim that flush_dcache_page(virt_to_page(virt)) is not expected to work. I claim that flush_dcache_page is expected to be a noop rather than an oops on a kmalloced page. > > 2.6.22-rc6 has page_mapping making that check: we could argue about > > which is the better site for it, there are good arguments both ways > > (page_mapping is the correct place, flush_dcache_page is the more > > efficient place), I suggest we leave it as is. > > Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit > uneasy with that given that its in such a broadly used function while its > only use is to enable flush_dcache_page to work. But we need the general > issue taken care of after 2.6.22. What general issue? > > > > A kmalloc slab object (even 64 byte) may be crossing a page boundary > > > with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that > > > flush_dcache_range *must* be used rather than flush_dcache_page. > > > > Why All we require of flush_dcache_page is that it not oops on > > the first page in the range: we don't need to change over to > > flush_dcache_range for that. > > As explained about: There are corner cases in which it does not work. You > seem to assume that flush_dcache_page can become a no op. That may not be > true on platforms that need explicit cache flushing for a DMA engine to > access a data structure. The above listed use suggests that the caller > expects flushing to occur correctly. The scsi_tgt_if.c use you show above? That's not dealing with kmalloced pages, is it? True, the pages it is dealing with don't have page->mapping set, so those architectures which use page->mapping to find what to do in their flush_dcache_page, won't do anything there in their flush_dcache_page. Whether that's a bug or not, I wouldn't pretend to know; but it's nothing to do with the present discussion. Please see Documentation/cachetlb.txt: flush_dcache_page is about pagecache pages mapped into userspace. We don't use kmalloc for those, but we do sometimes need to flush_dcache_page in places which commonly deal with pagecache pages, but sometimes handle kmalloc'ed buffers too. Luckily we don't have to deal with buffers in which the first page is kmalloced and the next comes from pagecache. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Hugh Dickins wrote: > > In many situations the page struct passed to flush_dcache_page is > > simply used to calculate the virtual address. So its mostly harmless. > > Trouble starts when page attributes like the mapping is used. > > Mostly harmless indeed. I don't understand why you insist on trying > to complicate the situation. flush_dcache_page is only expected to > do something on pages mapped into userspace (correct me if I'm wrong > there), it's expected to do nothing on kmalloc'ed pages. It's > been working that way for years, and will continue to work that way > with slub, providing either page_mapping or flush_dcache_page checks > PageSlab to avoid oopsing on page->mapping. It is definitely intended to work. Otherwise we would not have code like this: [EMAIL PROTECTED]:~/linux-2.6$ find . -name "*.c" | xargs grep "flush_dcache_page"|grep virt ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev)); ./drivers/scsi/scsi_tgt_if.c: flush_dcache_page(virt_to_page(ev)); > 2.6.22-rc6 has page_mapping making that check: we could argue about > which is the better site for it, there are good arguments both ways > (page_mapping is the correct place, flush_dcache_page is the more > efficient place), I suggest we leave it as is. Ok. I think your patch is fine as a quick fix for 2.6.22. I am a bit uneasy with that given that its in such a broadly used function while its only use is to enable flush_dcache_page to work. But we need the general issue taken care of after 2.6.22. > > A kmalloc slab object (even 64 byte) may be crossing a page boundary > > with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that > > flush_dcache_range *must* be used rather than flush_dcache_page. > > Why All we require of flush_dcache_page is that it not oops on > the first page in the range: we don't need to change over to > flush_dcache_range for that. As explained about: There are corner cases in which it does not work. You seem to assume that flush_dcache_page can become a no op. That may not be true on platforms that need explicit cache flushing for a DMA engine to access a data structure. The above listed use suggests that the caller expects flushing to occur correctly. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Christoph Lameter wrote: > On Mon, 25 Jun 2007, Hugh Dickins wrote: > > > And I now rather think that needs to stay, not be replaced by the > > VM_BUG_ON Christoph was proposing for 2.6.23 (which earlier I acked). > > > > Christoph responded to my page_mapping patch by looking at arch/arm, > > and there finding a kmalloc in dma_alloc_coherent which he didn't > > like; but you're right, it's entirely irrelevant to Nicolas' oops. > > > > The slub allocation which gives rise to Nicolas' oops in not in > > ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those > > status = kmalloc(64, GFP_KERNEL); > > where status is passed down for the response from mmc_sd_switch. > > > > And what is wrong with using kmalloc there? > > Why should that be changed to allocate a whole page? > > How many other such cases might there be? > > So someone effectively does a flush_dcache_page(virt_to_page(status))? Yes. > > > In the kmalloc case it's not mapped into userspace: flush_dcache_page > > should detect that and do nothing, as it does with slab; but slub was > > reusing page->mapping for something else, so we oopsed. > > If that is the case then what we really want is a flush_dcache_range > not the above. flush_dcache_range does not take a page struct as an > argument and it will work on memory that has no struct page backing it. > > Is flush_dcache_range available in all platforms? I see some drivers > using it: > > drivers/net/fec.c > drivers/serial/mpsc.c > drivers/char/agp/uninorth-agp.c > > > flush_dcache_page is implemented by > > sparc64 Uses mapping > shOk. Only uses PG_mapped > arm Uses mapping in the mmu case > frv Does a kmap_atomic ?? Otherwise looks ok. > ppc Clears PG_arch_1 > mips Uses mapping > sh64 No page struct use > pariscUses mapping > xtensaUses mapping > powerpc Handles page flags PG_arch_1 > ia64 Clears PG_arch_1 > sparc Calculates address based on page struct addr. > blackfin Does an immediate page_address(page) > m68k Does an immediate page_address(page) > > In many situations the page struct passed to flush_dcache_page is > simply used to calculate the virtual address. So its mostly harmless. > Trouble starts when page attributes like the mapping is used. Mostly harmless indeed. I don't understand why you insist on trying to complicate the situation. flush_dcache_page is only expected to do something on pages mapped into userspace (correct me if I'm wrong there), it's expected to do nothing on kmalloc'ed pages. It's been working that way for years, and will continue to work that way with slub, providing either page_mapping or flush_dcache_page checks PageSlab to avoid oopsing on page->mapping. 2.6.22-rc6 has page_mapping making that check: we could argue about which is the better site for it, there are good arguments both ways (page_mapping is the correct place, flush_dcache_page is the more efficient place), I suggest we leave it as is. > > So the problem platforms are > > sparc64 arm mips parisc xtensa > > If we indeed do these weird things then I think the general fix should > be to use flush_dcache_range() but that is too late for 2.6.22. The > VM_BUG_ON will be useful to detect these scenarios. Maybe we need > to replace that with a WARN_ON or something if the usage is frequent? > There are a large number of platforms on which flush_dcache_range has > no effect or an effect that is negligible. > > A kmalloc slab object (even 64 byte) may be crossing a page boundary > with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that > flush_dcache_range *must* be used rather than flush_dcache_page. Why All we require of flush_dcache_page is that it not oops on the first page in the range: we don't need to change over to flush_dcache_range for that. > flush_dcache_page(virt_to_page(object)) takes the starting address of > the object and flushes the page in which the object started. It may > not be the complete object. This usually works fine with 64 byte objects > because they neatly fit into a slab page. Again if CONFIG_SLAB_DEBUG > f.e. is enabled then the alignment will no longer be to a 64 byte bound > but only to the alignment guaranteed by ARCH_KMALLOC_MINALIGN. If this > trick is used on a non kmalloc cache with a non power of size then we > may have a larger chance of trouble occurring. > > For 2.6.22 the easiest solution may be to check for PageSlab in the > flush_dcache_pages of the affected platforms and then count on > the users not enabling any slab debugging. Its then simply the same state > as before. We have the PageSlab check in page_mapping now, I don't see what further change is needed. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo
Re: Oops in a driver while using SLUB as a SLAB allocator
On Mon, 25 Jun 2007, Hugh Dickins wrote: > And I now rather think that needs to stay, not be replaced by the > VM_BUG_ON Christoph was proposing for 2.6.23 (which earlier I acked). > > Christoph responded to my page_mapping patch by looking at arch/arm, > and there finding a kmalloc in dma_alloc_coherent which he didn't > like; but you're right, it's entirely irrelevant to Nicolas' oops. > > The slub allocation which gives rise to Nicolas' oops in not in > ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those > status = kmalloc(64, GFP_KERNEL); > where status is passed down for the response from mmc_sd_switch. > > And what is wrong with using kmalloc there? > Why should that be changed to allocate a whole page? > How many other such cases might there be? So someone effectively does a flush_dcache_page(virt_to_page(status))? > In the kmalloc case it's not mapped into userspace: flush_dcache_page > should detect that and do nothing, as it does with slab; but slub was > reusing page->mapping for something else, so we oopsed. If that is the case then what we really want is a flush_dcache_range not the above. flush_dcache_range does not take a page struct as an argument and it will work on memory that has no struct page backing it. Is flush_dcache_range available in all platforms? I see some drivers using it: drivers/net/fec.c drivers/serial/mpsc.c drivers/char/agp/uninorth-agp.c flush_dcache_page is implemented by sparc64 Uses mapping sh Ok. Only uses PG_mapped arm Uses mapping in the mmu case frv Does a kmap_atomic ?? Otherwise looks ok. ppc Clears PG_arch_1 mipsUses mapping sh64No page struct use parisc Uses mapping xtensa Uses mapping powerpc Handles page flags PG_arch_1 ia64Clears PG_arch_1 sparc Calculates address based on page struct addr. blackfinDoes an immediate page_address(page) m68kDoes an immediate page_address(page) In many situations the page struct passed to flush_dcache_page is simply used to calculate the virtual address. So its mostly harmless. Trouble starts when page attributes like the mapping is used. So the problem platforms are sparc64 arm mips parisc xtensa If we indeed do these weird things then I think the general fix should be to use flush_dcache_range() but that is too late for 2.6.22. The VM_BUG_ON will be useful to detect these scenarios. Maybe we need to replace that with a WARN_ON or something if the usage is frequent? There are a large number of platforms on which flush_dcache_range has no effect or an effect that is negligible. A kmalloc slab object (even 64 byte) may be crossing a page boundary with a ARCH_KMALLOC_MINALIGN of 4 or 8. So I think that flush_dcache_range *must* be used rather than flush_dcache_page. flush_dcache_page(virt_to_page(object)) takes the starting address of the object and flushes the page in which the object started. It may not be the complete object. This usually works fine with 64 byte objects because they neatly fit into a slab page. Again if CONFIG_SLAB_DEBUG f.e. is enabled then the alignment will no longer be to a 64 byte bound but only to the alignment guaranteed by ARCH_KMALLOC_MINALIGN. If this trick is used on a non kmalloc cache with a non power of size then we may have a larger chance of trouble occurring. For 2.6.22 the easiest solution may be to check for PageSlab in the flush_dcache_pages of the affected platforms and then count on the users not enabling any slab debugging. Its then simply the same state as before. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
Hugh Dickins : On Sun, 24 Jun 2007, Russell King wrote: On Sun, Jun 24, 2007 at 11:24:16AM +0100, Hugh Dickins wrote: On Sun, 24 Jun 2007, Russell King wrote: On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote: Please forward the original problem report. Done. Okay, that seems to back up my suspicions - it's definitely AT91-based. Since AT91-based machines do not have a DMA coherent cache, arch_is_coherent() must be defined to '0'. The only way that kmalloc could be reached is if that were defined to something other than '0', and if that's done on a machine with DMA incoherent caches, that will lead to data corruption. Yes, having looked through that now, I agree with you 100%. AT91 _is_ defined with the arch_is_coherent() switch set to 0 (in include/asm-arm/memory.h and not overloaded). I think we need to wait for Nicolas to respond on this issue before running headlong into applying a sticky plaster for something which is actually a deeper issue. No need for Nicolas to respond, I think I've found what's "wrong": see below. However, the arch_is_coherent() path _is_ buggy as it stands, but in more than the way identified thus far. Eg, it doesn't set __GFP_DMA appropriately for various DMA masks, so it might return DMA inaccessible memory. Ok it is bad but, in AT91, all memory is accessible with DMA. [..] The slub allocation which gives rise to Nicolas' oops in not in ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those status = kmalloc(64, GFP_KERNEL); where status is passed down for the response from mmc_sd_switch. True, the oops appears after a mmc switch command (#6 command). [..] Not sure I can add much to what Hugh has said. If you need some more precision anyway, I will try to answer. Regards, -- Nicolas Ferre - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Sun, 24 Jun 2007, Russell King wrote: > On Sun, Jun 24, 2007 at 11:24:16AM +0100, Hugh Dickins wrote: > > On Sun, 24 Jun 2007, Russell King wrote: > > > On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote: > > > > > > Please forward the original problem report. > > > > Done. > > Okay, that seems to back up my suspicions - it's definitely AT91-based. > Since AT91-based machines do not have a DMA coherent cache, > arch_is_coherent() must be defined to '0'. The only way that kmalloc > could be reached is if that were defined to something other than '0', > and if that's done on a machine with DMA incoherent caches, that will > lead to data corruption. Yes, having looked through that now, I agree with you 100%. > > I think we need to wait for Nicolas to respond on this issue before > running headlong into applying a sticky plaster for something which is > actually a deeper issue. No need for Nicolas to respond, I think I've found what's "wrong": see below. > > However, the arch_is_coherent() path _is_ buggy as it stands, but in > more than the way identified thus far. Eg, it doesn't set __GFP_DMA > appropriately for various DMA masks, so it might return DMA inaccessible > memory. I expect you're right, but that's a separate issue. I had thought you were approving Christoph's ARM patch because both you and he seemed to agree that kmalloc was inappropriate for use in dma_alloc_coherent, whatever additional issues you saw with it. I still don't see why kmalloc is wrong there myself: for a while I bought Christoph's alignment argument, but now I don't see why (more than long) alignment is important to it. But I'm easily wrong when it comes to DMA mapping issues. > > If we're after a simple fix for 2.6.22, the _easiest_ solution would be > to delete the entire arch_is_coherent() branches in arch/arm/mm/consistent.c; > that will result in a working solution for everyone, albiet at a slightly > lower performance for the DMA-coherent CPUs. The fix for 2.6.22 is my PageSlab test in page_mapping which Linus already put into -git. And I now rather think that needs to stay, not be replaced by the VM_BUG_ON Christoph was proposing for 2.6.23 (which earlier I acked). Christoph responded to my page_mapping patch by looking at arch/arm, and there finding a kmalloc in dma_alloc_coherent which he didn't like; but you're right, it's entirely irrelevant to Nicolas' oops. The slub allocation which gives rise to Nicolas' oops in not in ARM, but (I'm guessing) in drivers/mmc/core/sd.c: one of those status = kmalloc(64, GFP_KERNEL); where status is passed down for the response from mmc_sd_switch. And what is wrong with using kmalloc there? Why should that be changed to allocate a whole page? How many other such cases might there be? And the flush_dcache_page in at91mci_post_dma_read looks correct to me too: it has just filled and perhaps also swabbed a buffer, that buffer might in some cases be mapped into userspace, so it calls flush_dcache_page. In the kmalloc case it's not mapped into userspace: flush_dcache_page should detect that and do nothing, as it does with slab; but slub was reusing page->mapping for something else, so we oopsed. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Sun, Jun 24, 2007 at 11:24:16AM +0100, Hugh Dickins wrote: > On Sun, 24 Jun 2007, Russell King wrote: > > On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote: > > > I'm quite happy with this approach for 2.6.23-rc, along with your ARM > > > dma_map patch which (if I understood aright) rmk approved. > > > > I didn't approve it. Please re-read my reply - there are still some > > unanswered questions in it which _really_ need answering. > > Sorry for misrepresenting you. > > > The report talks about the AT91 machines. These machines do not have > > cache coherent DMA. Therefore, the code being patched should be > > optimised away by the compiler. *Or* we have even bigger problems. > > > > Please forward the original problem report. > > Done. Okay, that seems to back up my suspicions - it's definitely AT91-based. Since AT91-based machines do not have a DMA coherent cache, arch_is_coherent() must be defined to '0'. The only way that kmalloc could be reached is if that were defined to something other than '0', and if that's done on a machine with DMA incoherent caches, that will lead to data corruption. I think we need to wait for Nicolas to respond on this issue before running headlong into applying a sticky plaster for something which is actually a deeper issue. However, the arch_is_coherent() path _is_ buggy as it stands, but in more than the way identified thus far. Eg, it doesn't set __GFP_DMA appropriately for various DMA masks, so it might return DMA inaccessible memory. If we're after a simple fix for 2.6.22, the _easiest_ solution would be to delete the entire arch_is_coherent() branches in arch/arm/mm/consistent.c; that will result in a working solution for everyone, albiet at a slightly lower performance for the DMA-coherent CPUs. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Sun, 24 Jun 2007, Russell King wrote: > On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote: > > I'm quite happy with this approach for 2.6.23-rc, along with your ARM > > dma_map patch which (if I understood aright) rmk approved. > > I didn't approve it. Please re-read my reply - there are still some > unanswered questions in it which _really_ need answering. Sorry for misrepresenting you. > The report talks about the AT91 machines. These machines do not have > cache coherent DMA. Therefore, the code being patched should be > optimised away by the compiler. *Or* we have even bigger problems. > > Please forward the original problem report. Done. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, Jun 22, 2007 at 07:39:33PM +0100, Hugh Dickins wrote: > On Fri, 22 Jun 2007, Christoph Lameter wrote: > > > > Add VM_BUG_ON in case someone uses page_mapping on a slab page > > > > Detect slab objects being passed to the page oriented functions of the VM. > > > > It is not sufficient to simply return NULL because the functions calling > > page_mapping may depend on other items of the page_struct also to be setup > > properly. Moreover the slab object may not be properly aligned. The page > > orientedfunctions of the VM expect to operate on page aligned, page sized > > objects. operations on objects straddling page boundaries may only affect > > the objects partially which may lead to surprising results. > > > > It is better to detect eventually remaining uses and eliminate them. > > > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > I'm quite happy with this approach for 2.6.23-rc, along with your ARM > dma_map patch which (if I understood aright) rmk approved. I didn't approve it. Please re-read my reply - there are still some unanswered questions in it which _really_ need answering. The report talks about the AT91 machines. These machines do not have cache coherent DMA. Therefore, the code being patched should be optimised away by the compiler. *Or* we have even bigger problems. Please forward the original problem report. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
* From: Christoph Lameter * Newsgroups: linux.kernel,linux.ports.arm.kernel * Date: Fri, 22 Jun 2007 13:15:19 -0700 (PDT) > > Here is the corresponding PA-RISC piece. Its as straightforward as > the other one since the only way to make this work properly in the past > was if the sizes passed to the dma alloc functions are page size based. [] > --- linux-2.6.orig/arch/parisc/kernel/pci-dma.c 2007-06-22 > 13:02:32.0 -0700 > +++ linux-2.6/arch/parisc/kernel/pci-dma.c2007-06-22 13:06:39.0 > -0700 > @@ -572,7 +572,7 @@ static void *pa11_dma_alloc_noncoherent( > void *addr = NULL; > > /* rely on kmalloc to be cacheline aligned */ > - addr = kmalloc(size, flag); > + addr = (void *)__get_free_pages(flag, get_order(size)); Seems like comment must be removed. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Hugh Dickins wrote: > On Fri, 22 Jun 2007, Christoph Lameter wrote: > > > > We need to fix any remaining weird slab object uses right now. Your check > > leaves a lot of holes open. 2.6.22 removes all other such strange slab > > uses in other arches. It would be inconsistent if we left these things in > > ARM (and maybe PA-RISC). > > As I understand it, that driver used to work right with SLAB, then > oopsed with SLUB, and now works okay again with the page_mapping patch? Try to enable debugging then it may fail again despite your patch. You just scratched the surface with this and are enabling a dangerous usage mode with SLUB that we explicitly did not want to support anymore. > I'm unclear how it comes about that you removed "all other such strange > slab uses in other arches", yet missed this? That suggests there may > be further unexpected uses. There could be other uses that were missed. I looked for slabs created by kmem_cache_create. The trouble is that any kmalloc could also be used for engineer these weird things (as seen here) and there are gazillions of kmallocs. That is why a VM_BUG_ON is useful. However, it requires some effort even with SLAB to create these things and--given that we have tested extensively on lots of arches--I am hopeful that we have caught everything. > It worries me that any use which catches you by surprise has to be > fixed up in the caller, rather than in slub itself: slab/slub is a > service, not a master. But I'm rather repeating myself. SLUB used to implement the same special casing as SLAB did (which results in the fragile scenarios in which the above is possible). But we made the decision to clean up the slab interface and we dropped the emulation of the SLAB frills from SLUB. Messy and problematic code like this should be removed. That improves the quality of the kernel. The removal is a straightfoward process. And the cases that we are discussing here are in remote corners of the kernel. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Christoph Lameter wrote: > > We need to fix any remaining weird slab object uses right now. Your check > leaves a lot of holes open. 2.6.22 removes all other such strange slab > uses in other arches. It would be inconsistent if we left these things in > ARM (and maybe PA-RISC). As I understand it, that driver used to work right with SLAB, then oopsed with SLUB, and now works okay again with the page_mapping patch? I'm unclear how it comes about that you removed "all other such strange slab uses in other arches", yet missed this? That suggests there may be further unexpected uses. It worries me that any use which catches you by surprise has to be fixed up in the caller, rather than in slub itself: slab/slub is a service, not a master. But I'm rather repeating myself. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, Jun 22, 2007 at 09:40:45AM -0700, Linus Torvalds wrote: > > > On Fri, 22 Jun 2007, Hugh Dickins wrote: > > > > On Thu, 21 Jun 2007, Christoph Lameter wrote: > > > > > Maybe this will address the issue on ARM? > > > > Looks like it would indeed address the immediate issue on ARM - > > IF they've no particular reason to be using kmalloc there. > > I think the right thing to do is do both of these things. I already > applied Hugh's patch - it seemed like a total nobrainer to do at this > stage in the 2.6.22 -rc series. But that doesn't mean that we should not > _also_ look at "flush_dcache_page()" users. Note, however, that the use of flush_dcache_page() on allocations derived from dma_alloc_coherent() is undefined and unpredictable. dma_alloc_coherent() may return a remapped virtual address which would be invalid for things like virt_to_page() to operate on. So the fact that dma_alloc_coherent() returning something that was kmalloc'd and then causing flush_dcache_page() to oops is actually a sign that there's far deeper problems. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
Here is the corresponding PA-RISC piece. Its as straightforward as the other one since the only way to make this work properly in the past was if the sizes passed to the dma alloc functions are page size based. PA-RISC: Use page allocator instead of slab allocator. Slab pages obtained via kmalloc are not cachline aligned. Nor is it advisable to perform VM operations designed for page allocator pages on memory obtained via kmalloc. So replace the page sized allocations in kernel/pci-dma.c with page allocator pages. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/parisc/kernel/pci-dma.c |4 ++-- include/linux/mm.h |5 + 2 files changed, 3 insertions(+), 6 deletions(-) Index: linux-2.6/arch/parisc/kernel/pci-dma.c === --- linux-2.6.orig/arch/parisc/kernel/pci-dma.c 2007-06-22 13:02:32.0 -0700 +++ linux-2.6/arch/parisc/kernel/pci-dma.c 2007-06-22 13:06:39.0 -0700 @@ -572,7 +572,7 @@ static void *pa11_dma_alloc_noncoherent( void *addr = NULL; /* rely on kmalloc to be cacheline aligned */ - addr = kmalloc(size, flag); + addr = (void *)__get_free_pages(flag, get_order(size)); if(addr) *dma_handle = (dma_addr_t)virt_to_phys(addr); @@ -582,7 +582,7 @@ static void *pa11_dma_alloc_noncoherent( static void pa11_dma_free_noncoherent(struct device *dev, size_t size, void *vaddr, dma_addr_t iova) { - kfree(vaddr); + free_pages((unsigned long)vaddr, get_order(size)); return; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Hugh Dickins wrote: > Acked-by: Hugh Dickins <[EMAIL PROTECTED]> > (immediately after 2.6.22, accompanied by your ARM patch) We need to fix any remaining weird slab object uses right now. Your check leaves a lot of holes open. 2.6.22 removes all other such strange slab uses in other arches. It would be inconsistent if we left these things in ARM (and maybe PA-RISC). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Christoph Lameter wrote: > On Fri, 22 Jun 2007, Hugh Dickins wrote: > > > I'm quite happy with this approach for 2.6.23-rc, along with your ARM > > dma_map patch which (if I understood aright) rmk approved. Except, > > haven't you misplaced that VM_BUG_ON between an if and its else? > > Right. > > > And I'd much prefer you to make it an outright BUG_ON, because > > many testers have those VM_BUG_ONs configured out. > > Thought about it but doing so would create quite a load of BUG_ONs in the > VM given the frequent use of that particular inline function. And AFAIK > we are only aware of one other potential call site that could cause > trouble. Many arches have run SLUB now for awhile and would certainly have > shown issues if they would do strange things with slab allocs. Even with > SLAB they would have to be very careful in order to make this work. So I > think its rather unlikely that this is going to be triggered. Its > primarily useful for debugging if strange things start to happen. > The VM_BUG_ON could stay there for good to make sure development does not > result in similar issues in the future. Okay; and I was overlooking that (as in this case) we'd probably get an easily debuggable oops instead of the explicit BUG when it is configured out. > > Fixed up patch: > > > > > Add VM_BUG_ON in case someone uses page_mapping on a slab page > > Detect slab objects being passed to the page oriented functions of the VM. > > It is not sufficient to simply return NULL because the functions calling > page_mapping may depend on other items of the page_struct also to be setup > properly. Moreover slab object may not be properly aligned. The page oriented > functions of the VM expect to operate on page aligned, page sized objects. > Operations on object straddling page boundaries may only affect the objects > partially which may lead to surprising results. > > It is better to detect eventually remaining uses and eliminate them. > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Acked-by: Hugh Dickins <[EMAIL PROTECTED]> (immediately after 2.6.22, accompanied by your ARM patch) > > --- > include/linux/mm.h |5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > Index: linux-2.6/include/linux/mm.h > === > --- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.0 -0700 > +++ linux-2.6/include/linux/mm.h 2007-06-22 11:44:10.0 -0700 > @@ -601,12 +601,9 @@ static inline struct address_space *page > { > struct address_space *mapping = page->mapping; > > + VM_BUG_ON(PageSlab(page)); > if (unlikely(PageSwapCache(page))) > mapping = &swapper_space; > -#ifdef CONFIG_SLUB > - else if (unlikely(PageSlab(page))) > - mapping = NULL; > -#endif > else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) > mapping = NULL; > return mapping; > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Hugh Dickins wrote: > I'm quite happy with this approach for 2.6.23-rc, along with your ARM > dma_map patch which (if I understood aright) rmk approved. Except, > haven't you misplaced that VM_BUG_ON between an if and its else? Right. > And I'd much prefer you to make it an outright BUG_ON, because > many testers have those VM_BUG_ONs configured out. Thought about it but doing so would create quite a load of BUG_ONs in the VM given the frequent use of that particular inline function. And AFAIK we are only aware of one other potential call site that could cause trouble. Many arches have run SLUB now for awhile and would certainly have shown issues if they would do strange things with slab allocs. Even with SLAB they would have to be very careful in order to make this work. So I think its rather unlikely that this is going to be triggered. Its primarily useful for debugging if strange things start to happen. The VM_BUG_ON could stay there for good to make sure development does not result in similar issues in the future. Fixed up patch: Add VM_BUG_ON in case someone uses page_mapping on a slab page Detect slab objects being passed to the page oriented functions of the VM. It is not sufficient to simply return NULL because the functions calling page_mapping may depend on other items of the page_struct also to be setup properly. Moreover slab object may not be properly aligned. The page oriented functions of the VM expect to operate on page aligned, page sized objects. Operations on object straddling page boundaries may only affect the objects partially which may lead to surprising results. It is better to detect eventually remaining uses and eliminate them. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/mm.h |5 + 1 file changed, 1 insertion(+), 4 deletions(-) Index: linux-2.6/include/linux/mm.h === --- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.0 -0700 +++ linux-2.6/include/linux/mm.h2007-06-22 11:44:10.0 -0700 @@ -601,12 +601,9 @@ static inline struct address_space *page { struct address_space *mapping = page->mapping; + VM_BUG_ON(PageSlab(page)); if (unlikely(PageSwapCache(page))) mapping = &swapper_space; -#ifdef CONFIG_SLUB - else if (unlikely(PageSlab(page))) - mapping = NULL; -#endif else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) mapping = NULL; return mapping; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Christoph Lameter wrote: > > Add VM_BUG_ON in case someone uses page_mapping on a slab page > > Detect slab objects being passed to the page oriented functions of the VM. > > It is not sufficient to simply return NULL because the functions calling > page_mapping may depend on other items of the page_struct also to be setup > properly. Moreover the slab object may not be properly aligned. The page > orientedfunctions of the VM expect to operate on page aligned, page sized > objects. operations on objects straddling page boundaries may only affect > the objects partially which may lead to surprising results. > > It is better to detect eventually remaining uses and eliminate them. > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> I'm quite happy with this approach for 2.6.23-rc, along with your ARM dma_map patch which (if I understood aright) rmk approved. Except, haven't you misplaced that VM_BUG_ON between an if and its else? And I'd much prefer you to make it an outright BUG_ON, because many testers have those VM_BUG_ONs configured out. Hugh > > --- > include/linux/mm.h |5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > Index: linux-2.6/include/linux/mm.h > === > --- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.0 -0700 > +++ linux-2.6/include/linux/mm.h 2007-06-22 10:34:09.0 -0700 > @@ -603,10 +603,7 @@ static inline struct address_space *page > > if (unlikely(PageSwapCache(page))) > mapping = &swapper_space; > -#ifdef CONFIG_SLUB > - else if (unlikely(PageSlab(page))) > - mapping = NULL; > -#endif > + VM_BUG_ON(PageSlab(page)); > else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) > mapping = NULL; > return mapping; > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
Add VM_BUG_ON in case someone uses page_mapping on a slab page Detect slab objects being passed to the page oriented functions of the VM. It is not sufficient to simply return NULL because the functions calling page_mapping may depend on other items of the page_struct also to be setup properly. Moreover the slab object may not be properly aligned. The page orientedfunctions of the VM expect to operate on page aligned, page sized objects. operations on objects straddling page boundaries may only affect the objects partially which may lead to surprising results. It is better to detect eventually remaining uses and eliminate them. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/mm.h |5 + 1 file changed, 1 insertion(+), 4 deletions(-) Index: linux-2.6/include/linux/mm.h === --- linux-2.6.orig/include/linux/mm.h 2007-06-22 10:33:27.0 -0700 +++ linux-2.6/include/linux/mm.h2007-06-22 10:34:09.0 -0700 @@ -603,10 +603,7 @@ static inline struct address_space *page if (unlikely(PageSwapCache(page))) mapping = &swapper_space; -#ifdef CONFIG_SLUB - else if (unlikely(PageSlab(page))) - mapping = NULL; -#endif + VM_BUG_ON(PageSlab(page)); else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) mapping = NULL; return mapping; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Linus Torvalds wrote: > > Looks like it would indeed address the immediate issue on ARM - > > IF they've no particular reason to be using kmalloc there. > > I think the right thing to do is do both of these things. I already > applied Hugh's patch - it seemed like a total nobrainer to do at this > stage in the 2.6.22 -rc series. But that doesn't mean that we should not > _also_ look at "flush_dcache_page()" users. Hugh's patch not address the complete issue. It only works right now because the size of the allocation is page size and fits right into a slab page. If debugging is enabled then the slab size will increase and the "pages" will be misaligned which will lead to other sorts of funky behavior. kmalloc allocations are only guaranteed to be aligned to ARCH_KMALLOC_MINALIGN which is 4 to 8 bytes. If one must have a page aligned entity out of a slab allocator then a custom slab needs to be created. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Hugh Dickins wrote: > > On Thu, 21 Jun 2007, Christoph Lameter wrote: > > > Maybe this will address the issue on ARM? > > Looks like it would indeed address the immediate issue on ARM - > IF they've no particular reason to be using kmalloc there. I think the right thing to do is do both of these things. I already applied Hugh's patch - it seemed like a total nobrainer to do at this stage in the 2.6.22 -rc series. But that doesn't mean that we should not _also_ look at "flush_dcache_page()" users. I do think that even just the name (the ".._page()" part) makes it obvious that it was designed for page-level allocations, not kmalloc(). So I think Christoph's patch makes sense in that context. At the same time, I do think that the whole notion of flushing the D$ is certainly something that makes sense for kmalloc() allocations also, and maybe people do actually do small DMA allocations and Christophs patch would break that. End result: for 2.6.22, I think the patch from Hugh that I already applied is the right thing. But as to 2.6.23 and onward.. I dunno. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
Nicolas Ferre : Marc Pignat : please use this patch, sorry for the later My eyes are too tired or this patch is the same as the previous one :-\ Indeed, my eyes where too tired ;-) Sorry for the trouble. -- Nicolas Ferre - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, Jun 22, 2007 at 05:26:33AM +0100, Hugh Dickins wrote: > On Thu, 21 Jun 2007, Christoph Lameter wrote: > > On Thu, 21 Jun 2007, Hugh Dickins wrote: > > > > > > The oops seems to occur after a page unmapping using dma_unmap_page() > > > > followed > > > > by a flush_dcache_page() (in at91mci_post_dma_read()). > > > > Was the page allocated using slab calls? > > You've found yes (in the ARM case). > > > Well one may be better off allocating pages using the page allocator > > instead of the slab allocator. I removed these things from i386 but I did > > not check ARM. > > They may or may not be: I think that's a matter to discuss with rmk. The coherent case on ARM is broken in more ways, not only because it uses kmalloc, but it also takes no notice of the DMA mask. However, AT91 isn't a coherent ARM architecture, so arch_is_coherent() should be false. Therefore, we should never be allocating pages for DMA from SLAB/SLUB for AT91 platforms. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Thu, 21 Jun 2007, Christoph Lameter wrote: > On Fri, 22 Jun 2007, Hugh Dickins wrote: > > > However... what gives you confidence that flush_dcache_page is > > never applied to other slab pages? > > Flush dcache page is supposed to run on pages not objects of varying > length. It is suprising that this has not lead to earlier problems. > Objects allocated this way may straddle a page boundary under some > conditions and in that case virt_to_page may not lead to a page that > covers the complete object that is supposed to be flushed. Hopefully the > "size" of the allocated object were whole pages. No, that's the wrong way round. Neither ARM nor PA-RISC expects flush_dcache_page to flush any dcache when given a slab allocation: they just expect it to pass through, not to oops. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Hugh Dickins wrote: > But PA-RISC also has a function called flush_dcache_page, which uses > page_mapping and expects a struct address_space * from it. If that > can ever be get applied to a SLOB page (which is not so clear as in > the ARM case, but cannot easily be ruled out completely), we're > in trouble without a PageSlab test within page_mapping. A page comes from the page allocator. Access to a slab allocators "page" is an access to subsystem internals. Those internals vary (including the representation of objects in the "page") and depend on the slab allocator selected, the debug options in effect and parameters with which the slab was created etc etc. If flush_dcache_page is applied to a slab object then we need to do a similar change to PA-RISC as for ARM. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Hugh Dickins wrote: > You keep on forcing the outside world to revolve around your needs > within slub.c: that is a good way to keep slub lean, and may be > justified; but it's at least questionable to be enforcing such > restrictions years after people have grown accustomed to more > freedom from their slabs. This work is a cleanup of the VM code and part of the slab cleanup that was done. Having slab objects on the LRU and other components of the VM that are supposed to work on page sized objects is plainly wrong and can lead to surprising results. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Thu, 21 Jun 2007, Christoph Lameter wrote: > On Thu, 21 Jun 2007, Hugh Dickins wrote: > > > Seems a little odd that it's gone throughout 2.6.22-rc unnoticed > > until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps. > > The impact is only on a subset of ARM machines. > > PA_RISC? It looks like they run their own flushing function for byte > ranges called flush_kernel_dache_range. That does not use the page struct. PA-RISC does have a function of that name, and I'm guessing that you came across it in looking at the PA-RISC dma_map_single. But PA-RISC also has a function called flush_dcache_page, which uses page_mapping and expects a struct address_space * from it. If that can ever be get applied to a SLOB page (which is not so clear as in the ARM case, but cannot easily be ruled out completely), we're in trouble without a PageSlab test within page_mapping. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Fri, 22 Jun 2007, Hugh Dickins wrote: > However... what gives you confidence that flush_dcache_page is > never applied to other slab pages? Flush dcache page is supposed to run on pages not objects of varying length. It is suprising that this has not lead to earlier problems. Objects allocated this way may straddle a page boundary under some conditions and in that case virt_to_page may not lead to a page that covers the complete object that is supposed to be flushed. Hopefully the "size" of the allocated object were whole pages. > This looks to me like a clean way forward to try in -mm; but that > 2.6.22 should go with the safety PageSlab test in page_mapping. 2.6.22 cleans up these issues and this one follows in those footsteps. The reason for the introduction of the quicklist f.e. was to have a clear separation between page sized allocation and the variable allocations through slab allocators. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Thu, 21 Jun 2007, Christoph Lameter wrote: > Maybe this will address the issue on ARM? Looks like it would indeed address the immediate issue on ARM - IF they've no particular reason to be using kmalloc there. However... what gives you confidence that flush_dcache_page is never applied to other slab pages? This looks to me like a clean way forward to try in -mm; but that 2.6.22 should go with the safety PageSlab test in page_mapping. Hugh > > > ARM: Allocate dma pages via the page allocator and not via the slab allocator > > Slab allocations are not guaranteed to be page aligned and slab allocators > may use the page structs for their own purposes. Using the page allocator > yields a properly aligned page and also makes the page flushing logic work > right. > > Passing a kmalloced "page" to a flushing function will not work reliably. > > This will hopefully address the issue with SLUB on ARM. SLUB uses the > page->mapping field which is also checked by the flushing logic. The > flushing logic expects a normal page and not a slab page. > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > --- > arch/arm/mm/consistent.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > Index: linux-2.6/arch/arm/mm/consistent.c > === > --- linux-2.6.orig/arch/arm/mm/consistent.c 2007-06-21 18:18:15.0 > -0700 > +++ linux-2.6/arch/arm/mm/consistent.c2007-06-21 18:29:16.0 > -0700 > @@ -277,7 +277,7 @@ dma_alloc_coherent(struct device *dev, s > if (arch_is_coherent()) { > void *virt; > > - virt = kmalloc(size, gfp); > + virt = get_free_pages(gfp, get_order(size)); > if (!virt) > return NULL; > *handle = virt_to_dma(dev, virt); > @@ -364,7 +364,7 @@ void dma_free_coherent(struct device *de > WARN_ON(irqs_disabled()); > > if (arch_is_coherent()) { > - kfree(cpu_addr); > + free_pages((unsigned long)cpu_addr, get_order(size)); > return; > } > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Thu, 21 Jun 2007, Christoph Lameter wrote: > On Thu, 21 Jun 2007, Hugh Dickins wrote: > > > > The oops seems to occur after a page unmapping using dma_unmap_page() > > > followed > > > by a flush_dcache_page() (in at91mci_post_dma_read()). > > Was the page allocated using slab calls? You've found yes (in the ARM case). > Well one may be better off allocating pages using the page allocator > instead of the slab allocator. I removed these things from i386 but I did > not check ARM. They may or may not be: I think that's a matter to discuss with rmk. You keep on forcing the outside world to revolve around your needs within slub.c: that is a good way to keep slub lean, and may be justified; but it's at least questionable to be enforcing such restrictions years after people have grown accustomed to more freedom from their slabs. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Thu, 21 Jun 2007, Hugh Dickins wrote: > Seems a little odd that it's gone throughout 2.6.22-rc unnoticed > until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps. The impact is only on a subset of ARM machines. PA_RISC? It looks like they run their own flushing function for byte ranges called flush_kernel_dache_range. That does not use the page struct. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
Maybe this will address the issue on ARM? ARM: Allocate dma pages via the page allocator and not via the slab allocator Slab allocations are not guaranteed to be page aligned and slab allocators may use the page structs for their own purposes. Using the page allocator yields a properly aligned page and also makes the page flushing logic work right. Passing a kmalloced "page" to a flushing function will not work reliably. This will hopefully address the issue with SLUB on ARM. SLUB uses the page->mapping field which is also checked by the flushing logic. The flushing logic expects a normal page and not a slab page. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/arm/mm/consistent.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6/arch/arm/mm/consistent.c === --- linux-2.6.orig/arch/arm/mm/consistent.c 2007-06-21 18:18:15.0 -0700 +++ linux-2.6/arch/arm/mm/consistent.c 2007-06-21 18:29:16.0 -0700 @@ -277,7 +277,7 @@ dma_alloc_coherent(struct device *dev, s if (arch_is_coherent()) { void *virt; - virt = kmalloc(size, gfp); + virt = get_free_pages(gfp, get_order(size)); if (!virt) return NULL; *handle = virt_to_dma(dev, virt); @@ -364,7 +364,7 @@ void dma_free_coherent(struct device *de WARN_ON(irqs_disabled()); if (arch_is_coherent()) { - kfree(cpu_addr); + free_pages((unsigned long)cpu_addr, get_order(size)); return; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Thu, 21 Jun 2007, Hugh Dickins wrote: > > The oops seems to occur after a page unmapping using dma_unmap_page() > > followed > > by a flush_dcache_page() (in at91mci_post_dma_read()). Was the page allocated using slab calls? > Seems a little odd that it's gone throughout 2.6.22-rc unnoticed > until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps. > As I understand it, you're not doing anything wrong (disclaimer: > I'm no expert on dma_mapping things), but SLUB's reuse of struct > page fields has collided with what flush_dcache_page is expecting. > > Here's a patch: I'm not convinced it's necessarily the best one > (most uses of page_mapping will never see a slab page, it's a pity > to be cluttering up that inline even further); but in case nobody > else can provide a better... Well one may be better off allocating pages using the page allocator instead of the slab allocator. I removed these things from i386 but I did not check ARM. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
On Thu, 21 Jun 2007, Nicolas Ferre wrote: > > While debugging a Linux driver on my ARM platform (AT91), I switched on the > 2.6.22-rc5 kernel. While reconfiguring it I selected CONFIG_SLUB as a SLAB > allocator. > > The sd/mmc driver I tried to run is vanilla driver which never, until now, > produces Oops (at91_mci.c). > > The oops seems to occur after a page unmapping using dma_unmap_page() followed > by a flush_dcache_page() (in at91mci_post_dma_read()). ... > Unable to handle kernel NULL pointer dereference at virtual address 000d > pgd = c0004000 > [000d] *pgd= > Internal error: Oops: 1 [#1] > Modules linked in: > CPU: 0 > PC is at get_index+0x1c/0x50 > LR is at prio_tree_next+0x60/0x194 > pc : []lr : []Not tainted > sp : c3d0bca8 ip : ffdd fp : c3d0bcb4 > r10: 0040 r9 : c3d0be78 r8 : c3d0bcc4 > r7 : c3d0bcc0 r6 : r5 : c029635c r4 : c3d0bcfc > r3 : c3d0bcc0 r2 : c3d0bcc4 r1 : 0001 r0 : c3d0bcc0 > Flags: nZCv IRQs off FIQs on Mode SVC_32 Segment kernel > Control: 5317F > Table: 20004000 DAC: 0017 > Process kmmcd (pid: 761, stack limit = 0xc3d0a258) ... > Backtrace: > [] (get_index+0x0/0x50) from [] > (prio_tree_next+0x60/0x194) > [] (prio_tree_next+0x0/0x194) from [] > (vma_prio_tree_next+0x1c/0x88) > r8:c3d649e0 r7:c3d0be68 r6:c0298594 r5: r4: > [] (vma_prio_tree_next+0x0/0x88) from [] > (flush_dcache_page+0x108/0x124) > [] (flush_dcache_page+0x0/0x124) from [] > (at91_mci_irq+0x210/0x45c) > r6: r5:c3d0be68 r4:c3d0a000 > [] (at91_mci_irq+0x0/0x45c) from [] > (handle_IRQ_event+0x44/0x80) > [] (handle_IRQ_event+0x0/0x80) from [] ... > > Do you find a reason why I cannot use SLUB ? Did I missed something or do I > use a bad dma_xx or cache flush call in the driver ? That looks serious: thanks a lot for reporting it. (I see Marc has sent you a patch or two for the driver end: I didn't see their relevance, maybe they skirt around the problem somehow; but unless I'm mistaken, what you've found goes beyond that particular driver.) Seems a little odd that it's gone throughout 2.6.22-rc unnoticed until now - nobody else trying SLUB on ARM or PA-RISC yet perhaps. As I understand it, you're not doing anything wrong (disclaimer: I'm no expert on dma_mapping things), but SLUB's reuse of struct page fields has collided with what flush_dcache_page is expecting. Here's a patch: I'm not convinced it's necessarily the best one (most uses of page_mapping will never see a slab page, it's a pity to be cluttering up that inline even further); but in case nobody else can provide a better... [PATCH] page_mapping must avoid slub pages Nicolas Ferre reports oops from flush_dcache_page() on ARM when using SLUB: which reuses page->mapping as page->slab. The page_mapping() function, used by ARM and PA-RISC flush_dcache_page() implementations, must not confuse SLUB pages with those which have page->mapping set. Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]> --- That #ifdef I've put in there is not essential: it's perhaps more of a comment or an accusation ;) include/linux/mm.h |4 1 file changed, 4 insertions(+) --- 2.6.22-rc5/include/linux/mm.h 2007-05-26 07:16:48.0 +0100 +++ linux/include/linux/mm.h2007-06-21 15:52:47.0 +0100 @@ -603,6 +603,10 @@ static inline struct address_space *page if (unlikely(PageSwapCache(page))) mapping = &swapper_space; +#ifdef CONFIG_SLUB + else if (unlikely(PageSlab(page))) + mapping = NULL; +#endif else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) mapping = NULL; return mapping; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
Marc Pignat : please use this patch, sorry for the later My eyes are too tired or this patch is the same as the previous one :-\ Cheers, -- Nicolas Ferre - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
please use this patch, sorry for the later Regards Marc --- drivers/mmc/host/at91_mci.c-2.6.22-rc5.orig 2007-06-21 16:27:31.0 +0200 +++ drivers/mmc/host/at91_mci.c-2.6.22-rc5 2007-06-21 16:42:48.0 +0200 @@ -164,7 +164,7 @@ static inline void at91mci_sg_to_dma(str else memcpy(dmabuf, sgbuffer, amount); - kunmap_atomic(sgbuffer, KM_BIO_SRC_IRQ); + kunmap_atomic(sgbuffer - sg->offset, KM_BIO_SRC_IRQ); if (size == 0) break; @@ -293,7 +293,7 @@ static void at91mci_post_dma_read(struct buffer[index] = swab32(buffer[index]); } - kunmap_atomic(buffer, KM_BIO_SRC_IRQ); + kunmap_atomic(buffer - sg->offset, KM_BIO_SRC_IRQ); flush_dcache_page(sg->page); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in a driver while using SLUB as a SLAB allocator
Hello Nicolas! Good news! I think I've found the problem, this seems to work (tested with SLUB and SLAB). If you're in the cleanup stage, I think the whole kmap and kunmap can be in the 'if (cpu_is_at91rm9200())', we have no reason to kmap data we don't touch :-D Regards Marc --- drivers/mmc/host/at91_mci.c-2.6.22-rc5.orig 2007-06-21 16:27:31.0 +0200 +++ drivers/mmc/host/at91_mci.c-2.6.22-rc5 2007-06-21 16:42:48.0 +0200 @@ -164,7 +164,7 @@ static inline void at91mci_sg_to_dma(str else memcpy(dmabuf, sgbuffer, amount); - kunmap_atomic(sgbuffer, KM_BIO_SRC_IRQ); + kunmap_atomic(sgbuffer - sg->offset, KM_BIO_SRC_IRQ); if (size == 0) break; @@ -293,7 +293,7 @@ static void at91mci_post_dma_read(struct buffer[index] = swab32(buffer[index]); } - kunmap_atomic(buffer, KM_BIO_SRC_IRQ); + kunmap_atomic(buffer - sg->buffer, KM_BIO_SRC_IRQ); flush_dcache_page(sg->page); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/