Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On 29.10.2015 [18:49:55 -0700], David Miller wrote: > From: Nishanth Aravamudan > Date: Thu, 29 Oct 2015 08:57:01 -0700 > > > So, would that imply changing just the NVMe driver code rather than > > adding the dma_page_shift API at all? What about > > architectures that can support the larger page sizes? There is an > > implied performance impact, at least, of shifting the IO size down. > > > > Sorry for the continuing questions -- I got lots of conflicting feedback > > on the last series and want to make sure v4 is more acceptable. > > In the long term I would be very happy to see us having a real interface > for this stuff, just my opinion... Yep, I think I'll try and balance the two -- fix NVMe for now with a 4K page size as suggested by Christoph, and then work on the more complete API for the next merge. Thanks, Nish -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
From: Nishanth Aravamudan Date: Thu, 29 Oct 2015 08:57:01 -0700 > So, would that imply changing just the NVMe driver code rather than > adding the dma_page_shift API at all? What about > architectures that can support the larger page sizes? There is an > implied performance impact, at least, of shifting the IO size down. > > Sorry for the continuing questions -- I got lots of conflicting feedback > on the last series and want to make sure v4 is more acceptable. In the long term I would be very happy to see us having a real interface for this stuff, just my opinion... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On Thu, Oct 29, 2015 at 08:57:01AM -0700, Nishanth Aravamudan wrote: > On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote: > > We had a quick cht about this issue and I think we simply should > > default to a NVMe controler page size of 4k everywhere as that's the > > safe default. This is also what we do for RDMA Memory reigstrations and > > it works fine there for SRP and iSER. > > So, would that imply changing just the NVMe driver code rather than > adding the dma_page_shift API at all? What about > architectures that can support the larger page sizes? There is an > implied performance impact, at least, of shifting the IO size down. It is the safe option, but you're right that it might have a measurable performance impact (can you run an experiment?). Maybe we should just change the driver to always use MPSMIN for the moment in the interest of time, and you can flush out the new API before the next merge window. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On 29.10.2015 [04:55:36 -0700], Christoph Hellwig wrote: > On Wed, Oct 28, 2015 at 01:59:23PM +, Busch, Keith wrote: > > The "new" interface for all the other architectures is the same as the > > old one we've been using for the last 5 years. > > > > I welcome x86 maintainer feedback to confirm virtual and DMA addresses > > have the same offset at 4k alignment, but I have to insist we don't > > break my currently working hardware to force their attention. > > We had a quick cht about this issue and I think we simply should > default to a NVMe controler page size of 4k everywhere as that's the > safe default. This is also what we do for RDMA Memory reigstrations and > it works fine there for SRP and iSER. So, would that imply changing just the NVMe driver code rather than adding the dma_page_shift API at all? What about architectures that can support the larger page sizes? There is an implied performance impact, at least, of shifting the IO size down. Sorry for the continuing questions -- I got lots of conflicting feedback on the last series and want to make sure v4 is more acceptable. Thanks, Nish -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On Wed, Oct 28, 2015 at 01:59:23PM +, Busch, Keith wrote: > The "new" interface for all the other architectures is the same as the > old one we've been using for the last 5 years. > > I welcome x86 maintainer feedback to confirm virtual and DMA addresses > have the same offset at 4k alignment, but I have to insist we don't > break my currently working hardware to force their attention. We had a quick cht about this issue and I think we simply should default to a NVMe controler page size of 4k everywhere as that's the safe default. This is also what we do for RDMA Memory reigstrations and it works fine there for SRP and iSER. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On Tue, Oct 27, 2015 at 05:54:43PM -0700, David Miller wrote: > From: "Busch, Keith" > Date: Tue, 27 Oct 2015 22:36:43 + > > > If you're suggesting to compile-time break architectures that currently > > work just fine with NVMe, let me stop you right there. > > Silently "working" without the architecture maintainer having to explicity > look at the new interface and make sure his platform is implementing it > properly is an extremely bad practice. It won't work if it's wrong. It'll BUG_ON, and I'll be assigned to help fix it, like what's happened here (on a private bugzilla). The "new" interface for all the other architectures is the same as the old one we've been using for the last 5 years. I welcome x86 maintainer feedback to confirm virtual and DMA addresses have the same offset at 4k alignment, but I have to insist we don't break my currently working hardware to force their attention. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On 27.10.2015 [17:53:22 -0700], David Miller wrote: > From: Nishanth Aravamudan > Date: Tue, 27 Oct 2015 15:20:10 -0700 > > > Well, looks like I should spin up a v4 anyways for the powerpc changes. > > So, to make sure I understand your point, should I make the generic > > dma_get_page_shift a compile-error kind of thing? It will only fail on > > architectures that actually build the NVME driver (as the only caller). > > But I'm not sure how exactly to achieve that, if you could give a bit > > more detail I'd appreciate it! > > Yes, I am basically suggesting to simply not provide a default at all. For my own edification -- what is the way that gets resolved? I guess I mean it seems like linux-next would cease to compile because of my new series. Would my patches just get kicked out of -next for introducing that (or even via the 0-day notifications), or should I put something into the commit message indicating it is an API introduction? Sorry for the tentativeness, I have not introduce a cross-architecture API like this before. Thanks, Nish > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
From: Julian Calaby Date: Wed, 28 Oct 2015 10:43:35 +1100 > You'll be CCing the maintainers of each architecture on the patches to > add the functions, so if they do have specific requirements, I'm sure > they'll let you know or provide patches. People miss things, maintainers get busy, so while a CC: is important and appreciated, it doesn't ensure a correct implementation is likely to result. Personally, I'd much rather my arch stop building so I have to go in there to explicitly look at the reason why and make sure I fill in the missing pieces properly and accurately. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
From: "Busch, Keith" Date: Tue, 27 Oct 2015 22:36:43 + > If you're suggesting to compile-time break architectures that currently > work just fine with NVMe, let me stop you right there. Silently "working" without the architecture maintainer having to explicity look at the new interface and make sure his platform is implementing it properly is an extremely bad practice. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
From: Nishanth Aravamudan Date: Tue, 27 Oct 2015 15:20:10 -0700 > Well, looks like I should spin up a v4 anyways for the powerpc changes. > So, to make sure I understand your point, should I make the generic > dma_get_page_shift a compile-error kind of thing? It will only fail on > architectures that actually build the NVME driver (as the only caller). > But I'm not sure how exactly to achieve that, if you could give a bit > more detail I'd appreciate it! Yes, I am basically suggesting to simply not provide a default at all. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On Wed, 2015-10-28 at 10:43 +1100, Julian Calaby wrote: > Hi Nishanth, > You'll be CCing the maintainers of each architecture on the patches > to > add the functions, so if they do have specific requirements, I'm sure > they'll let you know or provide patches. That sort of accross-all-arch change tend to take forever. I'd rather get the existing series in to fix the problem, we can look into improving things later but I tend to think that the default of using PAGE_SHIFT in asm-generic will be fine for most archs. Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
Hi Nishanth, On Wed, Oct 28, 2015 at 10:40 AM, Nishanth Aravamudan wrote: > On 28.10.2015 [09:57:48 +1100], Julian Calaby wrote: >> Hi Nishanth, >> >> On Wed, Oct 28, 2015 at 9:20 AM, Nishanth Aravamudan >> wrote: >> > On 26.10.2015 [18:27:46 -0700], David Miller wrote: >> >> From: Nishanth Aravamudan >> >> Date: Fri, 23 Oct 2015 13:54:20 -0700 >> >> >> >> > 1) add a generic dma_get_page_shift implementation that just returns >> >> > PAGE_SHIFT >> >> >> >> I won't object to this patch series, but if I had implemented this I >> >> would have required the architectures to implement this explicitly, >> >> one-by-one. I think it is less error prone and more likely to end >> >> up with all the architectures setting this correctly. >> > >> > Well, looks like I should spin up a v4 anyways for the powerpc changes. >> > So, to make sure I understand your point, should I make the generic >> > dma_get_page_shift a compile-error kind of thing? It will only fail on >> > architectures that actually build the NVME driver (as the only caller). >> > But I'm not sure how exactly to achieve that, if you could give a bit >> > more detail I'd appreciate it! >> >> He's suggesting that you _don't_ put a generic implementation in >> /include/linux/dma-mapping.h and instead add it to _every_ >> architecture. > > Ah, I see! Well, I don't know much about the DMA internals of most > architectures -- and my approach kept things functionally the same > everywhere (using PAGE_SHIFT) except: > > a) Power, where I know it doesn't work as-is > and > b) sparc, where the code implied that a different value than PAGE_SHIFT > should be used. You'll be CCing the maintainers of each architecture on the patches to add the functions, so if they do have specific requirements, I'm sure they'll let you know or provide patches. Thanks, -- Julian Calaby Email: julian.cal...@gmail.com Profile: http://www.google.com/profiles/julian.calaby/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On 28.10.2015 [09:57:48 +1100], Julian Calaby wrote: > Hi Nishanth, > > On Wed, Oct 28, 2015 at 9:20 AM, Nishanth Aravamudan > wrote: > > On 26.10.2015 [18:27:46 -0700], David Miller wrote: > >> From: Nishanth Aravamudan > >> Date: Fri, 23 Oct 2015 13:54:20 -0700 > >> > >> > 1) add a generic dma_get_page_shift implementation that just returns > >> > PAGE_SHIFT > >> > >> I won't object to this patch series, but if I had implemented this I > >> would have required the architectures to implement this explicitly, > >> one-by-one. I think it is less error prone and more likely to end > >> up with all the architectures setting this correctly. > > > > Well, looks like I should spin up a v4 anyways for the powerpc changes. > > So, to make sure I understand your point, should I make the generic > > dma_get_page_shift a compile-error kind of thing? It will only fail on > > architectures that actually build the NVME driver (as the only caller). > > But I'm not sure how exactly to achieve that, if you could give a bit > > more detail I'd appreciate it! > > He's suggesting that you _don't_ put a generic implementation in > /include/linux/dma-mapping.h and instead add it to _every_ > architecture. Ah, I see! Well, I don't know much about the DMA internals of most architectures -- and my approach kept things functionally the same everywhere (using PAGE_SHIFT) except: a) Power, where I know it doesn't work as-is and b) sparc, where the code implied that a different value than PAGE_SHIFT should be used. Thanks, Nish -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
Hi Nishanth, On Wed, Oct 28, 2015 at 9:20 AM, Nishanth Aravamudan wrote: > On 26.10.2015 [18:27:46 -0700], David Miller wrote: >> From: Nishanth Aravamudan >> Date: Fri, 23 Oct 2015 13:54:20 -0700 >> >> > 1) add a generic dma_get_page_shift implementation that just returns >> > PAGE_SHIFT >> >> I won't object to this patch series, but if I had implemented this I >> would have required the architectures to implement this explicitly, >> one-by-one. I think it is less error prone and more likely to end >> up with all the architectures setting this correctly. > > Well, looks like I should spin up a v4 anyways for the powerpc changes. > So, to make sure I understand your point, should I make the generic > dma_get_page_shift a compile-error kind of thing? It will only fail on > architectures that actually build the NVME driver (as the only caller). > But I'm not sure how exactly to achieve that, if you could give a bit > more detail I'd appreciate it! He's suggesting that you _don't_ put a generic implementation in /include/linux/dma-mapping.h and instead add it to _every_ architecture. Thanks, -- Julian Calaby Email: julian.cal...@gmail.com Profile: http://www.google.com/profiles/julian.calaby/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On Tue, Oct 27, 2015 at 03:20:10PM -0700, Nishanth Aravamudan wrote: > On 26.10.2015 [18:27:46 -0700], David Miller wrote: > > From: Nishanth Aravamudan > > Date: Fri, 23 Oct 2015 13:54:20 -0700 > > > > > 1) add a generic dma_get_page_shift implementation that just returns > > > PAGE_SHIFT > > > > I won't object to this patch series, but if I had implemented this I > > would have required the architectures to implement this explicitly, > > one-by-one. I think it is less error prone and more likely to end > > up with all the architectures setting this correctly. > > Well, looks like I should spin up a v4 anyways for the powerpc changes. > So, to make sure I understand your point, should I make the generic > dma_get_page_shift a compile-error kind of thing? It will only fail on > architectures that actually build the NVME driver (as the only caller). > But I'm not sure how exactly to achieve that, if you could give a bit > more detail I'd appreciate it! If you're suggesting to compile-time break architectures that currently work just fine with NVMe, let me stop you right there. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
On 26.10.2015 [18:27:46 -0700], David Miller wrote: > From: Nishanth Aravamudan > Date: Fri, 23 Oct 2015 13:54:20 -0700 > > > 1) add a generic dma_get_page_shift implementation that just returns > > PAGE_SHIFT > > I won't object to this patch series, but if I had implemented this I > would have required the architectures to implement this explicitly, > one-by-one. I think it is less error prone and more likely to end > up with all the architectures setting this correctly. Well, looks like I should spin up a v4 anyways for the powerpc changes. So, to make sure I understand your point, should I make the generic dma_get_page_shift a compile-error kind of thing? It will only fail on architectures that actually build the NVME driver (as the only caller). But I'm not sure how exactly to achieve that, if you could give a bit more detail I'd appreciate it! Thanks, Nish -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
From: Nishanth Aravamudan Date: Fri, 23 Oct 2015 13:54:20 -0700 > 1) add a generic dma_get_page_shift implementation that just returns > PAGE_SHIFT I won't object to this patch series, but if I had implemented this I would have required the architectures to implement this explicitly, one-by-one. I think it is less error prone and more likely to end up with all the architectures setting this correctly. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/5 v3] Fix NVMe driver support on Power with 32-bit DMA
[Sorry, subject should have been 0/7!] On 23.10.2015 [13:54:20 -0700], Nishanth Aravamudan wrote: > We received a bug report recently when DDW (64-bit direct DMA on Power) > is not enabled for NVMe devices. In that case, we fall back to 32-bit > DMA via the IOMMU, which is always done via 4K TCEs (Translation Control > Entries). > > The NVMe device driver, though, assumes that the DMA alignment for the > PRP entries will match the device's page size, and that the DMA aligment > matches the kernel's page aligment. On Power, the the IOMMU page size, > as mentioned above, can be 4K, while the device can have a page size of > 8K, while the kernel has a page size of 64K. This eventually trips the > BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple > of 4K but not 8K (e.g., 0xF000). > > In this particular case, and generally, we want to use the IOMMU's page > size for the default device page size, rather than the kernel's page > size. > > This series consists of five patches: > > 1) add a generic dma_get_page_shift implementation that just returns > PAGE_SHIFT > 2) override the generic implementation on Power to use the IOMMU table's > page shift if available > 3) allow further specific overriding on power with machdep platform > overrides > 4) use the machdep override on pseries, as the DDW code puts the TCE > shift in a special property and there is no IOMMU table available > 5) move some sparc code around to make IOMMU_PAGE_SHIFT available in > include/asm > 6) override the generic implementation on sparce to use IOMMU_PAGE_SHIFT > 7) leverage the new API in the NVMe driver > > With these patches, a NVMe device survives our internal hardware > exerciser; the kernel BUGs within a few seconds without the patch. > > arch/powerpc/include/asm/dma-mapping.h | 3 +++ > arch/powerpc/include/asm/machdep.h | 3 ++- > arch/powerpc/kernel/dma.c | 11 +++ > arch/powerpc/platforms/pseries/iommu.c | 36 > > arch/sparc/include/asm/dma-mapping.h | 8 > arch/sparc/include/asm/iommu_common.h | 51 > +++ > arch/sparc/kernel/iommu.c | 2 +- > arch/sparc/kernel/iommu_common.h | 51 > --- > arch/sparc/kernel/pci_psycho.c | 2 +- > arch/sparc/kernel/pci_sabre.c | 2 +- > arch/sparc/kernel/pci_schizo.c | 2 +- > arch/sparc/kernel/pci_sun4v.c | 2 +- > arch/sparc/kernel/psycho_common.c | 2 +- > arch/sparc/kernel/sbus.c | 3 +-- > drivers/block/nvme-core.c | 3 ++- > include/linux/dma-mapping.h| 7 +++ > 16 files changed, 127 insertions(+), 61 deletions(-) > > v1 -> v2: > Based upon feedback from Christoph Hellwig, rather than using an > arch-specific hack, expose the DMA page shift via a generic DMA API and > override it on Power as needed. > v2 -> v3: > Based upon feedback from Christoph Hellwig, put the generic > implementation in include/linux/dma-mapping.h, since not all archs use > include/asm-generic/dma-mapping-common.h. > Add sparc implementation, as that arch seems to have a different IOMMU > page size. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/