Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Thu, Apr 21, 2016 at 4:39 PM, Christoph Hellwig wrote:
> On Thu, Apr 21, 2016 at 04:37:42PM +0300, Or Gerlitz wrote:
>> On Wed, Apr 20, 2016 at 9:40 PM, Eran Ben Elisha wrote:
>>>> It has been 1.5 years since I reported the problem. We came up with
>>>> three different solutions this week. I'd like to see a version of the
>>>> solution get merged until Mellanox comes up with a better solution in
>>>> another patch. My proposal is to use this one.
>>
>>> We will post our suggestion here in the following days.
>>
>> To update, Haggai A from our driver team is working on a patch. He is
>> providing a copy for testing over ARM to the folks that reported the
>> problem and will post it here early next week.
>
> Any chance you could give feedback on the patch I posted this week?

Haggai just posted Mellanox's fix for this issue. Your suggestion discards
the option to work with fragmented memory in mlx4_ib, which is unnecessary.
Please see our suggestion; comments are welcome.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Thu, Apr 21, 2016 at 4:39 PM, Christoph Hellwig wrote:
> On Thu, Apr 21, 2016 at 04:37:42PM +0300, Or Gerlitz wrote:
>> On Wed, Apr 20, 2016 at 9:40 PM, Eran Ben Elisha wrote:
>>>> It has been 1.5 years since I reported the problem. We came up with
>>>> three different solutions this week. I'd like to see a version of the
>>>> solution get merged until Mellanox comes up with a better solution in
>>>> another patch. My proposal is to use this one.
>>
>>> We will post our suggestion here in the following days.
>>
>> To update, Haggai A from our driver team is working on a patch. He is
>> providing a copy for testing over ARM to the folks that reported the
>> problem and will post it here early next week.
>
> Any chance you could give feedback on the patch I posted this week?

I missed your patch; looking at it now, down this thread. I think Haggai's
patch is very similar to yours; he's also touching some more code in the IB
driver.

Or.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Thu, Apr 21, 2016 at 04:37:42PM +0300, Or Gerlitz wrote:
> On Wed, Apr 20, 2016 at 9:40 PM, Eran Ben Elisha wrote:
>>> It has been 1.5 years since I reported the problem. We came up with
>>> three different solutions this week. I'd like to see a version of the
>>> solution get merged until Mellanox comes up with a better solution in
>>> another patch. My proposal is to use this one.
>
>> We will post our suggestion here in the following days.
>
> To update, Haggai A from our driver team is working on a patch. He is
> providing a copy for testing over ARM to the folks that reported the
> problem and will post it here early next week.

Any chance you could give feedback on the patch I posted this week?
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Wed, Apr 20, 2016 at 9:40 PM, Eran Ben Elisha wrote:
>> It has been 1.5 years since I reported the problem. We came up with
>> three different solutions this week. I'd like to see a version of the
>> solution get merged until Mellanox comes up with a better solution in
>> another patch. My proposal is to use this one.
>
> We will post our suggestion here in the following days.

To update, Haggai A from our driver team is working on a patch. He is
providing a copy for testing over ARM to the folks that reported the
problem and will post it here early next week.

Or.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/20/2016 2:40 PM, Eran Ben Elisha wrote:
>> It has been 1.5 years since I reported the problem. We came up with
>> three different solutions this week. I'd like to see a version of the
>> solution get merged until Mellanox comes up with a better solution in
>> another patch. My proposal is to use this one.
>
> We will post our suggestion here in the following days.

Thanks, please keep me in CC; I'm not normally subscribed to this list.
I can post a Tested-by after testing.

--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
>> It has been 1.5 years since I reported the problem. We came up with
>> three different solutions this week. I'd like to see a version of the
>> solution get merged until Mellanox comes up with a better solution in
>> another patch. My proposal is to use this one.

We will post our suggestion here in the following days.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
Sinan Kaya wrote:
> I'd like to see a version of the solution get merged until Mellanox
> comes up with a better solution in another patch.

Yes, I agree 100%.

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation collaborative project.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
Apologies, I replied to an older post by mistake. I was trying to reply to
Eran.

> Hi Sinan,
>
> We are working at Mellanox on a solution which removes the vmap call and
> allocates contiguous memory (using dma_alloc_coherent).
>
> Thanks,
> Eran
>
> On 4/20/2016 9:35 AM, Sinan Kaya wrote:
>> On 4/19/2016 2:22 PM, Christoph Hellwig wrote:
>>> What I think we need is something like the patch below. In the long
>>> run we should also kill the mlx4_buf structure which now is pretty
>>> pointless.
>>
>> It has been 1.5 years since I reported the problem. We came up with
>> three different solutions this week. I'd like to see a version of the
>> solution get merged until Mellanox comes up with a better solution in
>> another patch. My proposal is to use this one.
>>
>> --- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
>> @@ -588,7 +588,7 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
>>  {
>>  	dma_addr_t t;
>>
>> -	if (size <= max_direct) {
>> +	if ((size <= max_direct) || (BITS_PER_LONG == 64)) {
>>  		buf->nbufs = 1;
>>  		buf->npages = 1;
>>  		buf->page_shift = get_order(size) + PAGE_SHIFT;
>>
>> Of course, this is assuming that you are not ready to submit your patch
>> yet. If you are, feel free to post.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/19/2016 2:22 PM, Christoph Hellwig wrote:
> What I think we need is something like the patch below. In the long
> run we should also kill the mlx4_buf structure which now is pretty
> pointless.

It has been 1.5 years since I reported the problem. We came up with three
different solutions this week. I'd like to see a version of the solution
get merged until Mellanox comes up with a better solution in another
patch. My proposal is to use this one.

--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -588,7 +588,7 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
 {
 	dma_addr_t t;
 
-	if (size <= max_direct) {
+	if ((size <= max_direct) || (BITS_PER_LONG == 64)) {
 		buf->nbufs = 1;
 		buf->npages = 1;
 		buf->page_shift = get_order(size) + PAGE_SHIFT;

Of course, this is assuming that you are not ready to submit your patch
yet. If you are, feel free to post.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
Hi Sinan,

We are working at Mellanox on a solution which removes the vmap call and
allocates contiguous memory (using dma_alloc_coherent).

Thanks,
Eran

On Tue, Apr 19, 2016 at 9:37 PM, Sinan Kaya wrote:
> On 4/19/2016 2:22 PM, Christoph Hellwig wrote:
>> What I think we need is something like the patch below. In the long
>> run we should also kill the mlx4_buf structure which now is pretty
>> pointless.
>
> Maybe; this could be the correct approach if we can guarantee that the
> architecture can allocate the requested amount of memory with
> dma_alloc_coherent.
>
> When I brought this issue up a year ago, the objection was that my code
> doesn't compile on Intel (dma_to_phys) and also that some arches run out
> of DMA memory with the existing customer base.
>
> If there is a real need to maintain compatibility with the existing
> architectures due to the limited amount of DMA memory, we need to
> correct this code to make proper use of vmap via alloc_pages and also
> insert dma_sync calls in the proper places for DMA API conformance.
>
> Also, the tx descriptors always have to be allocated from a single DMA
> region, or the tx code needs to be corrected to support page_list.
>
> If we can live with just using dma_alloc_coherent, your solution is
> better. I was trying to add this support for 64-bit arches only while
> maintaining support for the existing code base.
>
>> ---
>> From a493881d2a6c90152d3daabb7b6b3afd1d254d78 Mon Sep 17 00:00:00 2001
>> From: Christoph Hellwig
>> Date: Tue, 19 Apr 2016 14:12:14 -0400
>> Subject: mlx4_en: don't try to split and vmap dma coherent allocations
>>
>> The memory returned by dma_alloc_coherent is not suitable for calling
>> virt_to_page on it, as it might for example come from the vmap
>> allocator.
>>
>> Remove the code that calls virt_to_page and vmap on dma coherent
>> allocations from the mlx4 drivers, and replace them with a single
>> high-order dma_alloc_coherent call.
>>
>> Signed-off-by: Christoph Hellwig
>> Reported-by: Sinan Kaya
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/19/2016 2:22 PM, Christoph Hellwig wrote:
> What I think we need is something like the patch below. In the long
> run we should also kill the mlx4_buf structure which now is pretty
> pointless.

Maybe; this could be the correct approach if we can guarantee that the
architecture can allocate the requested amount of memory with
dma_alloc_coherent.

When I brought this issue up a year ago, the objection was that my code
doesn't compile on Intel (dma_to_phys) and also that some arches run out
of DMA memory with the existing customer base.

If there is a real need to maintain compatibility with the existing
architectures due to the limited amount of DMA memory, we need to correct
this code to make proper use of vmap via alloc_pages and also insert
dma_sync calls in the proper places for DMA API conformance.

Also, the tx descriptors always have to be allocated from a single DMA
region, or the tx code needs to be corrected to support page_list.

If we can live with just using dma_alloc_coherent, your solution is
better. I was trying to add this support for 64-bit arches only while
maintaining support for the existing code base.

> ---
> From a493881d2a6c90152d3daabb7b6b3afd1d254d78 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig
> Date: Tue, 19 Apr 2016 14:12:14 -0400
> Subject: mlx4_en: don't try to split and vmap dma coherent allocations
>
> The memory returned by dma_alloc_coherent is not suitable for calling
> virt_to_page on it, as it might for example come from the vmap
> allocator.
>
> Remove the code that calls virt_to_page and vmap on dma coherent
> allocations from the mlx4 drivers, and replace them with a single
> high-order dma_alloc_coherent call.
>
> Signed-off-by: Christoph Hellwig
> Reported-by: Sinan Kaya
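The alloc_pages + vmap + dma_sync alternative described above can be
sketched roughly as below. This is an untested kernel-context sketch; the
function names and structure are illustrative, not actual mlx4 code, and
only the DMA API calls themselves are real kernel interfaces:

```c
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/dma-mapping.h>

/* Illustrative: allocate real pages (so vmap is legal on them), map each
 * for streaming DMA, and stitch a CPU-contiguous view with vmap(). */
static int buf_alloc_pages(struct device *dev, struct page **pages,
			   dma_addr_t *dma, int nbufs, void **cpu_view)
{
	int i;

	for (i = 0; i < nbufs; i++) {
		pages[i] = alloc_page(GFP_KERNEL);
		if (!pages[i])
			return -ENOMEM;
		dma[i] = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
				      DMA_BIDIRECTIONAL);
		if (dma_mapping_error(dev, dma[i]))
			return -ENOMEM;
	}

	/* CPU-contiguous view over the discontiguous pages. */
	*cpu_view = vmap(pages, nbufs, VM_MAP, PAGE_KERNEL);
	if (!*cpu_view)
		return -ENOMEM;

	return 0;
}

/* Streaming mappings need explicit ownership transfers; these are no-ops
 * on cache-coherent systems and cache maintenance on non-coherent ones. */
static void buf_cpu_owns(struct device *dev, dma_addr_t d)
{
	dma_sync_single_for_cpu(dev, d, PAGE_SIZE, DMA_BIDIRECTIONAL);
}

static void buf_device_owns(struct device *dev, dma_addr_t d)
{
	dma_sync_single_for_device(dev, d, PAGE_SIZE, DMA_BIDIRECTIONAL);
}
```

The error paths are elided; a real implementation would unmap and free the
pages already obtained before returning -ENOMEM.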
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
What I think we need is something like the patch below. In the long run we
should also kill the mlx4_buf structure which now is pretty pointless.

---
From a493881d2a6c90152d3daabb7b6b3afd1d254d78 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig
Date: Tue, 19 Apr 2016 14:12:14 -0400
Subject: mlx4_en: don't try to split and vmap dma coherent allocations

The memory returned by dma_alloc_coherent is not suitable for calling
virt_to_page on it, as it might for example come from the vmap allocator.

Remove the code that calls virt_to_page and vmap on dma coherent
allocations from the mlx4 drivers, and replace them with a single
high-order dma_alloc_coherent call.

Signed-off-by: Christoph Hellwig
Reported-by: Sinan Kaya
---
 drivers/net/ethernet/mellanox/mlx4/alloc.c        | 89 ---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c        |  7 --
 drivers/net/ethernet/mellanox/mlx4/en_resources.c | 31
 drivers/net/ethernet/mellanox/mlx4/en_rx.c        |  8 --
 drivers/net/ethernet/mellanox/mlx4/en_tx.c        |  9 ---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h      |  2 -
 drivers/net/ethernet/mellanox/mlx4/mr.c           |  5 +-
 include/linux/mlx4/device.h                       |  8 +-
 8 files changed, 16 insertions(+), 143 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index 0c51c69..8d15b35 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -588,91 +588,30 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
 {
 	dma_addr_t t;
 
-	if (size <= max_direct) {
-		buf->nbufs = 1;
-		buf->npages = 1;
-		buf->page_shift = get_order(size) + PAGE_SHIFT;
-		buf->direct.buf = dma_alloc_coherent(&dev->persist->pdev->dev,
-						     size, &t, gfp);
-		if (!buf->direct.buf)
-			return -ENOMEM;
-
-		buf->direct.map = t;
-
-		while (t & ((1 << buf->page_shift) - 1)) {
-			--buf->page_shift;
-			buf->npages *= 2;
-		}
+	buf->npages = 1;
+	buf->page_shift = get_order(size) + PAGE_SHIFT;
+	buf->direct.buf = dma_alloc_coherent(&dev->persist->pdev->dev,
+					     size, &t, gfp);
+	if (!buf->direct.buf)
+		return -ENOMEM;
 
-		memset(buf->direct.buf, 0, size);
-	} else {
-		int i;
-
-		buf->direct.buf = NULL;
-		buf->nbufs = (size + PAGE_SIZE - 1) / PAGE_SIZE;
-		buf->npages = buf->nbufs;
-		buf->page_shift = PAGE_SHIFT;
-		buf->page_list = kcalloc(buf->nbufs, sizeof(*buf->page_list),
-					 gfp);
-		if (!buf->page_list)
-			return -ENOMEM;
-
-		for (i = 0; i < buf->nbufs; ++i) {
-			buf->page_list[i].buf =
-				dma_alloc_coherent(&dev->persist->pdev->dev,
-						   PAGE_SIZE,
-						   &t, gfp);
-			if (!buf->page_list[i].buf)
-				goto err_free;
-
-			buf->page_list[i].map = t;
-
-			memset(buf->page_list[i].buf, 0, PAGE_SIZE);
-		}
+	buf->direct.map = t;
 
-		if (BITS_PER_LONG == 64) {
-			struct page **pages;
-			pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
-			if (!pages)
-				goto err_free;
-			for (i = 0; i < buf->nbufs; ++i)
-				pages[i] = virt_to_page(buf->page_list[i].buf);
-			buf->direct.buf = vmap(pages, buf->nbufs, VM_MAP, PAGE_KERNEL);
-			kfree(pages);
-			if (!buf->direct.buf)
-				goto err_free;
-		}
+	while (t & ((1 << buf->page_shift) - 1)) {
+		--buf->page_shift;
+		buf->npages *= 2;
 	}
 
+	memset(buf->direct.buf, 0, size);
 	return 0;
-
-err_free:
-	mlx4_buf_free(dev, size, buf);
-
-	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(mlx4_buf_alloc);
 
 void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf)
 {
-	int i;
-
-	if (buf->nbufs == 1)
-		dma_free_coherent(&dev->persist->pdev->dev, size,
-				  buf->direct.buf,
-				  buf->direct.map);
-	else {
-		if (BITS_PER_LONG == 64)
-			vunmap(buf->direct.buf);
-
-		for (i = 0; i < buf->nbufs; ++i)
-
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/18/2016 11:40 AM, Christoph Hellwig wrote:
> On Mon, Apr 18, 2016 at 11:21:12AM -0400, Sinan Kaya wrote:
>> I was looking at the code. I don't see how removing virt_to_page + vmap
>> would solve the issue.
>>
>> The code is trying to access the buffer space with the direct.buf
>> member from the CPU side. This member would become NULL when this code
>> is removed, and also in mlx4_en_map_buffer.
>>
>> ...
>>
>> What am I missing?
>
> As mentioned before, you'll also need to enforce that you hit the
> nbufs = 1 case for these. In fact most callers should simply switch to a
> plain dma_zalloc_coherent call without all these wrappers. If we have a
> case where we really want multiple buffers that don't have to be
> contiguous (maybe the MTT case) I'd rather open-code that instead of
> building this confusing interface on top of it.

So, I did my fair share of investigation. As I pointed out in my previous
email, the code is allocating a bunch of page-sized arrays and using them
for receive, transmit and control descriptors.

I'm unable to limit nbufs to 1 because none of these allocations make a
single contiguous allocation by default. They all go the multiple-page
route due to the 2 * PAGE_SIZE max_direct parameter passed in.

I tried changing the code to handle page_list vs. single allocation. I was
able to do this for the CQE and the receive queue, since both of them
allocate fixed-size chunks. However, I couldn't do this for the transmit
queue. The transmit code uses the array of descriptors for variable-sized
transfers and it also assumes that the descriptors are contiguous. When
used with pages, one tx transfer can spill beyond the first page and do
illegal writes.

In the end, my proposed code in this patch is much simpler than what I
tried to achieve by removing the vmap API. Another alternative is to force
the code to use a single DMA alloc for all 64-bit architectures. Something
like this:

--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -588,7 +588,7 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
 {
 	dma_addr_t t;
 
-	if (size <= max_direct) {
+	if ((size <= max_direct) || (BITS_PER_LONG == 64)) {
 		buf->nbufs = 1;
 		buf->npages = 1;
 		buf->page_shift = get_order(size) + PAGE_SHIFT;

This also works on arm64. My proposal is more scalable for memory
consumption compared to this one.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/18/2016 11:40 AM, Christoph Hellwig wrote:
> On Mon, Apr 18, 2016 at 11:21:12AM -0400, Sinan Kaya wrote:
>> I was looking at the code. I don't see how removing virt_to_page + vmap
>> would solve the issue.
>>
>> The code is trying to access the buffer space with the direct.buf
>> member from the CPU side. This member would become NULL when this code
>> is removed, and also in mlx4_en_map_buffer.
>>
>> ...
>>
>> What am I missing?
>
> As mentioned before, you'll also need to enforce that you hit the
> nbufs = 1 case for these. In fact most callers should simply switch to a
> plain dma_zalloc_coherent call without all these wrappers. If we have a
> case where we really want multiple buffers that don't have to be
> contiguous (maybe the MTT case) I'd rather open-code that instead of
> building this confusing interface on top of it.

I hit the first problem with the CQE. The alloc routine is allocating
pages, but the CQE code is trying to do linear access through the direct
buf member.

I see that this code implements page_list support. I'd like to do the same
thing for the CQE. Let me know if I'm on the right path.

static struct mlx4_eqe *get_eqe(struct mlx4_eq *eq, u32 entry, u8 eqe_factor,
				u8 eqe_size)
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/18/2016 11:59 AM, David Miller wrote:
> From: ok...@codeaurora.org
> Date: Mon, 18 Apr 2016 01:06:27 -0400
>
>> On 2016-04-18 00:00, David Miller wrote:
>>> From: Sinan Kaya
>>> Date: Sat, 16 Apr 2016 18:23:32 -0400
>>>
>>>> Current code is assuming that the address returned by
>>>> dma_alloc_coherent is a logical address. This is not true on
>>>> ARM/ARM64 systems. This patch replaces dma_alloc_coherent with the
>>>> dma_map_page API. The address returned can later be virtually mapped
>>>> from the CPU side with the vmap API.
>>>>
>>>> Signed-off-by: Sinan Kaya
>>>
>>> You can't do this.
>>> The DMA map page API gives non-coherent mappings, and thus requires
>>> proper flushing.
>>> So a straight conversion like this is never legitimate.
>>
>> I would agree on proper DMA API usage. However, the code is already
>> assuming a coherent architecture by mapping the CPU pages as
>> PAGE_KERNEL.
>>
>> dma_map_page returns cached buffers and you don't need cache flushes
>> on a coherent architecture to make the data visible.
>
> All you are telling me is that there are two bugs instead of one, so now
> both need to be fixed.

The removal of vmap also fixes the coherency assumption. It is one fix for
both. I was thinking of submitting another patch to change the vmap
argument PAGE_KERNEL based on the coherency support of the architecture. I
don't need to do that anymore if the other experiment works.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: ok...@codeaurora.org
Date: Mon, 18 Apr 2016 01:06:27 -0400

> On 2016-04-18 00:00, David Miller wrote:
>> From: Sinan Kaya
>> Date: Sat, 16 Apr 2016 18:23:32 -0400
>>
>>> Current code is assuming that the address returned by
>>> dma_alloc_coherent is a logical address. This is not true on ARM/ARM64
>>> systems. This patch replaces dma_alloc_coherent with the dma_map_page
>>> API. The address returned can later be virtually mapped from the CPU
>>> side with the vmap API.
>>>
>>> Signed-off-by: Sinan Kaya
>>
>> You can't do this.
>> The DMA map page API gives non-coherent mappings, and thus requires
>> proper flushing.
>> So a straight conversion like this is never legitimate.
>
> I would agree on proper DMA API usage. However, the code is already
> assuming a coherent architecture by mapping the CPU pages as
> PAGE_KERNEL.
>
> dma_map_page returns cached buffers and you don't need cache flushes on
> a coherent architecture to make the data visible.

All you are telling me is that there are two bugs instead of one, so now
both need to be fixed.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Mon, Apr 18, 2016 at 11:21:12AM -0400, Sinan Kaya wrote:
> I was looking at the code. I don't see how removing virt_to_page + vmap
> would solve the issue.
>
> The code is trying to access the buffer space with the direct.buf member
> from the CPU side. This member would become NULL when this code is
> removed, and also in mlx4_en_map_buffer.
>
> ...
>
> What am I missing?

As mentioned before, you'll also need to enforce that you hit the
nbufs = 1 case for these. In fact most callers should simply switch to a
plain dma_zalloc_coherent call without all these wrappers. If we have a
case where we really want multiple buffers that don't have to be
contiguous (maybe the MTT case) I'd rather open-code that instead of
building this confusing interface on top of it.
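As a rough illustration of the "plain dma_zalloc_coherent call without all
these wrappers" suggested above: the function and parameter names below
are made up for illustration (the real conversion would touch the mlx4
ring-setup paths), but the DMA calls are the actual kernel API:

```c
#include <linux/dma-mapping.h>

/* Illustrative only: a caller drops the mlx4_buf_alloc() wrapper and
 * allocates one zeroed, high-order, DMA-coherent region directly.  The
 * region is contiguous for both the CPU and the device, so no
 * vmap/virt_to_page games are needed. */
static void *ring_alloc(struct device *dev, size_t size, dma_addr_t *map)
{
	return dma_zalloc_coherent(dev, size, map, GFP_KERNEL);
}

static void ring_free(struct device *dev, size_t size, void *buf,
		      dma_addr_t map)
{
	dma_free_coherent(dev, size, buf, map);
}
```

The trade-off, raised elsewhere in this thread, is that high-order
coherent allocations can fail or exhaust DMA memory on some platforms.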
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/18/2016 11:15 AM, Timur Tabi wrote:
> Sinan Kaya wrote:
>> vmap allows you to make several pages look contiguous to the CPU. It
>> can only be used against logical addresses returned from kmalloc or
>> alloc_page.
>>
>> You cannot take several virtually mapped addresses returned by
>> dma_alloc_coherent and try to make them virtually contiguous again.
>>
>> The code happens to work on other architectures by pure luck. AFAIK,
>> dma_alloc_coherent returns logical addresses on Intel systems until it
>> runs out of DMA memory. After that, the Intel arch will also start
>> returning virtually mapped addresses and this code will also fail.
>> ARM64, on the other hand, always returns a virtually mapped address.
>>
>> The goal of this code is to allocate a bunch of page-sized memory
>> chunks and make them look contiguous. It is just using the wrong API.
>> The correct API is either kmalloc or alloc_page, mapped with
>> dma_map_page, not dma_alloc_coherent.
>>
>> The proper usage of dma_map_page requires code to call the dma_sync API
>> in the correct places to be compatible with non-coherent systems. This
>> code is already assuming coherency. It would be nice to have dma_sync
>> API calls in the right places. There is no harm in calling the dma_sync
>> API on coherent systems, as the calls are no-ops in the DMA mapping
>> layer, whereas they are cache flushes for non-coherent systems.
>
> The text would be a great addition to the patch description.

I can do that on the next version.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/18/2016 11:17 AM, Christoph Hellwig wrote:
> On Mon, Apr 18, 2016 at 02:39:36PM +0000, Eli Cohen wrote:
>> Right, I did not suggest this as a patch but just wanted to pinpoint
>> the problematic issue, which is that virt_to_page does not give you the
>> correct pointer to the page.
>
> Removing both the virt_to_page + vmap calls would solve the issue
> indeed.

I was looking at the code. I don't see how removing virt_to_page + vmap
would solve the issue.

The code is trying to access the buffer space with the direct.buf member
from the CPU side. This member would become NULL when this code is
removed, and also in mlx4_en_map_buffer.

	if (BITS_PER_LONG == 64) {
		struct page **pages;
		pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
		if (!pages)
			goto err_free;
		...
		...
		if (!buf->direct.buf)
			goto err_free;
	}

drivers/net/ethernet/mellanox/mlx4/en_tx.c (2 hits)
	Line 110: ring->buf = ring->wqres.buf.direct.buf;
	Line 114: (unsigned long long) ring->wqres.buf.direct.map);
drivers/net/ethernet/mellanox/mlx4/en_rx.c (1 hit)
	Line 404: ring->buf = ring->wqres.buf.direct.buf;
drivers/net/ethernet/mellanox/mlx4/en_cq.c (1 hit)
	Line 85: cq->buf = (struct mlx4_cqe *)cq->wqres.buf.direct.buf;

What am I missing?
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Mon, Apr 18, 2016 at 02:39:36PM +0000, Eli Cohen wrote:
> Right, I did not suggest this as a patch but just wanted to pinpoint the
> problematic issue, which is that virt_to_page does not give you the
> correct pointer to the page.

Removing both the virt_to_page + vmap calls would solve the issue indeed.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
Sinan Kaya wrote:
> vmap allows you to make several pages look contiguous to the CPU. It can
> only be used against logical addresses returned from kmalloc or
> alloc_page.
>
> You cannot take several virtually mapped addresses returned by
> dma_alloc_coherent and try to make them virtually contiguous again.
>
> The code happens to work on other architectures by pure luck. AFAIK,
> dma_alloc_coherent returns logical addresses on Intel systems until it
> runs out of DMA memory. After that, the Intel arch will also start
> returning virtually mapped addresses and this code will also fail.
> ARM64, on the other hand, always returns a virtually mapped address.
>
> The goal of this code is to allocate a bunch of page-sized memory chunks
> and make them look contiguous. It is just using the wrong API. The
> correct API is either kmalloc or alloc_page, mapped with dma_map_page,
> not dma_alloc_coherent.
>
> The proper usage of dma_map_page requires code to call the dma_sync API
> in the correct places to be compatible with non-coherent systems. This
> code is already assuming coherency. It would be nice to have dma_sync
> API calls in the right places. There is no harm in calling the dma_sync
> API on coherent systems, as the calls are no-ops in the DMA mapping
> layer, whereas they are cache flushes for non-coherent systems.

The text would be a great addition to the patch description.
RE: [PATCH V2] net: ethernet: mellanox: correct page conversion
Right, I did not suggest this as a patch but just wanted to pinpoint the
problematic issue, which is that virt_to_page does not give you the
correct pointer to the page.

-----Original Message-----
From: Christoph Hellwig [mailto:h...@infradead.org]
Sent: Monday, April 18, 2016 9:33 AM
To: Eli Cohen
Cc: Sinan Kaya; linux-r...@vger.kernel.org; ti...@codeaurora.org;
c...@codeaurora.org; Yishai Hadas; netdev@vger.kernel.org;
linux-ker...@vger.kernel.org
Subject: Re: [PATCH V2] net: ethernet: mellanox: correct page conversion

On Mon, Apr 18, 2016 at 09:54:47AM +0300, Eli Cohen wrote:
> Sinan,
>
> if we get rid of this part of the code:
>
> 	if (BITS_PER_LONG == 64) {
> 		struct page **pages;
> 		pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
> 		if (!pages)
> 			goto err_free;
> 		...
> 		...
> 		if (!buf->direct.buf)
> 			goto err_free;
> 	}
>
> Does that solve the arm issue?

Not quite. You still have code in mlx4_en_map_buffer that performs this
mapping later if it wasn't mapped in mlx4_buf_alloc. You'll need to get
rid of that by ensuring max_direct is large enough for all the cases
currently using mlx4_en_map_buffer as well.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Mon, Apr 18, 2016 at 09:54:47AM +0300, Eli Cohen wrote:
> Sinan,
>
> if we get rid of this part of the code:
>
> 	if (BITS_PER_LONG == 64) {
> 		struct page **pages;
> 		pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
> 		if (!pages)
> 			goto err_free;
> 		...
> 		...
> 		if (!buf->direct.buf)
> 			goto err_free;
> 	}
>
> Does that solve the arm issue?

Not quite. You still have code in mlx4_en_map_buffer that performs this
mapping later if it wasn't mapped in mlx4_buf_alloc. You'll need to get
rid of that by ensuring max_direct is large enough for all the cases
currently using mlx4_en_map_buffer as well.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
Sure, this is not the complete patch. As far as I know, the problem you're
facing with arm is that virt_to_page() does not provide the correct page
descriptor, so my suggestion will eliminate the need for it.

On Mon, Apr 18, 2016 at 09:53:30AM -0400, Sinan Kaya wrote:
> On 4/18/2016 2:54 AM, Eli Cohen wrote:
>> Sinan,
>>
>> if we get rid of this part of the code:
>>
>> 	if (BITS_PER_LONG == 64) {
>> 		struct page **pages;
>> 		pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
>> 		if (!pages)
>> 			goto err_free;
>> 		...
>> 		...
>> 		if (!buf->direct.buf)
>> 			goto err_free;
>> 	}
>>
>> Does that solve the arm issue?
>
> I will test. As far as I know, there is one more place where these DMA
> addresses are used with vmap: mlx4_en_map_buffer.
>
> I was trying to rearrange the allocation so that vmap actually works.
>
> What do you think about mlx4_en_map_buffer?
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Mon, Apr 18, 2016 at 09:49:10AM -0400, Sinan Kaya wrote:
> Here is a good description of logical address vs. virtual address.
>
> https://www.quora.com/What-is-the-Kernel-logical-and-virtual-addresses-What-is-the-difference-between-them-What-is-the-type-of-addresses-listed-in-the-System-map

That's not how we use the terms in Linux. But it's not really the point of
my question either.

>> Is this correct?
>
> No, the driver is plain broken without this patch. It causes a kernel
> panic during driver probe.
>
> This is the definition of the vmap API.
>
> https://www.kernel.org/doc/htmldocs/kernel-api/API-vmap.html

Thanks for the pointer, but I'm actually the person who introduced vmap to
Linux a long time ago, and this is once again not my question.

> You cannot take several virtually mapped addresses returned by
> dma_alloc_coherent and try to make them virtually contiguous again.

But now we're getting closer to the issue: the mlx4_en driver is using
vmap on buffers allocated using dma_alloc_coherent if on a 64-bit
architecture, and that's obviously broken. Now the big question is: why
does it do that, given that dma_alloc_coherent can be used for high-order
allocations anyway (and in fact many architectures implement it using a
version of vmap)?

Let's get some answers to these questions from the Mellanox folks and work
from there.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/18/2016 9:10 AM, Christoph Hellwig wrote:
> On Mon, Apr 18, 2016 at 09:06:18AM -0400, ok...@codeaurora.org wrote:
>> On 2016-04-18 08:12, Christoph Hellwig wrote:
>>> On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
>>>> Current code is assuming that the address returned by dma_alloc_coherent
>>>> is a logical address. This is not true on ARM/ARM64 systems.
>>>
>>> Can you explain what you mean with a 'logical address' and what actual
>>> issue you're trying to solve?
>>
>> Here is a good description of logical address vs. virtual address.
>>
>> https://www.quora.com/What-is-the-Kernel-logical-and-virtual-addresses-What-is-the-difference-between-them-What-is-the-type-of-addresses-listed-in-the-System-map
>>
>> Vmap call is failing on arm64 systems because dma alloc api already returns
>> an address mapped with vmap.
>
> Please state your problem clearly. What I'm reverse engineering from
> your posts is: because dma_alloc_coherent uses vmap-like mappings on
> arm64 (all, some systems?)

All arm64 systems.

> a driver using a lot of them might run into
> limits of the vmap pool size.
>
> Is this correct?

No, the driver is plain broken without this patch. It causes a kernel panic
during driver probe.

This is the definition of the vmap API.

https://www.kernel.org/doc/htmldocs/kernel-api/API-vmap.html

vmap allows you to make several pages look contiguous to the CPU. It can
only be used against pages obtained from kmalloc or alloc_page. You cannot
take several virtually mapped addresses returned by dma_alloc_coherent and
try to make them virtually contiguous again.

The code happens to work on other architectures by pure luck. AFAIK,
dma_alloc_coherent returns logical addresses on Intel systems until it runs
out of DMA memory. After that, the Intel arch will also start returning
virtually mapped addresses and this code will also fail. ARM64, on the
other hand, always returns a virtually mapped address.
The goal of this code is to allocate a bunch of page sized memory and make
it look contiguous to the CPU. It is just using the wrong API. The correct
approach is to allocate with kmalloc or alloc_page and map with
dma_map_page, not dma_alloc_coherent.

Proper usage of dma_map_page requires the code to call the dma_sync API in
the correct places to be compatible with non-coherent systems. This code is
already assuming coherency. It would be nice to have the dma_sync APIs in
the right places: there is no harm in calling the dma_sync API on coherent
systems, where the calls are no-ops in the DMA mapping layer, whereas they
are cache flushes for non-coherent systems.

>> Please see arch/arm64/mm directory.
> ---end quoted text---

I hope it is clear now. The previous email was the most I could type on my
phone.

--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
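[Editorial note: the allocation pattern Sinan describes above — pages from
the page allocator, mapped for DMA per page, then made CPU-contiguous with
vmap — can be sketched as follows. This is an illustration of the pattern,
not the posted patch; `nbufs` and `dev` are placeholders and error
unwinding is omitted.]

```c
/*
 * Sketch of the described pattern.  vmap() is legal here because the
 * pages come from alloc_page(), not from dma_alloc_coherent(), so
 * genuine struct page descriptors exist for them.
 */
struct page **pages;
dma_addr_t *maps;
void *virt;
int i;

pages = kcalloc(nbufs, sizeof(*pages), GFP_KERNEL);
maps  = kcalloc(nbufs, sizeof(*maps), GFP_KERNEL);

for (i = 0; i < nbufs; i++) {
	pages[i] = alloc_page(GFP_KERNEL);
	maps[i]  = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
				DMA_BIDIRECTIONAL);
}

/* one contiguous CPU view over nbufs discontiguous pages */
virt = vmap(pages, nbufs, VM_MAP, PAGE_KERNEL);
```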
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 4/18/2016 2:54 AM, Eli Cohen wrote:
> Sinan,
>
> if we get rid of this part of the code:
>
> if (BITS_PER_LONG == 64) {
> 	struct page **pages;
> 	pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
> 	if (!pages)
> 		goto err_free;
> 	...
> 	...
> 	if (!buf->direct.buf)
> 		goto err_free;
> }
>
> Does that solve the arm issue?

I will test. As far as I know, there is one more place these DMA addresses
are called with vmap. This is in mlx4_en_map_buffer.

I was trying to rearrange the allocation so that vmap actually works.

What do you think about mlx4_en_map_buffer?

--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Mon, Apr 18, 2016 at 09:06:18AM -0400, ok...@codeaurora.org wrote:
> On 2016-04-18 08:12, Christoph Hellwig wrote:
> >On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
> >>Current code is assuming that the address returned by dma_alloc_coherent
> >>is a logical address. This is not true on ARM/ARM64 systems.
> >
> >Can you explain what you mean with a 'logical address' and what actual
> >issue you're trying to solve?
>
> Vmap call is failing on arm64 systems because dma alloc api already returns
> an address mapped with vmap.

Please state your problem clearly. What I'm reverse engineering from
your posts is: because dma_alloc_coherent uses vmap-like mappings on
arm64 (all, some systems?) a driver using a lot of them might run into
limits of the vmap pool size.

Is this correct?

> Please see arch/arm64/mm directory.
---end quoted text---
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 2016-04-18 08:12, Christoph Hellwig wrote:
> On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
>> Current code is assuming that the address returned by dma_alloc_coherent
>> is a logical address. This is not true on ARM/ARM64 systems.
>
> Can you explain what you mean with a 'logical address' and what actual
> issue you're trying to solve?

Vmap call is failing on arm64 systems because dma alloc api already returns
an address mapped with vmap.

Please see arch/arm64/mm directory.
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote: > Current code is assuming that the address returned by dma_alloc_coherent > is a logical address. This is not true on ARM/ARM64 systems. Can you explain what you mean with a 'logical address' and what actual issue you're trying to solve?
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
Sinan,

if we get rid of this part of the code:

if (BITS_PER_LONG == 64) {
	struct page **pages;
	pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
	if (!pages)
		goto err_free;
	...
	...
	if (!buf->direct.buf)
		goto err_free;
}

Does that solve the arm issue? The other parts of the driver using the
allocated buffer can still work with a list of coherent chunks.

On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
> Current code is assuming that the address returned by dma_alloc_coherent
> is a logical address. This is not true on ARM/ARM64 systems. This patch
> replaces dma_alloc_coherent with dma_map_page API. The address returned
> can later be virtually mapped from the CPU side with vmap API.
>
> Signed-off-by: Sinan Kaya
> ---
>  drivers/net/ethernet/mellanox/mlx4/alloc.c | 37 ++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c
> index 0c51c69..22a7ae7 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
> @@ -618,13 +618,26 @@ int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct,
>  		return -ENOMEM;
>
>  	for (i = 0; i < buf->nbufs; ++i) {
> -		buf->page_list[i].buf =
> -			dma_alloc_coherent(&dev->persist->pdev->dev,
> -					   PAGE_SIZE,
> -					   &t, gfp);
> -		if (!buf->page_list[i].buf)
> +		struct page *page;
> +
> +		page = alloc_page(GFP_KERNEL);
> +		if (!page)
>  			goto err_free;
>
> +		t = dma_map_page(&dev->persist->pdev->dev, page, 0,
> +				 PAGE_SIZE, DMA_BIDIRECTIONAL);
> +
> +		if (dma_mapping_error(&dev->persist->pdev->dev, t)) {
> +			__free_page(page);
> +			goto err_free;
> +		}
> +
> +		buf->page_list[i].buf = page_address(page);
> +		if (!buf->page_list[i].buf) {
> +			__free_page(page);
> +			goto err_free;
> +		}
> +
>  		buf->page_list[i].map = t;
>
>  		memset(buf->page_list[i].buf, 0, PAGE_SIZE);
> @@ -666,11 +679,15 @@ void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf)
>  			vunmap(buf->direct.buf);
>
>  		for (i = 0; i < buf->nbufs; ++i)
> -			if (buf->page_list[i].buf)
> -				dma_free_coherent(&dev->persist->pdev->dev,
> -						  PAGE_SIZE,
> -						  buf->page_list[i].buf,
> -						  buf->page_list[i].map);
> +			if (buf->page_list[i].buf) {
> +				struct page *page;
> +
> +				page = virt_to_page(buf->page_list[i].buf);
> +				dma_unmap_page(&dev->persist->pdev->dev,
> +					       buf->page_list[i].map,
> +					       PAGE_SIZE, DMA_BIDIRECTIONAL);
> +				__free_page(page);
> +			}
>  		kfree(buf->page_list);
>  	}
>  }
> --
> 1.8.2.1
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
On 2016-04-18 00:00, David Miller wrote:
> From: Sinan Kaya
> Date: Sat, 16 Apr 2016 18:23:32 -0400
>
>> Current code is assuming that the address returned by dma_alloc_coherent
>> is a logical address. This is not true on ARM/ARM64 systems. This patch
>> replaces dma_alloc_coherent with dma_map_page API. The address returned
>> can later be virtually mapped from the CPU side with vmap API.
>>
>> Signed-off-by: Sinan Kaya
>
> You can't do this.
>
> The DMA map page API gives non-coherent mappings, and thus requires
> proper flushing. So a straight conversion like this is never
> legitimate.

I would agree on proper dma api usage. However, the code is already
assuming a coherent architecture by mapping the CPU pages with PAGE_KERNEL.
Dma_map_page returns cached buffers, and you don't need cache flushes on a
coherent architecture to make the data visible.
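[Editorial note: the flushing David refers to is the dma_sync pair that
streaming mappings require around each CPU/device handoff on non-coherent
systems. A hedged sketch of where those calls would go, with `dev` and
`map` as placeholders for the device and the dma_map_page result:]

```c
/* CPU filled the buffer; make the data visible to the device */
dma_sync_single_for_device(dev, map, PAGE_SIZE, DMA_TO_DEVICE);

/* ... device DMA runs ... */

/* device wrote the buffer; make the data visible to the CPU */
dma_sync_single_for_cpu(dev, map, PAGE_SIZE, DMA_FROM_DEVICE);
```

On coherent systems these calls are no-ops in the DMA mapping layer, which
is why a driver that omits them can still happen to work there.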
Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Sinan Kaya
Date: Sat, 16 Apr 2016 18:23:32 -0400

> Current code is assuming that the address returned by dma_alloc_coherent
> is a logical address. This is not true on ARM/ARM64 systems. This patch
> replaces dma_alloc_coherent with dma_map_page API. The address returned
> can later be virtually mapped from the CPU side with vmap API.
>
> Signed-off-by: Sinan Kaya

You can't do this.

The DMA map page API gives non-coherent mappings, and thus requires
proper flushing. So a straight conversion like this is never legitimate.