Re: [RFC PATCH 2/2] IB/uverbs: Add support for user registration of mmap memory

2010-12-02 Thread Ralph Campbell
On Thu, 2010-12-02 at 12:24 -0800, Tom Tucker wrote:
> Personally I think the biggest issue is that I don't think the pfn to dma 
> address mapping logic is portable.

Perhaps. That is why the core Linux VM folks should be
involved.

> On 12/2/10 1:35 PM, Ralph Campbell wrote:
> > I understand the need for something like this patch since
> > the GPU folks would also like to mmap memory (although the
> > memory is marked as vma->vm_flags & VM_IO).
> >
> > It seems to me that duplicating the page walking code is
> > the wrong approach and exporting a new interface from
> > mm/memory.c is more appropriate.
> >
> Perhaps, but that's kernel proper (not a module) and has its own issues.
> For example, it represents an exported kernel interface and therefore a
> kernel compatibility commitment going forward. I suggest that a new kernel
> interface is a separate effort that this code could utilize going forward.

I agree. That was what I was implying.

> > Also, the quick check to find_vma() is essentially duplicated
> > if get_user_pages() is called
> 
> You need to know the type before you know how to handle it. Unless you
> want to tear up get_user_pages, I think this non-performance path double
> lookup is a non-issue.

You only need the type if the translation is handled as the
patch proposes. get_user_pages() could handle getting
the physical addresses differently as it checks each vma region.

> > and it doesn't handle the case
> > when the region spans multiple vma regions with different flags.
> 
> Actually, it specifically does not allow that and I'm not sure that is 
> something you would want to support.

True. It doesn't support it because without reference counting,
there would need to be a callback mechanism to let the caller know
if/when the mapping changes or is invalidated.

> > Maybe we can modify get_user_pages to have a new flag which
> > allows VM_PFNMAP segments to be accessed as IB memory regions.
> > The problem is that VM_PFNMAP means there is no corresponding
> > struct page to handle reference counting. What happens if the
> > device that exports the VM_PFNMAP memory is hot removed?
> 
> Bus Address Error.

> > Can the device reference count be incremented to prevent that?
> >
> 
> I don't think that would go in this code, it would go in the driver that 
> gave the user the address in the first place.

Perhaps. Once the IB core has the physical address, the user may
unmap the range, which would notify the exporting driver of the unmap, but
IB wouldn't know that had happened without some sort of callback
notification. The IB hardware could happily go on DMA'ing to that
physical address.
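To make the missing piece concrete, here is a minimal userspace model of the kind of invalidation callback described above. Everything in it is hypothetical: `struct mr_sketch`, `mr_invalidate_cb()` and `notify_unmap()` are invented for illustration and are not IB core or mm APIs. It only shows the shape of the idea: when a range is unmapped, every registration overlapping it is told to stop DMA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration record; not an IB core structure. */
struct mr_sketch {
    unsigned long start, len;   /* registered user virtual range */
    int dma_allowed;            /* cleared once the backing mapping is gone */
    void (*invalidate)(struct mr_sketch *mr);
};

/* Example callback: a real driver would also quiesce the HCA here. */
static void mr_invalidate_cb(struct mr_sketch *mr)
{
    mr->dma_allowed = 0;
}

/*
 * Hypothetical hook the mm side would call when [start, start + len)
 * is unmapped: invalidate every registration that overlaps the range.
 */
static void notify_unmap(struct mr_sketch **regs, size_t n,
                         unsigned long start, unsigned long len)
{
    assert(regs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct mr_sketch *mr = regs[i];
        int overlaps = start < mr->start + mr->len &&
                       mr->start < start + len;

        if (overlaps && mr->invalidate)
            mr->invalidate(mr);
    }
}
```

Without such a hook, nothing clears `dma_allowed`, which is exactly the "happily go on DMA'ing" case.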

> > On Thu, 2010-12-02 at 11:02 -0800, Tom Tucker wrote:
> >> Added support to the ib_umem_get helper function for handling
> >> mmaped memory.
> >>
> >> Signed-off-by: Tom Tucker
> >> ---
> >>

Re: [RFC PATCH 2/2] IB/uverbs: Add support for user registration of mmap memory

2010-12-02 Thread Ralph Campbell
I understand the need for something like this patch since
the GPU folks would also like to mmap memory (although the
memory is marked as vma->vm_flags & VM_IO).

It seems to me that duplicating the page walking code is
the wrong approach and exporting a new interface from
mm/memory.c is more appropriate.

Also, the quick check to find_vma() is essentially duplicated
if get_user_pages() is called and it doesn't handle the case
when the region spans multiple vma regions with different flags.
Maybe we can modify get_user_pages to have a new flag which
allows VM_PFNMAP segments to be accessed as IB memory regions.
The problem is that VM_PFNMAP means there is no corresponding
struct page to handle reference counting. What happens if the
device that exports the VM_PFNMAP memory is hot removed?
Can the device reference count be incremented to prevent that?
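One way to express the multi-vma concern is a check that walks the vmas covering a range and rejects both holes and mixed flags. The sketch below models vmas as a sorted array in userspace; `struct vma_sketch`, `range_has_uniform_flags()` and the `SK_VM_*` values are hypothetical stand-ins, not `vm_area_struct` or the kernel's `VM_IO`/`VM_PFNMAP` bits.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a vma: [start, end) plus its flags. */
struct vma_sketch {
    unsigned long start, end;
    unsigned long flags;
};

#define SK_VM_IO     0x1UL   /* hypothetical flag values */
#define SK_VM_PFNMAP 0x2UL

/*
 * Return 1 if [addr, addr + len) is fully covered by contiguous vmas that
 * all carry identical flags, 0 otherwise. vmas must be sorted by start.
 */
static int range_has_uniform_flags(const struct vma_sketch *vmas, size_t n,
                                   unsigned long addr, unsigned long len)
{
    unsigned long cur = addr, end = addr + len;
    const struct vma_sketch *first = NULL;

    assert(len > 0);
    for (size_t i = 0; i < n && cur < end; i++) {
        if (vmas[i].end <= cur)
            continue;               /* vma lies entirely below the cursor */
        if (vmas[i].start > cur)
            return 0;               /* hole in the range */
        if (first == NULL)
            first = &vmas[i];
        else if (vmas[i].flags != first->flags)
            return 0;               /* range spans vmas of different types */
        cur = vmas[i].end;
    }
    return first != NULL && cur >= end;
}
```

A range that crosses from a `VM_PFNMAP`-style vma into a `VM_IO`-style one is rejected rather than silently handled with one translation method.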


On Thu, 2010-12-02 at 11:02 -0800, Tom Tucker wrote:
> Added support to the ib_umem_get helper function for handling
> mmaped memory.
> 
> Signed-off-by: Tom Tucker 
> ---
> 
>  drivers/infiniband/core/umem.c |  272 +---
>  1 files changed, 253 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 415e186..357ca5e 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -52,30 +52,24 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>   int i;
>  
>   list_for_each_entry_safe(chunk, tmp, &umem->chunk_list, list) {
> - ib_dma_unmap_sg(dev, chunk->page_list,
> - chunk->nents, DMA_BIDIRECTIONAL);
> - for (i = 0; i < chunk->nents; ++i) {
> - struct page *page = sg_page(&chunk->page_list[i]);
> -
> - if (umem->writable && dirty)
> - set_page_dirty_lock(page);
> - put_page(page);
> - }
> + if (umem->type == IB_UMEM_MEM_MAP) {
> + ib_dma_unmap_sg(dev, chunk->page_list,
> + chunk->nents, DMA_BIDIRECTIONAL);
> + for (i = 0; i < chunk->nents; ++i) {
> + struct page *page = sg_page(&chunk->page_list[i]);
>  
> + if (umem->writable && dirty)
> + set_page_dirty_lock(page);
> + put_page(page);
> + }
> + }
>   kfree(chunk);
>   }
>  }
>  
> -/**
> - * ib_umem_get - Pin and DMA map userspace memory.
> - * @context: userspace context to pin memory for
> - * @addr: userspace virtual address to start at
> - * @size: length of region to pin
> - * @access: IB_ACCESS_xxx flags for memory being pinned
> - * @dmasync: flush in-flight DMA when the memory region is written
> - */
> -struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
> - size_t size, int access, int dmasync)
> +static struct ib_umem *__umem_get(struct ib_ucontext *context,
> +   unsigned long addr, size_t size,
> +   int access, int dmasync)
>  {
>   struct ib_umem *umem;
>   struct page **page_list;
> @@ -100,6 +94,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
>   if (!umem)
>   return ERR_PTR(-ENOMEM);
>  
> + umem->type  = IB_UMEM_MEM_MAP;
>   umem->context   = context;
>   umem->length    = size;
>   umem->offset    = addr & ~PAGE_MASK;
> @@ -215,6 +210,245 @@ out:
>  
>   return ret < 0 ? ERR_PTR(ret) : umem;
>  }
> +
> +/*
> + * Return the PFN for the specified address in the vma. This only
> + * works for a vma that is VM_PFNMAP.
> + */
> +static unsigned long __follow_io_pfn(struct vm_area_struct *vma,
> +  unsigned long address, int write)
> +{
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *ptep, pte;
> + spinlock_t *ptl;
> + unsigned long pfn;
> + struct mm_struct *mm = vma->vm_mm;
> +
> + pgd = pgd_offset(mm, address);
> + if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> + return 0;
> +
> + pud = pud_offset(pgd, address);
> + if (pud_none(*pud))
> + return 0;
> + if (unlikely(pud_bad(*pud)))
> + return 0;
> +
> + pmd = pmd_offset(pud, address);
> + if (pmd_none(*pmd))
> + return 0;
> + if (unlikely(pmd_bad(*pmd)))
> + return 0;
> +
> + ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> + pte = *ptep;
> + if (!pte_present(pte))
> + goto bad;
> + if (write && !pte_write(pte))
> + goto bad;
> +
> + pfn = pte_pfn(pte);
> + pte_unmap_unlock(ptep, ptl);
> + return pfn;
> + bad:
> + pte_unmap_unlock(ptep, ptl);
> + retu

RE: [PATCH 3/3] ibacm: check for special handling of loopback requests

2010-11-16 Thread Ralph Campbell
I guess what I'm objecting to is hard-coding mlx4.
I was trying to think of a way that would allow other HCAs
to support the block loopback option in the future.
It looks like ipoib sets IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK
for kernel QPs but this isn't defined in libibverbs yet.
It seems reasonable to add that feature some time in the future
and change ibacm to use it.

In the meantime, I guess I don't see an alternative to your patch.

On Tue, 2010-11-16 at 17:24 -0800, Hefty, Sean wrote:
> > Is there a way to make it HCA neutral?
> > Would it require extending the libibverbs API to set the option?
> 
> I'm not quite following what the problem is.  ACM doesn't care what HCA is 
> used.  It does adjust how it handles loopback addresses based on whether some 
> value is written in an HCA/OFED 1.5.2 release file, but it will work 
> regardless.  (This is worse than being HCA specific, we're HCA and OFED 
> release specific.)
> 
> In the worst case, ACM basically stops working correctly over mlx4 HCAs.  
> Loopback requests will end up going through all retries (default is 15) until 
> they time out (default ~45 seconds).  If the user is the librdmacm, it will 
> fall back to normal operation.
> 
> ACM has a configuration file that _could_ be used to specify a loopback 
> protocol.  However, that file is usually generated by the ib_acme utility, so 
> the check would move into it.
> 
> Since OFED 1.5.2 has shipped, I don't know how you fix it.  In a more ideal 
> world, this loopback issue would be limited only to ipoib QPs, or be 
> configurable per QP, or disabled by default, or have gone upstream first...
> 
> - Sean
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 




RE: [PATCH 3/3] ibacm: check for special handling of loopback requests

2010-11-16 Thread Ralph Campbell
On Tue, 2010-11-16 at 16:54 -0800, Hefty, Sean wrote:
> > This seems to introduce an HCA specific dependency.
> 
> yep :(  This is why ACM just handles it rather than exposing any sort of 
> option to a user.
> 
> > Isn't ibacm supposed to work with different HCAs?
> 
> It does and still will, even in a mixed environment.
> 
> One could argue that this change is reasonable regardless of the OFED kernel 
> patch.  It avoids sending multicast traffic when the destination is local.  
> The main drawback beyond the extra code is that a node can't send a multicast 
> message to itself, with the intent that remote listeners will be able to 
> cache the address data.
> 
> - Sean

Is there a way to make it HCA neutral?
Would it require extending the libibverbs API to set the option?




Re: [PATCH 3/3] ibacm: check for special handling of loopback requests

2010-11-16 Thread Ralph Campbell
On Tue, 2010-11-16 at 16:15 -0800, Hefty, Sean wrote:
...
> @@ -2620,6 +2663,12 @@ static void acm_set_options(void)
>   }
>  
>   fclose(f);
> +
> + if (!(f = fopen("/sys/module/mlx4_core/parameters/block_loopback", "r")))
> + return;
> +
> + fscanf(f, "%d", &loopback_prot);
> + fclose(f);
>  }
>  
>  static void acm_log_options(void)

This seems to introduce an HCA specific dependency.
Isn't ibacm supposed to work with different HCAs?
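A more HCA-neutral shape for the same lookup would read the value from a caller-supplied path and check the `fscanf()` result, which the quoted hunk ignores. This is only a sketch: `read_int_param()` is a hypothetical helper, and where the path would come from (for example the ACM configuration file) is an assumption, not something ibacm does today.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer from a text file such as a module parameter under
 * /sys/module/<module>/parameters/. Returns 0 and stores the value on
 * success; returns -1 and leaves *val untouched if the file is missing
 * or does not start with an integer.
 */
static int read_int_param(const char *path, int *val)
{
    FILE *f;
    int tmp, ok;

    assert(path != NULL && val != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%d", &tmp) == 1);
    fclose(f);
    if (!ok)
        return -1;
    *val = tmp;
    return 0;
}
```

Because a missing file leaves the caller's default in place, the same code path works on HCAs whose drivers have no such parameter.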



Re: [PATCH 2/2] IB/qib: Allow driver to load if PCIe advanced error reporting fails

2010-10-25 Thread Ralph Campbell
On Sat, 2010-10-23 at 13:54 -0700, Roland Dreier wrote:
> I'm lost on which initialization / cleanup fixes are the right ones to
> take for qib.  Can someone point me to the definitive set of patches?
> 
>  - R.

If Jason agrees, they are the ones attached.
There are probably more error paths that need fixing
but I think separate patches make sense for those.
--- Begin Message ---
If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the
pointer "dd" was uninitialized.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_init.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index f1d16d3..f3b5039 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1243,6 +1243,7 @@ static int __devinit qib_init_one(struct pci_dev *pdev,
qib_early_err(&pdev->dev, "QLogic PCIE device 0x%x cannot "
  "work if CONFIG_PCI_MSI is not enabled\n",
  ent->device);
+   dd = ERR_PTR(-ENODEV);
 #endif
break;
 


--- End Message ---
--- Begin Message ---
Some PCIe root complex chip sets don't support advanced error reporting.
Allow the driver to load OK if pci_enable_pcie_error_reporting() fails.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_pcie.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
index 7fa6e55..8a64426 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -109,10 +109,12 @@ int qib_pcie_init(struct pci_dev *pdev, const struct pci_device_id *ent)
 
pci_set_master(pdev);
ret = pci_enable_pcie_error_reporting(pdev);
-   if (ret)
+   if (ret) {
qib_early_err(&pdev->dev,
  "Unable to enable pcie error reporting: %d\n",
  ret);
+   ret = 0;
+   }
goto done;
 
 bail:


--- End Message ---
--- Begin Message ---
From: Jason Gunthorpe 

Clean up properly if pci_set_consistent_dma_mask() fails.

Signed-off-by: Jason Gunthorpe 
---

 drivers/infiniband/hw/qib/qib_pcie.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
index 8a64426..48b6674 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -103,9 +103,11 @@ int qib_pcie_init(struct pci_dev *pdev, const struct pci_device_id *ent)
ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
} else
ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
-   if (ret)
+   if (ret) {
qib_early_err(&pdev->dev,
  "Unable to set DMA consistent mask: %d\n", ret);
+   goto bail;
+   }
 
pci_set_master(pdev);
ret = pci_enable_pcie_error_reporting(pdev);


--- End Message ---
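The two `qib_pcie_init()` fixes above follow one pattern: a mandatory setup step jumps to cleanup on failure, while an optional one is logged and its error cleared. A userspace sketch of that control flow, with hypothetical step functions standing in for `pci_set_consistent_dma_mask()` and `pci_enable_pcie_error_reporting()`; `pcie_init_sketch()` is not driver code.

```c
#include <assert.h>
#include <stdio.h>

/* Each hypothetical step returns 0 or a negative errno-style value. */
typedef int (*setup_step_fn)(void);

static int pcie_init_sketch(setup_step_fn set_dma_mask,
                            setup_step_fn enable_aer,
                            int *cleaned_up)
{
    int ret;

    assert(cleaned_up != NULL);
    *cleaned_up = 0;

    ret = set_dma_mask();
    if (ret) {
        fprintf(stderr, "Unable to set DMA consistent mask: %d\n", ret);
        goto bail;              /* mandatory step: undo and fail */
    }

    ret = enable_aer();
    if (ret) {
        fprintf(stderr, "Unable to enable pcie error reporting: %d\n", ret);
        ret = 0;                /* optional step: warn and carry on */
    }
    return ret;

bail:
    *cleaned_up = 1;            /* stands in for releasing PCI resources */
    return ret;
}

static int step_ok(void)  { return 0; }
static int step_eio(void) { return -5; }
```

With this split, a chipset without AER still loads the driver, while a DMA mask failure releases what was acquired instead of leaking it.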


Re: [PATCH] [libipathverbs] Fix 32 bit libipathverbs

2010-10-25 Thread Ralph Campbell
Thanks, applied.

On Fri, 2010-10-22 at 15:43 -0700, Jason Gunthorpe wrote:
> Trying to use a 64 bit kernel with a 32 bit userspace results in
> local protection violations reported to the CQ. This is caused by
> a difference in padding for ipath_rwe, so make the padding 64 bit
> uses explicit.
> 
> Signed-off-by: Jason Gunthorpe 
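To illustrate the kind of mismatch described here: with GCC's default i386 ABI a 64-bit integer inside a struct is aligned to 4 bytes, while on x86_64 it is aligned to 8, so a 32-bit library and a 64-bit kernel can disagree about field offsets. The struct names and fields below are hypothetical and are not the real `ipath_rwe` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Implicit padding: wr_id is at offset 4 on i386 but 8 on x86_64. */
struct wqe_implicit {
    uint32_t num_sge;
    uint64_t wr_id;
};

/* Explicit padding: the same layout on both ABIs. */
struct wqe_explicit {
    uint32_t num_sge;
    uint32_t reserved;   /* explicit pad so wr_id lands at offset 8 */
    uint64_t wr_id;
};

/* Compile-time guards that catch an accidental layout change. */
_Static_assert(offsetof(struct wqe_explicit, wr_id) == 8,
               "wr_id must be at offset 8");
_Static_assert(sizeof(struct wqe_explicit) == 16,
               "struct must be 16 bytes");
```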




[PATCH] IB/qib: fix RDMA write with immediate

2010-10-22 Thread Ralph Campbell
The immediate word for RDMA_WRITE_ONLY_WITH_IMMEDIATE was being
extracted from the wrong location in the header.
Thanks to Jason Gunthorpe for
finding this.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_rc.c |5 -
 drivers/infiniband/hw/qib/qib_uc.c |6 --
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c
index a093111..955fb71 100644
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -2068,7 +2068,10 @@ send_last:
goto nack_op_err;
if (!ret)
goto rnr_nak;
-   goto send_last_imm;
+   wc.ex.imm_data = ohdr->u.rc.imm_data;
+   hdrsize += 4;
+   wc.wc_flags = IB_WC_WITH_IMM;
+   goto send_last;
 
case OP(RDMA_READ_REQUEST): {
struct qib_ack_entry *e;
diff --git a/drivers/infiniband/hw/qib/qib_uc.c b/drivers/infiniband/hw/qib/qib_uc.c
index b9c8b63..aecd512 100644
--- a/drivers/infiniband/hw/qib/qib_uc.c
+++ b/drivers/infiniband/hw/qib/qib_uc.c
@@ -457,8 +457,10 @@ rdma_first:
}
if (opcode == OP(RDMA_WRITE_ONLY))
goto rdma_last;
-   else if (opcode == OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE))
+   if (opcode == OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE)) {
+   wc.ex.imm_data = ohdr->u.rc.imm_data;
goto rdma_last_imm;
+   }
/* FALLTHROUGH */
case OP(RDMA_WRITE_MIDDLE):
/* Check for invalid length PMTU or posted rwqe len. */
@@ -471,8 +473,8 @@ rdma_first:
break;
 
case OP(RDMA_WRITE_LAST_WITH_IMMEDIATE):
-rdma_last_imm:
wc.ex.imm_data = ohdr->u.imm_data;
+rdma_last_imm:
hdrsize += 4;
wc.wc_flags = IB_WC_WITH_IMM;
 


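The fix hinges on where the immediate data sits in each RC/UC header: for RDMA WRITE Only with Immediate a RETH comes between the BTH and ImmDt, while for the SEND and RDMA WRITE Last variants ImmDt follows the BTH directly. A sketch that computes the ImmDt offset from the start of the BTH, assuming the standard 12-byte BTH and 16-byte RETH; the `SK_*` opcode tags and both functions are hypothetical, not the kernel's `IB_OPCODE_*` values or qib code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_BTH_BYTES  12
#define SK_RETH_BYTES 16   /* virtual address (8) + R_Key (4) + DMA length (4) */

/* Hypothetical opcode tags for this sketch only. */
enum sk_opcode {
    SK_SEND_LAST_WITH_IMM,
    SK_SEND_ONLY_WITH_IMM,
    SK_RDMA_WRITE_LAST_WITH_IMM,
    SK_RDMA_WRITE_ONLY_WITH_IMM,
};

/* Offset of the 4-byte ImmDt field from the start of the BTH. */
static size_t imm_offset(enum sk_opcode op)
{
    switch (op) {
    case SK_RDMA_WRITE_ONLY_WITH_IMM:
        return SK_BTH_BYTES + SK_RETH_BYTES;  /* RETH precedes ImmDt */
    default:
        return SK_BTH_BYTES;                  /* ImmDt follows the BTH */
    }
}

/* Read the big-endian immediate value out of a raw header buffer. */
static uint32_t read_imm(const uint8_t *hdr, enum sk_opcode op)
{
    const uint8_t *p;

    assert(hdr != NULL);
    p = hdr + imm_offset(op);
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}
```

Reading ImmDt at the SEND offset for an RDMA WRITE Only packet picks up the first word of the RETH (part of the virtual address) instead, which matches the symptom the patch fixes.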

[PATCH] IB/qib: clean up properly if pci_set_consistent_dma_mask() fails

2010-10-22 Thread Ralph Campbell
From: Jason Gunthorpe 

Clean up properly if pci_set_consistent_dma_mask() fails.

Signed-off-by: Jason Gunthorpe 
---

 drivers/infiniband/hw/qib/qib_pcie.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
index 8a64426..48b6674 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -103,9 +103,11 @@ int qib_pcie_init(struct pci_dev *pdev, const struct pci_device_id *ent)
ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
} else
ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
-   if (ret)
+   if (ret) {
qib_early_err(&pdev->dev,
  "Unable to set DMA consistent mask: %d\n", ret);
+   goto bail;
+   }
 
pci_set_master(pdev);
ret = pci_enable_pcie_error_reporting(pdev);



[PATCH 2/2] IB/qib: Allow driver to load if PCIe advanced error reporting fails

2010-10-22 Thread Ralph Campbell
Some PCIe root complex chip sets don't support advanced error reporting.
Allow the driver to load OK if pci_enable_pcie_error_reporting() fails.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_pcie.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
index 7fa6e55..8a64426 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -109,10 +109,12 @@ int qib_pcie_init(struct pci_dev *pdev, const struct pci_device_id *ent)
 
pci_set_master(pdev);
ret = pci_enable_pcie_error_reporting(pdev);
-   if (ret)
+   if (ret) {
qib_early_err(&pdev->dev,
  "Unable to enable pcie error reporting: %d\n",
  ret);
+   ret = 0;
+   }
goto done;
 
 bail:

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] IB/qib: fix uninitialized pointer if CONFIG_PCI_MSI not set

2010-10-22 Thread Ralph Campbell
If CONFIG_PCI_MSI is not set, and a QLE7140 is present, the
pointer "dd" was uninitialized.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_init.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index f1d16d3..f3b5039 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1243,6 +1243,7 @@ static int __devinit qib_init_one(struct pci_dev *pdev,
qib_early_err(&pdev->dev, "QLogic PCIE device 0x%x cannot "
  "work if CONFIG_PCI_MSI is not enabled\n",
  ent->device);
+   dd = ERR_PTR(-ENODEV);
 #endif
break;
 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] [IB/QIB] Process RDMA WRITE ONLY with IMMEDIATE properly

2010-10-22 Thread Ralph Campbell
I don't think it needs the extra blank lines but I agree the code is correct.

From: Jason Gunthorpe [jguntho...@obsidianresearch.com]
Sent: Friday, October 22, 2010 3:00 PM
To: Ralph Campbell; RDMA list
Subject: [PATCH] [IB/QIB] Process RDMA WRITE ONLY with IMMEDIATE properly

See Table 35 in the IBA specification: the header order for
RDMA_WRITE_ONLY_WITH_IMMEDIATE and SEND_LAST_WITH_IMMEDIATE is different.
RDMA_WRITE_ONLY_WITH_IMMEDIATE has a RETH header before the immediate data,
so we need a different code path to extract the immediate data.

I tested this with a userspace app that does RDMA_WRITE with immediate
on a QLE7140.

Signed-off-by: Jason Gunthorpe 
---
 drivers/infiniband/hw/qib/qib_rc.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c
index 40c0a37..8d87ce3 100644
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -2076,7 +2076,12 @@ send_last:
goto nack_op_err;
if (!ret)
goto rnr_nak;
-   goto send_last_imm;
+
+   wc.ex.imm_data = ohdr->u.rc.imm_data;
+   hdrsize += 4;
+   wc.wc_flags = IB_WC_WITH_IMM;
+
+   goto send_last;

case OP(RDMA_READ_REQUEST): {
struct qib_ack_entry *e;
--
1.6.0.4




RE: [PATCH] [IB/QIB] Fix failure to load driver if PCI error reporting doesn't enable

2010-10-22 Thread Ralph Campbell
pci_enable_pcie_error_reporting() is optional.

From: linux-rdma-ow...@vger.kernel.org [linux-rdma-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe [jguntho...@obsidianresearch.com]
Sent: Friday, October 22, 2010 2:36 PM
To: Ralph Campbell
Cc: RDMA list
Subject: Re: [PATCH] [IB/QIB] Fix failure to load driver if PCI error reporting doesn't enable

On Fri, Oct 22, 2010 at 02:28:54PM -0700, Ralph Campbell wrote:

> I'm not sure this is the right change. I agree there are a lot of
> bugs in the clean-up-on-error paths. pci_set_consistent_dma_mask() isn't
> optional because the chip doesn't support 32-bit PCIe addressing.

Do you think pci_enable_pcie_error_reporting is optional? My test
machine doesn't have a chipset with AER so that will always fail (or
something), the driver hasn't exploded yet ... :)

> I'll look it over and try to come up with something better.
>
> I'll submit a patch for the CONFIG_PCI_MSI not set case.

The resource leaks suck too, since you have to reboot to try again.

Jason


RE: [PATCH] [IB/QIB] Fix QIB early error reporting prints

2010-10-22 Thread Ralph Campbell
Acked-by: Ralph Campbell 

From: Jason Gunthorpe [jguntho...@obsidianresearch.com]
Sent: Friday, October 22, 2010 1:41 PM
To: Ralph Campbell; RDMA list
Subject: [PATCH] [IB/QIB] Fix QIB early error reporting prints

Noticed this odd looking thing in dmesg:

ib_qib :02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5

Which is due to a bad use of dev_info.

Signed-off-by: Jason Gunthorpe 
---
 drivers/infiniband/hw/qib/qib.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 3593983..0268326 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -1402,7 +1402,7 @@ extern struct mutex qib_mutex;
  */
 #define qib_early_err(dev, fmt, ...) \
do { \
-   dev_info(dev, KERN_ERR QIB_DRV_NAME ": " fmt, ##__VA_ARGS__); \
+   dev_err(dev, fmt, ##__VA_ARGS__); \
} while (0)

 #define qib_dev_err(dd, fmt, ...) \
--
1.6.0.4
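The `<3>` in the dmesg line above is the literal `KERN_ERR` marker that ended up inside the format string of an info-level call. A toy model of that doubling: `toy_log()` and both macros are hypothetical stand-ins for `dev_info()`/`dev_err()`, and `##__VA_ARGS__` is the same GNU C extension the real macro uses.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Toy logger: writes its own "<level>" marker, then the caller's format. */
static int toy_log(char *out, size_t n, int level, const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(out != NULL && n > 0);
    len = snprintf(out, n, "<%d>", level);
    if (len < 0 || (size_t)len >= n)
        return len;
    va_start(ap, fmt);
    len += vsnprintf(out + len, n - (size_t)len, fmt, ap);
    va_end(ap);
    return len;
}

#define TOY_INFO     6
#define TOY_ERR      3
#define TOY_KERN_ERR "<3>"   /* mimics the KERN_ERR string prefix */

/* Old shape: info-level call with an error marker pasted into fmt. */
#define early_err_old(out, n, fmt, ...) \
    toy_log(out, n, TOY_INFO, TOY_KERN_ERR "ib_qib: " fmt, ##__VA_ARGS__)

/* Fixed shape: the logging call itself carries the error level. */
#define early_err_new(out, n, fmt, ...) \
    toy_log(out, n, TOY_ERR, fmt, ##__VA_ARGS__)
```

The old shape yields `<6><3>ib_qib: Unable to enable pcie error reporting: -5`, with the inner marker printed as text; the fixed shape yields `<3>Unable to enable pcie error reporting: -5`.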




RE: [PATCH] [IB/QIB] Fix failure to load driver if PCI error reporting doesn't enable

2010-10-22 Thread Ralph Campbell
I'm not sure this is the right change. I agree there are a lot of bugs in the
clean-up-on-error paths. pci_set_consistent_dma_mask() isn't optional because
the chip doesn't support 32-bit PCIe addressing.
I'll look it over and try to come up with something better.

I'll submit a patch for the CONFIG_PCI_MSI not set case.
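As background on why the mask call cannot simply be skipped: the DMA mask tells the DMA layer which bus addresses the device can reach, so a buffer outside the mask cannot be handed to the device as-is. The macro below reproduces the kernel's `DMA_BIT_MASK()` definition; `dma_addr_ok()` is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Returns 1 if the buffer [addr, addr + len) lies entirely under mask. */
static int dma_addr_ok(uint64_t mask, uint64_t addr, uint64_t len)
{
    uint64_t last;

    assert(len > 0);
    last = addr + len - 1;
    return last >= addr /* no wraparound */ && last <= mask;
}
```

For example, a page just below 4 GiB passes a 32-bit mask, while one starting at 4 GiB only passes a 64-bit mask.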

From: linux-rdma-ow...@vger.kernel.org [linux-rdma-ow...@vger.kernel.org] On Behalf Of Jason Gunthorpe [jguntho...@obsidianresearch.com]
Sent: Friday, October 22, 2010 1:42 PM
To: Ralph Campbell; RDMA list
Subject: [PATCH] [IB/QIB] Fix failure to load driver if PCI error reporting doesn't enable

This seems to be the intention of the code, since the jump to bail
is missing. PCI-E advanced error reporting seems optional, but
I wonder if pci_set_consistent_dma_mask is also optional?

This also fixes one case where the PCI region is leaked during
device startup. qib_init_one assumes that qib_pcie_init cleans up if
it fails.

Note: There appear to be several other leaks of the PCI region in
qib_init_one between qib_pcie_init and qib_init that I did not attempt
to fix, and a NULL pointer dereference if CONFIG_PCI_MSI is not set
for 6120.

Signed-off-by: Jason Gunthorpe 
---
 drivers/infiniband/hw/qib/qib_pcie.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
index 7fa6e55..16ce9e7 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -113,6 +113,7 @@ int qib_pcie_init(struct pci_dev *pdev, const struct pci_device_id *ent)
qib_early_err(&pdev->dev,
  "Unable to enable pcie error reporting: %d\n",
  ret);
+   ret = 0;
goto done;

 bail:
--
1.6.0.4




RE: BUG dump from QIB

2010-10-22 Thread Ralph Campbell
The idea was to have QPs allocated to queues with specific CPUs
processing the queue "local" to the process that created it.
Since IRQs get distributed to all CPUs by default (at least with irqbalance),
and the CPU that created the QP may not be the CPU doing the posts,
it didn't help. I'll submit a patch to remove the "feature" :-).

From: Jason Gunthorpe [jguntho...@obsidianresearch.com]
Sent: Friday, October 22, 2010 1:40 PM
To: Ralph Campbell; RDMA list
Subject: BUG dump from QIB

Hi Ralph,

It doesn't look like smp_processor_id() can be called the way QIB calls it;
I get kernel BUGs:

This is a stock upstream with CONFIG_PREEMPT/DEBUG_PREEMPT
$ uname -a
Linux ib6 2.6.35.7 #17 SMP PREEMPT Thu Oct 7 15:38:53 MDT 2010 x86_64 GNU/Linux

[   36.769466] BUG: using smp_processor_id() in preemptible [] code: insmod/5858
[   36.769500] caller is qib_create_qp+0x56c/0x745 [ib_qib]
[   36.769505] Pid: 5858, comm: insmod Not tainted 2.6.35.7 #17
[   36.769508] Call Trace:
[   36.769520]  [] debug_smp_processor_id+0xc2/0xdc
[   36.769538]  [] qib_create_qp+0x56c/0x745 [ib_qib]
[   36.769545]  [] ? sub_preempt_count+0x92/0xa5
[   36.769550]  [] ? get_parent_ip+0x11/0x41
[   36.769554]  [] ? get_parent_ip+0x11/0x41
[   36.769573]  [] ib_create_qp+0x18/0x8e [ib_core]
[   36.769582]  [] create_mad_qp+0x7d/0xce [ib_mad]
[   36.769589]  [] ? qp_event_handler+0x0/0x1e [ib_mad]
[   36.769596]  [] ib_mad_init_device+0x2cf/0x6ad [ib_mad]
[   36.769603]  [] ? _raw_spin_unlock_irqrestore+0x2c/0x37
[   36.769613]  [] ib_register_device+0x35d/0x401 [ib_core]
[   36.769632]  [] ? qib_create_port_files+0x0/0x265 [ib_qib]
[   36.769654]  [] qib_register_ib_device+0x76a/0x8a8 [ib_qib]
[   36.769673]  [] qib_init_one+0xdd/0x4f7 [ib_qib]
[   36.769680]  [] local_pci_probe+0x12/0x16
[   36.769685]  [] pci_device_probe+0x5f/0x89
[   36.769690]  [] ? driver_sysfs_add+0x4c/0x72
[   36.769695]  [] driver_probe_device+0xa3/0x151
[   36.769699]  [] __driver_attach+0x58/0x7b
[   36.769704]  [] ? __driver_attach+0x0/0x7b
[   36.769708]  [] bus_for_each_dev+0x4e/0x85
[   36.769712]  [] driver_attach+0x1c/0x1e
[   36.769716]  [] bus_add_driver+0xb8/0x20d
[   36.769721]  [] driver_register+0xb3/0x121
[   36.769736]  [] ? qlogic_ib_init+0x0/0x126 [ib_qib]
[   36.769740]  [] __pci_register_driver+0x51/0xbc
[   36.769754]  [] ? qlogic_ib_init+0x0/0x126 [ib_qib]
[   36.769769]  [] qlogic_ib_init+0xc3/0x126 [ib_qib]
[   36.769775]  [] do_one_initcall+0x5a/0x14a
[   36.769781]  [] sys_init_module+0x9a/0x1d8

Jason



Re: [PATCH v2.6.36-rc7] infiniband: update workqueue usage

2010-10-19 Thread Ralph Campbell
On Tue, 2010-10-19 at 08:24 -0700, Tejun Heo wrote:

> * qib_cq_wq is a separate singlethread workqueue.  Does the queue
>   require strict single thread execution ordering?  IOW, does each
>   work have to be executed in the exact queued order and no two works
>   should execute in parallel?  Or was the singlethreadedness chosen
>   just to reduce the number of workers?

The work functions need to be called in-order and single threaded
or memory will be freed multiple times and other "bad things".
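The requirement stated here is what a single-threaded (ordered) workqueue provides: one worker, with items run one at a time in submission order. A userspace sketch of those semantics only; `struct sk_queue`, `sk_queue_work()`, `sk_run_all()` and `sk_record()` are hypothetical and do not model the kernel workqueue API.

```c
#include <assert.h>
#include <stddef.h>

#define SK_QLEN 8

/* One queued work item: a function and its argument. */
struct sk_work {
    void (*fn)(void *arg);
    void *arg;
};

/* Bounded FIFO drained by exactly one worker. */
struct sk_queue {
    struct sk_work items[SK_QLEN];
    size_t head, tail;   /* head: next to run, tail: next free slot */
};

static int sk_queue_work(struct sk_queue *q, void (*fn)(void *), void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->tail - q->head == SK_QLEN)
        return -1;                          /* queue full */
    q->items[q->tail % SK_QLEN] = (struct sk_work){ fn, arg };
    q->tail++;
    return 0;
}

/* Single worker: never runs two items concurrently or out of order. */
static void sk_run_all(struct sk_queue *q)
{
    while (q->head != q->tail) {
        struct sk_work w = q->items[q->head % SK_QLEN];

        q->head++;
        w.fn(w.arg);
    }
}

/* Example work function: appends its id to a shared log. */
static int sk_log[SK_QLEN];
static size_t sk_log_len;

static void sk_record(void *arg)
{
    assert(sk_log_len < SK_QLEN);
    sk_log[sk_log_len++] = *(int *)arg;
}
```

With more than one worker, a later item (say, the final free of a CQ resource) could run before or alongside an earlier one, which is the double-free hazard described above.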



RE: [PATCHv10 03/12] ib_core: IBoE UD packet packing support

2010-10-14 Thread Ralph Campbell


From: linux-rdma-ow...@vger.kernel.org [linux-rdma-ow...@vger.kernel.org] On Behalf Of Eli Cohen [...@dev.mellanox.co.il]
Sent: Thursday, October 14, 2010 2:57 PM
To: Roland Dreier
Cc: RDMA list
Subject: Re: [PATCHv10 03/12] ib_core: IBoE UD packet packing support

On Thu, Oct 14, 2010 at 02:43:30PM -0700, Roland Dreier wrote:
>
> void ib_ud_header_init(int                  payload_bytes,
>                        int                  lrh_present,
>                        int                  eth_present,
>                        int                  grh_present,
>                        int                  immediate_present,
>                        struct ib_ud_header *header)
> {
> 	memset(header, 0, sizeof *header);
>
> 	if (lrh_present) {
> 		u16 packet_length;
>
> 		header->lrh.link_version     = 0;
> 		header->lrh.link_next_header =
> 			grh_present ? IB_LNH_IBA_GLOBAL : IB_LNH_IBA_LOCAL;
> 		packet_length = (IB_LRH_BYTES  +
> 				 IB_BTH_BYTES  +
> 				 IB_DETH_BYTES +
> 				 (grh_present ? IB_GRH_BYTES : 0) +
> 				 payload_bytes +
> 				 4             + /* ICRC     */
> 				 3) / 4;         /* round up */
> 		header->lrh.packet_length = cpu_to_be16(packet_length);
> 	}
>
> 	if (grh_present) {
> 		header->grh.ip_version     = 6;
> 		header->grh.payload_length =
> 			cpu_to_be16((IB_BTH_BYTES  +
> 				     IB_DETH_BYTES +
> 				     payload_bytes +
> 				     4             + /* ICRC     */
> 				     3) & ~3);       /* round up */
> 		header->grh.next_header    = 0x1b;
> 	}
>
>   if (header->immediate_present)

This should be "if (immediate_present)".
header->immediate_present is zero due to the memset() above.

> 		header->bth.opcode = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE;
> 	else
> 		header->bth.opcode = IB_OPCODE_UD_SEND_ONLY;
> 	header->bth.pad_count                = (4 - payload_bytes) & 3;
> 	header->bth.transport_header_version = 0;
>
> 	header->lrh_present       = lrh_present;
> 	header->eth_present       = eth_present;
> 	header->grh_present       = grh_present;
> 	header->immediate_present = immediate_present;
> }
>
> which I think is reasonably clean for now.

Looks good.



Re: Work completions generated after a queue pair has made the transition to an error state

2010-10-12 Thread Ralph Campbell
I haven't seen it. It isn't supposed to happen.

What hardware and software are you using and how do you
reproduce it?


On Tue, 2010-10-12 at 11:38 -0700, Bart Van Assche wrote:
> Hello,
> 
> Has anyone already tried to process the work completions generated by
> a HCA after the state of a queue pair has been changed to IB_QPS_ERR ?
> With the hardware/firmware/driver combination I have tested I have
> observed the following:
> * Multiple completions with the same wr_id and nonzero (error) status
> were received by the application, while all work requests queued with
> the flag IB_SEND_SIGNALED had a unique wr_id.
> * Completions with non-zero (error) status and a wr_id / opcode
> combination were received that were never queued by the application.
> Note: some work requests were queued with and some without the flag
> IB_SEND_SIGNALED. I'm not sure however whether that has anything to do
> with the observed behavior.
> 
> This behavior is easy to reproduce. If I interpret the InfiniBand
> Architecture Specification correctly, this behavior is non-compliant.
> 
> Has anyone been looking into this before ?
> 
> Bart.
> 




Re: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-09-30 Thread Ralph Campbell
I was looking at the Rx connection tear down and found a bug.
I don't know if it would cause this panic but you might try it.
I haven't stress tested it but it compiles and basic network
connections work.

I also don't like the call to cancel_delayed_work(&priv->cm.stale_task)
at the end of ipoib_cm_dev_stop(). I think it should be called after
ib_destroy_cm_id() and priv->cm.id = NULL.

On Thu, 2010-09-02 at 20:41 -0700, Pradeep Satyanarayana wrote:
> Ralph,
> 
> I see the following crash sporadically (only under stress) with Sles11SP1
> (which is a 2.6.32 kernel). I saw this crash with V4 of your patch and have
> not yet had a chance to try V5. Have you seen this in your testing? If this
> is not the crash your patch fixes, can you please share what it does fix?
> 
> <4>ib0: RX drain timing out
> <4>idr_remove called for id=11491974 which is not allocated.
> <4>Call Trace:
> <4>[c00749fe33b0] [c00129e4] .show_stack+0x6c/0x198 (unreliable)
> <4>[c00749fe3460] [c02ea594] .sub_remove+0x1ec/0x1f8
> <4>[c00749fe3520] [c02ea5e0] .idr_remove+0x40/0xf8
> <4>[c00749fe35b0] [d00012d84d70] .cm_destroy_id+0xa0/0x520 [ib_cm]
> <4>[c00749fe3680] [d0001b7fb644] .ipoib_cm_free_rx_reap_list+0xd4/0x190 [ib_ipoib]
> <4>[c00749fe3740] [d0001b7fe404] .ipoib_cm_dev_stop+0x23c/0x360 [ib_ipoib]
> <4>[c00749fe3800] [d0001b7f4dbc] .ipoib_ib_dev_stop+0xe4/0x4b0 [ib_ipoib]
> <4>[c00749fe3960] [d0001b7f0f30] .ipoib_stop+0x88/0x178 [ib_ipoib]
> <4>[c00749fe39f0] [c04eacf4] .dev_close+0xdc/0x148
> <4>[c00749fe3a80] [c04ea2b8] .dev_change_flags+0x1f0/0x288
> <4>[c00749fe3b20] [d0001b7f11b8] .ipoib_remove_one+0xb8/0x140 [ib_ipoib]
> <4>[c00749fe3bc0] [d0001210425c] .ib_unregister_client+0xb4/0x1b8 [ib_core]
> <4>[c00749fe3c90] [d0001b7ffde8] .ipoib_cleanup_module+0x20/0x60 [ib_ipoib]
> <4>[c00749fe3d20] [c00ec408] .SyS_delete_module+0x238/0x320
> <4>[c00749fe3e30] [c00085b4] syscall_exit+0x0/0x40
> <1>Unable to handle kernel paging request for data at address 0x4527228d1ffb
> <1>Faulting instruction address: 0xc05a8e88
> 12:mon> e
> cpu 0x12: Vector: 300 (Data Access) at [c00749fe3250]
> pc: c05a8e88: .wait_for_common+0xb8/0x268
> lr: c05a8e20: .wait_for_common+0x50/0x268
> sp: c00749fe34d0
>    msr: 80009032
>dar: 4527228d1ffb
>  dsisr: 4200
>   current = 0xc0074b4ce0e0
>   paca= 0xc0f64a00
> pid   = 13605, comm = modprobe
> 12:mon>
> 
> Thanks
> Pradeep

IB/ipoib: fix race when handling IPOIB_CM_RX_DRAIN_WRID

From: Ralph Campbell 

ipoib_cm_start_rx_drain() calls ib_post_send() and *then* moves the
struct ipoib_cm_rx onto the rx_drain_list. The ib_post_send() triggers
a completion callback to ipoib_cm_handle_rx_wc(), which moves the
entries on rx_drain_list to rx_reap_list. If the callback runs before
ipoib_cm_start_rx_drain() has moved the structure, the structure is
left in limbo. The fix is to change ipoib_cm_start_rx_drain() to put
the struct on the rx_drain_list first and then call ib_post_send().
Also, only move one struct from rx_flush_list to rx_drain_list, since
concurrent IPOIB_CM_RX_DRAIN_WRID events on different QPs could put
multiple ipoib_cm_rx structs on rx_flush_list.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)


diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index bb10041..dfff159 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -216,15 +216,21 @@ static void ipoib_cm_start_rx_drain(struct ipoib_dev_priv *priv)
 	!list_empty(&priv->cm.rx_drain_list))
 		return;
 
+	p = list_entry(priv->cm.rx_flush_list.next, typeof(*p), list);
+
+	/*
+	 * Put p on rx_drain_list before calling ib_post_send() or there
+	 * is a race with the ipoib_cm_handle_rx_wc() completion handler
+	 * trying to remove it from rx_drain_list.
+	 */
+	list_move(&p->list, &priv->cm.rx_drain_list);
+
 	/*
 	 * QPs on flush list are error state.  This way, a "flush
 	 * error" WC will be immediately generated for each WR we post.
 	 */
-	p = list_entry(priv->cm.rx_flush_list.next, typeof(*p), list);
 	if (ib_post_send(p->qp, &ipoib_cm_rx_drain_wr, &bad_wr))
 		ipoib_warn(priv, "failed to post drain wr\n");
-
-	list_splice_init(&priv->cm.rx_flush_list, &priv->cm.rx_drain_list);
 }
 
 static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx)


Re: [PATCH 02/13] drivers/infiniband: Remove unnecessary casts of private_data

2010-09-07 Thread Ralph Campbell
Acked-by: Ralph Campbell 

On Sat, 2010-09-04 at 18:52 -0700, Joe Perches wrote:
> Signed-off-by: Joe Perches 
> ---
>  drivers/infiniband/hw/qib/qib_file_ops.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
> index 6b11645..cef5d67 100644
> --- a/drivers/infiniband/hw/qib/qib_file_ops.c
> +++ b/drivers/infiniband/hw/qib/qib_file_ops.c
> @@ -1722,7 +1722,7 @@ static int qib_close(struct inode *in, struct file *fp)
>  
>   mutex_lock(&qib_mutex);
>  
> - fd = (struct qib_filedata *) fp->private_data;
> + fd = fp->private_data;
>   fp->private_data = NULL;
>   rcd = fd->rcd;
>   if (!rcd) {
> @@ -1808,7 +1808,7 @@ static int qib_ctxt_info(struct file *fp, struct qib_ctxt_info __user *uinfo)
>   struct qib_ctxtdata *rcd = ctxt_fp(fp);
>   struct qib_filedata *fd;
>  
> - fd = (struct qib_filedata *) fp->private_data;
> + fd = fp->private_data;
>  
>   info.num_active = qib_count_active_units();
>   info.unit = rcd->dd->unit;




RE: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-09-03 Thread Ralph Campbell
I haven't seen this stack trace before. Since it involves the RX
QP connection and my patch changes the TX QP connection,
I doubt my patch has any effect on this case. When I get some
time, I will look for similar races that might exist in the RX
connection setup and teardown.

From: Pradeep Satyanarayana [prade...@linux.vnet.ibm.com]
Sent: Thursday, September 02, 2010 8:41 PM
To: Ralph Campbell
Cc: Roland Dreier; linux-rdma@vger.kernel.org
Subject: Re: [PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

Ralph,

I see the following crash sporadically (only under stress) with Sles11SP1
(which is a 2.6.32 kernel). I saw this crash with V4 of your patch and have
not yet had a chance to try V5. Have you seen this in your testing? If this
is not the crash your patch fixes, can you please share what it does fix?

<4>ib0: RX drain timing out
<4>idr_remove called for id=11491974 which is not allocated.
<4>Call Trace:
<4>[c00749fe33b0] [c00129e4] .show_stack+0x6c/0x198 (unreliable)
<4>[c00749fe3460] [c02ea594] .sub_remove+0x1ec/0x1f8
<4>[c00749fe3520] [c02ea5e0] .idr_remove+0x40/0xf8
<4>[c00749fe35b0] [d00012d84d70] .cm_destroy_id+0xa0/0x520 [ib_cm]
<4>[c00749fe3680] [d0001b7fb644] .ipoib_cm_free_rx_reap_list+0xd4/0x190 [ib_ipoib]
<4>[c00749fe3740] [d0001b7fe404] .ipoib_cm_dev_stop+0x23c/0x360 [ib_ipoib]
<4>[c00749fe3800] [d0001b7f4dbc] .ipoib_ib_dev_stop+0xe4/0x4b0 [ib_ipoib]
<4>[c00749fe3960] [d0001b7f0f30] .ipoib_stop+0x88/0x178 [ib_ipoib]
<4>[c00749fe39f0] [c04eacf4] .dev_close+0xdc/0x148
<4>[c00749fe3a80] [c04ea2b8] .dev_change_flags+0x1f0/0x288
<4>[c00749fe3b20] [d0001b7f11b8] .ipoib_remove_one+0xb8/0x140 [ib_ipoib]
<4>[c00749fe3bc0] [d0001210425c] .ib_unregister_client+0xb4/0x1b8 [ib_core]
<4>[c00749fe3c90] [d0001b7ffde8] .ipoib_cleanup_module+0x20/0x60 [ib_ipoib]
<4>[c00749fe3d20] [c00ec408] .SyS_delete_module+0x238/0x320
<4>[c00749fe3e30] [c00085b4] syscall_exit+0x0/0x40
<1>Unable to handle kernel paging request for data at address 0x4527228d1ffb
<1>Faulting instruction address: 0xc05a8e88
12:mon> e
cpu 0x12: Vector: 300 (Data Access) at [c00749fe3250]
pc: c05a8e88: .wait_for_common+0xb8/0x268
lr: c05a8e20: .wait_for_common+0x50/0x268
sp: c00749fe34d0
   msr: 80009032
   dar: 4527228d1ffb
 dsisr: 4200
  current = 0xc0074b4ce0e0
  paca= 0xc0f64a00
pid   = 13605, comm = modprobe
12:mon>

Thanks
Pradeep




Re: [PATCH RFC] ipoib: good references make good neighbors

2010-08-30 Thread Ralph Campbell
The problem with this solution is that it creates
a reference counting "loop" so that the reference
count never goes to zero.
struct neighbour in the kernel points to struct ipoib_neigh,
which points back to struct neighbour. If the "back pointer"
holds a reference, then something besides ipoib_neigh_free()
has to do the neigh_release(neighbour).

I think the real fix is the patch I sent to linux-rdma:
https://patchwork.kernel.org/patch/120013/


On Mon, 2010-08-23 at 12:53 -0700, Chris Mason wrote:
> Hi everyone,
> 
> We're having a problem where a kernel tree based on 2.6.32 + OFED
> 1.5.1 is seeing random memory corruption, always in the form of zeros
> where good data is supposed to live.
> 
> CONFIG_PAGE_DEBUG_ALLOC showed a use after free here:
> 
> RIP: 0010:[]  [] 
> ipoib_neigh_free+0x16/0x59 [ib_ipoib]
> Call Trace:
>  [] ipoib_mcast_free+0x7a/0xfe [ib_ipoib]
>  [] ipoib_mcast_restart_task+0x388/0x419 [ib_ipoib]
>  [] ? need_resched+0x23/0x2d
>  [] ? ipoib_mcast_restart_task+0x0/0x419 [ib_ipoib]
>  [] worker_thread+0x149/0x1e5
>  [] ? autoremove_wake_function+0x0/0x3d
>  [] ? worker_thread+0x0/0x1e5
>  [] kthread+0x6e/0x76
>  [] child_rip+0xa/0x20
>  [] ? kthread+0x0/0x76
>  [] ? child_rip+0x0/0x20
> 
> The crashes usually pop up while rebooting (which rmmods ipoib), but we
> were able to hit it consistently by resetting IB switches, or flipping
> ports on and off.
> 
> Tina Yang noticed that when ipoib_neigh_alloc() takes a pointer to the
> neighbour struct, it doesn't take any references.  I cooked up the patch
> below and haven't been able to trigger our corruption since.
> 
> Signed-off-by: Chris Mason 
> 
> --- ofa_kernel-1.5.1/drivers/infiniband/ulp/ipoib/ipoib_main.c	2010-08-23 05:16:57.0 -0700
> +++ ofa_kernel-1.5.1-refs/drivers/infiniband/ulp/ipoib/ipoib_main.c	2010-08-22 13:35:43.0 -0700
> @@ -919,6 +919,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
>   if (!neigh)
>   return NULL;
>  
> + neigh_hold(neighbour);
>   neigh->neighbour = neighbour;
>   neigh->dev = dev;
>   memset(&neigh->dgid.raw, 0, sizeof (union ib_gid));
> @@ -932,6 +933,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
>  void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh)
>  {
>   struct sk_buff *skb;
> + struct neighbour *neighbour = neigh->neighbour;
>   *to_ipoib_neigh(neigh->neighbour) = NULL;
>   while ((skb = __skb_dequeue(&neigh->queue))) {
>   ++dev->stats.tx_dropped;
> @@ -940,6 +942,7 @@ void ipoib_neigh_free(struct net_device 
>   if (ipoib_cm_get(neigh))
>   ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
>   kfree(neigh);
> + neigh_release(neighbour);
>  }
>  
>  static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms)
> 




Re: [PATCH] IB/ipoib: Initialize ipoib_neigh list properly

2010-08-30 Thread Ralph Campbell
You are correct that it needs to be initialized, but I
think you must have made an error in applying the patch.

If you look again at the latest version of the patch,
this line is present:
https://patchwork.kernel.org/patch/120013/

On Tue, 2010-08-24 at 20:51 -0700, Ira Weiny wrote:
> I applied Ralph's "fix dangling pointer references to ipoib_neigh and
> ipoib_path" patch to our local RHEL based kernel and experienced crashes in
> ipoib_neigh_cleanup.  It turns out ipoib_neigh->list was not initialized
> properly.  So the following code from Ralph's patch caused issues.
> 
>   if (ipoib_cm_get(neigh))
>   ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
> 
> Looking at Roland's upstream kernel, it appears the same is true upstream.
> 
> The patch below initializes ipoib_neigh->list correctly.
> 
> 
> Signed-off-by: Ira Weiny 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index b4b2257..fa38ede 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -882,6 +882,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
>   if (!neigh)
>   return NULL;
>  
> + INIT_LIST_HEAD(&neigh->list);
>   neigh->neighbour = neighbour;
>   neigh->dev = dev;
>   memset(&neigh->dgid.raw, 0, sizeof (union ib_gid));




Re: [PATCH v5 0/1] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-08-17 Thread Ralph Campbell
BTW, I will be around today and tomorrow to reply to comments
but then I will be on vacation through August.



[PATCH v5 0/1] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-08-17 Thread Ralph Campbell
Hopefully, this is the last update needed to this patch and
it can be applied. It has been in use within QLogic for several months
without problems.

Changes from v4 to v5:

Removed the "break;" in ipoib_cm_flush_path() so now the function
comment matches the code.

Updated the error case code in ipoib_cm_tx_start() so that the address
handle and cached GID are reset and the next call to ipoib_start_xmit()
will create a new CM connection.

Added netif_tx_lock_bh() in ipoib_neigh_cleanup() in order to call
ipoib_cm_destroy_tx(). I don't think it is strictly needed; I'm just
being paranoid, and I don't think deleting a struct neighbour is a
performance-critical path.

=
These are my notes on the IPoIB locking and what I figured out
in the process of creating the patch that follows.

IPoIB connected mode uses a separate QP for transmit and receive.
I will only talk about the transmit side, although some of the
data structures are used by both.

The executive summary for locking is that most things are protected
by struct ipoib_dev_priv.lock. Holding the network device lock
(netif_tx_lock() or netif_tx_lock_bh()) or stopping network output via
netif_stop_queue() prevents ipoib_start_xmit() and ipoib_cm_send() from
being called; those functions can access struct ipoib_neigh and
struct ipoib_cm_tx without holding locks.

struct sk_buff {
  // This pointer may hold a reference (see SKB_DST_NOREF).
  // IPoIB doesn't change this pointer so the locking rules aren't important.
  unsigned long _skb_refdst;
}

struct dst_entry {
  // The neighbour pointer holds a reference.
  // IPoIB doesn't change this pointer so the locking rules aren't important.
  struct neighbour *neighbour;

  atomic_t refcnt;
}

struct neighbour {
  // Stores the IPoIB "hardware" address: control flags (one byte),
  // QPN (3 bytes), destination GID (16 bytes), padding (0 or 4 bytes),
  // and a pointer to struct ipoib_neigh (4 or 8 bytes), which is not
  // reference counted.
  // The pointer is protected by calling netif_tx_lock(dev) or netif_stop_queue(dev).
  // The Linux network stack can free the ipoib_neigh pointer by calling
  // ipoib_neigh_cleanup().
  uchar ha[];

  struct neigh_ops *ops;
  atomic_t refcnt;
}

struct ipoib_neigh {
  // This is protected by priv->lock and *does not* hold a reference
  struct neighbour *neighbour; // back pointer to containing struct

  // This is protected by priv->lock although it is accessed w/o
  // holding the priv->lock in ipoib_start_xmit() which means that
  // to clear the pointer, ipoib_start_xmit() has to be prevented from
  // being called if there is a chance that
  // "to_ipoib_neigh(skb_dst(skb)->neighbour)" could point to this struct.
  struct ipoib_cm_tx *cm;

  // This is protected by priv->lock and holds a reference
  struct ipoib_ah *ah;

  // Link for path->neigh_list or mcast->neigh_list.
  // This is protected by priv->lock.
  struct list_head list;
}

struct ipoib_cm_tx {
  // This is protected by priv->lock.
  struct ipoib_neigh *neigh;
}

struct ipoib_path {
  struct ib_sa_query *query; // non-NULL if SA path record query is pending

  // This is protected by priv->lock and holds a reference
  struct ipoib_ah *ah;

  struct completion   done;  // Completed when the SA query finishes

  // list of all struct ipoib_neigh.list with the same ah pointer
  struct list_head neigh_list;
}

struct ipoib_dev_priv {
  // This lock protects a number of things associated with this
  // IPoIB network device.
  spinlock_t lock;

  // Contains the struct ipoib_mcast nodes indexed by MGID.
  // It is protected by priv->lock.
  struct rb_root multicast_tree;
}


Before any nodes can send IPoIB packets, the SA creates the
"all HCA multicast group GID" which encodes , ,
, .  For example, transient, link local scope, IPv4,
pkey of 0x8001, limited broadcast address we have:
FF12:401B:8001:::, in compressed format.
The group also has a P_Key, Q_Key, and MTU associated with it in the SA
path record.
This multicast group is used for IPv4 address resolution (ARP).
The group is joined when the IPoIB device is brought up and the details
saved in priv->broadcast.

The transmit process starts with the Linux networking code calling
netif_tx_lock(dev) and then calling:

ipoib_hard_header()
  // This prepends the "hardware address" onto the sk_buff header if
  // the neighbor address hasn't been set in the sk_buff.

ipoib_start_xmit(skb, dev)
  // The first time through, skb_dst(skb)->neighbour won't be set and
  // ipoib_hard_header() will have prepended the IPv4 broadcast
  // "hardware address" which we strip off and call the following.
  ipoib_mcast_send(dev, mgid, skb)
    spin_lock_irqsave(&priv->lock, flags)
    // Search for the mgid and create an entry if not found.
    mcast = __ipoib_mcast_find(dev, mgid)
    // Since mgid is probably the "all HCA multicast group GID" which
    // was initialized when the network interface was started, we call:
    spin_unlock_irqrestore(&priv->lock, flags)
    // XXX bug? u

[PATCH v5] IB/ipoib: fix dangling pointer references to ipoib_neigh and ipoib_path

2010-08-17 Thread Ralph Campbell
Basically, a struct sk_buff has a pointer to a struct dst_entry
which has a pointer to a struct neighbour. IPoIB uses
struct neighbour.ha to store the IB "hardware address" and a pointer
to a struct ipoib_neigh. When using connected mode, struct ipoib_neigh
has a pointer to struct ipoib_cm_tx which contains pointers back to struct
ipoib_neigh and ipoib_path. The core network code will call
ipoib_neigh_cleanup() when it is destroying struct neighbour and IPoIB
should guarantee that the struct ipoib_neigh and all the memory it points
to are destroyed. Also, the connected mode RC QP contains a pointer back
to the struct ipoib_cm_tx which can be dereferenced in the CQ completion
handler.
The problem is that there are several places where the struct ipoib_cm_tx
can be used after it has been freed. The easiest way to reproduce this
bug is to run a UDP bandwidth test while loading/unloading the IPoIB
module or ifup/ifdown the interface.
The fix is rather complex since the RC QP connection setup, tear down,
and CQ completion draining are asynchronous processes. The struct
ipoib_cm_tx goes through four "states":
1) Newly created by ipoib_cm_create_tx()
   neigh!=NULL, flags==0, and linked on priv->cm.start_list.
2) Being used by ipoib_cm_tx_start()
   neigh!=NULL, not on priv->cm.start_list, flags==0, and in the process
   of starting CM.
3) Being used by CM or sending data on the RC QP
   neigh!=NULL, not on priv->cm.start_list, flags & IPOIB_FLAG_INITIALIZED.
4) Being destroyed
   tx->neigh==NULL and on priv->cm.reap_list, or being destroyed
   by ipoib_cm_tx_reap().

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/ulp/ipoib/ipoib.h   |   14 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  108 ++
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  266 
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   76 ++-
 4 files changed, 223 insertions(+), 241 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 753a983..5a842d7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -415,9 +415,7 @@ static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh)
 INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
- struct net_device *dev);
-void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
+void ipoib_neigh_flush(struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
 
@@ -464,7 +462,8 @@ void ipoib_dev_cleanup(struct net_device *dev);
 
 void ipoib_mcast_join_task(struct work_struct *work);
 void ipoib_mcast_carrier_on_task(struct work_struct *work);
-void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb);
+void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb,
+ struct ipoib_neigh *neigh);
 
 void ipoib_mcast_restart_task(struct work_struct *work);
 int ipoib_mcast_start_thread(struct net_device *dev);
@@ -570,6 +569,7 @@ void ipoib_cm_dev_cleanup(struct net_device *dev);
 struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
struct ipoib_neigh *neigh);
 void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx);
+void ipoib_cm_flush_path(struct net_device *dev, struct ipoib_path *path);
 void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb,
   unsigned int mtu);
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc);
@@ -659,6 +659,12 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
 }
 
 static inline
+void ipoib_cm_flush_path(struct net_device *dev, struct ipoib_path *path)
+{
+   return;
+}
+
+static inline
 int ipoib_cm_add_mode_attr(struct net_device *dev)
 {
return 0;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index bb10041..c1f3a65 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -799,31 +799,14 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
if (wc->status != IB_WC_SUCCESS &&
wc->status != IB_WC_WR_FLUSH_ERR) {
-   struct ipoib_neigh *neigh;
-
ipoib_dbg(priv, "failed cm send event "
   "(status=%d, wrid=%d vend_err %x)\n",
   wc->status, wr_id, wc->vendor_err);
 
spin_lock_irqsave(&priv->lock, flags);
-   neigh = tx->neigh;
-
-   if (neigh) {
-   neigh->cm = NULL;
-   list_del(&neigh->list);
-   if (neigh->ah)
-   ipoib_put_ah

RE: CQ overrun with ib_send_bw

2010-08-17 Thread Ralph Campbell
The patch is attached.

On Tue, 2010-08-17 at 04:36 -0700, Amir Ancel wrote:
> Hi Sean,
> 
> We've seen this issue as well.
> 
> Can you send the patch directly to us ?
> 
> Added Raz from my team which replaces Ido while he is OOO.
> 
> 
> Thanks,
> 
> Amir Ancel
> Performance Team Manager
> Mellanox Technologies
> 
> -Original Message-
> From: Tziporet Koren 
> Sent: Tuesday, August 17, 2010 2:19 PM
> To: Ralph Campbell; Hefty, Sean; Ido Shamay; Amir Ancel
> Cc: Sumeet Lahorani; linux-rdma@vger.kernel.org
> Subject: RE: CQ overrun with ib_send_bw
> 
> On 8/13/2010 10:21 PM, Ralph Campbell wrote:
> > On Fri, 2010-08-13 at 12:14 -0700, Hefty, Sean wrote:
> >>> I know there is a bug with "ib_send_bw -b" (bi-directional)
> >>> since it doesn't create a CQ that is large enough for all the
> >>> posted sends *and* receives.  I have tried several times to get the
> >>> following patch applied but I never got a reply and nothing was
> >>> done.
> >>
> >> Who's the maintainer of these tests?
> >
> > I believe it is:
> >
> > Ido Shamai 
> >
> > git://git.openfabrics.org/~shamoya/perftest.git
> >
> >
> 
> Yes Ido is the maintainer, however he is on vacation till Sep.
> I add Amir that may help for now
> 
> Tziporet
> 

diff --git a/send_bw.c b/send_bw.c
index ddd2b73..e3f644a 100644
--- a/send_bw.c
+++ b/send_bw.c
@@ -746,6 +746,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev,
 	if (user_parm->use_mcg && !user_parm->servername) {
 		cq_rx_depth *= user_parm->num_of_clients_mcg;
 	}
+	if (user_parm->duplex)
+		cq_rx_depth += ctx->tx_depth;
 	ctx->cq = ibv_create_cq(ctx->context,cq_rx_depth, NULL, ctx->channel, 0);
 	if (!ctx->cq) {
 		fprintf(stderr, "Couldn't create CQ\n");


RE: CQ overrun with ib_send_bw

2010-08-13 Thread Ralph Campbell
On Fri, 2010-08-13 at 12:14 -0700, Hefty, Sean wrote:
> > I know there is a bug with "ib_send_bw -b" (bi-directional)
> > since it doesn't create a CQ that is large enough for all the
> > posted sends *and* receives.  I have tried several times to get the
> > following patch applied but I never got a reply and nothing was
> > done.
> 
> Who's the maintainer of these tests?

I believe it is:

Ido Shamai 

git://git.openfabrics.org/~shamoya/perftest.git



Re: CQ overrun with ib_send_bw

2010-08-13 Thread Ralph Campbell
I know there is a bug with "ib_send_bw -b" (bi-directional)
since it doesn't create a CQ that is large enough for all the
posted sends *and* receives.  I have tried several times to get the
following patch applied but I never got a reply and nothing was
done.

diff --git a/send_bw.c b/send_bw.c
index ddd2b73..e3f644a 100644
--- a/send_bw.c
+++ b/send_bw.c
@@ -746,6 +746,8 @@ static struct pingpong_context *pp_init_ctx(struct ibv_device *ib_dev,
 	if (user_parm->use_mcg && !user_parm->servername) {
 		cq_rx_depth *= user_parm->num_of_clients_mcg;
 	}
+	if (user_parm->duplex)
+		cq_rx_depth += ctx->tx_depth;
 	ctx->cq = ibv_create_cq(ctx->context,cq_rx_depth, NULL, ctx->channel, 0);
 	if (!ctx->cq) {
 		fprintf(stderr, "Couldn't create CQ\n");

There should be enough CQEs in the normal case though.

On Fri, 2010-08-13 at 11:44 -0700, Sumeet Lahorani wrote:
> Hi,
> 
> If I run ib_send_bw with the -a option, we seem to be getting CQ overrun 
> errors.
> 
> Server :
> [r...@dscbad01 ~]# ib_send_bw
> --
> Send BW Test
> Connection type : RC
> Inline data is used up to 1 bytes message
>   local address:  LID 0x24, QPN 0x1c004c, PSN 0x85c292
>   remote address: LID 0x2a, QPN 0x14004a, PSN 0x858358
> Mtu : 2048
> --
>  #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
> --
> 
> Client :
> [r...@dscbad03 ~]# ib_send_bw -a dscbad01
> --
> Send BW Test
> Connection type : RC
> Inline data is used up to 1 bytes message
>   local address:  LID 0x2a, QPN 0x14004a, PSN 0x858358
>   remote address: LID 0x24, QPN 0x1c004c, PSN 0x85c292
> Mtu : 2048
> --
>  #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
>  2          1000           5.99               5.45
> Completion wth error at client:
> Failed status 12: wr_id 1 syndrom 0x81
> scnt=600, ccnt=300
> 
> and on the client console
> 
> mlx4_core :13:00.0: CQ overrun on CQN 86
> mlx4_core :13:00.0: Internal error detected:
> mlx4_core :13:00.0:   buf[00]: 00328f6f
> mlx4_core :13:00.0:   buf[01]: 
> mlx4_core :13:00.0:   buf[02]: 2007
> mlx4_core :13:00.0:   buf[03]: 
> mlx4_core :13:00.0:   buf[04]: 00328f3c
> mlx4_core :13:00.0:   buf[05]: 0014004a
> mlx4_core :13:00.0:   buf[06]: 0034
> mlx4_core :13:00.0:   buf[07]: 0044
> mlx4_core :13:00.0:   buf[08]: 0804
> mlx4_core :13:00.0:   buf[09]: 0804
> mlx4_core :13:00.0:   buf[0a]: 
> mlx4_core :13:00.0:   buf[0b]: 
> mlx4_core :13:00.0:   buf[0c]: 
> mlx4_core :13:00.0:   buf[0d]: 
> mlx4_core :13:00.0:   buf[0e]: 
> mlx4_core :13:00.0:   buf[0f]: 
> 
> This is with OFED 1.5.1 but it also happens with OFED 1.4.2. Sometimes, 
> the node crashes because it runs out of memory but most of the time, I 
> see just the above errors. What could be wrong?
> 
> - Sumeet
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 




Re: yet again the atomic operations

2010-08-10 Thread Ralph Campbell
On Tue, 2010-08-10 at 04:46 -0700, Rui Machado wrote:
> > There are two kinds supported. QLogic's driver does them in
> > the host driver so they are atomic with respect to all the CPUs
> > in the host.
> 
> I'm just curious about this: how does this work? Is the CPU getting
> interrupted and doing the operation while the Mellanox HCA does
> everything in hardware?

Yes.



Re: {RFC] ibv_post_send()/ibv_post_recv() kernel path optimizations

2010-08-06 Thread Ralph Campbell
On Fri, 2010-08-06 at 03:03 -0700, Walukiewicz, Miroslaw wrote:
> Currently the ibv_post_send()/ibv_post_recv() path through kernel 
> (using /dev/infiniband/rdmacm) could be optimized by removing dynamic memory 
> allocations on the path. 
> 
> Currently the transmit/receive path works following way:
> User calls ibv_post_send() where vendor specific function is called. 
> When the path should go through kernel the ibv_cmd_post_send() is called.
>  The function creates the POST_SEND message body that is passed to kernel. 
> As the number of sges is unknown the dynamic allocation for message body is 
> performed. 
> (see libibverbs/src/cmd.c)
> 
> In the kernel the message body is parsed and a structure of wr and sges is 
> recreated using dynamic allocations in kernel 
> The goal of this operation is having a similar structure like in user space. 
> 
> The proposed path optimization is removing of dynamic allocations 
> by redefining a structure definition passed to kernel. 
> From 
> 
> struct ibv_post_send {
> __u32 command;
> __u16 in_words;
> __u16 out_words;
> __u64 response;
> __u32 qp_handle;
> __u32 wr_count;
> __u32 sge_count;
> __u32 wqe_size;
> struct ibv_kern_send_wr send_wr[0];
> };
> To 
> 
> struct ibv_post_send {
> __u32 command;
> __u16 in_words;
> __u16 out_words;
> __u64 response;
> __u32 qp_handle;
> __u32 wr_count;
> __u32 sge_count;
> __u32 wqe_size;
> struct ibv_kern_send_wr send_wr[512];
> };
> 
> Similar change is required in kernel  struct ib_uverbs_post_send defined in 
> /ofa_kernel/include/rdma/ib_uverbs.h
> 
> This change limits a number of send_wr passed from unlimited (assured by 
> dynamic allocation) to reasonable number of 512. 
> I think this number should be a max number of QP entries available to send. 
> As the all iB/iWARP applications are low latency applications so the number 
> of WRs passed are never unlimited.
> 
> As the result instead of dynamic allocation the ibv_cmd_post_send() fills the 
> proposed structure 
> directly and passes it to kernel. Whenever the number of send_wr number 
> exceeds the limit the ENOMEM error is returned.
> 
> In kernel  in ib_uverbs_post_send() instead of dynamic allocation of the 
> ib_send_wr structures 
> the table of 512  ib_send_wr structures  will be defined and 
> all entries will be linked to unidirectional list so 
> qp->device->post_send(qp, wr, &bad_wr) API will be not changed. 
> 
> As I know no driver uses that kernel path to posting buffers so iWARP 
> multicast acceleration implemented in NES driver 
> Would be a first application that can utilize the optimized path. 
> 
> Regards,
> 
> Mirek
> 
> Signed-off-by: Mirek Walukiewicz 

The libipathverbs.so plug-in for libibverbs and
the ib_ipath and ib_qib kernel modules use this path for
ibv_post_send().
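The user-space half of the quoted proposal can be sketched as follows: a fixed-size command buffer replaces the per-call allocation. All names here are hypothetical: `MAX_POST_WR`, `struct send_wr_sketch`, `struct post_send_cmd_sketch` and `fill_post_send()`. They are trimmed-down stand-ins for `ibv_send_wr`, `ibv_kern_send_wr` and `struct ibv_post_send`, and model only a few fields. The function returns ENOMEM when the chain is too long, as the proposal describes. The kernel side, which links a static table of `ib_send_wr` entries into a list, is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POST_WR 512   /* limit suggested in the proposal */

/* Hypothetical stand-in for a user-space send WR chain element. */
struct send_wr_sketch {
	uint64_t wr_id;
	uint32_t num_sge;
	struct send_wr_sketch *next;
};

/* Hypothetical stand-in for one WR as passed to the kernel. */
struct kern_send_wr_sketch {
	uint64_t wr_id;
	uint32_t num_sge;
};

/* Command with a fixed-size WR array instead of send_wr[0]. */
struct post_send_cmd_sketch {
	uint32_t qp_handle;
	uint32_t wr_count;
	uint32_t sge_count;
	struct kern_send_wr_sketch send_wr[MAX_POST_WR];
};

/*
 * Copy a WR chain into a caller-provided command buffer with no
 * malloc(). Returns 0 on success or ENOMEM if the chain is longer
 * than MAX_POST_WR.
 */
static int fill_post_send(struct post_send_cmd_sketch *cmd, uint32_t qp_handle,
                          const struct send_wr_sketch *wr)
{
	uint32_t n = 0, sges = 0;

	for (; wr; wr = wr->next) {
		if (n == MAX_POST_WR)
			return ENOMEM;
		cmd->send_wr[n].wr_id = wr->wr_id;
		cmd->send_wr[n].num_sge = wr->num_sge;
		sges += wr->num_sge;
		n++;
	}
	cmd->qp_handle = qp_handle;
	cmd->wr_count = n;
	cmd->sge_count = sges;
	return 0;
}
```

The command is about 8 KB with 512 entries. A caller would typically place it on the stack or in per-QP storage rather than allocating it for each call.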



Re: yet again the atomic operations

2010-08-06 Thread Ralph Campbell
On Fri, 2010-08-06 at 04:43 -0700, Rui Machado wrote:
> Hi there,
> 
> > There are two kinds supported. QLogic's driver does them in
> > the host driver so they are atomic with respect to all the CPUs
> > in the host. Mellanox uses HCA wide atomic which means the
> > HCA will do a memory read/write without allowing other reads
> > or writes from different QP operations passing through that
> > HCA to get in between. The CPUs on the host won't see
> > atomic operations since from their perspective, it looks
> > like a normal read and write from the PCIe bus.
> 
> So if the CPU writes/reads to/from the same address, even atomically
> (lock), there might be room for some inconsistency on the values? It
> is not really atomic from the whole system point of view, just for the
> HCA? If so, is there any possibility to make the whole operation
> 'system-wide' atomic?

Correct.
It won't be consistent from the HCA's point of view if other HCAs
or CPUs are modifying the memory - even if they do it atomically.
It is only consistent if a single HCA is doing atomic ops to the
memory.

There is no possibility to change this unless PCIe atomic
operations are used by the HCA and if the root complex supports
atomic operations. I don't know of any HCAs or root complex
chips which have this support yet.
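The following is a deterministic, single-threaded model of one interleaving that shows why an HCA-wide atomic is not atomic with respect to the host CPUs. It is a sketch, not how any HCA is implemented. The hypothetical `hca_fetch_add_read()` and `hca_fetch_add_write()` split the HCA's fetch-and-add into the plain PCIe read and write that the host sees. C11 `atomic_fetch_add()` stands in for a CPU locked add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Host memory word targeted by both the CPU and the HCA. */
static _Atomic uint64_t target;

/* First half of an HCA-wide fetch-and-add: a plain PCIe read.
 * The HCA serializes this only against its own QPs. */
static uint64_t hca_fetch_add_read(void)
{
	return atomic_load(&target);
}

/* Second half: a plain PCIe write of the value the HCA computed. */
static void hca_fetch_add_write(uint64_t old, uint64_t add)
{
	atomic_store(&target, old + add);
}

/*
 * Replay one interleaving: the HCA reads, a CPU does a locked add,
 * then the HCA writes back. Returns the final value of target.
 */
static uint64_t replay_interleaving(uint64_t start, uint64_t hca_add,
                                    uint64_t cpu_add)
{
	atomic_store(&target, start);
	uint64_t seen = hca_fetch_add_read();
	atomic_fetch_add(&target, cpu_add);   /* CPU's atomic increment */
	hca_fetch_add_write(seen, hca_add);   /* overwrites the CPU's update */
	return atomic_load(&target);
}
```

With a start value of 10, an HCA add of 1 and a CPU add of 5, the result is 11 rather than 16: the CPU's update is lost even though the CPU used a locked instruction.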

> > You can see what type the HCA supports with "ibv_devinfo -v"
> > and look for "atomic_cap: ATOMIC_HCA (1)" or
> > "atomic_cap: ATOMIC_GLOB (2)".
> 
> ATOMIC_HCA (1) is what I see in my Mellanox hardware. This is the case
> you mentioned, "without allowing other reads or writes from different
> QP operations passing through that HCA to get in between"
> ATOMIC_GLOB (2) means with respect to all HCAs and even the CPU?

Correct.
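A program can read the same information through libibverbs. `ibv_query_device()` fills a `struct ibv_device_attr`, whose `atomic_cap` field is an `enum ibv_atomic_cap` (IBV_ATOMIC_NONE, IBV_ATOMIC_HCA, IBV_ATOMIC_GLOB). So that this sketch builds without libibverbs or an HCA, it mirrors those enum values locally as 0/1/2, the numbers `ibv_devinfo -v` prints. The helper `atomic_scope()` and the `_M` names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Local mirror of libibverbs' enum ibv_atomic_cap (same values). */
enum atomic_cap_mirror {
	ATOMIC_NONE_M = 0,
	ATOMIC_HCA_M  = 1,
	ATOMIC_GLOB_M = 2,
};

/* Describe which agents an RDMA atomic is atomic with respect to. */
static const char *atomic_scope(int atomic_cap)
{
	switch (atomic_cap) {
	case ATOMIC_NONE_M:
		return "none: atomic verbs not supported";
	case ATOMIC_HCA_M:
		return "HCA: atomic only against other QPs on this HCA";
	case ATOMIC_GLOB_M:
		return "global: atomic against other HCAs and host CPUs";
	default:
		return "unknown";
	}
}
```

In a real program, call `ibv_query_device(ctx, &attr)` and pass `attr.atomic_cap` to the helper.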

> Cheers,
> Rui
> 




Re: yet again the atomic operations

2010-08-05 Thread Ralph Campbell
Atomic ops are optional. Mellanox and QLogic HCAs support them,
I don't know about the other HCAs.

There are two kinds supported. QLogic's driver does them in
the host driver so they are atomic with respect to all the CPUs
in the host. Mellanox uses HCA-wide atomics, which means the
HCA will do a memory read/write without allowing other reads
or writes from different QP operations passing through that
HCA to get in between. The CPUs on the host won't see
atomic operations since from their perspective, it looks
like a normal read and write from the PCIe bus.

You can see what type the HCA supports with "ibv_devinfo -v"
and look for "atomic_cap: ATOMIC_HCA (1)" or
"atomic_cap: ATOMIC_GLOB (2)".

On Wed, 2010-08-04 at 05:32 -0700, Rui Machado wrote:
> Hi all,
> 
> I would like to know how do the IB atomic operations work and how much
> can one rely on them? Particularly, how does the interaction with the
> CPU takes place when for example the CPU is referencing and possibly
> modifying the same address, atomically . Does one need to take care of
> memory say, with memory barriers?
> Does anyone has experience with this? There must be also different
> hardware support from the vendors, right?
> 
> I hope I'm not sounding too vague, I'm not such a kernel or hw guy. I
> tried to look for information on this but found pretty much nothing.
> Can the experts shed some light on the problem?
> 
> Cheers,
> Rui
> 




Re: How to know if SRQ is being used in SRP?

2010-08-05 Thread Ralph Campbell
SRP doesn't use an SRQ.

Look at drivers/infiniband/ulp/srp/ib_srp.c for
ib_create_qp(): init_attr.cap.max_recv_wr is set
and init_attr.srq is not set.
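On the trace question: a responder whose QP is attached to an SRQ has no per-QP receive queue to count. It therefore sets the AETH credit field to the "invalid" code (all ones, 0x1F); the qib driver, for example, does this when the QP has an SRQ. The sketch below decodes a 32-bit AETH under that assumption. The layout is an 8-bit syndrome in bits 31:24 (type in bits 30:29, credit code in bits 28:24 for an ACK) and a 24-bit MSN in bits 23:0. The macro and function names are hypothetical. The 5-bit credit value is an encoded code, not a literal count.

```c
#include <assert.h>
#include <stdint.h>

/* AETH: syndrome in bits 31:24, MSN in bits 23:0. */
#define AETH_TYPE_SHIFT    29
#define AETH_TYPE_MASK     0x3u
#define AETH_CREDIT_SHIFT  24
#define AETH_CREDIT_MASK   0x1Fu
#define AETH_CREDIT_INVAL  0x1Fu     /* "invalid": no credit info */
#define AETH_MSN_MASK      0xFFFFFFu

enum aeth_type { AETH_ACK = 0, AETH_RNR_NAK = 1, AETH_RESERVED = 2, AETH_NAK = 3 };

static enum aeth_type aeth_get_type(uint32_t aeth)
{
	return (enum aeth_type)((aeth >> AETH_TYPE_SHIFT) & AETH_TYPE_MASK);
}

/*
 * Return 1 for an ACK whose credit field carries a real credit code,
 * 0 for a NAK/RNR NAK or an ACK with invalid credits (as an
 * SRQ-attached responder would send).
 */
static int aeth_has_valid_credits(uint32_t aeth)
{
	if (aeth_get_type(aeth) != AETH_ACK)
		return 0;
	return ((aeth >> AETH_CREDIT_SHIFT) & AETH_CREDIT_MASK) != AETH_CREDIT_INVAL;
}

static uint32_t aeth_msn(uint32_t aeth)
{
	return aeth & AETH_MSN_MASK;
}
```

Valid credits in a read response tell you only about the QP that sent that response. Credits seen in one direction do not say whether the peer's QP uses an SRQ.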

On Thu, 2010-08-05 at 09:38 -0700, Suresh Shelvapille wrote:
> Folks:
> 
> I have the envious task of figuring out whether SRQ is being used in an SRP 
> application (which I have very little
> knowledge about). I only have IB transport layer traces from the customer. 
> From the traces I can see Valid Credits in the Syndrome field of the AETH in 
> RDMA_ReadResponse_First/last messages.
> Although there are a bunch of RC SENDs where the AETH syndrome field is 
> Invalid, unfortunately I cannot seem to
> correlate these RC SENDS with the RDMA read/responses. 
> My questions is, since the RDMA response AETH field has valid CC credits is 
> it saying SRQ is not being used?
> 
> Many thanks,
> Suri   
> 
> 




[PATCH 2/2] IB/qib: fix race between qib_error_qp() and receive packet processing

2010-08-02 Thread Ralph Campbell
When transitioning a QP to the error state, in-progress RWQEs need
to be marked complete. This also involves releasing the reference
count to the memory regions referenced in the SGEs. The locking in
the receive packet processing wasn't sufficient to prevent
qib_error_qp() from modifying the r_sge state at the same time,
thus leading to kernel panics.
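The fix amounts to a lock-ordering rule. Any path that can flush RWQEs must hold r_lock, and whenever both locks are held, r_lock is taken before s_lock (as the rc_timeout() hunk below does). The following is a user-space model using pthread mutexes. `struct qp_sketch`, `receive_packet()` and `error_qp()` are hypothetical. The model captures only the ordering, not the kernel's `spin_lock_irqsave()` on the outer lock. Where the real receive path takes r_lock after this patch is not visible in the truncated excerpt.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space model of the QP state involved. */
struct qp_sketch {
	pthread_mutex_t r_lock;   /* protects receive state (r_sge) */
	pthread_mutex_t s_lock;   /* protects send state */
	int in_error;
	int rwqe_in_progress;     /* an RWQE holds MR refs via r_sge */
	int flushed;              /* RWQEs completed by the error flush */
};

/* Fixed order: r_lock before s_lock, released in reverse. */
static void qp_lock_both(struct qp_sketch *qp)
{
	pthread_mutex_lock(&qp->r_lock);
	pthread_mutex_lock(&qp->s_lock);
}

static void qp_unlock_both(struct qp_sketch *qp)
{
	pthread_mutex_unlock(&qp->s_lock);
	pthread_mutex_unlock(&qp->r_lock);
}

/* Receive path: holds r_lock for the whole packet, so the flush in
 * error_qp() cannot modify r_sge state underneath it.
 * Returns 1 if the packet was consumed, 0 if dropped. */
static int receive_packet(struct qp_sketch *qp, int is_last)
{
	int consumed = 0;

	pthread_mutex_lock(&qp->r_lock);
	if (!qp->in_error) {
		qp->rwqe_in_progress = !is_last; /* refs held until last packet */
		consumed = 1;
	}
	pthread_mutex_unlock(&qp->r_lock);
	return consumed;
}

/* Error path: needs both locks, like qib_error_qp() after the patch. */
static void error_qp(struct qp_sketch *qp)
{
	qp_lock_both(qp);
	if (!qp->in_error) {
		qp->in_error = 1;
		if (qp->rwqe_in_progress) {
			qp->rwqe_in_progress = 0;   /* drop MR refs, complete RWQE */
			qp->flushed++;
		}
	}
	qp_unlock_both(qp);
}
```

Because every path takes the two locks in the same order, a path holding only r_lock and a path holding both cannot deadlock against each other. Each of them sees a consistent r_sge state.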

Signed-off-by: Ralph Campbell 
---

 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_qp.c b/drivers/infiniband/hw/qib/qib_qp.c
index e0f65e3..6c39851 100644
--- a/drivers/infiniband/hw/qib/qib_qp.c
+++ b/drivers/infiniband/hw/qib/qib_qp.c
@@ -450,7 +450,7 @@ static void clear_mr_refs(struct qib_qp *qp, int clr_sends)
  *
  * Flushes both send and receive work queues.
  * Returns true if last WQE event should be generated.
- * The QP s_lock should be held and interrupts disabled.
+ * The QP r_lock and s_lock should be held and interrupts disabled.
  * If we are already in error state, just return.
  */
 int qib_error_qp(struct qib_qp *qp, enum ib_wc_status err)
diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c
index 40c0a37..a093111 100644
--- a/drivers/infiniband/hw/qib/qib_rc.c
+++ b/drivers/infiniband/hw/qib/qib_rc.c
@@ -868,7 +868,7 @@ done:
 
 /*
  * Back up requester to resend the last un-ACKed request.
- * The QP s_lock should be held and interrupts disabled.
+ * The QP r_lock and s_lock should be held and interrupts disabled.
  */
 static void qib_restart_rc(struct qib_qp *qp, u32 psn, int wait)
 {
@@ -911,7 +911,8 @@ static void rc_timeout(unsigned long arg)
struct qib_ibport *ibp;
unsigned long flags;
 
-   spin_lock_irqsave(&qp->s_lock, flags);
+   spin_lock_irqsave(&qp->r_lock, flags);
+   spin_lock(&qp->s_lock);
if (qp->s_flags & QIB_S_TIMER) {
ibp = to_iport(qp->ibqp.device, qp->port_num);
ibp->n_rc_timeouts++;
@@ -920,7 +921,8 @@ static void rc_timeout(unsigned long arg)
qib_restart_rc(qp, qp->s_last_psn + 1, 1);
qib_schedule_send(qp);
}
-   spin_unlock_irqrestore(&qp->s_lock, flags);
+   spin_unlock(&qp->s_lock);
+   spin_unlock_irqrestore(&qp->r_lock, flags);
 }
 
 /*
@@ -1414,10 +1416,6 @@ static void qib_rc_rcv_resp(struct qib_ibport *ibp,
 
spin_lock_irqsave(&qp->s_lock, flags);
 
-   /* Double check we can process this now that we hold the s_lock. */
-   if (!(ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK))
-   goto ack_done;
-
/* Ignore invalid responses. */
if (qib_cmp24(psn, qp->s_next_psn) >= 0)
goto ack_done;
@@ -1661,9 +1659,6 @@ static int qib_rc_rcv_error(struct qib_other_headers *ohdr,
ibp->n_rc_dupreq++;
 
spin_lock_irqsave(&qp->s_lock, flags);
-   /* Double check we can process this now that we hold the s_lock. */
-   if (!(ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK))
-   goto unlock_done;
 
for (i = qp->r_head_ack_queue; ; i = prev) {
if (i == qp->s_tail_ack_queue)
@@ -1878,9 +1873,6 @@ void qib_rc_rcv(struct qib_ctxtdata *rcd, struct qib_ib_header *hdr,
psn = be32_to_cpu(ohdr->bth[2]);
opcode >>= 24;
 
-   /* Prevent simultaneous processing after APM on different CPUs */
-   spin_lock(&qp->r_lock);
-
/*
 * Process responses (ACKs) before anything else.  Note that the
 * packet sequence number will be for something in the send work
@@ -1891,14 +1883,14 @@ void qib_rc_rcv(struct qib_ctxtdata *rcd, struct qib_ib_header *hdr,
opcode <= OP(ATOMIC_ACKNOWLEDGE)) {
qib_rc_rcv_resp(ibp, ohdr, data, tlen, qp, opcode, psn,
hdrsize, pmtu, rcd);
-   goto runlock;
+   return;
}
 
/* Compute 24 bits worth of difference. */
diff = qib_cmp24(psn, qp->r_psn);
if (unlikely(diff)) {
if (qib_rc_rcv_error(ohdr, data, qp, opcode, psn, diff, rcd))
-   goto runlock;
+   return;
goto send_ack;
}
 
@@ -2090,9 +2082,6 @@ send_last:
if (next > QIB_MAX_RDMA_ATOMIC)
next = 0;
spin_lock_irqsave(&qp->s_lock, flags);
-   /* Double check we can process this while holding the s_lock. */
-   if (!(ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK))
-   goto srunlock;
if (unlikely(next == qp->s_tail_ack_queue)) {
if (!qp->s_ack_queue[next].sent)
goto nack_inv_unlck;
@@ -2146,7 +2135,7 @@ send_last:
qp->s_flags |= QIB_S_RESP_PENDING;
qib

[PATCH 1/2] IB/qib: limit the number of packets processed per interrupt

2010-08-02 Thread Ralph Campbell
Don't process too many packets without allowing other IRQ
functions a chance to run. Otherwise, there is a chance of
getting "soft lockup" messages and poor application response times.
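The change is the usual per-interrupt budget, the same idea as a NAPI poll budget. Below is a minimal sketch that uses a hypothetical counter of pending packets in place of the real receive header queue; `drain_ring()` and `RCV_BUDGET` are illustrative names. How the driver arranges for leftover packets to be handled later is not shown in this excerpt.

```c
#include <assert.h>

#define RCV_BUDGET 64   /* same limit as the patch's i <= 64 */

/*
 * Handle at most RCV_BUDGET packets from a hypothetical receive
 * ring and return how many were handled. Anything left over stays
 * queued for a later pass instead of running in this interrupt.
 */
static unsigned drain_ring(unsigned *pending)
{
	unsigned done = 0;

	while (*pending && done < RCV_BUDGET) {
		/* ...process one packet here... */
		(*pending)--;
		done++;
	}
	return done;
}
```

With 150 packets queued, successive calls handle 64, 64 and then 22 packets. No single call runs unbounded.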

Signed-off-by: Ralph Campbell 
---

 0 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_driver.c b/drivers/infiniband/hw/qib/qib_driver.c
index f15ce07..9cd1936 100644
--- a/drivers/infiniband/hw/qib/qib_driver.c
+++ b/drivers/infiniband/hw/qib/qib_driver.c
@@ -335,7 +335,7 @@ u32 qib_kreceive(struct qib_ctxtdata *rcd, u32 *llic, u32 *npkts)
smp_rmb();  /* prevent speculative reads of dma'ed hdrq */
}
 
-   for (last = 0, i = 1; !last; i += !last) {
+   for (last = 0, i = 1; !last && i <= 64; i += !last) {
hdr = dd->f_get_msgheader(dd, rhf_addr);
eflags = qib_hdrget_err_flags(rhf_addr);
etype = qib_hdrget_rcv_type(rhf_addr);



Re: [Suspected SPAM] Re: [RFC PATCH 2/4] uverbs: Add common ib_iomem_get service

2010-07-29 Thread Ralph Campbell
On Thu, 2010-07-29 at 15:57 -0700, Jason Gunthorpe wrote:
> > You would need to modify ib_umem_get() to check for the VM_PFNMAP
> > flag and build the struct ib_umem similar to the proposed
> > ib_iomem_get(). However, the page reference counting/sharing issue
> > would need to be solved. I think there are kernel level callbacks
> > for this that could be used.
> 
> But in this case the pages are already mmaped into a user process,
> there must be some mechanism to ensure they don't get pulled away?!

Yes. The VM_PFNMAP flag means the device is the owner of the memory
and has to handle unmap. Since the rest of the kernel won't allow
that memory to be shared, swapped out, etc., only the driver has to
be notified or handle it when it goes away.

> Though, I guess, what happens if you hot un-plug the PCI-E card that
> has a process mmaping its memory?!

The hot un-plug would have to close the device first, which would
unmap the memory, or the device would not allow the hot un-plug
to happen by saying it is busy.

> What happens if you RDMA READ from PCI-E address space that does not
> have any device responding?

Well, RDMA reads with a Mellanox-style HCA may cause problems
because I don't know if they support PCIe-to-PCIe DMAs.
If the device isn't responding to PCIe reads of its memory,
the root complex will time out the read transaction and signal
the initiator (CPU), which will cause a bus error signal.



Re: [Suspected SPAM] Re: [RFC PATCH 2/4] uverbs: Add common ib_iomem_get service

2010-07-29 Thread Ralph Campbell
On Thu, 2010-07-29 at 13:41 -0700, Jason Gunthorpe wrote:
> On Thu, Jul 29, 2010 at 03:29:37PM -0500, Tom Tucker wrote:
> 
> >> Also, I'd like to see a strong defence of this new user space API
> >> particularly:
> >>   1) Why can't this be done with the existing ibv_reg_mr, like huge
> >>  pages are.
> 
> > The ibv_reg_mr API assumes that the memory being registered was  
> > allocated in user mode and is part of the current->mm VMA. It uses  
> > get_user_pages which will scoff and jeer at kernel memory.
> 
> I'm confused? What is the vaddr input then? How does userspace get
> that value? Isn't it created by mmap or the like?
> 
> Ie for the PCI-E example you gave I assume the flow is that userspace
> mmaps devices/pci:00/:00:XX.X/resourceX to get the IO memory
> and then passes that through to ibv_reg_mr?
> 
> IMHO, ibv_reg_mr API should accept any valid vaddr available to the
> process and if it bombs for certain kinds of vaddrs then it is just a
> bug..

The reason that get_user_pages() returns an error for
VM_IO and VM_PFNMAP mappings is that there may not be a struct page
associated with the physical address so there is no general way
for the VM layer to allow sharing the page and know when the page
(not the mapping into user space) is not being used.

I looked at this issue for supporting the Nvidia CUDA driver, which
sets the VM_IO flag on its memory mmapped into user space.
Nvidia wanted to add a set of callback functions and a new
get_driver_pages() function which should be called instead of
get_user_pages() very similar to what is being proposed here for
VM_PFNMAP.
My solution to the problem was to change the Nvidia driver by
not setting VM_IO (since the memory was normal kernel memory with
a struct page) but Nvidia didn't like that solution.

> >>   2) How is it possible for userspace to know when it should use
> >>  ibv_reg_mr vs ibv_reg_io_mr?
> 
> > By virtue of the device that it is mmap'ing. If I mmap my_vmstat_driver,  
> > I know that the memory I am mapping is a kernel buffer.
> 
> Yah, but what if the next version of your vmstat driver changes the
> kind of memory it returns?
> 
> >> On first glance, this seems like a hugely bad API to me :)
> 
> > Well hopefully now that it's purpose is revealed you will change your  
> > view and we can collaboratively make it better :-)
> 
> I don't object to the idea, just to the notion that user space is
> supposed to somehow know that one vaddr is different from another
> vaddr and call the right API - seems impossible to use correctly to
> me.

I agree unless there is some function that can be called which
calls into the kernel and returns an enum or something to
indicate which call to use.
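One possible user-space approximation of such a function relies on a later kernel feature: since Linux 3.8, /proc/<pid>/smaps has a "VmFlags:" line for each mapping. In that line, "pf" marks a pure PFN range (VM_PFNMAP) and "io" marks an I/O area (VM_IO). The sketch classifies a single VmFlags line. `enum reg_path`, `has_flag()` and `classify_vmflags()` are hypothetical names. Locating the smaps entry that covers a given address is not shown.

```c
#include <assert.h>
#include <string.h>

enum reg_path { REG_NORMAL, REG_IO };

/* Return 1 if flag appears as a whole token after the colon. */
static int has_flag(const char *line, const char *flag)
{
	const char *p = strchr(line, ':');
	size_t flen = strlen(flag);

	p = p ? p + 1 : line;
	while (*p) {
		size_t len;

		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		len = strcspn(p, " \t\n");
		if (len == flen && strncmp(p, flag, len) == 0)
			return 1;
		p += len;
	}
	return 0;
}

/* Pick a registration path from a mapping's VmFlags line. */
static enum reg_path classify_vmflags(const char *vmflags_line)
{
	if (has_flag(vmflags_line, "pf") || has_flag(vmflags_line, "io"))
		return REG_IO;
	return REG_NORMAL;
}
```

This does not address the objection above that the answer can change when the driver providing the mapping changes. It only makes the decision at run time instead of hard-coding it in the application.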

> What would you have to do to implement this using scheme using
> ibv_reg_mr as the entry point?
> 
> Jason

You would need to modify ib_umem_get() to check for the VM_PFNMAP
flag and build the struct ib_umem similar to the proposed
ib_iomem_get(). However, the page reference counting/sharing issue
would need to be solved. I think there are kernel level callbacks
for this that could be used.
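The dispatch such a modified ib_umem_get() would need can be sketched as follows. Ordinary mappings keep using get_user_pages(), VM_PFNMAP mappings take the PFN-walk path, and a region that mixes the two kinds is rejected. The flag values mirror include/linux/mm.h but are assumptions here. `classify_region()` and `enum umem_kind` are hypothetical. The sketch does not address the reference-counting and invalidation problem raised above.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors of VM_PFNMAP / VM_IO; values assumed from linux/mm.h. */
#define SKETCH_VM_PFNMAP 0x00000400UL
#define SKETCH_VM_IO     0x00004000UL

enum umem_kind { UMEM_PAGES, UMEM_PFN };

/*
 * Decide how to pin a region given the vm_flags of each vma it
 * covers. All vmas must be the same kind; VM_IO without VM_PFNMAP
 * is handled by neither path.
 */
static int classify_region(const unsigned long *vm_flags, int nvmas,
                           enum umem_kind *kind)
{
	int i;

	if (nvmas <= 0)
		return -EFAULT;

	for (i = 0; i < nvmas; i++) {
		enum umem_kind k;

		if (vm_flags[i] & SKETCH_VM_PFNMAP)
			k = UMEM_PFN;
		else if (vm_flags[i] & SKETCH_VM_IO)
			return -EFAULT;   /* get_user_pages() would refuse it */
		else
			k = UMEM_PAGES;

		if (i == 0)
			*kind = k;
		else if (k != *kind)
			return -EINVAL;   /* region spans different vma kinds */
	}
	return 0;
}
```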



Re: [RFC PATCH 3/3] libibverbs: Add reg/unreg I/O memory verbs

2010-07-29 Thread Ralph Campbell
How does an application know when to call ibv_reg_io_mr()
instead of ibv_reg_mr()? It isn't going to know that some
address returned by mmap() is going to have the VM_PFNMAP
flag set.

How does an application know that the HCA supports
ibv_reg_io_mr() or not? (see below)
I think returning ENOTSUP or something would be good.
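A minimal sketch of the guard suggested above: fail with ENOTSUP instead of calling through a NULL provider pointer. The types are hypothetical, trimmed-down stand-ins (`struct mr_sketch`, `struct ops_sketch`) and `reg_io_mr_checked()` is an illustrative name. Running the check before the wrapper calls ibv_dontfork_range() also means there is nothing to undo on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the libibverbs types. */
struct mr_sketch {
	void *addr;
	size_t length;
};

struct ops_sketch {
	struct mr_sketch *(*reg_io_mr)(void *addr, size_t length, int access);
};

/*
 * Return NULL with errno = ENOTSUP when the provider leaves
 * reg_io_mr unset, rather than jumping through a NULL pointer.
 */
static struct mr_sketch *reg_io_mr_checked(const struct ops_sketch *ops,
                                           void *addr, size_t length, int access)
{
	if (!ops->reg_io_mr) {
		errno = ENOTSUP;
		return NULL;
	}
	return ops->reg_io_mr(addr, length, access);
}
```

A NULL check helps only if the ops struct layout matches between libibverbs and the plug-in. That is the binary-compatibility question raised further down.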

On Thu, 2010-07-29 at 09:32 -0700, Tom Tucker wrote:
> From: Tom Tucker 
> 
> Add the ibv_reg_io_mr and ibv_dereg_io_mr verbs.
> 
> Signed-off-by: Tom Tucker 
> ---
> 
>  include/infiniband/driver.h |6 ++
>  include/infiniband/verbs.h  |   14 ++
>  src/verbs.c |   35 +++
>  3 files changed, 55 insertions(+), 0 deletions(-)
> 
> diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h
> index 9a81416..37c0ed1 100644
> --- a/include/infiniband/driver.h
> +++ b/include/infiniband/driver.h
> @@ -82,6 +82,12 @@ int ibv_cmd_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
>  size_t cmd_size,
>  struct ibv_reg_mr_resp *resp, size_t resp_size);
>  int ibv_cmd_dereg_mr(struct ibv_mr *mr);
> +int ibv_cmd_reg_io_mr(struct ibv_pd *pd, void *addr, size_t length,
> +   uint64_t hca_va, int access,
> +   struct ibv_mr *mr, struct ibv_reg_io_mr *cmd,
> +   size_t cmd_size,
> +   struct ibv_reg_io_mr_resp *resp, size_t resp_size);
> +int ibv_cmd_dereg_io_mr(struct ibv_mr *mr);
>  int ibv_cmd_create_cq(struct ibv_context *context, int cqe,
> struct ibv_comp_channel *channel,
> int comp_vector, struct ibv_cq *cq,
> diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
> index 0f1cb2e..a0d969a 100644
> --- a/include/infiniband/verbs.h
> +++ b/include/infiniband/verbs.h
> @@ -640,6 +640,9 @@ struct ibv_context_ops {
>   size_t length,
>   int access);
>   int (*dereg_mr)(struct ibv_mr *mr);
> +struct ibv_mr * (*reg_io_mr)(struct ibv_pd *pd, void *addr, size_t length,
> +  int access);
> +int (*dereg_io_mr)(struct ibv_mr *mr);
>   struct ibv_mw * (*alloc_mw)(struct ibv_pd *pd, enum ibv_mw_type type);
>   int (*bind_mw)(struct ibv_qp *qp, struct ibv_mw *mw,
>  struct ibv_mw_bind *mw_bind);

Doesn't adding these in the middle of the struct break the
libibverbs to libxxxverbs.so binary interface?
Shouldn't they be added at the end of the struct?
I'm not sure how the versioning works between libibverbs and
device plugins. Don't we need to protect against libibverbs
being upgraded but the libxxxverbs.so being older?
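Why the position matters can be shown with offsetof(). The sketch below uses hypothetical ops tables containing only a few members. It compares the layout an old plug-in was compiled against with tables that insert the new members in the middle or append them at the end.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*op_fn)(void);

/* Ops table as an older plug-in was compiled against it. */
struct ops_v1 {
	op_fn reg_mr;
	op_fn dereg_mr;
	op_fn alloc_mw;
};

/* New members inserted in the middle, as in the quoted patch. */
struct ops_v2_middle {
	op_fn reg_mr;
	op_fn dereg_mr;
	op_fn reg_io_mr;
	op_fn dereg_io_mr;
	op_fn alloc_mw;
};

/* New members appended at the end instead. */
struct ops_v2_end {
	op_fn reg_mr;
	op_fn dereg_mr;
	op_fn alloc_mw;
	op_fn reg_io_mr;
	op_fn dereg_io_mr;
};
```

Inserting in the middle shifts `alloc_mw`, so a new libibverbs would call the wrong slot in an old plug-in. Appending keeps the existing offsets. However, an old plug-in's table is still shorter, so libibverbs must know the plug-in's version before reading the new slots.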

> @@ -801,6 +804,17 @@ struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr,
>  int ibv_dereg_mr(struct ibv_mr *mr);
>  
>  /**
> + * ibv_reg_io_mr - Register a physical memory region
> + */
> +struct ibv_mr *ibv_reg_io_mr(struct ibv_pd *pd, void *addr,
> + size_t length, int access);
> +
> +/**
> + * ibv_dereg_io_mr - Deregister a physical memory region
> + */
> +int ibv_dereg_io_mr(struct ibv_mr *mr);
> +
> +/**
>   * ibv_create_comp_channel - Create a completion event channel
>   */
>  struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context);
> diff --git a/src/verbs.c b/src/verbs.c
> index ba3c0a4..7d215c1 100644
> --- a/src/verbs.c
> +++ b/src/verbs.c
> @@ -189,6 +189,41 @@ int __ibv_dereg_mr(struct ibv_mr *mr)
>  }
>  default_symver(__ibv_dereg_mr, ibv_dereg_mr);
>  
> +struct ibv_mr *__ibv_reg_io_mr(struct ibv_pd *pd, void *addr,
> +   size_t length, int access)
> +{
> +struct ibv_mr *mr;
> +
> +if (ibv_dontfork_range(addr, length))
> +return NULL;
> +
> +mr = pd->context->ops.reg_io_mr(pd, addr, length, access);

Won't reg_io_mr pointer be NULL for other HCAs?
What happens if the device doesn't yet implement this function?

> +if (mr) {
> +mr->context = pd->context;
> +mr->pd  = pd;
> +mr->addr= addr;
> +mr->length  = length;
> +} else
> +ibv_dofork_range(addr, length);
> +
> +return mr;
> +}
> +default_symver(__ibv_reg_io_mr, ibv_reg_io_mr);
> +
> +int __ibv_dereg_io_mr(struct ibv_mr *mr)
> +{
> +int ret;
> +void *addr  = mr->addr;
> +size_t length   = mr->length;
> +
> +ret = mr->context->ops.dereg_io_mr(mr);
> +if (!ret)
> +ibv_dofork_range(addr, length);
> +
> +return ret;
> +}
> +default_symver(__ibv_dereg_io_mr, ibv_dereg_io_mr);
> +
>  static struct ibv_comp_channel *ibv_create_comp_channel_v2(struct ibv_context *context)
>  {
>   struct ibv_abi_compat_v2 *t = context->abi_compat;

Re: [RFC PATCH 2/4] uverbs: Add common ib_iomem_get service

2010-07-29 Thread Ralph Campbell
I forgot to ask how pages are marked as being locked.
I see that the user process amount of locked memory is
adjusted but the actual pages themselves aren't converted
to struct page and the refcount incremented.
Presumably, the device which created the
vma->vm_flags & VM_PFNMAP mapping "owns" the pages and IB
is sharing that mapping.
I'm worried about what happens if the vma mapping is modified or
unmapped.
I guess the ummunotify code could be used to tell the
application of this event but that might be after the
page is reallocated and an IB DMA to the page corrupts it.
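The locked-memory accounting mentioned above is only arithmetic. The sketch below is modeled on the page counting ib_umem_get() does (offset = addr & ~PAGE_MASK, as in the quoted ib_iomem_get()) followed by an RLIMIT_MEMLOCK-style comparison. It assumes 4 KiB pages and omits the CAP_IPC_LOCK bypass. The helper names are hypothetical. Nothing here takes a reference on any page, which is exactly the gap described above.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12                      /* assumes 4 KiB pages */
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))
#define SK_PAGE_ALIGN(x) (((x) + SK_PAGE_SIZE - 1) & SK_PAGE_MASK)

/* Number of pages touched by [addr, addr + size). */
static unsigned long region_npages(unsigned long addr, unsigned long size)
{
	unsigned long offset = addr & ~SK_PAGE_MASK;

	return SK_PAGE_ALIGN(size + offset) >> SK_PAGE_SHIFT;
}

/*
 * Would accounting npages more pages exceed a memlock limit given in
 * bytes? This is bookkeeping only; no page is pinned or refcounted.
 */
static int exceeds_memlock(unsigned long locked_pages, unsigned long npages,
                           unsigned long limit_bytes)
{
	return locked_pages + npages > (limit_bytes >> SK_PAGE_SHIFT);
}
```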

A few more comments below...


On Thu, 2010-07-29 at 12:07 -0700, Tom Tucker wrote:
> On 7/29/10 1:22 PM, Ralph Campbell wrote:
> > On Thu, 2010-07-29 at 09:25 -0700, Tom Tucker wrote:
> >
> >> From: Tom Tucker
> >>
> >> Add an ib_iomem_get service that converts a vma to an array of
> >> physical addresses. This makes it easier for each device driver to
> >> add support for the reg_io_mr provider method.
> >>
> >> Signed-off-by: Tom Tucker
> >> ---
> >>
> >>   drivers/infiniband/core/umem.c |  248 ++--
> >>   include/rdma/ib_umem.h |   14 ++
> >>   2 files changed, 251 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> >> index 415e186..f103956 100644
> >> --- a/drivers/infiniband/core/umem.c
> >> +++ b/drivers/infiniband/core/umem.c
> >>  
> > ...
> >
> >> @@ -292,3 +295,226 @@ int ib_umem_page_count(struct ib_umem *umem)
> >>return n;
> >>   }
> >>   EXPORT_SYMBOL(ib_umem_page_count);
> >> +/*
> >> + * Return the PFN for the specified address in the vma. This only
> >> + * works for a vma that is VM_PFNMAP.
> >> + */
> >> +static unsigned long follow_io_pfn(struct vm_area_struct *vma,
> >> + unsigned long address, int write)
> >> +{
> >> +  pgd_t *pgd;
> >> +  pud_t *pud;
> >> +  pmd_t *pmd;
> >> +  pte_t *ptep, pte;
> >> +  spinlock_t *ptl;
> >> +  unsigned long pfn;
> >> +  struct mm_struct *mm = vma->vm_mm;
> >> +
> >> +  BUG_ON(0 == (vma->vm_flags&  VM_PFNMAP));
> >>  
> > Why use BUG_ON?
> > WARN_ON is more appropriate but
> > if (!(vma->vm_flags&  VM_PFNMAP))
> > return 0;
> > seems better.
> > In fact, move it outside the inner do loop in ib_get_io_pfn().
> >
> >
> It's paranoia from the debug phase. It's already in the 'outer loop'. I 
> should just delete it I think.
> >> +  pgd = pgd_offset(mm, address);
> >> +  if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> >> +  return 0;
> >> +
> >> +  pud = pud_offset(pgd, address);
> >> +  if (pud_none(*pud))
> >> +  return 0;
> >> +  if (unlikely(pud_bad(*pud)))
> >> +  return 0;
> >> +
> >> +  pmd = pmd_offset(pud, address);
> >> +  if (pmd_none(*pmd))
> >> +  return 0;
> >> +  if (unlikely(pmd_bad(*pmd)))
> >> +  return 0;
> >> +
> >> +  ptep = pte_offset_map_lock(mm, pmd, address,&ptl);
> >> +  pte = *ptep;
> >> +  if (!pte_present(pte))
> >> +  goto bad;
> >> +  if (write&&  !pte_write(pte))
> >> +  goto bad;
> >> +
> >> +  pfn = pte_pfn(pte);
> >> +  pte_unmap_unlock(ptep, ptl);
> >> +  return pfn;
> >> + bad:
> >> +  pte_unmap_unlock(ptep, ptl);
> >> +  return 0;
> >> +}
> >> +
> >> +int ib_get_io_pfn(struct task_struct *tsk, struct mm_struct *mm,
> >> +unsigned long start, int len, int write, int force,
> >> +unsigned long *pfn_list, struct vm_area_struct **vmas)
> >> +{
> >> +  unsigned long pfn;
> >> +  int i;
> >> +  if (len<= 0)
> >> +  return 0;
> >> +
> >> +  i = 0;
> >> +  do {
> >> +  struct vm_area_struct *vma;
> >> +
> >> +  vma = find_vma(mm, start);
> >> +  if (0 == (vma->vm_flags&  VM_PFNMAP))
> >> +  return -EINVAL;
> >>  
> > Style nit: I would use ! instead of 0 ==
> >
> >
> 
> ok.
> 
> >> +  if (0 == (vma->vm_flags&  VM_IO))
&

Re: [RFC PATCH 2/4] uverbs: Add common ib_iomem_get service

2010-07-29 Thread Ralph Campbell
On Thu, 2010-07-29 at 09:25 -0700, Tom Tucker wrote:
> From: Tom Tucker 
> 
> Add an ib_iomem_get service that converts a vma to an array of
> physical addresses. This makes it easier for each device driver to
> add support for the reg_io_mr provider method.
> 
> Signed-off-by: Tom Tucker 
> ---
> 
>  drivers/infiniband/core/umem.c |  248 ++--
>  include/rdma/ib_umem.h |   14 ++
>  2 files changed, 251 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 415e186..f103956 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
...
> @@ -292,3 +295,226 @@ int ib_umem_page_count(struct ib_umem *umem)
>   return n;
>  }
>  EXPORT_SYMBOL(ib_umem_page_count);
> +/*
> + * Return the PFN for the specified address in the vma. This only
> + * works for a vma that is VM_PFNMAP.
> + */
> +static unsigned long follow_io_pfn(struct vm_area_struct *vma,
> +unsigned long address, int write)
> +{
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *ptep, pte;
> + spinlock_t *ptl;
> + unsigned long pfn;
> + struct mm_struct *mm = vma->vm_mm;
> +
> + BUG_ON(0 == (vma->vm_flags & VM_PFNMAP));

Why use BUG_ON?
WARN_ON is more appropriate but
if (!(vma->vm_flags & VM_PFNMAP))
return 0;
seems better.
In fact, move it outside the inner do loop in ib_get_io_pfn().
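The suggested restructuring can be sketched in user space: the flag test runs once per vma in the outer loop, and the per-page lookup does no checking. This is a sketch under stated assumptions. `struct vma_sketch`, `find_vma_sketch()`, `follow_pfn_sketch()` and `get_io_pfns()` are hypothetical, and the page tables are replaced by a per-vma `pfn_base`. The check for a NULL vma or one starting above `start` is an addition not present in the quoted patch; find_vma() returns the first vma ending above the address, which may not contain it. The patch's VM_IO, hugetlb and cond_resched() steps are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096UL
#define SK_VM_PFNMAP 0x400UL   /* mirrors VM_PFNMAP; value assumed */

/* Hypothetical vma: [vm_start, vm_end), flags, fake PFN source. */
struct vma_sketch {
	unsigned long vm_start, vm_end, vm_flags;
	unsigned long pfn_base;   /* stand-in for the real page tables */
};

/* Like find_vma(): first vma (sorted) with vm_end > addr, or NULL. */
static const struct vma_sketch *find_vma_sketch(const struct vma_sketch *v,
                                                size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (v[i].vm_end > addr)
			return &v[i];
	return NULL;
}

/* Per-page lookup: no flag test here any more. */
static unsigned long follow_pfn_sketch(const struct vma_sketch *vma,
                                       unsigned long addr)
{
	return vma->pfn_base + (addr - vma->vm_start) / SK_PAGE_SIZE;
}

static int get_io_pfns(const struct vma_sketch *v, size_t n,
                       unsigned long start, int len, unsigned long *pfn_list)
{
	int i = 0;

	while (len > 0) {
		const struct vma_sketch *vma = find_vma_sketch(v, n, start);

		/* start may be unmapped: no vma, or a vma above it. */
		if (!vma || vma->vm_start > start)
			return -EFAULT;
		/* Checked once per vma instead of once per page. */
		if (!(vma->vm_flags & SK_VM_PFNMAP))
			return -EINVAL;

		do {
			pfn_list[i++] = follow_pfn_sketch(vma, start);
			start += SK_PAGE_SIZE;
			len--;
		} while (len && start < vma->vm_end);
	}
	return i;
}
```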

> + pgd = pgd_offset(mm, address);
> + if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> + return 0;
> +
> + pud = pud_offset(pgd, address);
> + if (pud_none(*pud))
> + return 0;
> + if (unlikely(pud_bad(*pud)))
> + return 0;
> +
> + pmd = pmd_offset(pud, address);
> + if (pmd_none(*pmd))
> + return 0;
> + if (unlikely(pmd_bad(*pmd)))
> + return 0;
> +
> + ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> + pte = *ptep;
> + if (!pte_present(pte))
> + goto bad;
> + if (write && !pte_write(pte))
> + goto bad;
> +
> + pfn = pte_pfn(pte);
> + pte_unmap_unlock(ptep, ptl);
> + return pfn;
> + bad:
> + pte_unmap_unlock(ptep, ptl);
> + return 0;
> +}
> +
> +int ib_get_io_pfn(struct task_struct *tsk, struct mm_struct *mm,
> +   unsigned long start, int len, int write, int force,
> +   unsigned long *pfn_list, struct vm_area_struct **vmas)
> +{
> + unsigned long pfn;
> + int i;
> + if (len <= 0)
> + return 0;
> +
> + i = 0;
> + do {
> + struct vm_area_struct *vma;
> +
> + vma = find_vma(mm, start);
> + if (0 == (vma->vm_flags & VM_PFNMAP))
> + return -EINVAL;

Style nit: I would use ! instead of 0 ==

> + if (0 == (vma->vm_flags & VM_IO))
> + return -EFAULT;
> +
> + if (is_vm_hugetlb_page(vma))
> + return -EFAULT;
> +
> + do {
> + cond_resched();
> + pfn = follow_io_pfn(vma, start, write);
> + if (!pfn)
> + return -EFAULT;
> + if (pfn_list)
> + pfn_list[i] = pfn;
> + if (vmas)
> + vmas[i] = vma;
> + i++;
> + start += PAGE_SIZE;
> + len--;
> + } while (len && start < vma->vm_end);
> + } while (len);
> + return i;
> +}
> +
> +/**
> + * ib_iomem_get - DMA map a userspace map of IO memory.
> + * @context: userspace context to map memory for
> + * @addr: userspace virtual address to start at
> + * @size: length of region to map
> + * @access: IB_ACCESS_xxx flags for memory being mapped
> + * @dmasync: flush in-flight DMA when the memory region is written
> + */
> +struct ib_umem *ib_iomem_get(struct ib_ucontext *context, unsigned long addr,
> +  size_t size, int access, int dmasync)
> +{
> + struct ib_umem *umem;
> + unsigned long *pfn_list;
> + struct ib_umem_chunk *chunk;
> + unsigned long locked;
> + unsigned long lock_limit;
> + unsigned long cur_base;
> + unsigned long npages;
> + int ret;
> + int off;
> + int i;
> + DEFINE_DMA_ATTRS(attrs);
> +
> + if (dmasync)
> + dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
> +
> + if (!can_do_mlock())
> + return ERR_PTR(-EPERM);
> +
> + umem = kmalloc(sizeof *umem, GFP_KERNEL);
> + if (!umem)
> + return ERR_PTR(-ENOMEM);
> +
> + umem->type  = IB_UMEM_IO_MAP;
> + umem->context   = context;
> + umem->length= size;
> + umem->offset= addr & ~PAGE_MASK;
> + umem->page_size = PAGE_SIZE;
> + /*
> +  * We ask for writable memory if any access flags other than

Re: About a shortcoming of the verbs API

2010-07-28 Thread Ralph Campbell
On Wed, 2010-07-28 at 11:16 -0700, Roland Dreier wrote:
> >  > Actually, I tried to implement the completion callback
>  >  > in a workqueue thread but ipoib_cm_handle_tx_wc() calls
>  >  > netif_tx_lock() which isn't safe unless it is called
>  >  > from an IRQ handler or netif_tx_lock_bh() is called first.
> 
>  > Oh, sounds like a bug in IPoIB.  I guess we could fix it by just
>  > changing it to netif_tx_lock_bh()?  (Or is that not safe from an IRQ handler?)
> 
> Wait, is this still a problem with IPoIB?  As far as I can tell, the
> IPoIB completion handlers don't do anything except enable the NAPI poll
> routine or the transmit ring timer (ie they just do napi_schedule() or
> mod_timer()), so the context that the CQ callback is called in doesn't
> matter.  In particular I don't see any way ipoib_cm_handle_tx_wc() could
> be reached except from the NAPI polling loop.
> 
>  - R.

I don't remember now whether I hit the problem in a backported IPoIB
or in a recent kernel, but I did need to single-thread the completion
callbacks and call local_bh_disable() for them, or I would get deadlocks.
I just assumed that ULPs were being written with that as a requirement.

This is what makes understanding the "locking conventions" for
IPoIB really complex. Sometimes you need a lock and sometimes
you don't depending on the state of the network stack.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About a shortcoming of the verbs API

2010-07-28 Thread Ralph Campbell
On Wed, 2010-07-28 at 11:05 -0700, Roland Dreier wrote:
> > Actually, I tried to implement the completion callback
>  > in a workqueue thread but ipoib_cm_handle_tx_wc() calls
>  > netif_tx_lock() which isn't safe unless it is called
>  > from an IRQ handler or netif_tx_lock_bh() is called first.
> 
> Oh, sounds like a bug in IPoIB.  I guess we could fix it by just
> changing it to netif_tx_lock_bh()?  (Or is that not safe from an IRQ handler?)

netif_tx_lock_bh() is an inline function for
local_bh_disable();
netif_tx_lock();

so I meant to say local_bh_disable(), not netif_tx_lock_bh().

Basically, we would need an "irqsave" version of netif_tx_lock()
so that it could be called from either IRQ or non-IRQ context
and save/restore the prior state.
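A userspace model of the save/restore idea, for illustration only. sim_tx_lock_irqsave() and sim_tx_unlock_irqrestore() are hypothetical names, and a global flag stands in for the CPU interrupt state. For an ordinary spinlock, the kernel's spin_lock_irqsave()/spin_unlock_irqrestore() already behave this way; whether netif_tx_lock() should grow such a variant is the open question here.

```c
#include <stdbool.h>

/* Simulated "interrupts enabled" state for the current CPU. */
static bool irqs_enabled = true;
static int tx_lock_held;	/* stands in for the tx spinlock */

/* Record the caller's state, disable "interrupts", take the lock. */
static void sim_tx_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
	tx_lock_held = 1;
}

/* Drop the lock and restore exactly the state the caller had. */
static void sim_tx_unlock_irqrestore(bool flags)
{
	tx_lock_held = 0;
	irqs_enabled = flags;
}
```

Because the prior state is restored rather than unconditionally re-enabled, the same pair is safe to call from IRQ context (state stays disabled) and from process context (state returns to enabled).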



Re: About a shortcoming of the verbs API

2010-07-28 Thread Ralph Campbell
Actually, I tried to implement the completion callback
in a workqueue thread but ipoib_cm_handle_tx_wc() calls
netif_tx_lock() which isn't safe unless it is called
from an IRQ handler or netif_tx_lock_bh() is called first.

On Wed, 2010-07-28 at 10:42 -0700, Roland Dreier wrote:
> > - Some time ago I observed that the kernel reported soft lockups
>  > because of spin_lock() calls inside a completion handler. These
>  > spinlocks were not locked in any other context than the completion
>  > handler itself. And the lockups disappeared after having replaced the
>  > spin_lock() calls by spin_lock_irqsave(). Can it be concluded from
>  > this observation that completion handlers are not always invoked from
>  > interrupt context ?
> 
> Did you get a soft lockup report or a lockdep report?  Anyway, the very
> next paragraph of the documentation I quoted says:
> 
>   The context in which completion event and asynchronous event
>   callbacks run is not defined.  Depending on the low-level driver, it
>   may be process context, softirq context, or interrupt context.
>   Upper level protocol consumers may not sleep in a callback.
> 
> So yes, it is possible that a completion callback gets called in
> non-interrupt context.
> 
> However as far as I know, at least mthca and mlx4 only call completion
> callbacks from the interrupt handler.  But without the actual code in
> question it's hard to know what the real problem was.
> 
>  - R.




[PATCH] IB/qib: set cfgctxts to number of CPUs by default

2010-07-21 Thread Ralph Campbell
Up to now, we have set the number of available user contexts based on
the number of hardware contexts, which is set according to the number
of available CPUs. This was fine since most CPUs had a power-of-two
number of cores and the chip supported 4, 8, or 16 user contexts.
Now that some systems have 12 cores, that default isn't optimal: it
should be 12 user contexts, even though 16 hardware contexts need to be
enabled.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba7322.c |2 +-
 drivers/infiniband/hw/qib/qib_init.c|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 5eedf83..9031cd8 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -5864,7 +5864,7 @@ static void write_7322_initregs(struct qib_devdata *dd)
 * Doesn't clear any of the error bits that might be set.
 */
val = TIDFLOW_ERRBITS; /* these are W1C */
-   for (i = 0; i < dd->ctxtcnt; i++) {
+   for (i = 0; i < dd->cfgctxts; i++) {
int flow;
for (flow = 0; flow < NUM_TIDFLOWS_CTXT; flow++)
qib_write_ureg(dd, ur_rcvflowtable+flow, val, i);
diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index a873dd5..f1d16d3 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -93,7 +93,7 @@ unsigned long *qib_cpulist;
 void qib_set_ctxtcnt(struct qib_devdata *dd)
 {
if (!qib_cfgctxts)
-   dd->cfgctxts = dd->ctxtcnt;
+   dd->cfgctxts = dd->first_user_ctxt + num_online_cpus();
else if (qib_cfgctxts < dd->num_pports)
dd->cfgctxts = dd->ctxtcnt;
else if (qib_cfgctxts <= dd->ctxtcnt)

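A sketch of the new default. The struct below is a hypothetical stand-in for the two qib_devdata fields involved, and the example numbers (2 kernel contexts, 18 hardware contexts) are illustrative, not taken from the patch. Clamping to ctxtcnt is an assumption of this sketch; the quoted hunk is cut off before showing any such check.

```c
/* Hypothetical stand-in for the qib_devdata fields used here. */
struct ctxt_cfg {
	unsigned first_user_ctxt;	/* contexts reserved for the kernel */
	unsigned ctxtcnt;		/* hardware contexts enabled */
};

/*
 * Default configured contexts: kernel contexts plus one user context
 * per online CPU. The clamp to ctxtcnt is this sketch's assumption.
 */
static unsigned default_cfgctxts(const struct ctxt_cfg *c, unsigned ncpus)
{
	unsigned n = c->first_user_ctxt + ncpus;

	return n > c->ctxtcnt ? c->ctxtcnt : n;
}
```

With two kernel contexts and 12 online CPUs this gives 14 configured contexts instead of the full hardware count.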


Re: [PATCH V3] ib_qib: Allow writes to the diag_counters to be able to clear them

2010-07-21 Thread Ralph Campbell
Acked-by: Ralph Campbell 

On Tue, 2010-07-13 at 18:53 -0700, Ira Weiny wrote:
> From: Ira Weiny 
> Date: Wed, 7 Jul 2010 17:35:34 -0700
> Subject: [PATCH] ib_qib: Allow writes to the diag_counters to be able to clear them
> 
> Changes in V3:
>   Add non-number error check
>   Return "proper" proper length
> 
> Changes in V2:
>   Add check for negative values
>   Return proper length
> 
> Signed-off-by: Ira Weiny 
> ---
>  drivers/infiniband/hw/qib/qib_sysfs.c |   21 -
>  1 files changed, 20 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qib/qib_sysfs.c b/drivers/infiniband/hw/qib/qib_sysfs.c
> index dab4d9f..b214eff 100644
> --- a/drivers/infiniband/hw/qib/qib_sysfs.c
> +++ b/drivers/infiniband/hw/qib/qib_sysfs.c
> @@ -347,7 +347,7 @@ static struct kobj_type qib_sl2vl_ktype = {
>  
>  #define QIB_DIAGC_ATTR(N) \
>   static struct qib_diagc_attr qib_diagc_attr_##N = { \
> - .attr = { .name = __stringify(N), .mode = 0444 }, \
> + .attr = { .name = __stringify(N), .mode = 0664 }, \
>   .counter = offsetof(struct qib_ibport, n_##N) \
>   }
>  
> @@ -403,8 +403,27 @@ static ssize_t diagc_attr_show(struct kobject *kobj, struct attribute *attr,
>   return sprintf(buf, "%u\n", *(u32 *)((char *)qibp + dattr->counter));
>  }
>  
> +static ssize_t diagc_attr_store(struct kobject *kobj, struct attribute *attr,
> + const char *buf, size_t size)
> +{
> + struct qib_diagc_attr *dattr =
> + container_of(attr, struct qib_diagc_attr, attr);
> + struct qib_pportdata *ppd =
> + container_of(kobj, struct qib_pportdata, diagc_kobj);
> + struct qib_ibport *qibp = &ppd->ibport_data;
> + char *endp;
> + long val = simple_strtol(buf, &endp, 0);
> +
> + if (val < 0 || endp == buf)
> + return -EINVAL;
> +
> + *(u32 *)((char *)qibp + dattr->counter) = (u32)val;
> + return size;
> +}
> +
>  static const struct sysfs_ops qib_diagc_ops = {
>   .show = diagc_attr_show,
> + .store = diagc_attr_store,
>  };
>  
>  static struct kobj_type qib_diagc_ktype = {

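simple_strtol() is kernel-only, so this userspace sketch uses strtol() with the same base-0 parsing and the same two checks as the quoted diagc_attr_store(). It rejects negative values and input in which no digits were consumed. parse_counter_write() is a hypothetical name.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Accept a non-negative number in any base strtol() understands
 * (decimal, 0x hex, 0 octal). Store it truncated to 32 bits and
 * return 0, or return -EINVAL.
 */
static int parse_counter_write(const char *buf, unsigned int *out)
{
	char *endp;
	long val = strtol(buf, &endp, 0);

	if (val < 0 || endp == buf)
		return -EINVAL;
	*out = (unsigned int)val;
	return 0;
}
```

As in the patch, a trailing newline (as written by `echo 0 > ...`) is accepted because only the leading digits are examined.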



Re: IB/ipoib: fix dangling pointer reference to ipoib_neigh and ipoib_path -when will it go upstream?

2010-07-16 Thread Ralph Campbell
On Fri, 2010-07-16 at 02:13 -0700, Pradeep Satyanarayana wrote:
> Ralph Campbell wrote:
> > On Thu, 2010-07-15 at 04:56 -0700, Pradeep Satyanarayana wrote:
> >> Pradeep Satyanarayana wrote:
> >>> Pradeep Satyanarayana wrote:
> >>>> Roland Dreier wrote:
> >>>>>  > I guess I came to a premature conclusion. One set of tests ran fine and I made that
> >>>>>  > conclusion. Another set of tests caused the following crash:
> >>>>>
> >>>>> I don't really know how to interpret this.  Is this crash new, or is it
> >>>>> the same crash you were hoping this patch fixed?
> >>>> This is a new crash.
> >>> I see other manifestations resulting in different crashes :
> >>>
> >>> :mon> t
> >>> [c0074603ba20] d000193527ac .ipoib_neigh_flush+0x6c/0x350 [ib_ipoib]
> >>> [c0074603bb10] d00019356dac .ipoib_mcast_free+0x74/0x2a0 [ib_ipoib]
> >>> [c0074603bbe0] d00019358558 .ipoib_mcast_restart_task+0x3d0/0x560 [ib_ipoib]
> >>> [c0074603bd40] c00c6fe4 .run_workqueue+0xf4/0x1e0
> >>> [c0074603be00] c00c7190 .worker_thread+0xc0/0x180
> >>> [c0074603bed0] c00ccf4c .kthread+0xb4/0xc0
> >>> [c0074603bf90] c00309fc .kernel_thread+0x54/0x70
> >>> 9:mon> e
> >>> cpu 0x9: Vector: 300 (Data Access) at [c0074603b720]
> >>> pc: c05ac390: ._spin_lock+0x20/0xc8
> >>> lr: d000193527ac: .ipoib_neigh_flush+0x6c/0x350 [ib_ipoib]
> >>> sp: c0074603b9a0
> >>>msr: 80009032
> >>>dar: 3a0
> >>>  dsisr: 4000
> >>>   current = 0xc00756ce8b00
> >>>   paca= 0xc0f63800
> >>> pid   = 18095, comm = ipoib
> >>> 9:mon>
> >> Recreating the crash has been tricky. I have tried several hundred times
> >> today to unload and reload IPoIB while there is traffic and no crashes
> >> happened. I took a closer look at the IPoIB CM code and I see a few things
> >> that look suspicious.
> >>
> >> In the ipoib_cm_send() path no priv->lock is held, whereas the priv->lock
> >> is held before calling ipoib_cm_destroy_tx(). This is true with and without
> >> Ralph's patch (fix dangling pointer).
> >> Is this a potential race?
> > 
> > ipoib_cm_send() is only called by ipoib_start_xmit() so it is protected
> > by netif_tx_lock(dev) or stopping the ipoib network device.
> 
> I still see one case in ipoib_neigh_cleanup() wherein ipoib_cm_destroy_tx()
> appears to be called without netif_tx_lock(dev) held. Is that correct?
> 
> Thanks
> Pradeep

ipoib_neigh_cleanup() is called by neigh_cleanup_and_release() when
freeing a struct neighbour. I assume the Linux network stack is
not going to call into the IPoIB driver to send sk_buffs in that
case but I could be wrong. If it can, then you are correct that
the netif_tx_lock(dev) should be acquired.



Re: IB/ipoib: fix dangling pointer reference to ipoib_neigh and ipoib_path -when will it go upstream?

2010-07-15 Thread Ralph Campbell
On Thu, 2010-07-15 at 04:56 -0700, Pradeep Satyanarayana wrote:
> Pradeep Satyanarayana wrote:
> > Pradeep Satyanarayana wrote:
> >> Roland Dreier wrote:
> >>>  > I guess I came to a premature conclusion. One set of tests ran fine and I made that
> >>>  > conclusion. Another set of tests caused the following crash:
> >>>
> >>> I don't really know how to interpret this.  Is this crash new, or is it
> >>> the same crash you were hoping this patch fixed?
> >> This is a new crash.
> > 
> > I see other manifestations resulting in different crashes :
> > 
> > :mon> t
> > [c0074603ba20] d000193527ac .ipoib_neigh_flush+0x6c/0x350 [ib_ipoib]
> > [c0074603bb10] d00019356dac .ipoib_mcast_free+0x74/0x2a0 [ib_ipoib]
> > [c0074603bbe0] d00019358558 .ipoib_mcast_restart_task+0x3d0/0x560 [ib_ipoib]
> > [c0074603bd40] c00c6fe4 .run_workqueue+0xf4/0x1e0
> > [c0074603be00] c00c7190 .worker_thread+0xc0/0x180
> > [c0074603bed0] c00ccf4c .kthread+0xb4/0xc0
> > [c0074603bf90] c00309fc .kernel_thread+0x54/0x70
> > 9:mon> e
> > cpu 0x9: Vector: 300 (Data Access) at [c0074603b720]
> > pc: c05ac390: ._spin_lock+0x20/0xc8
> > lr: d000193527ac: .ipoib_neigh_flush+0x6c/0x350 [ib_ipoib]
> > sp: c0074603b9a0
> >msr: 80009032
> >dar: 3a0
> >  dsisr: 4000
> >   current = 0xc00756ce8b00
> >   paca= 0xc0f63800
> > pid   = 18095, comm = ipoib
> > 9:mon>
> 
> Recreating the crash has been tricky. I have tried several hundred times
> today to unload and reload IPoIB while there is traffic and no crashes
> happened. I took a closer look at the IPoIB CM code and I see a few things
> that look suspicious.
> 
> In the ipoib_cm_send() path no priv->lock is held, whereas the priv->lock is
> held before calling ipoib_cm_destroy_tx(). This is true with and without
> Ralph's patch (fix dangling pointer).
> Is this a potential race?

ipoib_cm_send() is only called by ipoib_start_xmit() so it is protected
by netif_tx_lock(dev) or stopping the ipoib network device.
It all depends on what pointer or data structure you think is being
accessed after it is freed, or modified without the proper protection.

> In Roland's git tree I do see a test_and_clear_bit(IPOIB_FLAG_INITIALIZED,
> &tx->flags) in ipoib_cm_destroy_tx() which seems to be missing in Ralph's
> patch. In Ralph's patch there is a clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags)
> called before calling ipoib_cm_destroy_tx() only in select cases. Was that
> intended?

The v4 patch comments explain the changes:
http://www.spinics.net/lists/linux-rdma/msg03733.html
Basically, IPOIB_FLAG_INITIALIZED now means that the struct ipoib_cm_tx
has completed the RC QP creation process via the CM, rather than simply
that ipoib_cm_create_tx() has allocated the structure.
The test-and-clear was used to indicate that the struct ipoib_cm_tx
had been put on the destroy list and the reaper thread woken up.
Now ipoib_cm_destroy_tx() uses the tx->neigh pointer (NULL or not)
to track whether the destroy process has already been started.
ipoib_cm_destroy_tx() is only called with netif_tx_lock() and priv->lock
held, which protects tx->neigh.

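One possible reading of the tx->neigh scheme described above, as a userspace sketch. It assumes the destroy path clears tx->neigh, so a non-NULL pointer means destruction has not begun yet. The structs and cm_destroy_tx() are simplified, hypothetical stand-ins, and a single mutex plays the role of both netif_tx_lock() and priv->lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the IPoIB structures. */
struct neigh { int id; };
struct cm_tx {
	struct neigh *neigh;	/* non-NULL until destruction has begun */
	int queued_for_reap;	/* times put on the destroy list */
};

static pthread_mutex_t priv_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Start destroying tx at most once. Clearing tx->neigh under the lock
 * marks "destroy started", so a second caller sees NULL and backs off.
 * Returns 1 if this call started the destroy, 0 otherwise.
 */
static int cm_destroy_tx(struct cm_tx *tx)
{
	int started = 0;

	pthread_mutex_lock(&priv_lock);
	if (tx->neigh) {
		tx->neigh = NULL;
		tx->queued_for_reap++;	/* e.g. add to list, wake reaper */
		started = 1;
	}
	pthread_mutex_unlock(&priv_lock);
	return started;
}
```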
> Thanks
> Pradeep

The longer write up on locking is turning out to be very complex.
I will keep working on it but I think it will be just as hard
to understand as slogging through the code.



Re: IB/ipoib: fix dangling pointer reference to ipoib_neigh and ipoib_path -when will it go upstream?

2010-07-15 Thread Ralph Campbell
I will write up a description of the locking as I
understand it and the changes I made. Give
me a day or two to write it up and check it.

On Thu, 2010-07-15 at 04:56 -0700, Pradeep Satyanarayana wrote:
> Pradeep Satyanarayana wrote:
> > Pradeep Satyanarayana wrote:
> >> Roland Dreier wrote:
> >>>  > I guess I came to a premature conclusion. One set of tests ran fine and I made that
> >>>  > conclusion. Another set of tests caused the following crash:
> >>>
> >>> I don't really know how to interpret this.  Is this crash new, or is it
> >>> the same crash you were hoping this patch fixed?
> >> This is a new crash.
> > 
> > I see other manifestations resulting in different crashes :
> > 
> > :mon> t
> > [c0074603ba20] d000193527ac .ipoib_neigh_flush+0x6c/0x350 [ib_ipoib]
> > [c0074603bb10] d00019356dac .ipoib_mcast_free+0x74/0x2a0 [ib_ipoib]
> > [c0074603bbe0] d00019358558 .ipoib_mcast_restart_task+0x3d0/0x560 [ib_ipoib]
> > [c0074603bd40] c00c6fe4 .run_workqueue+0xf4/0x1e0
> > [c0074603be00] c00c7190 .worker_thread+0xc0/0x180
> > [c0074603bed0] c00ccf4c .kthread+0xb4/0xc0
> > [c0074603bf90] c00309fc .kernel_thread+0x54/0x70
> > 9:mon> e
> > cpu 0x9: Vector: 300 (Data Access) at [c0074603b720]
> > pc: c05ac390: ._spin_lock+0x20/0xc8
> > lr: d000193527ac: .ipoib_neigh_flush+0x6c/0x350 [ib_ipoib]
> > sp: c0074603b9a0
> >msr: 80009032
> >dar: 3a0
> >  dsisr: 4000
> >   current = 0xc00756ce8b00
> >   paca= 0xc0f63800
> > pid   = 18095, comm = ipoib
> > 9:mon>
> 
> Recreating the crash has been tricky. I have tried several hundred times
> today to unload and reload IPoIB while there is traffic and no crashes
> happened. I took a closer look at the IPoIB CM code and I see a few things
> that look suspicious.
> 
> In the ipoib_cm_send() path no priv->lock is held, whereas the priv->lock is
> held before calling ipoib_cm_destroy_tx(). This is true with and without
> Ralph's patch (fix dangling pointer).
> Is this a potential race?
> 
> In Roland's git tree I do see a test_and_clear_bit(IPOIB_FLAG_INITIALIZED,
> &tx->flags) in ipoib_cm_destroy_tx() which seems to be missing in Ralph's
> patch. In Ralph's patch there is a clear_bit(IPOIB_FLAG_OPER_UP, &tx->flags)
> called before calling ipoib_cm_destroy_tx() only in select cases. Was that
> intended?
> 
> Thanks
> Pradeep
> 




Re: IB/ipoib: fix dangling pointer reference to ipoib_neigh and ipoib_path -when will it go upstream?

2010-07-13 Thread Ralph Campbell
On Mon, 2010-07-12 at 03:35 -0700, Bart Van Assche wrote:
> On Mon, Jul 12, 2010 at 12:20 PM, Pradeep Satyanarayana
>  wrote:
> > Bart Van Assche wrote:
> >> On Mon, Jul 12, 2010 at 6:57 AM, Pradeep Satyanarayana
> >>  wrote:
> >>> I realize that the following patch:
> >>>
> >>> https://patchwork.kernel.org/patch/97243/
> >>>
> >>> is queued in your backlog of patches and unlikely that it will go into 2.6.35.
> >>> What are the chances that it will make it into 2.6.36? This patch has fixed
> >>> a rarely seen crash and we would like it to go upstream ASAP.
> >>
> >> The following comment was made on that patch by Ralph Campbell (see
> >> also http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg03125.html):
> >>
> >> "Quite right. I should also use list_for_each_entry_safe(). I will fix this."
> >>
> >> This makes me wonder whether version three of this patch can go in unmodified?
> >
> > There was a version 4 that followed. That was what I was referring to.
> 
> Thanks for the info -- I had missed version four of that patch. Now
> that I had a look at it, why does the comment above
> ipoib_cm_flush_path() say that it removes all entries while the loop
> inside that function is stopped after the first entry has been found
> and removed ? Why does that function use list_for_each_entry_safe()
> while only a single entry is removed ?
> 
> Bart.

There is only one matching entry on the list at any given time
so I guess list_for_each_entry_safe() isn't required.
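Why the plain iterator is enough here, as a userspace sketch with a minimal list loosely modeled on <linux/list.h> (all names below are hypothetical). list_for_each_entry_safe() caches the next pointer before the loop body runs, which only matters if the loop keeps going after unlinking the current entry. Stopping immediately after the single removal never reads the removed node's next pointer.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on list.h. */
struct node {
	struct node *prev, *next;
	int key;
};

static void list_init(struct node *head) { head->prev = head->next = head; }

static void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* following n->next now would crash */
}

/*
 * Remove the single entry matching key. We return right after
 * unlinking, so pos->next is never read on a removed node and the
 * non-"safe" traversal is sufficient. Returns 1 if removed.
 */
static int remove_first_match(struct node *head, int key)
{
	struct node *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos->key == key) {
			list_del_node(pos);
			return 1;
		}
	}
	return 0;
}
```

If the function instead had to remove every match and continue, the loop would need to save pos->next before unlinking, which is exactly what the _safe variant does.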



Re: [PATCH 09/36] drivers/infiniband: Remove unnecessary casts of private_data

2010-07-13 Thread Ralph Campbell
Acked-by: Ralph Campbell 

On Mon, 2010-07-12 at 13:50 -0700, Joe Perches wrote:
> Signed-off-by: Joe Perches 
> ---
>  drivers/infiniband/hw/ipath/ipath_file_ops.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
> index 9c5c66d..65eb892 100644
> --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c
> +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c
> @@ -2055,7 +2055,7 @@ static int ipath_close(struct inode *in, struct file *fp)
>  
>   mutex_lock(&ipath_mutex);
>  
> - fd = (struct ipath_filedata *) fp->private_data;
> + fd = fp->private_data;
>   fp->private_data = NULL;
>   pd = fd->pd;
>   if (!pd) {




[PATCH] IB/qib: if qib_init() fails, driver fails to clean up properly

2010-07-01 Thread Ralph Campbell
If qib_init() fails, the driver does not free memory, unregister device
files, or unregister from the PCIe framework. The driver will unload
without error, but a subsequent driver load will cause the system to panic.
This was found by changing the 7220 code to load the serdes microcode
separately and not installing the microcode file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_init.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index 7831ff8..a873dd5 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1289,8 +1289,18 @@ static int __devinit qib_init_one(struct pci_dev *pdev,
 
if (qib_mini_init || initfail || ret) {
qib_stop_timers(dd);
+   flush_scheduled_work();
for (pidx = 0; pidx < dd->num_pports; ++pidx)
dd->f_quiet_serdes(dd->pport + pidx);
+   if (qib_mini_init)
+   goto bail;
+   if (!j) {
+   (void) qibfs_remove(dd);
+   qib_device_remove(dd);
+   }
+   if (!ret)
+   qib_unregister_ib_device(dd);
+   qib_postinit_cleanup(dd);
if (initfail)
ret = initfail;
goto bail;

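The fix follows the usual rule for a failed init: undo every step that did succeed, in reverse order. A generic sketch of that goto-based unwinding, not the qib code; the step names and init_device() are hypothetical, and a string log records what was set up and torn down.

```c
#include <string.h>

static char log_buf[128];

static void note(const char *s)
{
	strcat(log_buf, s);
	strcat(log_buf, " ");
}

/*
 * fail_at selects which step fails (0 = none); 3 stands for a failure
 * after both resources exist. Each label undoes one step, and falling
 * through the labels tears down in reverse order of setup.
 */
static int init_device(int fail_at)
{
	log_buf[0] = '\0';

	if (fail_at == 1)
		goto bail;
	note("+files");		/* e.g. create device files */
	if (fail_at == 2)
		goto bail_files;
	note("+ibdev");		/* e.g. register with the IB core */
	if (fail_at == 3)
		goto bail_ibdev;
	return 0;

bail_ibdev:
	note("-ibdev");
bail_files:
	note("-files");
bail:
	return -1;
}
```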


RE: [PATCH 0/7] various fixes for QIB driver

2010-06-23 Thread Ralph Campbell
The patches are all independent.

In the future, I will separate them if I think they are going into different
kernel versions.

Thanks!

From: Roland Dreier [rdre...@cisco.com]
Sent: Wednesday, June 23, 2010 11:14 AM
To: Ralph Campbell
Cc: linux-rdma@vger.kernel.org
Subject: Re: [PATCH 0/7] various fixes for QIB driver

 > The following patches are for various bug fixes.
 > I'm not sure what counts as a regression for code that is newly introduced.
 > I'm hoping that all except #2 can be made for 2.6.35 whereas
 > #2 can wait for 2.6.36 since it is actually a feature.

All except #2 look OK for 2.6.35.  I'll hold #2 for 2.6.36 -- I hope
it's independent?

In the future it might be cleaner to send a series 1-6 of fixes for
2.6.35 and then send the port assignment one as a 2.6.36 patch separate
from the series.  (No need to resend here)

 - R.
--
Roland Dreier  || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


[PATCH] IB/qib: turn off IB latency mode

2010-06-23 Thread Ralph Campbell
Turn off IB latency mode. This improves link quality for slower
process chips.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba7322.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 5eedf83..fc14ef8 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -7271,6 +7271,8 @@ static int serdes_7322_init(struct qib_pportdata *ppd)
ibsd_wr_allchans(ppd, 20, (4 << 13), BMASK(15, 13)); /* SDR */
 
data = qib_read_kreg_port(ppd, krp_serdesctrl);
+   /* Turn off IB latency mode */
+   data &= ~SYM_MASK(IBSerdesCtrl_0, IB_LAT_MODE);
qib_write_kreg_port(ppd, krp_serdesctrl, data |
SYM_MASK(IBSerdesCtrl_0, RXLOSEN));
 

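The serdes change is a read-modify-write. It reads the control register, clears the latency-mode bit, and writes the value back with RXLOSEN set, leaving every other bit as read. A sketch with made-up bit positions; the real IBSerdesCtrl_0 field positions are not shown in this patch, and serdesctrl_update() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define IB_LAT_MODE_MASK (1ULL << 3)
#define RXLOSEN_MASK     (1ULL << 5)

/* Clear one bit, set another, keep all other bits as read. */
static uint64_t serdesctrl_update(uint64_t data)
{
	data &= ~IB_LAT_MODE_MASK;	/* turn off IB latency mode */
	return data | RXLOSEN_MASK;	/* value written back */
}
```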


[PATCH 7/7] IB/qib: completion queue callback needs to be single threaded

2010-06-17 Thread Ralph Campbell
Workqueues aren't exactly equivalent to tasklets, since a work callback
may be invoked again on another CPU before the previous invocation has
returned. This causes multithreading bugs in completion notification
callbacks, which weren't written to expect this behavior. The fix is to
use a single-threaded workqueue.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_init.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index 1d4db4b..7831ff8 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1059,7 +1059,7 @@ static int __init qlogic_ib_init(void)
goto bail_dev;
}
 
-   qib_cq_wq = create_workqueue("qib_cq");
+   qib_cq_wq = create_singlethread_workqueue("qib_cq");
if (!qib_cq_wq) {
ret = -ENOMEM;
goto bail_wq;

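A userspace model of why a single-threaded workqueue serializes the callbacks. Several producer threads may queue work, but only one worker thread ever runs cq_callback(), so at most one invocation is active at a time. This is an illustration with pthreads, not the kernel workqueue API, and all names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static int pending;		/* queued, not yet run */
static bool stopping;
static int in_callback, max_in_callback, completed;

static void cq_callback(void)
{
	pthread_mutex_lock(&q_lock);	/* track concurrent invocations */
	if (++in_callback > max_in_callback)
		max_in_callback = in_callback;
	pthread_mutex_unlock(&q_lock);

	/* ... completion handling would go here ... */

	pthread_mutex_lock(&q_lock);
	in_callback--;
	completed++;
	pthread_mutex_unlock(&q_lock);
}

/* The single worker: drain queued items until told to stop. */
static void *worker(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&q_lock);
	for (;;) {
		while (!pending && !stopping)
			pthread_cond_wait(&q_cond, &q_lock);
		if (!pending)		/* stopping and nothing left */
			break;
		pending--;
		pthread_mutex_unlock(&q_lock);
		cq_callback();		/* runs without the queue lock */
		pthread_mutex_lock(&q_lock);
	}
	pthread_mutex_unlock(&q_lock);
	return NULL;
}

static void *producer(void *arg)
{
	for (int i = 0; i < *(int *)arg; i++) {
		pthread_mutex_lock(&q_lock);
		pending++;
		pthread_cond_signal(&q_cond);
		pthread_mutex_unlock(&q_lock);
	}
	return NULL;
}

/* Queue nprod * per items from nprod (<= 8) threads; drain on one worker. */
static void run_model(int nprod, int per)
{
	pthread_t w, p[8];

	pthread_create(&w, NULL, worker, NULL);
	for (int i = 0; i < nprod; i++)
		pthread_create(&p[i], NULL, producer, &per);
	for (int i = 0; i < nprod; i++)
		pthread_join(p[i], NULL);

	pthread_mutex_lock(&q_lock);
	stopping = true;
	pthread_cond_signal(&q_cond);
	pthread_mutex_unlock(&q_lock);
	pthread_join(w, NULL);
}
```

With a multithreaded queue (several workers draining the same queue), max_in_callback could exceed 1, which is the kind of overlap the completion callbacks were not written to handle.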


[PATCH 6/7] IB/qib: update 7322 serdes tables

2010-06-17 Thread Ralph Campbell
Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba7322.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 8ee0ac6..5eedf83 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -543,7 +543,7 @@ struct vendor_txdds_ent {
 static void write_tx_serdes_param(struct qib_pportdata *, struct txdds_ent *);
 
 #define TXDDS_TABLE_SZ 16 /* number of entries per speed in onchip table */
-#define TXDDS_EXTRA_SZ 11 /* number of extra tx settings entries */
+#define TXDDS_EXTRA_SZ 13 /* number of extra tx settings entries */
 #define SERDES_CHANS 4 /* yes, it's obvious, but one less magic number */
 
 #define H1_FORCE_VAL 8
@@ -5629,6 +5629,8 @@ static void set_no_qsfp_atten(struct qib_devdata *dd, int change)
if (ppd->port != port || !ppd->link_speed_supported)
continue;
ppd->cpspec->no_eep = val;
+   if (seth1)
+   ppd->cpspec->h1_val = h1;
/* now change the IBC and serdes, overriding generic */
init_txdds_table(ppd, 1);
any++;
@@ -6069,9 +6071,9 @@ static int qib_init_7322_variables(struct qib_devdata *dd)
 * the "cable info" setup here.  Can be overridden
 * in adapter-specific routines.
 */
-   if (!(ppd->dd->flags & QIB_HAS_QSFP)) {
-   if (!IS_QMH(ppd->dd) && !IS_QME(ppd->dd))
-   qib_devinfo(ppd->dd->pcidev, "IB%u:%u: "
+   if (!(dd->flags & QIB_HAS_QSFP)) {
+   if (!IS_QMH(dd) && !IS_QME(dd))
+   qib_devinfo(dd->pcidev, "IB%u:%u: "
"Unknown mezzanine card type\n",
dd->unit, ppd->port);
cp->h1_val = IS_QMH(dd) ? H1_FORCE_QMH : H1_FORCE_QME;
@@ -6953,6 +6955,8 @@ static const struct txdds_ent txdds_extra_sdr[TXDDS_EXTRA_SZ] = {
{  0, 0, 0, 11 },   /* QME7342 backplane settings */
{  0, 0, 0, 11 },   /* QME7342 backplane settings */
{  0, 0, 0, 11 },   /* QME7342 backplane settings */
+   {  0, 0, 0,  3 },   /* QMH7342 backplane settings */
+   {  0, 0, 0,  4 },   /* QMH7342 backplane settings */
 };
 
 static const struct txdds_ent txdds_extra_ddr[TXDDS_EXTRA_SZ] = {
@@ -6968,6 +6972,8 @@ static const struct txdds_ent txdds_extra_ddr[TXDDS_EXTRA_SZ] = {
{  0, 0, 0, 13 },   /* QME7342 backplane settings */
{  0, 0, 0, 13 },   /* QME7342 backplane settings */
{  0, 0, 0, 13 },   /* QME7342 backplane settings */
+   {  0, 0, 0,  9 },   /* QMH7342 backplane settings */
+   {  0, 0, 0, 10 },   /* QMH7342 backplane settings */
 };
 
 static const struct txdds_ent txdds_extra_qdr[TXDDS_EXTRA_SZ] = {
@@ -6983,6 +6989,8 @@ static const struct txdds_ent txdds_extra_qdr[TXDDS_EXTRA_SZ] = {
{  0, 1, 12,  6 },  /* QME7342 backplane setting */
{  0, 1, 12,  7 },  /* QME7342 backplane setting */
{  0, 1, 12,  8 },  /* QME7342 backplane setting */
+   {  0, 1,  0, 10 },  /* QMH7342 backplane settings */
+   {  0, 1,  0, 12 },  /* QMH7342 backplane settings */
 };
 
 static const struct txdds_ent *get_atten_table(const struct txdds_ent *txdds,



[PATCH 5/7] IB/qib: clear 6120 hardware error register

2010-06-17 Thread Ralph Campbell
The hardware error register needs to be cleared or another interrupt
will be generated, thus causing an infinite loop.
This is a regression introduced when removing debug output.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba6120.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba6120.c b/drivers/infiniband/hw/qib/qib_iba6120.c
index 1eadadc..a5e29db 100644
--- a/drivers/infiniband/hw/qib/qib_iba6120.c
+++ b/drivers/infiniband/hw/qib/qib_iba6120.c
@@ -1355,8 +1355,7 @@ static int qib_6120_bringup_serdes(struct qib_pportdata *ppd)
hwstat = qib_read_kreg64(dd, kr_hwerrstatus);
if (hwstat) {
/* should just have PLL, clear all set, in an case */
-   if (hwstat & ~QLOGIC_IB_HWE_SERDESPLLFAILED)
-   qib_write_kreg(dd, kr_hwerrclear, hwstat);
+   qib_write_kreg(dd, kr_hwerrclear, hwstat);
qib_write_kreg(dd, kr_errclear, ERR_MASK(HardwareErr));
}
 

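A small model of why the old condition could loop forever. The bit names mirror QLOGIC_IB_HWE_SERDESPLLFAILED and kr_hwerrclear, but the positions are made up, and writing the clear register is modeled as clearing the written bits in the status register. When the status contains only the PLL-failed bit, the old code never clears it, so the error stays pending and the interrupt fires again.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define HWE_SERDESPLLFAILED (1ULL << 0)
#define HWE_OTHER           (1ULL << 4)

static uint64_t hwerrstatus;	/* simulated status register */

/* Writing 1s to the clear register clears those status bits. */
static void write_hwerrclear(uint64_t val) { hwerrstatus &= ~val; }

/* Old logic: skip the clear when only the PLL bit is set. */
static void clear_old(void)
{
	uint64_t hwstat = hwerrstatus;

	if (hwstat & ~HWE_SERDESPLLFAILED)
		write_hwerrclear(hwstat);
}

/* Fixed logic: always clear whatever is set. */
static void clear_fixed(void)
{
	uint64_t hwstat = hwerrstatus;

	if (hwstat)
		write_hwerrclear(hwstat);
}
```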


[PATCH 4/7] IB/qib: clear eager buffer memory for each new process

2010-06-17 Thread Ralph Campbell
The eager buffers are not being cleared before being mmapped into a new
user address space, so a new process could read data left behind by the
previous user. This is a potential security risk and should be fixed.
Note that the eager header queue is already cleared correctly.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_init.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index 2589599..1d4db4b 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1472,6 +1472,9 @@ int qib_setup_eagerbufs(struct qib_ctxtdata *rcd)
dma_addr_t pa = rcd->rcvegrbuf_phys[chunk];
unsigned i;
 
+   /* clear for security and sanity on each use */
+   memset(rcd->rcvegrbuf[chunk], 0, size);
+
for (i = 0; e < egrcnt && i < egrperchunk; e++, i++) {
dd->f_put_tid(dd, e + egroff +
  (u64 __iomem *)



[PATCH 3/7] IB/qib: mask hardware error during link reset

2010-06-17 Thread Ralph Campbell
The HCA checks for certain hardware errors which can be falsely
triggered when the IB link is reset. The fix is to mask them rather
than report them.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_7322_regs.h |   48 +++--
 drivers/infiniband/hw/qib/qib_iba7322.c   |9 -
 2 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_7322_regs.h b/drivers/infiniband/hw/qib/qib_7322_regs.h
index a97440b..32dc81f 100644
--- a/drivers/infiniband/hw/qib/qib_7322_regs.h
+++ b/drivers/infiniband/hw/qib/qib_7322_regs.h
@@ -742,15 +742,15 @@
 #define QIB_7322_HwErrMask_IBCBusFromSPCParityErrMask_1_LSB 0xF
 #define QIB_7322_HwErrMask_IBCBusFromSPCParityErrMask_1_MSB 0xF
 #define QIB_7322_HwErrMask_IBCBusFromSPCParityErrMask_1_RMASK 0x1
-#define QIB_7322_HwErrMask_statusValidNoEopMask_1_LSB 0xE
-#define QIB_7322_HwErrMask_statusValidNoEopMask_1_MSB 0xE
-#define QIB_7322_HwErrMask_statusValidNoEopMask_1_RMASK 0x1
+#define QIB_7322_HwErrMask_IBCBusToSPCParityErrMask_1_LSB 0xE
+#define QIB_7322_HwErrMask_IBCBusToSPCParityErrMask_1_MSB 0xE
+#define QIB_7322_HwErrMask_IBCBusToSPCParityErrMask_1_RMASK 0x1
 #define QIB_7322_HwErrMask_IBCBusFromSPCParityErrMask_0_LSB 0xD
 #define QIB_7322_HwErrMask_IBCBusFromSPCParityErrMask_0_MSB 0xD
 #define QIB_7322_HwErrMask_IBCBusFromSPCParityErrMask_0_RMASK 0x1
-#define QIB_7322_HwErrMask_statusValidNoEopMask_0_LSB 0xC
-#define QIB_7322_HwErrMask_statusValidNoEopMask_0_MSB 0xC
-#define QIB_7322_HwErrMask_statusValidNoEopMask_0_RMASK 0x1
+#define QIB_7322_HwErrMask_statusValidNoEopMask_LSB 0xC
+#define QIB_7322_HwErrMask_statusValidNoEopMask_MSB 0xC
+#define QIB_7322_HwErrMask_statusValidNoEopMask_RMASK 0x1
 #define QIB_7322_HwErrMask_LATriggeredMask_LSB 0xB
 #define QIB_7322_HwErrMask_LATriggeredMask_MSB 0xB
 #define QIB_7322_HwErrMask_LATriggeredMask_RMASK 0x1
@@ -796,15 +796,15 @@
 #define QIB_7322_HwErrStatus_IBCBusFromSPCParityErr_1_LSB 0xF
 #define QIB_7322_HwErrStatus_IBCBusFromSPCParityErr_1_MSB 0xF
 #define QIB_7322_HwErrStatus_IBCBusFromSPCParityErr_1_RMASK 0x1
-#define QIB_7322_HwErrStatus_statusValidNoEop_1_LSB 0xE
-#define QIB_7322_HwErrStatus_statusValidNoEop_1_MSB 0xE
-#define QIB_7322_HwErrStatus_statusValidNoEop_1_RMASK 0x1
+#define QIB_7322_HwErrStatus_IBCBusToSPCParityErr_1_LSB 0xE
+#define QIB_7322_HwErrStatus_IBCBusToSPCParityErr_1_MSB 0xE
+#define QIB_7322_HwErrStatus_IBCBusToSPCParityErr_1_RMASK 0x1
 #define QIB_7322_HwErrStatus_IBCBusFromSPCParityErr_0_LSB 0xD
 #define QIB_7322_HwErrStatus_IBCBusFromSPCParityErr_0_MSB 0xD
 #define QIB_7322_HwErrStatus_IBCBusFromSPCParityErr_0_RMASK 0x1
-#define QIB_7322_HwErrStatus_statusValidNoEop_0_LSB 0xC
-#define QIB_7322_HwErrStatus_statusValidNoEop_0_MSB 0xC
-#define QIB_7322_HwErrStatus_statusValidNoEop_0_RMASK 0x1
+#define QIB_7322_HwErrStatus_statusValidNoEop_LSB 0xC
+#define QIB_7322_HwErrStatus_statusValidNoEop_MSB 0xC
+#define QIB_7322_HwErrStatus_statusValidNoEop_RMASK 0x1
 #define QIB_7322_HwErrStatus_LATriggered_LSB 0xB
 #define QIB_7322_HwErrStatus_LATriggered_MSB 0xB
 #define QIB_7322_HwErrStatus_LATriggered_RMASK 0x1
@@ -850,15 +850,15 @@
 #define QIB_7322_HwErrClear_IBCBusFromSPCParityErrClear_1_LSB 0xF
 #define QIB_7322_HwErrClear_IBCBusFromSPCParityErrClear_1_MSB 0xF
 #define QIB_7322_HwErrClear_IBCBusFromSPCParityErrClear_1_RMASK 0x1
-#define QIB_7322_HwErrClear_IBCBusToSPCparityErrClear_1_LSB 0xE
-#define QIB_7322_HwErrClear_IBCBusToSPCparityErrClear_1_MSB 0xE
-#define QIB_7322_HwErrClear_IBCBusToSPCparityErrClear_1_RMASK 0x1
+#define QIB_7322_HwErrClear_IBCBusToSPCParityErrClear_1_LSB 0xE
+#define QIB_7322_HwErrClear_IBCBusToSPCParityErrClear_1_MSB 0xE
+#define QIB_7322_HwErrClear_IBCBusToSPCParityErrClear_1_RMASK 0x1
 #define QIB_7322_HwErrClear_IBCBusFromSPCParityErrClear_0_LSB 0xD
 #define QIB_7322_HwErrClear_IBCBusFromSPCParityErrClear_0_MSB 0xD
 #define QIB_7322_HwErrClear_IBCBusFromSPCParityErrClear_0_RMASK 0x1
-#define QIB_7322_HwErrClear_IBCBusToSPCparityErrClear_0_LSB 0xC
-#define QIB_7322_HwErrClear_IBCBusToSPCparityErrClear_0_MSB 0xC
-#define QIB_7322_HwErrClear_IBCBusToSPCparityErrClear_0_RMASK 0x1
+#define QIB_7322_HwErrClear_statusValidNoEopClear_LSB 0xC
+#define QIB_7322_HwErrClear_statusValidNoEopClear_MSB 0xC
+#define QIB_7322_HwErrClear_statusValidNoEopClear_RMASK 0x1
 #define QIB_7322_HwErrClear_LATriggeredClear_LSB 0xB
 #define QIB_7322_HwErrClear_LATriggeredClear_MSB 0xB
 #define QIB_7322_HwErrClear_LATriggeredClear_RMASK 0x1
@@ -880,15 +880,15 @@
 #define QIB_7322_HwDiagCtrl_ForceIBCBusFromSPCParityErr_1_LSB 0xF
 #define QIB_7322_HwDiagCtrl_ForceIBCBusFromSPCParityErr_1_MSB 0xF
 #define QIB_7322_HwDiagCtrl_ForceIBCBusFromSPCParityErr_1_RMASK 0x1
-#define QIB_7322_HwDiagCtrl_ForcestatusValidNoEop_1_LSB 0xE
-#define QIB_7322_HwDiagCtrl_ForcestatusValidNoEop_1_MSB 0xE
-#define QIB_7322_HwDiagCtrl_ForcestatusValidNoEop_1_RMASK 0x1
+#define QIB_7322_HwDiagCtrl_ForceIBCBusToSPCParityErr_1_LSB 0xE
+#define

[PATCH 2/7] IB/qib: allow PSM to select from multiple port assignment algorithms

2010-06-17 Thread Ralph Campbell
From: Dave Olson 

Previously we allowed only explicit specification of the HCA and port,
or using all contexts within an HCA before moving to the next HCA.  We
now allow an additional method, round-robining through HCAs, and make
that the default.

Signed-off-by: Dave Olson 
---

 drivers/infiniband/hw/qib/qib_common.h   |   16 ++
 drivers/infiniband/hw/qib/qib_file_ops.c |  203 +++---
 2 files changed, 118 insertions(+), 101 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_common.h b/drivers/infiniband/hw/qib/qib_common.h
index b3955ed..145da40 100644
--- a/drivers/infiniband/hw/qib/qib_common.h
+++ b/drivers/infiniband/hw/qib/qib_common.h
@@ -279,7 +279,7 @@ struct qib_base_info {
  * may not be implemented; the user code must deal with this if it
  * cares, or it must abort after initialization reports the difference.
  */
-#define QIB_USER_SWMINOR 10
+#define QIB_USER_SWMINOR 11
 
 #define QIB_USER_SWVERSION ((QIB_USER_SWMAJOR << 16) | QIB_USER_SWMINOR)
 
@@ -302,6 +302,18 @@ struct qib_base_info {
 #define QIB_KERN_SWVERSION ((QIB_KERN_TYPE << 31) | QIB_USER_SWVERSION)
 
 /*
+ * If the unit is specified via open, HCA choice is fixed.  If port is
+ * specified, it's also fixed.  Otherwise we try to spread contexts
+ * across ports and HCAs, using different algorithms.  WITHIN is
+ * the old default, prior to this mechanism.
+ */
+#define QIB_PORT_ALG_ACROSS 0 /* round robin contexts across HCAs, then
+  * ports; this is the default */
+#define QIB_PORT_ALG_WITHIN 1 /* use all contexts on an HCA (round robin
+  * active ports within), then next HCA */
+#define QIB_PORT_ALG_COUNT 2 /* number of algorithm choices */
+
+/*
  * This structure is passed to qib_userinit() to tell the driver where
  * user code buffers are, sizes, etc.   The offsets and sizes of the
  * fields must remain unchanged, for binary compatibility.  It can
@@ -319,7 +331,7 @@ struct qib_user_info {
/* size of struct base_info to write to */
__u32 spu_base_info_size;
 
-   __u32 _spu_unused3;
+   __u32 spu_port_alg; /* which QIB_PORT_ALG_*; unused user minor < 11 */
 
/*
 * If two or more processes wish to share a context, each process
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index a142a9e..6b11645 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -1294,128 +1294,130 @@ bail:
return ret;
 }
 
-static inline int usable(struct qib_pportdata *ppd, int active_only)
+static inline int usable(struct qib_pportdata *ppd)
 {
struct qib_devdata *dd = ppd->dd;
-   u32 linkok = active_only ? QIBL_LINKACTIVE :
-(QIBL_LINKINIT | QIBL_LINKARMED | QIBL_LINKACTIVE);
 
return dd && (dd->flags & QIB_PRESENT) && dd->kregbase && ppd->lid &&
-   (ppd->lflags & linkok);
+   (ppd->lflags & QIBL_LINKACTIVE);
 }
 
-static int find_free_ctxt(int unit, struct file *fp,
- const struct qib_user_info *uinfo)
+/*
+ * Select a context on the given device, either using a requested port
+ * or the port based on the context number.
+ */
+static int choose_port_ctxt(struct file *fp, struct qib_devdata *dd, u32 port,
+   const struct qib_user_info *uinfo)
 {
-   struct qib_devdata *dd = qib_lookup(unit);
struct qib_pportdata *ppd = NULL;
-   int ret;
-   u32 ctxt;
+   int ret, ctxt;
 
-   if (!dd || (uinfo->spu_port && uinfo->spu_port > dd->num_pports)) {
-   ret = -ENODEV;
-   goto bail;
-   }
-
-   /*
-* If users requests specific port, only try that one port, else
-* select "best" port below, based on context.
-*/
-   if (uinfo->spu_port) {
-   ppd = dd->pport + uinfo->spu_port - 1;
-   if (!usable(ppd, 0)) {
+   if (port) {
+   if (!usable(dd->pport + port - 1)) {
ret = -ENETDOWN;
-   goto bail;
-   }
+   goto done;
+   } else
+   ppd = dd->pport + port - 1;
}
-
-   for (ctxt = dd->first_user_ctxt; ctxt < dd->cfgctxts; ctxt++) {
-   if (dd->rcd[ctxt])
-   continue;
-   /*
-* The setting and clearing of user context rcd[x] protected
-* by the qib_mutex
-*/
-   if (!ppd) {
-   /* choose port based on ctxt, if up, else 1st up */
-   ppd = dd->pport + (ctxt % dd->num_pports);
-   if (!usable(ppd, 0)) {
-   int i;
-   for (i = 0; i < dd->num_pports; i++) {
-   ppd = dd->pport + i;
-   if (usable(ppd, 0))
- 

[PATCH 1/7] IB/qib: avoid a rare 7322 chip problem by not marking VL15 bufs as WC

2010-06-17 Thread Ralph Campbell
From: Dave Olson 

Don't set write combining via PAT on the VL15 buffers to avoid a
rare problem with unaligned writes from interrupt-flushed store buffers.

Signed-off-by: Dave Olson 
---

 drivers/infiniband/hw/qib/qib.h |1 +
 drivers/infiniband/hw/qib/qib_diag.c|   19 +++
 drivers/infiniband/hw/qib/qib_iba7322.c |   18 +-
 drivers/infiniband/hw/qib/qib_init.c|6 ++
 drivers/infiniband/hw/qib/qib_pcie.c|2 ++
 drivers/infiniband/hw/qib/qib_tx.c  |6 +-
 6 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 32d9208..3593983 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -686,6 +686,7 @@ struct qib_devdata {
void __iomem *piobase;
/* mem-mapped pointer to base of user chip regs (if using WC PAT) */
u64 __iomem *userbase;
+   void __iomem *piovl15base; /* base of VL15 buffers, if not WC */
/*
 * points to area where PIOavail registers will be DMA'ed.
 * Has to be on a page of it's own, because the page will be
diff --git a/drivers/infiniband/hw/qib/qib_diag.c b/drivers/infiniband/hw/qib/qib_diag.c
index ca98dd5..05dcf0d 100644
--- a/drivers/infiniband/hw/qib/qib_diag.c
+++ b/drivers/infiniband/hw/qib/qib_diag.c
@@ -233,6 +233,7 @@ static u32 __iomem *qib_remap_ioaddr32(struct qib_devdata *dd, u32 offset,
u32 __iomem *krb32 = (u32 __iomem *)dd->kregbase;
u32 __iomem *map = NULL;
u32 cnt = 0;
+   u32 tot4k, offs4k;
 
/* First, simplest case, offset is within the first map. */
kreglen = (dd->kregend - dd->kregbase) * sizeof(u64);
@@ -250,7 +251,8 @@ static u32 __iomem *qib_remap_ioaddr32(struct qib_devdata *dd, u32 offset,
if (dd->userbase) {
/* If user regs mapped, they are after send, so set limit. */
u32 ulim = (dd->cfgctxts * dd->ureg_align) + dd->uregbase;
-   snd_lim = dd->uregbase;
+   if (!dd->piovl15base)
+   snd_lim = dd->uregbase;
krb32 = (u32 __iomem *)dd->userbase;
if (offset >= dd->uregbase && offset < ulim) {
map = krb32 + (offset - dd->uregbase) / sizeof(u32);
@@ -277,14 +279,14 @@ static u32 __iomem *qib_remap_ioaddr32(struct qib_devdata *dd, u32 offset,
/* If 4k buffers exist, account for them by bumping
 * appropriate limit.
 */
+   tot4k = dd->piobcnt4k * dd->align4k;
+   offs4k = dd->piobufbase >> 32;
if (dd->piobcnt4k) {
-   u32 tot4k = dd->piobcnt4k * dd->align4k;
-   u32 offs4k = dd->piobufbase >> 32;
if (snd_bottom > offs4k)
snd_bottom = offs4k;
else {
/* 4k above 2k. Bump snd_lim, if needed*/
-   if (!dd->userbase)
+   if (!dd->userbase || dd->piovl15base)
snd_lim = offs4k + tot4k;
}
}
@@ -298,6 +300,15 @@ static u32 __iomem *qib_remap_ioaddr32(struct qib_devdata *dd, u32 offset,
cnt = snd_lim - offset;
}
 
+   if (!map && offs4k && dd->piovl15base) {
+   snd_lim = offs4k + tot4k + 2 * dd->align4k;
+   if (offset >= (offs4k + tot4k) && offset < snd_lim) {
+   map = (u32 __iomem *)dd->piovl15base +
+   ((offset - (offs4k + tot4k)) / sizeof(u32));
+   cnt = snd_lim - offset;
+   }
+   }
+
 mapped:
if (cntp)
*cntp = cnt;
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 503992d..3e9828b 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -6119,9 +6119,25 @@ static int qib_init_7322_variables(struct qib_devdata *dd)
qib_set_ctxtcnt(dd);
 
if (qib_wc_pat) {
-   ret = init_chip_wc_pat(dd, NUM_VL15_BUFS * dd->align4k);
+   resource_size_t vl15off;
+   /*
+* We do not set WC on the VL15 buffers to avoid
+* a rare problem with unaligned writes from
+* interrupt-flushed store buffers, so we need
+* to map those separately here.  We can't solve
+* this for the rarely used mtrr case.
+*/
+   ret = init_chip_wc_pat(dd, 0);
if (ret)
goto bail;
+
+   /* vl15 buffers start just after the 4k buffers */
+   vl15off = dd->physaddr + (dd->piobufbase >> 32) +
+   dd->piobcnt4k * dd->align4k;
+   dd->piovl15base = ioremap_nocache(vl15off,
+ NUM_VL15_BUFS * dd->align4k);
+ 

[PATCH 0/7] various fixes for QIB driver

2010-06-17 Thread Ralph Campbell
The following patches fix various bugs.
I'm not sure what counts as a regression for newly introduced code.
I'm hoping that all except #2 can go into 2.6.35, whereas
#2 can wait for 2.6.36 since it is actually a feature.

IB/qib: avoid a rare 7322 chip problem by not marking VL15 bufs as WC
IB/qib: allow PSM to select from multiple port assignment algorithms
IB/qib: mask hardware error during link reset
IB/qib: clear eager buffer memory for each new process
IB/qib: clear 6120 hardware error register
IB/qib: update 7322 serdes tables
IB/qib: completion queue callback needs to be single threaded


Re: Mellanox implementation for atomic operations

2010-06-16 Thread Ralph Campbell
The ib_qib driver supports IB atomic operations, and they are
global because the driver performs them in host software rather
than with PCIe bus transactions, which don't (yet) have global
atomic support.

On Tue, 2010-06-15 at 13:50 -0700, Dotan Barak wrote:
> Hi.
> 
> On 10/05/2010 08:42, lihaidong wrote:
> > Hi,
> > I have a question about atomic operations.
> > According to IB specification o10-48, all atomic operation requests made
> > to the same HCA and referencing the same physical memory are serialized with
> > respect to each other. I know this should be complied with if the HCA supports
> > atomic operations, right?
> >   
> Right.
> > According to IB specification o10-49, all atomic operation requests
> > referencing the same physical memory are serialized with respect to
> > each other. This means that atomic operations performed by processors
> > should also be serialized with atomic operations performed by HCAs if they
> > reference the same physical memory.
> >   
> So far so good.
> > I want to know whether the Mellanox implementation of atomic operations
> > complies with o10-49 or not.
> > If not, to what extent does it comply with the rule?
> > I am also interested in how other vendors comply with this rule.
> >
> >   
> Let me give you a general answer:
> The struct ibv_device_attr contains the atomic_cap attribute, which
> defines the level of atomicity that the HCA supports (none, only within
> the HCA, or between all HCAs (global)).
> 
> I think that your code should check this attribute
> (this way your code will support all vendors' HCAs).
> 
> As far as I know, atomic operations are only supported within one HCA.
> 
> I hope this answer helps.
> Dotan




[PATCH] IB/qib: define max IB ports instead of variable stack allocation

2010-06-02 Thread Ralph Campbell
Rather than use a variable size array allocation on the stack,
define a constant for the maximum array size possible.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib.h|3 +++
 drivers/infiniband/hw/qib/qib_tx.c |2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h
index 32d9208..211ff6a 100644
--- a/drivers/infiniband/hw/qib/qib.h
+++ b/drivers/infiniband/hw/qib/qib.h
@@ -326,6 +326,9 @@ struct qib_verbs_txreq {
 
 #define QIB_DEFAULT_MTU 4096
 
+/* max number of IB ports supported per HCA */
+#define QIB_MAX_IB_PORTS 2
+
 /*
  * Possible IB config parameters for f_get/set_ib_table()
  */
diff --git a/drivers/infiniband/hw/qib/qib_tx.c b/drivers/infiniband/hw/qib/qib_tx.c
index f7eb1dd..77909b5 100644
--- a/drivers/infiniband/hw/qib/qib_tx.c
+++ b/drivers/infiniband/hw/qib/qib_tx.c
@@ -170,7 +170,7 @@ static int find_ctxt(struct qib_devdata *dd, unsigned bufn)
 void qib_disarm_piobufs_set(struct qib_devdata *dd, unsigned long *mask,
unsigned cnt)
 {
-   struct qib_pportdata *ppd, *pppd[dd->num_pports];
+   struct qib_pportdata *ppd, *pppd[QIB_MAX_IB_PORTS];
unsigned i;
unsigned long flags;
 



Re: variable length array in qib

2010-06-02 Thread Ralph Campbell
Sure. I will work on a patch.

On Wed, 2010-06-02 at 14:37 -0700, Roland Dreier wrote:
> qib has the code
> 
>   void qib_disarm_piobufs_set(struct qib_devdata *dd, unsigned long *mask,
>   unsigned cnt)
>   {
>   struct qib_pportdata *ppd, *pppd[dd->num_pports];
> 
> it would probably be safer to avoid the variable length array pppd[] on
> the kernel stack here... could this code be easily refactored to avoid
> doing that?




Re: linux-next: Tree for May 27 (infiniband: qib)

2010-05-27 Thread Ralph Campbell
On Thu, 2010-05-27 at 15:06 -0700, Randy Dunlap wrote:
> On 05/27/10 14:45, Roland Dreier wrote:
> >  > However, it looks like qib needs to handle DCA config in a way that
> >  > is similar to how it is handled in drivers/net/{myri10ge,igb,ixgbe}/
> >  > instead of assuming that DCA is enabled.
> > 
> > Looks like we're just going to rip out DCA support for now.
> > 
> >  > And please fix the linux-next 2010-may-25 reported qib problem:
> >  >   http://lkml.org/lkml/2010/5/25/321
> > 
> > I think that should be fixed in my for-next branch already (at least I
> > have a patch from Ralph called "IB/qib: Fix undefined symbol error when
> > CONFIG_PCI_MSI=n" in there).
> 
> Sounds good.  Was it posted to linux-rdma?
> 
> thanks,

Yes, I posted it to linux-rdma.

http://www.spinics.net/lists/linux-rdma/msg04140.html

http://www.spinics.net/lists/linux-rdma/msg04187.html




[PATCH] IB/qib: remove DCA support until feature is finished

2010-05-27 Thread Ralph Campbell
The DCA code was left over from internal development to test
the hardware feature and allow performance testing.
The results were mixed and will require some additional work
to make full use of the feature. Therefore, it is being removed
for now.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba7322.c |  189 ---
 1 files changed, 0 insertions(+), 189 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 23fb9ef..503992d 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -42,9 +42,6 @@
 #include 
 #include 
 #include 
-#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE)
-#include 
-#endif
 
 #include "qib.h"
 #include "qib_7322_regs.h"
@@ -518,12 +515,6 @@ struct qib_chip_specific {
u32 lastbuf_for_pio;
u32 stay_in_freeze;
u32 recovery_ports_initted;
-#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE)
-   u32 dca_ctrl;
-   int rhdr_cpu[18];
-   int sdma_cpu[2];
-   u64 dca_rcvhdr_ctrl[5]; /* B, C, D, E, F */
-#endif
struct msix_entry *msix_entries;
void  **msix_arg;
unsigned long *sendchkenable;
@@ -642,52 +633,6 @@ static struct {
SYM_LSB(IntStatus, SDmaCleanupDone_1), 2 },
 };
 
-#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE)
-static const struct dca_reg_map {
-   int shadow_inx;
-   int lsb;
-   u64 mask;
-   u16 regno;
-} dca_rcvhdr_reg_map[] = {
-   { 0, SYM_LSB(DCACtrlB, RcvHdrq0DCAOPH),
-  ~SYM_MASK(DCACtrlB, RcvHdrq0DCAOPH) , KREG_IDX(DCACtrlB) },
-   { 0, SYM_LSB(DCACtrlB, RcvHdrq1DCAOPH),
-  ~SYM_MASK(DCACtrlB, RcvHdrq1DCAOPH) , KREG_IDX(DCACtrlB) },
-   { 0, SYM_LSB(DCACtrlB, RcvHdrq2DCAOPH),
-  ~SYM_MASK(DCACtrlB, RcvHdrq2DCAOPH) , KREG_IDX(DCACtrlB) },
-   { 0, SYM_LSB(DCACtrlB, RcvHdrq3DCAOPH),
-  ~SYM_MASK(DCACtrlB, RcvHdrq3DCAOPH) , KREG_IDX(DCACtrlB) },
-   { 1, SYM_LSB(DCACtrlC, RcvHdrq4DCAOPH),
-  ~SYM_MASK(DCACtrlC, RcvHdrq4DCAOPH) , KREG_IDX(DCACtrlC) },
-   { 1, SYM_LSB(DCACtrlC, RcvHdrq5DCAOPH),
-  ~SYM_MASK(DCACtrlC, RcvHdrq5DCAOPH) , KREG_IDX(DCACtrlC) },
-   { 1, SYM_LSB(DCACtrlC, RcvHdrq6DCAOPH),
-  ~SYM_MASK(DCACtrlC, RcvHdrq6DCAOPH) , KREG_IDX(DCACtrlC) },
-   { 1, SYM_LSB(DCACtrlC, RcvHdrq7DCAOPH),
-  ~SYM_MASK(DCACtrlC, RcvHdrq7DCAOPH) , KREG_IDX(DCACtrlC) },
-   { 2, SYM_LSB(DCACtrlD, RcvHdrq8DCAOPH),
-  ~SYM_MASK(DCACtrlD, RcvHdrq8DCAOPH) , KREG_IDX(DCACtrlD) },
-   { 2, SYM_LSB(DCACtrlD, RcvHdrq9DCAOPH),
-  ~SYM_MASK(DCACtrlD, RcvHdrq9DCAOPH) , KREG_IDX(DCACtrlD) },
-   { 2, SYM_LSB(DCACtrlD, RcvHdrq10DCAOPH),
-  ~SYM_MASK(DCACtrlD, RcvHdrq10DCAOPH) , KREG_IDX(DCACtrlD) },
-   { 2, SYM_LSB(DCACtrlD, RcvHdrq11DCAOPH),
-  ~SYM_MASK(DCACtrlD, RcvHdrq11DCAOPH) , KREG_IDX(DCACtrlD) },
-   { 3, SYM_LSB(DCACtrlE, RcvHdrq12DCAOPH),
-  ~SYM_MASK(DCACtrlE, RcvHdrq12DCAOPH) , KREG_IDX(DCACtrlE) },
-   { 3, SYM_LSB(DCACtrlE, RcvHdrq13DCAOPH),
-  ~SYM_MASK(DCACtrlE, RcvHdrq13DCAOPH) , KREG_IDX(DCACtrlE) },
-   { 3, SYM_LSB(DCACtrlE, RcvHdrq14DCAOPH),
-  ~SYM_MASK(DCACtrlE, RcvHdrq14DCAOPH) , KREG_IDX(DCACtrlE) },
-   { 3, SYM_LSB(DCACtrlE, RcvHdrq15DCAOPH),
-  ~SYM_MASK(DCACtrlE, RcvHdrq15DCAOPH) , KREG_IDX(DCACtrlE) },
-   { 4, SYM_LSB(DCACtrlF, RcvHdrq16DCAOPH),
-  ~SYM_MASK(DCACtrlF, RcvHdrq16DCAOPH) , KREG_IDX(DCACtrlF) },
-   { 4, SYM_LSB(DCACtrlF, RcvHdrq17DCAOPH),
-  ~SYM_MASK(DCACtrlF, RcvHdrq17DCAOPH) , KREG_IDX(DCACtrlF) },
-};
-#endif
-
 /* ibcctrl bits */
 #define QLOGIC_IB_IBCC_LINKINITCMD_DISABLE 1
 /* cycle through TS1/TS2 till OK */
@@ -2538,95 +2483,6 @@ static void qib_setup_7322_setextled(struct qib_pportdata *ppd, u32 on)
qib_write_kreg_port(ppd, krp_rcvpktledcnt, ledblink);
 }
 
-#if defined(CONFIG_DCA) || defined(CONFIG_DCA_MODULE)
-static void qib_update_rhdrq_dca(struct qib_ctxtdata *rcd)
-{
-   struct qib_devdata *dd = rcd->dd;
-   struct qib_chip_specific *cspec = dd->cspec;
-   int cpu = get_cpu();
-
-   if (cspec->rhdr_cpu[rcd->ctxt] != cpu) {
-   const struct dca_reg_map *rmp;
-
-   cspec->rhdr_cpu[rcd->ctxt] = cpu;
-   rmp = &dca_rcvhdr_reg_map[rcd->ctxt];
-   cspec->dca_rcvhdr_ctrl[rmp->shadow_inx] &= rmp->mask;
-   cspec->dca_rcvhdr_ctrl[rmp->shadow_inx] |=
-   (u64) dca3_get_tag(&dd->pcidev->dev, cpu) << rmp->lsb;
-   qib_write_kreg(dd, rmp->regno,
-  cspec->dca_rcvhdr_ctrl[rmp->shadow_inx]);
-   cspec->dca_ctrl |= SYM_MASK(DCACtrlA, RcvHdrqDCAEnable);
-   qib_write_kreg(dd, KREG_IDX(DCACtrl

[PATCH] IB/qib: use a single txselect module parameter for serdes tuning

2010-05-26 Thread Ralph Campbell
As part of the earlier patches submitted and reviewed, it was agreed
to change the way serdes tuning parameters are specified to the
driver. The updated patch was dropped by the linux-rdma mailing list,
so the earlier version of qib_iba7322.c was integrated instead.
This patch updates qib_iba7322.c to use the simpler, single-parameter
method of setting the serdes parameters.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba7322.c |  582 ++-
 1 files changed, 179 insertions(+), 403 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 2c24eab..23fb9ef 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -114,40 +114,18 @@ static ushort qib_singleport;
 module_param_named(singleport, qib_singleport, ushort, S_IRUGO);
 MODULE_PARM_DESC(singleport, "Use only IB port 1; more per-port buffer space");
 
-
-/*
- * Setup QMH7342 receive and transmit parameters, necessary because
- * each bay, Mez connector, and IB port need different tuning, beyond
- * what the switch and HCA can do automatically.
- * It's expected to be done by cat'ing files to the modules file,
- * rather than setting up as a module parameter.
- * It's a "write-only" file, returns 0 when read back.
- * The unit, port, bay (if given), and values MUST be done as a single write.
- * The unit, port, and bay must precede the values to be effective.
- */
-static int setup_qmh_params(const char *, struct kernel_param *);
-static unsigned dummy_qmh_params;
-module_param_call(qmh_serdes_setup, setup_qmh_params, param_get_uint,
- &dummy_qmh_params, S_IWUSR | S_IRUGO);
-
-/* similarly for QME7342, but it's simpler */
-static int setup_qme_params(const char *, struct kernel_param *);
-static unsigned dummy_qme_params;
-module_param_call(qme_serdes_setup, setup_qme_params, param_get_uint,
- &dummy_qme_params, S_IWUSR | S_IRUGO);
-
 #define MAX_ATTEN_LEN 64 /* plenty for any real system */
 /* for read back, default index is ~5m copper cable */
-static char cable_atten_list[MAX_ATTEN_LEN] = "10";
-static struct kparam_string kp_cable_atten = {
-   .string = cable_atten_list,
+static char txselect_list[MAX_ATTEN_LEN] = "10";
+static struct kparam_string kp_txselect = {
+   .string = txselect_list,
.maxlen = MAX_ATTEN_LEN
 };
-static int  setup_cable_atten(const char *, struct kernel_param *);
-module_param_call(cable_atten, setup_cable_atten, param_get_string,
- &kp_cable_atten, S_IWUSR | S_IRUGO);
-MODULE_PARM_DESC(cable_atten, \
-"cable attenuation indices for cables with invalid EEPROM");
+static int  setup_txselect(const char *, struct kernel_param *);
+module_param_call(txselect, setup_txselect, param_get_string,
+ &kp_txselect, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(txselect, \
+"Tx serdes indices (for no QSFP or invalid QSFP data)");
 
 #define BOARD_QME7342 5
 #define BOARD_QMH7342 6
@@ -574,11 +552,12 @@ struct vendor_txdds_ent {
 static void write_tx_serdes_param(struct qib_pportdata *, struct txdds_ent *);
 
 #define TXDDS_TABLE_SZ 16 /* number of entries per speed in onchip table */
+#define TXDDS_EXTRA_SZ 11 /* number of extra tx settings entries */
 #define SERDES_CHANS 4 /* yes, it's obvious, but one less magic number */
 
 #define H1_FORCE_VAL 8
-#define H1_FORCE_QME 1 /*  may be overridden via setup_qme_params() */
-#define H1_FORCE_QMH 7 /*  may be overridden via setup_qmh_params() */
+#define H1_FORCE_QME 1 /*  may be overridden via setup_txselect() */
+#define H1_FORCE_QMH 7 /*  may be overridden via setup_txselect() */
 
 /* The static and dynamic registers are paired, and the pairs indexed by spd */
 #define krp_static_adapt_dis(spd) (KREG_IBPORT_IDX(ADAPT_DISABLE_STATIC_SDR) \
@@ -590,15 +569,6 @@ static void write_tx_serdes_param(struct qib_pportdata *, struct txdds_ent *);
 #define QDR_STATIC_ADAPT_INIT 0xffULL /* up, disable H0,H1-8, LE */
 #define QDR_STATIC_ADAPT_INIT_R1 0xf0ULL /* r1 up, disable H0,H1-8 */
 
-static const struct txdds_ent qmh_sdr_txdds =  { 11, 0,  5,  6 };
-static const struct txdds_ent qmh_ddr_txdds =  {  7, 0,  2,  8 };
-static const struct txdds_ent qmh_qdr_txdds =  {  0, 1,  3, 10 };
-
-/* this is used for unknown mez cards also */
-static const struct txdds_ent qme_sdr_txdds =  { 11, 0,  4,  4 };
-static const struct txdds_ent qme_ddr_txdds =  {  7, 0,  2,  7 };
-static const struct txdds_ent qme_qdr_txdds =  {  0, 1, 12, 11 };
-
 struct qib_chippport_specific {
u64 __iomem *kpregbase;
u64 __iomem *cpregbase;
@@ -637,12 +607,8 @@ struct qib_chippport_specific {
 * Per-bay per-channel rcv QMH H1 values and Tx values for QDR.
 * entry zero is unused, to simplify indexing
 */
-   u16 h1_val;
- 

[PATCH] IB/qib: fix powerpc compile warnings

2010-05-26 Thread Ralph Campbell
Fix the compile warnings for uninitialized variables in qib_fs.c
when compiling for powerpc.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_fs.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_fs.c b/drivers/infiniband/hw/qib/qib_fs.c
index 7554704..8a354f7 100644
--- a/drivers/infiniband/hw/qib/qib_fs.c
+++ b/drivers/infiniband/hw/qib/qib_fs.c
@@ -143,7 +143,7 @@ static const struct file_operations driver_ops[] = {
 static ssize_t dev_counters_read(struct file *file, char __user *buf,
 size_t count, loff_t *ppos)
 {
-   u64 *counters;
+   u64 *counters = NULL;
struct qib_devdata *dd = private2dd(file);
 
return simple_read_from_buffer(buf, count, ppos, counters,
@@ -154,7 +154,7 @@ static ssize_t dev_counters_read(struct file *file, char __user *buf,
 static ssize_t dev_names_read(struct file *file, char __user *buf,
  size_t count, loff_t *ppos)
 {
-   char *names;
+   char *names = NULL;
struct qib_devdata *dd = private2dd(file);
 
return simple_read_from_buffer(buf, count, ppos, names,
@@ -175,7 +175,7 @@ static const struct file_operations cntr_ops[] = {
 static ssize_t portnames_read(struct file *file, char __user *buf,
  size_t count, loff_t *ppos)
 {
-   char *names;
+   char *names = NULL;
struct qib_devdata *dd = private2dd(file);
 
return simple_read_from_buffer(buf, count, ppos, names,
@@ -186,7 +186,7 @@ static ssize_t portnames_read(struct file *file, char __user *buf,
 static ssize_t portcntrs_1_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
 {
-   u64 *counters;
+   u64 *counters = NULL;
struct qib_devdata *dd = private2dd(file);
 
return simple_read_from_buffer(buf, count, ppos, counters,
@@ -197,7 +197,7 @@ static ssize_t portcntrs_1_read(struct file *file, char __user *buf,
 static ssize_t portcntrs_2_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
 {
-   u64 *counters;
+   u64 *counters = NULL;
struct qib_devdata *dd = private2dd(file);
 
return simple_read_from_buffer(buf, count, ppos, counters,



[PATCH] IB/qib: fix undefined symbol error when CONFIG_PCI_MSI undefined

2010-05-25 Thread Ralph Campbell
This patch fixes a compile error saying qib_init_iba6120_funcs() is
undefined when CONFIG_PCI_MSI is not defined.
Thanks to Randy Dunlap  for finding this and
suggesting the fix.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_iba6120.c |   12 
 drivers/infiniband/hw/qib/qib_init.c    |    6 ++
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_iba6120.c b/drivers/infiniband/hw/qib/qib_iba6120.c
index 7b6549f..1eadadc 100644
--- a/drivers/infiniband/hw/qib/qib_iba6120.c
+++ b/drivers/infiniband/hw/qib/qib_iba6120.c
@@ -3475,14 +3475,6 @@ struct qib_devdata *qib_init_iba6120_funcs(struct pci_dev *pdev,
struct qib_devdata *dd;
int ret;
 
-#ifndef CONFIG_PCI_MSI
-   qib_early_err(&pdev->dev, "QLogic PCIE device 0x%x cannot "
- "work if CONFIG_PCI_MSI is not enabled\n",
- ent->device);
-   dd = ERR_PTR(-ENODEV);
-   goto bail;
-#endif
-
dd = qib_alloc_devdata(pdev, sizeof(struct qib_pportdata) +
   sizeof(struct qib_chip_specific));
if (IS_ERR(dd))
@@ -3554,10 +3546,6 @@ struct qib_devdata *qib_init_iba6120_funcs(struct pci_dev *pdev,
if (qib_mini_init)
goto bail;
 
-#ifndef CONFIG_PCI_MSI
-   qib_dev_err(dd, "PCI_MSI not configured, NO interrupts\n");
-#endif
-
if (qib_pcie_params(dd, 8, NULL, NULL))
qib_dev_err(dd, "Failed to setup PCIe or interrupts; "
"continuing anyway\n");
diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index c0139c0..9b40f34 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1237,7 +1237,13 @@ static int __devinit qib_init_one(struct pci_dev *pdev,
 */
switch (ent->device) {
case PCI_DEVICE_ID_QLOGIC_IB_6120:
+#ifdef CONFIG_PCI_MSI
dd = qib_init_iba6120_funcs(pdev, ent);
+#else
+   qib_early_err(&pdev->dev, "QLogic PCIE device 0x%x cannot "
+ "work if CONFIG_PCI_MSI is not enabled\n",
+ ent->device);
+#endif
break;
 
case PCI_DEVICE_ID_QLOGIC_IB_7220:

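The patch moves the CONFIG_PCI_MSI test from inside `qib_init_iba6120_funcs()` to its call site in `qib_init_one()`, so the function is not referenced at all when the option is off (presumably because qib_iba6120.c is only built when CONFIG_PCI_MSI is set, which is why the symbol was undefined). A minimal userspace sketch of the pattern follows; `HAVE_MSI`, `init_6120()` and `init_one()` are hypothetical names. Unlike the hunk above, which only logs the error in the `#else` branch, the sketch also returns `-ENODEV` so its caller can detect the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_PCI_MSI; remove it to build the
 * configuration in which the chip-specific code does not exist. */
#define HAVE_MSI 1

#ifdef HAVE_MSI
/* Only defined (and therefore only linkable) when the option is set. */
static int init_6120(void)
{
	return 0;
}
#endif

static int init_one(void)
{
	int ret;

#ifdef HAVE_MSI
	ret = init_6120();
#else
	/* No reference to init_6120() here, so no undefined symbol. */
	fprintf(stderr, "device cannot work without MSI support\n");
	ret = -ENODEV;
#endif
	return ret;
}
```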


Re: [PATCH v4 11/11] IB/qib: Add qib_verbs.h

2010-05-25 Thread Ralph Campbell
On Mon, 2010-05-24 at 21:22 -0700, Roland Dreier wrote:
> OK, I merged all the qib-related stuff up and put it in my for-next
> branch.  I expect to send it to Linus soon.


Thank you very much!



[PATCH v4 11/11] IB/qib: Add qib_verbs.h

2010-05-18 Thread Ralph Campbell
This patch creates the qib_verbs.h file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_verbs.h | 1100 +
 1 files changed, 1100 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_verbs.h

diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
new file mode 100644
index 000..bd57c12
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -0,0 +1,1100 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009, 2010 QLogic Corporation.
+ * All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef QIB_VERBS_H
+#define QIB_VERBS_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct qib_ctxtdata;
+struct qib_pportdata;
+struct qib_devdata;
+struct qib_verbs_txreq;
+
+#define QIB_MAX_RDMA_ATOMIC 16
+#define QIB_GUIDS_PER_PORT 5
+
+#define QPN_MAX (1 << 24)
+#define QPNMAP_ENTRIES  (QPN_MAX / PAGE_SIZE / BITS_PER_BYTE)
+
+/*
+ * Increment this value if any changes that break userspace ABI
+ * compatibility are made.
+ */
+#define QIB_UVERBS_ABI_VERSION   2
+
+/*
+ * Define an ib_cq_notify value that is not valid so we know when CQ
+ * notifications are armed.
+ */
+#define IB_CQ_NONE  (IB_CQ_NEXT_COMP + 1)
+
+#define IB_SEQ_NAK (3 << 29)
+
+/* AETH NAK opcode values */
+#define IB_RNR_NAK  0x20
+#define IB_NAK_PSN_ERROR        0x60
+#define IB_NAK_INVALID_REQUEST  0x61
+#define IB_NAK_REMOTE_ACCESS_ERROR  0x62
+#define IB_NAK_REMOTE_OPERATIONAL_ERROR 0x63
+#define IB_NAK_INVALID_RD_REQUEST   0x64
+
+/* Flags for checking QP state (see ib_qib_state_ops[]) */
+#define QIB_POST_SEND_OK         0x01
+#define QIB_POST_RECV_OK         0x02
+#define QIB_PROCESS_RECV_OK      0x04
+#define QIB_PROCESS_SEND_OK      0x08
+#define QIB_PROCESS_NEXT_SEND_OK 0x10
+#define QIB_FLUSH_SEND 0x20
+#define QIB_FLUSH_RECV 0x40
+#define QIB_PROCESS_OR_FLUSH_SEND \
+   (QIB_PROCESS_SEND_OK | QIB_FLUSH_SEND)
+
+/* IB Performance Manager status values */
+#define IB_PMA_SAMPLE_STATUS_DONE   0x00
+#define IB_PMA_SAMPLE_STATUS_STARTED 0x01
+#define IB_PMA_SAMPLE_STATUS_RUNNING 0x02
+
+/* Mandatory IB performance counter select values. */
+#define IB_PMA_PORT_XMIT_DATA   cpu_to_be16(0x0001)
+#define IB_PMA_PORT_RCV_DATA    cpu_to_be16(0x0002)
+#define IB_PMA_PORT_XMIT_PKTS   cpu_to_be16(0x0003)
+#define IB_PMA_PORT_RCV_PKTS    cpu_to_be16(0x0004)
+#define IB_PMA_PORT_XMIT_WAIT   cpu_to_be16(0x0005)
+
+#define QIB_VENDOR_IPG cpu_to_be16(0xFFA0)
+
+#define IB_BTH_REQ_ACK (1 << 31)
+#define IB_BTH_SOLICITED   (1 << 23)
+#define IB_BTH_MIG_REQ (1 << 22)
+
+/* XXX Should be defined in ib_verbs.h enum ib_port_cap_flags */
+#define IB_PORT_OTHER_LOCAL_CHANGES_SUP (1 << 26)
+
+#define IB_GRH_VERSION 6
+#define IB_GRH_VERSION_MASK    0xF
+#define IB_GRH_VERSION_SHIFT   28
+#define IB_GRH_TCLASS_MASK     0xFF
+#define IB_GRH_TCLASS_SHIFT    20
+#define IB_GRH_FLOW_MASK       0xFFFFF
+#define IB_GRH_FLOW_SHIFT      0
+#define IB_GRH_NEXT_HDR        0x1B
+
+#define IB_DEFAULT_GID_PREFIX  cpu_to_be64(0xfe80000000000000ULL)
+
+/* Values for set/get portinfo VLCap OperationalVLs */
+#define IB_VL_VL0   1
+#define IB_VL_VL0_1 2
+#define IB_VL_VL0_3 3
+#define IB_VL_VL0_7 4
+#define IB_VL_VL0_14 5
+
+static inline i

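Two of the definitions above encode simple arithmetic. `QPNMAP_ENTRIES` sizes a bitmap with one bit per QPN: assuming a 4096-byte `PAGE_SIZE`, each page covers 4096 × 8 = 32768 QPNs, so the 2^24 QPNs need 16777216 / 4096 / 8 = 512 pages. The GRH shift and mask values describe the first 32-bit word of the Global Route Header: a 4-bit IP version in bits 31:28, an 8-bit traffic class in bits 27:20 and a 20-bit flow label in bits 19:0. The sketch below reproduces both in host byte order under those assumptions, using its own hypothetical `EX_`-prefixed constants; a real header would still need converting to big-endian before transmission.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the header values; the 4096-byte page size is assumed. */
#define EX_QPN_MAX        (1u << 24)
#define EX_PAGE_SIZE      4096u
#define EX_BITS_PER_BYTE  8u
#define EX_QPNMAP_ENTRIES (EX_QPN_MAX / EX_PAGE_SIZE / EX_BITS_PER_BYTE)

/* GRH word 0: version[31:28], traffic class[27:20], flow label[19:0]. */
#define EX_GRH_VERSION       6u
#define EX_GRH_VERSION_SHIFT 28
#define EX_GRH_TCLASS_MASK   0xFFu
#define EX_GRH_TCLASS_SHIFT  20
#define EX_GRH_FLOW_MASK     0xFFFFFu	/* 20-bit flow label */

/* Pack version, traffic class and flow label into one host-order word. */
static uint32_t grh_word0(uint8_t tclass, uint32_t flow)
{
	return (EX_GRH_VERSION << EX_GRH_VERSION_SHIFT) |
	       ((uint32_t)(tclass & EX_GRH_TCLASS_MASK) << EX_GRH_TCLASS_SHIFT) |
	       (flow & EX_GRH_FLOW_MASK);
}

/* Extract the traffic class again. */
static uint8_t grh_tclass(uint32_t w)
{
	return (w >> EX_GRH_TCLASS_SHIFT) & EX_GRH_TCLASS_MASK;
}

/* Extract the flow label again. */
static uint32_t grh_flow(uint32_t w)
{
	return w & EX_GRH_FLOW_MASK;
}
```

For example, traffic class 0xAB with flow label 0x12345 packs to 0x6AB12345.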
[PATCH v4 10/11] IB/qib: Add qib_sd7220.c

2010-05-18 Thread Ralph Campbell
This patch creates the qib_sd7220.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_sd7220.c | 1413 
 1 files changed, 1413 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_sd7220.c

diff --git a/drivers/infiniband/hw/qib/qib_sd7220.c b/drivers/infiniband/hw/qib/qib_sd7220.c
new file mode 100644
index 000..0aeed0e
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_sd7220.c
@@ -0,0 +1,1413 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+/*
+ * This file contains all of the code that is specific to the SerDes
+ * on the QLogic_IB 7220 chip.
+ */
+
+#include 
+#include 
+
+#include "qib.h"
+#include "qib_7220.h"
+
+/*
+ * Same as in qib_iba7220.c, but just the registers needed here.
+ * Could move whole set to qib_7220.h, but decided better to keep
+ * local.
+ */
+#define KREG_IDX(regname) (QIB_7220_##regname##_OFFS / sizeof(u64))
+#define kr_hwerrclear KREG_IDX(HwErrClear)
+#define kr_hwerrmask KREG_IDX(HwErrMask)
+#define kr_hwerrstatus KREG_IDX(HwErrStatus)
+#define kr_ibcstatus KREG_IDX(IBCStatus)
+#define kr_ibserdesctrl KREG_IDX(IBSerDesCtrl)
+#define kr_scratch KREG_IDX(Scratch)
+#define kr_xgxs_cfg KREG_IDX(XGXSCfg)
+/* these are used only here, not in qib_iba7220.c */
+#define kr_ibsd_epb_access_ctrl KREG_IDX(ibsd_epb_access_ctrl)
+#define kr_ibsd_epb_transaction_reg KREG_IDX(ibsd_epb_transaction_reg)
+#define kr_pciesd_epb_transaction_reg KREG_IDX(pciesd_epb_transaction_reg)
+#define kr_pciesd_epb_access_ctrl KREG_IDX(pciesd_epb_access_ctrl)
+#define kr_serdes_ddsrxeq0 KREG_IDX(SerDes_DDSRXEQ0)
+
+/*
+ * The IBSerDesMappTable is a memory that holds values to be stored in
+ * various SerDes registers by IBC.
+ */
+#define kr_serdes_maptable KREG_IDX(IBSerDesMappTable)
+
+/*
+ * Below used for sdnum parameter, selecting one of the two sections
+ * used for PCIe, or the single SerDes used for IB.
+ */
+#define PCIE_SERDES0 0
+#define PCIE_SERDES1 1
+
+/*
+ * The EPB requires addressing in a particular form. EPB_LOC() is intended
+ * to make #definitions a little more readable.
+ */
+#define EPB_ADDR_SHF 8
+#define EPB_LOC(chn, elt, reg) \
+   (((elt & 0xf) | ((chn & 7) << 4) | ((reg & 0x3f) << 9)) << \
+EPB_ADDR_SHF)
+#define EPB_IB_QUAD0_CS_SHF (25)
+#define EPB_IB_QUAD0_CS (1U <<  EPB_IB_QUAD0_CS_SHF)
+#define EPB_IB_UC_CS_SHF (26)
+#define EPB_PCIE_UC_CS_SHF (27)
+#define EPB_GLOBAL_WR (1U << (EPB_ADDR_SHF + 8))
+
+/* Forward declarations. */
+static int qib_sd7220_reg_mod(struct qib_devdata *dd, int sdnum, u32 loc,
+ u32 data, u32 mask);
+static int ibsd_mod_allchnls(struct qib_devdata *dd, int loc, int val,
+int mask);
+static int qib_sd_trimdone_poll(struct qib_devdata *dd);
+static void qib_sd_trimdone_monitor(struct qib_devdata *dd, const char *where);
+static int qib_sd_setvals(struct qib_devdata *dd);
+static int qib_sd_early(struct qib_devdata *dd);
+static int qib_sd_dactrim(struct qib_devdata *dd);
+static int qib_internal_presets(struct qib_devdata *dd);
+/* Tweak the register (CMUCTRL5) that contains the TRIMSELF controls */
+static int qib_sd_trimself(struct qib_devdata *dd, int val);
+static int epb_access(struct qib_devdata *dd, int sdnum, int claim);
+
+/*
+ * Below keeps track of whether the "once per power-on" initialization has
+ * been done, because uC c

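`EPB_LOC()` packs a channel, element and register number into an EPB address: the element goes in bits 3:0, the channel in bits 6:4 and the register in bits 14:9, and the result is shifted left by `EPB_ADDR_SHF` (8). The sketch below mirrors that packing with illustrative decode helpers; the `EX_` names and the helpers are hypothetical. Unlike the original, it parenthesizes each argument, since an argument such as `x | y` would otherwise bind as `x | (y & 0xf)`. It also lets you check that the packed fields never touch bit 16, the bit `EPB_GLOBAL_WR` uses.

```c
#include <assert.h>
#include <stdint.h>

#define EX_EPB_ADDR_SHF 8

/* Same packing as EPB_LOC(), with every argument parenthesized. */
#define EX_EPB_LOC(chn, elt, reg)					\
	((((uint32_t)(elt) & 0xf) | (((uint32_t)(chn) & 7) << 4) |	\
	  (((uint32_t)(reg) & 0x3f) << 9)) << EX_EPB_ADDR_SHF)

/* Illustrative decoders that recover each field from a packed address. */
static unsigned epb_elt(uint32_t loc)
{
	return (loc >> EX_EPB_ADDR_SHF) & 0xf;
}

static unsigned epb_chn(uint32_t loc)
{
	return (loc >> (EX_EPB_ADDR_SHF + 4)) & 7;
}

static unsigned epb_reg(uint32_t loc)
{
	return (loc >> (EX_EPB_ADDR_SHF + 9)) & 0x3f;
}
```

For instance, channel 1, element 2, register 3 packs to ((2 | 1 << 4 | 3 << 9) << 8) = 0x61200.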
[PATCH v4 09/11] IB/qib: Add qib_init.c

2010-05-18 Thread Ralph Campbell
This patch creates the qib_init.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_init.c | 1580 ++
 1 files changed, 1580 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_init.c

diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
new file mode 100644
index 000..c0139c0
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -0,0 +1,1580 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009, 2010 QLogic Corporation.
+ * All rights reserved.
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qib.h"
+#include "qib_common.h"
+
+/*
+ * min buffers we want to have per context, after driver
+ */
+#define QIB_MIN_USER_CTXT_BUFCNT 7
+
+#define QLOGIC_IB_R_SOFTWARE_MASK 0xFF
+#define QLOGIC_IB_R_SOFTWARE_SHIFT 24
+#define QLOGIC_IB_R_EMULATOR_MASK (1ULL<<62)
+
+/*
+ * Number of ctxts we are configured to use (to allow for more pio
+ * buffers per ctxt, etc.)  Zero means use chip value.
+ */
+ushort qib_cfgctxts;
+module_param_named(cfgctxts, qib_cfgctxts, ushort, S_IRUGO);
+MODULE_PARM_DESC(cfgctxts, "Set max number of contexts to use");
+
+/*
+ * If set, do not write to any regs if avoidable, hack to allow
+ * check for deranged default register values.
+ */
+ushort qib_mini_init;
+module_param_named(mini_init, qib_mini_init, ushort, S_IRUGO);
+MODULE_PARM_DESC(mini_init, "If set, do minimal diag init");
+
+unsigned qib_n_krcv_queues;
+module_param_named(krcvqs, qib_n_krcv_queues, uint, S_IRUGO);
+MODULE_PARM_DESC(krcvqs, "number of kernel receive queues per IB port");
+
+/*
+ * qib_wc_pat parameter:
+ *  0 is WC via MTRR
+ *  1 is WC via PAT
+ *  If PAT initialization fails, code reverts back to MTRR
+ */
+unsigned qib_wc_pat = 1; /* default (1) is to use PAT, not MTRR */
+module_param_named(wc_pat, qib_wc_pat, uint, S_IRUGO);
+MODULE_PARM_DESC(wc_pat, "enable write-combining via PAT mechanism");
+
+struct workqueue_struct *qib_wq;
+struct workqueue_struct *qib_cq_wq;
+
+static void verify_interrupt(unsigned long);
+
+static struct idr qib_unit_table;
+u32 qib_cpulist_count;
+unsigned long *qib_cpulist;
+
+/* set number of contexts we'll actually use */
+void qib_set_ctxtcnt(struct qib_devdata *dd)
+{
+	if (!qib_cfgctxts)
+		dd->cfgctxts = dd->ctxtcnt;
+	else if (qib_cfgctxts < dd->num_pports)
+		dd->cfgctxts = dd->ctxtcnt;
+	else if (qib_cfgctxts <= dd->ctxtcnt)
+		dd->cfgctxts = qib_cfgctxts;
+	else
+		dd->cfgctxts = dd->ctxtcnt;
+}
+
+/*
+ * Common code for creating the receive context array.
+ */
+int qib_create_ctxts(struct qib_devdata *dd)
+{
+   unsigned i;
+   int ret;
+
+	/*
+	 * Allocate full ctxtcnt array, rather than just cfgctxts, because
+	 * cleanup iterates across all possible ctxts.
+	 */
+	dd->rcd = kzalloc(sizeof(*dd->rcd) * dd->ctxtcnt, GFP_KERNEL);
+	if (!dd->rcd) {
+		qib_dev_err(dd, "Unable to allocate ctxtdata array, "
+			    "failing\n");
+		ret = -ENOMEM;
+		goto done;
+	}
+
+	/* create (one or more) kctxt */
+	for (i = 0; i < dd->first_user_ctxt; ++i) {
+		struct qib_pportdata *ppd;
+		struct qib_ctxtdata *rcd;
+
+  

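`qib_set_ctxtcnt()` honors the `cfgctxts` module parameter only when it is at least the number of ports and no larger than the chip's context count; in every other case it falls back to `ctxtcnt`. A userspace model of that decision, with a hypothetical `pick_cfgctxts()` that takes the three values as plain arguments:

```c
#include <assert.h>

/*
 * Mirror of qib_set_ctxtcnt(): cfg is the module parameter
 * (0 means "use the chip value"), num_pports the port count and
 * ctxtcnt the number of contexts the chip provides.
 */
static unsigned pick_cfgctxts(unsigned cfg, unsigned num_pports,
			      unsigned ctxtcnt)
{
	if (!cfg)
		return ctxtcnt;		/* parameter not set */
	if (cfg < num_pports)
		return ctxtcnt;		/* fewer than the port count: ignored */
	if (cfg <= ctxtcnt)
		return cfg;		/* in range: honored */
	return ctxtcnt;			/* more than the chip has: clamped */
}
```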
[PATCH v4 07/11] IB/qib: Add qib_dma.c

2010-05-18 Thread Ralph Campbell
This patch creates the qib_dma.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_dma.c |  182 +++
 1 files changed, 182 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_dma.c

diff --git a/drivers/infiniband/hw/qib/qib_dma.c b/drivers/infiniband/hw/qib/qib_dma.c
new file mode 100644
index 000..2920bb3
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_dma.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright (c) 2006, 2009, 2010 QLogic, Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include 
+#include 
+
+#include "qib_verbs.h"
+
+#define BAD_DMA_ADDRESS ((u64) 0)
+
+/*
+ * The following functions implement driver specific replacements
+ * for the ib_dma_*() functions.
+ *
+ * These functions return kernel virtual addresses instead of
+ * device bus addresses since the driver uses the CPU to copy
+ * data instead of using hardware DMA.
+ */
+
+static int qib_mapping_error(struct ib_device *dev, u64 dma_addr)
+{
+   return dma_addr == BAD_DMA_ADDRESS;
+}
+
+static u64 qib_dma_map_single(struct ib_device *dev, void *cpu_addr,
+ size_t size, enum dma_data_direction direction)
+{
+   BUG_ON(!valid_dma_direction(direction));
+   return (u64) cpu_addr;
+}
+
+static void qib_dma_unmap_single(struct ib_device *dev, u64 addr, size_t size,
+enum dma_data_direction direction)
+{
+   BUG_ON(!valid_dma_direction(direction));
+}
+
+static u64 qib_dma_map_page(struct ib_device *dev, struct page *page,
+   unsigned long offset, size_t size,
+   enum dma_data_direction direction)
+{
+   u64 addr;
+
+   BUG_ON(!valid_dma_direction(direction));
+
+	if (offset + size > PAGE_SIZE) {
+		addr = BAD_DMA_ADDRESS;
+		goto done;
+	}
+
+	addr = (u64) page_address(page);
+	if (addr)
+		addr += offset;
+	/* TODO: handle highmem pages */
+
+done:
+	return addr;
+}
+
+static void qib_dma_unmap_page(struct ib_device *dev, u64 addr, size_t size,
+  enum dma_data_direction direction)
+{
+   BUG_ON(!valid_dma_direction(direction));
+}
+
+static int qib_map_sg(struct ib_device *dev, struct scatterlist *sgl,
+ int nents, enum dma_data_direction direction)
+{
+   struct scatterlist *sg;
+   u64 addr;
+   int i;
+   int ret = nents;
+
+   BUG_ON(!valid_dma_direction(direction));
+
+	for_each_sg(sgl, sg, nents, i) {
+		addr = (u64) page_address(sg_page(sg));
+		/* TODO: handle highmem pages */
+		if (!addr) {
+			ret = 0;
+			break;
+		}
+	}
+	return ret;
+}
+
+static void qib_unmap_sg(struct ib_device *dev,
+struct scatterlist *sg, int nents,
+enum dma_data_direction direction)
+{
+   BUG_ON(!valid_dma_direction(direction));
+}
+
+static u64 qib_sg_dma_address(struct ib_device *dev, struct scatterlist *sg)
+{
+   u64 addr = (u64) page_address(sg_page(sg));
+
+	if (addr)
+		addr += sg->offset;
+   return addr;
+}
+
+static unsigned int qib_sg_dma_len(struct ib_device *dev,
+  struct scatterlist *sg)
+{
+   return sg->length;
+}
+
+static void qib_sync_single_for_cpu(struct ib_device *dev, u64 addr,
+   size_t size, enum dma_data_direction dir)
+{
+}
+
+static void qib_

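As the comment in qib_dma.c explains, these replacements hand back kernel virtual addresses rather than bus addresses, because the driver copies data with the CPU; 0 (`BAD_DMA_ADDRESS`) doubles as the error value. A userspace sketch of the `qib_dma_map_page()` logic follows. `struct fake_page` is hypothetical: a NULL `vaddr` models a page with no kernel mapping (the highmem case the driver leaves as a TODO), and a 4096-byte page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE       4096u
#define EX_BAD_DMA_ADDRESS ((uint64_t)0)

/* Hypothetical page model: vaddr is NULL when the page is unmapped. */
struct fake_page {
	void *vaddr;
};

static uint64_t map_page(const struct fake_page *page, size_t offset,
			 size_t size)
{
	uint64_t addr;

	/* A mapping may not run past the end of the page. */
	if (offset + size > EX_PAGE_SIZE)
		return EX_BAD_DMA_ADDRESS;

	/* The "DMA address" is simply the CPU virtual address. */
	addr = (uint64_t)(uintptr_t)page->vaddr;
	if (addr)
		addr += offset;
	return addr;		/* 0 still signals failure */
}

static int mapping_error(uint64_t addr)
{
	return addr == EX_BAD_DMA_ADDRESS;
}
```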
[PATCH v4 06/11] IB/qib: Add qib_diag.c

2010-05-18 Thread Ralph Campbell
This patch creates the qib_diag.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_diag.c |  894 ++
 1 files changed, 894 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_diag.c

diff --git a/drivers/infiniband/hw/qib/qib_diag.c b/drivers/infiniband/hw/qib/qib_diag.c
new file mode 100644
index 000..ca98dd5
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_diag.c
@@ -0,0 +1,894 @@
+/*
+ * Copyright (c) 2010 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file contains support for diagnostic functions.  It is accessed by
+ * opening the qib_diag device, normally minor number 129.  Diagnostic use
+ * of the QLogic_IB chip may render the chip or board unusable until the
+ * driver is unloaded, or in some cases, until the system is rebooted.
+ *
+ * Accesses to the chip through this interface are not similar to going
+ * through the /sys/bus/pci resource mmap interface.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qib.h"
+#include "qib_common.h"
+
+/*
+ * Each client that opens the diag device must read then write
+ * offset 0, to prevent lossage from random cat or od. diag_state
+ * sequences this "handshake".
+ */
+enum diag_state { UNUSED = 0, OPENED, INIT, READY };
+
+/* State for an individual client. PID so children cannot abuse handshake */
+static struct qib_diag_client {
+   struct qib_diag_client *next;
+   struct qib_devdata *dd;
+   pid_t pid;
+   enum diag_state state;
+} *client_pool;
+
+/*
+ * Get a client struct. Recycled if possible, else kmalloc.
+ * Must be called with qib_mutex held
+ */
+static struct qib_diag_client *get_client(struct qib_devdata *dd)
+{
+   struct qib_diag_client *dc;
+
+	dc = client_pool;
+	if (dc)
+		/* got from pool, remove it and use */
+		client_pool = dc->next;
+	else
+		/* None in pool, alloc and init */
+		dc = kmalloc(sizeof *dc, GFP_KERNEL);
+
+	if (dc) {
+		dc->next = NULL;
+		dc->dd = dd;
+		dc->pid = current->pid;
+		dc->state = OPENED;
+	}
+   return dc;
+}
+
+/*
+ * Return to pool. Must be called with qib_mutex held
+ */
+static void return_client(struct qib_diag_client *dc)
+{
+   struct qib_devdata *dd = dc->dd;
+   struct qib_diag_client *tdc, *rdc;
+
+	rdc = NULL;
+	if (dc == dd->diag_client) {
+		dd->diag_client = dc->next;
+		rdc = dc;
+	} else {
+		tdc = dc->dd->diag_client;
+		while (tdc) {
+			if (dc == tdc->next) {
+				tdc->next = dc->next;
+				rdc = dc;
+				break;
+			}
+			tdc = tdc->next;
+		}
+	}
+	if (rdc) {
+		rdc->state = UNUSED;
+		rdc->dd = NULL;
+		rdc->pid = 0;
+		rdc->next = client_pool;
+		client_pool = rdc;
+	}
+}
+
+static int qib_diag_open(struct inode *in, struct file *fp);
+static int qib_diag_release(struct inode *in, struct file *fp);
+static ssize_t qib_diag_read(struct file *fp, char __user *data,
+ 

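`get_client()` and `return_client()` implement a small recycling pool: a released client is unlinked from its device's `diag_client` list and pushed onto the global `client_pool` free list, and the next open reuses it instead of calling `kmalloc()`. The userspace sketch below models that with hypothetical `struct dev` and `struct client` types. It swaps in the pointer-to-link idiom for the unlink, which folds the driver's separate head and interior cases into one loop, and it omits `qib_mutex` and the PID check, so it is single-threaded only.

```c
#include <assert.h>
#include <stdlib.h>

struct dev;

struct client {
	struct client *next;
	struct dev *dd;
	int state;		/* 0 = unused, 1 = opened */
};

struct dev {
	struct client *clients;	/* active clients of this device */
};

/* Free list of recycled client structs (no locking in this sketch). */
static struct client *client_pool;

static struct client *get_client(struct dev *dd)
{
	struct client *dc = client_pool;

	if (dc)
		client_pool = dc->next;		/* reuse a pooled struct */
	else
		dc = malloc(sizeof(*dc));	/* pool empty: allocate */
	if (dc) {
		dc->next = NULL;
		dc->dd = dd;
		dc->state = 1;
	}
	return dc;
}

static void return_client(struct client *dc)
{
	struct client **link = &dc->dd->clients;

	/* Walk the device's list by link pointer until we find dc. */
	while (*link && *link != dc)
		link = &(*link)->next;
	if (!*link)
		return;			/* not on the list: leave it alone */
	*link = dc->next;		/* unlink, head or interior alike */
	dc->state = 0;
	dc->dd = NULL;
	dc->next = client_pool;		/* push onto the free list */
	client_pool = dc;
}
```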
[PATCH v4 05/11] IB/qib: Add qib_cq.c

2010-05-18 Thread Ralph Campbell
This patch creates the qib_cq.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_cq.c |  483 
 1 files changed, 483 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_cq.c

diff --git a/drivers/infiniband/hw/qib/qib_cq.c b/drivers/infiniband/hw/qib/qib_cq.c
new file mode 100644
index 000..03fe674
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_cq.c
@@ -0,0 +1,483 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2010 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "qib_verbs.h"
+
+/**
+ * qib_cq_enter - add a new entry to the completion queue
+ * @cq: completion queue
+ * @entry: work completion entry to add
+ * @solicited: true if @entry is a solicited entry
+ *
+ * This may be called with qp->s_lock held.
+ */
+void qib_cq_enter(struct qib_cq *cq, struct ib_wc *entry, int solicited)
+{
+   struct qib_cq_wc *wc;
+   unsigned long flags;
+   u32 head;
+   u32 next;
+
+	spin_lock_irqsave(&cq->lock, flags);
+
+	/*
+	 * Note that the head pointer might be writable by user processes.
+	 * Take care to verify it is a sane value.
+	 */
+	wc = cq->queue;
+	head = wc->head;
+	if (head >= (unsigned) cq->ibcq.cqe) {
+		head = cq->ibcq.cqe;
+		next = 0;
+	} else
+		next = head + 1;
+	if (unlikely(next == wc->tail)) {
+		spin_unlock_irqrestore(&cq->lock, flags);
+		if (cq->ibcq.event_handler) {
+			struct ib_event ev;
+
+			ev.device = cq->ibcq.device;
+			ev.element.cq = &cq->ibcq;
+			ev.event = IB_EVENT_CQ_ERR;
+			cq->ibcq.event_handler(&ev, cq->ibcq.cq_context);
+		}
+		return;
+	}
+	if (cq->ip) {
+		wc->uqueue[head].wr_id = entry->wr_id;
+		wc->uqueue[head].status = entry->status;
+		wc->uqueue[head].opcode = entry->opcode;
+		wc->uqueue[head].vendor_err = entry->vendor_err;
+		wc->uqueue[head].byte_len = entry->byte_len;
+		wc->uqueue[head].ex.imm_data =
+			(__u32 __force)entry->ex.imm_data;
+		wc->uqueue[head].qp_num = entry->qp->qp_num;
+		wc->uqueue[head].src_qp = entry->src_qp;
+		wc->uqueue[head].wc_flags = entry->wc_flags;
+		wc->uqueue[head].pkey_index = entry->pkey_index;
+		wc->uqueue[head].slid = entry->slid;
+		wc->uqueue[head].sl = entry->sl;
+		wc->uqueue[head].dlid_path_bits = entry->dlid_path_bits;
+		wc->uqueue[head].port_num = entry->port_num;
+		/* Make sure entry is written before the head index. */
+		smp_wmb();
+	} else
+		wc->kqueue[head] = *entry;
+	wc->head = next;
+
+	if (cq->notify == IB_CQ_NEXT_COMP ||
+	    (cq->notify == IB_CQ_SOLICITED && solicited)) {
+		cq->notify = IB_CQ_NONE;
+		cq->triggered++;
+		/*
+		 * This will cause send_complete() to be called in
+		 * another thread.
+		 */
+		queue_work(qib_cq_wq, &cq->comptask)

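`qib_cq_enter()` treats the completion queue as a ring of `cqe + 1` slots in which one slot always stays empty, so `next == tail` means the queue is full; because the head index may be writable from user space, it is clamped before use. The sketch below models just that indexing with a hypothetical fixed capacity `EX_CQE`. It omits the lock, the `smp_wmb()` that orders the entry write before the head update for user-mapped queues, and the overflow event, which it reports as a -1 return instead.

```c
#include <assert.h>
#include <stdint.h>

#define EX_CQE 4	/* hypothetical capacity in usable entries */

/* EX_CQE + 1 slots: head == tail means empty, next == tail means full. */
struct ring {
	uint32_t head;
	uint32_t tail;
	int slots[EX_CQE + 1];
};

/* Returns 0 on success, -1 when the ring is full (the overflow case). */
static int ring_enter(struct ring *r, int value)
{
	uint32_t head = r->head;
	uint32_t next;

	/* head may be untrusted: clamp it to the last slot and wrap. */
	if (head >= EX_CQE) {
		head = EX_CQE;
		next = 0;
	} else {
		next = head + 1;
	}
	if (next == r->tail)
		return -1;
	r->slots[head] = value;
	r->head = next;
	return 0;
}
```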
[PATCH v4 02/11] IB/qib: Add qib_7220.h

2010-05-18 Thread Ralph Campbell
This patch creates the qib_7220.h file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_7220.h |  156 ++
 1 files changed, 156 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_7220.h

diff --git a/drivers/infiniband/hw/qib/qib_7220.h b/drivers/infiniband/hw/qib/qib_7220.h
new file mode 100644
index 000..ea0bfd8
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_7220.h
@@ -0,0 +1,156 @@
+#ifndef _QIB_7220_H
+#define _QIB_7220_H
+/*
+ * Copyright (c) 2007, 2009, 2010 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* grab register-defs auto-generated by HW */
+#include "qib_7220_regs.h"
+
+/* The number of eager receive TIDs for context zero. */
+#define IBA7220_KRCVEGRCNT  2048U
+
+#define IB_7220_LT_STATE_CFGRCVFCFG  0x09
+#define IB_7220_LT_STATE_CFGWAITRMT  0x0a
+#define IB_7220_LT_STATE_TXREVLANES  0x0d
+#define IB_7220_LT_STATE_CFGENH  0x10
+
+struct qib_chip_specific {
+   u64 __iomem *cregbase;
+   u64 *cntrs;
+   u64 *portcntrs;
+   spinlock_t sdepb_lock; /* serdes EPB bus */
+   spinlock_t rcvmod_lock; /* protect rcvctrl shadow changes */
+   spinlock_t gpio_lock; /* RMW of shadows/regs for ExtCtrl and GPIO */
+   u64 hwerrmask;
+   u64 errormask;
+   u64 gpio_out; /* shadow of kr_gpio_out, for rmw ops */
+   u64 gpio_mask; /* shadow the gpio mask register */
+   u64 extctrl; /* shadow the gpio output enable, etc... */
+   u32 ncntrs;
+   u32 nportcntrs;
+   u32 cntrnamelen;
+   u32 portcntrnamelen;
+   u32 numctxts;
+   u32 rcvegrcnt;
+   u32 autoneg_tries;
+   u32 serdes_first_init_done;
+   u32 sdmabufcnt;
+   u32 lastbuf_for_pio;
+   u32 updthresh; /* current AvailUpdThld */
+   u32 updthresh_dflt; /* default AvailUpdThld */
+   int irq;
+   u8 presets_needed;
+   u8 relock_timer_active;
+   char emsgbuf[128];
+   char sdmamsgbuf[192];
+   char bitsmsgbuf[64];
+   struct timer_list relock_timer;
+   unsigned int relock_interval; /* in jiffies */
+};
+
+struct qib_chippport_specific {
+   struct qib_pportdata pportdata;
+   wait_queue_head_t autoneg_wait;
+   struct delayed_work autoneg_work;
+   struct timer_list chase_timer;
+   /*
+    * these 5 fields are used to establish deltas for IB symbol
+    * errors and linkrecovery errors.  They can be reported on
+    * some chips during link negotiation prior to INIT, and with
+    * DDR when faking DDR negotiations with non-IBTA switches.
+    * The chip counters are adjusted at driver unload if there is
+    * a non-zero delta.
+    */
+   u64 ibdeltainprog;
+   u64 ibsymdelta;
+   u64 ibsymsnap;
+   u64 iblnkerrdelta;
+   u64 iblnkerrsnap;
+   u64 ibcctrl; /* kr_ibcctrl shadow */
+   u64 ibcddrctrl; /* kr_ibcddrctrl shadow */
+   u64 chase_end;
+   u32 last_delay_mult;
+};
+
+/*
+ * This header file provides the declarations and common definitions
+ * for (mostly) manipulation of the SerDes blocks within the IBA7220.
+ * The functions declared should only be called from within other
+ * 7220-related files such as qib_iba7220.c or qib_sd7220.c.
+ */
+int qib_sd7220_presets(struct qib_devdata *dd);
+int qib_sd7220_init(struct qib_devdata *dd);
+int qib_sd7220_prog_ld(struct qib_devdata *dd, int sdnum, u8 *img,
+  int len, int offset);
+int qib_sd7220_prog_vfy(struct qib_devdata *dd, int sdnum, const u8 *img,
+   int len
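The comment in `struct qib_chippport_specific` describes a snapshot/delta scheme for hiding IB symbol and link-recovery errors that the chip counts while the link is still negotiating. A minimal user-space sketch of that idea follows. The struct and function names (`err_delta`, `delta_start`, `delta_stop`, `delta_report`) are hypothetical, and it assumes the raw hardware counter only ever increases; it illustrates the technique, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state mirroring ibdeltainprog/ibsymsnap/ibsymdelta. */
struct err_delta {
	bool in_progress;   /* a snapshot is open (link is training) */
	uint64_t snap;      /* raw counter value when training started */
	uint64_t delta;     /* errors accumulated during training, hidden */
};

/* Link negotiation begins: remember the current raw counter value. */
static void delta_start(struct err_delta *d, uint64_t raw)
{
	d->snap = raw;
	d->in_progress = true;
}

/* Link came up: fold the errors seen during training into the delta. */
static void delta_stop(struct err_delta *d, uint64_t raw)
{
	if (d->in_progress) {
		assert(raw >= d->snap); /* assumes the counter never wraps */
		d->delta += raw - d->snap;
		d->in_progress = false;
	}
}

/* Value reported to users: raw count minus errors from training. */
static uint64_t delta_report(const struct err_delta *d, uint64_t raw)
{
	/* While training is still in progress, freeze at the snapshot. */
	uint64_t base = d->in_progress ? d->snap : raw;

	return base - d->delta;
}
```

Freezing the reported value at the snapshot while negotiation is in progress keeps errors from a flapping link out of the user-visible count; once the link is up, only errors counted after training show through.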

[PATCH v4 01/11] IB/qib: Add qib_6120_regs.h

2010-05-18 Thread Ralph Campbell
This creates the qib_6120_regs.h file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_6120_regs.h |  977 +
 1 files changed, 977 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_6120_regs.h

diff --git a/drivers/infiniband/hw/qib/qib_6120_regs.h b/drivers/infiniband/hw/qib/qib_6120_regs.h
new file mode 100644
index 000..e16cb6f
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_6120_regs.h
@@ -0,0 +1,977 @@
+/*
+ * Copyright (c) 2008, 2009, 2010 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/* This file is mechanically generated from RTL. Any hand-edits will be lost! */
+
+#define QIB_6120_Revision_OFFS 0x0
+#define QIB_6120_Revision_R_Simulator_LSB 0x3F
+#define QIB_6120_Revision_R_Simulator_RMASK 0x1
+#define QIB_6120_Revision_Reserved_LSB 0x28
+#define QIB_6120_Revision_Reserved_RMASK 0x7F
+#define QIB_6120_Revision_BoardID_LSB 0x20
+#define QIB_6120_Revision_BoardID_RMASK 0xFF
+#define QIB_6120_Revision_R_SW_LSB 0x18
+#define QIB_6120_Revision_R_SW_RMASK 0xFF
+#define QIB_6120_Revision_R_Arch_LSB 0x10
+#define QIB_6120_Revision_R_Arch_RMASK 0xFF
+#define QIB_6120_Revision_R_ChipRevMajor_LSB 0x8
+#define QIB_6120_Revision_R_ChipRevMajor_RMASK 0xFF
+#define QIB_6120_Revision_R_ChipRevMinor_LSB 0x0
+#define QIB_6120_Revision_R_ChipRevMinor_RMASK 0xFF
+
+#define QIB_6120_Control_OFFS 0x8
+#define QIB_6120_Control_TxLatency_LSB 0x4
+#define QIB_6120_Control_TxLatency_RMASK 0x1
+#define QIB_6120_Control_PCIERetryBufDiagEn_LSB 0x3
+#define QIB_6120_Control_PCIERetryBufDiagEn_RMASK 0x1
+#define QIB_6120_Control_LinkEn_LSB 0x2
+#define QIB_6120_Control_LinkEn_RMASK 0x1
+#define QIB_6120_Control_FreezeMode_LSB 0x1
+#define QIB_6120_Control_FreezeMode_RMASK 0x1
+#define QIB_6120_Control_SyncReset_LSB 0x0
+#define QIB_6120_Control_SyncReset_RMASK 0x1
+
+#define QIB_6120_PageAlign_OFFS 0x10
+
+#define QIB_6120_PortCnt_OFFS 0x18
+
+#define QIB_6120_SendRegBase_OFFS 0x30
+
+#define QIB_6120_UserRegBase_OFFS 0x38
+
+#define QIB_6120_CntrRegBase_OFFS 0x40
+
+#define QIB_6120_Scratch_OFFS 0x48
+#define QIB_6120_Scratch_TopHalf_LSB 0x20
+#define QIB_6120_Scratch_TopHalf_RMASK 0x
+#define QIB_6120_Scratch_BottomHalf_LSB 0x0
+#define QIB_6120_Scratch_BottomHalf_RMASK 0x
+
+#define QIB_6120_IntBlocked_OFFS 0x60
+#define QIB_6120_IntBlocked_ErrorIntBlocked_LSB 0x1F
+#define QIB_6120_IntBlocked_ErrorIntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_PioSetIntBlocked_LSB 0x1E
+#define QIB_6120_IntBlocked_PioSetIntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_PioBufAvailIntBlocked_LSB 0x1D
+#define QIB_6120_IntBlocked_PioBufAvailIntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_assertGPIOIntBlocked_LSB 0x1C
+#define QIB_6120_IntBlocked_assertGPIOIntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_Reserved_LSB 0xF
+#define QIB_6120_IntBlocked_Reserved_RMASK 0x1FFF
+#define QIB_6120_IntBlocked_RcvAvail4IntBlocked_LSB 0x10
+#define QIB_6120_IntBlocked_RcvAvail4IntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_RcvAvail3IntBlocked_LSB 0xF
+#define QIB_6120_IntBlocked_RcvAvail3IntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_RcvAvail2IntBlocked_LSB 0xE
+#define QIB_6120_IntBlocked_RcvAvail2IntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_RcvAvail1IntBlocked_LSB 0xD
+#define QIB_6120_IntBlocked_RcvAvail1IntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_RcvAvail0IntBlocked_LSB 0xC
+#define QIB_6120_IntBlocked_RcvAvail0IntBlocked_RMASK 0x1
+#define QIB_6120_IntBlocked_Reserved1_LSB 0x5
+#define QIB_6120_IntBlocked_Reserved1_RMASK 0x7
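Each register in this generated header has an `_OFFS` byte offset and, per field, an `_LSB` (lowest bit position) and an `_RMASK` (right-justified mask). Assuming a field is read as `(reg >> LSB) & RMASK`, which is the natural reading of these names but is not spelled out in the header, a small user-space sketch for decoding the `Revision` register could look like this. `reg_field()`, `REG6120_FIELD()` and the sample register value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from qib_6120_regs.h above. */
#define QIB_6120_Revision_BoardID_LSB 0x20
#define QIB_6120_Revision_BoardID_RMASK 0xFF
#define QIB_6120_Revision_R_ChipRevMajor_LSB 0x8
#define QIB_6120_Revision_R_ChipRevMajor_RMASK 0xFF
#define QIB_6120_Revision_R_ChipRevMinor_LSB 0x0
#define QIB_6120_Revision_R_ChipRevMinor_RMASK 0xFF

/* Hypothetical helper: shift the field down to bit 0, then mask it. */
static inline uint64_t reg_field(uint64_t reg, unsigned lsb, uint64_t rmask)
{
	assert(lsb < 64);
	return (reg >> lsb) & rmask;
}

/* Paste the register/field name onto the _LSB and _RMASK suffixes. */
#define REG6120_FIELD(reg, name) \
	reg_field((reg), QIB_6120_##name##_LSB, QIB_6120_##name##_RMASK)
```

On real hardware the 64-bit value would come from an MMIO read at `QIB_6120_Revision_OFFS`; that step is left out of the sketch.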

[PATCH 0/52] IB/qib: add

2010-05-18 Thread Ralph Campbell
The following patches introduce an updated and renamed version of
the ipath HCA driver which supports the QLogic PCIe QLE SDR, DDR,
and QDR series of HCAs.
Rather than try to patch the ipath driver to include support for QDR,
multiple ports, bug fixes, and many other structural changes, the
ib_qib driver replaces the ib_ipath driver.

Changes in v4:

Change qib_sd7220.c to not use atomic_inc_return() - it wasn't needed.
Change qib_iba7322.c to replace the "bogus" QMH/QME module parameters.
Update copyrights to 2010.
Fix for CQ completion callbacks which could be out-of-order.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 52/52] IB/core: allow HCAs to create IB port sysfs files

2010-05-06 Thread Ralph Campbell
This patch adds a new parameter to ib_register_device() so that
HCAs can create files in /sys/class/infiniband//ports//.
There is no need for an unregister function since the kobject
reference will go to zero when ib_unregister_device() is called.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/core/core_priv.h          |    2 +-
 drivers/infiniband/core/device.c             |    4 ++--
 drivers/infiniband/core/sysfs.c              |   24 +++-
 drivers/infiniband/hw/amso1100/c2_provider.c |    2 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |    2 +-
 drivers/infiniband/hw/ehca/ehca_main.c       |    2 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c    |    2 +-
 drivers/infiniband/hw/mlx4/main.c            |    2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |    2 +-
 drivers/infiniband/hw/nes/nes_verbs.c        |    2 +-
 include/rdma/ib_verbs.h                      |    5 -
 11 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05ac36e..e70c809 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -38,7 +38,7 @@
 
 #include 
 
-int  ib_device_register_sysfs(struct ib_device *device);
+int  ib_device_register_sysfs(struct ib_device *device, sysfs_cb cb);
 void ib_device_unregister_sysfs(struct ib_device *device);
 
 int  ib_sysfs_setup(void);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index d1fba41..9ef8093 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -267,7 +267,7 @@ out:
  * callback for each device that is added. @device must be allocated
  * with ib_alloc_device().
  */
-int ib_register_device(struct ib_device *device)
+int ib_register_device(struct ib_device *device, sysfs_cb cb)
 {
int ret;
 
@@ -296,7 +296,7 @@ int ib_register_device(struct ib_device *device)
goto out;
}
 
-   ret = ib_device_register_sysfs(device);
+   ret = ib_device_register_sysfs(device, cb);
if (ret) {
printk(KERN_WARNING "Couldn't register device %s with driver model\n",
   device->name);
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index f901957..a47bac8 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -754,7 +754,23 @@ static struct attribute_group iw_stats_group = {
.attrs  = iw_proto_stats_attrs,
 };
 
-int ib_device_register_sysfs(struct ib_device *device)
+static int sysfs_create_port_files(struct ib_device *device, sysfs_cb cb)
+{
+   struct kobject *p;
+   struct ib_port *port;
+   int ret = 0;
+
+   list_for_each_entry(p, &device->port_list, entry) {
+       port = container_of(p, struct ib_port, kobj);
+       ret = cb(device, port->port_num, &port->kobj);
+       if (ret)
+           break;
+   }
+
+   return ret;
+}
+
+int ib_device_register_sysfs(struct ib_device *device, sysfs_cb cb)
 {
struct device *class_dev = &device->dev;
int ret;
@@ -802,6 +818,12 @@ int ib_device_register_sysfs(struct ib_device *device)
goto err_put;
}
 
+   if (cb) {
+       ret = sysfs_create_port_files(device, cb);
+       if (ret)
+           goto err_put;
+   }
+
return 0;
 
 err_put:
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index ad723bd..d128386 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -864,7 +864,7 @@ int c2_register_device(struct c2_dev *dev)
dev->ibdev.iwcm->create_listen = c2_service_create;
dev->ibdev.iwcm->destroy_listen = c2_service_destroy;
 
-   ret = ib_register_device(&dev->ibdev);
+   ret = ib_register_device(&dev->ibdev, NULL);
if (ret)
goto out_free_iwcm;
 
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 47b35c6..26f22ec 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1427,7 +1427,7 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.iwcm->rem_ref = iwch_qp_rem_ref;
dev->ibdev.iwcm->get_qp = iwch_get_qp;
 
-   ret = ib_register_device(&dev->ibdev);
+   ret = ib_register_device(&dev->ibdev, NULL);
if (ret)
goto bail1;
 
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 129a6be..d1a9278 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -798,7 +798,7 @@ static int __devinit ehca_probe(struct of_device *d
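`sysfs_create_port_files()` walks the device's port list, passes each port's kobject to the driver-supplied callback, and stops at the first non-zero return, which `ib_device_register_sysfs()` then treats as a failure. The `sysfs_cb` typedef itself is not visible in this excerpt; judging from the call `cb(device, port->port_num, &port->kobj)` it takes the device, a port number and a kobject. The user-space sketch below reproduces only that control flow, using hypothetical stand-in types (`mock_device`, `mock_port`, `port_cb`) instead of the kernel's list and kobject APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct ib_port / struct ib_device. */
struct mock_port {
	uint8_t port_num;
	int files_created;      /* set by the example callback */
};

struct mock_device {
	struct mock_port *ports;
	size_t nports;
};

/* Shape inferred from the cb(device, port_num, kobj) call in the patch. */
typedef int (*port_cb)(struct mock_device *dev, uint8_t port_num,
		       struct mock_port *port);

/* Same flow as the patch: no callback is fine, first error stops the walk. */
static int create_port_files(struct mock_device *dev, port_cb cb)
{
	int ret = 0;
	size_t i;

	if (!cb)                /* mirrors the "if (cb)" test in the caller */
		return 0;

	for (i = 0; i < dev->nports; i++) {
		ret = cb(dev, dev->ports[i].port_num, &dev->ports[i]);
		if (ret)
			break;
	}
	return ret;
}

/* Example driver callback: "creates" files, but fails on port 2. */
static int example_cb(struct mock_device *dev, uint8_t port_num,
		      struct mock_port *port)
{
	assert(dev != NULL);
	if (port_num == 2)
		return -1;      /* a real driver would return e.g. -ENOMEM */
	port->files_created = 1;
	return 0;
}
```

Drivers with no per-port files simply pass `NULL`, as the amso1100 and cxgb3 hunks do. The sketch does not model the `err_put` cleanup the real code performs on failure.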

[PATCH v3 50/52] IB/qib: Hooks for adding the QIB driver into the framework

2010-05-06 Thread Ralph Campbell
Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/Kconfig  |    1 +
 drivers/infiniband/Makefile |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 975adce..3a04822 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -43,6 +43,7 @@ config INFINIBAND_ADDR_TRANS
 
 source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/ipath/Kconfig"
+source "drivers/infiniband/hw/qib/Kconfig"
 source "drivers/infiniband/hw/ehca/Kconfig"
 source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index ed35e44..27b66b1 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_INFINIBAND)   += core/
 obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/
 obj-$(CONFIG_INFINIBAND_IPATH) += hw/ipath/
+obj-$(CONFIG_INFINIBAND_QIB)   += hw/qib/
 obj-$(CONFIG_INFINIBAND_EHCA)  += hw/ehca/
 obj-$(CONFIG_INFINIBAND_AMSO1100)  += hw/amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/



[PATCH v3 49/52] IB/qib: Add qib_wc_x86_64.c

2010-05-06 Thread Ralph Campbell
creates the qib_wc_x86_64.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_wc_x86_64.c |  171 +
 1 files changed, 171 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_wc_x86_64.c

diff --git a/drivers/infiniband/hw/qib/qib_wc_x86_64.c b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
new file mode 100644
index 000..561b8bc
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file is conditionally built on x86_64 only.  Otherwise weak symbol
+ * versions of the functions exported from here are used.
+ */
+
+#include 
+#include 
+#include 
+
+#include "qib.h"
+
+/**
+ * qib_enable_wc - enable write combining for MMIO writes to the device
+ * @dd: qlogic_ib device
+ *
+ * This routine is x86_64-specific; it twiddles the CPU's MTRRs to enable
+ * write combining.
+ */
+int qib_enable_wc(struct qib_devdata *dd)
+{
+   int ret = 0;
+   u64 pioaddr, piolen;
+   unsigned bits;
+   const unsigned long addr = pci_resource_start(dd->pcidev, 0);
+   const size_t len = pci_resource_len(dd->pcidev, 0);
+
+   /*
+* Set the PIO buffers to be WCCOMB, so we get HT bursts to the
+* chip.  Linux (possibly the hardware) requires it to be on a power
+* of 2 address matching the length (which has to be a power of 2).
+* For rev1, that means the base address, for rev2, it will be just
+* the PIO buffers themselves.
+* For chips with two sets of buffers, the calculations are
+* somewhat more complicated; we need to sum, and the piobufbase
+* register has both offsets, 2K in low 32 bits, 4K in high 32 bits.
+* The buffers are still packed, so a single range covers both.
+*/
+   if (dd->piobcnt2k && dd->piobcnt4k) {
+   /* 2 sizes for chip */
+   unsigned long pio2kbase, pio4kbase;
+   pio2kbase = dd->piobufbase & 0xUL;
+   pio4kbase = (dd->piobufbase >> 32) & 0xUL;
+   if (pio2kbase < pio4kbase) {
+   /* all current chips */
+   pioaddr = addr + pio2kbase;
+   piolen = pio4kbase - pio2kbase +
+   dd->piobcnt4k * dd->align4k;
+   } else {
+   pioaddr = addr + pio4kbase;
+   piolen = pio2kbase - pio4kbase +
+   dd->piobcnt2k * dd->palign;
+   }
+   } else {  /* single buffer size (2K, currently) */
+   pioaddr = addr + dd->piobufbase;
+   piolen = dd->piobcnt2k * dd->palign +
+   dd->piobcnt4k * dd->align4k;
+   }
+
+   for (bits = 0; !(piolen & (1ULL << bits)); bits++)
+   /* do nothing */ ;
+
+   if (piolen != (1ULL << bits)) {
+   piolen >>= bits;
+   while (piolen >>= 1)
+   bits++;
+   piolen = 1ULL << (bits + 1);
+   }
+   if (pioaddr & (piolen - 1)) {
+   u64 atmp;
+   atmp = pioaddr & ~(piolen - 1);
+   if (atmp < addr || (atmp + piolen) > (addr + len)) {
+   qib_dev_err(dd, "No 
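As the comment in `qib_enable_wc()` says, the write-combining range must be a power of two in length and aligned to that length. The user-space sketch below isolates that arithmetic: it rounds the PIO length up with the same lowest-set-bit/shift loop as the patch, then aligns a misaligned base down and checks that the widened window still fits inside the BAR, as the visible part of the driver does. The function names are hypothetical, and because the patch text is cut off at the `qib_dev_err()` call, nothing after that bounds check is reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round piolen up to a power of two, following the steps in the patch. */
static uint64_t wc_round_len(uint64_t piolen)
{
	unsigned bits;

	assert(piolen != 0);    /* the bit-search loop never ends for 0 */

	/* Find the lowest set bit. */
	for (bits = 0; !(piolen & (1ULL << bits)); bits++)
		;

	if (piolen != (1ULL << bits)) {
		/* Not a power of two: advance bits to the highest set bit. */
		piolen >>= bits;
		while (piolen >>= 1)
			bits++;
		piolen = 1ULL << (bits + 1);
	}
	return piolen;
}

/*
 * If pioaddr is not aligned to piolen, align it down and check that the
 * widened window [base, base + piolen) stays inside [bar, bar + barlen).
 */
static bool wc_window(uint64_t pioaddr, uint64_t piolen,
		      uint64_t bar, uint64_t barlen, uint64_t *base)
{
	uint64_t atmp = pioaddr;

	if (pioaddr & (piolen - 1)) {
		atmp = pioaddr & ~(piolen - 1);
		if (atmp < bar || atmp + piolen > bar + barlen)
			return false;
	}
	*base = atmp;
	return true;
}
```

For example, a 12 KiB PIO region (`0x3000`) grows to a 16 KiB window, and a 20 KiB region (`0x5000`) grows to 32 KiB.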

[PATCH v3 48/52] IB/qib: Add qib_wc_ppc64.c

2010-05-06 Thread Ralph Campbell
creates the qib_wc_ppc64.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_wc_ppc64.c |   62 ++
 1 files changed, 62 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_wc_ppc64.c

diff --git a/drivers/infiniband/hw/qib/qib_wc_ppc64.c b/drivers/infiniband/hw/qib/qib_wc_ppc64.c
new file mode 100644
index 000..673cf4c
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_wc_ppc64.c
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file is conditionally built on PowerPC only.  Otherwise weak symbol
+ * versions of the functions exported from here are used.
+ */
+
+#include "qib.h"
+
+/**
+ * qib_enable_wc - enable write combining for MMIO writes to the device
+ * @dd: qlogic_ib device
+ *
+ * Nothing to do on PowerPC, so just return without error.
+ */
+int qib_enable_wc(struct qib_devdata *dd)
+{
+   return 0;
+}
+
+/**
+ * qib_unordered_wc - indicate whether write combining is unordered
+ *
+ * Because our performance depends on our ability to do write
+ * combining mmio writes in the most efficient way, we need to
+ * know if we are on a processor that may reorder stores when
+ * write combining.
+ */
+int qib_unordered_wc(void)
+{
+   return 1;
+}
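Both `qib_wc_x86_64.c` and `qib_wc_ppc64.c` note that they are built on one architecture only and that "weak symbol versions" of the same functions are used otherwise. Those weak versions are not part of this excerpt, so the sketch below shows only the mechanism, with GCC/Clang's `__attribute__((weak))` and hypothetical `example_*` names; the fallback return values are illustrative and say nothing about what the driver's own fallbacks return.

```c
#include <assert.h>

/*
 * Generic fallbacks, compiled on every architecture.  Because they are
 * weak, the linker prefers a strong definition of the same symbol if
 * another object file (such as an arch-specific one) provides it.
 */
__attribute__((weak)) int example_enable_wc(void)
{
	return -1;      /* hypothetical: write combining not supported */
}

__attribute__((weak)) int example_unordered_wc(void)
{
	return 0;       /* hypothetical default: WC stores stay ordered */
}

/* Callers look the same whichever definition the linker picked. */
static int example_wc_enabled(void)
{
	return example_enable_wc() == 0;
}
```

Linking in another object that defines `int example_enable_wc(void) { return 0; }` without the attribute replaces the fallback at link time, with no `#ifdef` in the callers.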



[PATCH v3 47/52] IB/qib: Add qib_verbs_mcast.c

2010-05-06 Thread Ralph Campbell
creates the qib_verbs_mcast.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_verbs_mcast.c |  368 +++
 1 files changed, 368 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_verbs_mcast.c

diff --git a/drivers/infiniband/hw/qib/qib_verbs_mcast.c b/drivers/infiniband/hw/qib/qib_verbs_mcast.c
new file mode 100644
index 000..dabb697
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_verbs_mcast.c
@@ -0,0 +1,368 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+
+#include "qib.h"
+
+/**
+ * qib_mcast_qp_alloc - alloc a struct to link a QP to mcast GID struct
+ * @qp: the QP to link
+ */
+static struct qib_mcast_qp *qib_mcast_qp_alloc(struct qib_qp *qp)
+{
+   struct qib_mcast_qp *mqp;
+
+   mqp = kmalloc(sizeof *mqp, GFP_KERNEL);
+   if (!mqp)
+   goto bail;
+
+   mqp->qp = qp;
+   atomic_inc(&qp->refcount);
+
+bail:
+   return mqp;
+}
+
+static void qib_mcast_qp_free(struct qib_mcast_qp *mqp)
+{
+   struct qib_qp *qp = mqp->qp;
+
+   /* Notify qib_destroy_qp() if it is waiting. */
+   if (atomic_dec_and_test(&qp->refcount))
+   wake_up(&qp->wait);
+
+   kfree(mqp);
+}
+
+/**
+ * qib_mcast_alloc - allocate the multicast GID structure
+ * @mgid: the multicast GID
+ *
+ * A list of QPs will be attached to this structure.
+ */
+static struct qib_mcast *qib_mcast_alloc(union ib_gid *mgid)
+{
+   struct qib_mcast *mcast;
+
+   mcast = kmalloc(sizeof *mcast, GFP_KERNEL);
+   if (!mcast)
+   goto bail;
+
+   mcast->mgid = *mgid;
+   INIT_LIST_HEAD(&mcast->qp_list);
+   init_waitqueue_head(&mcast->wait);
+   atomic_set(&mcast->refcount, 0);
+   mcast->n_attached = 0;
+
+bail:
+   return mcast;
+}
+
+static void qib_mcast_free(struct qib_mcast *mcast)
+{
+   struct qib_mcast_qp *p, *tmp;
+
+   list_for_each_entry_safe(p, tmp, &mcast->qp_list, list)
+   qib_mcast_qp_free(p);
+
+   kfree(mcast);
+}
+
+/**
+ * qib_mcast_find - search the global table for the given multicast GID
+ * @ibp: the IB port structure
+ * @mgid: the multicast GID to search for
+ *
+ * Returns NULL if not found.
+ *
+ * The caller is responsible for decrementing the reference count if found.
+ */
+struct qib_mcast *qib_mcast_find(struct qib_ibport *ibp, union ib_gid *mgid)
+{
+   struct rb_node *n;
+   unsigned long flags;
+   struct qib_mcast *mcast;
+
+   spin_lock_irqsave(&ibp->lock, flags);
+   n = ibp->mcast_tree.rb_node;
+   while (n) {
+   int ret;
+
+   mcast = rb_entry(n, struct qib_mcast, rb_node);
+
+   ret = memcmp(mgid->raw, mcast->mgid.raw,
+sizeof(union ib_gid));
+   if (ret < 0)
+   n = n->rb_left;
+   else if (ret > 0)
+   n = n->rb_right;
+   else {
+   atomic_inc(&mcast->refcount);
+   spin_unlock_irqrestore(&ibp->lock, flags);
+   goto bail;
+   }
+   }
+   spin_unlock_irqrestore(&ibp->lock, flags);
+
+   mcast = NULL;
+
+bail:
+   return mcast;
+}
+
+/**
+ * qib_mcast_add - insert mcast GID into table and attach QP struct
+ * @mcast: the mcast GID t
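`qib_mcast_find()` is an ordered-tree search keyed on the raw 16-byte GID: `memcmp()` picks the left or right child, and a hit takes a reference (`atomic_inc(&mcast->refcount)`) before the lock is dropped, which is why the caller must drop it later. The user-space sketch below keeps that search order and reference handling but swaps the kernel red-black tree for a plain, unbalanced binary search tree and the kernel `atomic_t` for C11 atomics; locking is omitted. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for union ib_gid: 16 raw bytes, ordered with memcmp(). */
struct gid { unsigned char raw[16]; };

struct mcast_node {
	struct gid mgid;
	atomic_int refcount;
	struct mcast_node *left, *right;
};

/* Like qib_mcast_alloc(): copy the GID and start with no references. */
static void mcast_node_init(struct mcast_node *n, const struct gid *mgid)
{
	n->mgid = *mgid;
	atomic_init(&n->refcount, 0);
	n->left = n->right = NULL;
}

/* Insert without rebalancing (an rbtree would also recolor/rotate). */
static void mcast_insert(struct mcast_node **root, struct mcast_node *n)
{
	while (*root) {
		int c = memcmp(n->mgid.raw, (*root)->mgid.raw, sizeof(n->mgid.raw));

		assert(c != 0);         /* this sketch assumes unique GIDs */
		root = c < 0 ? &(*root)->left : &(*root)->right;
	}
	*root = n;
}

/* Same search order as qib_mcast_find(); a hit takes a reference. */
static struct mcast_node *mcast_find(struct mcast_node *n,
				     const struct gid *mgid)
{
	while (n) {
		int c = memcmp(mgid->raw, n->mgid.raw, sizeof(mgid->raw));

		if (c < 0)
			n = n->left;
		else if (c > 0)
			n = n->right;
		else {
			atomic_fetch_add(&n->refcount, 1);
			return n;       /* caller must drop this reference */
		}
	}
	return NULL;
}
```

Taking the reference inside the search, rather than after it returns, is what keeps the node alive once the real code releases `ibp->lock`.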

[PATCH v3 46/52] IB/qib: Add qib_verbs.h

2010-05-06 Thread Ralph Campbell
creates the qib_verbs.h file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_verbs.h | 1099 +
 1 files changed, 1099 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_verbs.h

diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
new file mode 100644
index 000..fd972e0
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -0,0 +1,1099 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009, 2010 QLogic Corporation.
+ * All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef QIB_VERBS_H
+#define QIB_VERBS_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct qib_ctxtdata;
+struct qib_pportdata;
+struct qib_devdata;
+struct qib_verbs_txreq;
+
+#define QIB_MAX_RDMA_ATOMIC 16
+#define QIB_GUIDS_PER_PORT 5
+
+#define QPN_MAX (1 << 24)
+#define QPNMAP_ENTRIES  (QPN_MAX / PAGE_SIZE / BITS_PER_BYTE)
+
+/*
+ * Increment this value if any changes that break userspace ABI
+ * compatibility are made.
+ */
+#define QIB_UVERBS_ABI_VERSION   2
+
+/*
+ * Define an ib_cq_notify value that is not valid so we know when CQ
+ * notifications are armed.
+ */
+#define IB_CQ_NONE  (IB_CQ_NEXT_COMP + 1)
+
+#define IB_SEQ_NAK (3 << 29)
+
+/* AETH NAK opcode values */
+#define IB_RNR_NAK  0x20
+#define IB_NAK_PSN_ERROR  0x60
+#define IB_NAK_INVALID_REQUEST  0x61
+#define IB_NAK_REMOTE_ACCESS_ERROR  0x62
+#define IB_NAK_REMOTE_OPERATIONAL_ERROR 0x63
+#define IB_NAK_INVALID_RD_REQUEST   0x64
+
+/* Flags for checking QP state (see ib_qib_state_ops[]) */
+#define QIB_POST_SEND_OK  0x01
+#define QIB_POST_RECV_OK  0x02
+#define QIB_PROCESS_RECV_OK 0x04
+#define QIB_PROCESS_SEND_OK 0x08
+#define QIB_PROCESS_NEXT_SEND_OK  0x10
+#define QIB_FLUSH_SEND 0x20
+#define QIB_FLUSH_RECV 0x40
+#define QIB_PROCESS_OR_FLUSH_SEND \
+   (QIB_PROCESS_SEND_OK | QIB_FLUSH_SEND)
+
+/* IB Performance Manager status values */
+#define IB_PMA_SAMPLE_STATUS_DONE   0x00
+#define IB_PMA_SAMPLE_STATUS_STARTED  0x01
+#define IB_PMA_SAMPLE_STATUS_RUNNING  0x02
+
+/* Mandatory IB performance counter select values. */
+#define IB_PMA_PORT_XMIT_DATA   cpu_to_be16(0x0001)
+#define IB_PMA_PORT_RCV_DATA  cpu_to_be16(0x0002)
+#define IB_PMA_PORT_XMIT_PKTS   cpu_to_be16(0x0003)
+#define IB_PMA_PORT_RCV_PKTS  cpu_to_be16(0x0004)
+#define IB_PMA_PORT_XMIT_WAIT   cpu_to_be16(0x0005)
+
+#define QIB_VENDOR_IPG cpu_to_be16(0xFFA0)
+
+#define IB_BTH_REQ_ACK (1 << 31)
+#define IB_BTH_SOLICITED   (1 << 23)
+#define IB_BTH_MIG_REQ (1 << 22)
+
+/* XXX Should be defined in ib_verbs.h enum ib_port_cap_flags */
+#define IB_PORT_OTHER_LOCAL_CHANGES_SUP (1 << 26)
+
+#define IB_GRH_VERSION 6
+#define IB_GRH_VERSION_MASK  0xF
+#define IB_GRH_VERSION_SHIFT   28
+#define IB_GRH_TCLASS_MASK 0xFF
+#define IB_GRH_TCLASS_SHIFT  20
+#define IB_GRH_FLOW_MASK   0xF
+#define IB_GRH_FLOW_SHIFT  0
+#define IB_GRH_NEXT_HDR  0x1B
+
+#define IB_DEFAULT_GID_PREFIX  cpu_to_be64(0xfe80ULL)
+
+/* Values for set/get portinfo VLCap OperationalVLs */
+#define IB_VL_VL0   1
+#define IB_VL_VL0_1 2
+#define IB_VL_VL0_3 3
+#define IB_VL_VL0_7 4
+#define IB_VL_VL0_14 5
+
+static inline i
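The `IB_GRH_*` shift and mask constants describe the first 32-bit word of the Global Route Header: a 4-bit IP version in bits 31..28, an 8-bit traffic class in bits 27..20 and a flow label in the low bits. The InfiniBand Architecture specification defines that flow label as 20 bits wide, so the sketch below masks it with `0xFFFFF`. The `GRH_*` macros and `grh_*` helpers are hypothetical, and the word is built in host byte order; converting it to network order (as the driver would with `cpu_to_be32()`) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of GRH word 0: IPVer(4) | TClass(8) | FlowLabel(20). */
#define GRH_VERSION        6
#define GRH_VERSION_SHIFT  28
#define GRH_VERSION_MASK   0xFu
#define GRH_TCLASS_SHIFT   20
#define GRH_TCLASS_MASK    0xFFu
#define GRH_FLOW_SHIFT     0
#define GRH_FLOW_MASK      0xFFFFFu

/* Build word 0 in host byte order; callers convert to big-endian. */
static uint32_t grh_pack_word0(uint8_t tclass, uint32_t flow)
{
	assert(flow <= GRH_FLOW_MASK);
	return ((uint32_t)GRH_VERSION << GRH_VERSION_SHIFT) |
	       ((uint32_t)(tclass & GRH_TCLASS_MASK) << GRH_TCLASS_SHIFT) |
	       ((flow & GRH_FLOW_MASK) << GRH_FLOW_SHIFT);
}

static unsigned grh_version(uint32_t w)
{
	return (w >> GRH_VERSION_SHIFT) & GRH_VERSION_MASK;
}

static unsigned grh_tclass(uint32_t w)
{
	return (w >> GRH_TCLASS_SHIFT) & GRH_TCLASS_MASK;
}

static uint32_t grh_flow(uint32_t w)
{
	return (w >> GRH_FLOW_SHIFT) & GRH_FLOW_MASK;
}
```

Packing traffic class `0xAB` and flow label `0x12345` yields `0x6AB12345`, and each accessor recovers its field unchanged.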

[PATCH v3 44/52] IB/qib: Add qib_user_sdma.h

2010-05-06 Thread Ralph Campbell
creates the qib_user_sdma.h file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_user_sdma.h |   52 +
 1 files changed, 52 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_user_sdma.h

diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.h b/drivers/infiniband/hw/qib/qib_user_sdma.h
new file mode 100644
index 000..ce8cbaf
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include 
+
+struct qib_user_sdma_queue;
+
+struct qib_user_sdma_queue *
+qib_user_sdma_queue_create(struct device *dev, int unit, int port, int sport);
+void qib_user_sdma_queue_destroy(struct qib_user_sdma_queue *pq);
+
+int qib_user_sdma_writev(struct qib_ctxtdata *pd,
+struct qib_user_sdma_queue *pq,
+const struct iovec *iov,
+unsigned long dim);
+
+int qib_user_sdma_make_progress(struct qib_pportdata *ppd,
+   struct qib_user_sdma_queue *pq);
+
+void qib_user_sdma_queue_drain(struct qib_pportdata *ppd,
+  struct qib_user_sdma_queue *pq);
+
+u32 qib_user_sdma_complete_counter(const struct qib_user_sdma_queue *pq);
+u32 qib_user_sdma_inflight_counter(struct qib_user_sdma_queue *pq);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


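qib_user_sdma_inflight_counter() and qib_user_sdma_complete_counter() return free-running u32 counters. Assuming (the header does not say so) that one counts packets queued and the other packets completed, a caller can compare them safely across 2^32 wraparound using unsigned subtraction and a signed difference. This is a user-space sketch; sdma_pkts_outstanding() and sdma_seen() are hypothetical helpers.

```c
#include <stdint.h>

/* Packets queued but not yet completed.  Unsigned subtraction wraps
 * modulo 2^32, so this stays correct after either counter wraps. */
static uint32_t sdma_pkts_outstanding(uint32_t inflight, uint32_t complete)
{
	return inflight - complete;
}

/* Has the completion counter reached `target` (e.g. the inflight value
 * sampled after a writev)?  Valid while fewer than 2^31 packets separate
 * the two values; relies on the usual two's-complement conversion. */
static int sdma_seen(uint32_t complete, uint32_t target)
{
	return (int32_t)(complete - target) >= 0;
}
```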
[PATCH v3 43/52] IB/qib: Add qib_user_sdma.c

2010-05-06 Thread Ralph Campbell
creates the qib_user_sdma.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_user_sdma.c |  897 +
 1 files changed, 897 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_user_sdma.c

diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
new file mode 100644
index 000..4c19e06
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -0,0 +1,897 @@
+/*
+ * Copyright (c) 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qib.h"
+#include "qib_user_sdma.h"
+
+/* minimum size of header */
+#define QIB_USER_SDMA_MIN_HEADER_LENGTH 64
+/* expected size of headers (for dma_pool) */
+#define QIB_USER_SDMA_EXP_HEADER_LENGTH 64
+/* attempt to drain the queue for 5 seconds */
+#define QIB_USER_SDMA_DRAIN_TIMEOUT 500
+
+struct qib_user_sdma_pkt {
+   u8 naddr;   /* dimension of addr (1..3) ... */
+   u32 counter;/* sdma pkts queued counter for this entry */
+   u64 added;  /* global descq number of entries */
+
+   struct {
+   u32 offset; /* offset for kvaddr, addr */
+   u32 length; /* length in page */
+   u8  put_page;   /* should we put_page? */
+   u8  dma_mapped; /* is page dma_mapped? */
+   struct page *page;  /* may be NULL (coherent mem) */
+   void *kvaddr;   /* FIXME: only for pio hack */
+   dma_addr_t addr;
+   } addr[4];   /* max pages, any more and we coalesce */
+   struct list_head list;  /* list element */
+};
+
+struct qib_user_sdma_queue {
+   /*
+* Packets sent to the DMA engine are queued on this
+* list head.  The elements of this list are of type
+* struct qib_user_sdma_pkt.
+*/
+   struct list_head sent;
+
+   /* headers with expected length are allocated from here... */
+   char header_cache_name[64];
+   struct dma_pool *header_cache;
+
+   /* packets are allocated from the slab cache... */
+   char pkt_slab_name[64];
+   struct kmem_cache *pkt_slab;
+
+   /* as packets go on the queued queue, they are counted... */
+   u32 counter;
+   u32 sent_counter;
+
+   /* dma page table */
+   struct rb_root dma_pages_root;
+
+   /* protect everything above... */
+   struct mutex lock;
+};
+
+struct qib_user_sdma_queue *
+qib_user_sdma_queue_create(struct device *dev, int unit, int ctxt, int sctxt)
+{
+   struct qib_user_sdma_queue *pq =
+   kmalloc(sizeof(struct qib_user_sdma_queue), GFP_KERNEL);
+
+   if (!pq)
+   goto done;
+
+   pq->counter = 0;
+   pq->sent_counter = 0;
+   INIT_LIST_HEAD(&pq->sent);
+
+   mutex_init(&pq->lock);
+
+   snprintf(pq->pkt_slab_name, sizeof(pq->pkt_slab_name),
+"qib-user-sdma-pkts-%u-%02u.%02u", unit, ctxt, sctxt);
+   pq->pkt_slab = kmem_cache_create(pq->pkt_slab_name,
+sizeof(struct qib_user_sdma_pkt),
+0, 0, NULL);
+
+   if (!pq->pkt_slab)
+   goto err_kfree;
+
+   snprintf(pq->header_cache_name, sizeof(pq->header_cache_name),
+   

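qib_user_sdma_queue_create() gives each (unit, context, subcontext) its own slab cache, so the name must be unique per queue; the kmem_cache name is also what shows up in /proc/slabinfo. A user-space sketch of the naming step, reusing the format string from the patch; sdma_pkt_slab_name() is a hypothetical wrapper, and the 64-byte buffer mirrors pkt_slab_name[64] in struct qib_user_sdma_queue.

```c
#include <stdio.h>

/* Build the per-queue packet cache name the way the patch does.
 * Returns the untruncated length, as snprintf() does. */
static int sdma_pkt_slab_name(char *buf, size_t len,
			      unsigned unit, unsigned ctxt, unsigned sctxt)
{
	return snprintf(buf, len, "qib-user-sdma-pkts-%u-%02u.%02u",
			unit, ctxt, sctxt);
}
```

For unit 2, context 13, subcontext 4 this yields "qib-user-sdma-pkts-2-13.04"; a too-small buffer is still NUL-terminated.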
[PATCH v3 42/52] IB/qib: Add qib_user_pages.c

2010-05-06 Thread Ralph Campbell
creates the qib_user_pages.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_user_pages.c |  157 
 1 files changed, 157 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_user_pages.c

diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
new file mode 100644
index 000..d7a26c1
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -0,0 +1,157 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "qib.h"
+
+static void __qib_release_user_pages(struct page **p, size_t num_pages,
+int dirty)
+{
+   size_t i;
+
+   for (i = 0; i < num_pages; i++) {
+   if (dirty)
+   set_page_dirty_lock(p[i]);
+   put_page(p[i]);
+   }
+}
+
+/*
+ * Call with current->mm->mmap_sem held.
+ */
+static int __get_user_pages(unsigned long start_page, size_t num_pages,
+   struct page **p, struct vm_area_struct **vma)
+{
+   unsigned long lock_limit;
+   size_t got;
+   int ret;
+
+   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+   if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+   ret = -ENOMEM;
+   goto bail;
+   }
+
+   for (got = 0; got < num_pages; got += ret) {
+   ret = get_user_pages(current, current->mm,
+start_page + got * PAGE_SIZE,
+num_pages - got, 1, 1,
+p + got, vma);
+   if (ret < 0)
+   goto bail_release;
+   }
+
+   current->mm->locked_vm += num_pages;
+
+   ret = 0;
+   goto bail;
+
+bail_release:
+   __qib_release_user_pages(p, got, 0);
+bail:
+   return ret;
+}
+
+/**
+ * qib_map_page - a safety wrapper around pci_map_page()
+ *
+ * A dma_addr of all 0's is interpreted by the chip as "disabled".
+ * Unfortunately, it can also be a valid dma_addr returned on some
+ * architectures.
+ *
+ * The PowerPC IOMMU assigns dma_addrs in ascending order, so we don't
+ * have to bother with retries or mapping a dummy page to ensure we
+ * don't just get the same mapping again.
+ *
+ * I'm sure we won't be so lucky with other IOMMUs, so FIXME.
+ */
+dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page,
+   unsigned long offset, size_t size, int direction)
+{
+   dma_addr_t phys;
+
+   phys = pci_map_page(hwdev, page, offset, size, direction);
+
+   if (phys == 0) {
+   pci_unmap_page(hwdev, phys, size, direction);
+   phys = pci_map_page(hwdev, page, offset, size, direction);
+   /*
+* FIXME: If we get 0 again, we should keep this page,
+* map another, then free the 0 page.
+*/
+   }
+
+   return phys;
+}
+
+/**
+ * qib_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages.  For
+ * now, num_pages is always 1, but that will probably change at some

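__get_user_pages() above first converts RLIMIT_MEMLOCK from bytes to pages, then loops because get_user_pages() may pin fewer pages than requested, and on failure releases only the `got` pages already pinned. A user-space model of that control flow, assuming 4 KiB pages; struct fake_mm, fake_get_pages() and fake_put_pages() are hypothetical stand-ins for the kernel calls, and locked_vm accounting is omitted.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for get_user_pages(): pins at most max_per_call
 * pages per call and fails with -EFAULT once fail_at pages are pinned. */
struct fake_mm {
	size_t pinned;
	size_t max_per_call;	/* must be >= 1 */
	size_t fail_at;
};

static long fake_get_pages(struct fake_mm *mm, size_t want)
{
	size_t n = want < mm->max_per_call ? want : mm->max_per_call;

	if (mm->pinned >= mm->fail_at)
		return -EFAULT;
	if (n > mm->fail_at - mm->pinned)
		n = mm->fail_at - mm->pinned;
	mm->pinned += n;
	return (long)n;
}

static void fake_put_pages(struct fake_mm *mm, size_t n)
{
	mm->pinned -= n;	/* stands in for put_page() on each page */
}

/* Same shape as __get_user_pages(): limit check, then pin in chunks. */
static int pin_pages(struct fake_mm *mm, size_t num_pages,
		     unsigned long memlock_bytes, int cap_ipc_lock)
{
	size_t lock_limit = memlock_bytes >> SKETCH_PAGE_SHIFT;
	size_t got;
	long ret;

	if (num_pages > lock_limit && !cap_ipc_lock)
		return -ENOMEM;

	for (got = 0; got < num_pages; got += (size_t)ret) {
		ret = fake_get_pages(mm, num_pages - got);
		if (ret < 0) {
			fake_put_pages(mm, got);	/* undo the partial pin */
			return (int)ret;
		}
	}
	return 0;
}
```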
[PATCH v3 41/52] IB/qib: Add qib_ud.c

2010-05-06 Thread Ralph Campbell
creates the qib_ud.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_ud.c |  607 
 1 files changed, 607 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_ud.c

diff --git a/drivers/infiniband/hw/qib/qib_ud.c b/drivers/infiniband/hw/qib/qib_ud.c
new file mode 100644
index 000..c838cda
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_ud.c
@@ -0,0 +1,607 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+
+#include "qib.h"
+#include "qib_mad.h"
+
+/**
+ * qib_ud_loopback - handle send on loopback QPs
+ * @sqp: the sending QP
+ * @swqe: the send work request
+ *
+ * This is called from qib_make_ud_req() to forward a WQE addressed
+ * to the same HCA.
+ * Note that the receive interrupt handler may be calling qib_ud_rcv()
+ * while this is being called.
+ */
+static void qib_ud_loopback(struct qib_qp *sqp, struct qib_swqe *swqe)
+{
+   struct qib_ibport *ibp = to_iport(sqp->ibqp.device, sqp->port_num);
+   struct qib_pportdata *ppd;
+   struct qib_qp *qp;
+   struct ib_ah_attr *ah_attr;
+   unsigned long flags;
+   struct qib_sge_state ssge;
+   struct qib_sge *sge;
+   struct ib_wc wc;
+   u32 length;
+
+   qp = qib_lookup_qpn(ibp, swqe->wr.wr.ud.remote_qpn);
+   if (!qp) {
+   ibp->n_pkt_drops++;
+   return;
+   }
+   if (qp->ibqp.qp_type != sqp->ibqp.qp_type ||
+   !(ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK)) {
+   ibp->n_pkt_drops++;
+   goto drop;
+   }
+
+   ah_attr = &to_iah(swqe->wr.wr.ud.ah)->attr;
+   ppd = ppd_from_ibp(ibp);
+
+   if (qp->ibqp.qp_num > 1) {
+   u16 pkey1;
+   u16 pkey2;
+   u16 lid;
+
+   pkey1 = qib_get_pkey(ibp, sqp->s_pkey_index);
+   pkey2 = qib_get_pkey(ibp, qp->s_pkey_index);
+   if (unlikely(!qib_pkey_ok(pkey1, pkey2))) {
+   lid = ppd->lid | (ah_attr->src_path_bits &
+ ((1 << ppd->lmc) - 1));
+   qib_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_PKEY, pkey1,
+ ah_attr->sl,
+ sqp->ibqp.qp_num, qp->ibqp.qp_num,
+ cpu_to_be16(lid),
+ cpu_to_be16(ah_attr->dlid));
+   goto drop;
+   }
+   }
+
+   /*
+* Check that the qkey matches (except for QP0, see 9.6.1.4.1).
+* Qkeys with the high order bit set mean use the
+* qkey from the QP context instead of the WR (see 10.2.5).
+*/
+   if (qp->ibqp.qp_num) {
+   u32 qkey;
+
+   qkey = (int)swqe->wr.wr.ud.remote_qkey < 0 ?
+   sqp->qkey : swqe->wr.wr.ud.remote_qkey;
+   if (unlikely(qkey != qp->qkey)) {
+   u16 lid;
+
+   lid = ppd->lid | (ah_attr->src_path_bits &
+ ((1 << ppd->lmc) - 1));
+   qib_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_QKEY, qkey,
+ ah_attr->sl,
+ sqp->ibqp.qp_num, qp->ibqp.qp_num,

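qib_ud_loopback() applies three checks before delivering: the P_Keys must match, the Q_Key must match (a WR qkey with the high bit set means "use the sending QP's qkey"), and on failure the trap reports the source LID as the port LID plus the path bits permitted by the LMC. The sketch below writes these rules independently from the InfiniBand spec; it is not copied from qib_pkey_ok(), and the helper names are hypothetical.

```c
#include <stdint.h>

/* IB P_Key rule: low 15 bits equal and non-zero, and at least one
 * side is a full member (bit 15 set). */
static int pkey_match(uint16_t pkey1, uint16_t pkey2)
{
	uint16_t p1 = pkey1 & 0x7FFF;
	uint16_t p2 = pkey2 & 0x7FFF;

	return p1 != 0 && p1 == p2 && ((pkey1 | pkey2) & 0x8000) != 0;
}

/* Q_Key selection; equivalent to the patch's "(int)qkey < 0" test. */
static uint32_t effective_qkey(uint32_t wr_qkey, uint32_t qp_qkey)
{
	return (wr_qkey & 0x80000000u) ? qp_qkey : wr_qkey;
}

/* Source LID for the trap: base LID plus the LMC-masked path bits. */
static uint16_t trap_src_lid(uint16_t base_lid, uint8_t src_path_bits,
			     uint8_t lmc)
{
	return (uint16_t)(base_lid | (src_path_bits & ((1u << lmc) - 1)));
}
```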
[PATCH v3 40/52] IB/qib: Add qib_uc.c

2010-05-06 Thread Ralph Campbell
creates the qib_uc.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_uc.c |  555 
 1 files changed, 555 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_uc.c

diff --git a/drivers/infiniband/hw/qib/qib_uc.c b/drivers/infiniband/hw/qib/qib_uc.c
new file mode 100644
index 000..6c7fe78
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_uc.c
@@ -0,0 +1,555 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009, 2010 QLogic Corporation.
+ * All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "qib.h"
+
+/* cut down ridiculously long IB macro names */
+#define OP(x) IB_OPCODE_UC_##x
+
+/**
+ * qib_make_uc_req - construct a request packet (SEND, RDMA write)
+ * @qp: a pointer to the QP
+ *
+ * Return 1 if constructed; otherwise, return 0.
+ */
+int qib_make_uc_req(struct qib_qp *qp)
+{
+   struct qib_other_headers *ohdr;
+   struct qib_swqe *wqe;
+   unsigned long flags;
+   u32 hwords;
+   u32 bth0;
+   u32 len;
+   u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
+   int ret = 0;
+
+   spin_lock_irqsave(&qp->s_lock, flags);
+
+   if (!(ib_qib_state_ops[qp->state] & QIB_PROCESS_SEND_OK)) {
+   if (!(ib_qib_state_ops[qp->state] & QIB_FLUSH_SEND))
+   goto bail;
+   /* We are in the error state, flush the work request. */
+   if (qp->s_last == qp->s_head)
+   goto bail;
+   /* If DMAs are in progress, we can't flush immediately. */
+   if (atomic_read(&qp->s_dma_busy)) {
+   qp->s_flags |= QIB_S_WAIT_DMA;
+   goto bail;
+   }
+   wqe = get_swqe_ptr(qp, qp->s_last);
+   qib_send_complete(qp, wqe, IB_WC_WR_FLUSH_ERR);
+   goto done;
+   }
+
+   ohdr = &qp->s_hdr.u.oth;
+   if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
+   ohdr = &qp->s_hdr.u.l.oth;
+
+   /* header size in 32-bit words LRH+BTH = (8+12)/4. */
+   hwords = 5;
+   bth0 = 0;
+
+   /* Get the next send request. */
+   wqe = get_swqe_ptr(qp, qp->s_cur);
+   qp->s_wqe = NULL;
+   switch (qp->s_state) {
+   default:
+   if (!(ib_qib_state_ops[qp->state] &
+   QIB_PROCESS_NEXT_SEND_OK))
+   goto bail;
+   /* Check if send work queue is empty. */
+   if (qp->s_cur == qp->s_head)
+   goto bail;
+   /*
+* Start a new request.
+*/
+   wqe->psn = qp->s_next_psn;
+   qp->s_psn = qp->s_next_psn;
+   qp->s_sge.sge = wqe->sg_list[0];
+   qp->s_sge.sg_list = wqe->sg_list + 1;
+   qp->s_sge.num_sge = wqe->wr.num_sge;
+   qp->s_sge.total_len = wqe->length;
+   len = wqe->length;
+   qp->s_len = len;
+   switch (wqe->wr.opcode) {
+   case IB_WR_SEND:
+   case IB_WR_SEND_WITH_IMM:
+   if (len > pmtu) {
+   qp->s_state = OP(SEND_FIRST);
+   len = pmtu;
+   break;
+   }
+   if (wqe->wr.opcode == IB_WR_SEND)
+  

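qib_make_uc_req() switches to SEND_FIRST when the message is longer than the path MTU, and otherwise sends it as a single packet. A user-space sketch of the resulting opcode sequence (ONLY, or FIRST, MIDDLE..., LAST), assuming a plain SEND; the _WITH_IMMEDIATE variants and RDMA WRITE are left out, and uc_segment() and enum uc_op are hypothetical.

```c
#include <stdint.h>

enum uc_op { UC_SEND_ONLY, UC_SEND_FIRST, UC_SEND_MIDDLE, UC_SEND_LAST };

/* Fill ops[] with one opcode per packet of a len-byte SEND split at
 * pmtu bytes.  Returns the packet count, or 0 if pmtu is 0 or ops[]
 * is too small. */
static unsigned uc_segment(uint32_t len, uint32_t pmtu,
			   enum uc_op *ops, unsigned max_ops)
{
	unsigned n = 0;

	if (pmtu == 0)
		return 0;
	if (len <= pmtu) {		/* one packet, including len == 0 */
		if (max_ops < 1)
			return 0;
		ops[0] = UC_SEND_ONLY;
		return 1;
	}
	while (len > 0) {
		if (n == max_ops)
			return 0;
		if (n == 0)
			ops[n] = UC_SEND_FIRST;
		else if (len > pmtu)
			ops[n] = UC_SEND_MIDDLE;
		else
			ops[n] = UC_SEND_LAST;
		len -= len > pmtu ? pmtu : len;
		n++;
	}
	return n;
}
```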
[PATCH v3 39/52] IB/qib: Add qib_tx.c

2010-05-06 Thread Ralph Campbell
creates the qib_tx.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_tx.c |  557 
 1 files changed, 557 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_tx.c

diff --git a/drivers/infiniband/hw/qib/qib_tx.c b/drivers/infiniband/hw/qib/qib_tx.c
new file mode 100644
index 000..f7eb1dd
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_tx.c
@@ -0,0 +1,557 @@
+/*
+ * Copyright (c) 2008, 2009, 2010 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qib.h"
+
+static unsigned qib_hol_timeout_ms = 3000;
+module_param_named(hol_timeout_ms, qib_hol_timeout_ms, uint, S_IRUGO);
+MODULE_PARM_DESC(hol_timeout_ms,
+"duration of user app suspension after link failure");
+
+unsigned qib_sdma_fetch_arb = 1;
+module_param_named(fetch_arb, qib_sdma_fetch_arb, uint, S_IRUGO);
+MODULE_PARM_DESC(fetch_arb, "IBA7220: change SDMA descriptor arbitration");
+
+/**
+ * qib_disarm_piobufs - cancel a range of PIO buffers
+ * @dd: the qlogic_ib device
+ * @first: the first PIO buffer to cancel
+ * @cnt: the number of PIO buffers to cancel
+ *
+ * Cancel a range of PIO buffers. Used at user process close,
+ * in case it died while writing to a PIO buffer.
+ */
+void qib_disarm_piobufs(struct qib_devdata *dd, unsigned first, unsigned cnt)
+{
+   unsigned long flags;
+   unsigned i;
+   unsigned last;
+
+   last = first + cnt;
+   spin_lock_irqsave(&dd->pioavail_lock, flags);
+   for (i = first; i < last; i++) {
+   __clear_bit(i, dd->pio_need_disarm);
+   dd->f_sendctrl(dd->pport, QIB_SENDCTRL_DISARM_BUF(i));
+   }
+   spin_unlock_irqrestore(&dd->pioavail_lock, flags);
+}
+
+/*
+ * This is called by a user process when it sees that the DISARM_BUFS
+ * event bit is set.
+ */
+int qib_disarm_piobufs_ifneeded(struct qib_ctxtdata *rcd)
+{
+   struct qib_devdata *dd = rcd->dd;
+   unsigned i;
+   unsigned last;
+   unsigned n = 0;
+
+   last = rcd->pio_base + rcd->piocnt;
+   /*
+* We don't need uctxt_lock here, since the user has called in to us.
+* Clear at start in case more interrupts set bits while we
+* are disarming.
+*/
+   if (rcd->user_event_mask) {
+   /*
+* subctxt_cnt is 0 if not shared, so do base
+* separately, first, then remaining subctxt, if any
+*/
+   clear_bit(_QIB_EVENT_DISARM_BUFS_BIT, &rcd->user_event_mask[0]);
+   for (i = 1; i < rcd->subctxt_cnt; i++)
+   clear_bit(_QIB_EVENT_DISARM_BUFS_BIT,
+ &rcd->user_event_mask[i]);
+   }
+   spin_lock_irq(&dd->pioavail_lock);
+   for (i = rcd->pio_base; i < last; i++) {
+   if (__test_and_clear_bit(i, dd->pio_need_disarm)) {
+   n++;
+   dd->f_sendctrl(rcd->ppd, QIB_SENDCTRL_DISARM_BUF(i));
+   }
+   }
+   spin_unlock_irq(&dd->pioavail_lock);
+   return 0;
+}
+
+static struct qib_pportdata *is_sdma_buf(struct qib_devdata *dd, unsigned i)
+{
+   struct qib_pportdata *ppd;
+   unsigned pidx;
+
+   for (pidx = 0; pidx < dd->num_pports; pidx++) {
+   ppd = dd->pport + pidx;
+   if (i >= ppd->sdma_state.first_sendbuf &&
+   

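qib_disarm_piobufs_ifneeded() walks the context's PIO buffer range and disarms only the buffers whose bit is set in pio_need_disarm, clearing each bit as it goes. The non-atomic __test_and_clear_bit() is safe there because pioavail_lock is held. A user-space model of that scan; test_and_clear() and disarm_range() are hypothetical, and unlike the patch (which returns 0 and never uses `n`) the sketch returns the count.

```c
#include <limits.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic test-and-clear, like the kernel's __test_and_clear_bit(). */
static int test_and_clear(unsigned long *map, unsigned bit)
{
	unsigned long mask = 1UL << (bit % WORD_BITS);
	unsigned long *word = &map[bit / WORD_BITS];
	int was_set = (*word & mask) != 0;

	*word &= ~mask;
	return was_set;
}

/* Scan [first, first + cnt) and "disarm" each buffer whose bit was set,
 * recording its index in out[]; returns how many were disarmed. */
static unsigned disarm_range(unsigned long *need_disarm, unsigned first,
			     unsigned cnt, unsigned *out)
{
	unsigned i, n = 0;

	for (i = first; i < first + cnt; i++)
		if (test_and_clear(need_disarm, i))
			out[n++] = i;	/* kernel: f_sendctrl(..., DISARM_BUF(i)) */
	return n;
}
```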
[PATCH v3 38/52] IB/qib: Add qib_twsi.c

2010-05-06 Thread Ralph Campbell
creates the qib_twsi.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_twsi.c |  498 ++
 1 files changed, 498 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_twsi.c

diff --git a/drivers/infiniband/hw/qib/qib_twsi.c b/drivers/infiniband/hw/qib/qib_twsi.c
new file mode 100644
index 000..6f31ca5
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_twsi.c
@@ -0,0 +1,498 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "qib.h"
+
+/*
+ * QLogic_IB "Two Wire Serial Interface" driver.
+ * Originally written for a not-quite-i2c serial eeprom, which is
+ * still used on some supported boards. Later boards have added a
+ * variety of other uses, most board-specific, so the bit-banging
+ * part has been split off to this file, while the other parts
+ * have been moved to chip-specific files.
+ *
+ * We have also dropped all pretense of fully generic (e.g. pretend
+ * we don't know whether '1' is the higher voltage) interface, as
+ * the restrictions of the generic i2c interface (e.g. no access from
+ * driver itself) make it unsuitable for this use.
+ */
+
+#define READ_CMD 1
+#define WRITE_CMD 0
+
+/**
+ * i2c_wait_for_writes - wait for a write
+ * @dd: the qlogic_ib device
+ *
+ * We use this instead of udelay directly, so we can make sure
+ * that previous register writes have been flushed all the way
+ * to the chip.  Since we are delaying anyway, the cost doesn't
+ * hurt, and makes the bit twiddling more regular.
+ */
+static void i2c_wait_for_writes(struct qib_devdata *dd)
+{
+   /*
+* implicit read of EXTStatus is as good as explicit
+* read of scratch, if all we want to do is flush
+* writes.
+*/
+   dd->f_gpio_mod(dd, 0, 0, 0);
+   rmb(); /* inlined, so prevent compiler reordering */
+}
+
+/*
+ * QSFP modules are allowed to hold SCL low for 500uSec. Allow twice that
+ * for "almost compliant" modules
+ */
+#define SCL_WAIT_USEC 1000
+
+/* BUF_WAIT is the time the bus must be free between a STOP or ACK and
+ * the next START. It should be 20, but some chips need more.
+ */
+#define TWSI_BUF_WAIT_USEC 60
+
+static void scl_out(struct qib_devdata *dd, u8 bit)
+{
+   u32 mask;
+
+   udelay(1);
+
+   mask = 1UL << dd->gpio_scl_num;
+
+   /* SCL is meant to be open-drain, so never set "OUT", just DIR */
+   dd->f_gpio_mod(dd, 0, bit ? 0 : mask, mask);
+
+   /*
+* Allow for slow slaves with a simple delay on the
+* falling edge, and by sampling the line on the rise.
+*/
+   if (!bit)
+   udelay(2);
+   else {
+   int rise_usec;
+   for (rise_usec = SCL_WAIT_USEC; rise_usec > 0; rise_usec -= 2) {
+   if (mask & dd->f_gpio_mod(dd, 0, 0, 0))
+   break;
+   udelay(2);
+   }
+   if (rise_usec <= 0)
+   qib_dev_err(dd, "SCL interface stuck low > %d uSec\n",
+   SCL_WAIT_USEC);
+   }
+   i2c_wait_for_writes(dd);
+}
+
+static void sda_out(struct qib_devdata *dd, u8 bit)
+{
+   u32 mask;
+
+   mask = 1UL << dd->gpio_sda_num;
+
+   /* SDA is meant to be open-drain, so never set "OUT", just DIR */
+   dd->f_gpio_mod(d

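On a rising SCL edge, scl_out() releases the line and then polls in 2 us steps, for up to SCL_WAIT_USEC, because a slave may stretch the clock by holding SCL low (the comment allows 500 us for QSFP modules, doubled). A user-space model of that poll; struct fake_line and wait_scl_high() are hypothetical, and the `waited_usec` counter stands in for udelay(2).

```c
#define SCL_WAIT_USEC 1000

/* Hypothetical GPIO stand-in: SCL reads high after polls_until_high reads. */
struct fake_line {
	int polls_until_high;
	int polls;
};

static int line_is_high(struct fake_line *l)
{
	l->polls++;
	return l->polls > l->polls_until_high;
}

/* Returns 0 once SCL reads high, -1 if it stays low for SCL_WAIT_USEC. */
static int wait_scl_high(struct fake_line *l, int *waited_usec)
{
	int rise_usec;

	*waited_usec = 0;
	for (rise_usec = SCL_WAIT_USEC; rise_usec > 0; rise_usec -= 2) {
		if (line_is_high(l))
			return 0;
		*waited_usec += 2;	/* kernel: udelay(2) */
	}
	return -1;			/* kernel: "SCL interface stuck low" */
}
```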
[PATCH v3 37/52] IB/qib: Add qib_sysfs.c

2010-05-06 Thread Ralph Campbell
creates the qib_sysfs.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_sysfs.c |  691 +
 1 files changed, 691 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_sysfs.c

diff --git a/drivers/infiniband/hw/qib/qib_sysfs.c b/drivers/infiniband/hw/qib/qib_sysfs.c
new file mode 100644
index 000..dab4d9f
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_sysfs.c
@@ -0,0 +1,691 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include 
+
+#include "qib.h"
+
+/**
+ * qib_parse_ushort - parse an unsigned short value in an arbitrary base
+ * @str: the string containing the number
+ * @valp: where to put the result
+ *
+ * Returns the number of bytes consumed, or negative value on error.
+ */
+static int qib_parse_ushort(const char *str, unsigned short *valp)
+{
+   unsigned long val;
+   char *end;
+   int ret;
+
+   if (!isdigit(str[0])) {
+   ret = -EINVAL;
+   goto bail;
+   }
+
+   val = simple_strtoul(str, &end, 0);
+
+   if (val > 0xffff) {
+   ret = -EINVAL;
+   goto bail;
+   }
+
+   *valp = val;
+
+   ret = end + 1 - str;
+   if (ret == 0)
+   ret = -EINVAL;
+
+bail:
+   return ret;
+}
+
+/* start of per-port functions */
+/*
+ * Get/Set heartbeat enable. OR of 1=enabled, 2=auto
+ */
+static ssize_t show_hrtbt_enb(struct qib_pportdata *ppd, char *buf)
+{
+   struct qib_devdata *dd = ppd->dd;
+   int ret;
+
+   ret = dd->f_get_ib_cfg(ppd, QIB_IB_CFG_HRTBT);
+   ret = scnprintf(buf, PAGE_SIZE, "%d\n", ret);
+   return ret;
+}
+
+static ssize_t store_hrtbt_enb(struct qib_pportdata *ppd, const char *buf,
+  size_t count)
+{
+   struct qib_devdata *dd = ppd->dd;
+   int ret;
+   u16 val;
+
+   ret = qib_parse_ushort(buf, &val);
+
+   /*
+* Set the "intentional" heartbeat enable per either of
+* "Enable" and "Auto", as these are normally set together.
+* This bit is consulted when leaving loopback mode,
+* because entering loopback mode overrides it and automatically
+* disables heartbeat.
+*/
+   if (ret >= 0)
+   ret = dd->f_set_ib_cfg(ppd, QIB_IB_CFG_HRTBT, val);
+   if (ret < 0)
+   qib_dev_err(dd, "attempt to set invalid Heartbeat enable\n");
+   return ret < 0 ? ret : count;
+}
+
+static ssize_t store_loopback(struct qib_pportdata *ppd, const char *buf,
+ size_t count)
+{
+   struct qib_devdata *dd = ppd->dd;
+   int ret = count, r;
+
+   r = dd->f_set_ib_loopback(ppd, buf);
+   if (r < 0)
+   ret = r;
+
+   return ret;
+}
+
+static ssize_t store_led_override(struct qib_pportdata *ppd, const char *buf,
+ size_t count)
+{
+   struct qib_devdata *dd = ppd->dd;
+   int ret;
+   u16 val;
+
+   ret = qib_parse_ushort(buf, &val);
+   if (ret > 0)
+   qib_set_led_override(ppd, val);
+   else
+   qib_dev_err(dd, "attempt to set invalid LED override\n");
+   return ret < 0 ? ret : count;
+}
+
+static ssize_t show_status(struct qib_pportdata *ppd, char *buf)
+{
+   ssize_t ret;
+
+   if (!ppd->statusp)
+   re

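qib_parse_ushort() accepts any base that simple_strtoul() understands with base 0 (decimal, 0x-hex, 0-octal), rejects values above 0xffff, and returns `end + 1 - str`, which counts the trailing newline or terminator that sysfs writes usually carry. A user-space sketch of the same logic using strtoul(); parse_ushort() is a hypothetical name. The patch's `ret == 0` check is dropped here because `end + 1 - str` is always at least 1.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

static int parse_ushort(const char *str, unsigned short *valp)
{
	unsigned long val;
	char *end;

	if (!isdigit((unsigned char)str[0]))	/* rejects "-1", " 1", "x" */
		return -EINVAL;

	val = strtoul(str, &end, 0);		/* base 0: 10, 0x.., 0.. */
	if (val > 0xffff)
		return -EINVAL;

	*valp = (unsigned short)val;
	return (int)(end + 1 - str);
}
```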
[PATCH v3 36/52] IB/qib: Add qib_srq.c

2010-05-06 Thread Ralph Campbell
creates the qib_srq.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_srq.c |  374 +++
 1 files changed, 374 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_srq.c

diff --git a/drivers/infiniband/hw/qib/qib_srq.c b/drivers/infiniband/hw/qib/qib_srq.c
new file mode 100644
index 000..d79ae33
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_srq.c
@@ -0,0 +1,374 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "qib_verbs.h"
+
+/**
+ * qib_post_srq_receive - post a receive on a shared receive queue
+ * @ibsrq: the SRQ to post the receive on
+ * @wr: the list of work requests to post
+ * @bad_wr: A pointer to the first WR to cause a problem is put here
+ *
+ * This may be called from interrupt context.
+ */
+int qib_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
+                         struct ib_recv_wr **bad_wr)
+{
+   struct qib_srq *srq = to_isrq(ibsrq);
+   struct qib_rwq *wq;
+   unsigned long flags;
+   int ret;
+
+   for (; wr; wr = wr->next) {
+       struct qib_rwqe *wqe;
+       u32 next;
+       int i;
+
+       if ((unsigned) wr->num_sge > srq->rq.max_sge) {
+           *bad_wr = wr;
+           ret = -EINVAL;
+           goto bail;
+       }
+
+       spin_lock_irqsave(&srq->rq.lock, flags);
+       wq = srq->rq.wq;
+       next = wq->head + 1;
+       if (next >= srq->rq.size)
+           next = 0;
+       if (next == wq->tail) {
+           spin_unlock_irqrestore(&srq->rq.lock, flags);
+           *bad_wr = wr;
+           ret = -ENOMEM;
+           goto bail;
+       }
+
+       wqe = get_rwqe_ptr(&srq->rq, wq->head);
+       wqe->wr_id = wr->wr_id;
+       wqe->num_sge = wr->num_sge;
+       for (i = 0; i < wr->num_sge; i++)
+           wqe->sg_list[i] = wr->sg_list[i];
+       /* Make sure queue entry is written before the head index. */
+       smp_wmb();
+       wq->head = next;
+       spin_unlock_irqrestore(&srq->rq.lock, flags);
+   }
+   ret = 0;
+
+bail:
+   return ret;
+}
+
+/**
+ * qib_create_srq - create a shared receive queue
+ * @ibpd: the protection domain of the SRQ to create
+ * @srq_init_attr: the attributes of the SRQ
+ * @udata: data from libibverbs when creating a user SRQ
+ */
+struct ib_srq *qib_create_srq(struct ib_pd *ibpd,
+                              struct ib_srq_init_attr *srq_init_attr,
+                              struct ib_udata *udata)
+{
+   struct qib_ibdev *dev = to_idev(ibpd->device);
+   struct qib_srq *srq;
+   u32 sz;
+   struct ib_srq *ret;
+
+   if (srq_init_attr->attr.max_sge == 0 ||
+       srq_init_attr->attr.max_sge > ib_qib_max_srq_sges ||
+       srq_init_attr->attr.max_wr == 0 ||
+       srq_init_attr->attr.max_wr > ib_qib_max_srq_wrs) {
+       ret = ERR_PTR(-EINVAL);
+       goto done;
+   }
+
+   srq = kmalloc(sizeof(*srq), GFP_KERNEL);
+   if (!srq) {
+       ret = ERR_PTR(-ENOMEM);
+       goto done;
+   }
+
+   /*
+    * Need

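qib_post_srq_receive() treats the receive queue as a ring that always keeps one slot empty: it computes `next = head + 1` with wrap-around and reports `-ENOMEM` when `next` would equal `tail`. The single-threaded sketch below isolates that index arithmetic. The `rwq` type, `RQ_SIZE`, and the `rwq_get()` consumer are hypothetical; locking and the driver's `smp_wmb()` appear only as a comment because they matter only with a concurrent consumer.

```c
#include <errno.h>
#include <stdint.h>

#define RQ_SIZE 4 /* hypothetical ring size; usable capacity is RQ_SIZE - 1 */

struct rwqe {
	uint64_t wr_id;
};

struct rwq {
	uint32_t head; /* next slot the producer fills */
	uint32_t tail; /* next slot the consumer drains */
	struct rwqe ring[RQ_SIZE];
};

/* Post one entry; -ENOMEM when the ring is full. */
static int rwq_post(struct rwq *wq, uint64_t wr_id)
{
	uint32_t next = wq->head + 1;

	if (next >= RQ_SIZE)
		next = 0;
	/* One slot stays empty so that head == tail can only mean "empty". */
	if (next == wq->tail)
		return -ENOMEM;
	wq->ring[wq->head].wr_id = wr_id;
	/* The driver issues smp_wmb() here so the entry is visible before head. */
	wq->head = next;
	return 0;
}

/* Remove the oldest entry; -EAGAIN when the ring is empty. */
static int rwq_get(struct rwq *wq, uint64_t *wr_id)
{
	if (wq->tail == wq->head)
		return -EAGAIN;
	*wr_id = wq->ring[wq->tail].wr_id;
	wq->tail = (wq->tail + 1 >= RQ_SIZE) ? 0 : wq->tail + 1;
	return 0;
}
```

Sacrificing one slot avoids a separate element counter, so producer and consumer each write only their own index.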
[PATCH v3 35/52] IB/qib: Add qib_sdma.c

2010-05-06 Thread Ralph Campbell
Creates the qib_sdma.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_sdma.c |  973 ++
 1 files changed, 973 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_sdma.c

diff --git a/drivers/infiniband/hw/qib/qib_sdma.c b/drivers/infiniband/hw/qib/qib_sdma.c
new file mode 100644
index 000..b845688
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_sdma.c
@@ -0,0 +1,973 @@
+/*
+ * Copyright (c) 2007, 2008, 2009, 2010 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "qib.h"
+#include "qib_common.h"
+
+/* default pio off, sdma on */
+static ushort sdma_descq_cnt = 256;
+module_param_named(sdma_descq_cnt, sdma_descq_cnt, ushort, S_IRUGO);
+MODULE_PARM_DESC(sdma_descq_cnt, "Number of SDMA descq entries");
+
+/*
+ * Bits defined in the send DMA descriptor.
+ */
+#define SDMA_DESC_LAST  (1ULL << 11)
+#define SDMA_DESC_FIRST (1ULL << 12)
+#define SDMA_DESC_DMA_HEAD  (1ULL << 13)
+#define SDMA_DESC_USE_LARGE_BUF (1ULL << 14)
+#define SDMA_DESC_INTR  (1ULL << 15)
+#define SDMA_DESC_COUNT_LSB 16
+#define SDMA_DESC_GEN_LSB   30
+
+char *qib_sdma_state_names[] = {
+   [qib_sdma_state_s00_hw_down]  = "s00_HwDown",
+   [qib_sdma_state_s10_hw_start_up_wait] = "s10_HwStartUpWait",
+   [qib_sdma_state_s20_idle] = "s20_Idle",
+   [qib_sdma_state_s30_sw_clean_up_wait] = "s30_SwCleanUpWait",
+   [qib_sdma_state_s40_hw_clean_up_wait] = "s40_HwCleanUpWait",
+   [qib_sdma_state_s50_hw_halt_wait] = "s50_HwHaltWait",
+   [qib_sdma_state_s99_running]  = "s99_Running",
+};
+
+char *qib_sdma_event_names[] = {
+   [qib_sdma_event_e00_go_hw_down]   = "e00_GoHwDown",
+   [qib_sdma_event_e10_go_hw_start]  = "e10_GoHwStart",
+   [qib_sdma_event_e20_hw_started]   = "e20_HwStarted",
+   [qib_sdma_event_e30_go_running]   = "e30_GoRunning",
+   [qib_sdma_event_e40_sw_cleaned]   = "e40_SwCleaned",
+   [qib_sdma_event_e50_hw_cleaned]   = "e50_HwCleaned",
+   [qib_sdma_event_e60_hw_halted]    = "e60_HwHalted",
+   [qib_sdma_event_e70_go_idle]  = "e70_GoIdle",
+   [qib_sdma_event_e7220_err_halted] = "e7220_ErrHalted",
+   [qib_sdma_event_e7322_err_halted] = "e7322_ErrHalted",
+   [qib_sdma_event_e90_timer_tick]   = "e90_TimerTick",
+};
+
+/* declare all statics here rather than keep sorting */
+static int alloc_sdma(struct qib_pportdata *);
+static void sdma_complete(struct kref *);
+static void sdma_finalput(struct qib_sdma_state *);
+static void sdma_get(struct qib_sdma_state *);
+static void sdma_put(struct qib_sdma_state *);
+static void sdma_set_state(struct qib_pportdata *, enum qib_sdma_states);
+static void sdma_start_sw_clean_up(struct qib_pportdata *);
+static void sdma_sw_clean_up_task(unsigned long);
+static void unmap_desc(struct qib_pportdata *, unsigned);
+
+static void sdma_get(struct qib_sdma_state *ss)
+{
+   kref_get(&ss->kref);
+}
+
+static void sdma_complete(struct kref *kref)
+{
+   struct qib_sdma_state *ss =
+       container_of(kref, struct qib_sdma_state, kref);
+
+   complete(&ss->comp);
+}
+
+static void sdma_put(struct qib_sdma_state *ss)
+{
+   kref_put(&ss->kref, sdma_complete);
+}
+
+static voi

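The SDMA_DESC_* definitions above place the per-descriptor flags in bits 11-15 of the first descriptor qword, with the dword count starting at bit 16 and the generation at bit 30. The sketch below packs such a qword. The field widths (an 11-bit buffer offset below the flags, an 11-bit dword count, a 2-bit generation, and the low bits of a dword-aligned physical address in the upper 32 bits) are assumptions for illustration, and `make_desc_qw0()` is a hypothetical helper, not a qib function.

```c
#include <stdint.h>

#define SDMA_DESC_LAST          (1ULL << 11)
#define SDMA_DESC_FIRST         (1ULL << 12)
#define SDMA_DESC_DMA_HEAD      (1ULL << 13)
#define SDMA_DESC_USE_LARGE_BUF (1ULL << 14)
#define SDMA_DESC_INTR          (1ULL << 15)
#define SDMA_DESC_COUNT_LSB     16
#define SDMA_DESC_GEN_LSB       30

/*
 * Hypothetical packer for descriptor qword 0. Assumed layout:
 * [10:0] buffer offset, [15:11] flags, [26:16] dword count,
 * [31:30] generation, [63:32] low 32 bits of a dword-aligned address.
 */
static uint64_t make_desc_qw0(uint64_t addr, uint64_t dwlen,
			      uint64_t dwoffset, unsigned gen, uint64_t flags)
{
	uint64_t qw = (addr & 0xfffffffcULL) << 32;

	qw |= ((uint64_t)gen & 3ULL) << SDMA_DESC_GEN_LSB;
	qw |= (dwlen & 0x7ffULL) << SDMA_DESC_COUNT_LSB;
	qw |= dwoffset & 0x7ffULL;
	qw |= flags;
	return qw;
}
```

For example, a single-descriptor packet of 16 dwords in generation 1 sets both FIRST and LAST, giving `0x40101800` when the address is zero.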
[PATCH v3 34/52] IB/qib: Add qib_sd7220_img.c

2010-05-06 Thread Ralph Campbell
Creates the qib_sd7220_img.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_sd7220_img.c | 1081 
 1 files changed, 1081 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_sd7220_img.c

diff --git a/drivers/infiniband/hw/qib/qib_sd7220_img.c b/drivers/infiniband/hw/qib/qib_sd7220_img.c
new file mode 100644
index 000..a1118fb
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_sd7220_img.c
@@ -0,0 +1,1081 @@
+/*
+ * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file contains the memory image from the vendor, to be copied into
+ * the IB SERDES of the IBA7220 during initialization.
+ * The file also includes the two functions which use this image.
+ */
+#include 
+#include 
+
+#include "qib.h"
+#include "qib_7220.h"
+
+static unsigned char qib_sd7220_ib_img[] = {
+/*0000*/0x02, 0x0A, 0x29, 0x02, 0x0A, 0x87, 0xE5, 0xE6,
+   0x30, 0xE6, 0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F,
+/*0010*/0x00, 0xE5, 0xE2, 0x30, 0xE4, 0x04, 0x7E, 0x01,
+   0x80, 0x02, 0x7E, 0x00, 0xEE, 0x5F, 0x60, 0x08,
+/*0020*/0x53, 0xF9, 0xF7, 0xE4, 0xF5, 0xFE, 0x80, 0x08,
+   0x7F, 0x0A, 0x12, 0x17, 0x31, 0x12, 0x0E, 0xA2,
+/*0030*/0x75, 0xFC, 0x08, 0xE4, 0xF5, 0xFD, 0xE5, 0xE7,
+   0x20, 0xE7, 0x03, 0x43, 0xF9, 0x08, 0x22, 0x00,
+/*0040*/0x01, 0x20, 0x11, 0x00, 0x04, 0x20, 0x00, 0x75,
+   0x51, 0x01, 0xE4, 0xF5, 0x52, 0xF5, 0x53, 0xF5,
+/*0050*/0x52, 0xF5, 0x7E, 0x7F, 0x04, 0x02, 0x04, 0x38,
+   0xC2, 0x36, 0x05, 0x52, 0xE5, 0x52, 0xD3, 0x94,
+/*0060*/0x0C, 0x40, 0x05, 0x75, 0x52, 0x01, 0xD2, 0x36,
+   0x90, 0x07, 0x0C, 0x74, 0x07, 0xF0, 0xA3, 0x74,
+/*0070*/0xFF, 0xF0, 0xE4, 0xF5, 0x0C, 0xA3, 0xF0, 0x90,
+   0x07, 0x14, 0xF0, 0xA3, 0xF0, 0x75, 0x0B, 0x20,
+/*0080*/0xF5, 0x09, 0xE4, 0xF5, 0x08, 0xE5, 0x08, 0xD3,
+   0x94, 0x30, 0x40, 0x03, 0x02, 0x04, 0x04, 0x12,
+/*0090*/0x00, 0x06, 0x15, 0x0B, 0xE5, 0x08, 0x70, 0x04,
+   0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x09,
+/*00A0*/0x70, 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00,
+   0xEE, 0x5F, 0x60, 0x05, 0x12, 0x18, 0x71, 0xD2,
+/*00B0*/0x35, 0x53, 0xE1, 0xF7, 0xE5, 0x08, 0x45, 0x09,
+   0xFF, 0xE5, 0x0B, 0x25, 0xE0, 0x25, 0xE0, 0x24,
+/*00C0*/0x83, 0xF5, 0x82, 0xE4, 0x34, 0x07, 0xF5, 0x83,
+   0xEF, 0xF0, 0x85, 0xE2, 0x20, 0xE5, 0x52, 0xD3,
+/*00D0*/0x94, 0x01, 0x40, 0x0D, 0x12, 0x19, 0xF3, 0xE0,
+   0x54, 0xA0, 0x64, 0x40, 0x70, 0x03, 0x02, 0x03,
+/*00E0*/0xFB, 0x53, 0xF9, 0xF8, 0x90, 0x94, 0x70, 0xE4,
+   0xF0, 0xE0, 0xF5, 0x10, 0xAF, 0x09, 0x12, 0x1E,
+/*00F0*/0xB3, 0xAF, 0x08, 0xEF, 0x44, 0x08, 0xF5, 0x82,
+   0x75, 0x83, 0x80, 0xE0, 0xF5, 0x29, 0xEF, 0x44,
+/*0100*/0x07, 0x12, 0x1A, 0x3C, 0xF5, 0x22, 0x54, 0x40,
+   0xD3, 0x94, 0x00, 0x40, 0x1E, 0xE5, 0x29, 0x54,
+/*0110*/0xF0, 0x70, 0x21, 0x12, 0x19, 0xF3, 0xE0, 0x44,
+   0x80, 0xF0, 0xE5, 0x22, 0x54, 0x30, 0x65, 0x08,
+/*0120*/0x70, 0x09, 0x12, 0x19, 0xF3, 0xE0, 0x54, 0xBF,
+   0xF0, 0x80, 0x09, 0x12, 0x19, 0xF3, 0x74, 0x40,
+/*0130*/0xF0, 0x02, 0x03, 0xFB, 0x12, 0x1A, 0x12, 0x75,
+   0x83, 0xAE, 0x74, 0xFF, 0xF0, 0xAF, 0x08, 0x7E,
+/*0140*/0x00, 0xEF, 0x44, 0x07, 0xF5, 0x82, 0xE0, 0xFD,
+   0xE5, 0x0B, 0x25, 0xE0, 0x25, 0xE0, 0x24, 0x81,
+/*0150*/0xF5, 0x82, 0xE4, 0x34, 0x07, 0xF5, 0x83, 0xED,
+   0xF0, 0x90, 0x07, 0x0E, 0xE0, 0x04, 0xF0, 0xEF,
+/*0160*/0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0x98, 0xE0,
+   0xF5, 0x28, 0x12, 0x1A, 0x23, 0x40, 0x0C, 0x12,
+/*0170*/0x19, 0xF3, 0xE0, 0x44, 0x01, 0x12, 0x1A, 0x32,
+   0x02, 0x03, 0xF6

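The header comment says this image is copied into the IBA7220 IB SerDes during initialization and that two functions in the file use it. A common shape for that kind of code is a chunked write followed by a read-back comparison. The sketch below shows only that shape: a plain array stands in for the SerDes microcontroller memory, and `uc_write()`, `uc_read()`, `UC_MEM_SIZE`, and the 4-byte chunk size are hypothetical; the real driver reaches the SerDes through EPB transactions instead.

```c
#include <stddef.h>
#include <string.h>

#define UC_MEM_SIZE 64 /* hypothetical size of the simulated uC memory */
#define CHUNK 4        /* hypothetical bytes per transfer */

static unsigned char uc_mem[UC_MEM_SIZE];

/* Hypothetical transfer primitives; return bytes moved or -1 on error. */
static int uc_write(size_t loc, const unsigned char *src, size_t len)
{
	if (loc > UC_MEM_SIZE || len > UC_MEM_SIZE - loc)
		return -1;
	memcpy(uc_mem + loc, src, len);
	return (int)len;
}

static int uc_read(size_t loc, unsigned char *dst, size_t len)
{
	if (loc > UC_MEM_SIZE || len > UC_MEM_SIZE - loc)
		return -1;
	memcpy(dst, uc_mem + loc, len);
	return (int)len;
}

/* Copy img in CHUNK-sized pieces; returns 0 on success or -1. */
static int img_load(const unsigned char *img, size_t len)
{
	for (size_t off = 0; off < len; off += CHUNK) {
		size_t n = len - off < CHUNK ? len - off : CHUNK;

		if (uc_write(off, img + off, n) != (int)n)
			return -1;
	}
	return 0;
}

/* Read back and compare; returns the count of mismatched bytes, or -1. */
static int img_verify(const unsigned char *img, size_t len)
{
	unsigned char buf[CHUNK];
	int errors = 0;

	for (size_t off = 0; off < len; off += CHUNK) {
		size_t n = len - off < CHUNK ? len - off : CHUNK;

		if (uc_read(off, buf, n) != (int)n)
			return -1;
		for (size_t i = 0; i < n; i++)
			errors += buf[i] != img[off + i];
	}
	return errors;
}
```

Returning a mismatch count rather than a boolean lets the caller log how badly a load failed before retrying.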
[PATCH v3 33/52] IB/qib: Add qib_sd7220.c

2010-05-06 Thread Ralph Campbell
Creates the qib_sd7220.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_sd7220.c | 1415 
 1 files changed, 1415 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_sd7220.c

diff --git a/drivers/infiniband/hw/qib/qib_sd7220.c b/drivers/infiniband/hw/qib/qib_sd7220.c
new file mode 100644
index 000..17af3cd
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_sd7220.c
@@ -0,0 +1,1415 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+/*
+ * This file contains all of the code that is specific to the SerDes
+ * on the QLogic_IB 7220 chip.
+ */
+
+#include 
+#include 
+
+#include "qib.h"
+#include "qib_7220.h"
+
+/*
+ * Same as in qib_iba7220.c, but just the registers needed here.
+ * Could move whole set to qib_7220.h, but decided better to keep
+ * local.
+ */
+#define KREG_IDX(regname) (QIB_7220_##regname##_OFFS / sizeof(u64))
+#define kr_hwerrclear KREG_IDX(HwErrClear)
+#define kr_hwerrmask KREG_IDX(HwErrMask)
+#define kr_hwerrstatus KREG_IDX(HwErrStatus)
+#define kr_ibcstatus KREG_IDX(IBCStatus)
+#define kr_ibserdesctrl KREG_IDX(IBSerDesCtrl)
+#define kr_scratch KREG_IDX(Scratch)
+#define kr_xgxs_cfg KREG_IDX(XGXSCfg)
+/* these are used only here, not in qib_iba7220.c */
+#define kr_ibsd_epb_access_ctrl KREG_IDX(ibsd_epb_access_ctrl)
+#define kr_ibsd_epb_transaction_reg KREG_IDX(ibsd_epb_transaction_reg)
+#define kr_pciesd_epb_transaction_reg KREG_IDX(pciesd_epb_transaction_reg)
+#define kr_pciesd_epb_access_ctrl KREG_IDX(pciesd_epb_access_ctrl)
+#define kr_serdes_ddsrxeq0 KREG_IDX(SerDes_DDSRXEQ0)
+
+/*
+ * The IBSerDesMappTable is a memory that holds values to be stored in
+ * various SerDes registers by IBC.
+ */
+#define kr_serdes_maptable KREG_IDX(IBSerDesMappTable)
+
+/*
+ * Below used for sdnum parameter, selecting one of the two sections
+ * used for PCIe, or the single SerDes used for IB.
+ */
+#define PCIE_SERDES0 0
+#define PCIE_SERDES1 1
+
+/*
+ * The EPB requires addressing in a particular form. EPB_LOC() is intended
+ * to make #definitions a little more readable.
+ */
+#define EPB_ADDR_SHF 8
+#define EPB_LOC(chn, elt, reg) \
+   ((((elt) & 0xf) | (((chn) & 7) << 4) | (((reg) & 0x3f) << 9)) << \
+    EPB_ADDR_SHF)
+#define EPB_IB_QUAD0_CS_SHF (25)
+#define EPB_IB_QUAD0_CS (1U <<  EPB_IB_QUAD0_CS_SHF)
+#define EPB_IB_UC_CS_SHF (26)
+#define EPB_PCIE_UC_CS_SHF (27)
+#define EPB_GLOBAL_WR (1U << (EPB_ADDR_SHF + 8))
+
+/* Forward declarations. */
+static int qib_sd7220_reg_mod(struct qib_devdata *dd, int sdnum, u32 loc,
+                              u32 data, u32 mask);
+static int ibsd_mod_allchnls(struct qib_devdata *dd, int loc, int val,
+                             int mask);
+static int qib_sd_trimdone_poll(struct qib_devdata *dd);
+static void qib_sd_trimdone_monitor(struct qib_devdata *dd, const char *where);
+static int qib_sd_setvals(struct qib_devdata *dd);
+static int qib_sd_early(struct qib_devdata *dd);
+static int qib_sd_dactrim(struct qib_devdata *dd);
+static int qib_internal_presets(struct qib_devdata *dd);
+/* Tweak the register (CMUCTRL5) that contains the TRIMSELF controls */
+static int qib_sd_trimself(struct qib_devdata *dd, int val);
+static int epb_access(struct qib_devdata *dd, int sdnum, int claim);
+
+/*
+ * Below keeps track of whether the "once per power-on" initialization has
+ * been done, because uC c

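EPB_LOC() packs an element (4 bits), a channel (3 bits), and a register (6 bits) into one value and then shifts it left by EPB_ADDR_SHF. Working one case by hand makes the layout concrete: element 2, channel 1, register 3 gives 2 | (1 << 4) | (3 << 9) = 0x612 before the shift and 0x61200 after it. Bit 8 of the pre-shift value, which lies between the channel and register fields, is the bit EPB_GLOBAL_WR sets. The check below reproduces the macros with fully parenthesized arguments; `epb_unpack()` is a hypothetical inverse added only to show the field boundaries.

```c
#include <stdint.h>

#define EPB_ADDR_SHF 8
#define EPB_LOC(chn, elt, reg) \
	((((elt) & 0xf) | (((chn) & 7) << 4) | (((reg) & 0x3f) << 9)) << \
	 EPB_ADDR_SHF)
#define EPB_GLOBAL_WR (1U << (EPB_ADDR_SHF + 8))

/* Hypothetical helper: recover the fields from an EPB_LOC() value. */
static void epb_unpack(uint32_t loc, unsigned *chn, unsigned *elt,
		       unsigned *reg)
{
	uint32_t a = loc >> EPB_ADDR_SHF;

	*elt = a & 0xf;
	*chn = (a >> 4) & 7;
	*reg = (a >> 9) & 0x3f;
}
```

Parenthesizing each macro argument keeps an expression such as `a | b` passed as `chn` from binding to `& 7` before the OR is evaluated.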
[PATCH v3 32/52] IB/qib: Add qib_ruc.c

2010-05-06 Thread Ralph Campbell
Creates the qib_ruc.c file.

Signed-off-by: Ralph Campbell 
---

 drivers/infiniband/hw/qib/qib_ruc.c |  817 +++
 1 files changed, 817 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/hw/qib/qib_ruc.c

diff --git a/drivers/infiniband/hw/qib/qib_ruc.c b/drivers/infiniband/hw/qib/qib_ruc.c
new file mode 100644
index 000..eb78d93
--- /dev/null
+++ b/drivers/infiniband/hw/qib/qib_ruc.c
@@ -0,0 +1,817 @@
+/*
+ * Copyright (c) 2006, 2007, 2008, 2009 QLogic Corporation. All rights reserved.
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+
+#include "qib.h"
+#include "qib_mad.h"
+
+/*
+ * Convert the AETH RNR timeout code into the number of microseconds.
+ * The comment on each entry gives the same value in milliseconds.
+ */
+const u32 ib_qib_rnr_table[32] = {
+   655360, /* 00: 655.36 */
+   10,     /* 01:    .01 */
+   20,     /* 02:    .02 */
+   30,     /* 03:    .03 */
+   40,     /* 04:    .04 */
+   60,     /* 05:    .06 */
+   80,     /* 06:    .08 */
+   120,    /* 07:    .12 */
+   160,    /* 08:    .16 */
+   240,    /* 09:    .24 */
+   320,    /* 0A:    .32 */
+   480,    /* 0B:    .48 */
+   640,    /* 0C:    .64 */
+   960,    /* 0D:    .96 */
+   1280,   /* 0E:   1.28 */
+   1920,   /* 0F:   1.92 */
+   2560,   /* 10:   2.56 */
+   3840,   /* 11:   3.84 */
+   5120,   /* 12:   5.12 */
+   7680,   /* 13:   7.68 */
+   10240,  /* 14:  10.24 */
+   15360,  /* 15:  15.36 */
+   20480,  /* 16:  20.48 */
+   30720,  /* 17:  30.72 */
+   40960,  /* 18:  40.96 */
+   61440,  /* 19:  61.44 */
+   81920,  /* 1A:  81.92 */
+   122880, /* 1B: 122.88 */
+   163840, /* 1C: 163.84 */
+   245760, /* 1D: 245.76 */
+   327680, /* 1E: 327.68 */
+   491520  /* 1F: 491.52 */
+};
+
+/*
+ * Validate a RWQE and fill in the SGE state.
+ * Return 1 if OK.
+ */
+static int qib_init_sge(struct qib_qp *qp, struct qib_rwqe *wqe)
+{
+   int i, j, ret;
+   struct ib_wc wc;
+   struct qib_lkey_table *rkt;
+   struct qib_pd *pd;
+   struct qib_sge_state *ss;
+
+   rkt = &to_idev(qp->ibqp.device)->lk_table;
+   pd = to_ipd(qp->ibqp.srq ? qp->ibqp.srq->pd : qp->ibqp.pd);
+   ss = &qp->r_sge;
+   ss->sg_list = qp->r_sg_list;
+   qp->r_len = 0;
+   for (i = j = 0; i < wqe->num_sge; i++) {
+       if (wqe->sg_list[i].length == 0)
+           continue;
+       /* Check LKEY */
+       if (!qib_lkey_ok(rkt, pd, j ? &ss->sg_list[j - 1] : &ss->sge,
+                        &wqe->sg_list[i], IB_ACCESS_LOCAL_WRITE))
+           goto bad_lkey;
+       qp->r_len += wqe->sg_list[i].length;
+       j++;
+   }
+   ss->num_sge = j;
+   ss->total_len = qp->r_len;
+   ret = 1;
+   goto bail;
+
+bad_lkey:
+   while (j) {
+       struct qib_sge *sge = --j ? &ss->sg_list[j - 1] : &ss->sge;
+
+       atomic_dec(&sge->mr->refcount);
+   }
+   ss->num_sge = 0;
+   memset(&wc, 0, sizeof(wc));
+   wc.wr_id = wqe->wr_id;
+   wc.status = IB_WC_LOC_PROT_ERR;
+   wc.opcode = IB_WC_RECV;
+   wc.qp = &qp->ibqp;
+   /* Signal solicited completion event. */
+   qib_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1);
+   ret = 0;
+bail:
+   return ret;
+}
+
+/**
+ * qib_get_rwqe - copy the next RWQE 

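Each ib_qib_rnr_table entry holds the microsecond value for one 5-bit RNR timer code, so a lookup is a mask followed by an index. The sketch below assumes the code sits in the low five bits of the AETH syndrome byte (AETH bits 28:24), as it does for an RNR NAK; `rnr_timeout_us()` and the AETH_* names are hypothetical, not qib definitions.

```c
#include <stdint.h>

/* Same values as ib_qib_rnr_table, in microseconds. */
static const uint32_t rnr_table_us[32] = {
	655360, 10, 20, 30, 40, 60, 80, 120,
	160, 240, 320, 480, 640, 960, 1280, 1920,
	2560, 3840, 5120, 7680, 10240, 15360, 20480, 30720,
	40960, 61440, 81920, 122880, 163840, 245760, 327680, 491520
};

#define AETH_SYNDROME_SHIFT 24   /* hypothetical: syndrome is AETH[31:24] */
#define AETH_RNR_TIMER_MASK 0x1f /* hypothetical: timer code is syndrome[4:0] */

/* Map an AETH word carrying an RNR NAK to the peer's requested delay. */
static uint32_t rnr_timeout_us(uint32_t aeth)
{
	return rnr_table_us[(aeth >> AETH_SYNDROME_SHIFT) & AETH_RNR_TIMER_MASK];
}
```

Masking before indexing guarantees the index stays within the 32-entry table; note that code 0 selects the longest delay (655.36 ms), not the shortest.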