[openib-general] bug in libmthca/src/verbs.c?

2006-08-25 Thread Robert Pearson

struct ibv_cq *mthca_create_cq(struct ibv_context *context, int cqe,
                               struct ibv_comp_channel *channel,
                               int comp_vector)
{
    struct mthca_create_cq  cmd;

    ---> snip <---

    ret = ibv_cmd_create_cq(context, cqe - 1, channel, comp_vector,
                            &cq->ibv_cq, &cmd.ibv_cmd, sizeof cmd,
                                                       ^^^^^^^^^^
                            &resp.ibv_resp, sizeof resp);

The command size passed to ibv_cmd_create_cq is the size of the mthca
command wrapper (sizeof cmd), which is larger than what is most likely
expected.
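
To make the size question concrete, here is a minimal sketch. The two struct layouts below are hypothetical stand-ins, not the real libibverbs or libmthca definitions. They only illustrate that sizeof applied to a driver wrapper that embeds the generic command is larger than the generic command alone, so the two values are not interchangeable as a command-size argument. The sketch takes no position on which size ibv_cmd_create_cq actually expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the generic (library-level) command. */
struct generic_create_cq {
    uint64_t response;
    uint64_t user_handle;
    uint32_t cqe;
    uint32_t comp_vector;
};

/* Hypothetical stand-in for a driver wrapper: the generic command
 * comes first, followed by driver-private fields. */
struct driver_create_cq {
    struct generic_create_cq ibv_cmd;
    uint32_t lkey;
    uint32_t pdn;
    uint64_t arm_db_page;
};

/* Number of driver-private bytes that follow the embedded generic command. */
static size_t driver_private_bytes(void)
{
    /* The embedded command must sit at offset 0 for the subtraction to be meaningful. */
    assert(offsetof(struct driver_create_cq, ibv_cmd) == 0);
    return sizeof(struct driver_create_cq) - sizeof(struct generic_create_cq);
}
```

With these made-up layouts, "sizeof cmd" covers the generic part plus the private fields, while "sizeof cmd.ibv_cmd" covers only the generic part.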






___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] basic IB doubt

2006-08-25 Thread Michael Krause


At 09:50 AM 8/25/2006, Caitlin Bestler wrote:
> [EMAIL PROTECTED] wrote:
> >>    Thomas> How does an adapter guarantee that no bridges or other
> >>    Thomas> intervening devices reorder their writes, or for that
> >>    Thomas> matter flush them to memory at all!?
> >>
> >> That's a good point.  The HCA would have to do a read to flush the
> >> posted writes, and I'm sure it's not doing that (since it would add
> >> horrible latency for no good reason).
> >>
> >> I guess it's not safe to rely on ordering of RDMA writes after all.
> >
> > Couldn't the same point then be made that a CQ entry may come
> > before the data has been posted?
> >
> That's why both specs (IBTA and RDMAC) are very explicit that all
> prior messages are complete before the CQE is given to the user.
> It is up to the RDMA Device and/or its driver to guarantee this
> by whatever means are appropriate. An implementation that allows
> a CQE post to pass the data placement that it is reporting on the
> PCI bus is in error.
>
> The critical concept of the Work Completion is that it consolidates
> guarantees and notifications. The implementation can do all sorts
> of strange things that it thinks optimize *before* the work completion,
> but at the time the work completion is delivered to the user everything
> is supposed to be as expected.

Caitlin's logic is correct and is the basis for why these two specifications
call out this issue.  And yes, Roland, one cannot rely upon RDMA
Write ordering, whether for IB or iWARP.  iWARP specifically allows
out-of-order delivery.  IB, while providing in-order delivery due to its
strong ordering protocol, still has no guarantees when it comes to the
memory controller and I/O technology being used.  Given that not
everything was expected to operate over PCI, we made sure that the
specifications pointed out these issues so that software would be
designed to accommodate all interconnect attach types and usage
models.  We wanted to maximize the underlying implementation options
while providing software with a consistent operating model, so that
software could be simplified as well.

Mike
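
The rule that a consumer must not look at the data until it has seen the completion can be sketched by analogy in plain C11. This is a toy model, not verbs code: "toy_transfer", "toy_complete" and "toy_poll" are made-up names, the "done" flag stands in for the CQE, and the release/acquire pair stands in for whatever the device and driver do to make data placement visible before the completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

/* Toy model: 'data' stands in for the target buffer, 'done' for the CQE. */
struct toy_transfer {
    char data[BUF_SIZE];
    size_t len;
    atomic_bool done;
};

/* Producer side (stands in for the adapter): place the data first,
 * then publish the completion with release semantics. */
static void toy_complete(struct toy_transfer *t, const char *src, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(t->data, src, len);
    t->len = len;
    atomic_store_explicit(&t->done, true, memory_order_release);
}

/* Consumer side: touch the buffer only after the completion is visible.
 * Returns false, leaving dst untouched, if no completion has been seen. */
static bool toy_poll(struct toy_transfer *t, char *dst, size_t *len)
{
    if (!atomic_load_explicit(&t->done, memory_order_acquire))
        return false;   /* no completion yet: do not peek at the data */
    memcpy(dst, t->data, t->len);
    *len = t->len;
    return true;
}
```

The point of the model is only the ordering contract: everything written before the completion is published is guaranteed visible to a consumer that has observed the completion, and nothing is guaranteed to a consumer that peeks earlier.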


Re: [openib-general] basic IB doubt

2006-08-25 Thread Michael Krause


At 10:45 AM 8/25/2006, Tom Tucker wrote:
> On Fri, 2006-08-25 at 12:51 -0400, Talpey, Thomas wrote:
> > At 12:40 PM 8/25/2006, Sean Hefty wrote:
> > >>    Thomas> How does an adapter guarantee that no bridges or other
> > >>    Thomas> intervening devices reorder their writes, or for that
> > >>    Thomas> matter flush them to memory at all!?
> > >>
> > >>That's a good point.  The HCA would have to do a read to flush the
> > >>posted writes, and I'm sure it's not doing that (since it would add
> > >>horrible latency for no good reason).
> > >>
> > >>I guess it's not safe to rely on ordering of RDMA writes after all.
> > >
> > >Couldn't the same point then be made that a CQ entry may come before the data
> > >has been posted?
> >
> > When the CQ entry arrives, the context that polls it off the queue
> > must use the dma_sync_*() api to finalize any associated data
> > transactions (known by the upper layer).
> >
> > This is basic, and it's the reason that a completion is so important.
> > The completion, in and of itself, isn't what drives the synchronization.
> > It's the transfer of control to the processor.
>
> This is a giant rat hole.
>
> On a coherent cache architecture, the CQE write posted to the bus
> following the write of the last byte of data will NOT be seen by the
> processor prior to the last byte of data. That is, write ordering is
> preserved in bridges.
>
> The dma_sync_* API has to do with processor cache, not transaction
> ordering. In fact, per this argument at the time you called dma_sync_*,
> the processor may not have seen the reordered transaction yet, so what
> would it be syncing?
>
> Write ordering and read ordering/fence is preserved in intervening
> bridges. What you DON'T know is whether or not a write (which was posted
> and may be sitting in a bridge FIFO) has been flushed and/or propagated
> to memory at the time you submit the next write and/or interrupt the
> host.
>
> If you submit a READ following the write, however, per the PCI bus
> ordering rules you know that the data is in the target.
>
> Unless, of course, I'm wrong ... :-)

A PCI read following a write to the same address will validate
that all prior write transactions are flushed to host memory.  This
is one way that people have used (albeit with a performance penalty) to
verify that a transaction is out of the HCA / RNIC fault zone, so that
an acknowledgement to the source means the data is safe and one
can survive the HCA / RNIC failing without falling into a
non-deterministic state.  PCI writes are strongly ordered on any PCI
technology offering.  Relaxed ordering needs to be taken into account
w.r.t. writes vs. reads, and read completions are weakly ordered as well.

Mike
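
The "a read pushes earlier posted writes to their target" rule can be illustrated with a toy model. Everything here is hypothetical ("toy_bridge", "toy_post_write", "toy_read" are made-up names, and a real bridge is far more involved). The sketch only shows the two properties discussed above: posted writes drain in order, and a posted write may still be sitting in the FIFO until a later read forces it out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS  16
#define FIFO_DEPTH 8

struct posted_write {
    size_t addr;
    uint32_t value;
};

/* Toy bridge: posted writes wait in a FIFO until something drains them. */
struct toy_bridge {
    uint32_t memory[MEM_WORDS];
    struct posted_write fifo[FIFO_DEPTH];
    size_t pending;
};

/* Drain the FIFO in arrival order, so write ordering is preserved. */
static void toy_flush(struct toy_bridge *b)
{
    for (size_t i = 0; i < b->pending; i++)
        b->memory[b->fifo[i].addr] = b->fifo[i].value;
    b->pending = 0;
}

/* A posted write returns immediately; the data may still be in the FIFO. */
static void toy_post_write(struct toy_bridge *b, size_t addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    if (b->pending == FIFO_DEPTH)
        toy_flush(b);   /* FIFO full: drain before queuing more */
    b->fifo[b->pending].addr = addr;
    b->fifo[b->pending].value = value;
    b->pending++;
}

/* A read may not pass earlier posted writes, so it pushes them out first. */
static uint32_t toy_read(struct toy_bridge *b, size_t addr)
{
    assert(addr < MEM_WORDS);
    toy_flush(b);
    return b->memory[addr];
}
```

In this model, a writer that needs to know its data has left the FIFO issues a read after the write and pays the round-trip latency for that knowledge, which is the performance penalty mentioned above.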


Re: [openib-general] basic IB doubt

2006-08-25 Thread Michael Krause


At 12:53 PM 8/25/2006, Talpey, Thomas wrote:
> At 03:23 PM 8/25/2006, Greg Lindahl wrote:
> >On Fri, Aug 25, 2006 at 03:21:20PM -0400, [EMAIL PROTECTED] wrote:
> >
> >> I presume you meant invalidate the cache, not flush it, before
> >> accessing DMA'ed data.
> >
> >Yes, this is what I meant. Sorry!
>
> Flush (sync for_device) before posting.
> Invalidate (sync for_cpu) before processing.
>
> On some architectures, these operations flush and/or invalidate
> i/o pipeline caches as well. As they should.

Many platforms have coherent I/O components so the explicit requirements
on software to participate are often eliminated.

Mike


Re: [openib-general] basic IB doubt

2006-08-25 Thread Michael Krause


At 11:55 AM 8/25/2006, Greg Lindahl wrote:
> On Fri, Aug 25, 2006 at 10:00:50AM -0400, Thomas Bachman wrote:
> > Not that I have any stance on this issue, but is this the text in the
> > spec that is being debated?
> >
> > (page 269, section 9.5, Transaction Ordering):
> > "An application shall not depend upon the order of data writes to
> > memory within a message. For example, if an application sets up
> > data buffers that overlap, for separate data segments within a
> > message, it is not guaranteed that the last sent data will always
> > overwrite the earlier."
>
> No. The case we're talking about is different from the example.
> There's text elsewhere which says, basically, that you can't access
> the data buffer until seeing the completion.
>
> > I'm assuming that the spec authors had reason for putting this in there, so
> > maybe they could provide guidance here?

We put that text there to accommodate differing memory controller
architectures / coherency protocol capabilities / etc.  Basically,
there is no way to guarantee that the memory is in a usable and correct
state until the completion is seen.  This was intended to guide
software not to peek at memory but to examine a completion queue entry, so
that if memory is updated out of order, silent data corruption would not
occur.

> I can't speak for the authors, but as an implementor, this has a huge
> impact on implementation.
>
> For example, on an architecture where you need to do work such as
> flushing the cache before accessing DMAed data, that's done in the
> completion. x86 in general is not such an architecture, but they exist.
> IB is intended to be portable to any CPU architecture.

Invalidation protocol is one concern.  The other is that a completion
notification often also acts as a flush of the local I/O fabric.
In the case of an RDMA Write, the only way to safely determine
complete delivery was to use either an RDMA Write / Send-with-completion
combination or an RDMA Write / RDMA Read combination, depending upon
which side required such completion knowledge.

> For iWarp, the issue is that packets are frequently reordered.

Neither IP nor Ethernet reorders packets that often in practice.
The same is true for packet drop rates (the real issue with packet drops
is the impact on performance and recovery times, which is why IB was not
designed to work over long or diverse topologies where intermediate
elements may see what might be termed a high packet loss rate).

Mike

> -- greg


Re: [openib-general] OpenSM partition Management

2006-08-25 Thread Sasha Khapyorsky
Hi,

On 15:02 Fri 25 Aug , Venkatesh Babu wrote:
> 
> The document OpenSM_PKey_Mgr.txt under link 
> https://openib.org/svn/gen2/trunk/src/userspace/management/osm/doc/OpenSM_PKey_Mgr.txt
>  
> describes the roadmap for OpenSM partition management. It discusses a
> two-phase implementation.
> 
> 1. What functionality is available with OFED version 1.1?

The part of partition management that is implemented and available in
OFED 1.1 is more or less phase I of the roadmap you are referring to. For
more details about the implemented features see:

https://openib.org/svn/gen2/trunk/src/userspace/management/osm/doc/partition-config.txt

> 2. When is each of these two phases going to be implemented and available?

Phase I is done, and phase II is TBD.

Sasha

> 
>  Thanks,
>VBabu



[openib-general] OpenSM partition Management

2006-08-25 Thread Venkatesh Babu

The document OpenSM_PKey_Mgr.txt under link 
https://openib.org/svn/gen2/trunk/src/userspace/management/osm/doc/OpenSM_PKey_Mgr.txt
 
describes the roadmap for OpenSM partition management. It discusses a
two-phase implementation.

1. What functionality is available with OFED version 1.1?
2. When is each of these two phases going to be implemented and available?

 Thanks,
   VBabu




[openib-general] test message

2006-08-25 Thread Michael Lee
test message

Re: [openib-general] [PATCH 22 of 23] IB/ipath - print warning if LID not acquired within one minute

2006-08-25 Thread Robert Walsh
Roland Dreier wrote:
> 1) What makes ipath special so that we want this warning for ipath
> devices but not other IB hardware?

There's nothing special about our hardware that requires this.  We just
wanted that in there so we could direct customers to look at dmesg to
see if the warning popped up if they call with a problem.  It is useful
to have for this purpose.

> If this warning is actually
> useful, then I think it would make more sense to start a timer when
> any IB device is added, and warn if ports with a physical link don't
> become active after the timeout time.

I'd be OK with doing that, too.

> But I'm having a hard time
> seeing why we want this message in the kernel log.

It's useful when you're trying to track down problems.

> 2) You do cancel_delayed_work() but not flush_scheduled_work(), so
> it's possible for your timeout function to be running after the module
> text is gone.

OK - I'll fix this up.  Thanks for spotting it.

Regards,
 Robert.




Re: [openib-general] [PATCH 11 of 23] IB/ipath - add new minor device to allow sending of diag packets

2006-08-25 Thread Bryan O'Sullivan
On Fri, 2006-08-25 at 12:50 -0700, Roland Dreier wrote:

> The last line adds trailing whitespace, which git complains about.
> When patchbombing, can you run your patches through "git apply --check
> --whitespace=error-all" or the equivalent?

Sure.  Thanks for spotting that.




Re: [openib-general] [PATCH 1 of 23] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems

2006-08-25 Thread Bryan O'Sullivan
On Fri, 2006-08-25 at 12:45 -0700, Roland Dreier wrote:
> How did you generate these patches?

Using Mercurial.  

> because the line
> 
> diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile
> 
> makes git think it's a git diff, but git doesn't put dates on the
> filename lines.

Ah, interesting.  Looks like a bug in the git-compatible patch
generator, then.  Sorry about that.




Re: [openib-general] [PATCH 1 of 23] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems

2006-08-25 Thread Roland Dreier
 > Signed-off-by: John Gregor <[EMAIL PROTECTED]>

I assume this patch was actually written by John Gregor?  If so you
should include an extra "From:" line in the body of the email, so that
the authorship information gets put into git correctly.

 - R.




Re: [openib-general] [PATCH 23 of 23] IB/ipath - control receive polarity inversion

2006-08-25 Thread Roland Dreier
Applied 1-21 and 23 to my for-2.6.19 branch, and skipped 22 for now.

 - R.




Re: [openib-general] basic IB doubt

2006-08-25 Thread Talpey, Thomas
At 03:23 PM 8/25/2006, Greg Lindahl wrote:
>On Fri, Aug 25, 2006 at 03:21:20PM -0400, [EMAIL PROTECTED] wrote:
>
>> I presume you meant invalidate the cache, not flush it, before 
>accessing DMA'ed 
>> data. 
>
>Yes, this is what I meant. Sorry!

Flush (sync for_device) before posting.
Invalidate (sync for_cpu) before processing.

On some architectures, these operations flush and/or invalidate
i/o pipeline caches as well. As they should.

Tom.
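
The flush/invalidate pairing can be shown with a toy model of a non-coherent system. This is a sketch under stated assumptions, not kernel code: "toy_dma_buf" and all the functions below are made up for illustration. In a Linux driver the corresponding calls would be the DMA API's dma_sync_single_for_device() and dma_sync_single_for_cpu(), which this model only imitates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 32

/* Toy non-coherent system: the CPU sees 'cache' while it is valid,
 * the device only ever sees 'memory'. */
struct toy_dma_buf {
    char memory[BUF_SIZE];
    char cache[BUF_SIZE];
    bool cache_valid;
    bool cache_dirty;
};

/* Load the cache from memory if the CPU has no valid copy. */
static void toy_fill(struct toy_dma_buf *b)
{
    if (!b->cache_valid) {
        memcpy(b->cache, b->memory, BUF_SIZE);
        b->cache_valid = true;
        b->cache_dirty = false;
    }
}

/* CPU store: lands in the cache only. */
static void cpu_write(struct toy_dma_buf *b, const char *src, size_t len)
{
    assert(len <= BUF_SIZE);
    toy_fill(b);
    memcpy(b->cache, src, len);
    b->cache_dirty = true;
}

/* CPU load: served from the cache if it is valid, even if memory changed. */
static const char *cpu_read(struct toy_dma_buf *b)
{
    toy_fill(b);
    return b->cache;
}

/* "sync for_device": flush (write back) dirty data so the device sees it. */
static void toy_sync_for_device(struct toy_dma_buf *b)
{
    if (b->cache_valid && b->cache_dirty)
        memcpy(b->memory, b->cache, BUF_SIZE);
    b->cache_dirty = false;
}

/* "sync for_cpu": invalidate, so the next CPU read refetches from memory. */
static void toy_sync_for_cpu(struct toy_dma_buf *b)
{
    b->cache_valid = false;
    b->cache_dirty = false;
}

/* Device DMA: writes memory directly and never touches the CPU cache. */
static void device_dma_write(struct toy_dma_buf *b, const char *src, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(b->memory, src, len);
}
```

In the model, skipping the flush leaves the device reading old memory, and skipping the invalidate leaves the CPU reading its old cached copy after the device has written. On a coherent platform neither step has anything to do.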





Re: [openib-general] [PATCH 11 of 23] IB/ipath - add new minor device to allow sending of diag packets

2006-08-25 Thread Roland Dreier
 > +	if (ret < 0) {
 > +		printk(KERN_ERR IPATH_DRV_NAME ": Unable to create "
 > +		       "diag data device: error %d\n", -ret);
 > +		goto bail_ipathfs;
 > +	}
 > + 

The last line adds trailing whitespace, which git complains about.
When patchbombing, can you run your patches through "git apply --check
--whitespace=error-all" or the equivalent?

Thanks,
  Roland




Re: [openib-general] [PATCH 1 of 23] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems

2006-08-25 Thread Roland Dreier
How did you generate these patches?  When I try to apply them with
git, I get errors like

error: drivers/infiniband/hw/ipath/Makefile Fri Aug 25 11:19:44 2006 -0700: No such file or directory

because the line

diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile

makes git think it's a git diff, but git doesn't put dates on the
filename lines.  In other words, instead of

--- a/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:44 2006 -0700

the patch should just have

--- a/drivers/infiniband/hw/ipath/Makefile
+++ b/drivers/infiniband/hw/ipath/Makefile

before the Makefile chunks.

I fixed this up by deleting the "diff --git" lines, but I'm curious
how you created this in the first place.

 - R.
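
An alternative to deleting the "diff --git" lines would be to strip the timestamps from the file header lines. The helper below is a hypothetical sketch ("strip_header_timestamp" is a made-up name). It assumes that a tab separates the file name from the timestamp, as in ordinary unified diff output; if the generator used spaces, it would leave the line alone. It is also naive about context: a removed content line that itself begins with "-- " shows up in a hunk as "--- ", so the helper should only be applied to the header lines that precede a hunk.

```c
#include <assert.h>
#include <string.h>

/*
 * If 'line' is a unified-diff file header ("--- " or "+++ "), cut it at the
 * first tab, which is assumed to separate the file name from the timestamp.
 * A trailing newline is preserved. Modifies 'line' in place.
 * Returns 1 if the line was changed, 0 otherwise.
 */
static int strip_header_timestamp(char *line)
{
    assert(line != NULL);

    if (strncmp(line, "--- ", 4) != 0 && strncmp(line, "+++ ", 4) != 0)
        return 0;

    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;   /* no timestamp separator found */

    size_t rest = strlen(tab);
    int had_newline = tab[rest - 1] == '\n';

    if (had_newline) {
        tab[0] = '\n';
        tab[1] = '\0';
    } else {
        tab[0] = '\0';
    }
    return 1;
}
```

Applied to the header lines quoted above, this would turn the dated "---" and "+++" lines into the plain form that git expects, leaving all other lines untouched.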




Re: [openib-general] [PATCH 22 of 23] IB/ipath - print warning if LID not acquired within one minute

2006-08-25 Thread Roland Dreier
1) What makes ipath special so that we want this warning for ipath
devices but not other IB hardware?  If this warning is actually
useful, then I think it would make more sense to start a timer when
any IB device is added, and warn if ports with a physical link don't
become active after the timeout time.  But I'm having a hard time
seeing why we want this message in the kernel log.

2) You do cancel_delayed_work() but not flush_scheduled_work(), so
it's possible for your timeout function to be running after the module
text is gone.

 - R.




Re: [openib-general] basic IB doubt

2006-08-25 Thread Greg Lindahl
On Fri, Aug 25, 2006 at 03:21:20PM -0400, [EMAIL PROTECTED] wrote:

> I presume you meant invalidate the cache, not flush it, before accessing 
> DMA'ed 
> data. 

Yes, this is what I meant. Sorry!

-- greg





Re: [openib-general] basic IB doubt

2006-08-25 Thread mlakshmanan
Date sent:   Fri, 25 Aug 2006 11:55:55 -0700
From:        "Greg Lindahl" <[EMAIL PROTECTED]>

> For example, on an architecture where you need to do work such as
> flushing the cache before accessing DMAed data, that's done in the
> completion. x86 in general is not such an architecture, but they
> exist. IB is intended to be portable to any CPU architecture.

I presume you meant invalidate the cache, not flush it, before accessing
DMA'ed data.

-madhu




[openib-general] SRP numbers from gen1 vs gen2

2006-08-25 Thread Cain, Brian (GE Healthcare)
Does anyone have any throughput benchmark data for SRP comparing gen1
and gen2?

--
-Brian 




Re: [openib-general] basic IB doubt

2006-08-25 Thread Greg Lindahl
On Fri, Aug 25, 2006 at 10:00:50AM -0400, Thomas Bachman wrote:

> Not that I have any stance on this issue, but is this the text in the
> spec that is being debated? 
> 
> (page 269, section 9.5, Transaction Ordering):
> "An application shall not depend upon the order of data writes to
> memory within a message. For example, if an application sets up
> data buffers that overlap, for separate data segments within a
> message, it is not guaranteed that the last sent data will always
> overwrite the earlier."

No. The case we're talking about is different from the example.
There's text elsewhere which says, basically, that you can't access
the data buffer until seeing the completion.

> I'm assuming that the spec authors had reason for putting this in there, so
> maybe they could provide guidance here?

I can't speak for the authors, but as an implementor, this has a huge
impact on implementation.

For example, on an architecture where you need to do work such as
flushing the cache before accessing DMAed data, that's done in the
completion. x86 in general is not such an architecture, but they
exist. IB is intended to be portable to any CPU architecture.

For iWarp, the issue is that packets are frequently reordered.

-- greg





[openib-general] [PATCH 6 of 23] IB/ipath - merge ipath_core and ib_ipath drivers

2006-08-25 Thread Bryan O'Sullivan
There is little point in keeping the two drivers separate, so we are
merging them.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
--- a/drivers/infiniband/Makefile   Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/Makefile   Fri Aug 25 11:19:45 2006 -0700
@@ -1,6 +1,6 @@ obj-$(CONFIG_INFINIBAND)+= core/
 obj-$(CONFIG_INFINIBAND)   += core/
 obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/
-obj-$(CONFIG_IPATH_CORE)   += hw/ipath/
+obj-$(CONFIG_INFINIBAND_IPATH) += hw/ipath/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
 obj-$(CONFIG_INFINIBAND_SRP)   += ulp/srp/
 obj-$(CONFIG_INFINIBAND_ISER)  += ulp/iser/
diff --git a/drivers/infiniband/hw/ipath/Kconfig b/drivers/infiniband/hw/ipath/Kconfig
--- a/drivers/infiniband/hw/ipath/Kconfig   Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/Kconfig   Fri Aug 25 11:19:45 2006 -0700
@@ -1,16 +1,9 @@ config IPATH_CORE
-config IPATH_CORE
+config INFINIBAND_IPATH
tristate "QLogic InfiniPath Driver"
-   depends on 64BIT && PCI_MSI && NET
+   depends on PCI_MSI && 64BIT && INFINIBAND
---help---
-   This is a low-level driver for QLogic InfiniPath host channel
-   adapters (HCAs) based on the HT-400 and PE-800 chips.
-
-config INFINIBAND_IPATH
-   tristate "QLogic InfiniPath Verbs Driver"
-   depends on IPATH_CORE && INFINIBAND
-   ---help---
-   This is a driver that provides InfiniBand verbs support for
-   QLogic InfiniPath host channel adapters (HCAs).  This
-   allows these devices to be used with both kernel upper level
-   protocols such as IP-over-InfiniBand as well as with userspace
-   applications (in conjunction with InfiniBand userspace access).
+   This is a driver for QLogic InfiniPath host channel adapters,
+   including InfiniBand verbs support.  This driver allows these
+   devices to be used with both kernel upper level protocols such
+   as IP-over-InfiniBand as well as with userspace applications
+   (in conjunction with InfiniBand userspace access).
diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile
--- a/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:45 2006 -0700
@@ -1,10 +1,10 @@ EXTRA_CFLAGS += -DIPATH_IDSTR='"QLogic k
 EXTRA_CFLAGS += -DIPATH_IDSTR='"QLogic kernel.org driver"' \
-DIPATH_KERN_TYPE=0
 
-obj-$(CONFIG_IPATH_CORE) += ipath_core.o
 obj-$(CONFIG_INFINIBAND_IPATH) += ib_ipath.o
 
-ipath_core-y := \
+ib_ipath-y := \
+   ipath_cq.o \
ipath_diag.o \
ipath_driver.o \
ipath_eeprom.o \
@@ -13,26 +13,23 @@ ipath_core-y := \
ipath_ht400.o \
ipath_init_chip.o \
ipath_intr.o \
+   ipath_keys.o \
ipath_layer.o \
-   ipath_pe800.o \
-   ipath_stats.o \
-   ipath_sysfs.o \
-   ipath_user_pages.o
-
-ipath_core-$(CONFIG_X86_64) += ipath_wc_x86_64.o
-ipath_core-$(CONFIG_PPC64) += ipath_wc_ppc64.o
-
-ib_ipath-y := \
-   ipath_cq.o \
-   ipath_keys.o \
ipath_mad.o \
ipath_mmap.o \
ipath_mr.o \
+   ipath_pe800.o \
ipath_qp.o \
ipath_rc.o \
ipath_ruc.o \
ipath_srq.o \
+   ipath_stats.o \
+   ipath_sysfs.o \
ipath_uc.o \
ipath_ud.o \
-   ipath_verbs.o \
-   ipath_verbs_mcast.o
+   ipath_user_pages.o \
+   ipath_verbs_mcast.o \
+   ipath_verbs.o
+
+ib_ipath-$(CONFIG_X86_64) += ipath_wc_x86_64.o
+ib_ipath-$(CONFIG_PPC64) += ipath_wc_ppc64.o
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c  Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c  Fri Aug 25 11:19:45 2006 -0700
@@ -40,6 +40,7 @@
 
 #include "ipath_kernel.h"
 #include "ipath_layer.h"
+#include "ipath_verbs.h"
 #include "ipath_common.h"
 
 static void ipath_update_pio_bufs(struct ipath_devdata *);
@@ -50,8 +51,6 @@ const char *ipath_get_unit_name(int unit
snprintf(iname, sizeof iname, "infinipath%u", unit);
return iname;
 }
-
-EXPORT_SYMBOL_GPL(ipath_get_unit_name);
 
 #define DRIVER_LOAD_MSG "QLogic " IPATH_DRV_NAME " loaded: "
 #define PFX IPATH_DRV_NAME ": "
@@ -510,6 +509,7 @@ static int __devinit ipath_init_one(stru
ipath_user_add(dd);
ipath_diag_add(dd);
ipath_layer_add(dd);
+   ipath_register_ib_device(dd);
 
goto bail;
 
@@ -538,6 +538,7 @@ static void __devexit ipath_remove_one(s
return;
 
dd = pci_get_drvdata(pdev);
+   ipath_unregister_ib_device(dd->verbs_dev);
ipath_layer_remove(dd);
ipath_diag_remove(dd);
ipath_user_remove(dd);
@@ -978,12 +979,8 @@ 

[openib-general] [PATCH 14 of 23] IB/ipath - support new QLogic product naming scheme

2006-08-25 Thread Bryan O'Sullivan
This patch only renames files, fixes product names, and updates
comments.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile
--- a/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:45 2006 -0700
@@ -10,7 +10,8 @@ ib_ipath-y := \
ipath_eeprom.o \
ipath_file_ops.o \
ipath_fs.o \
-   ipath_ht400.o \
+   ipath_iba6110.o \
+   ipath_iba6120.o \
ipath_init_chip.o \
ipath_intr.o \
ipath_keys.o \
@@ -18,7 +19,6 @@ ib_ipath-y := \
ipath_mad.o \
ipath_mmap.o \
ipath_mr.o \
-   ipath_pe800.o \
ipath_qp.o \
ipath_rc.o \
ipath_ruc.o \
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c  Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c  Fri Aug 25 11:19:45 2006 -0700
@@ -401,10 +401,10 @@ static int __devinit ipath_init_one(stru
/* setup the chip-specific functions, as early as possible. */
switch (ent->device) {
case PCI_DEVICE_ID_INFINIPATH_HT:
-   ipath_init_ht400_funcs(dd);
+   ipath_init_iba6110_funcs(dd);
break;
case PCI_DEVICE_ID_INFINIPATH_PE800:
-   ipath_init_pe800_funcs(dd);
+   ipath_init_iba6120_funcs(dd);
break;
default:
ipath_dev_err(dd, "Found unknown QLogic deviceid 0x%x, "
@@ -969,7 +969,8 @@ reloop:
 */
if (l == hdrqtail || (i && !(i&0xf))) {
u64 lval;
-   if (l == hdrqtail) /* PE-800 interrupt only on last */
+   if (l == hdrqtail)
+   /* request IBA6120 interrupt only on last */
lval = dd->ipath_rhdrhead_intr_off | l;
else
lval = l;
@@ -983,7 +984,7 @@ reloop:
}
 
if (!dd->ipath_rhdrhead_intr_off && !reloop) {
-   /* HT-400 workaround; we can have a race clearing chip
+   /* IBA6110 workaround; we can have a race clearing chip
 * interrupt with another interrupt about to be delivered,
 * and can clear it before it is delivered on the GPIO
 * workaround.  By doing the extra check here for the
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c  Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c  Fri Aug 25 11:19:45 2006 -0700
@@ -1110,7 +1110,7 @@ static int ipath_mmap(struct file *fp, s
ret = mmap_rcvegrbufs(vma, pd);
else if (pgaddr == (u64) pd->port_rcvhdrq_phys) {
/*
-* The rcvhdrq itself; readonly except on HT-400 (so have
+* The rcvhdrq itself; readonly except on HT (so have
 * to allow writable mapping), multiple pages, contiguous
 * from an i/o perspective.
 */
@@ -1298,14 +1298,14 @@ static int find_best_unit(struct file *f
 * This code is present to allow a knowledgeable person to
 * specify the layout of processes to processors before opening
 * this driver, and then we'll assign the process to the "closest"
-* HT-400 to that processor (we assume reasonable connectivity,
+* InfiniPath chip to that processor (we assume reasonable connectivity,
 * for now).  This code assumes that if affinity has been set
 * before this point, that at most one cpu is set; for now this
 * is reasonable.  I check for both cpus_empty() and cpus_full(),
 * in case some kernel variant sets none of the bits when no
 * affinity is set.  2.6.11 and 12 kernels have all present
 * cpus set.  Some day we'll have to fix it up further to handle
-* a cpu subset.  This algorithm fails for two HT-400's connected
+* a cpu subset.  This algorithm fails for two HT chips connected
 * in tunnel fashion.  Eventually this needs real topology
 * information.  There may be some issues with dual core numbering
 * as well.  This needs more work prior to release.
diff --git a/drivers/infiniband/hw/ipath/ipath_ht400.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
rename from drivers/infiniband/hw/ipath/ipath_ht400.c
rename to drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c   Fri Aug 25 11:19:45 2006 -0700
@@ -33,7 +33,7 @@
 
 /*
  * This file contains all o

[openib-general] [PATCH 9 of 23] IB/ipath - remove stale references to userspace SMA

2006-08-25 Thread Bryan O'Sullivan
When we first submitted a userspace subnet management agent, it was
rejected, so we left it out of the final driver submission.  This patch
removes a number of vestigial references to it.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h  Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h  Fri Aug 25 11:19:45 2006 -0700
@@ -106,9 +106,9 @@ struct infinipath_stats {
__u64 sps_ether_spkts;
/* number of "ethernet" packets received by driver */
__u64 sps_ether_rpkts;
-   /* number of SMA packets sent by driver */
+   /* number of SMA packets sent by driver. Obsolete. */
__u64 sps_sma_spkts;
-   /* number of SMA packets received by driver */
+   /* number of SMA packets received by driver. Obsolete. */
__u64 sps_sma_rpkts;
/* number of times all ports rcvhdrq was full and packet dropped */
__u64 sps_hdrqfull;
@@ -138,7 +138,7 @@ struct infinipath_stats {
__u64 sps_pageunlocks;
/*
 * Number of packets dropped in kernel other than errors (ether
-* packets if ipath not configured, sma/mad, etc.)
+* packets if ipath not configured, etc.)
 */
__u64 sps_krdrops;
/* pad for future growth */
@@ -153,8 +153,6 @@ struct infinipath_stats {
 #define IPATH_STATUS_DISABLED  0x2 /* hardware disabled */
 /* Device has been disabled via admin request */
 #define IPATH_STATUS_ADMIN_DISABLED 0x4
-#define IPATH_STATUS_OIB_SMA   0x8 /* ipath_mad kernel SMA running */
-#define IPATH_STATUS_SMA  0x10 /* user SMA running */
 /* Chip has been found and initted */
 #define IPATH_STATUS_CHIP_PRESENT 0x20
 /* IB link is at ACTIVE, usable for data traffic */
@@ -463,14 +461,6 @@ struct __ipath_sendpkt {
__u32 sps_cnt;  /* number of entries to use in sps_iov */
/* array of iov's describing packet. TEMPORARY */
struct ipath_iovec sps_iov[4];
-};
-
-/* Passed into SMA special file's ->read and ->write methods. */
-struct ipath_sma_pkt
-{
-   __u32 unit; /* unit on which to send packet */
-   __u64 data; /* address of payload in userspace */
-   __u32 len;  /* length of payload */
 };
 
 /*
diff --git a/drivers/infiniband/hw/ipath/ipath_debug.h b/drivers/infiniband/hw/ipath/ipath_debug.h
--- a/drivers/infiniband/hw/ipath/ipath_debug.h Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_debug.h Fri Aug 25 11:19:45 2006 -0700
@@ -60,7 +60,6 @@
 #define __IPATH_USER_SEND   0x1000 /* use user mode send */
 #define __IPATH_KERNEL_SEND 0x2000 /* use kernel mode send */
 #define __IPATH_EPKTDBG 0x4000 /* print ethernet packet data */
-#define __IPATH_SMADBG  0x8000 /* sma packet debug */
 #define __IPATH_IPATHDBG0x1/* Ethernet (IPATH) gen debug */
 #define __IPATH_IPATHWARN   0x2/* Ethernet (IPATH) warnings */
 #define __IPATH_IPATHERR0x4/* Ethernet (IPATH) errors */
@@ -84,7 +83,6 @@
 /* print mmap/nopage stuff, not using VDBG any more */
 #define __IPATH_MMDBG 0x0
 #define __IPATH_EPKTDBG   0x0  /* print ethernet packet data */
-#define __IPATH_SMADBG 0x0   /* process startup (init)/exit messages */
 #define __IPATH_IPATHDBG  0x0  /* Ethernet (IPATH) table dump on */
 #define __IPATH_IPATHWARN 0x0  /* Ethernet (IPATH) warnings on   */
 #define __IPATH_IPATHERR  0x0  /* Ethernet (IPATH) errors on   */
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:45 2006 -0700
@@ -64,7 +64,7 @@ DEFINE_SPINLOCK(ipath_devs_lock);
 DEFINE_SPINLOCK(ipath_devs_lock);
 LIST_HEAD(ipath_dev_list);
 
-wait_queue_head_t ipath_sma_state_wait;
+wait_queue_head_t ipath_state_wait;
 
 unsigned ipath_debug = __IPATH_INFO;
 
@@ -618,15 +618,16 @@ static int ipath_wait_linkstate(struct i
 static int ipath_wait_linkstate(struct ipath_devdata *dd, u32 state,
int msecs)
 {
-   dd->ipath_sma_state_wanted = state;
-   wait_event_interruptible_timeout(ipath_sma_state_wait,
+   dd->ipath_state_wanted = state;
+   wait_event_interruptible_timeout(ipath_state_wait,
 (dd->ipath_flags & state),
 msecs_to_jiffies(msecs));
-   dd->ipath_sma_state_wanted = 0;
+   dd->ipath_state_wanted = 0;
 
if (!(dd->ipath_flags & state)) {
u64 val;
-   ipath_cdbg(SMA, "Didn't reach linkstate %s within %u ms\n",
+   ipath_cdbg(VERBOSE, "Didn't reach linkstate %s within %u"
+  " ms\n",
  

[openib-general] [PATCH 8 of 23] IB/ipath - simplify debugging code after ipath_core and ib_ipath merger

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:45 2006 -0700
@@ -58,7 +58,7 @@ const char *ipath_get_unit_name(int unit
  * The size has to be longer than this string, so we can append
  * board/chip information to it in the init code.
  */
-const char ipath_core_version[] = IPATH_IDSTR "\n";
+const char ib_ipath_version[] = IPATH_IDSTR "\n";
 
 static struct idr unit_table;
 DEFINE_SPINLOCK(ipath_devs_lock);
@@ -1847,7 +1847,7 @@ static int __init infinipath_init(void)
 {
int ret;
 
-   ipath_dbg(KERN_INFO DRIVER_LOAD_MSG "%s", ipath_core_version);
+   ipath_dbg(KERN_INFO DRIVER_LOAD_MSG "%s", ib_ipath_version);
 
/*
 * These must be called before the driver is registered with
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri Aug 25 11:19:45 2006 -0700
@@ -785,7 +785,7 @@ static inline u32 ipath_read_creg32(cons
 
 struct device_driver;
 
-extern const char ipath_core_version[];
+extern const char ib_ipath_version[];
 
 int ipath_driver_create_group(struct device_driver *);
 void ipath_driver_remove_group(struct device_driver *);
@@ -815,7 +815,7 @@ const char *ipath_get_unit_name(int unit
 
 extern struct mutex ipath_mutex;
 
-#define IPATH_DRV_NAME "ipath_core"
+#define IPATH_DRV_NAME "ib_ipath"
 #define IPATH_MAJOR 233
 #define IPATH_USER_MINOR_BASE  0
 #define IPATH_SMA_MINOR 128
diff --git a/drivers/infiniband/hw/ipath/ipath_keys.c b/drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c  Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c  Fri Aug 25 11:19:45 2006 -0700
@@ -34,6 +34,7 @@
 #include 
 
 #include "ipath_verbs.h"
+#include "ipath_kernel.h"
 
 /**
  * ipath_alloc_lkey - allocate an lkey
@@ -60,7 +61,7 @@ int ipath_alloc_lkey(struct ipath_lkey_t
r = (r + 1) & (rkt->max - 1);
if (r == n) {
spin_unlock_irqrestore(&rkt->lock, flags);
-   _VERBS_INFO("LKEY table full\n");
+   ipath_dbg(KERN_INFO "LKEY table full\n");
ret = 0;
goto bail;
}
diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri Aug 25 11:19:45 2006 -0700
@@ -274,7 +274,7 @@ void ipath_free_all_qps(struct ipath_qp_
free_qpn(qpt, qp->ibqp.qp_num);
if (!atomic_dec_and_test(&qp->refcount) ||
!ipath_destroy_qp(&qp->ibqp))
-   _VERBS_INFO("QP memory leak!\n");
+   ipath_dbg(KERN_INFO "QP memory leak!\n");
qp = nqp;
}
}
@@ -362,8 +362,8 @@ void ipath_error_qp(struct ipath_qp *qp)
struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
struct ib_wc wc;
 
-   _VERBS_INFO("QP%d/%d in error state\n",
-   qp->ibqp.qp_num, qp->remote_qpn);
+   ipath_dbg(KERN_INFO "QP%d/%d in error state\n",
+ qp->ibqp.qp_num, qp->remote_qpn);
 
spin_lock(&dev->pending_lock);
/* XXX What if its already removed by the timeout code? */
@@ -945,8 +945,8 @@ void ipath_sqerror_qp(struct ipath_qp *q
struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);
 
-   _VERBS_INFO("Send queue error on QP%d/%d: err: %d\n",
-   qp->ibqp.qp_num, qp->remote_qpn, wc->status);
+   ipath_dbg(KERN_INFO "Send queue error on QP%d/%d: err: %d\n",
+ qp->ibqp.qp_num, qp->remote_qpn, wc->status);
 
spin_lock(&dev->pending_lock);
/* XXX What if its already removed by the timeout code? */
diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri Aug 25 11:19:45 2006 -0700
@@ -75,7 +75,7 @@ static ssize_t show_version(struct devic
 static ssize_t show_version(struct device_driver *dev, char *buf)
 {
/* The string printed here is already newline-terminated. */
-   return scnprintf(buf, PAGE_SIZE, "%s", ipath_core_version);
+   return scnprintf(buf, PAGE_SIZE, "%s", ib_ipath_version);
 }
 
 static ssize_t show_num_uni

[openib-general] [PATCH 11 of 23] IB/ipath - add new minor device to allow sending of diag packets

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h Fri Aug 25 11:19:45 2006 -0700
@@ -461,6 +461,13 @@ struct __ipath_sendpkt {
__u32 sps_cnt;  /* number of entries to use in sps_iov */
/* array of iov's describing packet. TEMPORARY */
struct ipath_iovec sps_iov[4];
+};
+
+/* Passed into diag data special file's ->write method. */
+struct ipath_diag_pkt {
+   __u32 unit;
+   __u64 data;
+   __u32 len;
 };
 
 /*
diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c
--- a/drivers/infiniband/hw/ipath/ipath_diag.c  Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c  Fri Aug 25 11:19:45 2006 -0700
@@ -41,6 +41,7 @@
  * through the /sys/bus/pci resource mmap interface.
  */
 
+#include 
 #include 
 #include 
 
@@ -273,6 +274,158 @@ bail:
return ret;
 }
 
+static ssize_t ipath_diagpkt_write(struct file *fp,
+  const char __user *data,
+  size_t count, loff_t *off);
+
+static struct file_operations diagpkt_file_ops = {
+   .owner = THIS_MODULE,
+   .write = ipath_diagpkt_write,
+};
+
+static struct cdev *diagpkt_cdev;
+static struct class_device *diagpkt_class_dev;
+
+int __init ipath_diagpkt_add(void)
+{
+   return ipath_cdev_init(IPATH_DIAGPKT_MINOR,
+  "ipath_diagpkt", &diagpkt_file_ops,
+  &diagpkt_cdev, &diagpkt_class_dev);
+}
+
+void __exit ipath_diagpkt_remove(void)
+{
+   ipath_cdev_cleanup(&diagpkt_cdev, &diagpkt_class_dev);
+}
+
+/**
+ * ipath_diagpkt_write - write an IB packet
+ * @fp: the diag data device file pointer
+ * @data: ipath_diag_pkt structure saying where to get the packet
+ * @count: size of data to write
+ * @off: unused by this code
+ */
+static ssize_t ipath_diagpkt_write(struct file *fp,
+  const char __user *data,
+  size_t count, loff_t *off)
+{
+   u32 __iomem *piobuf;
+   u32 plen, clen, pbufn;
+   struct ipath_diag_pkt dp;
+   u32 *tmpbuf = NULL;
+   struct ipath_devdata *dd;
+   ssize_t ret = 0;
+   u64 val;
+
+   if (count < sizeof(dp)) {
+   ret = -EINVAL;
+   goto bail;
+   }
+
+   if (copy_from_user(&dp, data, sizeof(dp))) {
+   ret = -EFAULT;
+   goto bail;
+   }
+
+   /* send count must be an exact number of dwords */
+   if (dp.len & 3) {
+   ret = -EINVAL;
+   goto bail;
+   }
+
+   clen = dp.len >> 2;
+
+   dd = ipath_lookup(dp.unit);
+   if (!dd || !(dd->ipath_flags & IPATH_PRESENT) ||
+   !dd->ipath_kregbase) {
+   ipath_cdbg(VERBOSE, "illegal unit %u for diag data send\n",
+  dp.unit);
+   ret = -ENODEV;
+   goto bail;
+   }
+
+   if (ipath_diag_inuse && !diag_set_link &&
+   !(dd->ipath_flags & IPATH_LINKACTIVE)) {
+   diag_set_link = 1;
+   ipath_cdbg(VERBOSE, "Trying to set to set link active for "
+  "diag pkt\n");
+   ipath_set_linkstate(dd, IPATH_IB_LINKARM);
+   ipath_set_linkstate(dd, IPATH_IB_LINKACTIVE);
+   }
+
+   if (!(dd->ipath_flags & IPATH_INITTED)) {
+   /* no hardware, freeze, etc. */
+   ipath_cdbg(VERBOSE, "unit %u not usable\n", dd->ipath_unit);
+   ret = -ENODEV;
+   goto bail;
+   }
+   val = dd->ipath_lastibcstat & IPATH_IBSTATE_MASK;
+   if (val != IPATH_IBSTATE_INIT && val != IPATH_IBSTATE_ARM &&
+   val != IPATH_IBSTATE_ACTIVE) {
+   ipath_cdbg(VERBOSE, "unit %u not ready (state %llx)\n",
+  dd->ipath_unit, (unsigned long long) val);
+   ret = -EINVAL;
+   goto bail;
+   }
+
+   /* need total length before first word written */
+   /* +1 word is for the qword padding */
+   plen = sizeof(u32) + dp.len;
+
+   if ((plen + 4) > dd->ipath_ibmaxlen) {
+   ipath_dbg("Pkt len 0x%x > ibmaxlen %x\n",
+ plen - 4, dd->ipath_ibmaxlen);
+   ret = -EINVAL;
+   goto bail;  /* before writing pbc */
+   }
+   tmpbuf = vmalloc(plen);
+   if (!tmpbuf) {
+   dev_info(&dd->pcidev->dev, "Unable to allocate tmp buffer, "
+"failing\n");
+   ret = -ENOMEM;
+   goto bail;
+   }
+
+   if (copy_from_user(tmpbuf,
+  (const void __user *) (unsigned long) dp.data,
+  dp.len)) {
+ 
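
The length checks near the top of ipath_diagpkt_write() can be sketched as a
standalone userspace function. This is only an illustration under stated
assumptions: diag_pkt_dwords() is a hypothetical name, not part of the driver,
and the 64-bit intermediate is a choice made for this sketch so that a very
large len cannot wrap around, which the patch itself does not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's length checks.
 * len      - payload length in bytes (dp.len in the patch)
 * ibmaxlen - largest packet the device accepts (dd->ipath_ibmaxlen)
 * Returns the payload length in dwords, or -EINVAL.
 */
int diag_pkt_dwords(uint32_t len, uint32_t ibmaxlen)
{
	uint64_t plen;

	/* send count must be an exact number of dwords */
	if (len & 3)
		return -EINVAL;

	/* one extra word ahead of the payload, as in the patch */
	plen = (uint64_t)sizeof(uint32_t) + len;

	/* same bound as the patch: (plen + 4) > ibmaxlen is rejected */
	if (plen + 4 > ibmaxlen)
		return -EINVAL;

	return (int)(len >> 2);
}
```

With ibmaxlen set to 4096, a 64-byte payload yields 16 dwords, while a 6-byte
payload or a 4092-byte payload is rejected.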

[openib-general] [PATCH 2 of 23] IB/ipath - lock resource limit counters correctly

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri Aug 25 11:19:44 2006 -0700
@@ -776,18 +776,22 @@ static struct ib_pd *ipath_alloc_pd(stru
 * we allow allocations of more than we report for this value.
 */
 
-   if (dev->n_pds_allocated == ib_ipath_max_pds) {
-   ret = ERR_PTR(-ENOMEM);
-   goto bail;
-   }
-
pd = kmalloc(sizeof *pd, GFP_KERNEL);
if (!pd) {
ret = ERR_PTR(-ENOMEM);
goto bail;
}
 
+   spin_lock(&dev->n_pds_lock);
+   if (dev->n_pds_allocated == ib_ipath_max_pds) {
+   spin_unlock(&dev->n_pds_lock);
+   kfree(pd);
+   ret = ERR_PTR(-ENOMEM);
+   goto bail;
+   }
+
dev->n_pds_allocated++;
+   spin_unlock(&dev->n_pds_lock);
 
/* ib_alloc_pd() will initialize pd->ibpd. */
pd->user = udata != NULL;
@@ -803,7 +807,9 @@ static int ipath_dealloc_pd(struct ib_pd
struct ipath_pd *pd = to_ipd(ibpd);
struct ipath_ibdev *dev = to_idev(ibpd->device);
 
+   spin_lock(&dev->n_pds_lock);
dev->n_pds_allocated--;
+   spin_unlock(&dev->n_pds_lock);
 
kfree(pd);
 
@@ -824,11 +830,6 @@ static struct ib_ah *ipath_create_ah(str
struct ib_ah *ret;
struct ipath_ibdev *dev = to_idev(pd->device);
 
-   if (dev->n_ahs_allocated == ib_ipath_max_ahs) {
-   ret = ERR_PTR(-ENOMEM);
-   goto bail;
-   }
-
/* A multicast address requires a GRH (see ch. 8.4.1). */
if (ah_attr->dlid >= IPATH_MULTICAST_LID_BASE &&
ah_attr->dlid != IPATH_PERMISSIVE_LID &&
@@ -854,7 +855,16 @@ static struct ib_ah *ipath_create_ah(str
goto bail;
}
 
+   spin_lock(&dev->n_ahs_lock);
+   if (dev->n_ahs_allocated == ib_ipath_max_ahs) {
+   spin_unlock(&dev->n_ahs_lock);
+   kfree(ah);
+   ret = ERR_PTR(-ENOMEM);
+   goto bail;
+   }
+
dev->n_ahs_allocated++;
+   spin_unlock(&dev->n_ahs_lock);
 
/* ib_create_ah() will initialize ah->ibah. */
ah->attr = *ah_attr;
@@ -876,7 +886,9 @@ static int ipath_destroy_ah(struct ib_ah
struct ipath_ibdev *dev = to_idev(ibah->device);
struct ipath_ah *ah = to_iah(ibah);
 
+   spin_lock(&dev->n_ahs_lock);
dev->n_ahs_allocated--;
+   spin_unlock(&dev->n_ahs_lock);
 
kfree(ah);
 
@@ -963,6 +975,12 @@ static void *ipath_register_ib_device(in
dev = &idev->ibdev;
 
/* Only need to initialize non-zero fields. */
+   spin_lock_init(&idev->n_pds_lock);
+   spin_lock_init(&idev->n_ahs_lock);
+   spin_lock_init(&idev->n_cqs_lock);
+   spin_lock_init(&idev->n_srqs_lock);
+   spin_lock_init(&idev->n_mcast_grps_lock);
+
spin_lock_init(&idev->qp_table.lock);
spin_lock_init(&idev->lk_table.lock);
idev->sm_lid = __constant_be16_to_cpu(IB_LID_PERMISSIVE);
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri Aug 25 11:19:44 2006 -0700
@@ -434,11 +434,18 @@ struct ipath_ibdev {
__be64 sys_image_guid;  /* in network order */
__be64 gid_prefix;  /* in network order */
__be64 mkey;
+
u32 n_pds_allocated;/* number of PDs allocated for device */
+   spinlock_t n_pds_lock;
u32 n_ahs_allocated;/* number of AHs allocated for device */
+   spinlock_t n_ahs_lock;
u32 n_cqs_allocated;/* number of CQs allocated for device */
+   spinlock_t n_cqs_lock;
u32 n_srqs_allocated;   /* number of SRQs allocated for device */
+   spinlock_t n_srqs_lock;
u32 n_mcast_grps_allocated; /* number of mcast groups allocated */
+   spinlock_t n_mcast_grps_lock;
+
u64 ipath_sword;/* total dwords sent (sample result) */
u64 ipath_rword;/* total dwords received (sample result) */
u64 ipath_spkts;/* total packets sent (sample result) */
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri Aug 25 11:19:44 2006 -0700
@@ -207,12 +207,15 @@ static int ipath_mcast_add(struct ipath_
goto bail;
}
 
+   spin_lock(&dev->n_mcast_grps_lock);
if (dev->n_mcast_grps_allocated == ib_ipath_max_mcast_grps) {
+   spin_unlock(&dev->n_mcast_grps_lock);
ret = ENOME
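
The pattern this patch applies, testing a resource limit and bumping the
counter inside a single critical section, can be sketched in userspace C.
This is an illustration, not the driver code: struct limited_counter,
counter_try_get() and counter_put() are hypothetical names, and a C11
atomic_flag spinlock stands in for the kernel's spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one per-device counter and its lock. */
struct limited_counter {
	atomic_flag lock;	/* userspace substitute for spinlock_t */
	unsigned int allocated;	/* plays the role of e.g. n_pds_allocated */
	unsigned int max;	/* plays the role of e.g. ib_ipath_max_pds */
};

void counter_init(struct limited_counter *c, unsigned int max)
{
	atomic_flag_clear(&c->lock);
	c->allocated = 0;
	c->max = max;
}

void counter_lock(struct limited_counter *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

void counter_unlock(struct limited_counter *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Check the limit and take a slot atomically with respect to other callers. */
bool counter_try_get(struct limited_counter *c)
{
	bool ok;

	counter_lock(c);
	ok = c->allocated < c->max;
	if (ok)
		c->allocated++;
	counter_unlock(c);
	return ok;
}

/* Give a slot back; the decrement needs the same lock as the increment. */
void counter_put(struct limited_counter *c)
{
	counter_lock(c);
	assert(c->allocated > 0);
	c->allocated--;
	counter_unlock(c);
}
```

Without the lock, two callers could both pass the limit test before either
increments, which would push the count past the maximum.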

[openib-general] [PATCH 1 of 23] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems

2006-08-25 Thread Bryan O'Sullivan
Ordering of writethrough store buffers needs to be forced, and we need
to use ifdef to get writethrough behavior to InfiniPath buffers, because
there is no generic way to specify that at this time (similar to code
in char/drm/drm_vm.c and block/z2ram.c).
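
As a rough userspace sketch of what "forcing ordering" means here: when full
store buffers may drain in any order, a copy routine has to place a barrier
before the word that must arrive last. Everything below is an assumption made
for illustration: unordered_wc() and flush_wc() are hypothetical stand-ins,
a C11 fence is not a real write-combining flush, and treating the final word
as the one that must land last is a choice of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: nonzero when full store buffers may be written out of order. */
int unordered_wc(void)
{
	return 1;
}

/* Hypothetical stand-in for a write-combining flush (a plain C11 fence). */
void flush_wc(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Copy nwords 32-bit words into buf so that the final word is stored
 * only after all earlier words have been ordered ahead of it.
 */
void copy_with_ordered_tail(volatile uint32_t *buf, const uint32_t *src,
			    size_t nwords)
{
	size_t i;

	assert(nwords > 0);

	for (i = 0; i + 1 < nwords; i++)
		buf[i] = src[i];

	if (unordered_wc())
		flush_wc();	/* order the body before the tail word */

	buf[nwords - 1] = src[nwords - 1];

	if (unordered_wc())
		flush_wc();	/* order the tail before anything that follows */
}
```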

Signed-off-by: John Gregor <[EMAIL PROTECTED]>
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile
--- a/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/Makefile  Fri Aug 25 11:19:44 2006 -0700
@@ -20,6 +20,7 @@ ipath_core-y := \
ipath_user_pages.o
 
 ipath_core-$(CONFIG_X86_64) += ipath_wc_x86_64.o
+ipath_core-$(CONFIG_PPC64) += ipath_wc_ppc64.o
 
 ib_ipath-y := \
ipath_cq.o \
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:44 2006 -0700
@@ -440,7 +440,13 @@ static int __devinit ipath_init_one(stru
}
dd->ipath_pcirev = rev;
 
+#if defined(__powerpc__)
+   /* There isn't a generic way to specify writethrough mappings */
+   dd->ipath_kregbase = __ioremap(addr, len,
+   (_PAGE_NO_CACHE|_PAGE_WRITETHRU));
+#else
dd->ipath_kregbase = ioremap_nocache(addr, len);
+#endif
 
if (!dd->ipath_kregbase) {
ipath_dbg("Unable to map io addr %llx to kvirt, failing\n",
diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri Aug 25 11:19:44 2006 -0700
@@ -985,6 +985,13 @@ static int mmap_piobufs(struct vm_area_s
 * write combining behavior we want on the PIO buffers!
 */
 
+#if defined(__powerpc__)
+   /* There isn't a generic way to specify writethrough mappings */
+   pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
+   pgprot_val(vma->vm_page_prot) |= _PAGE_WRITETHRU;
+   pgprot_val(vma->vm_page_prot) &= ~_PAGE_GUARDED;
+#endif
+
if (vma->vm_flags & VM_READ) {
dev_info(&dd->pcidev->dev,
 "Can't map piobufs as readable (flags=%lx)\n",
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
new file mode 100644
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c Fri Aug 25 11:19:44 2006 -0700
@@ -0,0 +1,52 @@
+/*
+ * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file is conditionally built on PowerPC only.  Otherwise weak symbol
+ * versions of the functions exported from here are used.
+ */
+
+#include "ipath_kernel.h"
+
+/**
+ * ipath_unordered_wc - indicate whether write combining is ordered
+ *
+ * PowerPC systems (at least those in the 970 processor family)
+ * write partially filled store buffers in address order, but will write
+ * completely filled store buffers in "random" order, and therefore must
+ * have serialization for correctness with current InfiniPath chips.
+ *
+ */
+int ipath_unordered_wc(void)
+{
+   return 1;
+}

[openib-general] [PATCH 5 of 23] IB/ipath - drop requirement that PIO buffers be mmaped write-only

2006-08-25 Thread Bryan O'Sullivan
Some userlands try to mmap these pages read-write, so accommodate them.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri Aug 25 11:19:44 2006 -0700
@@ -992,15 +992,10 @@ static int mmap_piobufs(struct vm_area_s
pgprot_val(vma->vm_page_prot) &= ~_PAGE_GUARDED;
 #endif
 
-   if (vma->vm_flags & VM_READ) {
-   dev_info(&dd->pcidev->dev,
-"Can't map piobufs as readable (flags=%lx)\n",
-vma->vm_flags);
-   ret = -EPERM;
-   goto bail;
-   }
-
-   /* don't allow them to later change to readable with mprotect */
+   /*
+* don't allow them to later change to readable with mprotect (for when
+* not initially mapped readable, as is normally the case)
+*/
vma->vm_flags &= ~VM_MAYREAD;
vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
 


[openib-general] [PATCH 10 of 23] IB/ipath - trivial cleanups

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri Aug 25 11:19:45 2006 -0700
@@ -528,7 +528,6 @@ void ipath_cdev_cleanup(struct cdev **cd
 
 int ipath_diag_add(struct ipath_devdata *);
 void ipath_diag_remove(struct ipath_devdata *);
-void ipath_diag_bringup_link(struct ipath_devdata *);
 
 extern wait_queue_head_t ipath_state_wait;
 


[openib-general] [PATCH 4 of 23] IB/ipath - fix handling of kpiobufs

2006-08-25 Thread Bryan O'Sullivan
Change the comment so that it no longer implies that the user can set
ipath_kpiobufs to zero.  Also store the parameter value in ipath_kpiobufs
itself.  Previously the setter only altered the per-device
ipath_lastport_piobuf, which was overwritten during chip init.
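
A reduced userspace model may make the problem easier to see. It is a sketch
under assumptions, not the driver code: struct dev, init_chip() and
set_kpiobufs() are hypothetical miniatures, the single piobcnt field stands in
for ipath_piobcnt2k + ipath_piobcnt4k, the fallback of 16 buffers is a
placeholder (the diff shows only the 32 case), and rejecting zero in the
setter is assumed from the comment change.

```c
#include <assert.h>

/* Module-wide setting; 0 means "not set by user". */
unsigned int kpiobufs_param;

/* Hypothetical miniature of the per-device state involved. */
struct dev {
	unsigned int piobcnt;		/* total PIO buffers on the chip */
	unsigned int lastport_piobuf;	/* first buffer kept for the kernel */
};

/* Chip init recomputes the per-device split from the module-wide value. */
void init_chip(struct dev *dd)
{
	unsigned int k = kpiobufs_param;

	if (k == 0)
		k = dd->piobcnt > 128 ? 32 : 16;	/* 16 is a placeholder */

	assert(k < dd->piobcnt);
	dd->lastport_piobuf = dd->piobcnt - k;
}

/* Parameter setter: update every device and remember the value. */
int set_kpiobufs(struct dev *devs, int ndevs, unsigned int val)
{
	int i;

	if (val == 0)
		return -1;	/* assumed: zero is not a valid setting */

	for (i = 0; i < ndevs; i++)
		if (val >= devs[i].piobcnt)
			return -1;

	for (i = 0; i < ndevs; i++)
		devs[i].lastport_piobuf = devs[i].piobcnt - val;

	kpiobufs_param = val;	/* the step this patch adds */
	return 0;
}
```

If the final assignment is left out, a later init_chip() call falls back to
the default and silently discards the value the user asked for.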

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri Aug 25 11:19:44 2006 -0700
@@ -691,7 +691,7 @@ int ipath_init_chip(struct ipath_devdata
dd->ipath_pioavregs = ALIGN(val, sizeof(u64) * BITS_PER_BYTE / 2)
/ (sizeof(u64) * BITS_PER_BYTE / 2);
if (ipath_kpiobufs == 0) {
-   /* not set by user, or set explictly to default  */
+   /* not set by user (this is default) */
if ((dd->ipath_piobcnt2k + dd->ipath_piobcnt4k) > 128)
kpiobufs = 32;
else
@@ -950,6 +950,7 @@ static int ipath_set_kpiobufs(const char
dd->ipath_piobcnt2k + dd->ipath_piobcnt4k - val;
}
 
+   ipath_kpiobufs = val;
ret = 0;
 bail:
spin_unlock_irqrestore(&ipath_devs_lock, flags);


[openib-general] [PATCH 3 of 23] IB/ipath - fix for crash on module unload, if cfgports < portcnt

2006-08-25 Thread Bryan O'Sullivan
Allocate enough pointers for all possible ports, to avoid problems in
cleanup/unload.
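
The sizing rule can be sketched in userspace C: if cleanup walks every
possible port, the pointer array must have one slot per possible port, with
unused slots left NULL. The names below (struct devdata, dev_alloc_ports(),
dev_free_ports()) are hypothetical and calloc()/free() stand in for
kzalloc()/kfree(); this is an illustration, not the driver's cleanup path.

```c
#include <assert.h>
#include <stdlib.h>

struct portdata {
	int id;
};

/* Hypothetical miniature of the device structure. */
struct devdata {
	unsigned int portcnt;	/* ports the chip supports */
	unsigned int cfgports;	/* ports actually configured, <= portcnt */
	struct portdata **pd;	/* one slot per possible port */
};

int dev_alloc_ports(struct devdata *dd)
{
	assert(dd->cfgports <= dd->portcnt);

	/* Size by portcnt, not cfgports; calloc leaves unused slots NULL. */
	dd->pd = calloc(dd->portcnt, sizeof(*dd->pd));
	return dd->pd ? 0 : -1;
}

void dev_free_ports(struct devdata *dd)
{
	unsigned int i;

	if (!dd->pd)
		return;

	/* Cleanup iterates across all possible ports. */
	for (i = 0; i < dd->portcnt; i++) {
		free(dd->pd[i]);	/* free(NULL) is a no-op */
		dd->pd[i] = NULL;
	}
	free(dd->pd);
	dd->pd = NULL;
}
```

Had the array been sized by cfgports, the loop in dev_free_ports() would read
past its end whenever cfgports is smaller than portcnt.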

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri Aug 25 11:19:44 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri Aug 25 11:19:44 2006 -0700
@@ -240,7 +240,11 @@ static int init_chip_first(struct ipath_
  "only supports %u\n", ipath_cfgports,
  dd->ipath_portcnt);
}
-   dd->ipath_pd = kzalloc(sizeof(*dd->ipath_pd) * dd->ipath_cfgports,
+   /*
+* Allocate full portcnt array, rather than just cfgports, because
+* cleanup iterates across all possible ports.
+*/
+   dd->ipath_pd = kzalloc(sizeof(*dd->ipath_pd) * dd->ipath_portcnt,
   GFP_KERNEL);
 
if (!dd->ipath_pd) {


[openib-general] [PATCH 12 of 23] IB/ipath - do not allow use of CQ entries with invalid counts

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
--- a/drivers/infiniband/hw/ipath/ipath_cq.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c Fri Aug 25 11:19:45 2006 -0700
@@ -172,7 +172,7 @@ struct ib_cq *ipath_create_cq(struct ib_
struct ipath_cq_wc *wc;
struct ib_cq *ret;
 
-   if (entries > ib_ipath_max_cqes) {
+   if (entries < 1 || entries > ib_ipath_max_cqes) {
ret = ERR_PTR(-EINVAL);
goto done;
}
@@ -324,6 +324,11 @@ int ipath_resize_cq(struct ib_cq *ibcq, 
u32 head, tail, n;
int ret;
 
+   if (cqe < 1 || cqe > ib_ipath_max_cqes) {
+   ret = -EINVAL;
+   goto bail;
+   }
+
/*
 * Need to use vmalloc() if we want to support large #s of entries.
 */
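
The range check added to both ipath_create_cq() and ipath_resize_cq() can be
written as one small helper. This is a sketch only: check_cq_entries() is a
hypothetical name, and the limit is passed in as a parameter rather than read
from ib_ipath_max_cqes.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: accept a CQ size only if it lies in [1, max_cqes].
 * Returns 0 when the count is usable, -EINVAL otherwise.
 */
int check_cq_entries(int entries, unsigned int max_cqes)
{
	/* Test the lower bound first so the cast below is safe. */
	if (entries < 1 || (unsigned int)entries > max_cqes)
		return -EINVAL;
	return 0;
}
```

An upper-bound test alone would let a request for zero entries through, which
is the case the added lower bound rejects.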


[openib-general] [PATCH 13 of 23] IB/ipath - account for attached QPs correctly

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri Aug 25 11:19:45 2006 -0700
@@ -217,6 +217,8 @@ static int ipath_mcast_add(struct ipath_
dev->n_mcast_grps_allocated++;
spin_unlock(&dev->n_mcast_grps_lock);
 
+   mcast->n_attached++;
+
list_add_tail_rcu(&mqp->list, &mcast->qp_list);
 
atomic_inc(&mcast->refcount);


[openib-general] [PATCH 23 of 23] IB/ipath - control receive polarity inversion

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:46 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri Aug 25 11:19:46 2006 -0700
@@ -2156,5 +2156,22 @@ bail:
return ret;
 }
 
+int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv)
+{
+   u64 val;
+   if ( new_pol_inv > INFINIPATH_XGXS_RX_POL_MASK ) {
+   return -1;
+   }
+   if ( dd->ipath_rx_pol_inv != new_pol_inv ) {
+   dd->ipath_rx_pol_inv = new_pol_inv;
+   val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig);
+   val &= ~(INFINIPATH_XGXS_RX_POL_MASK <<
+ INFINIPATH_XGXS_RX_POL_SHIFT);
+val |= ((u64)dd->ipath_rx_pol_inv) <<
+INFINIPATH_XGXS_RX_POL_SHIFT;
+   ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val);
+   }
+   return 0;
+}
 module_init(infinipath_init);
 module_exit(infinipath_cleanup);
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c Fri Aug 25 11:19:46 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c Fri Aug 25 11:19:46 2006 -0700
@@ -1290,6 +1290,15 @@ static int ipath_ht_bringup_serdes(struc
val &= ~INFINIPATH_XGXS_RESET;
change = 1;
}
+   if (((val >> INFINIPATH_XGXS_RX_POL_SHIFT) &
+INFINIPATH_XGXS_RX_POL_MASK) != dd->ipath_rx_pol_inv ) {
+   /* need to compensate for Tx inversion in partner */
+   val &= ~(INFINIPATH_XGXS_RX_POL_MASK <<
+INFINIPATH_XGXS_RX_POL_SHIFT);
+   val |= dd->ipath_rx_pol_inv <<
+   INFINIPATH_XGXS_RX_POL_SHIFT;
+   change = 1;
+   }
if (change)
ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c Fri Aug 25 11:19:46 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c Fri Aug 25 11:19:46 2006 -0700
@@ -654,6 +654,15 @@ static int ipath_pe_bringup_serdes(struc
val &= ~INFINIPATH_XGXS_RESET;
change = 1;
}
+   if (((val >> INFINIPATH_XGXS_RX_POL_SHIFT) &
+INFINIPATH_XGXS_RX_POL_MASK) != dd->ipath_rx_pol_inv ) {
+   /* need to compensate for Tx inversion in partner */
+   val &= ~(INFINIPATH_XGXS_RX_POL_MASK <<
+INFINIPATH_XGXS_RX_POL_SHIFT);
+   val |= dd->ipath_rx_pol_inv <<
+   INFINIPATH_XGXS_RX_POL_SHIFT;
+   change = 1;
+   }
if (change)
ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri Aug 25 11:19:46 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri Aug 25 11:19:46 2006 -0700
@@ -503,6 +503,8 @@ struct ipath_devdata {
u8 ipath_pci_cacheline;
/* LID mask control */
u8 ipath_lmc;
+   /* Rx Polarity inversion (compensate for ~tx on partner) */
+   u8 ipath_rx_pol_inv;
 
/* local link integrity counter */
u32 ipath_lli_counter;
@@ -570,6 +572,7 @@ int ipath_set_linkstate(struct ipath_dev
 int ipath_set_linkstate(struct ipath_devdata *, u8);
 int ipath_set_mtu(struct ipath_devdata *, u16);
 int ipath_set_lid(struct ipath_devdata *, u32, u8);
+int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv);
 
 /* for use in system calls, where we want to know device type, etc. */
 #define port_fp(fp) ((struct ipath_portdata *) (fp)->private_data)
diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h
--- a/drivers/infiniband/hw/ipath/ipath_registers.h Fri Aug 25 11:19:46 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_registers.h Fri Aug 25 11:19:46 2006 -0700
@@ -282,6 +282,8 @@
 #define INFINIPATH_XGXS_RESET  0x7ULL
 #define INFINIPATH_XGXS_MDIOADDR_MASK  0xfULL
 #define INFINIPATH_XGXS_MDIOADDR_SHIFT 4
+#define INFINIPATH_XGXS_RX_POL_SHIFT 19
+#define INFINIPATH_XGXS_RX_POL_MASK 0xfULL
 
 #define INFINIPATH_RT_ADDR_MASK 0xFFULL/* 40 bits valid */
 
diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri Aug 25 11:19:46 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri Aug 25 11:19:46 2006 -0700
@@ -561,6 +561,33 @@ bail:
return ret;
 

[openib-general] [PATCH 18 of 23] IB/ipath - put a limit on the number of QPs that can be created

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
@@ -833,9 +833,21 @@ struct ib_qp *ipath_create_qp(struct ib_
}
}
 
+   spin_lock(&dev->n_qps_lock);
+   if (dev->n_qps_allocated == ib_ipath_max_qps) {
+   spin_unlock(&dev->n_qps_lock);
+   ret = ERR_PTR(-ENOMEM);
+   goto bail_ip;
+   }
+
+   dev->n_qps_allocated++;
+   spin_unlock(&dev->n_qps_lock);
+
ret = &qp->ibqp;
goto bail;
 
+bail_ip:
+   kfree(qp->ip);
 bail_rwq:
vfree(qp->r_rq.wq);
 bail_qp:
@@ -864,6 +876,9 @@ int ipath_destroy_qp(struct ib_qp *ibqp)
spin_lock_irqsave(&qp->s_lock, flags);
qp->state = IB_QPS_ERR;
spin_unlock_irqrestore(&qp->s_lock, flags);
+   spin_lock(&dev->n_qps_lock);
+   dev->n_qps_allocated--;
+   spin_unlock(&dev->n_qps_lock);
 
/* Stop the sending tasklet. */
tasklet_kill(&qp->s_task);
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri Aug 25 11:19:45 2006 -0700
@@ -72,6 +72,10 @@ module_param_named(max_qp_wrs, ib_ipath_
 module_param_named(max_qp_wrs, ib_ipath_max_qp_wrs, uint,
   S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support");
+
+unsigned int ib_ipath_max_qps = 16384;
+module_param_named(max_qps, ib_ipath_max_qps, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_qps, "Maximum number of QPs to support");
 
 unsigned int ib_ipath_max_sges = 0x60;
 module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO);
@@ -958,7 +962,7 @@ static int ipath_query_device(struct ib_
props->sys_image_guid = dev->sys_image_guid;
 
props->max_mr_size = ~0ull;
-   props->max_qp = dev->qp_table.max;
+   props->max_qp = ib_ipath_max_qps;
props->max_qp_wr = ib_ipath_max_qp_wrs;
props->max_sge = ib_ipath_max_sges;
props->max_cq = ib_ipath_max_cqs;
@@ -1420,6 +1424,7 @@ int ipath_register_ib_device(struct ipat
spin_lock_init(&idev->n_pds_lock);
spin_lock_init(&idev->n_ahs_lock);
spin_lock_init(&idev->n_cqs_lock);
+   spin_lock_init(&idev->n_qps_lock);
spin_lock_init(&idev->n_srqs_lock);
spin_lock_init(&idev->n_mcast_grps_lock);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri Aug 25 11:19:45 2006 -0700
@@ -482,6 +482,8 @@ struct ipath_ibdev {
spinlock_t n_ahs_lock;
u32 n_cqs_allocated;/* number of CQs allocated for device */
spinlock_t n_cqs_lock;
+   u32 n_qps_allocated;/* number of QPs allocated for device */
+   spinlock_t n_qps_lock;
u32 n_srqs_allocated;   /* number of SRQs allocated for device */
spinlock_t n_srqs_lock;
u32 n_mcast_grps_allocated; /* number of mcast groups allocated */
@@ -792,6 +794,8 @@ extern unsigned int ib_ipath_max_cqs;
 
 extern unsigned int ib_ipath_max_qp_wrs;
 
+extern unsigned int ib_ipath_max_qps;
+
 extern unsigned int ib_ipath_max_sges;
 
 extern unsigned int ib_ipath_max_mcast_grps;
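
The accounting pattern in this patch (check the limit and bump the counter under one lock, decrement on destroy) can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions: `struct qp_counter`, `qp_counter_get()` and `qp_counter_put()` are hypothetical names, and a C11 `atomic_flag` spin loop stands in for the kernel spinlock. The sketch compares with `>=` rather than the `==` used in the patch, so that lowering the limit at runtime would still block new allocations; that is a design choice of the sketch, not something the patch does.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the per-device QP accounting. */
struct qp_counter {
    atomic_flag lock;        /* stands in for spinlock_t n_qps_lock */
    unsigned int allocated;  /* stands in for n_qps_allocated */
    unsigned int max;        /* stands in for ib_ipath_max_qps */
};

static void counter_lock(struct qp_counter *c)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;
}

static void counter_unlock(struct qp_counter *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Reserve one slot; returns 0 on success or -ENOMEM when the limit is hit. */
static int qp_counter_get(struct qp_counter *c)
{
    int ret = 0;

    counter_lock(c);
    if (c->allocated >= c->max)
        ret = -ENOMEM;
    else
        c->allocated++;
    counter_unlock(c);
    return ret;
}

/* Release one slot; never underflows. */
static void qp_counter_put(struct qp_counter *c)
{
    counter_lock(c);
    if (c->allocated > 0)
        c->allocated--;
    counter_unlock(c);
}
```

Doing the check and the increment inside the same critical section is what keeps two concurrent creators from both passing the limit test.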




[openib-general] [PATCH 22 of 23] IB/ipath - print warning if LID not acquired within one minute

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c   Fri Aug 25 11:19:46 2006 -0700
@@ -114,6 +114,13 @@ static int __devinit ipath_init_one(stru
 #define PCI_DEVICE_ID_INFINIPATH_HT 0xd
 #define PCI_DEVICE_ID_INFINIPATH_PE800 0x10
 
+/*
+ * Number of seconds before we complain about not getting a LID
+ * assignment.
+ */
+
+#define LID_TIMEOUT 60
+
 static const struct pci_device_id ipath_pci_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_HT) },
{ PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_PE800) },
@@ -129,6 +136,29 @@ static struct pci_driver ipath_driver = 
.id_table = ipath_pci_tbl,
 };
 
+
+static void check_link_status(void *data)
+{
+   struct ipath_devdata *dd = data;
+
+   /*
+* If we're in the NOCABLE state, try again in another minute.
+*/
+
+   if (dd->ipath_flags & IPATH_STATUS_IB_NOCABLE) {
+   schedule_delayed_work(&dd->link_task, HZ * LID_TIMEOUT);
+   return;
+   }
+
+   /*
+* If we don't have a LID, let the user know and don't bother
+* checking again.
+*/
+
+   if (dd->ipath_lid == 0)
+   dev_info(&dd->pcidev->dev,
+"We don't have a LID yet (no subnet manager?)\n");
+}
 
 static inline void read_bars(struct ipath_devdata *dd, struct pci_dev *dev,
 u32 *bar0, u32 *bar1)
@@ -196,6 +226,8 @@ static struct ipath_devdata *ipath_alloc
 
dd->pcidev = pdev;
pci_set_drvdata(pdev, dd);
+
+   INIT_WORK(&dd->link_task, check_link_status, dd);
 
list_add(&dd->ipath_list, &ipath_dev_list);
 
@@ -509,6 +541,9 @@ static int __devinit ipath_init_one(stru
ipath_diag_add(dd);
ipath_register_ib_device(dd);
 
+   /* Check that we have a LID in LID_TIMEOUT seconds. */
+   schedule_delayed_work(&dd->link_task, HZ * LID_TIMEOUT);
+
goto bail;
 
 bail_iounmap:
@@ -536,6 +571,9 @@ static void __devexit ipath_remove_one(s
return;
 
dd = pci_get_drvdata(pdev);
+
+   cancel_delayed_work(&dd->link_task);
+
ipath_unregister_ib_device(dd->verbs_dev);
ipath_diag_remove(dd);
ipath_user_remove(dd);
@@ -1644,6 +1682,8 @@ int ipath_set_lid(struct ipath_devdata *
dd->ipath_lid = arg;
dd->ipath_lmc = lmc;
 
+   dev_info(&dd->pcidev->dev, "We got a lid: %u\n", arg);
+
return 0;
 }
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h   Fri Aug 25 11:19:46 2006 -0700
@@ -508,6 +508,9 @@ struct ipath_devdata {
u32 ipath_lli_counter;
/* local link integrity errors */
u32 ipath_lli_errors;
+
+   /* Link status check work */
+   struct work_struct link_task;
 };
 
 extern struct list_head ipath_dev_list;




[openib-general] [PATCH 0 of 23] IB/ipath - updates for 2.6.19

2006-08-25 Thread Bryan O'Sullivan
Hi, Roland -

This is a series of patches to bring the ipath driver up to date for 2.6.19.
The patches apply on top of Ralph's mmap patch that you accepted yesterday.

Please apply.

Thanks,




[openib-general] [PATCH 19 of 23] IB/ipath - handle sq_sig_all field correctly

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
@@ -606,9 +606,10 @@ int ipath_query_qp(struct ib_qp *ibqp, s
init_attr->recv_cq = qp->ibqp.recv_cq;
init_attr->srq = qp->ibqp.srq;
init_attr->cap = attr->cap;
-   init_attr->sq_sig_type =
-   (qp->s_flags & (1 << IPATH_S_SIGNAL_REQ_WR))
-   ? IB_SIGNAL_REQ_WR : 0;
+   if (qp->s_flags & (1 << IPATH_S_SIGNAL_REQ_WR))
+   init_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
+   else
+   init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
init_attr->qp_type = qp->ibqp.qp_type;
init_attr->port_num = 1;
return 0;
@@ -776,8 +777,10 @@ struct ib_qp *ipath_create_qp(struct ib_
qp->s_wq = swq;
qp->s_size = init_attr->cap.max_send_wr + 1;
qp->s_max_sge = init_attr->cap.max_send_sge;
-   qp->s_flags = init_attr->sq_sig_type == IB_SIGNAL_REQ_WR ?
-   1 << IPATH_S_SIGNAL_REQ_WR : 0;
+   if (init_attr->sq_sig_type == IB_SIGNAL_REQ_WR)
+   qp->s_flags = 1 << IPATH_S_SIGNAL_REQ_WR;
+   else
+   qp->s_flags = 0;
dev = to_idev(ibpd->device);
err = ipath_alloc_qpn(&dev->qp_table, qp,
  init_attr->qp_type);
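
The two hunks above are two halves of one mapping: create stores the signalling type as a flag bit, and query turns the bit back into an enum value. A small stand-alone sketch of that round trip, assuming local stand-in names (`enum sig_type`, `S_SIGNAL_REQ_WR_BIT`) and a hypothetical bit number, since the driver's real `IPATH_S_SIGNAL_REQ_WR` value is not shown in the patch:

```c
/* Local stand-ins for the verbs signalling types (names are hypothetical). */
enum sig_type { SIGNAL_ALL_WR, SIGNAL_REQ_WR };

/* Hypothetical bit number; not taken from the driver. */
#define S_SIGNAL_REQ_WR_BIT 0

/* Create path: translate the requested signalling type into a flag word. */
static unsigned long sig_type_to_flags(enum sig_type t)
{
    if (t == SIGNAL_REQ_WR)
        return 1UL << S_SIGNAL_REQ_WR_BIT;
    return 0;
}

/* Query path: translate the stored flag word back into a named enum value. */
static enum sig_type flags_to_sig_type(unsigned long flags)
{
    if (flags & (1UL << S_SIGNAL_REQ_WR_BIT))
        return SIGNAL_REQ_WR;
    return SIGNAL_ALL_WR;
}
```

Returning a named enumerator on both branches, as the patch does, avoids relying on the numeric value of the "signal all" type.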




[openib-general] [PATCH 21 of 23] IB/ipath - fix return value from ipath_poll

2006-08-25 Thread Bryan O'Sullivan
This stops the generic poll code from waiting for a timeout.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c  Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c  Fri Aug 25 11:19:45 2006 -0700
@@ -1150,6 +1150,7 @@ static unsigned int ipath_poll(struct fi
struct ipath_portdata *pd;
u32 head, tail;
int bit;
+   unsigned pollflag = 0;
struct ipath_devdata *dd;
 
pd = port_fp(fp);
@@ -1186,9 +1187,12 @@ static unsigned int ipath_poll(struct fi
clear_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag);
pd->port_rcvwait_to++;
}
+   else
+   pollflag = POLLIN | POLLRDNORM;
}
else {
/* it's already happened; don't do wait_event overhead */
+   pollflag = POLLIN | POLLRDNORM;
pd->port_rcvnowait++;
}
 
@@ -1196,7 +1200,7 @@ static unsigned int ipath_poll(struct fi
ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
 dd->ipath_rcvctrl);
 
-   return 0;
+   return pollflag;
 }
 
 static int try_alloc_port(struct ipath_devdata *dd, int port,
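
For readers unfamiliar with why the return value matters: a poll handler reports readiness through the event mask it returns, and a mask of 0 tells the caller that nothing is ready, so the caller keeps waiting until its timeout. A simplified user-space sketch of the idea, assuming a hypothetical `struct rcv_queue` with head and tail indices (the real handler also deals with waiting and timeouts, which are left out here):

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* make sure POLLRDNORM is visible */
#endif
#include <poll.h>

/* Hypothetical stand-in for the port's receive header queue indices. */
struct rcv_queue {
    unsigned int head;
    unsigned int tail;
};

/*
 * Return the poll event mask for the queue: readable when the queue is
 * non-empty, otherwise 0 so the caller goes on waiting.
 */
static unsigned int rcv_poll_mask(const struct rcv_queue *q)
{
    unsigned int mask = 0;

    if (q->head != q->tail)
        mask |= POLLIN | POLLRDNORM;
    return mask;
}
```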




[openib-general] [PATCH 16 of 23] IB/ipath - be more strict about testing the modify QP verb

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
@@ -455,11 +455,16 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
attr_mask))
goto inval;
 
-   if (attr_mask & IB_QP_AV)
+   if (attr_mask & IB_QP_AV) {
if (attr->ah_attr.dlid == 0 ||
attr->ah_attr.dlid >= IPATH_MULTICAST_LID_BASE)
goto inval;
 
+   if ((attr->ah_attr.ah_flags & IB_AH_GRH) &&
+   (attr->ah_attr.grh.sgid_index > 1))
+   goto inval;
+   }
+
if (attr_mask & IB_QP_PKEY_INDEX)
if (attr->pkey_index >= ipath_get_npkeys(dev->dd))
goto inval;
@@ -468,6 +473,27 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
if (attr->min_rnr_timer > 31)
goto inval;
 
+   if (attr_mask & IB_QP_PORT)
+   if (attr->port_num == 0 ||
+   attr->port_num > ibqp->device->phys_port_cnt)
+   goto inval;
+
+   if (attr_mask & IB_QP_PATH_MTU)
+   if (attr->path_mtu > IB_MTU_4096)
+   goto inval;
+
+   if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
+   if (attr->max_dest_rd_atomic > 1)
+   goto inval;
+ 
+   if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC)
+   if (attr->max_rd_atomic > 1)
+   goto inval;
+ 
+   if (attr_mask & IB_QP_PATH_MIG_STATE)
+   if (attr->path_mig_state != IB_MIG_MIGRATED)
+   goto inval;
+ 
switch (new_state) {
case IB_QPS_RESET:
ipath_reset_qp(qp);
@@ -517,6 +543,9 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
 
if (attr_mask & IB_QP_MIN_RNR_TIMER)
qp->r_min_rnr_timer = attr->min_rnr_timer;
+
+   if (attr_mask & IB_QP_TIMEOUT)
+   qp->timeout = attr->timeout;
 
if (attr_mask & IB_QP_QKEY)
qp->qkey = attr->qkey;
@@ -564,7 +593,7 @@ int ipath_query_qp(struct ib_qp *ibqp, s
attr->max_dest_rd_atomic = 1;
attr->min_rnr_timer = qp->r_min_rnr_timer;
attr->port_num = 1;
-   attr->timeout = 0;
+   attr->timeout = qp->timeout;
attr->retry_cnt = qp->s_retry_cnt;
attr->rnr_retry = qp->s_rnr_retry;
attr->alt_port_num = 0;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri Aug 25 11:19:45 2006 -0700
@@ -371,6 +371,7 @@ struct ipath_qp {
u8 s_retry; /* requester retry counter */
u8 s_rnr_retry; /* requester RNR retry counter */
u8 s_pkey_index;/* PKEY index to use */
+   u8 timeout; /* Timeout for this QP */
enum ib_mtu path_mtu;
u32 remote_qpn;
u32 qkey;   /* QKEY for this QP (for UD or RD) */
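
The checks added above all follow one shape: an attribute is validated only if its bit is set in the mask, and any out-of-range value rejects the whole request before state is changed. A reduced sketch of that shape, with hypothetical mask bits and a hypothetical `struct qp_attr_sketch` (the MTU enumerators mirror the usual 256 to 4096 byte encoding, but every name here is local to the example):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mask bits; the real IB_QP_* values are not reproduced here. */
enum {
    ATTR_PORT          = 1 << 0,
    ATTR_PATH_MTU      = 1 << 1,
    ATTR_MAX_RD_ATOMIC = 1 << 2,
};

/* Local stand-in for the path MTU encoding (256 bytes up to 4096 bytes). */
enum { MTU_256 = 1, MTU_512, MTU_1024, MTU_2048, MTU_4096 };

/* Hypothetical subset of the QP attributes. */
struct qp_attr_sketch {
    uint8_t port_num;
    int path_mtu;
    uint8_t max_rd_atomic;
};

/* Validate only the attributes selected by mask; 0 if all pass, else -EINVAL. */
static int validate_qp_attr(const struct qp_attr_sketch *a, int mask,
                            uint8_t phys_port_cnt)
{
    if ((mask & ATTR_PORT) &&
        (a->port_num == 0 || a->port_num > phys_port_cnt))
        return -EINVAL;

    if ((mask & ATTR_PATH_MTU) && a->path_mtu > MTU_4096)
        return -EINVAL;

    if ((mask & ATTR_MAX_RD_ATOMIC) && a->max_rd_atomic > 1)
        return -EINVAL;

    return 0;
}
```

Fields whose mask bit is clear are ignored, so a caller can leave them uninitialised without tripping the checks.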




[openib-general] [PATCH 20 of 23] IB/ipath - allow SMA to be disabled

2006-08-25 Thread Bryan O'Sullivan
This is useful for testing purposes.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri Aug 25 11:19:45 2006 -0700
@@ -107,6 +107,10 @@ module_param_named(max_srq_wrs, ib_ipath
   uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(max_srq_wrs, "Maximum number of SRQ WRs support");
 
+static unsigned int ib_ipath_disable_sma;
+module_param_named(disable_sma, ib_ipath_disable_sma, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(disable_sma, "Disable the SMA");
+
 const int ib_ipath_state_ops[IB_QPS_ERR + 1] = {
[IB_QPS_RESET] = 0,
[IB_QPS_INIT] = IPATH_POST_RECV_OK,
@@ -354,6 +358,9 @@ static void ipath_qp_rcv(struct ipath_ib
switch (qp->ibqp.qp_type) {
case IB_QPT_SMI:
case IB_QPT_GSI:
+   if (ib_ipath_disable_sma)
+   break;
+   /* FALLTHROUGH */
case IB_QPT_UD:
ipath_ud_rcv(dev, hdr, has_grh, data, tlen, qp);
break;




[openib-general] [PATCH 15 of 23] IB/ipath - add serial number to hardware freeze error message

2006-08-25 Thread Bryan O'Sullivan
Also added the word "Hardware" after "Fatal" to make it more obvious
that it's hardware, not software.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c   Fri Aug 25 11:19:45 2006 -0700
@@ -461,8 +461,9 @@ static void ipath_ht_handle_hwerrors(str
 * times.
 */
if (dd->ipath_flags & IPATH_INITTED) {
-   ipath_dev_err(dd, "Fatal Error (freeze "
- "mode), no longer usable\n");
+   ipath_dev_err(dd, "Fatal Hardware Error (freeze "
+ "mode), no longer usable, SN %.16s\n",
+ dd->ipath_serial);
isfatal = 1;
}
*dd->ipath_statusp &= ~IPATH_STATUS_IB_READY;
diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c   Fri Aug 25 11:19:45 2006 -0700
@@ -363,8 +363,9 @@ static void ipath_pe_handle_hwerrors(str
 * and we get here multiple times
 */
if (dd->ipath_flags & IPATH_INITTED) {
-   ipath_dev_err(dd, "Fatal Error (freeze "
- "mode), no longer usable\n");
+   ipath_dev_err(dd, "Fatal Hardware Error (freeze "
+ "mode), no longer usable, SN %.16s\n",
+ dd->ipath_serial);
isfatal = 1;
}
/*




[openib-general] [PATCH 17 of 23] IB/ipath - validate path_mig_state properly

2006-08-25 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c   Fri Aug 25 11:19:45 2006 -0700
@@ -491,7 +491,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
goto inval;
  
if (attr_mask & IB_QP_PATH_MIG_STATE)
-   if (attr->path_mig_state != IB_MIG_MIGRATED)
+   if (attr->path_mig_state != IB_MIG_MIGRATED &&
+   attr->path_mig_state != IB_MIG_REARM)
goto inval;
  
switch (new_state) {




Re: [openib-general] [PATCH] osm: handle local events

2006-08-25 Thread Sasha Khapyorsky
On 07:58 Fri 25 Aug , Greg Lindahl wrote:
> On Fri, Aug 25, 2006 at 05:17:04PM +0300, Sasha Khapyorsky wrote:
> 
> > So a more generic question: some application performs a blocking read() from
> > /dev/umadN; should this read() be interrupted and return an error (with
> > an appropriate errno value) when the port state becomes DOWN?
> 
> If the SM gets a signal (alarm timeout) and the read() is interrupted
> with errno=EINTR... presumably this is not the case you had in mind.

Right, not this case; I'm not asking about signals. By "interrupted" I meant
that read() returns an error.

Sasha

> 
> -- greg
> 
> 




Re: [openib-general] basic IB doubt

2006-08-25 Thread Tom Tucker
On Fri, 2006-08-25 at 12:51 -0400, Talpey, Thomas wrote: 
> At 12:40 PM 8/25/2006, Sean Hefty wrote:
> >>Thomas> How does an adapter guarantee that no bridges or other
> >>Thomas> intervening devices reorder their writes, or for that
> >>Thomas> matter flush them to memory at all!?
> >>
> >>That's a good point.  The HCA would have to do a read to flush the
> >>posted writes, and I'm sure it's not doing that (since it would add
> >>horrible latency for no good reason).
> >>
> >>I guess it's not safe to rely on ordering of RDMA writes after all.
> >
> >Couldn't the same point then be made that a CQ entry may come before the data
> >has been posted?
> 
> When the CQ entry arrives, the context that polls it off the queue
> must use the dma_sync_*() api to finalize any associated data
> transactions (known by the upper layer).
> 
> This is basic, and it's the reason that a completion is so important.
> The completion, in and of itself, isn't what drives the synchronization.
> It's the transfer of control to the processor.

This is a giant rat hole. 

On a coherent cache architecture, the CQE write posted to the bus
following the write of the last byte of data will NOT be seen by the
processor prior to the last byte of data. That is, write ordering is
preserved in bridges.

The dma_sync_* API has to do with processor cache, not transaction
ordering. In fact, per this argument at the time you called dma_sync_*,
the processor may not have seen the reordered transaction yet, so what
would it be syncing?

Write ordering and read ordering/fence is preserved in intervening
bridges. What you DON'T know is whether or not a write (which was posted
and may be sitting in a bridge FIFO) has been flushed and/or propagated
to memory at the time you submit the next write and/or interrupt the
host. 

If you submit a READ following the write, however, per the PCI bus
ordering rules you know that the data is in the target.  

Unless, of course, I'm wrong ... :-)


> 
> Tom.
> 
> 





Re: [openib-general] basic IB doubt

2006-08-25 Thread Jason Gunthorpe
On Fri, Aug 25, 2006 at 09:33:31AM -0700, Roland Dreier wrote:
> Thomas> How does an adapter guarantee that no bridges or other
> Thomas> intervening devices reorder their writes, or for that
> Thomas> matter flush them to memory at all!?
 
> That's a good point.  The HCA would have to do a read to flush the
> posted writes, and I'm sure it's not doing that (since it would add
> horrible latency for no good reason).

PCI (-X and -E) have strict transaction ordering rules, writes may not
be re-ordered, and two ordered writes to the same address have defined
semantics. One thing that is absolutely assured in a PCI system
is that if write B follows write A and the CPU observes B's data
then all of A must be visible to the CPU.

What I don't recall being assured is whether all the data in a single
transaction has some defined order in which it must become visible to
the CPU.

This is why CQ's don't have a problem, seeing the new CQ entry, or the
MSI, is enough to ensure everything is visible to the CPU.

So, in the worst case, all an HCA would have to do is put the last
dword in a separate transaction.

Jason
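
The rule described above (if the consumer observes write B, everything from the earlier write A is visible) has the same shape as release/acquire publication between CPU threads. The sketch below is only a software analogy under that assumption, not a model of PCI or of any HCA; `struct mailbox`, `mailbox_publish()` and `mailbox_poll()` are hypothetical names. The payload plays the role of the RDMA data (write A) and the `valid` flag plays the role of the CQ entry or last dword (write B).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mailbox: payload is "write A", valid is "write B". */
struct mailbox {
    char payload[64];
    atomic_bool valid;
};

/* Producer: write the data first, then publish the flag with release order. */
static void mailbox_publish(struct mailbox *m, const char *msg)
{
    strncpy(m->payload, msg, sizeof m->payload - 1);
    m->payload[sizeof m->payload - 1] = '\0';
    atomic_store_explicit(&m->valid, true, memory_order_release);
}

/*
 * Consumer: only touch the payload after observing the flag with acquire
 * order; returns false if nothing has been published yet. len must be > 0.
 */
static bool mailbox_poll(struct mailbox *m, char *out, size_t len)
{
    if (!atomic_load_explicit(&m->valid, memory_order_acquire))
        return false;
    strncpy(out, m->payload, len - 1);
    out[len - 1] = '\0';
    return true;
}
```

The consumer never inspects the payload to decide whether it has arrived; it relies entirely on the ordered flag, which is the property the CQ entry provides in the discussion above.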




Re: [openib-general] basic IB doubt

2006-08-25 Thread Steve Wise
On Fri, 2006-08-25 at 09:53 -0700, Roland Dreier wrote:
> Sean> Couldn't the same point then be made that a CQ entry may
> Sean> come before the data has been posted?
> 
> That's true -- I guess I need to look at what ordering guarantees the
> PCI spec makes to give a real answer.
> 

I believe bus bridges between devices and memory _must_ ensure write
ordering.  Otherwise nothing works, right?







Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready

2006-08-25 Thread Doug Ledford
On Mon, 2006-08-21 at 21:49 +0300, Tziporet Koren wrote:
> Hi,
> 
> OFED 1.1-RC2 is available on 
> https://openib.org/svn/gen2/branches/1.1/ofed/releases/
> File: OFED-1.1-rc2.tgz
> Please report any issues in bugzilla http://openib.org/bugzilla/
> 
> Tziporet & Vlad
> -
> 
> Release details:
> 
> 
> Build_id:
> OFED-1.1-rc2
> 
> openib-1.1 (REV=9037)
> # User space
> https://openib.org/svn/gen2/branches/1.1/src/userspace
> Git:
> ref: refs/heads/ofed_1_1
> commit a13195d7ca0f047f479a58b2a81ff2b796eb8fa4
> 
> # MPI
> mpi_osu-0.9.7-mlx2.2.0.tgz
> openmpi-1.1-1.src.rpm
> mpitests-2.0-0.src.rpm
> 
> 
> OS support:
> ===
> Novell:
>   - SLES 9.0 SP3*
>   - SLES10 (official release)*
> Redhat:
>   - Redhat EL4 up3
>   - Redhat EL4 up4* (not supported yet)
> kernel.org:
>   - Kernel 2.6.17*
> * Changed from 1.0 release
> 
> Note: Redhat EL4 up2, Fedora C4 and SuSE Pro 10 were dropped from the list.
> We keep the backport patches for these OSes and make sure OFED compiles and 
> loads properly but will not do a full QA cycle.
> 
> Systems:
> 
> * x86_64
> * x86
> * ia64
> * ppc64

Not supporting ppc is a problem to a certain extent.  I can't speak for
SuSE, but at least for Red Hat, ppc is the default and overrides ppc64.
The ppc64 arch is less efficient than the ppc arch on ppc64 processors
except when large memory footprints are involved.  So, for things like
opensm, ibv_*, etc. the ppc arch should actually be preferred, and the
ppc64 arch libs should be present for those end user apps that need
large memory access.  The fact that dapl doesn't compile on ppc at all
is problematic as well.  In addition, what are you guys doing about the
lack of asm/atomic.h (breaking udapl compiles on ppc64 and ia64) going
forward?  I'd look in the packages and see for myself but the svn update
is taking forever due to those binary rpms packed into svn... ahh, it's
finally done... OK, still broken.

Without getting into an argument over the usage of that include, suffice
it to say that the include file is gone and builds fail on
fc6/rhel5beta.  Since the code really only uses low level intrinsics as
opposed to high level atomic ops, I made a ppc and ia64 intrinsics
header for linux and added it to the dapl package itself to work around
the issue.

> 
> Main changes from OFED-1.1-rc1:
> ===
> 1. ipath driver: 
>   - Compilation pass on all systems, except SLES9 SP3.
>   - See list of changes in the ipath driver at the end
> 2. SDP: 
>   - Fixed issue with 32 bit systems running out of low memory when opening 
> hundreds of sockets.
>   - Added out of band and message peek support; telnet and ftp are now 
> working
> 3. SRP - a new srp_daemon was added - see explanation at the end
> 4. IPoIB: High availability support using a daemon in user level. 
>Daemon is located under /userspace/ipoibtools/. See explanation at the end.
> 5. Added Madeye utility
> 6. Added verbs fork support. Should work from kernel 2.6.16
> 7. Fatal error support in mthca
> 8. iSER support in install script for SLES 10 was fixed
> 9. Diagnostic tools do not require opensm installation.
>For this the following changes were done to opensm RPM: 
>   opensm-devel was removed
>New packages were added:
>   libosmcomp
>   libosmcomp-devel
>   libosmvendor
>   libosmvendor-devel
>   libopensm
>   libopensm-devel

Ugh.  Each library does not need its own package.  Imagine what X would
do to your RPM count otherwise.  For grouped libraries like this, it is
perfectly acceptable to do opensm, opensm-libs, opensm-devel (and that's
in fact what I did for RHEL4 U4).  Regardless though, make a decision
and stick to it.  Changing package names with each release == not good.

> 10. bug fixes:
>- SRP: Add local_ib_device/local_ib_port attributes to srp scsi_host
>- mthca: fence bit supported; fixed deadlock in destroy qp
>- ipoib: connectivity lost on sm lid change
>- OSM: fix to work with Cisco stack
> 
> 
> Limitations and known issues:
> =
> 1. SDP: For Mellanox Sinai HCAs one must use latest FW version (1.1.000).
> 2. SDP: Get peer name is not working properly
> 3. SDP: Scalability issue when many connections are opened
> 4. ipath driver does not compile on SLES9 SP3
> 5. RHEL4 up4 is not supported due to problems in the backport patches.

You should be able to start by pulling the patches that are already
applied out of the RHEL4 U4 kernel rpm, looking at which ones fix up the
core kernel to provide what's needed instead of doing a thousand little
backports all over the kernel tree, and axing any backport patches you
had planned that would undo that.  IOW, make use of the infrastructure
provided in U4 instead of working around it.

> 
> Missing features that should be completed for RC3:
> =

Re: [openib-general] basic IB doubt

2006-08-25 Thread Caitlin Bestler
Woodruff, Robert J wrote:
> Caitlin wrote,
>> Another point, even if a vendor were to implement the firmware you
>> suggest, how does the Data Source know that it is safe to use just
>> RDMA Writes? The enabling firmware is in the Data Sink.
> 
> Huh, don't understand the question.
> 
>> Applications certainly do not want to have to validate the model of
>> the RNIC that they are connected with.
> 
> If ISVs want to use an RNIC that does not support this
> technique, then they will have to implement their completion
> checking another way, which will be slower, so the hardware
> NICs that do not support this fast polling completion
> technique will be at a competitive disadvantage. Sometimes
> you can lead a horse to water, but you can't make them drink.

The benefit of "last byte RDMA Write ordering" would be to simplify
the logic of the remote peer doing the RDMA Writes. It does not benefit
the application doing the receiving.

The decision on whether or not to take the action that requires
a clean completion at the data sink must be taken by the data
source -- which has no method of knowing what vendor specific
features the remote peer has.

The whole point of using a standard protocol is to at least
define the optional features in a vendor independent way.





Re: [openib-general] basic IB doubt

2006-08-25 Thread Woodruff, Robert J
Caitlin wrote,
>Another point, even if a vendor were to implement the firmware
>you suggest, how does the Data Source know that it is safe to
>use just RDMA Writes? The enabling firmware is in the Data Sink.

Huh, don't understand the question.

>Applications certainly do not want to have to validate the model
>of the RNIC that they are connected with.

If ISVs want to use an RNIC that does not support this technique,
then they will have to implement their completion checking
another way, which will be slower, so the hardware NICs that
do not support this fast polling completion technique will be
at a competitive disadvantage. Sometimes you can lead a
horse to water, but you can't make them drink.




Re: [openib-general] basic IB doubt

2006-08-25 Thread Roland Dreier
Sean> Couldn't the same point then be made that a CQ entry may
Sean> come before the data has been posted?

That's true -- I guess I need to look at what ordering guarantees the
PCI spec makes to give a real answer.

 - R.




Re: [openib-general] basic IB doubt

2006-08-25 Thread Talpey, Thomas
At 12:40 PM 8/25/2006, Sean Hefty wrote:
>>Thomas> How does an adapter guarantee that no bridges or other
>>Thomas> intervening devices reorder their writes, or for that
>>Thomas> matter flush them to memory at all!?
>>
>>That's a good point.  The HCA would have to do a read to flush the
>>posted writes, and I'm sure it's not doing that (since it would add
>>horrible latency for no good reason).
>>
>>I guess it's not safe to rely on ordering of RDMA writes after all.
>
>Couldn't the same point then be made that a CQ entry may come before the data
>has been posted?

When the CQ entry arrives, the context that polls it off the queue
must use the dma_sync_*() api to finalize any associated data
transactions (known by the upper layer).

This is basic, and it's the reason that a completion is so important.
The completion, in and of itself, isn't what drives the synchronization.
It's the transfer of control to the processor.

Tom.





Re: [openib-general] basic IB doubt

2006-08-25 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
>>Thomas> How does an adapter guarantee that no bridges or other
>>Thomas> intervening devices reorder their writes, or for that
>>Thomas> matter flush them to memory at all!?
>> 
>> That's a good point.  The HCA would have to do a read to flush the
>> posted writes, and I'm sure it's not doing that (since it would add
>> horrible latency for no good reason).
>> 
>> I guess it's not safe to rely on ordering of RDMA writes after all.
> 
> Couldn't the same point then be made that a CQ entry may come
> before the data has been posted?
> 

That's why both specs (IBTA and RDMAC) are very explicit that all
prior messages are complete before the CQE is given to the user.

It is up to the RDMA Device and/or its driver to guarantee this
by whatever means are appropriate. An implementation that allows
a CQE post to pass the data placement that it is reporting on the
PCI bus is in error.

The critical concept of the Work Completion is that it consolidates
guarantees and notifications. The implementation can do all sorts
of strange things that it thinks optimize *before* the work completion,
but at the time the work completion is delivered to the user everything
is supposed to be as expected.
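
As a rough CPU-side analogy only (not device or driver code), the same
"completion consolidates the guarantees" idea can be sketched with C11
atomics. The struct and function names are made up for the example: the
producer may place the payload however it likes, but it publishes the
completion with a release store, and the consumer touches the payload
only after an acquire load observes that completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_MAX 32   /* hypothetical payload size for the sketch */

struct slot {
    char payload[PAYLOAD_MAX];  /* stands in for the placed data */
    atomic_int completed;       /* stands in for the CQE; 0 = none yet */
};

/* "Device" side: place the data, then publish the completion.
 * The release store keeps the payload writes ordered before the flag. */
static void post_completion(struct slot *s, const char *data)
{
    assert(s != NULL && data != NULL);
    strncpy(s->payload, data, PAYLOAD_MAX - 1);
    s->payload[PAYLOAD_MAX - 1] = '\0';
    atomic_store_explicit(&s->completed, 1, memory_order_release);
}

/* "Consumer" side: read the payload only after seeing the completion.
 * Returns 1 and copies the payload if a completion is present, else 0. */
static int poll_completion(struct slot *s, char *out, size_t out_len)
{
    assert(s != NULL && out != NULL);
    if (out_len == 0 ||
        !atomic_load_explicit(&s->completed, memory_order_acquire))
        return 0;
    strncpy(out, s->payload, out_len - 1);
    out[out_len - 1] = '\0';
    return 1;
}
```

A consumer that read the payload without first seeing the completion
would have no ordering guarantee at all, which is the situation an
application is in when it relies on RDMA Write placement order alone.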






Re: [openib-general] basic IB doubt

2006-08-25 Thread Caitlin Bestler
Woodruff, Robert J wrote:
> Caitlin wrote,
> 
>> For iWARP there are network performance reasons why in-order memory
>> writes will never be guaranteed.
> 
> For iWarp, or any other RDMA over Ethernet protocol, the
> behavior is not to guarantee all packets are written
> in-order, just that the last byte of the last packet is
> written last. This can easily be implemented in an iWarp card
> or by the driver with minimal performance impact in most cases.
> 
> So for example, if the last packet arrives before all the
> other packets have arrived, the iWarp card or driver does not
> place that data of the last packet until all the other
> packets have arrived.
> 
> woody

Another point, even if a vendor were to implement the firmware
you suggest, how does the Data Source know that it is safe to
use just RDMA Writes? The enabling firmware is in the Data Sink.

Applications certainly do not want to have to validate the model
of the RNIC that they are connected with.





Re: [openib-general] basic IB doubt

2006-08-25 Thread Sean Hefty
>Thomas> How does an adapter guarantee that no bridges or other
>Thomas> intervening devices reorder their writes, or for that
>Thomas> matter flush them to memory at all!?
>
>That's a good point.  The HCA would have to do a read to flush the
>posted writes, and I'm sure it's not doing that (since it would add
>horrible latency for no good reason).
>
>I guess it's not safe to rely on ordering of RDMA writes after all.

Couldn't the same point then be made that a CQ entry may come before the data
has been posted?

- Sean




Re: [openib-general] basic IB doubt

2006-08-25 Thread Roland Dreier
Thomas> How does an adapter guarantee that no bridges or other
Thomas> intervening devices reorder their writes, or for that
Thomas> matter flush them to memory at all!?

That's a good point.  The HCA would have to do a read to flush the
posted writes, and I'm sure it's not doing that (since it would add
horrible latency for no good reason).

I guess it's not safe to rely on ordering of RDMA writes after all.

 - R.




Re: [openib-general] drop mthca from svn?

2006-08-25 Thread Roland Dreier
James> I'm concerned about the licensing implications on moving
James> the code.  Most of the source code hosted on kernel.org is
James> GPL-only (the sparse repository is the only one I know of
James> that is not).

I have no plans to change any of the copyright notices on the code.  I
see quite a few other dual licensed drivers in the Linux kernel tree
so I don't see why the current status quo would be a problem.

James> If the code is moved, how can the OpenFabrics community be
James> guaranteed that the entire software stack will remain under
James> a dual BSD/GPL license?

You can't guarantee that someone won't come along and write some IB
driver and get it merged upstream without a BSD license.  So there's
not much we can do anyway.

 - R.




Re: [openib-general] A critique of RDMA PUT/GET in HPC

2006-08-25 Thread Greg Lindahl
On Fri, Aug 25, 2006 at 10:13:01AM -0500, Tom Tucker wrote:

> He does say this, but his analysis does not support this conclusion. His
> analysis revolves around MPI send/recv, not the MPI 2.0 get/put
> services.

Nobody uses MPI put/get anyway, so leaving out analyzing that doesn't
change reality much.

> A valid conclusion IMO is that "MPI send/recv can
> be most efficiently implemented over an unconnected reliable datagram
> protocol that supports 64bit tag matching at the data sink." And not
> coincidentally, Myricom has this ;-)

As do all of the non-VIA-family interconnects he mentions.  Since "we"
all landed on the same conclusion, you might think we're on to
something. Or not.

However, that's only part of the argument.  Another part is that the
buffer space needed to use RDMA put/get for all data links is huge.
And there are some other interesting points.

> I DO agree that it is interesting reading. :-), it's definitely got
> people fired up.

Heh. Glad you found it interesting.

-- greg





Re: [openib-general] A critique of RDMA PUT/GET in HPC

2006-08-25 Thread Tom Tucker
On Thu, 2006-08-24 at 15:53 -0700, Greg Lindahl wrote:
> For those of you interested in this topic, there's an interesting
> article by Patrick Geoffrey in HPCWire entitled "A Critique of RDMA".
> 
> http://www.hpcwire.com/hpc/815242.html
> 
> (you might have to be a subscriber, but I'm sure Patrick would send
> you a copy if you ask.)
> 
> It's basically a critique of why SEND/RECV is better for MPI
> implementations than PUT/GET.

He does say this, but his analysis does not support this conclusion. His
analysis revolves around MPI send/recv, not the MPI 2.0 get/put
services. He makes the point (true in my opinion) that the MPI_RECV
64bit (tag,communicator) filter makes MPI_RECV prickly to implement on
IB/iWARP SEND/RECV and IB/iWARP RDMA. His data are drawn from
observations of MPI applications that use MPI send/recv mapped to an
RDMA transport. However, his conclusion covers a programming model (MPI
get/put) that is not observed in the data. In other words, he doesn't
compare the performance of an algorithm implemented using MPI send/recv
vs. the same algorithm implemented using MPI get/put. He evaluates the
performance of an algorithm implemented using MPI send/recv mapped to an
RDMA transport and then says because this mapping has problems that the
RDMA programming model is bad. That conclusion is not supported by his
analysis or his data. A valid conclusion IMO is that "MPI send/recv can
be most efficiently implemented over an unconnected reliable datagram
protocol that supports 64bit tag matching at the data sink." And not
coincidentally, Myricom has this ;-)
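
To make "64bit tag matching at the data sink" concrete, here is a
minimal sketch of the matching step. The packing of a 32-bit
communicator id and a 32-bit tag into one 64-bit key, and all the names
below, are assumptions for illustration and not any interconnect's
actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One posted receive: it matches an incoming key when the two agree
 * on every bit selected by mask. */
struct posted_recv {
    uint64_t tag;
    uint64_t mask;
    int in_use;
};

/* Pack a communicator id and an MPI tag into one 64-bit key.
 * The 32/32 split is an assumption of this sketch. */
static uint64_t make_tag(uint32_t communicator, uint32_t tag)
{
    return ((uint64_t)communicator << 32) | tag;
}

/* Return the index of the first posted receive that matches incoming
 * and mark it consumed, or -1 if nothing matches (an unexpected
 * message that the sink would have to buffer). */
static int match_recv(struct posted_recv *q, size_t n, uint64_t incoming)
{
    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (q[i].in_use && ((incoming ^ q[i].tag) & q[i].mask) == 0) {
            q[i].in_use = 0;
            return (int)i;
        }
    }
    return -1;
}
```

A wildcard receive (any tag on a given communicator) is expressed by
clearing the low 32 bits of mask. Doing this lookup at the sink is the
part that plain SEND/RECV or RDMA queues do not provide.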

I DO agree that it is interesting reading. :-), it's definitely got
people fired up.

My 2 cents.


> 
> Even if you don't agree with him, it's good reading. For motivation,
> you might want to note that most of the SEND/RECV-based products
> mentioned achieve better MPI 0-byte latency than IB Verbs-based MPI
> implementations.
> 
> While I don't agree with everything Patrick says, this does get back
> to my point that I've run into many people who assume that PUT/GET is
> always the right way to do things. And it isn't.
> 
> -- greg





Re: [openib-general] drop mthca from svn? (was: Rollup patch for ipath and OFED)

2006-08-25 Thread James Lentini


On Thu, 24 Aug 2006, Bryan O'Sullivan wrote:

> On Thu, 2006-08-24 at 09:31 -0700, Roland Dreier wrote:
> 
> > Along those lines, how would people feel if I removed the mthca kernel
> > code from svn, and just maintained mthca in kernel.org git trees?
> 
> +1 from me.  We'll drop the ipath code, too.

I'm concerned about the licensing implications on moving the code. 
Most of the source code hosted on kernel.org is GPL-only (the sparse 
repository is the only one I know of that is not).

If the code is moved, how can the OpenFabrics community be guaranteed 
that the entire software stack will remain under a dual BSD/GPL 
license?

If the only goal is to use git, git can be setup on an OpenFabrics 
server.




Re: [openib-general] [PATCH] osm: handle local events

2006-08-25 Thread Greg Lindahl
On Fri, Aug 25, 2006 at 05:17:04PM +0300, Sasha Khapyorsky wrote:

> So more generic question: some application performs blocked read() from
> /dev/umadN, should this read() be interrupted and return error (with
> appropriate errno value), when the port state becomes DOWN?

If the SM gets a signal (alarm timeout) and the read() is interrupted
with errno=EINTR... presumably this is not the case you had in mind.

-- greg





Re: [openib-general] [PATCH] osm: handle local events

2006-08-25 Thread Sasha Khapyorsky
On 16:28 Thu 24 Aug , Michael S. Tsirkin wrote:
> Quoting r. Yevgeny Kliteynik <[EMAIL PROTECTED]>:
> > Index: libvendor/osm_vendor_ibumad.c
> > ===
> > --- libvendor/osm_vendor_ibumad.c   (revision 8998)
> > +++ libvendor/osm_vendor_ibumad.c   (working copy)
> > @@ -72,6 +72,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  /s* OpenSM: Vendor AL/osm_umad_bind_info_t
> >   * NAME
> 
> NAK.
> 
> This means that the SM becomes dependent on the uverbs module.  I don't think
> this is a good idea.  Let's not go there - SM should depend just on the umad
> module and libc.

Agree on this point. I dislike this new libibverbs dependency too. I
think we need to work with umad.

So a more generic question: some application performs a blocking read()
from /dev/umadN; should this read() be interrupted and return an error
(with an appropriate errno value) when the port state becomes DOWN?

I think yes, it should. Other opinions? Sean?

And if yes, then in OpenSM we will just need to check the errno value
upon umad_recv() failure.
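
A minimal sketch of what that check could look like, written against
plain read() rather than umad_recv() so that it assumes nothing about
libibumad. Which errno a port going DOWN would produce is undecided in
this thread, so PORT_DOWN_ERRNO (set to ENETDOWN here) is purely a
placeholder, and recv_checked() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Placeholder only: the errno reported for "port went DOWN" is an
 * assumption of this sketch, not an existing umad behavior. */
#define PORT_DOWN_ERRNO ENETDOWN

enum recv_status { RECV_OK, RECV_PORT_DOWN, RECV_ERROR };

/* Blocking read that retries when a signal interrupts it (EINTR) and
 * classifies every other failure by errno. On RECV_OK, *got holds the
 * number of bytes read. */
static enum recv_status recv_checked(int fd, void *buf, size_t len,
                                     ssize_t *got)
{
    ssize_t n;

    assert(got != NULL);
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    if (n >= 0) {
        *got = n;
        return RECV_OK;
    }
    return errno == PORT_DOWN_ERRNO ? RECV_PORT_DOWN : RECV_ERROR;
}
```

Retrying on EINTR keeps an ordinary signal, such as an alarm timeout,
from being mistaken for a port event. Only the dedicated errno value
would tell the caller that the port state changed.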

Sasha

>  In particular, SM should work even on embedded platforms where
> uverbs do not necessarily work.
> 
> Further, hotplug events still do not seem to be handled, even with this patch.
> 
> For port events, it seems sane that umad module could provide a way
> to listen for them.
> 
> A recent patch to mthca converts fatal events to hotplug events, so fatal
> events can and should be handled as part of general hotplug support.
> 
> -- 
> MST




Re: [openib-general] basic IB doubt

2006-08-25 Thread Thomas Bachman

>On Thu, Aug 24, 2006 at 03:13:33PM -0700, Woodruff, Robert J wrote:
>> If the feature gives them a huge advantage in performance (and it
>> does) and all of the hardware vendors that they deal with already
>> implement it, then yes, they will force, by defacto standard that
>> all other newcomers implement it or face the fact that no one will
>> buy their hardware. It seems like that is what is happening in this
>> case.
>
>In this case the feature reduces performance on one HCA and increases
>it on another. Which shows why it's a bad idea to pick features based
>on a single implementation.
>
> But you're still confusing practicality and theory. I can see why it
> makes practical sense for newcomers to implement this new,
> performance-reducing feature. But why is it theoretically good? And
> shouldn't it be added to the standard, before all the poor iWarp people
> discover the hard way that they need it?
>
> -- greg

Not that I have any stance on this issue, but is this the text in the
spec that is being debated?

(page 269, section 9.5, Transaction Ordering):
"An application shall not depend upon the order of data writes to
memory within a message. For example, if an application sets up
data buffers that overlap, for separate data segments within a
message, it is not guaranteed that the last sent data will always
overwrite the earlier."

I'm assuming that the spec authors had reason for putting this in there, so
maybe they could provide guidance here?

Or was this only meant to apply to SENDs, and not RDMA WRITEs?
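
For what it's worth, the overlapping-buffer case in that paragraph is
something an application can check for itself before posting. A minimal
sketch, with a made-up struct segment standing in for whatever
scatter/gather entry type is actually in use:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data segment: start address and length in bytes. */
struct segment {
    uintptr_t addr;
    size_t len;
};

/* 1 if the half-open ranges [addr, addr+len) intersect, else 0.
 * Zero-length segments never overlap anything. Assumes addr + len
 * does not wrap around the address space. */
static int segments_overlap(const struct segment *a,
                            const struct segment *b)
{
    if (a->len == 0 || b->len == 0)
        return 0;
    return a->addr < b->addr + b->len && b->addr < a->addr + a->len;
}

/* 1 if any two segments in the list overlap (O(n^2) pairwise scan,
 * fine for the handful of segments in one message). */
static int list_has_overlap(const struct segment *segs, size_t n)
{
    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (segments_overlap(&segs[i], &segs[j]))
                return 1;
    return 0;
}
```

If list_has_overlap() returns 1, the final contents of the shared bytes
depend on a write order that the quoted text says must not be relied on.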

Cheers,

-Thomas Bachman





[openib-general] [PATCH] opensm: libibmad: rpc API which supports more than one ports.

2006-08-25 Thread Sasha Khapyorsky

This provides an RPC-like API that can work with several ports.

Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>
---

 libibmad/include/infiniband/mad.h |9 +++
 libibmad/src/libibmad.map |4 +
 libibmad/src/register.c   |   20 +--
 libibmad/src/rpc.c|  106 +++--
 libibumad/src/umad.c  |4 +
 5 files changed, 130 insertions(+), 13 deletions(-)

diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h
index 45ff572..bd8a80b 100644
--- a/libibmad/include/infiniband/mad.h
+++ b/libibmad/include/infiniband/mad.h
@@ -660,6 +660,7 @@ uint64_t mad_trid(void);
 int    mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data);
 
 /* register.c */
+int    mad_register_port_client(int port_id, int mgmt, uint8_t rmpp_version);
 int    mad_register_client(int mgmt, uint8_t rmpp_version);
 int    mad_register_server(int mgmt, uint8_t rmpp_version,
                            uint32_t method_mask[4], uint32_t class_oui);
@@ -704,6 +705,14 @@ void   madrpc_lock(void);
 void   madrpc_unlock(void);
 void   madrpc_show_errors(int set);
 
+void * mad_rpc_open_port(char *dev_name, int dev_port, int *mgmt_classes,
+ int num_classes);
+void   mad_rpc_close_port(void *ibmad_port);
+void * mad_rpc(void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport,
+   void *payload, void *rcvdata);
+void *  mad_rpc_rmpp(void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport,
+ib_rmpp_hdr_t *rmpp, void *data);
+
 /* smp.c */
 uint8_t * smp_query(void *buf, ib_portid_t *id, uint attrid, uint mod,
uint timeout);
diff --git a/libibmad/src/libibmad.map b/libibmad/src/libibmad.map
index bf81bd1..78b7ff0 100644
--- a/libibmad/src/libibmad.map
+++ b/libibmad/src/libibmad.map
@@ -62,6 +62,10 @@ IBMAD_1.0 {
ib_resolve_self;
ib_resolve_smlid;
ibdebug;
+   mad_rpc_open_port;
+   mad_rpc_close_port;
+   mad_rpc;
+   mad_rpc_rmpp;
madrpc;
madrpc_def_timeout;
madrpc_init;
diff --git a/libibmad/src/register.c b/libibmad/src/register.c
index 4f44625..52d6989 100644
--- a/libibmad/src/register.c
+++ b/libibmad/src/register.c
@@ -43,6 +43,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "mad.h"
@@ -118,7 +119,7 @@ mad_agent_class(int agent)
 }
 
 int
-mad_register_client(int mgmt, uint8_t rmpp_version)
+mad_register_port_client(int port_id, int mgmt, uint8_t rmpp_version)
 {
int vers, agent;
 
@@ -126,7 +127,7 @@ mad_register_client(int mgmt, uint8_t rm
DEBUG("Unknown class %d mgmt_class", mgmt);
return -1;
}
-   if ((agent = umad_register(madrpc_portid(), mgmt,
+   if ((agent = umad_register(port_id, mgmt,
   vers, rmpp_version, 0)) < 0) {
DEBUG("Can't register agent for class %d", mgmt);
return -1;
@@ -137,13 +138,22 @@ mad_register_client(int mgmt, uint8_t rm
return -1;
}
 
-   if (register_agent(agent, mgmt) < 0)
-   return -1;
-
return agent;
 }
 
 int
+mad_register_client(int mgmt, uint8_t rmpp_version)
+{
+   int agent;
+
+   agent = mad_register_port_client(madrpc_portid(), mgmt, rmpp_version);
+   if (agent < 0)
+   return agent;
+
+   return register_agent(agent, mgmt);
+}
+
+int
 mad_register_server(int mgmt, uint8_t rmpp_version,
uint32_t method_mask[4], uint32_t class_oui)
 {
diff --git a/libibmad/src/rpc.c b/libibmad/src/rpc.c
index b2d3e77..ac4f361 100644
--- a/libibmad/src/rpc.c
+++ b/libibmad/src/rpc.c
@@ -48,6 +48,13 @@ #include 
 #include 
 #include "mad.h"
 
+#define MAX_CLASS 256
+
+struct ibmad_port {
+   int port_id;  /* file descriptor returned by umad_open() */
+   int class_agents[MAX_CLASS]; /* class2agent mapper */
+};
+
 int ibdebug;
 
 static int mad_portid = -1;
@@ -105,7 +112,8 @@ madrpc_portid(void)
 }
 
 static int 
-_do_madrpc(void *sndbuf, void *rcvbuf, int agentid, int len, int timeout)
+_do_madrpc(int port_id, void *sndbuf, void *rcvbuf, int agentid, int len,
+  int timeout)
 {
uint32_t trid; /* only low 32 bits */
int retries;
@@ -133,7 +141,7 @@ _do_madrpc(void *sndbuf, void *rcvbuf, i
}
 
length = len;
-   if (umad_send(mad_portid, agentid, sndbuf, length, timeout, 0) < 0) {
+   if (umad_send(port_id, agentid, sndbuf, length, timeout, 0) < 0) {
IBWARN("send failed; %m");
return -1;
}
@@ -141,7 +149,7 @@ _do_madrpc(void *sndbuf, void *rcvbuf, i
/* Use same timeout on receive side just in case */
/* send packet is lost somewhere.