[openib-general] Re: Revenge of the sysfs maintainer! (was Re: [PATCH 8 of 20] ipath - sysfs support for core driver)

2006-03-09 Thread Arjan van de Ven
On Thu, 2006-03-09 at 20:58 -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 17:00 -0800, Greg KH wrote:
> 
> > They are in the latest -mm tree if you wish to use them.  Unfortunately
> > it might look like they will not work out, due to the per-cpu relay
> > files not working properly with Paul's patches at the moment.
> 
> Hmm, OK.
> 
> > What's wrong with debugfs?
> 
> It's not configured into the kernels of either of the distros I use (Red
> Hat or SUSE).  I can't have a required part of my driver depend on a
> feature that's not enabled in the major distro kernels.

Sucks to be you. However, I think it's equally or even more unacceptable
to cripple the main kernel because you want to also support antique
kernels (those more than 12 months old). The general rule is "if you
want to support that, do it outside the kernel.org tree".


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: TSO and IPoIB performance degradation

2006-03-09 Thread David S. Miller
From: Rick Jones <[EMAIL PROTECTED]>
Date: Thu, 09 Mar 2006 16:21:05 -0800

> well, there are stacks which do "stretch acks" (after a fashion) that 
> make sure when they see packet loss to "do the right thing" wrt sending 
> enough acks to allow cwnds to open again in a timely fashion.

Once a loss happens, it's too late to stop doing the stretch ACKs, the
damage is done already.  It is going to take you at least one
extra RTT to recover from the loss compared to if you were not doing
stretch ACKs.

You have to keep giving consistent, well-spaced ACKs back to the
sender in order to recover from loss optimally.

The ACK-every-two-full-sized-frames behavior of TCP is absolutely
essential.


[openib-general] Re: TSO and IPoIB performance degradation

2006-03-09 Thread David S. Miller
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Fri, 10 Mar 2006 02:10:31 +0200

> But with the change we are discussing, could an ACK now be sent even
> before we have at least two full-sized segments?  Or does
> __tcp_ack_snd_check delay until we have at least two full-sized
> segments? David, could you please explain?

__tcp_ack_snd_check() delays until we have at least two full
sized segments.


[openib-general] Re: Revenge of the sysfs maintainer! (was Re: [PATCH 8 of 20] ipath - sysfs support for core driver)

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 08:58:13PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 17:00 -0800, Greg KH wrote:
> 
> > They are in the latest -mm tree if you wish to use them.  Unfortunately
> > it might look like they will not work out, due to the per-cpu relay
> > files not working properly with Paul's patches at the moment.
> 
> Hmm, OK.
> 
> > What's wrong with debugfs?
> 
> It's not configured into the kernels of either of the distros I use (Red
> Hat or SUSE).

Well, I can do something about SuSE, it's up to someone else to persuade
Red Hat :)

> I can't have a required part of my driver depend on a feature that's
> not enabled in the major distro kernels.

Fair enough.

> I'd like a mechanism that is (a) always there, (b) easy for the kernel to use,
> and (c) easy for userspace to use.  A sysfs file satisfies a, b, and c,
> but I can't use it; a sysfs bin file satisfies all three (a bit worse on
> b), but I can't use it; debugfs isn't there, so I can't use it.
> 
> That leaves me with few options, I think.  What do you suggest?  (Please
> don't say netlink.)

Write your own filesystem?  Seriously, you do that and you get to set
all of your own rules (well, within reason).  It's only 200 lines of
code, max.

thanks,

greg k-h


[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 09:09:37PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 17:11 -0800, Greg KH wrote:
> 
> > These two files sure do show a lot of different stuff, all in a
> > predefined structure for a single file.  Please break them up into
> > separate individual files.
> 
> The problem is that I want them to be presented together.  They look
> like a pile of different stuff, but they're actually InfiniBand NodeInfo
> and PortInfo structures.  And yes, they are that ugly.

Then why not just have a bunch of different files for the different
things, and then a simple shell script to grab them all and put them
together however you want?

The main issue is that if you create a sysfs file like this, and then in
3 months realize that you need to change one of those characters to
be something else, you are in big trouble...

> These files fall into the same category as the atomic_counters and
> atomic_snapshots files you raised objections to earlier; it actually
> makes sense to look at them as a whole, not their constituent parts.

Sure, lots of different files can be combined by a script into a whole.

> In the earlier round of review, people suggested that I use netlink for
> stuff like this, but I quickly decided I'd rather gnaw my leg off than
> use the netlink API.

Just because you don't want to use it doesn't mean it isn't the proper
tool...

thanks,

greg k-h


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
Bryan> So.  Our userspace SMA that talks to /dev/ipath_sma can run
Bryan> without ib_mad or any other IB code present, and it will
Bryan> coexist with them if they are.

Bryan> In addition, it provides some facilities to ipath_ether
Bryan> that ib_mad doesn't, without which ipath_ether can't
Bryan> support multicast, for example.

So why do you need all the in-kernel SMA code in ipath_mad.c?

Why can't you just have one SMA that has the union of all the features
you need?

 - R.


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
Bryan> I *assumed* that there was something more that we would
Bryan> need to do in order to support real hotplug of actual
Bryan> physical cards, but now that I look more closely, it
Bryan> doesn't appear that there is.  At least, there's nothing in
Bryan> Documentation/pci.txt or LDD3 that indicates to me that we
Bryan> ought to be doing more.

Bryan> Am I missing something?

No, the only problems are with the way the various pieces of your
drivers refer to devices by index.  There are obvious races such as

 > +int __init ipath_verbs_init(void)
 > +{
 > +	int i;
 > +
 > +	number_of_devices = ipath_layer_get_num_of_dev();
 > +	i = number_of_devices * sizeof(struct ipath_ibdev *);
 > +	ipath_devices = kmalloc(i, GFP_ATOMIC);
 > +	if (ipath_devices == NULL)
 > +		return -ENOMEM;
 > +
 > +	for (i = 0; i < number_of_devices; i++) {
 > +		struct ipath_devdata *dd;
 > +		int ret = ipath_verbs_register(i, ipath_ib_piobufavail,
 > +					       ipath_ib_rcv, ipath_ib_timer,
 > +					       &dd);

Suppose number_of_devices gets set to 5, but by the time you call
ipath_verbs_register(4, ...), the fifth device has been unplugged?

Also you only do this when the module is loaded, so you won't handle
devices that are hot-plugged later.  And I don't see anything that
would handle hot unplug either.

Pretty much any use of ipath_max is probably broken by hot plug.

 - R.


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 08:41:36PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 17:04 -0800, Greg KH wrote:
> 
> > > I don't expect this to be a practical problem.  We're planning to add
> > > hotplug support to the driver once we have some cycles free.
> > 
> > Ugh, that means it's never going to be there.
> > 
> > All new PCI drivers have the requirement that they work properly in
> > hotplug systems, as they should follow the PCI core api.  If not, odds
> > are they will not be accepted into the tree :(
> 
> Okay, maybe we're talking at cross purposes here.  We do follow the PCI
> core API.  We have a __devinit probe and __devexit remove routine, a
> MODULE_DEVICE_TABLE, the kernel generates hotplug events when a device
> is detected or the driver is unloaded, and so on.
> 
> I *assumed* that there was something more that we would need to do in
> order to support real hotplug of actual physical cards, but now that I
> look more closely, it doesn't appear that there is.  At least, there's
> nothing in Documentation/pci.txt or LDD3 that indicates to me that we
> ought to be doing more.
> 
> Am I missing something?

Nope, that's all that you need to do.  When the device is going away,
the PCI core will call your driver's remove function.  So great,
nothing needs to be done :)

Oh, and you can test this out if you don't have a PCI hotplug system by
using the fakephp driver and disconnecting your device that way.

thanks,

greg k-h


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:47 -0800, Roland Dreier wrote:

> Huh?  What does OpenSM working or not have to do with the SMA?

My mention of OpenSM was a braino, by the way.  I mixed up ib_mad and
OpenSM in my response.

So.  Our userspace SMA that talks to /dev/ipath_sma can run without
ib_mad or any other IB code present, and it will coexist with them if
they are.

In addition, it provides some facilities to ipath_ether that ib_mad
doesn't, without which ipath_ether can't support multicast, for example.



[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 17:11 -0800, Greg KH wrote:

> These two files sure do show a lot of different stuff, all in a
> predefined structure for a single file.  Please break them up into
> separate individual files.

The problem is that I want them to be presented together.  They look
like a pile of different stuff, but they're actually InfiniBand NodeInfo
and PortInfo structures.  And yes, they are that ugly.

These files fall into the same category as the atomic_counters and
atomic_snapshots files you raised objections to earlier; it actually
makes sense to look at them as a whole, not their constituent parts.

In the earlier round of review, people suggested that I use netlink for
stuff like this, but I quickly decided I'd rather gnaw my leg off than
use the netlink API.

I'm thinking at this point that I should just route this information
through the /dev/ipath_sma char device, and maybe
add /dev/ipath_counters%d and /dev/ipath_stats to go with it.  I think
that's a pretty crummy approach that sysfs solves more cleanly, but
there you go.



[openib-general] Re: Revenge of the sysfs maintainer! (was Re: [PATCH 8 of 20] ipath - sysfs support for core driver)

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 17:00 -0800, Greg KH wrote:

> They are in the latest -mm tree if you wish to use them.  Unfortunately
> it might look like they will not work out, due to the per-cpu relay
> files not working properly with Paul's patches at the moment.

Hmm, OK.

> What's wrong with debugfs?

It's not configured into the kernels of either of the distros I use (Red
Hat or SUSE).  I can't have a required part of my driver depend on a
feature that's not enabled in the major distro kernels.

I'd like a mechanism that is (a) always there, (b) easy for the kernel to use,
and (c) easy for userspace to use.  A sysfs file satisfies a, b, and c,
but I can't use it; a sysfs bin file satisfies all three (a bit worse on
b), but I can't use it; debugfs isn't there, so I can't use it.

That leaves me with few options, I think.  What do you suggest?  (Please
don't say netlink.)



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 17:04 -0800, Greg KH wrote:

> > I don't expect this to be a practical problem.  We're planning to add
> > hotplug support to the driver once we have some cycles free.
> 
> Ugh, that means it's never going to be there.
> 
> All new PCI drivers have the requirement that they work properly in
> hotplug systems, as they should follow the PCI core api.  If not, odds
> are they will not be accepted into the tree :(

Okay, maybe we're talking at cross purposes here.  We do follow the PCI
core API.  We have a __devinit probe and __devexit remove routine, a
MODULE_DEVICE_TABLE, the kernel generates hotplug events when a device
is detected or the driver is unloaded, and so on.

I *assumed* that there was something more that we would need to do in
order to support real hotplug of actual physical cards, but now that I
look more closely, it doesn't appear that there is.  At least, there's
nothing in Documentation/pci.txt or LDD3 that indicates to me that we
ought to be doing more.

Am I missing something?



[openib-general] ctask and mtask

2006-03-09 Thread Mohit Katiyar, Noida
Hi,
I am looking at the iSER gen2 code. I don't understand the difference
between ctask and mtask, or how the two are linked. What are these two
defined for? And what are immediate and non-immediate control PDUs?
Also, can anyone give me more details on the usage of the
iser_post_receive_control function?

Thanks and Regards
Mohit Katiyar



[openib-general] [PATCH] OpenSM: Properly encode response time value in SA ClassPortInfo

2006-03-09 Thread Hal Rosenstock
OpenSM: Properly encode response time value in SA ClassPortInfo

Rather than using a log function, use a precalculated table to compute
the encoding (4.096 usec * 2 ** n).

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

Index: opensm/osm_sa_class_port_info.c
===
--- opensm/osm_sa_class_port_info.c (revision 5733)
+++ opensm/osm_sa_class_port_info.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -66,6 +66,17 @@
 #include 
 #include 
 
+#define MAX_MSECS_TO_RTV 24
+/* Precalculated table in msec (index is related to encoded value) */
+/* 4.096 usec * 2 ** n (where n = 8 - 31) */
+static int __msecs_to_rtv_table[MAX_MSECS_TO_RTV] =
+   { 1, 2, 4, 8,
+ 16, 33, 67, 134, 
+ 268, 536, 1073, 2147,
+ 4294, 8589, 17179, 34359,
+ 68719, 137438, 274877, 549755,
+ 1099511, 2199023, 4398046, 8796093 };
+
 /**
  **/
 void
@@ -125,6 +136,7 @@ __osm_cpi_rcv_respond(
   ib_class_port_info_t* p_resp_cpi;
   ib_api_status_t       status;
   ib_gid_t              zero_gid;
+  int                   rtv;
 
   OSM_LOG_ENTER( p_rcv->p_log, __osm_cpi_rcv_respond );
 
@@ -159,7 +171,18 @@ __osm_cpi_rcv_respond(
   /* finally do it (the job) man ! */
   p_resp_cpi->base_ver = 1;
   p_resp_cpi->class_ver = 2;
-  p_resp_cpi->resp_time_val = p_rcv->p_subn->opt.transaction_timeout;
+  /* Calculate encoded response time value */
+  /* transaction timeout is in msec */
+  if (p_rcv->p_subn->opt.transaction_timeout >
+      __msecs_to_rtv_table[MAX_MSECS_TO_RTV - 1])
+rtv = MAX_MSECS_TO_RTV - 1;
+  else {
+for (rtv = 0; rtv < MAX_MSECS_TO_RTV; rtv++) {
+  if (p_rcv->p_subn->opt.transaction_timeout <= __msecs_to_rtv_table[rtv])
+ break;
+}
+  }
+  rtv += 8;
+  p_resp_cpi->resp_time_val = rtv;
   p_resp_cpi->redir_gid = zero_gid;
   p_resp_cpi->redir_tc_sl_fl = 0;
   p_resp_cpi->redir_lid = 0;





[openib-general] Re: neigh->ops destructor patch

2006-03-09 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: neigh->ops destructor patch
> 
> I applied this to svn. I'll hold onto it for git until DaveM's
> net-2.6.17 queue is merged upstream, since it depends on the
> destructor changes.

OK, thanks a bunch.
My short-term to-do list for ipoib is now:
- (svn only) full destructor work-around for kernels < 2.6.17
- flush all AHs on client re-registration event
Both changes are simple and localized to the alloc/free paths for neighbours.

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 04:35:38PM -0800, Bryan O'Sullivan wrote:
> +static ssize_t show_node_info(struct device *dev,
> +struct device_attribute *attr,
> +char *buf)
> +{
> + static const size_t count = 10;
> + struct ipath_devdata *dd = dev_get_drvdata(dev);
> + u32 *nodeinfo;
> + int ret;
> +
> + if (!dd->ipath_statusp) {
> + ret = -EINVAL;
> + goto bail;
> + }
> +
> + nodeinfo = (u32 *) buf;
> +
> + /* so we only initialize non-zero fields. */
> + memset(nodeinfo, 0, count * sizeof(u32));
> +
> + nodeinfo[0] =   /* BaseVersion is SMA */
> + /* ClassVersion is SMA */
> + (1 << 8)/* NodeType  */
> + |(1 << 0);  /* NumPorts */
> + nodeinfo[1] = (u32) (dd->ipath_guid >> 32);
> + nodeinfo[2] = (u32) (dd->ipath_guid & 0x);
> + /* PortGUID == SystemImageGUID for us */
> + nodeinfo[3] = nodeinfo[1];
> + /* PortGUID == SystemImageGUID for us */
> + nodeinfo[4] = nodeinfo[2];
> + /* PortGUID == NodeGUID for us */
> + nodeinfo[5] = nodeinfo[3];
> + /* PortGUID == NodeGUID for us */
> + nodeinfo[6] = nodeinfo[4];
> + nodeinfo[7] = (4 << 16) /* we support 4 pkeys */
> + |(dd->ipath_deviceid << 0);
> + /* our chip version as 16 bits major, 16 bits minor */
> + nodeinfo[8] = dd->ipath_minrev | (dd->ipath_majrev << 16);
> + nodeinfo[9] = (dd->ipath_unit << 24) | (dd->ipath_vendorid << 0);
> +
> + ret = count * sizeof(u32);
> +bail:
> + return ret;
> +}
> +
> +static ssize_t show_port_info(struct device *dev,
> +struct device_attribute *attr,
> +char *buf)
> +{
> + static const size_t count = 13;
> + int ret;
> + u32 tmp, tmp2;
> + struct ipath_devdata *dd = dev_get_drvdata(dev);
> + u32 *portinfo;
> +
> + if (!dd->ipath_statusp) {
> + ret = -EINVAL;
> + goto bail;
> + }
> +
> + portinfo = (u32 *) buf;
> +
> + /* so we only initialize non-zero fields. */
> + memset(portinfo, 0, count * sizeof(*portinfo));
> +
> + /*
> +  * Notimpl yet M_Key (64)
> +  * Notimpl yet GID (64)
> +  */
> +
> + portinfo[4] = (dd->ipath_lid << 16);
> +
> + /*
> +  * Notimpl yet SMLID (should we store this in the driver, in case
> +  * SMA dies?)  CapabilityMask is 0, we don't support any of these
> +  * DiagCode is 0; we don't store any diag info for now Notimpl yet
> +  * M_KeyLeasePeriod (we don't support M_Key)
> +  */
> +
> + /* LocalPortNum is whichever port number they ask for */
> + portinfo[7] = (dd->ipath_unit << 24)
> + /* LinkWidthEnabled */
> + | (2 << 16)
> + /* LinkWidthSupported (really 2, but not IB valid) */
> + | (3 << 8)
> + /* LinkWidthActive */
> + | (2 << 0);
> + tmp = dd->ipath_lastibcstat & IPATH_IBSTATE_MASK;
> + tmp2 = 5;
> + if (tmp == IPATH_IBSTATE_INIT)
> + tmp = 2;
> + else if (tmp == IPATH_IBSTATE_ARM)
> + tmp = 3;
> + else if (tmp == IPATH_IBSTATE_ACTIVE)
> + tmp = 4;
> + else {
> + tmp = 0;/* down */
> + tmp2 = tmp & 0xf;
> + }
> +
> + portinfo[8] = (1 << 28) /* LinkSpeedSupported */
> + |(tmp << 24)/* PortState */
> + |(tmp2 << 20)   /* PortPhysicalState */
> + |(2 << 16)
> +
> + /* LinkDownDefaultState */
> + /* M_KeyProtectBits == 0 */
> + /* NotImpl yet LMC == 0 (we can support all values) */
> + | (1 << 4)  /* LinkSpeedActive */
> + |(1 << 0);  /* LinkSpeedEnabled */
> + switch (dd->ipath_ibmtu) {
> + case 4096:
> + tmp = 5;
> + break;
> + case 2048:
> + tmp = 4;
> + break;
> + case 1024:
> + tmp = 3;
> + break;
> + case 512:
> + tmp = 2;
> + break;
> + case 256:
> + tmp = 1;
> + break;
> + default:/* oops, something is wrong */
> + ipath_dbg("Problem, ipath_ibmtu 0x%x not a valid IB MTU, "
> +   "treat as 2048\n", dd->ipath_ibmtu);
> + tmp = 4;
> + break;
> + }
> + portinfo[9] = (tmp << 28)
> + /* NeighborMTU */
> + /* Notimpl MasterSMSL */
> + | (1 << 20)
> +
> + /* VLCap */
> + /* Notimpl InitType (actually, an SMA decision) */
> + /* VLHighLimit is 0 (only one VL) */
> + ; /* VLArbitrationHighCap is 0 (only one VL) */
> + portinfo[10] =  /* VLArbitrationLowCap is 0 (only one VL) */
> + /* InitTypeReply is SMA decision */
> + (5 << 1

[openib-general] Re: neigh->ops destructor patch

2006-03-09 Thread Roland Dreier
I applied this to svn.  I'll hold onto it for git until DaveM's
net-2.6.17 queue is merged upstream, since it depends on the
destructor changes.

 - R.


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 04:48:45PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 16:45 -0800, Greg KH wrote:
> 
> > > We don't support hotplugged devices at the moment.
> > 
> > Why not?  Your cards can't be placed in a machine that supports PCI
> > Hotplug (or PCI-E hotplug)?
> 
> No, the driver and userspace code don't support it yet.  That's all.
> 
> > You can't really tell users that (no matter
> > how often I have wished I could...)
> 
> I don't expect this to be a practical problem.  We're planning to add
> hotplug support to the driver once we have some cycles free.

Ugh, that means it's never going to be there.

All new PCI drivers have the requirement that they work properly in
hotplug systems, as they should follow the PCI core api.  If not, odds
are they will not be accepted into the tree :(

thanks,

greg k-h


[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 03:59:37PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 15:46 -0800, Greg KH wrote:
> > On Thu, Mar 09, 2006 at 03:18:49PM -0800, Roland Dreier wrote:
> > 
> > Thanks for CC:ing me, but where were the originals of these posted?
> 
> My patch posting script screwed up.  Only Roland got them, even though
> the envelopes were all correct.
> 
> > >  > +static ssize_t show_atomic_stats(struct device_driver *dev, char *buf)
> > >  > +{
> > >  > +  memcpy(buf, &ipath_stats, sizeof(ipath_stats));
> > >  > +
> > >  > +  return sizeof(ipath_stats);
> > >  > +}
> > > 
> > > I think putting a whole binary struct in a sysfs attribute is
> > > considered a no-no.
> > 
> > That's an understatement, where is the large stick to thwap the author
> > of this code...
> 
> I'd like to understand why, though.  As I already explained, it's a
> smallish structure (< 1KB), and I can use the special binary sysfs
> attribute goo for it if you insist, but ... why?

I think I explained this in my prior post enough, right?  If not, please
let me know.

thanks,

greg k-h


[openib-general] Re: Revenge of the sysfs maintainer! (was Re: [PATCH 8 of 20] ipath - sysfs support for core driver)

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 04:46:29PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 16:35 -0800, Greg KH wrote:
> 
> > Grumble?  Oh come on, don't export binary structures through sysfs, it's
> > in the DOCUMENTATION THAT SYSFS IS FOR TEXT FILES ONLY
> 
> OK, fine.
> 
> > If you don't want to export a text file, then use something else other
> > than sysfs, it's that simple.
> 
> Use what?  Would a sysfs relay file, or whatever they're called now that
> relayfs is moving into sysfs, do the trick?  If so, what's a good place
> to pull those patches from so I can compile-test my changes?  Should I
> just grub through my archives and apply whatever Paul Mundt sent out a
> few weeks ago?

They are in the latest -mm tree if you wish to use them.  Unfortunately
it might look like they will not work out, due to the per-cpu relay
files not working properly with Paul's patches at the moment.  But I
think he's still working on them.

What's wrong with debugfs?

> > sysfs binary files are for PASS-THROUGH things ONLY!
> 
> If there's any documentation on what sysfs binary files are for, I
> haven't seen it.  It's not in the include files, the source, or
> Documentation/filesystems.  

Fair enough, you are correct.  There is a serious dearth of sysfs and
kobject documentation lately, I'll work on fixing that up.

> > Ok, here's a new rule to help this from happening again in the future:
> > 
> >   If you want to add a new sysfs file to the kernel, it MUST be
> >   accompanied with full documentation that explains exactly what that
> >   file contains and what it is for.  No exceptions will be allowed.
> 
> I'm fine with this rule, but accompanied how?  In a comment in the code?
> In the patch description?  In the same way that sysfs binary files are
> documented? :-)

Touche :)

I referred to my prior lkml post:
http://thread.gmane.org/gmane.linux.kernel/383717
which provides a structure for documenting the user<->kernel API, which
is what you are creating here.

> Also, I'd suggest that you put a similar requirement on directories and
> symlinks, if you're going to clamp down on files.

I completely agree, anything that is in sysfs falls under this
requirement.  Sorry, but I think of directories and symlinks as files,
as I've been spelunking through the vfs layer too many times :)

thanks,

greg k-h


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
Bryan> It's read outside of this file, without a lock held.

I missed the other reference in another patch.  But the central point
still stands: if all you do is atomic_set() and atomic_read(), then
using atomic_t doesn't buy you anything.  Just look at what
atomic_read() expands to -- using it isn't protecting you against
anything, so either you have a race, or you were safe without
atomic_t.  The only point to atomic_t is so that you can safely do
read-modify-write things like atomic_inc().

 - R.


[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 16:37 -0800, Andrew Morton wrote:

> We'd need to see a halfway decent description of the problem first ;)

I've been consumed with patch generation for a bit :-)

I'll try to come up with a coherent description of what we're doing and
how it's blowing up in our face tomorrow.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 16:45 -0800, Greg KH wrote:

> > We don't support hotplugged devices at the moment.
> 
> Why not?  Your cards can't be placed in a machine that supports PCI
> Hotplug (or PCI-E hotplug)?

No, the driver and userspace code don't support it yet.  That's all.

> You can't really tell users that (no matter
> how often I have wished I could...)

I don't expect this to be a practical problem.  We're planning to add
hotplug support to the driver once we have some cycles free.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 16:45 -0800, Roland Dreier wrote:

> So why is ipath_sma_alive an atomic_t (and why isn't it static)?
> You never modify ipath_sma_alive outside of your spinlock, so I don't
> see what having it be atomic buys you.

It's read outside of this file, without a lock held.



[openib-general] Re: Revenge of the sysfs maintainer! (was Re: [PATCH 8 of 20] ipath - sysfs support for core driver)

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 16:35 -0800, Greg KH wrote:

> Grumble?  Oh come on, don't export binary structures through sysfs, it's
> in the DOCUMENTATION THAT SYSFS IS FOR TEXT FILES ONLY

OK, fine.

> If you don't want to export a text file, then use something else other
> than sysfs, it's that simple.

Use what?  Would a sysfs relay file, or whatever they're called now that
relayfs is moving into sysfs, do the trick?  If so, what's a good place
to pull those patches from so I can compile-test my changes?  Should I
just grub through my archives and apply whatever Paul Mundt sent out a
few weeks ago?

> sysfs binary files are for PASS-THROUGH things ONLY!

If there's any documentation on what sysfs binary files are for, I
haven't seen it.  It's not in the include files, the source, or
Documentation/filesystems.  

> Ok, here's a new rule to help this from happening again in the future:
> 
>   If you want to add a new sysfs file to the kernel, it MUST be
>   accompanied with full documentation that explains exactly what that
>   file contains and what it is for.  No exceptions will be allowed.

I'm fine with this rule, but accompanied how?  In a comment in the code?
In the patch description?  In the same way that sysfs binary files are
documented? :-)

Also, I'd suggest that you put a similar requirement on directories and
symlinks, if you're going to clamp down on files.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 03:52:47PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 15:26 -0800, Roland Dreier wrote:
> 
> > Similarly what protects against another process opening the device
> > right after the ipath_sma_alive = 0 setting, but before you do all the
> > cleanup that's after that?
> 
> This is fixed by the stuff I just did in response to your earlier
> message.
> 
> > And what protects against a hot unplug of a device after the test of s
> > against ipath_max?
> 
> We don't support hotplugged devices at the moment.

Why not?  Your cards can't be placed in a machine that supports PCI
Hotplug (or PCI-E hotplug)?  You can't really tell users that (no matter
how often I have wished I could...)

thanks,

greg k-h


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
OK, now I can see the change you made:

 > +atomic_t ipath_sma_alive;
 > +DEFINE_SPINLOCK(ipath_sma_lock);/* SMA receive */

So why is ipath_sma_alive an atomic_t (and why isn't it static)?
You never modify ipath_sma_alive outside of your spinlock, so I don't
see what having it be atomic buys you.

 - R.


[openib-general] Re: TSO and IPoIB performance degradation

2006-03-09 Thread Michael S. Tsirkin
Quoting r. Michael S. Tsirkin <[EMAIL PROTECTED]>:
> Or does __tcp_ack_snd_check delay until we have at least two full sized
> segments?

What I'm trying to say is this: RFC 2525, section 2.13, talks about
"every second full-sized segment", so, following the code in
__tcp_ack_snd_check, why does it do

/* More than one full frame received... */
if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss

rather than

/* At least two full frames received... */
if (((tp->rcv_nxt - tp->rcv_wup) >= 2 * inet_csk(sk)->icsk_ack.rcv_mss
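
To make the difference between the two tests concrete, here is a standalone sketch. It evaluates only the byte-count part of each condition; the real check in __tcp_ack_snd_check has further clauses that are not quoted above. The function names are hypothetical. With the existing ">" test, an ACK is due as soon as more than one MSS of data has arrived since the last window update. With ">= 2 *", it waits for two full segments' worth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * rcv_nxt: next sequence number expected.
 * rcv_wup: rcv_nxt as of the last window update sent.
 * Unsigned subtraction keeps the difference correct across sequence wrap.
 */
static uint32_t unacked_bytes(uint32_t rcv_nxt, uint32_t rcv_wup)
{
	return rcv_nxt - rcv_wup;
}

/* Existing form: more than one full frame received. */
bool ack_due_more_than_one(uint32_t rcv_nxt, uint32_t rcv_wup, uint32_t rcv_mss)
{
	assert(rcv_mss > 0);
	return unacked_bytes(rcv_nxt, rcv_wup) > rcv_mss;
}

/* Proposed form: at least two full frames received. */
bool ack_due_two_full(uint32_t rcv_nxt, uint32_t rcv_wup, uint32_t rcv_mss)
{
	assert(rcv_mss > 0);
	return unacked_bytes(rcv_nxt, rcv_wup) >= 2 * rcv_mss;
}
```

For example, with rcv_mss = 1000, one full segment followed by a 1-byte segment (1001 bytes) already satisfies the first test. It does not satisfy the second, which needs 2000 bytes.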

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 16:32 -0800, Roland Dreier wrote:

> What's wrong with doing get_page()?  Surely the VM won't take pages
> that you hold a reference to.

We've tried it, but it has apparently not been enough so far.  I'll see
if I can post an oops report.



[openib-general] [PATCH 19 of 20] ipath - integrate driver into infiniband kbuild infrastructure

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 867a396dd518 -r d5a8cb977923 drivers/infiniband/Kconfig
--- a/drivers/infiniband/Kconfig   Thu Mar  9 16:17:00 2006 -0800
+++ b/drivers/infiniband/Kconfig   Thu Mar  9 16:17:14 2006 -0800
@@ -30,6 +30,7 @@ config INFINIBAND_USER_ACCESS
  .
 
 source "drivers/infiniband/hw/mthca/Kconfig"
+source "drivers/infiniband/hw/ipath/Kconfig"
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
diff -r 867a396dd518 -r d5a8cb977923 drivers/infiniband/Makefile
--- a/drivers/infiniband/Makefile   Thu Mar  9 16:17:00 2006 -0800
+++ b/drivers/infiniband/Makefile   Thu Mar  9 16:17:14 2006 -0800
@@ -1,4 +1,5 @@ obj-$(CONFIG_INFINIBAND)+= core/
 obj-$(CONFIG_INFINIBAND)   += core/
 obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/
+obj-$(CONFIG_IPATH_CORE)   += hw/ipath/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
 obj-$(CONFIG_INFINIBAND_SRP)   += ulp/srp/


[openib-general] [PATCH 18 of 20] ipath - kbuild infrastructure

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 1c88f73c2ac0 -r 867a396dd518 drivers/infiniband/hw/ipath/Kconfig
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/Kconfig   Thu Mar  9 16:17:00 2006 -0800
@@ -0,0 +1,18 @@
+config IPATH_CORE
+   tristate "PathScale InfiniPath Driver"
+   depends on 64BIT && (PCIEPORTBUS || X86_HT)
+   ---help---
+   This is a low-level driver for PathScale InfiniPath host channel
+   adapters (HCAs) based on the HT-400 and PE-800 chips, including
+   the InfiniPath HT-460, the small form factor InfiniPath HT-460,
+   the InfiniPath HT-470 and the Linux Networx LS/X.
+
+config INFINIBAND_IPATH
+   tristate "PathScale InfiniPath Verbs Driver"
+   depends on IPATH_CORE && INFINIBAND
+   ---help---
+   This is a driver that provides InfiniBand verbs support for
+   PathScale InfiniPath host channel adapters (HCAs).  This
+   allows these devices to be used with both kernel upper level
+   protocols such as IP-over-InfiniBand as well as with userspace
+   applications (in conjunction with InfiniBand userspace access).
diff -r 1c88f73c2ac0 -r 867a396dd518 drivers/infiniband/hw/ipath/Makefile
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/Makefile  Thu Mar  9 16:17:00 2006 -0800
@@ -0,0 +1,42 @@
+EXTRA_CFLAGS += -O3
+
+_ipath_idstr:="PathScale $(shell date +%F)"
+EXTRA_CFLAGS += -DIPATH_IDSTR='$(_ipath_idstr)' -DIPATH_KERN_TYPE=0
+
+obj-$(CONFIG_IPATH_CORE) += ipath_core.o
+obj-$(CONFIG_INFINIBAND_IPATH) += ib_ipath.o
+obj-$(CONFIG_IPATH_ETHER) += ipath_ether.o
+
+ipath_core-y := \
+   ipath_copy.o \
+   ipath_diag.o \
+   ipath_driver.o \
+   ipath_eeprom.o \
+   ipath_file_ops.o \
+   ipath_ht400.o \
+   ipath_init_chip.o \
+   ipath_intr.o \
+   ipath_layer.o \
+   ipath_pe800.o \
+   ipath_sma.o \
+   ipath_stats.o \
+   ipath_sysfs.o \
+   ipath_user_pages.o
+
+ipath_core-$(CONFIG_X86_64) += ipath_wc_x86_64.o
+
+ib_ipath-y := \
+   ipath_cq.o \
+   ipath_keys.o \
+   ipath_mad.o \
+   ipath_mr.o \
+   ipath_qp.o \
+   ipath_rc.o \
+   ipath_ruc.o \
+   ipath_srq.o \
+   ipath_uc.o \
+   ipath_ud.o \
+   ipath_verbs.o \
+   ipath_verbs_mcast.o
+
+ipath_ether-y := ipath_eth.o


[openib-general] [PATCH 14 of 20] ipath - infiniband RC protocol support

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r e6b07a6f5a64 -r 70e3edb0d82d drivers/infiniband/hw/ipath/ipath_rc.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c   Thu Mar  9 16:16:37 2006 -0800
@@ -0,0 +1,1753 @@
+/*
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "ipath_verbs.h"
+#include "ips_common.h"
+
+/**
+ * ipath_init_restart - initialize the qp->s_sge after a restart
+ * @qp: the QP whose SGE we're restarting
+ * @wqe: the work queue entry to initialize the QP's SGE from
+ *
+ * The QP s_lock should be held.
+ */
+static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe)
+{
+   struct ipath_ibdev *dev;
+   u32 len;
+
+   len = ((qp->s_psn - wqe->psn) & IPS_PSN_MASK) *
+   ib_mtu_enum_to_int(qp->path_mtu);
+   qp->s_sge.sge = wqe->sg_list[0];
+   qp->s_sge.sg_list = wqe->sg_list + 1;
+   qp->s_sge.num_sge = wqe->wr.num_sge;
+   ipath_skip_sge(&qp->s_sge, len);
+   qp->s_len = wqe->length - len;
+   dev = to_idev(qp->ibqp.device);
+   spin_lock(&dev->pending_lock);
+   if (qp->timerwait.next == LIST_POISON1)
+   list_add_tail(&qp->timerwait,
+ &dev->pending[dev->pending_index]);
+   spin_unlock(&dev->pending_lock);
+}
+
+/**
+ * ipath_do_rc_send - perform a send on an RC QP
+ * @data: contains a pointer to the QP
+ *
+ * Process entries in the send work queue until credit or queue is
+ * exhausted.  Only allow one CPU to send a packet per QP (tasklet).
+ * Otherwise, after we drop the QP s_lock, two threads could send
+ * packets out of order.
+ */
+void ipath_do_rc_send(unsigned long data)
+{
+   struct ipath_qp *qp = (struct ipath_qp *)data;
+   struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
+   struct ipath_swqe *wqe;
+   struct ipath_sge_state *ss;
+   unsigned long flags;
+   u16 lrh0;
+   u32 hwords;
+   u32 nwords;
+   u32 extra_bytes;
+   u32 bth0;
+   u32 bth2;
+   u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
+   u32 len;
+   struct ipath_other_headers *ohdr;
+   char newreq;
+
+   if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))
+   return;
+
+   if (unlikely(qp->remote_ah_attr.dlid ==
+ipath_layer_get_lid(dev->dd))) {
+   struct ib_wc wc;
+
+   /*
+* Pass in an uninitialized ib_wc to be consistent with
+* other places where ipath_ruc_loopback() is called.
+*/
+   ipath_ruc_loopback(qp, &wc);
+   clear_bit(IPATH_S_BUSY, &qp->s_flags);
+   return;
+   }
+
+   ohdr = &qp->s_hdr.u.oth;
+   if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
+   ohdr = &qp->s_hdr.u.l.oth;
+
+again:
+   /* Check for a constructed packet to be sent. */
+   if (qp->s_hdrwords != 0) {
+   /*
+* If no PIO bufs are available, return.
+* An interrupt will call ipath_ib_piobufavail()
+* when one is available.
+*/
+   if (ipath_verbs_send(dev->dd, qp->s_hdrwords,
+(u32 *) &qp->s_hdr,
+qp->s_cur_size,
+qp->s_cur_sge)) {
+   ipath_no_bufs_available(qp, dev);
+   return;
+   }
+
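
ipath_init_restart() finds where to resume a work request by multiplying the PSN distance by the path MTU. InfiniBand PSNs are 24 bits wide, so the subtraction must be masked to survive wraparound. Below is a standalone sketch of that arithmetic. It assumes IPS_PSN_MASK is 0xFFFFFF, since ips_common.h is not part of this excerpt, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 24-bit PSN space, as defined by InfiniBand. */
#define DEMO_PSN_MASK 0xFFFFFFu

/* Packets between first_psn and cur_psn, modulo 2^24. */
static uint32_t demo_psn_delta(uint32_t cur_psn, uint32_t first_psn)
{
	return (cur_psn - first_psn) & DEMO_PSN_MASK;
}

/* Byte offset to resume at, given that every earlier packet carried a full MTU. */
uint32_t demo_restart_offset(uint32_t cur_psn, uint32_t first_psn, uint32_t pmtu)
{
	assert(cur_psn <= DEMO_PSN_MASK && first_psn <= DEMO_PSN_MASK);
	return demo_psn_delta(cur_psn, first_psn) * pmtu;
}
```

Without the mask, the wrapped case (first PSN 0xFFFFFE, current PSN 1) would yield a huge 32-bit difference instead of 3 packets.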

[openib-general] [PATCH 13 of 20] ipath - infiniband UC and UD protocol support

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 3d2a9e8f845c -r e6b07a6f5a64 drivers/infiniband/hw/ipath/ipath_uc.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c   Thu Mar  9 16:16:31 2006 -0800
@@ -0,0 +1,625 @@
+/*
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "ipath_verbs.h"
+#include "ips_common.h"
+
+/**
+ * ipath_do_uc_send - do a send on a UC queue
+ * @data: contains a pointer to the QP to send on
+ *
+ * Process entries in the send work queue until the queue is exhausted.
+ * Only allow one CPU to send a packet per QP (tasklet).
+ * Otherwise, after we drop the QP lock, two threads could send
+ * packets out of order.
+ * This is similar to ipath_do_rc_send() in ipath_rc.c, except that we
+ * don't have timeouts or resends.
+ */
+void ipath_do_uc_send(unsigned long data)
+{
+   struct ipath_qp *qp = (struct ipath_qp *)data;
+   struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
+   struct ipath_swqe *wqe;
+   unsigned long flags;
+   u16 lrh0;
+   u32 hwords;
+   u32 nwords;
+   u32 extra_bytes;
+   u32 bth0;
+   u32 bth2;
+   u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
+   u32 len;
+   struct ipath_other_headers *ohdr;
+   struct ib_wc wc;
+
+   if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))
+   return;
+
+   if (unlikely(qp->remote_ah_attr.dlid ==
+ipath_layer_get_lid(dev->dd))) {
+   /* Pass in an uninitialized ib_wc to save stack space. */
+   ipath_ruc_loopback(qp, &wc);
+   clear_bit(IPATH_S_BUSY, &qp->s_flags);
+   return;
+   }
+
+   ohdr = &qp->s_hdr.u.oth;
+   if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
+   ohdr = &qp->s_hdr.u.l.oth;
+
+again:
+   /* Check for a constructed packet to be sent. */
+   if (qp->s_hdrwords != 0) {
+   /*
+* If no PIO bufs are available, return.
+* An interrupt will call ipath_ib_piobufavail()
+* when one is available.
+*/
+   if (ipath_verbs_send(dev->dd, qp->s_hdrwords,
+(u32 *) &qp->s_hdr,
+qp->s_cur_size,
+qp->s_cur_sge)) {
+   ipath_no_bufs_available(qp, dev);
+   return;
+   }
+   dev->n_unicast_xmit++;
+   /* Record that we sent the packet and s_hdr is empty. */
+   qp->s_hdrwords = 0;
+   }
+
+   lrh0 = IPS_LRH_BTH;
+   /* header size in 32-bit words LRH+BTH = (8+12)/4. */
+   hwords = 5;
+
+   /*
+* The lock is needed to synchronize between
+* setting qp->s_ack_state and post_send().
+*/
+   spin_lock_irqsave(&qp->s_lock, flags);
+
+   if (!(ib_ipath_state_ops[qp->state] & IPATH_PROCESS_SEND_OK))
+   goto done;
+
+   bth0 = ipath_layer_get_pkey(dev->dd, qp->s_pkey_index);
+
+   /* Send a request. */
+   wqe = get_swqe_ptr(qp, qp->s_last);
+   switch (qp->s_state) {
+   default:
+   /*
+* Signal the completion of the last send (if there is
+* one).
+*/
+   if (qp->s_last != qp->s_tail) {
+   if (++qp->s_last == qp->s_size)
+   qp->s_last = 0;
+   
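
The hard-coded hwords = 5 in ipath_do_uc_send() comes from the LRH (8 bytes) plus the BTH (12 bytes), counted in 32-bit words. A sketch of the same arithmetic follows. It also covers the case where a 40-byte GRH is present. The header sizes are the standard InfiniBand ones rather than values read from this driver, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wire sizes of InfiniBand headers, in bytes. */
enum {
	DEMO_LRH_BYTES = 8,	/* local route header */
	DEMO_GRH_BYTES = 40,	/* global route header, only when routed */
	DEMO_BTH_BYTES = 12,	/* base transport header */
};

/* Each header is a whole number of 32-bit words, so division is exact. */
static_assert((DEMO_LRH_BYTES + DEMO_BTH_BYTES) % 4 == 0, "LRH+BTH not word sized");
static_assert(DEMO_GRH_BYTES % 4 == 0, "GRH not word sized");

/* Header length in 32-bit words, the unit hwords is kept in. */
uint32_t demo_hwords(int with_grh)
{
	uint32_t bytes = DEMO_LRH_BYTES + DEMO_BTH_BYTES;

	if (with_grh)
		bytes += DEMO_GRH_BYTES;
	return bytes / 4;
}
```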

[openib-general] [PATCH 15 of 20] ipath - misc infiniband code, part 1

2006-03-09 Thread Bryan O'Sullivan
Completion queues, local and remote memory keys, and memory region
support.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 70e3edb0d82d -r 44cd07539d66 drivers/infiniband/hw/ipath/ipath_cq.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c   Thu Mar  9 16:16:44 2006 -0800
@@ -0,0 +1,277 @@
+/*
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "ipath_verbs.h"
+
+/**
+ * ipath_cq_enter - add a new entry to the completion queue
+ * @cq: completion queue
+ * @entry: work completion entry to add
+ * @solicited: true if @entry is a solicited entry
+ *
+ * This may be called with either qp->s_lock or qp->r_rq.lock held.
+ */
+void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int solicited)
+{
+   unsigned long flags;
+   u32 next;
+
+   spin_lock_irqsave(&cq->lock, flags);
+
+   if (cq->head == cq->ibcq.cqe)
+   next = 0;
+   else
+   next = cq->head + 1;
+   if (unlikely(next == cq->tail)) {
+   spin_unlock_irqrestore(&cq->lock, flags);
+   if (cq->ibcq.event_handler) {
+   struct ib_event ev;
+
+   ev.device = cq->ibcq.device;
+   ev.element.cq = &cq->ibcq;
+   ev.event = IB_EVENT_CQ_ERR;
+   cq->ibcq.event_handler(&ev, cq->ibcq.cq_context);
+   }
+   return;
+   }
+   cq->queue[cq->head] = *entry;
+   cq->head = next;
+
+   if (cq->notify == IB_CQ_NEXT_COMP ||
+   (cq->notify == IB_CQ_SOLICITED && solicited)) {
+   cq->notify = IB_CQ_NONE;
+   cq->triggered++;
+   /*
+* This will cause send_complete() to be called in
+* another thread.
+*/
+   tasklet_hi_schedule(&cq->comptask);
+   }
+
+   spin_unlock_irqrestore(&cq->lock, flags);
+
+   if (entry->status != IB_WC_SUCCESS)
+   to_idev(cq->ibcq.device)->n_wqe_errs++;
+}
+
+/**
+ * ipath_poll_cq - poll for work completion entries
+ * @ibcq: the completion queue to poll
+ * @num_entries: the maximum number of entries to return
+ * @entry: pointer to array where work completions are placed
+ *
+ * Returns the number of completion entries polled.
+ *
+ * This may be called from interrupt context.  Also called by ib_poll_cq()
+ * in the generic verbs code.
+ */
+int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
+{
+   struct ipath_cq *cq = to_icq(ibcq);
+   unsigned long flags;
+   int npolled;
+
+   spin_lock_irqsave(&cq->lock, flags);
+
+   for (npolled = 0; npolled < num_entries; ++npolled, ++entry) {
+   if (cq->tail == cq->head)
+   break;
+   *entry = cq->queue[cq->tail];
+   if (cq->tail == cq->ibcq.cqe)
+   cq->tail = 0;
+   else
+   cq->tail++;
+   }
+
+   spin_unlock_irqrestore(&cq->lock, flags);
+
+   return npolled;
+}
+
+static void send_complete(unsigned long data)
+{
+   struct ipath_cq *cq = (struct ipath_cq *)data;
+
+   /*
+* The completion handler will most likely rearm the notification
+* and poll for all pending entries.  If a new completion entry
+* is added while we are in this routine, tasklet_hi_schedule()
+* won't call us again until we return so we check triggered to
+* see if we need to call the handler again
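
ipath_cq_enter() and ipath_poll_cq() treat cq->queue as a ring whose indices run from 0 to ibcq.cqe inclusive, which gives cqe + 1 slots. One slot is always left free so that head == tail means empty. The standalone sketch below covers only that index logic. It uses a hypothetical fixed capacity and int payloads in place of struct ib_wc, and omits the locking and notification handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CQE 3	/* hypothetical capacity, like ibcq.cqe */

struct demo_cq {
	uint32_t head;			/* next slot to fill */
	uint32_t tail;			/* next slot to poll */
	int queue[DEMO_CQE + 1];	/* one slot always stays empty */
};

/* Same wrap rule as the driver: index cqe wraps to 0. */
static uint32_t demo_next(uint32_t i)
{
	return i == DEMO_CQE ? 0 : i + 1;
}

/* Returns false on overflow, where the driver raises IB_EVENT_CQ_ERR. */
bool demo_cq_enter(struct demo_cq *cq, int entry)
{
	uint32_t next = demo_next(cq->head);

	if (next == cq->tail)
		return false;
	cq->queue[cq->head] = entry;
	cq->head = next;
	return true;
}

/* Copies up to num_entries completions into out, oldest first. */
int demo_poll_cq(struct demo_cq *cq, int num_entries, int *out)
{
	int npolled;

	assert(num_entries >= 0);
	for (npolled = 0; npolled < num_entries; npolled++) {
		if (cq->tail == cq->head)
			break;
		out[npolled] = cq->queue[cq->tail];
		cq->tail = demo_next(cq->tail);
	}
	return npolled;
}
```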

[openib-general] [PATCH 17 of 20] ipath - infiniband verbs support

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r f57c24166c57 -r 1c88f73c2ac0 drivers/infiniband/hw/ipath/ipath_verbs.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Thu Mar  9 16:16:58 2006 -0800
@@ -0,0 +1,1195 @@
+/*
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ipath_kernel.h"
+#include "ipath_verbs.h"
+#include "ips_common.h"
+
+/* Not static, because we don't want the compiler removing it */
+const char ipath_verbs_version[] = "ipath_verbs " IPATH_IDSTR;
+
+unsigned int ib_ipath_qp_table_size = 251;
+module_param_named(qp_table_size, ib_ipath_qp_table_size, uint, S_IRUGO);
+MODULE_PARM_DESC(qp_table_size, "QP table size");
+
+unsigned int ib_ipath_lkey_table_size = 12;
+module_param_named(lkey_table_size, ib_ipath_lkey_table_size, uint,
+  S_IRUGO);
+MODULE_PARM_DESC(lkey_table_size,
+"LKEY table size in bits (2^n, 1 <= n <= 23)");
+
+unsigned int ib_ipath_debug;   /* debug mask */
+module_param_named(debug, ib_ipath_debug, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(debug, "Verbs debug mask");
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("PathScale <[EMAIL PROTECTED]>");
+MODULE_DESCRIPTION("PathScale InfiniPath driver");
+
+const int ib_ipath_state_ops[IB_QPS_ERR + 1] = {
+   [IB_QPS_RESET] = 0,
+   [IB_QPS_INIT] = IPATH_POST_RECV_OK,
+   [IB_QPS_RTR] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK,
+   [IB_QPS_RTS] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK |
+   IPATH_POST_SEND_OK | IPATH_PROCESS_SEND_OK,
+   [IB_QPS_SQD] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK |
+   IPATH_POST_SEND_OK,
+   [IB_QPS_SQE] = IPATH_POST_RECV_OK | IPATH_PROCESS_RECV_OK,
+   [IB_QPS_ERR] = 0,
+};
+
+/*
+ * Translate ib_wr_opcode into ib_wc_opcode.
+ */
+const enum ib_wc_opcode ib_ipath_wc_opcode[] = {
+   [IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE,
+   [IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE,
+   [IB_WR_SEND] = IB_WC_SEND,
+   [IB_WR_SEND_WITH_IMM] = IB_WC_SEND,
+   [IB_WR_RDMA_READ] = IB_WC_RDMA_READ,
+   [IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP,
+   [IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD
+};
+
+/*
+ * Array of device pointers.
+ */
+static u32 number_of_devices;
+static struct ipath_ibdev **ipath_devices;
+
+/**
+ * ipath_copy_sge - copy data to SGE memory
+ * @ss: the SGE state
+ * @data: the data to copy
+ * @length: the length of the data
+ */
+void ipath_copy_sge(struct ipath_sge_state *ss, void *data, u32 length)
+{
+   struct ipath_sge *sge = &ss->sge;
+
+   while (length) {
+   u32 len = sge->length;
+
+   BUG_ON(len == 0);
+   if (len > length)
+   len = length;
+   memcpy(sge->vaddr, data, len);
+   sge->vaddr += len;
+   sge->length -= len;
+   sge->sge_length -= len;
+   if (sge->sge_length == 0) {
+   if (--ss->num_sge)
+   *sge = *ss->sg_list++;
+   } else if (sge->length == 0 && sge->mr != NULL) {
+   if (++sge->n >= IPATH_SEGSZ) {
+   if (++sge->m >= sge->mr->mapsz)
+   break;
+   sge->n = 0;
+   }
+   sge->vaddr =
+   sge->mr->map[sge->m]->segs[sge->n].vaddr;
+   sge->length =
+   sge->mr-
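
ipath_copy_sge() scatters a contiguous buffer across a work request's scatter/gather list, moving on to the next SGE when the current one is exhausted. The following is a simplified sketch of that core loop. It assumes each SGE is one flat buffer and ignores the memory-region segment map handled by the sge->mr branch. The types and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins: one flat buffer per SGE, no MR segment map. */
struct demo_sge {
	char *vaddr;
	uint32_t length;		/* bytes left in this SGE */
};

struct demo_sge_state {
	struct demo_sge sge;		/* current SGE */
	const struct demo_sge *sg_list;	/* SGEs after the current one */
	uint32_t num_sge;		/* SGEs left, including the current one */
};

/* Scatter length bytes from data across the SGE list, in order. */
void demo_copy_sge(struct demo_sge_state *ss, const void *data, uint32_t length)
{
	const char *src = data;

	while (length) {
		uint32_t len = ss->sge.length;

		assert(len != 0);	/* the driver uses BUG_ON() here */
		if (len > length)
			len = length;
		memcpy(ss->sge.vaddr, src, len);
		ss->sge.vaddr += len;
		ss->sge.length -= len;
		src += len;
		length -= len;
		/* Current SGE exhausted: advance to the next one, if any. */
		if (ss->sge.length == 0 && --ss->num_sge)
			ss->sge = *ss->sg_list++;
	}
}
```

The state is initialized the same way ipath_init_restart() sets up qp->s_sge: the first SGE is copied into sge, sg_list points at the second, and num_sge holds the total.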

[openib-general] [PATCH 12 of 20] ipath - infiniband header files

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r f0b2f6d58480 -r 3d2a9e8f845c drivers/infiniband/hw/ipath/ipath_verbs.h
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Thu Mar  9 16:16:24 2006 -0800
@@ -0,0 +1,684 @@
+/*
+ * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IPATH_VERBS_H
+#define IPATH_VERBS_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ipath_layer.h"
+#include "verbs_debug.h"
+
+#define QPN_MAX (1 << 24)
+#define QPNMAP_ENTRIES  (QPN_MAX / PAGE_SIZE / BITS_PER_BYTE)
+
+/*
+ * Increment this value if any changes that break userspace ABI
+ * compatibility are made.
+ */
+#define IPATH_UVERBS_ABI_VERSION   1
+
+/*
+ * Define an ib_cq_notify value that is not valid so we know when CQ
+ * notifications are armed.
+ */
+#define IB_CQ_NONE (IB_CQ_NEXT_COMP + 1)
+
+#define IB_RNR_NAK 0x20
+#define IB_NAK_PSN_ERROR   0x60
+#define IB_NAK_INVALID_REQUEST 0x61
+#define IB_NAK_REMOTE_ACCESS_ERROR 0x62
+#define IB_NAK_REMOTE_OPERATIONAL_ERROR 0x63
+#define IB_NAK_INVALID_RD_REQUEST  0x64
+
+#define IPATH_POST_SEND_OK 0x01
+#define IPATH_POST_RECV_OK 0x02
+#define IPATH_PROCESS_RECV_OK  0x04
+#define IPATH_PROCESS_SEND_OK  0x08
+
+/* IB Performance Manager status values */
+#define IB_PMA_SAMPLE_STATUS_DONE  0x00
+#define IB_PMA_SAMPLE_STATUS_STARTED   0x01
+#define IB_PMA_SAMPLE_STATUS_RUNNING   0x02
+
+/* Mandatory IB performance counter select values. */
+#define IB_PMA_PORT_XMIT_DATA  __constant_htons(0x0001)
+#define IB_PMA_PORT_RCV_DATA   __constant_htons(0x0002)
+#define IB_PMA_PORT_XMIT_PKTS  __constant_htons(0x0003)
+#define IB_PMA_PORT_RCV_PKTS   __constant_htons(0x0004)
+#define IB_PMA_PORT_XMIT_WAIT  __constant_htons(0x0005)
+
+struct ib_reth {
+   __be64 vaddr;
+   __be32 rkey;
+   __be32 length;
+} __attribute__ ((packed));
+
+struct ib_atomic_eth {
+   __be64 vaddr;
+   __be32 rkey;
+   __be64 swap_data;
+   __be64 compare_data;
+} __attribute__ ((packed));
+
+struct ipath_other_headers {
+   __be32 bth[3];
+   union {
+   struct {
+   __be32 deth[2];
+   __be32 imm_data;
+   } ud;
+   struct {
+   struct ib_reth reth;
+   __be32 imm_data;
+   } rc;
+   struct {
+   __be32 aeth;
+   __be64 atomic_ack_eth;
+   } at;
+   __be32 imm_data;
+   __be32 aeth;
+   struct ib_atomic_eth atomic_eth;
+   } u;
+} __attribute__ ((packed));
+
+/*
+ * Note that UD packets with a GRH header are 8+40+12+8 = 68 bytes long
+ * (72 w/ imm_data).
+ * Only the first 56 bytes of the IB header will be in the
+ * eager header buffer.  The remaining 12 or 16 bytes are in the data buffer.
+ */
+struct ipath_ib_header {
+   __be16 lrh[4];
+   union {
+   struct {
+   struct ib_grh grh;
+   struct ipath_other_headers oth;
+   } l;
+   struct ipath_other_headers oth;
+   } u;
+} __attribute__ ((packed));
+
+/*
+ * There is one struct ipath_mcast for each multicast GID.
+ * All attached QPs are then stored as a list of
+ * struct ipath_mcast_qp.
+ */
+struct ipath_mcast_qp {
+   struct list_head list;
+   struct ipath_qp *qp;
+};
+
+struct ipath_mcast {
+   struct 
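
The __attribute__ ((packed)) on these header structs matters because the atomic extended transport header places a 64-bit field directly after a 32-bit rkey. Below is a small check of the layout. It uses host-order uint*_t stand-ins for the __be types, and the names are hypothetical. On common 64-bit ABIs the unpacked variant picks up 4 bytes of padding after rkey, so it no longer matches the 28-byte wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order stand-ins for struct ib_reth and struct ib_atomic_eth. */
struct demo_reth {
	uint64_t vaddr;
	uint32_t rkey;
	uint32_t length;
} __attribute__ ((packed));

struct demo_atomic_eth {
	uint64_t vaddr;
	uint32_t rkey;
	uint64_t swap_data;
	uint64_t compare_data;
} __attribute__ ((packed));

/* Same fields without packing, for comparison only. */
struct demo_atomic_eth_unpacked {
	uint64_t vaddr;
	uint32_t rkey;
	uint64_t swap_data;
	uint64_t compare_data;
};

static_assert(sizeof(struct demo_reth) == 16, "RETH is 16 bytes");
static_assert(sizeof(struct demo_atomic_eth) == 28, "AtomicETH is 28 bytes");
/* On the wire, swap_data starts immediately after rkey. */
static_assert(offsetof(struct demo_atomic_eth, swap_data) == 12,
	      "no padding after rkey");
```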

[openib-general] [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r a9ed49ad489c -r 1123028ac13a drivers/infiniband/hw/ipath/ipath_sysfs.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Thu Mar  9 16:15:57 2006 -0800
@@ -0,0 +1,950 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include "ipath_kernel.h"
+#include "ips_common.h"
+#include "ipath_layer.h"
+
+/**
+ * ipath_parse_ushort - parse an unsigned short value in an arbitrary base
+ * @str: the string containing the number
+ * @valp: where to put the result
+ *
+ * returns the number of bytes consumed, or negative value on error
+ */
+int ipath_parse_ushort(const char *str, unsigned short *valp)
+{
+	unsigned long val;
+	char *end;
+	int ret;
+
+	if (!isdigit(str[0]))
+		return -EINVAL;
+
+	val = simple_strtoul(str, &end, 0);
+
+	if (val > 0xffff)
+		return -EINVAL;
+
+	*valp = val;
+
+	ret = end + 1 - str;
+	if (ret == 0)
+		ret = -EINVAL;
+
+	return ret;
+}
+
+static ssize_t show_version(struct device_driver *dev, char *buf)
+{
+   /* The string printed here is already newline-terminated. */
+   return scnprintf(buf, PAGE_SIZE, "%s", ipath_core_version);
+}
+
+static ssize_t show_num_units(struct device_driver *dev, char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%d\n",
+			 ipath_count_units(NULL, NULL, NULL));
+}
+
+#define DRIVER_STAT(name, attr) \
+	static ssize_t show_stat_##name(struct device_driver *dev, \
+					char *buf) \
+	{ \
+		return scnprintf( \
+			buf, PAGE_SIZE, "%llu\n", \
+			(unsigned long long) ipath_stats.sps_ ##attr); \
+	} \
+	static DRIVER_ATTR(name, S_IRUGO, show_stat_##name, NULL)
+
+DRIVER_STAT(intrs, ints);
+DRIVER_STAT(err_intrs, errints);
+DRIVER_STAT(errs, errs);
+DRIVER_STAT(pkt_errs, pkterrs);
+DRIVER_STAT(crc_errs, crcerrs);
+DRIVER_STAT(hw_errs, hwerrs);
+DRIVER_STAT(ib_link, iblink);
+DRIVER_STAT(port0_pkts, port0pkts);
+DRIVER_STAT(ether_spkts, ether_spkts);
+DRIVER_STAT(ether_rpkts, ether_rpkts);
+DRIVER_STAT(sma_spkts, sma_spkts);
+DRIVER_STAT(sma_rpkts, sma_rpkts);
+DRIVER_STAT(hdrq_full, hdrqfull);
+DRIVER_STAT(etid_full, etidfull);
+DRIVER_STAT(no_piobufs, nopiobufs);
+DRIVER_STAT(ports, ports);
+DRIVER_STAT(pkey0, pkeys[0]);
+DRIVER_STAT(pkey1, pkeys[1]);
+DRIVER_STAT(pkey2, pkeys[2]);
+DRIVER_STAT(pkey3, pkeys[3]);
+/* XXX fix the following when dynamic table of devices used */
+DRIVER_STAT(lid0, lid[0]);
+DRIVER_STAT(lid1, lid[1]);
+DRIVER_STAT(lid2, lid[2]);
+DRIVER_STAT(lid3, lid[3]);
+
+DRIVER_STAT(nports, nports);
+DRIVER_STAT(null_intr, nullintr);
+DRIVER_STAT(max_pkts_call, maxpkts_call);
+DRIVER_STAT(avg_pkts_call, avgpkts_call);
+DRIVER_STAT(page_locks, pagelocks);
+DRIVER_STAT(page_unlocks, pageunlocks);
+DRIVER_STAT(krdrops, krdrops);
+/* XXX fix the following when dynamic table of devices used */
+DRIVER_STAT(mlid0, mlid[0]);
+DRIVER_STAT(mlid1, mlid[1]);
+DRIVER_STAT(mlid2, mlid[2]);
+DRIVER_STAT(mlid3, mlid[3]);
+
+static struct attribute *driver_stat_attributes[] = {
+   &driver_attr_intrs.attr,
+   &driver_attr_err_intrs.attr,
+   &driver_attr_errs.attr,
+   &driver_attr_pkt_errs.attr,
+   &driver_attr_crc_errs.attr,
+   &driver_attr_hw_errs.attr,
+   &driver_attr_ib_link.attr,
+   &driver_attr_port0_pkts.attr,
+   &driver_attr_ether_spkts.att
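
The ipath_parse_ushort() helper in the patch above can be tried outside the kernel. Below is a minimal userspace sketch, assuming strtoul() as a stand-in for simple_strtoul(); parse_ushort_sketch() is a hypothetical name. One difference is deliberate: the patch returns end + 1 - str, which appears to count one extra character (perhaps the newline that sysfs writes usually carry) and means its ret == 0 check can never fire. The sketch returns the exact number of characters parsed.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a number in any base strtoul() accepts with base 0
 * (0x.. hex, leading-0 octal, decimal), reject values that do not
 * fit in 16 bits, and return how many characters were consumed.
 */
static int parse_ushort_sketch(const char *str, unsigned short *valp)
{
	unsigned long val;
	char *end;

	/* Like the patch: the string must start with a digit. */
	if (!isdigit((unsigned char)str[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(str, &end, 0);
	if (errno != 0 || val > 0xffff)
		return -EINVAL;

	*valp = (unsigned short)val;
	/* Characters parsed, not counting any trailing newline. */
	return (int)(end - str);
}
```

With base 0, "0x1f\n" parses as hex (31, four characters consumed) and "010" as octal (8), matching the base-0 behaviour the kernel helper relies on.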

[openib-general] [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
The ipath_diag.c file permits userspace diagnostic tools to read and
write a chip's registers.  It is different in purpose from the mmap
interfaces to the /sys/bus/pci resource files.

The ipath_sma.c file supports a lightweight userspace subnet management
agent (SMA).  This is used in deployments (such as HPC clusters) where
a full InfiniBand protocol stack is not needed.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 1123028ac13a -r 28bb276205de drivers/infiniband/hw/ipath/ipath_diag.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c  Thu Mar  9 16:16:04 2006 -0800
@@ -0,0 +1,377 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer in the documentation and/or other materials
+ *      provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file contains support for diagnostic functions.  It is accessed by
+ * opening the ipath_diag device, normally minor number 129.  Diagnostic use
+ * of the InfiniPath chip may render the chip or board unusable until the
+ * driver is unloaded, or in some cases, until the system is rebooted.
+ *
+ * Accesses to the chip through this interface are not equivalent to
+ * accesses through the /sys/bus/pci resource mmap interface.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "ipath_common.h"
+#include "ipath_kernel.h"
+#include "ips_common.h"
+#include "ipath_layer.h"
+
+__kernel_pid_t ipath_diag_alive;   /* PID of diags, if running */
+static int diag_set_link;
+
+static int ipath_diag_open(struct inode *in, struct file *fp);
+static int ipath_diag_release(struct inode *in, struct file *fp);
+static ssize_t ipath_diag_read(struct file *fp, char __user *data,
+			       size_t count, loff_t *off);
+static ssize_t ipath_diag_write(struct file *fp, const char __user *data,
+				size_t count, loff_t *off);
+
+static struct file_operations diag_file_ops = {
+   .owner = THIS_MODULE,
+   .write = ipath_diag_write,
+   .read = ipath_diag_read,
+   .open = ipath_diag_open,
+   .release = ipath_diag_release
+};
+
+static struct cdev *diag_cdev;
+static struct class_device *diag_class_dev;
+
+int ipath_diag_init(void)
+{
+	return ipath_cdev_init(IPATH_DIAG_MINOR, "ipath_diag",
+			       &diag_file_ops, &diag_cdev, &diag_class_dev);
+}
+
+void ipath_diag_cleanup(void)
+{
+   ipath_cdev_cleanup(&diag_cdev, &diag_class_dev);
+}
+
+/**
+ * ipath_read_umem64 - read a 64-bit quantity from the chip into user space
+ * @dd: the infinipath device
+ * @uaddr: the location to store the data in user memory
+ * @caddr: the source chip address (full pointer, not offset)
+ * @count: number of bytes to copy (multiple of 32 bits)
+ *
+ * This function also localizes all chip memory accesses.
+ * The copy should be written such that we read full cacheline packets
+ * from the chip.  This is usually used for a single qword.
+ *
+ * NOTE:  This assumes the chip address is 64-bit aligned.
+ */
+static int ipath_read_umem64(struct ipath_devdata *dd, void __user *uaddr,
+			     const void __iomem *caddr, size_t count)
+{
+	const u64 __iomem *reg_addr = caddr;
+	const u64 __iomem *reg_end = reg_addr + (count / sizeof(u64));
+	int ret;
+
+	/* not very efficient, but it works for now */
+	if (reg_addr < dd->ipath_kregbase ||
+	    reg_end > dd->ipath_kregend) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	while (reg_addr < reg_end) {
+		u64 data = readq(reg_addr);
+		if (copy_to_user(uaddr, &data, sizeof(u64))) {
+ 
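
The bounds check and per-qword copy in ipath_read_umem64() can be modelled in userspace. In the sketch below, a plain array stands in for the mapped register window, a direct load for readq(), and memcpy() for copy_to_user(); struct fake_regs and read_regs64_sketch() are hypothetical names. Note that the kernel-doc above asks for a count that is a multiple of 32 bits, while the loop derives reg_end from count / sizeof(u64), so a count such as 12 would copy only 8 bytes. The sketch rejects such counts instead; that is an assumption, not the driver's behaviour.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the dd->ipath_kregbase .. dd->ipath_kregend window. */
struct fake_regs {
	const uint64_t *base;	/* first register */
	const uint64_t *end;	/* one past the last register */
};

static int read_regs64_sketch(const struct fake_regs *r, void *out,
			      const uint64_t *addr, size_t count)
{
	unsigned char *dst = out;
	size_t nwords = count / sizeof(uint64_t);
	size_t i;

	/* Assumption: insist on whole 64-bit words. */
	if (count % sizeof(uint64_t) != 0)
		return -EINVAL;
	/* Refuse any range that starts or ends outside the window. */
	if (addr < r->base || addr > r->end ||
	    nwords > (size_t)(r->end - addr))
		return -EINVAL;

	for (i = 0; i < nwords; i++) {
		uint64_t data = addr[i];	/* readq() in the driver */

		/* copy_to_user() in the driver */
		memcpy(dst + i * sizeof(data), &data, sizeof(data));
	}
	return 0;
}
```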

[openib-general] [PATCH 20 of 20] ipath - ethernet emulation driver

2006-03-09 Thread Bryan O'Sullivan
The ethernet emulation driver makes an eth* interface available.  It
uses InfiniBand UD packets, but is not IPoIB-compatible.  It provides
higher bandwidth and lower latency than IPoIB.

The driver is implemented using the ipath_layer code, as is the ipath
driver's OpenIB support.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r d5a8cb977923 -r 7f00f404094f drivers/infiniband/hw/ipath/Kconfig
--- a/drivers/infiniband/hw/ipath/Kconfig   Thu Mar  9 16:17:14 2006 -0800
+++ b/drivers/infiniband/hw/ipath/Kconfig   Thu Mar  9 16:17:14 2006 -0800
@@ -16,3 +16,10 @@ config INFINIBAND_IPATH
allows these devices to be used with both kernel upper level
protocols such as IP-over-InfiniBand as well as with userspace
applications (in conjunction with InfiniBand userspace access).
+
+config IPATH_ETHER
+   tristate "PathScale InfiniPath ethernet driver"
+   depends on IPATH_CORE
+   ---help---
+   This is an ethernet emulator layer for the PathScale InfiniPath
+   host channel adapters (HCAs).
diff -r d5a8cb977923 -r 7f00f404094f drivers/infiniband/hw/ipath/ipath_eth.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_eth.c   Thu Mar  9 16:17:14 2006 -0800
@@ -0,0 +1,1187 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * Further, this software is distributed without any warranty that it is
+ * free of the rightful claim of any third person regarding infringement
+ * or the like.  Any license provided herein, whether implied or
+ * otherwise, applies only to this software file.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston MA 02111-1307, USA.
+ */
+
+/*
+ * ipath_ether.c ethernet driver emulation over PathScale Infinipath
+ * for Linux.
+ */
+
+#define ipath_ether_ioctl_support
+
+#include 
+#include 
+#include 
+
+#include "ipath_debug.h"
+#include "ips_common.h"
+#include "ipath_layer.h"
+
+/* Not static, because we don't want the compiler removing it */
+#define DRV_NAME "ipath_ether"
+const char ipath_ether_version[] = DRV_NAME " " IPATH_IDSTR;
+#define DRV_VERSION "1.0"
+
+#if _IPATH_DEBUGGING
+
+#define __IPATH_DBG_WHICH(which,fmt,...) \
+	do { \
+		if (unlikely(ipath_debug&(which))) \
+			printk(KERN_DEBUG DRV_NAME ": %s: " fmt, \
+			       __func__,##__VA_ARGS__); \
+	} while (0)
+
+#define ipath_eth_dbg(fmt,...) \
+	__IPATH_DBG_WHICH(__IPATH_IPATHDBG,fmt,##__VA_ARGS__)
+#define ipath_eth_cdbg(which,fmt,...) \
+	__IPATH_DBG_WHICH(__IPATH_##which##DBG,fmt,##__VA_ARGS__)
+#define ipath_eth_warn(fmt,...) \
+	__IPATH_DBG_WHICH(__IPATH_IPATHWARN,fmt,##__VA_ARGS__)
+#define ipath_eth_err(fmt,...) \
+	__IPATH_DBG_WHICH(__IPATH_IPATHERR,fmt,##__VA_ARGS__)
+#define ipath_eth_table(fmt,...) \
+	__IPATH_DBG_WHICH(__IPATH_IPATHTABLE,fmt,##__VA_ARGS__)
+
+#else
+
+#define ipath_eth_dbg(fmt,...)
+#define ipath_eth_warn(fmt,...)
+#define ipath_eth_err(fmt,...)
+#define ipath_eth_table(fmt,...)
+
+#endif
+
+#define MAX_IPATH_LAYER_DEVICE 4
+#define ETHER_MAC_SIZE 6
+
+#define TX_TIMEOUT  2000
+
+#define BROADCAST_MASK 0x0001
+
+#define IPATH_LAYER_DOWN 0
+#define IPATH_LAYER_UP 1
+
+#define MAC_LENGTH 6
+
+#define MAX_HASH_ENTRIES 4129
+
+#define LID_ARP_REQUEST 1
+#define LID_ARP_RESPONSE 2
+
+#define ETH_ARP_PROTOCOL 0x0806 /* ARP protocol ID */
+
+#define HASH_ALLOC_ENTRIES 256
+
+#define priv_data(dev) ((struct ipath_ether_priv *)(dev)->priv)
+
+#define make_hash_key(mac) ((mac[0] + mac[1] + mac[2]) % MAX_HASH_ENTRIES)
+
+/* This structure is used to reassemble packets for large MTUs. */
+struct ipath_frag_state {
+   spinlock_t lock;
+   struct sk_buff *skb;
+   struct sk_buff *last_skb;
+   uint16_t lid;
+   uint8_t frag_num;   /* ips_message_header.unused */
+   uint8_t seq_num;    /* ips_message_header.tinylen */
+   uint32_t len;   /* ips_message_header.ack_seq_num */
+};
+
+struct ipath_ether_priv {
+   struct ipath_devdata *dd;
+   int device_id;
+   uint16_t my_lid;    /* set in network order */
+   uint16_t my_bcast;  /* set in network order */
+   uint16_t my_mac_addr[3];
+   int ipath_ether_if_stat;
+   struct net_device_stats ipath_ether_stats;
+   wait_queue_head_t lid_wait; /* when waiting for LID at open */
+   struct copy_data_s
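
The make_hash_key() macro in the patch above sums three MAC components and reduces the sum modulo MAX_HASH_ENTRIES (4129). The call sites are not visible in this excerpt, so the sketch below assumes the macro is applied to a MAC held as three 16-bit words, as my_mac_addr is; mac_hash_sketch() is a hypothetical name. If it were instead applied to a byte array, only the first three bytes (typically the vendor OUI) would contribute to the hash.

```c
#include <stdint.h>

#define MAX_HASH_ENTRIES 4129	/* value from the patch */

/*
 * Same arithmetic as make_hash_key(), written as a function over
 * three 16-bit words.  The sum is at most 3 * 65535 = 196605, so it
 * cannot overflow an unsigned int.
 */
static unsigned int mac_hash_sketch(const uint16_t mac[3])
{
	unsigned int sum = (unsigned int)mac[0] + mac[1] + mac[2];

	return sum % MAX_HASH_ENTRIES;
}
```

Because the hash is a plain sum, any two MACs whose words are permutations of each other land in the same bucket; a chained table tolerates this, at the cost of longer chains.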

[openib-general] [PATCH 5 of 20] ipath - support for PCI Express devices

2006-03-09 Thread Bryan O'Sullivan
This file contains PCIE-specific routines and definitions.  It is not
compiled unless the kernel has PCI Express support.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r f6fb63439323 -r 6955d1da1172 drivers/infiniband/hw/ipath/ipath_pe800.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_pe800.c Thu Mar  9 16:15:36 2006 -0800
@@ -0,0 +1,1205 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer in the documentation and/or other materials
+ *      provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+/*
+ * This file contains all of the code that is specific to the
+ * InfiniPath PE-800 chip.
+ */
+
+#include 
+#include 
+#include 
+
+
+#include "ipath_kernel.h"
+#include "ipath_registers.h"
+
+/*
+ * This file contains all the chip-specific register information and
+ * access functions for the PathScale PE800, the PCI-Express chip.
+ *
+ * This lists the InfiniPath PE800 registers, in the actual chip layout.
+ * This structure should never be directly accessed.
+ */
+struct _infinipath_do_not_use_kernel_regs {
+   unsigned long long Revision;
+   unsigned long long Control;
+   unsigned long long PageAlign;
+   unsigned long long PortCnt;
+   unsigned long long DebugPortSelect;
+   unsigned long long Reserved0;
+   unsigned long long SendRegBase;
+   unsigned long long UserRegBase;
+   unsigned long long CounterRegBase;
+   unsigned long long Scratch;
+   unsigned long long Reserved1;
+   unsigned long long Reserved2;
+   unsigned long long IntBlocked;
+   unsigned long long IntMask;
+   unsigned long long IntStatus;
+   unsigned long long IntClear;
+   unsigned long long ErrorMask;
+   unsigned long long ErrorStatus;
+   unsigned long long ErrorClear;
+   unsigned long long HwErrMask;
+   unsigned long long HwErrStatus;
+   unsigned long long HwErrClear;
+   unsigned long long HwDiagCtrl;
+   unsigned long long MDIO;
+   unsigned long long IBCStatus;
+   unsigned long long IBCCtrl;
+   unsigned long long ExtStatus;
+   unsigned long long ExtCtrl;
+   unsigned long long GPIOOut;
+   unsigned long long GPIOMask;
+   unsigned long long GPIOStatus;
+   unsigned long long GPIOClear;
+   unsigned long long RcvCtrl;
+   unsigned long long RcvBTHQP;
+   unsigned long long RcvHdrSize;
+   unsigned long long RcvHdrCnt;
+   unsigned long long RcvHdrEntSize;
+   unsigned long long RcvTIDBase;
+   unsigned long long RcvTIDCnt;
+   unsigned long long RcvEgrBase;
+   unsigned long long RcvEgrCnt;
+   unsigned long long RcvBufBase;
+   unsigned long long RcvBufSize;
+   unsigned long long RxIntMemBase;
+   unsigned long long RxIntMemSize;
+   unsigned long long RcvPartitionKey;
+   unsigned long long Reserved3;
+   unsigned long long RcvPktLEDCnt;
+   unsigned long long Reserved4[8];
+   unsigned long long SendCtrl;
+   unsigned long long SendPIOBufBase;
+   unsigned long long SendPIOSize;
+   unsigned long long SendPIOBufCnt;
+   unsigned long long SendPIOAvailAddr;
+   unsigned long long TxIntMemBase;
+   unsigned long long TxIntMemSize;
+   unsigned long long Reserved5;
+   unsigned long long PCIeRBufTestReg0;
+   unsigned long long PCIeRBufTestReg1;
+   unsigned long long Reserved51[6];
+   unsigned long long SendBufferError;
+   unsigned long long SendBufferErrorCONT1;
+   unsigned long long Reserved6SBE[6];
+   unsigned long long RcvHdrAddr0;
+   unsigned long long RcvHdrAddr1;
+   unsigned long long RcvHdrAddr2
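
Layout-only structures such as _infinipath_do_not_use_kernel_regs are typically never instantiated; instead, offsetof() turns each field name into a register offset. The excerpt does not show the driver's accessor, so the sketch below assumes that usage. It copies only the first 16 PE-800 registers from the listing above; pe800_regs_prefix and KREG_INDEX are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* First 16 qwords of the PE-800 layout above (abbreviated). */
struct pe800_regs_prefix {
	uint64_t Revision;
	uint64_t Control;
	uint64_t PageAlign;
	uint64_t PortCnt;
	uint64_t DebugPortSelect;
	uint64_t Reserved0;
	uint64_t SendRegBase;
	uint64_t UserRegBase;
	uint64_t CounterRegBase;
	uint64_t Scratch;
	uint64_t Reserved1;
	uint64_t Reserved2;
	uint64_t IntBlocked;
	uint64_t IntMask;
	uint64_t IntStatus;
	uint64_t IntClear;
};

/* Register index in 64-bit words; multiply by 8 for a byte offset. */
#define KREG_INDEX(field) \
	(offsetof(struct pe800_regs_prefix, field) / sizeof(uint64_t))
```

Keeping reserved slots (Reserved0, Reserved1, ...) in the structure is what makes the computed offsets line up with the chip; deleting one would silently shift every later register.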

[openib-general] [PATCH 11 of 20] ipath - layering interfaces used by higher-level driver code

2006-03-09 Thread Bryan O'Sullivan
These are used to implement the InfiniBand protocols and the ethernet
emulation driver.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r d1da4154aae1 -r f0b2f6d58480 drivers/infiniband/hw/ipath/ipath_layer.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c Thu Mar  9 16:16:17 2006 -0800
@@ -0,0 +1,1266 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer in the documentation and/or other materials
+ *      provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * These are the routines used by layered drivers, currently just the
+ * layered ethernet driver and verbs layer.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ipath_kernel.h"
+#include "ips_common.h"
+#include "ipath_layer.h"
+
+int ipath_layer_set_linkstate(struct ipath_devdata *dd, u8 newstate)
+{
+	u32 lstate;
+
+	switch (newstate) {
+	case IPATH_IB_LINKDOWN:
+		ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKINITCMD_POLL <<
+				    INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+		/* don't wait */
+		return 0;
+
+	case IPATH_IB_LINKDOWN_SLEEP:
+		ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKINITCMD_SLEEP <<
+				    INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+		/* don't wait */
+		return 0;
+
+	case IPATH_IB_LINKDOWN_DISABLE:
+		ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKINITCMD_DISABLE <<
+				    INFINIPATH_IBCC_LINKINITCMD_SHIFT);
+		/* don't wait */
+		return 0;
+
+	case IPATH_IB_LINKINIT:
+		if (dd->ipath_flags & IPATH_LINKINIT)
+			return 0;
+		ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKCMD_INIT <<
+				    INFINIPATH_IBCC_LINKCMD_SHIFT);
+		lstate = IPATH_LINKINIT;
+		break;
+
+	case IPATH_IB_LINKARM:
+		if (dd->ipath_flags & IPATH_LINKARMED)
+			return 0;
+		if (!(dd->ipath_flags & (IPATH_LINKINIT | IPATH_LINKACTIVE)))
+			return -EINVAL;
+		ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKCMD_ARMED <<
+				    INFINIPATH_IBCC_LINKCMD_SHIFT);
+		/*
+		 * Since the port can transition to ACTIVE by receiving
+		 * a non VL 15 packet, wait for either state.
+		 */
+		lstate = IPATH_LINKARMED | IPATH_LINKACTIVE;
+		break;
+
+	case IPATH_IB_LINKACTIVE:
+		if (dd->ipath_flags & IPATH_LINKACTIVE)
+			return 0;
+		if (!(dd->ipath_flags & IPATH_LINKARMED))
+			return -EINVAL;
+		ipath_set_ib_lstate(dd, INFINIPATH_IBCC_LINKCMD_ACTIVE <<
+				    INFINIPATH_IBCC_LINKCMD_SHIFT);
+		lstate = IPATH_LINKACTIVE;
+		break;
+
+	default:
+		ipath_dbg("Invalid linkstate 0x%x requested\n", newstate);
+		return -EINVAL;
+	}
+	return ipath_wait_linkstate(dd, lstate, 2000);
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_set_linkstate);
+
+/**
+ * ipath_layer_set_mtu - set the MTU
+ * @dd: the infinipath device
+ * @arg: the new MTU
+ *
+ * We can handle "any" incoming size; the issue here is whether we
+ * need to restrict our outgoing size.  For now, we don't do any
+ * sanity checking on this, and we don't deal with what happens to
+ * programs that are already running when the size changes.
+ * NOTE: changing the MTU will usually cause t
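
ipath_layer_set_linkstate() only lets the port move forward in the order INIT, then ARMED, then ACTIVE, and it treats ARMED-or-ACTIVE as success after an ARM request. The sketch below pulls just that decision logic out so it can be checked in userspace. The flag values and names are hypothetical stand-ins for IPATH_LINKINIT and friends, and nothing here issues a hardware command; it only reports which states the caller would wait for.

```c
#include <errno.h>

/* Hypothetical stand-ins for IPATH_LINKINIT, IPATH_LINKARMED, ... */
enum {
	LINK_INIT   = 1 << 0,
	LINK_ARMED  = 1 << 1,
	LINK_ACTIVE = 1 << 2,
};

enum link_req { REQ_DOWN, REQ_INIT, REQ_ARM, REQ_ACTIVE };

/*
 * Returns -EINVAL if the request is not allowed from cur_flags.
 * Otherwise returns 0 and sets *wait_mask to the states the driver
 * would wait for (0 means: nothing to wait for).
 */
static int link_transition_sketch(unsigned int cur_flags, enum link_req req,
				  unsigned int *wait_mask)
{
	*wait_mask = 0;

	switch (req) {
	case REQ_DOWN:
		return 0;	/* the driver issues the command, no wait */
	case REQ_INIT:
		if (cur_flags & LINK_INIT)
			return 0;
		*wait_mask = LINK_INIT;
		return 0;
	case REQ_ARM:
		if (cur_flags & LINK_ARMED)
			return 0;
		if (!(cur_flags & (LINK_INIT | LINK_ACTIVE)))
			return -EINVAL;
		/* A non-VL15 packet can move the port straight to ACTIVE. */
		*wait_mask = LINK_ARMED | LINK_ACTIVE;
		return 0;
	case REQ_ACTIVE:
		if (cur_flags & LINK_ACTIVE)
			return 0;
		if (!(cur_flags & LINK_ARMED))
			return -EINVAL;
		*wait_mask = LINK_ACTIVE;
		return 0;
	}
	return -EINVAL;
}
```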

[openib-general] [PATCH 7 of 20] ipath - misc driver support code

2006-03-09 Thread Bryan O'Sullivan
EEPROM support, interrupt handling, statistics gathering, and write
combining management for x86_64.

A note regarding i2c: The Atmel EEPROM hardware we use looks like an
i2c device electrically, but is not i2c compliant at all from a
functional perspective.  We tried using the kernel's i2c support to
talk to it, but failed.

Normal i2c devices have a single 7-bit or 10-bit i2c address that they
respond to.  Valid 7-bit addresses range from 0x03 to 0x77.  Addresses
0x00 to 0x02 and 0x78 to 0x7F are special reserved addresses
(e.g. 0x00 is the "general call" address.)  The Atmel device, on the
other hand, responds to ALL addresses.  It's designed to be the only
device on a given i2c bus.  A given i2c device address corresponds to
the memory address within the i2c device itself.

At least one reason why the Linux core i2c stuff won't work for this
is that it prohibits access to reserved addresses like 0x00, which are
really valid addresses on the Atmel devices.
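
To make the addressing scheme above concrete: on an I2C bus, the first byte after START carries a 7-bit address in bits 7..1 and the read/write flag in bit 0 (1 = read, matching READ_CMD in the patch). If, as described, the Atmel part treats those 7 bits as the EEPROM memory offset, then reading offset 0 sends address 0x00, which a conventional i2c core treats as reserved. The sketch assumes the driver forms the byte this way; eeprom_addr_byte() and i2c_addr_reserved() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define READ_CMD  1	/* values from the patch */
#define WRITE_CMD 0

/*
 * Build the first bus byte: 7-bit memory offset in bits 7..1,
 * R/W flag in bit 0.  Returns -1 if the offset needs more than 7 bits.
 */
static int eeprom_addr_byte(unsigned int offset, int cmd)
{
	if (offset > 0x7f)
		return -1;
	return (int)((offset << 1) | (cmd ? READ_CMD : WRITE_CMD));
}

/* Addresses the text above says a normal i2c core treats as reserved. */
static bool i2c_addr_reserved(unsigned int addr7)
{
	return addr7 <= 0x02 || (addr7 >= 0x78 && addr7 <= 0x7f);
}
```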

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 696ba12283f4 -r a9ed49ad489c drivers/infiniband/hw/ipath/ipath_eeprom.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c Thu Mar  9 16:15:49 2006 -0800
@@ -0,0 +1,587 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer in the documentation and/or other materials
+ *      provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ipath_kernel.h"
+
+/*
+ * InfiniPath I2C driver for a serial eeprom.  This is not a generic
+ * I2C interface.  For a start, the device we're using (Atmel AT24C11)
+ * doesn't work like a regular I2C device.  It looks like one
+ * electrically, but not logically.  Normal I2C devices have a single
+ * 7-bit or 10-bit I2C address that they respond to.  Valid 7-bit
+ * addresses range from 0x03 to 0x77.  Addresses 0x00 to 0x02 and 0x78
+ * to 0x7F are special reserved addresses (e.g. 0x00 is the "general
+ * call" address.)  The Atmel device, on the other hand, responds to ALL
+ * 7-bit addresses.  It's designed to be the only device on a given I2C
+ * bus.  A 7-bit address corresponds to the memory address within the
+ * Atmel device itself.
+ *
+ * Also, the timing requirements call for more than simple software
+ * bitbanging: we read back from the chip to ensure timing (a simple
+ * udelay is not enough).
+ *
+ * This all means that accessing the device is specialized enough
+ * that using the standard kernel I2C bitbanging interface would be
+ * impossible.  For example, the core I2C eeprom driver expects to find
+ * a device at one or more of a limited set of addresses only.  It doesn't
+ * allow writing to an eeprom.  It also doesn't provide any means of
+ * accessing eeprom contents from within the kernel, only via sysfs.
+ */
+
+enum i2c_type {
+   i2c_line_scl = 0,
+   i2c_line_sda
+};
+
+enum i2c_state {
+   i2c_line_low = 0,
+   i2c_line_high
+};
+
+#define READ_CMD 1
+#define WRITE_CMD 0
+
+static int eeprom_init;
+
+/*
+ * The gpioval manipulation really should be protected by spinlocks
+ * or be converted to use atomic operations.
+ */
+
+/**
+ * i2c_gpio_set - set a GPIO line
+ * @dd: the infinipath device
+ * @line: the line to set
+ * @new_line_state: the state to set
+ *
+ * Returns 0 if the line was set to the new state successfully, non-zero
+ * on error.
+ */
+static int i2c_gpio_set(struct ipath_devdata *dd,
+			enum i2c_type line,
+			enum i2c_state new_line_state)
+{
+   u64 read_val, write_val, mask, *gpioval;
+
+   gpioval = &dd->ipath_gpio_out;
+   read_val = ipath_read_kreg64(dd, dd->ipath_kr

[openib-general] [PATCH 6 of 20] ipath - chip initialisation code

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 6955d1da1172 -r 696ba12283f4 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Thu Mar  9 16:15:43 2006 -0800
@@ -0,0 +1,957 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *      copyright notice, this list of conditions and the following
+ *      disclaimer in the documentation and/or other materials
+ *      provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ipath_kernel.h"
+#include "ips_common.h"
+
+/*
+ * min buffers we want to have per port, after driver
+ */
+#define IPATH_MIN_USER_PORT_BUFCNT 8
+
+/*
+ * Number of ports we are configured to use (to allow for more pio
+ * buffers per port, etc.)  Zero means use chip value.
+ */
+static ushort ipath_cfgports;
+
+module_param_named(cfgports, ipath_cfgports, ushort, S_IRUGO);
+MODULE_PARM_DESC(cfgports, "Set max number of ports to use");
+
+/*
+ * Number of buffers reserved for driver (layered drivers and SMA
+ * send).  Reserved at end of buffer list.
+ */
+static ushort ipath_kpiobufs = 32;
+
+static int ipath_set_kpiobufs(const char *val, struct kernel_param *kp);
+
+module_param_call(kpiobufs, ipath_set_kpiobufs, param_get_uint,
+ &ipath_kpiobufs, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(kpiobufs, "Set number of PIO buffers for driver");
+
+/**
+ * create_port0_egr - allocate the eager TID buffers
+ * @dd: the infinipath device
+ *
+ * This code is now quite different for user and kernel, because
+ * the kernel uses skbs, for the accelerated network performance.
+ * This is the kernel (port0) version.
+ *
+ * Allocate the eager TID buffers and program them into infinipath.
+ * We use the network layer alloc_skb() allocator to allocate the
+ * memory, and either use the buffers as is for things like SMA
+ * packets, or pass the buffers up to the ipath layered driver and
+ * thence the network layer, replacing them as we do so (see
+ * ipath_rcv_layer()).
+ */
+static int create_port0_egr(struct ipath_devdata *dd)
+{
+	unsigned e, egrcnt;
+	struct sk_buff **skbs;
+
+	egrcnt = dd->ipath_rcvegrcnt;
+
+	skbs = vmalloc(sizeof(*dd->ipath_port0_skbs) * egrcnt);
+	if (skbs == NULL) {
+		ipath_dev_err(dd, "allocation error for eager TID "
+			      "skb array\n");
+		return -ENOMEM;
+	}
+	for (e = 0; e < egrcnt; e++) {
+		/*
+		 * This is a bit tricky in that we allocate extra
+		 * space for 2 bytes of the 14 byte ethernet header.
+		 * These two bytes are passed in the ipath header so
+		 * the rest of the data is word aligned.  We allocate
+		 * 4 bytes so that the data buffer stays word aligned.
+		 * See ipath_kreceive() for more details.
+		 */
+		skbs[e] = ipath_alloc_skb(dd, GFP_KERNEL);
+		if (!skbs[e]) {
+			ipath_dev_err(dd, "SKB allocation error for "
+				      "eager TID %u\n", e);
+			while (e != 0)
+				dev_kfree_skb(skbs[--e]);
+			return -ENOMEM;
+		}
+	}
+	/*
+	 * After loop above, so we can test non-NULL to see if ready
+	 * to use at receive, etc.
+	 */
+	dd->ipath_port0_skbs = skbs;
+
+	for (e = 0; e < egrcnt; e++)
+		dd->ipath_f_put_tid(dd, e + (u64 __iomem *)
+				    ((char __iomem *) dd->ipath_kregbase
+				     + dd->ipath_rcvegrbase), 0,
+   
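
create_port0_egr() uses a common allocate-then-unwind pattern: if the e-th skb allocation fails, it frees skbs[0..e-1] in reverse via while (e != 0) dev_kfree_skb(skbs[--e]). In the excerpt quoted, that error path does not appear to vfree() the skbs array itself. The userspace sketch below shows the same pattern with the array released as well; calloc()/malloc() stand in for vmalloc()/ipath_alloc_skb(), and both function names are hypothetical.

```c
#include <stdlib.h>

/*
 * Allocate n buffers of `size` bytes.  On any failure, free what was
 * already allocated (walking back from the failure point) plus the
 * pointer array, and return NULL.
 */
static void **alloc_buffers_sketch(size_t n, size_t size)
{
	void **bufs;
	size_t e;

	/* calloc() also guards against n * sizeof(*bufs) overflowing. */
	bufs = calloc(n, sizeof(*bufs));
	if (bufs == NULL)
		return NULL;

	for (e = 0; e < n; e++) {
		bufs[e] = malloc(size);
		if (bufs[e] == NULL) {
			while (e != 0)
				free(bufs[--e]);
			free(bufs);	/* release the array too */
			return NULL;
		}
	}
	return bufs;
}

static void free_buffers_sketch(void **bufs, size_t n)
{
	size_t e;

	for (e = 0; e < n; e++)
		free(bufs[e]);
	free(bufs);
}
```

Publishing the array only after every element is allocated, as the patch does with dd->ipath_port0_skbs, means other code can treat a non-NULL pointer as "fully initialised".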

[openib-general] [PATCH 4 of 20] ipath - support for HyperTransport devices

2006-03-09 Thread Bryan O'Sullivan
This file contains HT-specific routines and definitions.  It is not
compiled unless the kernel has HyperTransport support.

Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 227b3e7c27ce -r f6fb63439323 drivers/infiniband/hw/ipath/ipath_ht400.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_ht400.c Thu Mar  9 16:15:30 2006 -0800
@@ -0,0 +1,1570 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file contains all of the code that is specific to the InfiniPath
+ * HT-400 chip.
+ */
+
+#include 
+#include 
+
+#include "ipath_kernel.h"
+#include "ipath_registers.h"
+
+/*
+ * This lists the InfiniPath HT400 registers, in the actual chip layout.
+ * This structure should never be directly accessed.
+ *
+ * The names are in InterCap form because they're taken straight from
+ * the chip specification.  Since they're only used in this file, they
+ * don't pollute the rest of the source.
+*/
+
+struct _infinipath_do_not_use_kernel_regs {
+   unsigned long long Revision;
+   unsigned long long Control;
+   unsigned long long PageAlign;
+   unsigned long long PortCnt;
+   unsigned long long DebugPortSelect;
+   unsigned long long DebugPort;
+   unsigned long long SendRegBase;
+   unsigned long long UserRegBase;
+   unsigned long long CounterRegBase;
+   unsigned long long Scratch;
+   unsigned long long ReservedMisc1;
+   unsigned long long InterruptConfig;
+   unsigned long long IntBlocked;
+   unsigned long long IntMask;
+   unsigned long long IntStatus;
+   unsigned long long IntClear;
+   unsigned long long ErrorMask;
+   unsigned long long ErrorStatus;
+   unsigned long long ErrorClear;
+   unsigned long long HwErrMask;
+   unsigned long long HwErrStatus;
+   unsigned long long HwErrClear;
+   unsigned long long HwDiagCtrl;
+   unsigned long long MDIO;
+   unsigned long long IBCStatus;
+   unsigned long long IBCCtrl;
+   unsigned long long ExtStatus;
+   unsigned long long ExtCtrl;
+   unsigned long long GPIOOut;
+   unsigned long long GPIOMask;
+   unsigned long long GPIOStatus;
+   unsigned long long GPIOClear;
+   unsigned long long RcvCtrl;
+   unsigned long long RcvBTHQP;
+   unsigned long long RcvHdrSize;
+   unsigned long long RcvHdrCnt;
+   unsigned long long RcvHdrEntSize;
+   unsigned long long RcvTIDBase;
+   unsigned long long RcvTIDCnt;
+   unsigned long long RcvEgrBase;
+   unsigned long long RcvEgrCnt;
+   unsigned long long RcvBufBase;
+   unsigned long long RcvBufSize;
+   unsigned long long RxIntMemBase;
+   unsigned long long RxIntMemSize;
+   unsigned long long RcvPartitionKey;
+   unsigned long long ReservedRcv[10];
+   unsigned long long SendCtrl;
+   unsigned long long SendPIOBufBase;
+   unsigned long long SendPIOSize;
+   unsigned long long SendPIOBufCnt;
+   unsigned long long SendPIOAvailAddr;
+   unsigned long long TxIntMemBase;
+   unsigned long long TxIntMemSize;
+   unsigned long long ReservedSend[9];
+   unsigned long long SendBufferError;
+   unsigned long long SendBufferErrorCONT1;
+   unsigned long long SendBufferErrorCONT2;
+   unsigned long long SendBufferErrorCONT3;
+   unsigned long long ReservedSBE[4];
+   unsigned long long RcvHdrAddr0;
+   unsigned long long RcvHdrAddr1;
+   unsigned long long RcvHdrAddr2;
+   unsigned long long RcvHdrAddr3;
+   unsigne

[openib-general] [PATCH 0 of 20] [RFC] ipath driver - another round for review

2006-03-09 Thread Bryan O'Sullivan
I posted these patches for review this morning, but due to a bug in my
posting script, only Roland actually received them.  In fact, this version
of these patches contains a few changes in response to his comments.

Thanks, Roland!

The original text from this morning follows.

Here is another set of ipath driver patches for review.  The list of
changes compared to the last patch set I posted is huge, so I won't go
into it here.  Suffice it to say that we've taken every reviewer
comment into account, and done a *lot* of work to clean things up.

I'll point out a few things that I think are worth attention.

  - We've introduced support for our PCI Express chips, so the driver
is no longer HyperTransport-specific.  It's still a 64-bit driver,
because 32-bit platforms don't implement readq or writeq.  (It
does compile cleanly on i386, but of course fails to link.)

  - We've added an ethernet emulation driver so that if you're not
    using InfiniBand support, you still have a high-performance net
device (lower latency and higher bandwidth than IPoIB) for IP
traffic.

  - There are no longer any fixed tables of device structures.
Instead we allocate device structures dynamically using
pci_alloc_consistent or dma_alloc_coherent, and use the
 stuff to number them.

  - There are no more ioctls anywhere.

  - Huge source files have been split up into digestible, logical
chunks.

  - A few more sparse annotations.

  - Buckets of other cleanups.  Code reformatting, comment
reformatting, trimming code to <= 76 cols, you name it.

There are still a few things left to do that I know of.

  - Since the core driver isn't really an IB driver at all, perhaps it
belongs in drivers/char instead of drivers/infiniband/hw?

  - Our hardware only supports MSI interrupts.  I don't know how to
program it to interrupt us if CONFIG_PCI_MSI is not set.  Right
now, we have a timer-based hack in place to emulate interrupts.

  - Not all of the code is 80- or 76-col clean yet.  I'm working on
this.

  - I guess we need to face the music and use sysfs binary attributes
in the two cases where we're not at the moment :-)

  - There's clearly something wrong with the way we're pinning some
pages into memory, but I don't actually know what it is.  I'm
pretty sure our use of get_user_pages is correct, so I suspect it
must be the code that's doing SetPageReserved (see ipath_driver.c
and ipath_file_ops.c).

I've spent some time trying to figure out what the problem is, but
am stumped.  If someone knows what we should be doing instead, I'd
be delighted to hear from them.

If you have any comments or suggestions, please let me know.


[openib-general] [PATCH 3 of 20] ipath - copy and send routines for sending an skb

2006-03-09 Thread Bryan O'Sullivan
Signed-off-by: Bryan O'Sullivan <[EMAIL PROTECTED]>

diff -r 19bdf20bc544 -r 227b3e7c27ce drivers/infiniband/hw/ipath/ipath_copy.c
--- /dev/null   Thu Jan  1 00:00:00 1970 +
+++ b/drivers/infiniband/hw/ipath/ipath_copy.c  Thu Mar  9 16:15:23 2006 -0800
@@ -0,0 +1,504 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file provides support for doing sk_buff buffer swapping between
+ * the low level driver eager buffers, and the network layer.  It's part
+ * of the core driver, rather than the ether driver, because it relies
+ * on variables and functions in the core driver.  It exports a single
+ * entry point for use in the ipath_ether module.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ipath_kernel.h"
+#include "ips_common.h"
+
+/**
+ * layer_send_getpiobuf - allocate, setup and copy out a PIO send buffer
+ * @dd: the infinipath device
+ * @cdp: the data to copy
+ *
+ * Allocate a PIO send buffer, initialize the header and copy it out.
+ */
+static int layer_send_getpiobuf(struct ipath_devdata *dd,
+   struct copy_data_s *cdp)
+{
+   u32 extra_bytes;
+   u32 len, nwords, hdrwords;
+   u32 __iomem *piobuf;
+
+   piobuf = ipath_getpiobuf(dd, NULL);
+   if (!piobuf) {
+   cdp->error = -EBUSY;
+   return cdp->error;
+   }
+
+   /*
+* Compute the max amount of data that can fit into a PIO buffer.
+* buffer size - header size - trigger qword length & flags - CRC
+*/
+   len = dd->ipath_ibmaxlen -
+   sizeof(struct ether_header) - 8 - (SIZE_OF_CRC << 2);
+   if (len > dd->ipath_rcvegrbufsize)
+   len = dd->ipath_rcvegrbufsize;
+   if (len > (cdp->len + cdp->extra))
+   len = (cdp->len + cdp->extra);
+   /* Compute word alignment (i.e., (len & 3) ? 4 - (len & 3) : 0) */
+   extra_bytes = (4 - len) & 3;
+   nwords = (sizeof(struct ether_header) + len + extra_bytes) >> 2;
+   cdp->hdr->lrh[2] = htons(nwords + SIZE_OF_CRC);
+   cdp->hdr->bth[0] = htonl((OPCODE_ITH4X << 24) +
+(extra_bytes << 20) +
+IPS_DEFAULT_P_KEY);
+   cdp->hdr->sub_opcode = OPCODE_ENCAP;
+
+   cdp->hdr->bth[2] = 0;
+   /*
+* Generate an interrupt on the receive side for the last
+* fragment.
+*/
+   cdp->hdr->iph.pkt_flags = ((cdp->len + cdp->extra) == len)
+   ? INFINIPATH_KPF_INTR : 0;
+   cdp->hdr->iph.chksum =
+   (u16) IPS_LRH_BTH + (u16) (nwords + SIZE_OF_CRC) -
+   (u16) ((cdp->hdr->iph.ver_port_tid_offset >> 16) & 0x) -
+   (u16) (cdp->hdr->iph.ver_port_tid_offset & 0x) -
+   (u16) cdp->hdr->iph.pkt_flags;
+
+   ipath_cdbg(VERBOSE, "send %d (%x %x %x %x %x %x %x)\n", nwords,
+  cdp->hdr->lrh[0], cdp->hdr->lrh[1],
+  cdp->hdr->lrh[2], cdp->hdr->lrh[3],
+  cdp->hdr->bth[0], cdp->hdr->bth[1], cdp->hdr->bth[2]);
+   /*
+* Write len to control qword, no flags.
+* +1 is for the qword padding of pbc.
+*/
+   writeq(nwords + 1ULL, (u64 __iomem *) piobuf);
+   /* we have to flush after the PBC for correctness on some cpus
+* or WC buffer can be written out of order */
+   ipath_flush_wc();
+   piobuf += 2;
+   hdrwords = sizeof(struct ether_header) >> 2;
+   __iowrite32_copy(piobuf, cdp->hdr, hdrwords);
+   cdp->csum_pio = &((struct ether_header __iomem *)piobuf)->c

[openib-general] Revenge of the sysfs maintainer! (was Re: [PATCH 8 of 20] ipath - sysfs support for core driver)

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 03:32:23PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 15:18 -0800, Roland Dreier wrote:
> >  > +static ssize_t show_atomic_stats(struct device_driver *dev, char *buf)
> >  > +{
> >  > +memcpy(buf, &ipath_stats, sizeof(ipath_stats));
> >  > +
> >  > +return sizeof(ipath_stats);
> >  > +}
> > 
> > I think putting a whole binary struct in a sysfs attribute is
> > considered a no-no.
> 
> Grumble.

Grumble?  Oh come on, don't export binary structures through sysfs.  It's
in the DOCUMENTATION THAT SYSFS IS FOR TEXT FILES ONLY!

If you don't want to export a text file, then use something else other
than sysfs, it's that simple.

> it's a fairly small struct, much less than a page in length,
> and userspace needs an atomic view of it, instead of reading each of the
> umpteen broken-out files that we also provide for human-readable access
> to each counter.
> 
> I didn't see any point to implementing the sysfs binary file interface
> in order to do exactly what this 6-line function does.  Still don't, in
> fact :-)

sysfs binary files are for PASS-THROUGH things ONLY!  stuff like this is
NOT for sysfs binary files, so even if you switched to using it, it
would not be allowed.

And if I sound grumpy about this whole thing, I am.  I'm tired of people
trying to abuse sysfs and putting crappy userspace APIs in it.  They
don't realize how messy that makes things over the long run.  If
you want to make your own filesystem to export stuff in whatever way you
want, feel free to do so (it only takes about 200 lines, including
comments).  But DO NOT ABUSE SYSFS just because you don't happen to
agree with the way it was designed, or feel inconvenienced by it.

Ok, here's a new rule to help keep this from happening again in the future:

  If you want to add a new sysfs file to the kernel, it MUST be
  accompanied with full documentation that explains exactly what that
  file contains and what it is for.  No exceptions will be allowed.

Structure for this documentation will be in the format that I laid out
last week; I'll update it again and post it to lkml for review later
tonight.

thanks,

greg k-h


[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Andrew Morton
"Bryan O'Sullivan" <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2006-03-09 at 16:01 -0800, Roland Dreier wrote:
> > Bryan> Any idea what I should be using instead?
> > 
> > It depends what you're trying to do.  Hence my original question: why
> > are you doing SetPageReserved?
> 
> We're mapping some memory that the chip DMAs to into userspace, so that
> user processes can spin on memory locations without going through the
> kernel.  The SetPageReserved hack is an attempt to stop the VM from
> reclaiming those pages from us once a user process exits.

If your driver allocated these pages and never added them to the LRU then
the VM won't touch them.

If your driver owns the pages and has a ref on them then they won't get
freed at task exit-time.

If the app owns the pages and you're using get_user_pages() then your
driver owns the last ref on the pages.

> I realise that it's surely bogus, and I'd be thrilled to do something
> correct instead.  We've tried doing both SetPageReserved and get_page,
> but it hasn't worked out too well so far.

We'd need to see a halfway decent description of the problem first ;)


[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Roland Dreier
Bryan> We're mapping some memory that the chip DMAs to into
Bryan> userspace, so that user processes can spin on memory
Bryan> locations without going through the kernel.  The
Bryan> SetPageReserved hack is an attempt to stop the VM from
Bryan> reclaiming those pages from us once a user process exits.

Bryan> I realise that it's surely bogus, and I'd be thrilled to do
Bryan> something correct instead.  We've tried doing both
Bryan> SetPageReserved and get_page, but it hasn't worked out too
Bryan> well so far.

What's wrong with doing get_page()?  Surely the VM won't take pages
that you hold a reference to.

 - R.


Re: [openib-general] [PATCH 0/3] iWARP Device Support for AMSO and Chelsio Devices

2006-03-09 Thread Tom Tucker
These changes have been regression tested on AMSO1100, Chelsio, and
Mellanox hardware...

On Thu, 2006-03-09 at 18:26 -0600, Tom Tucker wrote:
> This is an updated patchset based on feedback from Sean 
> and others. All suggestions have been addressed except 
> for the user-mode response structure change. I'd like
> to defer this since it is a cosmetic change and I've 
> yet to find something more beautiful than what's 
> currently being done. Thanks to all who reviewed this
> patch.
> 
> This patchset defines the modifications to the CMA and 
> header files to support iWARP devices. This patchset 
> is a prerequisite of both the AMSO1100 and 
> Chelsio RNIC device drivers. 
> 
> The patchset consists of three patches:
> 1 - Header files
> 2 - Modifications to the CMA and uCMA
> 3 - Implementation of the IW CM
> 
> Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>



[openib-general] [PATCH 3/3] iWARP CM

2006-03-09 Thread Tom Tucker

This patch provides the implementation of the iWARP CM: 
core/Makefile
core/iwcm.c

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>

Index: infiniband/core/Makefile
===
--- infiniband/core/Makefile(revision 5632)
+++ infiniband/core/Makefile(working copy)
@@ -1,8 +1,9 @@
 EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/ulp/ipoib
 
-obj-$(CONFIG_INFINIBAND) +=ib_core.o ib_mad.o ib_ping.o ib_cm.o \
+obj-$(CONFIG_INFINIBAND) +=ib_core.o ib_mad.o ib_ping.o \
+   ib_cm.o iw_cm.o \
ib_sa.o ib_at.o ib_addr.o rdma_cm.o \
-   ib_local_sa.o findex.o
+   ib_local_sa.o findex.o 
 obj-$(CONFIG_INFINIBAND_USER_MAD) +=   ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o ib_uat.o rdma_ucm.o
 
@@ -17,6 +18,8 @@
 
 ib_cm-y := cm.o
 
+iw_cm-y := iwcm.o
+
 rdma_cm-y :=   cma.o
 
 rdma_ucm-y :=  ucma.o
Index: infiniband/core/iwcm.c
===
--- infiniband/core/iwcm.c  (revision 0)
+++ infiniband/core/iwcm.c  (revision 0)
@@ -0,0 +1,635 @@
+/*
+ * Copyright (c) 2004, 2005 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ * Copyright (c) 2005 Network Appliance, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_AUTHOR("Tom Tucker");
+MODULE_DESCRIPTION("iWARP CM");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static void iwcm_add_one(struct ib_device *device);
+static void iwcm_remove_one(struct ib_device *device);
+
+static struct ib_client iwcm_client = {
+   .name   = "iwcm",
+   .add= iwcm_add_one,
+   .remove = iwcm_remove_one
+};
+
+static struct {
+   struct workqueue_struct *wq;
+} iwcm;
+
+struct iwcm_id_private {
+   struct iw_cm_id id;
+
+   spinlock_t lock;
+   wait_queue_head_t wait;
+   atomic_t refcount;
+};
+
+struct iwcm_work {
+   struct work_struct work;
+   struct iwcm_id_private *cm_id;
+   struct iw_cm_event event;
+};
+
+/* Called whenever a reference added for a cm_id */
+static inline void iwcm_addref_id(struct iwcm_id_private *cm_id_priv)
+{
+   atomic_inc(&cm_id_priv->refcount);
+}
+
+/* Called whenever releasing a reference to a cm id */
+static inline void iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
+{
+   if (atomic_dec_and_test(&cm_id_priv->refcount))
+   wake_up(&cm_id_priv->wait);
+}
+
+static void cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
+
+struct iw_cm_id *iw_create_cm_id(struct ib_device *device,
+iw_cm_handler cm_handler,
+void *context)
+{
+   struct iwcm_id_private *iwcm_id_priv;
+
+   iwcm_id_priv = kzalloc(sizeof *iwcm_id_priv, GFP_KERNEL);
+   if (!iwcm_id_priv)
+   return ERR_PTR(-ENOMEM);
+
+   iwcm_id_priv->id.state = IW_CM_STATE_IDLE;
+   iwcm_id_priv->id.device = device;
+   iwcm_id_priv->id.cm_handler = cm_handler;
+   iwcm_id_priv->id.conte

[openib-general] [PATCH 2/3] CMA Changes and iWARP CM

2006-03-09 Thread Tom Tucker

This patch includes the modifications to the CMA and uCMA needed to
support the new iWARP CM, with updates to the following files:
core/cma.c 
core/ucma.c

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>

Index: infiniband/core/cma.c
===
--- infiniband/core/cma.c   (revision 5632)
+++ infiniband/core/cma.c   (working copy)
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
@@ -111,6 +112,7 @@
int query_id;
union {
struct ib_cm_id *ib;
+   struct iw_cm_id *iw;
} cm_id;
 
u32 seq_num;
@@ -230,13 +232,23 @@
id_priv->cma_dev = NULL;
 }
 
-static int cma_acquire_ib_dev(struct rdma_id_private *id_priv)
+static int cma_acquire_dev(struct rdma_id_private *id_priv)
 {
+   enum rdma_node_type dev_type = id_priv->id.route.addr.dev_addr.dev_type;
struct cma_device *cma_dev;
union ib_gid *gid;
int ret = -ENODEV;
 
-   gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+   switch (rdma_node_get_transport(dev_type)) {
+   case RDMA_TRANSPORT_IB:
+   gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+   break;
+   case RDMA_TRANSPORT_IWARP:
+   gid = iw_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+   break;
+   default:
+   return -ENODEV;
+   }
 
mutex_lock(&lock);
list_for_each_entry(cma_dev, &dev_list, list) {
@@ -251,18 +263,6 @@
return ret;
 }
 
-static int cma_acquire_dev(struct rdma_id_private *id_priv)
-{
-   enum rdma_node_type dev_type = id_priv->id.route.addr.dev_addr.dev_type;
-
-   switch (rdma_node_get_transport(dev_type)) {
-   case RDMA_TRANSPORT_IB:
-   return cma_acquire_ib_dev(id_priv);
-   default:
-   return -ENODEV;
-   }
-}
-
 static void cma_deref_id(struct rdma_id_private *id_priv)
 {
if (atomic_dec_and_test(&id_priv->refcount))
@@ -320,6 +320,16 @@
  IB_QP_PKEY_INDEX | IB_QP_PORT);
 }
 
+static int cma_init_iw_qp(struct rdma_id_private *id_priv, struct ib_qp *qp)
+{
+   struct ib_qp_attr qp_attr;
+
+   qp_attr.qp_state = IB_QPS_INIT;
+   qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE;
+
+   return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS);
+}
+
 int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
   struct ib_qp_init_attr *qp_init_attr)
 {
@@ -339,6 +349,9 @@
case RDMA_TRANSPORT_IB:
ret = cma_init_ib_qp(id_priv, qp);
break;
+   case RDMA_TRANSPORT_IWARP:
+   ret = cma_init_iw_qp(id_priv, qp);
+   break;
default:
ret = -ENOSYS;
break;
@@ -431,6 +444,10 @@
if (qp_attr->qp_state == IB_QPS_RTR)
qp_attr->rq_psn = id_priv->seq_num;
break;
+   case RDMA_TRANSPORT_IWARP:
+   ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
+   qp_attr_mask);
+   break;
default:
ret = -ENOSYS;
break;
@@ -588,6 +605,10 @@
if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
ib_destroy_cm_id(id_priv->cm_id.ib);
break;
+   case RDMA_TRANSPORT_IWARP:
+   if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw))
+   iw_destroy_cm_id(id_priv->cm_id.iw);
+   break;
default:
break;
}
@@ -651,6 +672,10 @@
if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
ib_destroy_cm_id(id_priv->cm_id.ib);
break;
+   case RDMA_TRANSPORT_IWARP:
+   if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw))
+   iw_destroy_cm_id(id_priv->cm_id.iw);
+   break;
default:
break;
}
@@ -837,7 +862,7 @@
}
 
atomic_inc(&conn_id->dev_remove);
-   ret = cma_acquire_ib_dev(conn_id);
+   ret = cma_acquire_dev(conn_id);
if (ret) {
ret = -ENODEV;
cma_release_remove(conn_id);
@@ -900,6 +925,113 @@
}
 }
 
+static int cma_iw_handler(struct iw_cm_id* iw_id, struct iw_cm_event* iw_event)
+{
+   struct rdma_id_private *id_priv = iw_id->context;
+   enum rdma_cm_event_type event = 0;
+   int ret = 0;
+
+   atomic_inc(&id_priv->dev_remove);
+
+   switch (iw_event->event) {
+   case IW_CM_EVENT_LLP_DISCONNE

[openib-general] [PATCH 1/3] iWARP Header Files

2006-03-09 Thread Tom Tucker

This patch includes the necessary modifications to the IB verbs header
and specifies an initial iWARP CM verbs interface:
include/rdma/ib_verbs.h
include/rdma/iw_cm.h

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>

Index: infiniband/include/rdma/ib_verbs.h
===
--- infiniband/include/rdma/ib_verbs.h  (revision 5669)
+++ infiniband/include/rdma/ib_verbs.h  (working copy)
@@ -101,6 +101,9 @@
IB_DEVICE_RC_RNR_NAK_GEN= (1<<12),
IB_DEVICE_SRQ_RESIZE= (1<<13),
IB_DEVICE_N_NOTIFY_CQ   = (1<<14),
+   IB_DEVICE_ZERO_STAG = (1<<15),
+   IB_DEVICE_MEM_WINDOW= (1<<16),
+   IB_DEVICE_SEND_W_INV= (1<<17),
 };
 
 enum ib_atomic_cap {
@@ -824,6 +827,7 @@
struct ib_gid_cache   **gid_cache;
 };
 
+struct iw_cm_verbs;
 struct ib_device {
struct device*dma_device;
 
@@ -840,6 +844,8 @@
 
u32   flags;
 
+   struct iw_cm_verbs*   iwcm;
+
int(*query_device)(struct ib_device *device,
   struct ib_device_attr *device_attr);
int(*query_port)(struct ib_device *device,
Index: infiniband/include/rdma/iw_cm.h
===
--- infiniband/include/rdma/iw_cm.h (revision 0)
+++ infiniband/include/rdma/iw_cm.h (revision 0)
@@ -0,0 +1,148 @@
+/*
+ * Copyright (c) 2005 Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#if !defined(IW_CM_H)
+#define IW_CM_H
+
+#include 
+#include 
+
+struct iw_cm_id;
+struct iw_cm_event;
+
+enum iw_cm_event_type {
+   IW_CM_EVENT_CONNECT_REQUEST = 1, /* connect request received */
+   IW_CM_EVENT_CONNECT_REPLY,   /* reply from active connect request */
+   IW_CM_EVENT_ESTABLISHED, /* passive side accept successful */
+   IW_CM_EVENT_LLP_DISCONNECT,  /* orderly shutdown */
+   IW_CM_EVENT_LLP_RESET,   /* bang */
+   IW_CM_EVENT_LLP_TIMEOUT, /* retransmit timeout */
+   IW_CM_EVENT_CLOSE/* close complete */
+};
+
+struct iw_cm_event {
+   enum iw_cm_event_type event;
+   int status;
+   u32 provider_id;
+   struct sockaddr_in local_addr;
+   struct sockaddr_in remote_addr;
+   void *private_data;
+   u8 private_data_len;
+};
+
+typedef int (*iw_cm_handler)(struct iw_cm_id *cm_id,
+struct iw_cm_event *event);
+
+enum iw_cm_state {
+   IW_CM_STATE_IDLE, /* unbound, inactive */
+   IW_CM_STATE_LISTEN,   /* listen waiting for connect */
+   IW_CM_STATE_CONN_SENT,/* outbound waiting for peer accept */
+   IW_CM_STATE_CONN_RECV,/* inbound waiting for user accept */
+   IW_CM_STATE_CLOSING,  /* disconnect */
+   IW_CM_STATE_ESTABLISHED,  /* established */
+};
+
+typedef void (*iw_event_handler)(struct iw_cm_id *cm_id,
+struct iw_cm_event *event);
+struct iw_cm_id {
+   iw_cm_handler   cm_handler;  /* client callback function */
+   void            *context;    /* context to provide to client cb */
+   enum iw_cm_state state;
+   struct ib_device *device;
+   struct ib_qp    *qp;         /* If the qp is null, use qp_num */
+   u32 qp

[openib-general] [PATCH 0/3] iWARP Device Support for AMSO and Chelsio Devices

2006-03-09 Thread Tom Tucker

This is an updated patchset based on feedback from Sean 
and others. All suggestions have been addressed except 
for the user-mode response structure change. I'd like
to defer this since it is a cosmetic change and I've 
yet to find something more beautiful than what's 
currently being done. Thanks to all who reviewed this
patch.

This patchset defines the modifications to the CMA and 
header files to support iWARP devices. This patchset 
is a prerequisite of both the AMSO1100 and 
Chelsio RNIC device drivers. 

The patchset consists of three patches:
1 - Header files
2 - Modifications to the CMA and uCMA
3 - Implementation of the IW CM

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>




[openib-general] Re: TSO and IPoIB performance degradation

2006-03-09 Thread Rick Jones

David S. Miller wrote:

> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Wed, 8 Mar 2006 14:53:11 +0200
>
>> What I was trying to figure out was, how can we re-enable the trick
>> without hurting TSO? Could a solution be to simply look at the frame
>> size, and call tcp_send_delayed_ack if the frame size is small?
>
> The change is really not related to TSO.
>
> By reverting it, you are reducing the number of ACKs on the wire, and
> the number of context switches at the sender to push out new data.
> That's why it can make things go faster, but it also leads to bursty
> TCP sender behavior, which is bad for congestion on the internet.

naughty naughty Solaris and HP-UX TCP :)

> When the receiver has a strong cpu and can keep up with the incoming
> packet rate very well and we are in an environment with no congestion,
> the old code helps a lot.  But if the receiver is cpu limited or we
> have congestion of any kind, it does exactly the wrong thing.  It will
> delay ACKs a very long time to the point where the pipe is depleted
> and this kills performance in that case.  For congested environments,
> due to the decreased ACK feedback, packet loss recovery will be
> extremely poor.  This is the first reason behind my change.

well, there are stacks which do "stretch acks" (after a fashion) that 
make sure when they see packet loss to "do the right thing" wrt sending 
enough acks to allow cwnds to open again in a timely fashion.


that brings-back all that stuff I posted ages ago about the performance 
delta when using an HP-UX receiver and altering the number of segmetns 
per ACK.  should be in the netdev archive somewhere.


might have been around the time of the discussions about MacOS and its 
ack avoidance - which wasn't done very well at the time.





The behavior is also specifically frowned upon in the TCP implementor
community.  It is specifically mentioned in the Known TCP
Implementation Problems RFC2525, in section 2.13 "Stretch ACK
violation".

The entry, quoted below for reference, is very clear on the reasons
why stretch ACKs are bad.  And although it may help performance for
your case, in congested environments and also with cpu limited
receivers it will have a negative impact on performance.  So, this was
the second reason why I made this change.


I would have thought that a receiver "stretching ACK's" would be helpful 
when it was CPU limited since it was spending fewer CPU cycles 
generating ACKs?




So reverting the change isn't really an option.

   Name of Problem
  Stretch ACK violation

   Classification
  Congestion Control/Performance

   Description
  To improve efficiency (both computer and network) a data receiver
  may refrain from sending an ACK for each incoming segment,
  according to [RFC1122].  However, an ACK should not be delayed an
  inordinate amount of time.  Specifically, ACKs SHOULD be sent for
  every second full-sized segment that arrives.  If a second full-
  sized segment does not arrive within a given timeout (of no more
  than 0.5 seconds), an ACK should be transmitted, according to
  [RFC1122].  A TCP receiver which does not generate an ACK for
  every second full-sized segment exhibits a "Stretch ACK
  Violation".


How can it be a "violation" of a SHOULD?-)



   Significance
  TCP receivers exhibiting this behavior will cause TCP senders to
  generate burstier traffic, which can degrade performance in
  congested environments.  In addition, generating fewer ACKs
  increases the amount of time needed by the slow start algorithm to
  open the congestion window to an appropriate point, which
  diminishes performance in environments with large bandwidth-delay
  products.  Finally, generating fewer ACKs may cause needless
  retransmission timeouts in lossy environments, as it increases the
  possibility that an entire window of ACKs is lost, forcing a
  retransmission timeout.


Of those three, I think the most meaningful is the second, which can be 
dealt with by smarts in the ACK-stretching receiver.


For the first, it will only degrade performance if it triggers packet loss.

I'm not sure I've ever seen the third item happen.



   Implications
  When not in loss recovery, every ACK received by a TCP sender
  triggers the transmission of new data segments.  The burst size is
  determined by the number of previously unacknowledged segments
  each ACK covers.  Therefore, a TCP receiver ack'ing more than 2
  segments at a time causes the sending TCP to generate a larger
  burst of traffic upon receipt of the ACK.  This large burst of
  traffic can overwhelm an intervening gateway, leading to higher
  drop rates for both the connection and other connections passing
  through the congested gateway.


Doesn't RED mean that those other connections are rather less likely to 
be affected?




  In addition, the TCP slow start algorithm increases the congestion

[openib-general] Re: TSO and IPoIB performance degradation

2006-03-09 Thread Michael S. Tsirkin
Quoting David S. Miller <[EMAIL PROTECTED]>:
>Description
>   To improve efficiency (both computer and network) a data receiver
>   may refrain from sending an ACK for each incoming segment,
>   according to [RFC1122].  However, an ACK should not be delayed an
>   inordinate amount of time.  Specifically, ACKs SHOULD be sent for
>   every second full-sized segment that arrives.  If a second full-
>   sized segment does not arrive within a given timeout (of no more
>   than 0.5 seconds), an ACK should be transmitted, according to
>   [RFC1122].  A TCP receiver which does not generate an ACK for
>   every second full-sized segment exhibits a "Stretch ACK
>   Violation".

Thanks very much for the info!

So the longest we can delay, according to this spec, is until we have two
full-sized segments.

But with the change we are discussing, could an ACK now be sent before
we have at least two full-sized segments?  Or does __tcp_ack_snd_check delay
until we have at least two full-sized segments? David, could you explain please?

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies
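The receiver-side rule quoted above (ACK at least every second full-sized segment, otherwise when a delayed-ACK timer of no more than 0.5 s expires) can be modeled in a few lines. This is a minimal userspace sketch for illustration only: the struct and function names are hypothetical, times are plain milliseconds, and it is not how Linux's __tcp_ack_snd_check() is written.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receiver state; every name here is illustrative. */
struct ack_state {
    size_t mss;              /* full-sized segment length in bytes */
    unsigned full_unacked;   /* full-sized segments since the last ACK */
    bool pending;            /* any data received but not yet ACKed? */
    unsigned long first_ms;  /* arrival time of the oldest unACKed data */
    unsigned long delay_ms;  /* delayed-ACK timeout; RFC 1122 caps it at 500 ms */
};

/* Reset the bookkeeping once an ACK has gone out. */
static void ack_sent(struct ack_state *st)
{
    st->full_unacked = 0;
    st->pending = false;
}

/* Per arriving data segment: returns true if an ACK should be sent now. */
static bool on_segment(struct ack_state *st, size_t len, unsigned long now_ms)
{
    if (!st->pending) {
        st->pending = true;
        st->first_ms = now_ms;
    }
    if (len >= st->mss)
        st->full_unacked++;
    if (st->full_unacked >= 2) {   /* "every second full-sized segment" */
        ack_sent(st);
        return true;
    }
    return false;
}

/* Timer tick: returns true if pending data has waited for the full delay. */
static bool on_timer(struct ack_state *st, unsigned long now_ms)
{
    if (st->pending && now_ms - st->first_ms >= st->delay_ms) {
        ack_sent(st);
        return true;
    }
    return false;
}
```

Under this model the second 1448-byte segment triggers an immediate ACK, while a lone small segment is only ACKed once the timer (200 ms in the usage below, an arbitrary choice within the 500 ms bound) expires.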


[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 16:01 -0800, Roland Dreier wrote:
> Bryan> Any idea what I should be using instead?
> 
> It depends what you're trying to do.  Hence my original question: why
> are you doing SetPageReserved?

We're mapping into userspace some memory that the chip DMAs to, so that
user processes can spin on memory locations without going through the
kernel.  The SetPageReserved hack is an attempt to stop the VM from
reclaiming those pages from us once a user process exits.

I realise that it's surely bogus, and I'd be thrilled to do something
correct instead.  We've tried doing both SetPageReserved and get_page,
but it hasn't worked out too well so far.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 16:00 -0800, Roland Dreier wrote:
> Bryan> We don't support hotplugged devices at the moment.
> 
> How do you stop someone from hot plugging a PCIe device?

You say "we don't support this yet" somewhere in big letters.  The chips
and cards support hotplug electrically and logically.  We just haven't
had time yet to do the driver support work, and won't for a while.



[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Roland Dreier
Bryan> Any idea what I should be using instead?

It depends what you're trying to do.  Hence my original question: why
are you doing SetPageReserved?

 - R.


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
Bryan> We don't support hotplugged devices at the moment.  If
Bryan> you're asking whether an rmmod at the wrong time could
Bryan> cause something bad to happen, I don't *think* so.

How do you stop someone from hot plugging a PCIe device?

 - R.


[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:46 -0800, Greg KH wrote:
> On Thu, Mar 09, 2006 at 03:18:49PM -0800, Roland Dreier wrote:
> 
> Thanks for CC:ing me, but where were the originals of these posted?

My patch posting script screwed up.  Only Roland got them, even though
the envelopes were all correct.

> >  > +static ssize_t show_atomic_stats(struct device_driver *dev, char *buf)
> >  > +{
> >  > +memcpy(buf, &ipath_stats, sizeof(ipath_stats));
> >  > +
> >  > +return sizeof(ipath_stats);
> >  > +}
> > 
> > I think putting a whole binary struct in a sysfs attribute is
> > considered a no-no.
> 
> That's an understatement, where is the large stick to thwap the author
> of this code...

I'd like to understand why, though.  As I already explained, it's a
smallish structure (< 1KB), and I can use the special binary sysfs
attribute goo for it if you insist, but ... why?



[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:33 -0800, Roland Dreier wrote:
>  > +  yield();/* don't hog the cpu */
> 
> You probably don't want to do this -- yield() means "put me at the
> tail of the runqueue."  I think cond_resched() is more likely what you
> want.

OK.  I don't think it much matters either way.

>  > +#endif
>  > +/* END_NOSHIP_TO_OPENIB */
> 
> uhh... and I don't see an #if to match that #endif.

The code got drain bamaged by the patch mangling script.  The real code
contains a mess of crap to support kernels back to 2.6.9, which gets
automatically stripped, except when it gets broken as above.

Next rev will be clean in this regard.



[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:28 -0800, Roland Dreier wrote:

> Why are you doing SetPageReserved?  As I understand things, the
> reserved bit is deprecated because it doesn't really have any defined
> semantics...

News to me :-)

Any idea what I should be using instead?



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:26 -0800, Roland Dreier wrote:

> Similarly what protects against another process opening the device
> right after the ipath_sma_alive = 0 setting, but before you do all the
> cleanup that's after that?

This is fixed by the stuff I just did in response to your earlier
message.

> And what protects against a hot unplug of a device after the test of s
> against ipath_max?

We don't support hotplugged devices at the moment.  If you're asking
whether an rmmod at the wrong time could cause something bad to happen,
I don't *think* so.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
Roland> That's fine.  So then I guess the question is, why can't
Roland> you use your SMA all the time?

Bryan> We do.  It coexists with OpenSM if OpenSM is present.

So can we kill the other SMA?

 - R.


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
Bryan> Yep, this is a real race, albeit incredibly unlikely.  I
Bryan> just turned ipath_sma_alive into an atomic_t, and wrapped
Bryan> the open/close code in spinlocks.

How does making it an atomic_t help?  I think you're only going to be
using atomic_set() and atomic_read(), and atomic_t doesn't provide
anything in that case.

 - R.
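Roland's point about atomic_t can be illustrated in userspace with C11 atomics: a separate atomic load followed by an atomic store (the analogue of atomic_read() then atomic_set()) still leaves a window between the test and the set, whereas a single compare-and-swap makes them one indivisible step. This is a sketch under that analogy only; sma_open() and sma_release() are hypothetical names, and inside the driver the comparable tools would be something like test_and_set_bit() or a mutex held across the open/release bookkeeping. The release side clears the flag only after cleanup, which also addresses the window raised about ipath_sma_release().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the ipath_sma_alive flag. */
static atomic_int sma_alive = 0;

/* Open: claim the device only if nobody holds it.  The compare-and-swap
   tests for 0 and stores 1 in one atomic operation, so two concurrent
   openers cannot both succeed. */
static int sma_open(void)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong(&sma_alive, &expected, 1))
        return -EBUSY;
    return 0;
}

/* Release: finish all per-device cleanup first, then mark the device
   free, so a new opener cannot start while cleanup is still running. */
static void sma_release(void (*cleanup)(void))
{
    if (cleanup != NULL)
        cleanup();
    atomic_store(&sma_alive, 0);
}
```

A second open while the first is still held fails with -EBUSY; after release, the next open succeeds.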


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:47 -0800, Roland Dreier wrote:

> That's fine.  So then I guess the question is, why can't you use your
> SMA all the time?

We do.  It coexists with OpenSM if OpenSM is present.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:24 -0800, Roland Dreier wrote:

> It seems there's a window here where two processes can both pass the
> if (ipath_sma_alive) test and then proceed to step on each other.

Yep, this is a real race, albeit incredibly unlikely.  I just turned
ipath_sma_alive into an atomic_t, and wrapped the open/close code in
spinlocks.



[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Roland Dreier
Greg> Thanks for CC:ing me, but where were the originals of
Greg> these posted?

I think Bryan's original script was busted, so even though everyone
ended up in the To: line (and hence in my reply-to-all), the mail didn't
go everywhere.  In fact I might have been the only one to get it.

Bryan, is it worth reposting the series so that everyone can see what
we're talking about?

 - R.


[openib-general] Re: TSO and IPoIB performance degradation

2006-03-09 Thread David S. Miller
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Wed, 8 Mar 2006 14:53:11 +0200

> What I was trying to figure out was, how can we re-enable the trick
> without hurting TSO? Could a solution be to simply look at the frame
> size, and call tcp_send_delayed_ack if the frame size is small?

The change is really not related to TSO.

By reverting it, you are reducing the number of ACKs on the wire, and
the number of context switches at the sender to push out new data.
That's why it can make things go faster, but it also leads to bursty
TCP sender behavior, which is bad for congestion on the internet.

When the receiver has a strong cpu and can keep up with the incoming
packet rate very well and we are in an environment with no congestion,
the old code helps a lot.  But if the receiver is cpu limited or we
have congestion of any kind, it does exactly the wrong thing.  It will
delay ACKs a very long time to the point where the pipe is depleted
and this kills performance in that case.  For congested environments,
due to the decreased ACK feedback, packet loss recovery will be
extremely poor.  This is the first reason behind my change.

The behavior is also specifically frowned upon in the TCP implementor
community.  It is specifically mentioned in the Known TCP
Implementation Problems RFC2525, in section 2.13 "Stretch ACK
violation".

The entry, quoted below for reference, is very clear on the reasons
why stretch ACKs are bad.  And although it may help performance for
your case, in congested environments and also with cpu limited
receivers it will have a negative impact on performance.  So, this was
the second reason why I made this change.

So reverting the change isn't really an option.

   Name of Problem
  Stretch ACK violation

   Classification
  Congestion Control/Performance

   Description
  To improve efficiency (both computer and network) a data receiver
  may refrain from sending an ACK for each incoming segment,
  according to [RFC1122].  However, an ACK should not be delayed an
  inordinate amount of time.  Specifically, ACKs SHOULD be sent for
  every second full-sized segment that arrives.  If a second full-
  sized segment does not arrive within a given timeout (of no more
  than 0.5 seconds), an ACK should be transmitted, according to
  [RFC1122].  A TCP receiver which does not generate an ACK for
  every second full-sized segment exhibits a "Stretch ACK
  Violation".

   Significance
  TCP receivers exhibiting this behavior will cause TCP senders to
  generate burstier traffic, which can degrade performance in
  congested environments.  In addition, generating fewer ACKs
  increases the amount of time needed by the slow start algorithm to
  open the congestion window to an appropriate point, which
  diminishes performance in environments with large bandwidth-delay
  products.  Finally, generating fewer ACKs may cause needless
  retransmission timeouts in lossy environments, as it increases the
  possibility that an entire window of ACKs is lost, forcing a
  retransmission timeout.

   Implications
  When not in loss recovery, every ACK received by a TCP sender
  triggers the transmission of new data segments.  The burst size is
  determined by the number of previously unacknowledged segments
  each ACK covers.  Therefore, a TCP receiver ack'ing more than 2
  segments at a time causes the sending TCP to generate a larger
  burst of traffic upon receipt of the ACK.  This large burst of
  traffic can overwhelm an intervening gateway, leading to higher
  drop rates for both the connection and other connections passing
  through the congested gateway.

  In addition, the TCP slow start algorithm increases the congestion
  window by 1 segment for each ACK received.  Therefore, increasing
  the ACK interval (thus decreasing the rate at which ACKs are
  transmitted) increases the amount of time it takes slow start to
  increase the congestion window to an appropriate operating point,
  and the connection consequently suffers from reduced performance.
  This is especially true for connections using large windows.
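The slow-start effect described in that paragraph can be made concrete with a deliberately simplified model: each ACK grows cwnd by one segment, and a receiver that ACKs every b segments returns about cwnd/b ACKs per round trip. The function name is hypothetical, and the model ignores loss, ssthresh, exact timer timing, and byte-counting variants of slow start.

```c
/* Round trips needed for cwnd (in segments) to grow from init_cwnd to at
   least target when the receiver ACKs every segs_per_ack segments and
   each ACK adds one segment to cwnd.  Simplified model, not real TCP. */
static unsigned rtts_to_reach(unsigned init_cwnd, unsigned target,
                              unsigned segs_per_ack)
{
    unsigned cwnd = init_cwnd ? init_cwnd : 1;
    unsigned rtts = 0;

    if (segs_per_ack == 0)
        segs_per_ack = 1;
    while (cwnd < target) {
        unsigned acks = cwnd / segs_per_ack;  /* ACKs returned for one window */

        if (acks == 0)
            acks = 1;    /* the delayed-ACK timer eventually ACKs a short window */
        cwnd += acks;    /* slow start: +1 segment per ACK received */
        rtts++;
    }
    return rtts;
}
```

With these assumptions, growing from 1 to 64 segments takes 6 round trips when every segment is ACKed, 11 with an ACK every second segment, and 18 with one every fourth.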

   Relevant RFCs
  RFC 1122 outlines delayed ACKs as a recommended mechanism.

   Trace file demonstrating it
  Trace file taken using tcpdump at host B, the data receiver (and
  ACK originator).  The advertised window (which never changed) and
  timestamp options have been omitted for clarity, except for the
  first packet sent by A:

   12:09:24.820187 A.1174 > B.3999: . 2049:3497(1448) ack 1
   win 33580  [tos 0x8]
   12:09:24.824147 A.1174 > B.3999: . 3497:4945(1448) ack 1
   12:09:24.832034 A.1174 > B.3999: . 4945:6393(1448) ack 1
   12:09:24.83 B.3999 > A.1174: . ack 6393
   12:09:24.934837 A.1174 > B.3999: . 6393:7841(1448) ack 1
   12:09:24.942721 A.1174 > B.3999: . 7841:9289(1448) ack 1
   12:09:24.950605

[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
 > Three reasons.
 > 
 >   * OpenSM wasn't usable when we wrote our SMA.  We have customers
 > using ours now, so we have to support it.

Huh?  What does OpenSM working or not have to do with the SMA?

 >   * Our SMA does some setup for the layered ethernet emulation
 > driver.
 >   * Our SMA works without an IB stack of any kind present.

That's fine.  So then I guess the question is, why can't you use your
SMA all the time?

And does that mean that the verbs SMA doesn't support ethernet
emulation, so you can't use ethernet emulation and verbs at the same time?

 - R.


[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Greg KH
On Thu, Mar 09, 2006 at 03:18:49PM -0800, Roland Dreier wrote:

Thanks for CC:ing me, but where were the originals of these posted?

>  > +static ssize_t show_version(struct device_driver *dev, char *buf)
>  > +{
>  > +  return scnprintf(buf, PAGE_SIZE, "%s", ipath_core_version);
>  > +}
> 
> Any reason you left a "\n" off of this attribute?
> 
>  > +static ssize_t show_atomic_stats(struct device_driver *dev, char *buf)
>  > +{
>  > +  memcpy(buf, &ipath_stats, sizeof(ipath_stats));
>  > +
>  > +  return sizeof(ipath_stats);
>  > +}
> 
> I think putting a whole binary struct in a sysfs attribute is
> considered a no-no.

That's an understatement, where is the large stick to thwap the author
of this code...

thanks,

greg k-h


[openib-general] Re: [PATCH 17 of 20] ipath - infiniband verbs support

2006-03-09 Thread Roland Dreier
 > +/*
 > + * We don't need to register a MAD agent, we just need to create
 > + * a linker dependency on ib_mad so the module is loaded before
 > + * this module is initialized.  The call to ib_register_device()
 > + * above will then cause ib_mad to create QP 0 & 1.
 > + */
 > +(void) ib_register_mad_agent(dev, 1, (enum ib_qp_type) 2,
 > + NULL, 0, NULL, NULL, NULL);

This looks shady to me.  Can this be solved in userspace by just
making sure that modprobe loads ib_mad before this module?

As it stands you're leaking a mad agent at the very least, not to
mention the hard-coded 2 in there.

 > +number_of_devices = ipath_layer_get_num_of_dev();
 > +i = number_of_devices * sizeof(struct ipath_ibdev *);
 > +ipath_devices = kmalloc(i, GFP_ATOMIC);
 > +if (ipath_devices == NULL)
 > +return -ENOMEM;
 > +
 > +for (i = 0; i < number_of_devices; i++) {
 > +struct ipath_devdata *dd;
 > +int ret = ipath_verbs_register(i, ipath_ib_piobufavail,
 > +   ipath_ib_rcv, ipath_ib_timer,
 > +   &dd);

What happens if a device is hot plugged or unplugged after you call
ipath_layer_get_num_of_dev() but before you call ipath_verbs_register()?

For that matter, what happens if a device is hot plugged after this
module loads?

 - R.


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:20 -0800, Roland Dreier wrote:

> I've never understood what forces you to maintain two separate SMAs.
> Why can't you pick one of the two SMAs and use that unconditionally?

Three reasons.

  * OpenSM wasn't usable when we wrote our SMA.  We have customers
    using ours now, so we have to support it.
  * Our SMA does some setup for the layered ethernet emulation
    driver.
  * Our SMA works without an IB stack of any kind present.



[openib-general] Re: [PATCH 7 of 20] ipath - misc driver support code

2006-03-09 Thread Roland Dreier
Bryan> It's purely a performance optimisation.  Since we tune very
Bryan> closely to each CPU, there's no point right now in
Bryan> sort-of-tuning for a CPU that doesn't yet exist :-)

I thought that if ipath_unordered_wc() returns false then you assume
the writes through a WC mapping go in order.  If Via behaves like
Intel and reorders writes, but ipath_unordered_wc() returns false,
then won't your driver break in a subtle way?

 - R.


[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Roland Dreier
 > +yield();/* don't hog the cpu */

You probably don't want to do this -- yield() means "put me at the
tail of the runqueue."  I think cond_resched() is more likely what you
want.

 > +#endif
 > +/* END_NOSHIP_TO_OPENIB */

uhh... and I don't see an #if to match that #endif.

 - R.


[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:18 -0800, Roland Dreier wrote:
>  > +static ssize_t show_version(struct device_driver *dev, char *buf)
>  > +{
>  > +  return scnprintf(buf, PAGE_SIZE, "%s", ipath_core_version);
>  > +}
> 
> Any reason you left a "\n" off of this attribute?

Nope, just a bogon.

>  > +static ssize_t show_atomic_stats(struct device_driver *dev, char *buf)
>  > +{
>  > +  memcpy(buf, &ipath_stats, sizeof(ipath_stats));
>  > +
>  > +  return sizeof(ipath_stats);
>  > +}
> 
> I think putting a whole binary struct in a sysfs attribute is
> considered a no-no.

Grumble.  It's a fairly small struct, much less than a page in length,
and userspace needs an atomic view of it, instead of reading each of the
umpteen broken-out files that we also provide for human-readable access
to each counter.

I didn't see any point to implementing the sysfs binary file interface
in order to do exactly what this 6-line function does.  Still don't, in
fact :-)

> Another missing "\n"

Thanks.
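The missing "\n" that Roland flagged is easy to fix: by convention, sysfs text attributes hold one value and end with a newline. The sketch below is a userspace stand-in for a show() callback such as show_version(); in the kernel the buffer is PAGE_SIZE bytes and the callback would call scnprintf(buf, PAGE_SIZE, "%s\n", ipath_core_version). Here snprintf() plus a clamp mimics scnprintf()'s "bytes actually stored" return value. The function signature is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for a sysfs show() callback that reports a version
   string as a single newline-terminated value. */
static int show_version(char *buf, size_t size, const char *version)
{
    int n;

    if (size == 0)
        return 0;
    n = snprintf(buf, size, "%s\n", version);  /* one value, trailing newline */
    if (n < 0)
        return 0;
    /* scnprintf() returns the bytes actually stored (excluding the NUL),
       not the length that would have been needed, so clamp to match. */
    if ((size_t)n >= size)
        n = (int)(size - 1);
    return n;
}
```

With a roomy buffer, "1.0" comes back as the 4-byte "1.0\n"; with a 3-byte buffer the output is truncated to "1." and the return value is 2, not 4.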



Re: [openib-general] [RFC] [PATCH] OpenSM: Add functional partition manager support

2006-03-09 Thread Sasha Khapyorsky
On 19:23 Wed 01 Mar , Sasha Khapyorsky wrote:
> > > 
> > > Here is phase 1 of the partition manager for OpenSM. Please review.
> 
> Here is the partition patch, which includes all the latest changes and
> is updated against the recent SVN tree.

Updated patch. Changes are:
 - clean up perror() usage (use osm_log() instead)
 - create a "dummy" MC group for non-IPoIB partitions
 - formatting changes
 - update against recent SVN

Sasha.


This patch implements partition management for OpenSM (Phase 1) as
described in osm/doc/OpenSM_PKey_Mgr.txt.

Basically, at each heavy resweep this will:

 - recreate the partition configuration
 - update pkey tables for endports
 - update the switch ports connected to endports
 - for partitions marked for IPoIB support, also create the
   appropriate multicast group; the desired rate and MTU values may
   be specified

Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>
---

 osm/doc/partition-config.txt|   98 ++
 osm/include/opensm/osm_base.h   |   16 +
 osm/include/opensm/osm_partition.h  |  151 --
 osm/include/opensm/osm_sa_mcmember_record.h |   38 ++
 osm/include/opensm/osm_subnet.h |7 
 osm/opensm/Makefile.am  |1 
 osm/opensm/main.c   |   11 +
 osm/opensm/osm_opensm.c |2 
 osm/opensm/osm_pkey_mgr.c   |  359 ---
 osm/opensm/osm_prtn.c   |  325 +
 osm/opensm/osm_prtn_config.c|  417 +++
 osm/opensm/osm_sa_mcmember_record.c |   20 +
 osm/opensm/osm_subnet.c |   17 +
 13 files changed, 1256 insertions(+), 206 deletions(-)

diff --git a/osm/doc/partition-config.txt b/osm/doc/partition-config.txt
new file mode 100644
index 000..a5d0fd0
--- /dev/null
+++ b/osm/doc/partition-config.txt
@@ -0,0 +1,98 @@
+OpenSM Partitions configuration.
+===
+
+The default name of OpenSM partitions configuration file is
+'/etc/osm-partitions.txt'. The default may be changed by using
+--Pconfig (-P) option with OpenSM.
+
+The default partition will be created by OpenSM unconditionally even
+when partition configuration file does not exist or cannot be accessed.
+
+The default partition has P_Key value 0x7fff. OpenSM's port will have
+full membership in the default partition. All other end ports will have
+partial membership.
+
+
+File Format.
+===
+
+Comments:
+
+
+Any content on a line following a '#' character is a comment and is
+ignored by the parser.
+
+
+General file format:
+---
+
+<Partition Definition>:<PortGUIDs list> ;
+
+
+Partition Definition:
+
+
+[PartitionName][=PKey][,flag[=value]]
+
+PartitionName - free string, used in log messages. When omitted, an
+empty string will be used.
+PKey  - P_Key value for this partition. Only the low 15 bits will
+be used. When omitted, it will be autogenerated.
+flag  - used to indicate IPoIB capability of this partition.
+
+Currently recognized flags are:
+
+ipoib  - indicates that this partition may be used for IPoIB; as a
+ result, an IPoIB-capable MC group will be created.
+rate=<val> - specifies rate for this IPoIB MC group (default is 3 (10Gbps))
+mtu=<val>  - specifies MTU for this IPoIB MC group (default is 4 (2048))
+
+Note that values for 'rate' and 'mtu' should be specified as defined in
+the IBTA specification (for example mtu=4 for 2048).
+
+
+PortGUIDs list:
+--
+
+[PortGUID[=full|=part]] [,PortGUID[=full|=part]] [,PortGUID] ...
+
+PortGUID - GUID of partition member EndPort. Hexadecimal numbers
+   should start with 0x; decimal numbers are accepted too.
+full or part - indicates full or partial membership for this port. When
+   omitted (or unrecognized) partial membership is assumed.
+
+There are two useful keywords for PortGUID definition:
+
+- 'ALL' means all end ports in this subnet
+- 'SELF' means subnet manager's port.
+
+Empty list means no ports in this partition.
+
+
+Notes:
+-
+
+Whitespace is permitted between delimiters ('=', ',', ':', ';').
+
+A line can be wrapped after the ':' that follows the Partition Definition,
+and between entries of the PortGUIDs list.
+
+PartitionName does not need to be unique, but PKey does need to be unique.
+If a PKey is repeated, those partition configurations will be merged
+and the first PartitionName will be used (see also next note).
+
+It is possible to split a partition configuration into more than one
+definition, but then PKey should be explicitly specified (otherwise
+different PKey values will be generated for those definitions).
+
+
+Examples:
+
+
+Default=0x7fff : ALL, SELF=full ;
+
+NewPartition , ipoib : 0x123456=full, 0x3456789034=part, 0x2134af2306 ;
+
+YetAnotherOne = 0x300 : SELF=full ;  
+YetAnotherOne = 0x300 : ALL=part ;  
+
diff --git a/osm/include/opensm/osm_base.h b/osm/include/opensm/osm_base.h
index 660771f..3da39a6 100644
--- a/osm/include/opensm/osm_ba
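To make the partition-config.txt format concrete, here is a sketch of how one PortGUIDs-list entry ("PortGUID[=full|=part]") might be parsed, and how a 15-bit PKey plus a full/partial flag combine into a 16-bit P_Key table entry (in InfiniBand the high bit of a P_Key is the membership bit: 1 for full, 0 for limited/partial). The function names are hypothetical and this is not the code in osm_prtn_config.c, which the excerpt above does not show; the ALL and SELF keywords are left out.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one PortGUIDs-list entry. */
struct port_member {
    uint64_t guid;
    int full;                 /* 1 = full member, 0 = partial */
};

/* Parse "PortGUID[=full|=part]".  Returns 0 on success, -1 if no number
   is present (e.g. the ALL/SELF keywords, not handled in this sketch). */
static int parse_port_member(const char *s, struct port_member *out)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    /* Base 0: "0x..." is hex, plain digits are decimal.  Caveat: a
       leading "0" would be read as octal, which the format may not intend. */
    out->guid = strtoull(s, &end, 0);
    if (end == s)
        return -1;
    out->full = 0;            /* omitted or unrecognized -> partial */
    while (isspace((unsigned char)*end))
        end++;
    if (*end == '=') {
        end++;
        while (isspace((unsigned char)*end))
            end++;
        if (strncmp(end, "full", 4) == 0)
            out->full = 1;
    }
    return 0;
}

/* P_Key table entry: low 15 bits are the key, bit 15 is membership. */
static uint16_t pkey_entry(uint16_t pkey, int full)
{
    return (uint16_t)((pkey & 0x7fff) | (full ? 0x8000 : 0));
}
```

For example, "0x123456=full" yields GUID 0x123456 with full membership, a bare "0x2134af2306" defaults to partial, and the default partition's 0x7fff with full membership becomes the table entry 0xffff.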

[openib-general] Re: [PATCH 7 of 20] ipath - misc driver support code

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:13 -0800, Roland Dreier wrote:

> This is kind of theoretical, but it seems to me that it would be safer
> to write this as
> 
>   int ipath_unordered_wc(void)
>   {
>   return boot_cpu_data.x86_vendor != X86_VENDOR_AMD;
>   }
> 
> after all, Via is probably going to have an x86-64 CPU one of these
> days, and I doubt you've checked that their WC flush is ordered.

It's purely a performance optimisation.  Since we tune very closely to
each CPU, there's no point right now in sort-of-tuning for a CPU that
doesn't yet exist :-)



[openib-general] Re: [PATCH 10 of 20] ipath - support for userspace apps using core driver

2006-03-09 Thread Roland Dreier
Bryan> I suspect that our use of SetPageReserved in
Bryan> ipath_file_ops.c may be causing us problems, but I am not
Bryan> sure what correct behaviour would look like.  Suggestions
Bryan> appreciated.

Why are you doing SetPageReserved?  As I understand things, the
reserved bit is deprecated because it doesn't really have any defined
semantics...

 - R.


[openib-general] Re: [PATCH 7 of 20] ipath - misc driver support code

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:11 -0800, Roland Dreier wrote:
>  > +static unsigned handle_frequent_errors(struct ipath_devdata *dd,
>  > + ipath_err_t errs, char msg[512],
>  > + int *noprint)

> Could this be replaced by printk_ratelimit()?

I looked into doing that a few weeks ago, and it really didn't look like
a good fit at all.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
 > +static int ipath_sma_release(struct inode *in, struct file *fp)
 > +{
 > +	int s;
 > +
 > +	ipath_sma_alive = 0;
 > +	ipath_cdbg(SMA, "Closing SMA device\n");
 > +	for (s = 0; s < atomic_read(&ipath_max); s++) {
 > +		struct ipath_devdata *dd = ipath_lookup(s);
 > +
 > +		if (!dd || !(dd->ipath_flags & IPATH_INITTED))
 > +			continue;
 > +		*dd->ipath_statusp &= ~IPATH_STATUS_SMA;
 > +		if (dd->verbs_layer.l_flags & IPATH_VERBS_KERNEL_SMA)
 > +			*dd->ipath_statusp |= IPATH_STATUS_OIB_SMA;
 > +	}
 > +	return 0;
 > +}

Similarly what protects against another process opening the device
right after the ipath_sma_alive = 0 setting, but before you do all the
cleanup that's after that?

And what protects against a hot unplug of a device after the test of s
against ipath_max?

 - R.
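One way to close the first window described above is to finish all per-device cleanup before publishing that the SMA slot is free, using a release store paired with an acquire load on the open side. The userspace C11 sketch below models that ordering; `sma_owner`, `struct fake_dev` and the `STATUS_*` values are hypothetical stand-ins for `ipath_sma_alive`, `struct ipath_devdata` and the `IPATH_STATUS_*` bits, not the driver's code. It does not address the hot-unplug question, which would need a lock shared with the device-removal path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define STATUS_SMA     0x1u  /* hypothetical stand-in for IPATH_STATUS_SMA */
#define STATUS_OIB_SMA 0x2u  /* hypothetical stand-in for IPATH_STATUS_OIB_SMA */

struct fake_dev {
	int initted;      /* models IPATH_INITTED */
	int kernel_sma;   /* models IPATH_VERBS_KERNEL_SMA */
	unsigned status;  /* models *dd->ipath_statusp */
};

/* 0 means "no userspace SMA"; otherwise the owner's pid. */
static atomic_int sma_owner;

static void sma_release(struct fake_dev *devs, size_t ndevs)
{
	/* 1. Do all the cleanup while we still own the slot. */
	for (size_t i = 0; i < ndevs; i++) {
		struct fake_dev *dd = &devs[i];

		if (!dd->initted)
			continue;
		dd->status &= ~STATUS_SMA;
		if (dd->kernel_sma)
			dd->status |= STATUS_OIB_SMA;
	}

	/*
	 * 2. Only now mark the slot free.  An opener that sees 0 through an
	 * acquire load is guaranteed to also see the status updates above.
	 */
	atomic_store_explicit(&sma_owner, 0, memory_order_release);
}

static int sma_is_free(void)
{
	return atomic_load_explicit(&sma_owner, memory_order_acquire) == 0;
}
```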


[openib-general] Re: [PATCH 18 of 20] ipath - kbuild infrastructure

2006-03-09 Thread Sam Ravnborg
On Thu, Mar 09, 2006 at 11:00:07AM -0800, Roland Dreier wrote:
> Sam> Eventually - yes.  But not just now. Kbuild was introduced
> Sam> because it was needed in the top-level directory and it made
> Sam> good sense to do so.  But for now keeping Makefile is a good
> Sam> choice. This is anyway what people are used to.
> 
> OK, disregard my suggestion then.  Should we patch
> Documentation/kbuild/makefiles.txt to correct the current
> documentation, which says:
> 
>   The preferred name for the kbuild files is 'Kbuild' but 'Makefile'
>   will continue to be supported. All new developmen is expected to use
>   the Kbuild filename.

I've just checked in the following patch:

diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index 99d51a5..a9c00fa 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -106,9 +106,9 @@ This document is aimed towards normal de
 Most Makefiles within the kernel are kbuild Makefiles that use the
 kbuild infrastructure. This chapter introduce the syntax used in the
 kbuild makefiles.
-The preferred name for the kbuild files is 'Kbuild' but 'Makefile' will
-continue to be supported. All new developmen is expected to use the
-Kbuild filename.
+The preferred name for the kbuild files is 'Makefile' but 'Kbuild' can
+be used and if both a 'Makefile' and a 'Kbuild' file exist then the 'Kbuild'
+file will be used.
 
 Section 3.1 "Goal definitions" is a quick intro, further chapters provide
 more details, with real examples.


[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
 > +static int ipath_sma_open(struct inode *in, struct file *fp)
 > +{
 > +	int s;
 > +
 > +	if (ipath_sma_alive) {
 > +		ipath_dbg("SMA already running (pid %u), failing\n",
 > +			  ipath_sma_alive);
 > +		return -EBUSY;
 > +	}
 > +
 > +	for (s = 0; s < atomic_read(&ipath_max); s++) {
 > +		struct ipath_devdata *dd = ipath_lookup(s);
 > +		/* we need at least one infinipath device to be initialized. */
 > +		if (dd && dd->ipath_flags & IPATH_INITTED) {
 > +			ipath_sma_alive = current->pid;

It seems there's a window here where two processes can both pass the
if (ipath_sma_alive) test and then proceed to step on each other.

 - R.
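The check-then-set window can be closed by making the test and the claim a single atomic step. In the kernel that would typically mean `cmpxchg()` or a mutex; the userspace C11 sketch below uses `atomic_compare_exchange_strong()` with a hypothetical `sma_owner` variable standing in for `ipath_sma_alive`, so that only one of two racing openers can succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* 0 means free; otherwise the pid of the process that owns the SMA. */
static atomic_int sma_owner;

/* Returns 0 on success, -EBUSY if another process already owns it. */
static int sma_try_claim(int pid)
{
	int expected = 0;

	assert(pid != 0);  /* 0 is reserved for "free" */
	/*
	 * Atomically: if sma_owner == 0, set it to pid.  If two callers race,
	 * exactly one of them sees expected == 0 and wins.
	 */
	if (!atomic_compare_exchange_strong(&sma_owner, &expected, pid))
		return -EBUSY;  /* expected now holds the current owner */
	return 0;
}

static void sma_unclaim(void)
{
	atomic_store(&sma_owner, 0);
}
```

With this shape, an opener would claim first and then look for an initialized device, calling `sma_unclaim()` before returning an error if none is found.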


[openib-general] Re: [PATCH 2 of 20] ipath - core device driver

2006-03-09 Thread Roland Dreier
 > +	if (dd->ipath_unit >= atomic_read(&ipath_max))
 > +		atomic_set(&ipath_max, dd->ipath_unit + 1);

If this is the way you use ipath_max, why is it an atomic variable?  I
can't find any uses of ipath_max that don't look racy if the only
thing protecting it is the fact that it's an atomic_t, and if it has
some other protection, then I don't think it needs to be atomic.

 - R.
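The problem with the quoted pattern is that `atomic_read()` followed by `atomic_set()` is two separate operations, so two units registering at once can still lose an update even though each individual access is atomic. If a lock-free counter were wanted, the read-compare-write would have to be one compare-and-swap loop, as in the C11 sketch below (a hypothetical `unit_max` in place of `ipath_max`). The alternative suggested above is to rely on whatever lock already serializes device setup and drop `atomic_t` altogether.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int unit_max;  /* one past the highest unit number seen */

/* Raise unit_max to at least unit + 1 without losing concurrent updates. */
static void note_unit(int unit)
{
	int cur = atomic_load(&unit_max);

	/*
	 * On failure, compare_exchange_weak reloads cur with the value another
	 * thread stored, so the loop re-checks whether an update is still needed.
	 */
	while (cur < unit + 1 &&
	       !atomic_compare_exchange_weak(&unit_max, &cur, unit + 1)) {
		/* retry */
	}
}
```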


[openib-general] Re: [PATCH 4 of 20] ipath - support for HyperTransport devices

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 15:01 -0800, Roland Dreier wrote:

> It seems like all these hypertransport magic constants should be in a
> general .h somewhere.  I'm not sure if it makes sense to put them in
> , or start a .

Either way is fine by me.

> The logic here is pretty hard to follow, and you're getting squeezed
> pretty hard by indenting 5 tab stops.  Can ipath_setup_ht_config() be
> split up into subfunctions?

Definitely.  I mentioned this in the introductory message for the series
as something I'm working on.



[openib-general] Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management

2006-03-09 Thread Roland Dreier
Bryan> The ipath_sma.c file supports a lightweight userspace
Bryan> subnet management agent (SMA).  This is used in deployments
Bryan> (such as HPC clusters) where a full Infiniband protocol
Bryan> stack is not needed.

I've never understood what forces you to maintain two separate SMAs.
Why can't you pick one of the two SMAs and use that unconditionally?

 - R.


[openib-general] Re: [PATCH 8 of 20] ipath - sysfs support for core driver

2006-03-09 Thread Roland Dreier
 > +static ssize_t show_version(struct device_driver *dev, char *buf)
 > +{
 > +	return scnprintf(buf, PAGE_SIZE, "%s", ipath_core_version);
 > +}

Any reason you left a "\n" off of this attribute?

 > +static ssize_t show_atomic_stats(struct device_driver *dev, char *buf)
 > +{
 > +	memcpy(buf, &ipath_stats, sizeof(ipath_stats));
 > +
 > +	return sizeof(ipath_stats);
 > +}

I think putting a whole binary struct in a sysfs attribute is
considered a no-no.

 > +static ssize_t show_boardversion(struct device *dev,
 > +   struct device_attribute *attr,
 > +   char *buf)
 > +{
 > +	struct ipath_devdata *dd = dev_get_drvdata(dev);
 > +	return scnprintf(buf, PAGE_SIZE, "%s", dd->ipath_boardversion);
 > +}

Another missing "\n"
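Sysfs text attributes conventionally hold one value per file, formatted as text and terminated by a newline, rather than a raw binary struct. The userspace sketch below models that convention: `my_scnprintf()` imitates the return value of the kernel's `scnprintf()` (characters actually stored, excluding the NUL), while the `show_*` helpers and `struct stats` fields are hypothetical, not the ipath driver's attributes.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define ATTR_BUF_SIZE 4096  /* sysfs hands show() a PAGE_SIZE buffer */

/* Like the kernel's scnprintf(): returns bytes written, not bytes wanted. */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	return (size_t)n < size ? n : (int)(size - 1);
}

/* One value per attribute, terminated by "\n". */
static int show_version(char *buf, const char *version)
{
	return my_scnprintf(buf, ATTR_BUF_SIZE, "%s\n", version);
}

/* Hypothetical counters; each one gets its own text attribute. */
struct stats {
	unsigned long long interrupts;
	unsigned long long error_interrupts;
};

static int show_interrupts(char *buf, const struct stats *st)
{
	return my_scnprintf(buf, ATTR_BUF_SIZE, "%llu\n", st->interrupts);
}
```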


RE: [openib-general] Please give 1.0 RC1 a whirl

2006-03-09 Thread Bob Woodruff
>In the Subversion tree, there's a new directory called
>scripts/buildrpms.sh, which is what I used to build everything.

I tried to use this script on RHEL EL4 Itanium and got the
following error.

{ test ! -d opensm-1.2.0 || { find opensm-1.2.0 -type d ! -perm -200 -exec chmod u+w {} ';' && rm -fr opensm-1.2.0; }; }
+ cp opensm-1.2.0.tar.gz /usr/src/redhat/SPECS/rhel4/SOURCES
+ rpmbuild --define '_topdir /usr/src/redhat/SPECS/rhel4' --define 'dist .rhel4' -ta opensm-1.2.0.tar.gz
error: line 9: Illegal char '-' in version: Version: 1.2.0-rc1
[EMAIL PROTECTED] SPECS]# 

I was however able to manually build all the RPMS from the SRPMS posted
to the website. 

woody
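The rpmbuild error above says the spec's `Version:` tag may not contain `-`, because rpm uses `-` to separate name, version and release. One common workaround is to keep the numeric part in `Version:` and move the pre-release tag into `Release:`. The small C helper below is hypothetical (it is not part of `buildrpms.sh`) and only illustrates that split: `1.2.0-rc1` becomes version `1.2.0` and release suffix `rc1`.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "1.2.0-rc1" into version "1.2.0" and release suffix "rc1".
 * If there is no '-', the whole string is the version and rel is "".
 * Returns 0 on success, -1 if an output buffer is too small.
 */
static int split_rpm_version(const char *in, char *ver, size_t verlen,
			     char *rel, size_t rellen)
{
	const char *dash = strchr(in, '-');
	size_t vlen = dash ? (size_t)(dash - in) : strlen(in);
	const char *suffix = dash ? dash + 1 : "";

	if (vlen >= verlen || strlen(suffix) >= rellen)
		return -1;

	memcpy(ver, in, vlen);
	ver[vlen] = '\0';

	/* Release may not contain '-' either, so map any further ones to '.'. */
	for (size_t i = 0; ; i++) {
		rel[i] = suffix[i] == '-' ? '.' : suffix[i];
		if (suffix[i] == '\0')
			break;
	}
	return 0;
}
```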



[openib-general] Re: [PATCH 7 of 20] ipath - misc driver support code

2006-03-09 Thread Roland Dreier
 > +/**
 > + * ipath_unordered_wc - indicate whether write combining is ordered
 > + *
 > + * Because our performance depends on our ability to do write combining mmio
 > + * writes in the most efficient way, we need to know if we are on an Intel
 > + * or AMD x86_64 processor.  AMD x86_64 processors flush WC buffers out in
 > + * the order completed, and so no special flushing is required to get
 > + * correct ordering.  Intel processors, however, will flush write buffers
 > + * out in "random" orders, and so explicit ordering is needed at times.
 > + */
 > +int ipath_unordered_wc(void)
 > +{
 > +	return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 > +}

This is kind of theoretical, but it seems to me that it would be safer
to write this as

int ipath_unordered_wc(void)
{
	return boot_cpu_data.x86_vendor != X86_VENDOR_AMD;
}

after all, Via is probably going to have an x86-64 CPU one of these
days, and I doubt you've checked that their WC flush is ordered.

 - R.
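The difference between the two versions is which way the default falls for a vendor nobody has tested. The first function below treats only Intel as unordered; the second treats every non-AMD CPU as unordered, which may cost some performance on an untested CPU but never skips a needed flush. The enum and function names are hypothetical stand-ins for `boot_cpu_data.x86_vendor` and the `X86_VENDOR_*` constants.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's X86_VENDOR_* values. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_VIA, VENDOR_OTHER };

/* Original logic: only Intel is assumed to need explicit WC flushing. */
static int unordered_wc_intel_only(enum cpu_vendor v)
{
	return v == VENDOR_INTEL;
}

/* Suggested logic: only AMD is assumed to flush WC buffers in order. */
static int unordered_wc_unless_amd(enum cpu_vendor v)
{
	return v != VENDOR_AMD;
}
```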


[openib-general] Re: [PATCH 7 of 20] ipath - misc driver support code

2006-03-09 Thread Roland Dreier
 > +static unsigned handle_frequent_errors(struct ipath_devdata *dd,
 > +   ipath_err_t errs, char msg[512],
 > +   int *noprint)
 > +{
 > +	cycles_t nc;
 > +	static cycles_t nextmsg_time;
 > +	static unsigned nmsgs, supp_msgs;
 > +
 > +	/*
 > +	 * throttle back "fast" messages to no more than 10 per 5 seconds
 > +	 * (1.4-2GHz clock).  This isn't perfect, but it's a reasonable
 > +	 * heuristic. If we get more than 10, give a 5x longer delay
 > +	 */

Could this be replaced by printk_ratelimit()?

 - R.
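The behaviour the driver comment describes (at most 10 messages per 5 seconds) is what a windowed rate limiter provides. The userspace sketch below is a simplified model of the interval/burst idea behind `printk_ratelimit()`; the names and the explicit millisecond `now` argument are assumptions, it is not the kernel's implementation, it is not thread-safe, and it omits the driver's extra 5x back-off.

```c
#include <assert.h>

/* Allow at most 'burst' events in each 'interval_ms' window. */
struct ratelimit {
	long interval_ms;  /* window length */
	int burst;         /* events allowed per window */
	long begin_ms;     /* start of the current window */
	int printed;       /* events allowed so far in this window */
	int missed;        /* events suppressed so far in this window */
	int started;       /* has a window been opened yet? */
};

/* Returns 1 if the caller may print now, 0 if it should stay quiet. */
static int ratelimit_ok(struct ratelimit *rl, long now_ms)
{
	if (!rl->started || now_ms - rl->begin_ms >= rl->interval_ms) {
		/* Open a new window; a real driver might report rl->missed here. */
		rl->started = 1;
		rl->begin_ms = now_ms;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->missed++;
	return 0;
}
```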


RE: [openib-general] Please give 1.0 RC1 a whirl

2006-03-09 Thread Woodruff, Robert J
James/Arlin ?

woody
 

-----Original Message-----
From: Bryan O'Sullivan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 09, 2006 3:00 PM
To: Woodruff, Robert J
Cc: openib-general@openib.org; Davis, Arlin R; 'James Lentini'
Subject: RE: [openib-general] Please give 1.0 RC1 a whirl

On Thu, 2006-03-09 at 14:53 -0800, Bob Woodruff wrote:

> Where are the uDAPL RPMs ?

Nobody has fixed uDAPL to be autotools buildable or written a .spec.in
file for it.  That will be up to someone other than me to do :-)



[openib-general] Re: [PATCH 4 of 20] ipath - support for HyperTransport devices

2006-03-09 Thread Roland Dreier
 > +		/* the HT capability type byte is 3 bytes after the
 > +		 * capability byte.
 > +		 */
 > +		if (pci_read_config_byte(pdev, pos + 3, &cap_type)) {
 > +			dev_info(&pdev->dev, "Couldn't read config "
 > +				 "command @ %d\n", pos);
 > +			continue;
 > +		}
 > +		if (!(cap_type & 0xE0)) {

It seems like all these hypertransport magic constants should be in a
general .h somewhere.  I'm not sure if it makes sense to put them in
, or start a .

 > +					else if (linkctrl & (0xf << 8)) {
 > +						ipath_cdbg(
 > +							VERBOSE,
 > +							"Clear linkctrl%d CRC "
 > +							"Error bits %x\n", i,
 > +							linkctrl & (0xf << 8));
 > +						/*
 > +						 * now write them back to clear
 > +						 * the error.
 > +						 */
 > +						pci_write_config_byte(
 > +							pdev, link_off,
 > +							linkctrl & (0xf << 8));

The logic here is pretty hard to follow, and you're getting squeezed
pretty hard by indenting 5 tab stops.  Can ipath_setup_ht_config() be
split up into subfunctions?
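Both suggestions can be illustrated together: give the magic numbers names, and move the bit tests into small helpers so the caller loses a level or two of nesting. The macro and function names below are hypothetical (not existing kernel definitions), the bit values are taken from the quoted code, and the PCI config-space accessors are replaced by plain values so the sketch runs in userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the magic numbers in the quoted code. */
#define HT_CAP_TYPE_OFFSET       3            /* type byte is at pos + 3 */
#define HT_CAP_TYPE_MASK         0xE0u        /* bits tested by (cap_type & 0xE0) */
#define HT_LINKCTRL_CRC_ERR_MASK (0xfu << 8)  /* CRC error bits in linkctrl */

/* Capability-type bits of the byte read from pos + HT_CAP_TYPE_OFFSET. */
static uint8_t ht_cap_type_bits(uint8_t cap_type)
{
	return (uint8_t)(cap_type & HT_CAP_TYPE_MASK);
}

/*
 * CRC error bits set in a link control value (0 if none).  A helper like
 * this lets the caller write the bits back to clear them without adding
 * another nested block inside ipath_setup_ht_config().
 */
static uint16_t ht_crc_errors(uint16_t linkctrl)
{
	return (uint16_t)(linkctrl & HT_LINKCTRL_CRC_ERR_MASK);
}
```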


RE: [openib-general] Please give 1.0 RC1 a whirl

2006-03-09 Thread Bryan O'Sullivan
On Thu, 2006-03-09 at 14:53 -0800, Bob Woodruff wrote:

> Where are the uDAPL RPMs ?

Nobody has fixed uDAPL to be autotools buildable or written a .spec.in
file for it.  That will be up to someone other than me to do :-)



[openib-general] Re: [PATCH] mthca: fix max_wr value in create_srq/query_srq when using memfree devices

2006-03-09 Thread Roland Dreier
Thanks, applied.


RE: [openib-general] Please give 1.0 RC1 a whirl

2006-03-09 Thread Bob Woodruff
Bryan wrote,
> You can download source tarballs and RPMs from the following URL:

http://openib.red-bean.com/ 

Where are the uDAPL RPMs ?

woody



RE: [openib-general] Please give 1.0 RC1 a whirl

2006-03-09 Thread Woodruff, Robert J
I built binary RPMS for Itanium for RHEL4 from your source RPMS.
I'd like to at least do some minimal testing before giving them
to you.

Also, not sure where we ended up on kernel binary RPMs.
I built a stock 2.6.15 kernel RPM (x86_64) that I can overlay
on top of my RHEL EL4 system and it seems to work OK. 
Do we want to put these somewhere also ?

-----Original Message-----
From: Bryan O'Sullivan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 09, 2006 11:25 AM
To: Woodruff, Robert J
Cc: openib-general@openib.org; Hal Rosenstock
Subject: RE: [openib-general] Please give 1.0 RC1 a whirl

On Thu, 2006-03-09 at 11:00 -0800, Woodruff, Robert J wrote:
> If someone builds RPMs for other architectures, how do 
> they get them posted to this website?

Send me a pointer and I'll upload them.



Re: [openib-general] Re: [PATCH 0/7] AMSO1100 RNIC Driver

2006-03-09 Thread Roland Dreier
Michael> Could iowrite32be be what you are looking for?

iowrite32be() just seems like a more complicated way to spell
__raw_writel() for this usage.

 - R.
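The equivalence comes down to byte order: `iowrite32be(v, p)` stores `v` big-endian, and `__raw_writel(cpu_to_be32(v), p)` stores an already-swapped value with no further conversion, so (ordering and barrier semantics aside) both put the same bytes on the bus. The userspace sketch below checks that idea with an ordinary buffer standing in for device memory; `put_be32()` and `raw_store_swapped()` are hypothetical helpers, and `htonl()` plays the role of `cpu_to_be32()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v most-significant byte first, regardless of host byte order
 * (the effect iowrite32be() has on the device). */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Store a pre-swapped value with a plain, non-converting copy
 * (the effect of __raw_writel(cpu_to_be32(v), p)). */
static void raw_store_swapped(uint8_t *p, uint32_t v)
{
	uint32_t be = htonl(v);  /* host order -> big-endian */

	memcpy(p, &be, sizeof(be));
}
```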


Re: [openib-general] [PATCH 0/7] AMSO1100 RNIC Driver

2006-03-09 Thread Tom Tucker
I'll take a whack at fixing this stuff, test, and repost along with the
cosmetic changes you proposed.

Thanks,
On Wed, 2006-03-08 at 14:06 -0800, Roland Dreier wrote:
> I ran sparse against the amso1100 driver, and came up with a bunch of
> cleanups.  This includes at least one fix for a minor memory leak:
> c2_cleanup_qp_table() was not calling c2_array_cleanup() for the QP
> table.
> 
> This leaves the following warnings, which are harder to untangle.  The
> fundamental problem is that the c2_mq stuff is broken: for example,
> req_vq has its pool in system memory, while rep_vq has its pool in
> iomem.  Fixing this will also require fixing things like qp_wr_post(),
> which right now does a memcpy to iomem.
> 
> drivers/infiniband/hw/amso1100/c2_rnic.c:501:16: warning: incorrect type in argument 5 (different address spaces)
> drivers/infiniband/hw/amso1100/c2_rnic.c:501:16:    expected unsigned char [usertype] *pool_start
> drivers/infiniband/hw/amso1100/c2_rnic.c:501:16:    got void [noderef] *[assigned] mmio_regs
> drivers/infiniband/hw/amso1100/c2_qp.c:365:11: warning: incorrect type in argument 5 (different address spaces)
> drivers/infiniband/hw/amso1100/c2_qp.c:365:11:    expected unsigned char [usertype] *pool_start
> drivers/infiniband/hw/amso1100/c2_qp.c:365:11:    got void [noderef] *[assigned] mmap
> drivers/infiniband/hw/amso1100/c2_qp.c:384:11: warning: incorrect type in argument 5 (different address spaces)
> drivers/infiniband/hw/amso1100/c2_qp.c:384:11:    expected unsigned char [usertype] *pool_start
> drivers/infiniband/hw/amso1100/c2_qp.c:384:11:    got void [noderef] *[assigned] mmap
> 
>  - R.
> 
> Various sparse annotation fixes etc. for amso1100.
> 
> Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
> 
> --- infiniband/hw/amso1100/c2_mq.c(revision 5693)
> +++ infiniband/hw/amso1100/c2_mq.c(working copy)
> @@ -69,7 +69,7 @@ void c2_mq_produce(struct c2_mq *q)
>   BUMP(q, q->priv);
>   q->hint_count++;
>   /* Update peer's offset. */
> - q->peer->shared = cpu_to_be16(q->priv);
> + writew(cpu_to_be16(q->priv), &q->peer->shared);
>   }
>  }
>  
> @@ -112,7 +112,7 @@ void c2_mq_free(struct c2_mq *q)
>  #endif
>   BUMP(q, q->priv);
>   /* Update peer's offset. */
> - q->peer->shared = cpu_to_be16(q->priv);
> + writew(cpu_to_be16(q->priv), &q->peer->shared);
>   }
>  }
>  
> @@ -148,9 +148,8 @@ u32 c2_mq_count(struct c2_mq *q)
>   return (u32) count;
>  }
>  
> -void
> -c2_mq_init(struct c2_mq *q, u32 index, u32 q_size,
> -u32 msg_size, u8 * pool_start, u16 * peer, u32 type)
> +void c2_mq_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size,
> + u8 *pool_start, u16 __iomem *peer, u32 type)
>  {
>   assert(q->shared);
>  
> @@ -159,7 +158,7 @@ c2_mq_init(struct c2_mq *q, u32 index, u
>   q->q_size = q_size;
>   q->msg_size = msg_size;
>   q->msg_pool = pool_start;
> - q->peer = (struct c2_mq_shared *) peer;
> + q->peer = (struct c2_mq_shared __iomem *) peer;
>   q->magic = C2_MQ_MAGIC;
>   q->type = type;
>   q->priv = 0;
> --- infiniband/hw/amso1100/c2_qp.c(revision 5693)
> +++ infiniband/hw/amso1100/c2_qp.c(working copy)
> @@ -66,6 +66,7 @@ static const u8 c2_opcode[] = {
>   [IB_WR_ATOMIC_FETCH_AND_ADD] = NO_SUPPORT,
>  };
>  
> +#if 0
>  void c2_qp_event(struct c2_dev *c2dev, u32 qpn, enum ib_event_type event_type)
>  {
>   struct c2_qp *qp;
> @@ -91,6 +92,7 @@ void c2_qp_event(struct c2_dev *c2dev, u
>   if (atomic_dec_and_test(&qp->refcount))
>   wake_up(&qp->wait);
>  }
> +#endif
>  
>  static int to_c2_state(enum ib_qp_state ib_state)
>  {
> @@ -258,7 +260,7 @@ int c2_alloc_qp(struct c2_dev *c2dev,
>   struct c2_cq *recv_cq = to_c2cq(qp_attrs->recv_cq);
>   unsigned long peer_pa;
>   u32 q_size, msg_size, mmap_size;
> - void *mmap;
> + void __iomem *mmap;
>   int err;
>  
>   qp->qpn = c2_alloc(&c2dev->qp_table.alloc);
> @@ -348,7 +350,7 @@ int c2_alloc_qp(struct c2_dev *c2dev,
>   /* Initialize the SQ MQ */
>   q_size = be32_to_cpu(reply->sq_depth);
>   msg_size = be32_to_cpu(reply->sq_msg_size);
> - peer_pa = (unsigned long) (c2dev->pa + be32_to_cpu(reply->sq_mq_start));
> + peer_pa = c2dev->pa + be32_to_cpu(reply->sq_mq_start);
>   mmap_size = PAGE_ALIGN(sizeof(struct c2_mq_shared) + msg_size * q_size);
>   mmap = ioremap_nocache(peer_pa, mmap_size);
>   if (!mmap) {
> @@ -367,7 +369,7 @@ int c2_alloc_qp(struct c2_dev *c2dev,
>   /* Initialize the RQ mq */
>   q_size = be32_to_cpu(reply->rq_depth);
>   msg_size = be32_to_cpu(reply->rq_msg_size);
> - peer_pa = (unsigned long) (c2dev->pa + be32_to_cpu(reply->rq_mq_start));
> + peer_pa = c2dev->pa + be32_to_cpu(reply->rq_mq_start);
>   mmap_size = PAGE_ALIGN(sizeof(str

Re: [openib-general] [PATCH 4/7] AMSO1100 Provider

2006-03-09 Thread Tom Tucker
Thanks, applied...

On Wed, 2006-03-08 at 12:34 -0800, Roland Dreier wrote:
>  > +  dprintk("couldn't vmalloc page_list of size %d\n",
>  > +  (sizeof(u64) * pbl_depth));
> 
> size_t should be printed with %zd, otherwise you get
> 
> c2_provider.c:388: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’
> 
> Fix:
> 
> --- infiniband/hw/amso1100/c2_provider.c  (revision 5693)
> +++ infiniband/hw/amso1100/c2_provider.c  (working copy)
> @@ -385,7 +385,7 @@ static struct ib_mr *c2_reg_phys_mr(stru
>  
>   page_list = vmalloc(sizeof(u64) * pbl_depth);
>   if (!page_list) {
> - dprintk("couldn't vmalloc page_list of size %d\n",
> + dprintk("couldn't vmalloc page_list of size %zd\n",
>   (sizeof(u64) * pbl_depth));
>   return ERR_PTR(-ENOMEM);
>   }
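What matters for a `size_t` argument is the `z` length modifier. C99 defines `%zu` for `size_t` (and `%zd` for the corresponding signed type), so `%zu` is the exact match for a `sizeof` expression; `%zd` also fixes the size mismatch reported in the warning above. A minimal userspace check, using a hypothetical `format_alloc_failure()` helper that mirrors the `dprintk()` message:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the dprintk() message in the patch. */
static int format_alloc_failure(char *buf, size_t len, size_t pbl_depth)
{
	/* sizeof yields size_t, so use the z length modifier with u. */
	return snprintf(buf, len, "couldn't vmalloc page_list of size %zu\n",
			sizeof(uint64_t) * pbl_depth);
}
```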


