Re: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
Sean Hefty wrote:
>> BTW,do you think we need this for 2.6.18?
>> It does fix a bug when RTU is lost ...
>
> The chances of an RTU being repeatedly lost, but user data being received over
> the same path seems fairly low IMO. My take is that it's probably not needed
> for 2.6.18, but that depends on where we are in the 2.6.18 release cycle.

I think we first need to commit this to SVN and have different developers (e.g. people working on the iSER/NFSoRDMA/SRP/Lustre passive side and, of course, SDP) test it and gain experience with it before pushing it upstream. Targeting 2.6.19 makes sense to me.

Not fully handling this race is not a bug but rather a feature that has been missing from the openib stack since day one, and for which we now have a patch that attempts to address it.

Or.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
Rimmer, Todd wrote:
> This approach will not work. If the QP is in RTS the Communication
> established event will never be generated. Hence the lost RTU case
> would not be properly handled and the ULP would need to take on the
> burden. Its much better to isolate the solution to the CM and let the
> ULP post to the send Q in RTR.

I may be missing your point entirely, but is there also a chance that you did not read the patches closely enough?

Let's first agree that you are not referring to CMA consumers, for which the CMA does the state transitions. For them the CM will always get the COMM_EST async event and will emulate an RTU reception; that is, it will transition the cm id state and generate a CM_USER_ESTABLISHED event for the CMA, which will modify the QP state to RTS and generate an RDMA_ESTABLISHED event to the ULP.

So you might mean other types of CM/CMA consumers. Please provide the details, specifically what makes you state "if the QP is in RTS".

Or.
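The flow described above for CMA consumers can be sketched roughly as follows. This is only an illustration in userspace C, not code from the patch set: the state names, the `struct conn` type and the handler functions are all hypothetical, and the real CM tracks many more states. The point it shows is that a real RTU and a COMM_EST async event both funnel into the same transition, so the ULP sees exactly one established event whichever arrives first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified states; not the kernel's enums. */
enum cm_state { CM_REP_SENT, CM_ESTABLISHED };
enum qp_state { QP_RTR, QP_RTS };

struct conn {
	enum cm_state cm;
	enum qp_state qp;
	int established_events;	/* events delivered to the ULP */
};

/* Shared transition: takes effect at most once per connection. */
static void cm_establish(struct conn *c)
{
	assert(c != NULL);
	if (c->cm != CM_REP_SENT)
		return;			/* already established, ignore */
	c->cm = CM_ESTABLISHED;		/* CM transitions the cm id state */
	c->qp = QP_RTS;			/* CMA moves the QP to RTS */
	c->established_events++;	/* ULP gets one established event */
}

/* A real RTU arrived from the remote side. */
static void on_rtu(struct conn *c)
{
	cm_establish(c);
}

/* COMM_EST async event: data arrived on the QP before the RTU. */
static void on_comm_est(struct conn *c)
{
	cm_establish(c);
}
```

If the RTU is lost and COMM_EST fires first, a late or retransmitted RTU is simply ignored by the state check.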
Re: [openib-general] controlling IPoIB debug
Roland Dreier wrote:
> > I can't disable CONFIG_INFINIBAND_IPOIB_DEBUG, that is i was
> > expecting to be able to press "n" on the "IP-over-InfiniBand debugging"
> > submenu of "IP-over-InfiniBand" and it does not have any impact.
>
> Debugging is forced to be on unless you set EMBEDDED=y. This is so
> that everyone will ship modules with debugging enabled, so that when
> someone has a problem we can actually debug it.

OK, this makes sense. However, I could not find a way to set CONFIG_EMBEDDED; can you tell me how to do it?

... thanks.

Or.
Re: [openib-general] [PATCH] multicast: add support for MGID 0
>These are the hard coded values for pkey, qkey, and join state mentioned
>above. Should there be module parameters to override them ?

My thought was that a user could override any of the values before creating the group. I'm not sure module parameters are necessary, but I do see how they might be useful. I'll see what others think.

>Also, where do the other parameters (components) that are necessary to
>create a group come from ?

They default to 0. I looked at the values returned for the ipoib broadcast group that was running on my system, and coded the values to match that.

>Another option would be to obtain all of them from the appropriate
>(partition based) IPoIB broadcast group.

I agree, and this is what the RDMA CM does. This was the future extension that I mentioned, since there are still issues that would need to be worked out. Including partition information in the query changes the API. Also, ipoib depends on the ib_multicast module, so ib_multicast cannot rely on ipoib being loaded. It may work better for a user to get the broadcast address that they want, then query for that MGID, but I haven't looked at this enough to know what makes the most sense yet.

- Sean
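As a rough illustration of "defaults that the user may override before creating the group", here is a minimal userspace sketch. It is not the ib_multicast code: `struct mc_rec` is a hypothetical, cut-down record, and the pkey and qkey defaults are passed in by the caller rather than hard-coded, since this sketch makes no assumption about the values the module actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down member record; the real one has more fields. */
struct mc_rec {
	uint8_t  mgid[16];
	uint16_t pkey;
	uint32_t qkey;
	uint8_t  join_state;
};

/* Returns 1 when no MGID was given or the given MGID is all zeros. */
static int mgid_is_zero(const uint8_t *mgid)
{
	static const uint8_t zero[16];

	return !mgid || !memcmp(mgid, zero, sizeof zero);
}

/* Fill in defaults; pkey and qkey come from the caller in this sketch. */
static void mc_rec_defaults(struct mc_rec *rec, uint16_t pkey, uint32_t qkey)
{
	assert(rec != NULL);
	memset(rec, 0, sizeof *rec);	/* MGID and other fields default to 0 */
	rec->pkey = pkey;
	rec->qkey = qkey;
	rec->join_state = 1;
}
```

A caller would fetch the defaults, adjust any field it cares about (for example `rec.qkey`), and only then issue the join; the MGID stays 0 until the SA assigns one.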
Re: [openib-general] [PATCH] multicast: add support for MGID 0
Hi Sean,

On Wed, 2006-07-26 at 20:10, Sean Hefty wrote:
> Add support to join a multicast group with MGID = 0, with the actual
> MGID of the group returned by the SA. The multicast module must be able
> to handle multiple requests for MGID = 0, with each request causing a
> new multicast group to be created.
>
> Also enhance the API for ib_get_mcmember_rec() to support a requested
> MGID of 0. In this case, a default MCMemberRecord is returned to the
> user and may be used when creating a new multicast group. Currently,
> the default values are hard-coded by the multicast module, but that can
> be extended in the future or overridden by the user before creating
> the group.
>
> Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
> ---
[snip...]
> Index: core/multicast.c
> ===
> --- core/multicast.c (revision 8695)
> +++ core/multicast.c (working copy)
[snip...]
> +	if (mgid && memcmp(mgid, &mgid0, sizeof mgid0)) {
> +		spin_lock_irqsave(&port->lock, flags);
> +		group = mcast_find(port, mgid);
> +		if (group)
> +			*rec = group->rec;
> +		else
> +			ret = -EADDRNOTAVAIL;
> +		spin_unlock_irqrestore(&port->lock, flags);
> +	} else {
> +		memset(rec, 0, sizeof *rec);
> +		ib_get_cached_gid(device, port_num, 0, &rec->port_gid);
> +		rec->pkey = 0x;
> +		get_random_bytes(&rec->qkey, sizeof rec->qkey);
> +		rec->join_state = 1;
> +	}

These are the hard coded values for pkey, qkey, and join state mentioned above. Should there be module parameters to override them ?

Also, where do the other parameters (components) that are necessary to create a group come from ?

Another option would be to obtain all of them from the appropriate (partition based) IPoIB broadcast group.

-- Hal
[openib-general] [PATCH] multicast: add support for MGID 0
Add support to join a multicast group with MGID = 0, with the actual MGID of the group returned by the SA. The multicast module must be able to handle multiple requests for MGID = 0, with each request causing a new multicast group to be created.

Also enhance the API for ib_get_mcmember_rec() to support a requested MGID of 0. In this case, a default MCMemberRecord is returned to the user and may be used when creating a new multicast group. Currently, the default values are hard-coded by the multicast module, but that can be extended in the future or overridden by the user before creating the group.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
---
Index: include/rdma/ib_multicast.h
===
--- include/rdma/ib_multicast.h	(revision 8647)
+++ include/rdma/ib_multicast.h	(working copy)
@@ -88,8 +88,13 @@ void ib_free_multicast(struct ib_multica
  * @device: Device associated with the multicast group.
  * @port_num: Port on the specified device to associate with the multicast
  *   group.
- * @mgid: MGID of multicast group.
+ * @mgid: optional MGID of multicast group.
  * @rec: Location to copy SA multicast member record.
+ *
+ * If an MGID is specified, returns an existing multicast member record if
+ * one is found for the local port. If no MGID is specified, or the specified
+ * MGID is 0, returns a multicast member record filled in with default values
+ * that may be used to create a new multicast group.
  */
 int ib_get_mcmember_rec(struct ib_device *device, u8 port_num,
 			union ib_gid *mgid, struct ib_sa_mcmember_rec *rec);
Index: core/multicast.c
===
--- core/multicast.c	(revision 8695)
+++ core/multicast.c	(working copy)
@@ -37,8 +37,10 @@
 #include
 #include
 #include
+#include
 #include
+#include
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("InfiniBand multicast membership handling");
@@ -63,6 +65,7 @@ static struct ib_client mcast_client = {
 static struct ib_event_handler event_handler;
 static struct workqueue_struct *mcast_wq;
+static union ib_gid mgid0;
 struct mcast_device;
@@ -144,7 +147,8 @@ static struct mcast_group *mcast_find(st
 }
 static struct mcast_group *mcast_insert(struct mcast_port *port,
-					struct mcast_group *group)
+					struct mcast_group *group,
+					int allow_duplicates)
 {
 	struct rb_node **link = &port->table.rb_node;
 	struct rb_node *parent = NULL;
@@ -161,6 +165,8 @@ static struct mcast_group *mcast_insert(
 			link = &(*link)->rb_left;
 		else if (ret > 0)
 			link = &(*link)->rb_right;
+		else if (allow_duplicates)
+			link = &(*link)->rb_left;
 		else
 			return cur_group;
 	}
@@ -476,6 +482,10 @@ static void join_handler(int status, str
 	else {
 		spin_lock_irq(&group->port->lock);
 		group->rec = *rec;
+		if (!memcmp(&mgid0, &group->rec.mgid, sizeof mgid0)) {
+			rb_erase(&group->node, &group->port->table);
+			mcast_insert(group->port, group, 1);
+		}
 		spin_unlock_irq(&group->port->lock);
 	}
 	mcast_work_handler(group);
@@ -492,12 +502,16 @@ static struct mcast_group *acquire_group
 {
 	struct mcast_group *group, *cur_group;
 	unsigned long flags;
+	int is_mgid0;
-	spin_lock_irqsave(&port->lock, flags);
-	group = mcast_find(port, mgid);
-	if (group)
-		goto found;
-	spin_unlock_irqrestore(&port->lock, flags);
+	is_mgid0 = !memcmp(&mgid0, mgid, sizeof mgid0);
+	if (!is_mgid0) {
+		spin_lock_irqsave(&port->lock, flags);
+		group = mcast_find(port, mgid);
+		if (group)
+			goto found;
+		spin_unlock_irqrestore(&port->lock, flags);
+	}
 	group = kzalloc(sizeof *group, gfp_mask);
 	if (!group)
@@ -511,7 +525,7 @@ static struct mcast_group *acquire_group
 	spin_lock_init(&group->lock);
 	spin_lock_irqsave(&port->lock, flags);
-	cur_group = mcast_insert(port, group);
+	cur_group = mcast_insert(port, group, is_mgid0);
 	if (cur_group) {
 		kfree(group);
 		group = cur_group;
@@ -619,19 +633,30 @@ int ib_get_mcmember_rec(struct ib_device
 	struct mcast_port *port;
 	struct mcast_group *group;
 	unsigned long flags;
+	int ret = 0;
 	dev = ib_get_client_data(device, &mcast_client);
 	if (!dev)
 		return -ENODEV;
 	port = &dev->port[port_num - dev->start_port];
-	spin_lock_irqsave(&port->lock, flags);
-	group = mcast_find(port, mgid);
-	if (group
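The allow_duplicates change to mcast_insert() can be illustrated outside the kernel. The sketch below swaps the kernel rb-tree for a plain unbalanced binary search tree to stay self-contained, and `struct group_node` and `group_insert()` are hypothetical names. It shows only the insertion rule: an equal key is rejected (the existing node is returned) unless duplicates are allowed, in which case the new node is placed in the left subtree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node; the patch uses struct mcast_group in an rb-tree. */
struct group_node {
	unsigned char mgid[16];
	struct group_node *left, *right;
};

/*
 * Insert 'group' into the tree rooted at *root.
 * Returns NULL on success, or the existing node with the same MGID
 * when duplicates are not allowed.
 */
static struct group_node *group_insert(struct group_node **root,
				       struct group_node *group,
				       int allow_duplicates)
{
	struct group_node **link = root;

	assert(root != NULL && group != NULL);
	while (*link) {
		int ret = memcmp(group->mgid, (*link)->mgid,
				 sizeof group->mgid);

		if (ret < 0)
			link = &(*link)->left;
		else if (ret > 0)
			link = &(*link)->right;
		else if (allow_duplicates)
			link = &(*link)->left;	/* equal keys go left */
		else
			return *link;		/* reject the duplicate */
	}
	group->left = group->right = NULL;
	*link = group;
	return NULL;
}
```

With this rule several pending MGID 0 requests can sit in the tree at once, while a lookup by a real MGID still finds at most one group.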
Re: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers
Roland Dreier wrote:
> > > Why does userspace need to be able to disconnect a connection?
>
> > There are two options on who will initiate the disconnection: the userspace
> > daemon or the ib_srp module. I considered both options and I was not sure
> > which one is better. I choose to do it in userspace because it looks a good
> > symmetry that both the disconnection and reconnection will be initiate in the
> > same place. I will accept your comment and change it to the kernel.
>
> I'm not telling you what to do -- I'm just asking.
>
> But it does seem to me that the kernel knows better when to disconnect
> a connection -- eg I don't think an error completion will be signaled
> to userspace. Conversely if a target goes away and comes back with no
> IOs submitted in between, then the connection should survive and
> there's no reason to disconnect/reconnect.
>

Yes; however, usermode can still signal the kernel about the events, and the kernel will decide whether to disconnect/reconnect. In your example with no I/O, the kernel can check active_q/pending_q and decide to keep the connection intact. If the target is offline and some applications issue I/Os, or in the case of error completions/IB errors, the kernel can actively disconnect a connection, moving the target to the DISCONNECTED state if required.

It also seems to me that the kernel does not know that a target is offline until SCSI commands time out and SCSI error recovery kicks in, which will bring the SCSI devices to the offline state. Some fail-over drivers may not be happy about SCSI devices going offline, so the kernel can rely on usermode's signal to disconnect.

In summary, I think we need usermode and the kernel working together: usermode signals the kernel about offline/online events, and the kernel decides whether or not to disconnect/reconnect.
Re: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification.
From: Steve Wise <[EMAIL PROTECTED]>
Date: Wed, 26 Jul 2006 11:15:43 -0500

> Dave, what do you think about removing the user-space stuff for the
> first round of integration? IE: Just add netevents and kernel hooks to
> generate them.

Sure.
Re: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: RE: [PATCH 0/4] Dispatch communication related events to the IB CM
>
> >BTW,do you think we need this for 2.6.18?
> >It does fix a bug when RTU is lost ...
>
> The chances of an RTU being repeatedly lost, but user data being received over
> the same path seems fairly low IMO. My take is that it's probably not needed
> for 2.6.18, but that depends on where we are in the 2.6.18 release cycle.

Well, I think as long as there's no actual release, theoretical problems are fair game. In -stable only real world issues count :)

--
MST
Re: [openib-general] [PATCH] IB/uverbs: include cosmetic fix
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB/uverbs: include cosmetic fix
>
> > Since uverbs_cmd.c uses lockdep now, it should include
> > linux/lockdep.h directly rather than rely on linux/file.h to pull
> > it in.
>
> Current style seems to be to let lockdep.h be included implicitly. A
> quick grep shows that none of the files that call lockdep_set_class()
> include <linux/lockdep.h>.
>
> - R.
>

Fair enough. I removed this from mst-for-2.6.18, so there are 3 patches there now:

Author: Michael S. Tsirkin <[EMAIL PROTECTED]>
    IB/mthca: fix mthca_array_clear thinko

commit fcba37034273136e6bc3124a2ab21821743ce9fd
Author: Ishai Rabinovitz <[EMAIL PROTECTED]>
    IB/srp: fix crash in srp_reconnect_target

commit 82bf649ad7e434ccb7ba91e2fc5764a5888bbfb4
Author: Sean Hefty <[EMAIL PROTECTED]>
    IB/cm: fix error handling in ib_send_cm_req

--
MST
Re: [openib-general] [PATCH] IB/uverbs: include cosmetic fix
> Since uverbs_cmd.c uses lockdep now, it should include
> linux/lockdep.h directly rather than rely on linux/file.h to pull
> it in.

Current style seems to be to let lockdep.h be included implicitly. A quick grep shows that none of the files that call lockdep_set_class() include <linux/lockdep.h>.

- R.
Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around
> I'll fix that up to check the OUI.

Makes sense. Unfortunately at least the Engenio target I have access to uses the same Mellanox OUI:

IO Unit Info:
    port LID:        0003
    port GID:        fe82c902004000e6
    change ID:       0002
    max controllers: 0x10

    controller[  1]
        GUID:      0002c902004000e4
        vendor ID: 0002c9
        device ID: 005a44
        IO class : 0100
        ID:        LSI Storage Systems SRP Driver 200400a0b80bdd41
        service entries: 1
            service[  0]: 200400a0b80bdd41 / SRP.T10:200400A0B80BDD41

but still I think it's better than nothing to only activate the workaround for GUIDs starting with 0002c9.

- R.
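For illustration, the "GUIDs starting with 0002c9" test could look something like the sketch below. It assumes the GUID is held as a 64-bit integer in host byte order with the OUI in the top 24 bits; the helper names are made up here and are not taken from ib_srp.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the OUI is the top 24 bits of a host-order 64-bit GUID. */
#define MELLANOX_OUI 0x0002c9u

/* Hypothetical helper: extract the 24-bit OUI from a GUID. */
static uint32_t guid_oui(uint64_t guid)
{
	return (uint32_t)(guid >> 40);
}

/* Hypothetical helper: should the workaround be active for this GUID? */
static int needs_mellanox_workaround(uint64_t guid)
{
	return guid_oui(guid) == MELLANOX_OUI;
}
```

As noted above, such a check would also match the Engenio controller GUID 0002c902004000e4, so it narrows the workaround without isolating Mellanox targets exactly.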
Re: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
>BTW,do you think we need this for 2.6.18?
>It does fix a bug when RTU is lost ...

The chances of an RTU being repeatedly lost, but user data being received over the same path seems fairly low IMO. My take is that it's probably not needed for 2.6.18, but that depends on where we are in the 2.6.18 release cycle.

- Sean
Re: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH 0/4] Dispatch communication related events to the IB CM
>
> I don't believe that there were any objections to this patch, and only one minor
> change request to print a warning message. Roland, do the mthca changes look
> okay to commit?
>
> - Sean

BTW,do you think we need this for 2.6.18?
It does fix a bug when RTU is lost ...

--
MST
Re: [openib-general] [PATCH 0/4] Dispatch communication related events to the IB CM
I don't believe that there were any objections to this patch, and only one minor change request to print a warning message. Roland, do the mthca changes look okay to commit?

- Sean
Re: [openib-general] restore missing PCI registers after reset
On Wed, Jul 26, 2006 at 07:32:26PM +0300, Michael S. Tsirkin wrote:
> Quoting r. Greg KH <[EMAIL PROTECTED]>:
> > I think pci_restore_state() already restores the msi and msix state,
> > take a look at the latest kernel version :)
>
> Yes, I know :)
> but I am not talking about MSI/MSI-X, I am talking about the following:
>
> > > > PCI-X device: PCI-X command register
> > > > PCI-X bridge: upstream and downstream split transaction registers
> > > > PCI Express : PCI Express device control and link control registers
>
> these register values include maximum MTU for PCI express and other vital
> data.

Make up a patch that shows how you would save these in a generic way and we can discuss it. I know people have talked about saving the extended PCI config space for devices that need it, so that might be all you need to do here.

thanks,

greg k-h
Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around
Quoting r. Vu Pham <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] RFC: srp filesystem data corruption problem/work-around
>
> Michael S. Tsirkin wrote:
> > Quoting r. Vu Pham <[EMAIL PROTECTED]>:
> >>> Right now this workaround affects all targets unconditionally.
> >>>
> >> Can we rework the patch to have mellanox_workarounds=0 by
> >> default?
> >
> > Hmm ... since this is a data corruption issue, seems to me the safe
> > setting should be the default one. No?
> >
>
> As Roland pointed out "Right now this workaround affects all targets
> unconditionally"

I'll fix that up to check the OUI.

--
MST
Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around
Michael S. Tsirkin wrote:
> Quoting r. Vu Pham <[EMAIL PROTECTED]>:
>>> Right now this workaround affects all targets unconditionally.
>>>
>> Can we rework the patch to have mellanox_workarounds=0 by
>> default?
>
> Hmm ... since this is a data corruption issue, seems to me the safe
> setting should be the default one. No?
>

As Roland pointed out, "Right now this workaround affects all targets unconditionally".

We can set mellanox_workarounds=0 by default to avoid affecting other targets. Whoever tests with a Mellanox target can pass mellanox_workarounds=1 at module load time.
Re: [openib-general] [PATCH 3/4] IB CM: register and handle COMM_EST events on a QP
Or Gerlitz wrote:
> Generally, i guess you need to insert the local QPN into the rb_tree
> ***before*** sending the REP not after it.

That is what the patch does.

> Can you state what is the usage being done with the local QPNs in the
> timeout on REQ flow?

I don't quite follow what you're asking here. Local QPNs are tracked when a REQ is sent. If a second REQ is sent using the same QPN, it will fail with an address in use error. The local QPN is not removed from the table until the connection fails, or we exit the timewait state after being disconnected.

- Sean
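The local QPN tracking rule described above can be sketched in userspace C. This is only an illustration: the CM keeps local QPNs in an rb-tree, whereas the sketch uses a small fixed array, and `struct qpn_table`, `qpn_track()` and `qpn_untrack()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64	/* arbitrary bound for this sketch */

struct qpn_table {
	uint32_t qpn[MAX_TRACKED];
	int count;
};

/* Called when a REQ is sent: fails if the QPN is already in use. */
static int qpn_track(struct qpn_table *t, uint32_t qpn)
{
	int i;

	assert(t != NULL);
	for (i = 0; i < t->count; i++)
		if (t->qpn[i] == qpn)
			return -EADDRINUSE;
	if (t->count == MAX_TRACKED)
		return -ENOMEM;
	t->qpn[t->count++] = qpn;
	return 0;
}

/* Called when the connection fails or timewait is exited. */
static void qpn_untrack(struct qpn_table *t, uint32_t qpn)
{
	int i;

	assert(t != NULL);
	for (i = 0; i < t->count; i++) {
		if (t->qpn[i] == qpn) {
			t->qpn[i] = t->qpn[--t->count];
			return;
		}
	}
}
```

A second REQ on the same QPN is rejected until the first connection has failed or left timewait, at which point the QPN can be reused.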
Re: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users
Andrew Friedley wrote:
> I figured you would say that. So this would be a separate polling
> interface from a CQ or what the RDMA CM provides?

Yes. This is one of the issues that I have with the userspace implementation. For a raw IB interface, users can end up needing a half-dozen libraries, each with their own event interface.

> I see a possible race condition though - consider two processes calling
> ib_get_mcmember_rec(). Both of them return from this before either can
> call ib_join_multicast() and create the multicast group. Is it possible
> for the same MGID to be returned from ib_get_mcmember_rec() in this
> scenario?

I probably wasn't being clear. ib_get_mcmember_rec() would return an MCMemberRecord with MGID 0, since that was what was requested. Other default parameters needed to create the group would be filled in. The actual MGID has to come from the SA through the join call.

> Thought I'd try. Are you saying that just because a join has completed,
> that doesn't imply the network is fully ready for handling multicast
> messages for that group?

I believe that is the case.

- Sean
Re: [openib-general] restore missing PCI registers after reset
Quoting r. Greg KH <[EMAIL PROTECTED]>:
> I think pci_restore_state() already restores the msi and msix state,
> take a look at the latest kernel version :)

Yes, I know :)
but I am not talking about MSI/MSI-X, I am talking about the following:

> > > PCI-X device: PCI-X command register
> > > PCI-X bridge: upstream and downstream split transaction registers
> > > PCI Express : PCI Express device control and link control registers

these register values include maximum MTU for PCI express and other vital data.

--
MST
Re: [openib-general] restore missing PCI registers after reset
On Wed, Jul 26, 2006 at 01:29:44PM +0300, Michael S. Tsirkin wrote:
> Quoting r. Greg KH <[EMAIL PROTECTED]>:
> > Subject: [patch 02/45] IB/mthca: restore missing PCI registers after reset
> > --
> > mthca does not restore the following PCI-X/PCI Express registers after
> > reset:
> >   PCI-X device: PCI-X command register
> >   PCI-X bridge: upstream and downstream split transaction registers
> >   PCI Express : PCI Express device control and link control registers
> >
> > This causes instability and/or bad performance on systems where one of
> > these registers is set to a non-default value by BIOS.
> >
> > Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
> > Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
> > Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
>
> By the way, Greg, this code is completely generic, and the same seems to apply
> to all PCI-X/PCI-Express devices - should not pci_restore_state and
> friends really know about these registers, as well?
>
> What do you think?

I think pci_restore_state() already restores the msi and msix state, take a look at the latest kernel version :)

thanks,

greg k-h
Re: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification.
On Wed, 2006-07-26 at 13:39 +1000, Herbert Xu wrote:
> On Tue, Jul 25, 2006 at 10:05:40AM -0500, Steve Wise wrote:
> >
> > But they really are seeing a delete followed by an add. That's what the
> > kernel is doing.
>
> Actually that's the other thing I don't really like. The user-space
> monitor may perceive that a route was actually deleted and replaced
> by a new one even though this isn't what's happening at all.
>
> In fact the problem here is that you're sending route notifications
> when it's really the dst_entry that's changing. User-space as it
> stands only get notifications about fib changes which is quite different
> from changes to the transient dst_entry objects which only exist in the
> route cache.
>
> Is anyone actually going to use the user-space interface of this? If not
> perhaps we should wait until someone really needs it before adding the
> netlink part of the patch.
>
> We can change the kernel interface at will so if we make a mistake with
> netevent it can be easily corrected. For user-space though the rules
> are totally different. I'd really hate to be stuck with an interface
> which turns out to not be the one that people actually want to have.
>

The user interface is not needed for the rdma users. They are all in kernel. I added this at the request of reviewers of this patch. I have no problem at all deferring the rtnetlink integration until someone really needs it.

> > The rdma driver needs to update all established rdma connections that
> > are using the next-hop information of the existing route and make them
> > use the next-hop information of the new route. In addition, the rdma
> > driver might have a reference to the old dst entry. So it can release
> > that ref and add a ref to the new dst entry.
>
> Do you really need the old route for the user-space part of your patch?
>

Not if we remove the user-space parts. :-)

> > I have to admit I'm a little fuzzy on the routing stuff. The main
> > netevents I've utilized in the rdma driver I'm writing is the
> > neighbour update event and the redirect event. Route add/del was added
> > for completeness of "routing" netevents.
>
> So you mean you aren't going to use the route notifications? In that case
> we should probably just drop them and add them when someone actually needs
> it. At that point they can tell us what semantics they want from it :)
>

This is fine by me too! The key events needed for rdma are:

    neighbour update events
    rtredirect events
    pmtu change events

> > Can you expand further or point me to code where the IP stack "flushes
> > its tables" when routes are changed?
>
> Grep for rt_cache_flush in net/ipv4/fib_hash.c.
>

thanks.

Dave, what do you think about removing the user-space stuff for the first round of integration? IE: Just add netevents and kernel hooks to generate them.

Steve.
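To make the "netevents and kernel hooks" idea concrete, here is a rough userspace sketch of an in-kernel-only event chain covering the three events listed above. Everything in it is hypothetical (the `EV_*` names, `net_event_register()`, `net_event_notify()`); it is a stand-in for a notifier chain and not the interface from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes for the three events listed above. */
enum net_event { EV_NEIGH_UPDATE, EV_REDIRECT, EV_PMTU_CHANGE };

typedef void (*net_event_cb)(enum net_event ev, void *ctx);

#define MAX_LISTENERS 8	/* arbitrary bound for this sketch */

struct listener {
	net_event_cb cb;
	void *ctx;
};

static struct listener listeners[MAX_LISTENERS];
static int nr_listeners;

/* A consumer (e.g. an rdma driver) registers interest in events. */
static int net_event_register(net_event_cb cb, void *ctx)
{
	assert(cb != NULL);
	if (nr_listeners == MAX_LISTENERS)
		return -1;
	listeners[nr_listeners].cb = cb;
	listeners[nr_listeners].ctx = ctx;
	nr_listeners++;
	return 0;
}

/* A hook in the network stack calls this when something changes. */
static void net_event_notify(enum net_event ev)
{
	int i;

	for (i = 0; i < nr_listeners; i++)
		listeners[i].cb(ev, listeners[i].ctx);
}

/* Example listener: counts PMTU changes via the context pointer. */
static void count_pmtu(enum net_event ev, void *ctx)
{
	if (ev == EV_PMTU_CHANGE)
		++*(int *)ctx;
}
```

Because both producers and consumers live in the kernel, an interface of this shape can be changed later without the compatibility concerns raised for the netlink part.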
[openib-general] [PATCH] IB/mthca: fix mthca_array_clear thinko
Discovered by Ali Ayoub: mthca_array_clear does not clear the slot if the used count is positive. This leads to crashes in mthca_qp_event since that uses mthca_array_get to check that the qp is valid.

Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/mthca/mthca_allocator.c b/drivers/infiniband/hw/mthca/mthca_allocator.c
index 9ba3211..848e583 100644
--- a/drivers/infiniband/hw/mthca/mthca_allocator.c
+++ b/drivers/infiniband/hw/mthca/mthca_allocator.c
@@ -144,7 +144,9 @@ void mthca_array_clear(struct mthca_arra
 	if (--array->page_list[p].used == 0) {
 		free_page((unsigned long) array->page_list[p].page);
 		array->page_list[p].page = NULL;
-	}
+	} else
+		array->page_list[p].page[index & (PAGE_SIZE /
+						  sizeof (void *) - 1)] = NULL;
 	if (array->page_list[p].used < 0)
 		pr_debug("Array %p index %d page %d with ref count %d < 0\n",

--
MST
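The bug is easier to see in a small userspace model of the paged array. The sketch below is an approximation, not the mthca code: it uses calloc/free instead of whole pages, a fixed `SLOTS_PER_PAGE` (assumed to be a power of two), and hypothetical `paged_array_*` names. The clear function follows the fixed logic: when other slots on the page are still in use, the cleared slot itself is set to NULL so a later get cannot return a stale pointer.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOTS_PER_PAGE 512	/* assumption: a power of two */

struct page_entry {
	void **page;	/* NULL until the first slot is set */
	int used;	/* number of occupied slots on this page */
};

struct paged_array {
	struct page_entry *page_list;
};

static void *paged_array_get(struct paged_array *a, int index)
{
	int p = index / SLOTS_PER_PAGE;

	assert(a != NULL && index >= 0);
	if (!a->page_list[p].page)
		return NULL;
	return a->page_list[p].page[index & (SLOTS_PER_PAGE - 1)];
}

static int paged_array_set(struct paged_array *a, int index, void *value)
{
	int p = index / SLOTS_PER_PAGE;

	assert(a != NULL && index >= 0);
	if (!a->page_list[p].page) {
		a->page_list[p].page = calloc(SLOTS_PER_PAGE, sizeof(void *));
		if (!a->page_list[p].page)
			return -1;
	}
	a->page_list[p].page[index & (SLOTS_PER_PAGE - 1)] = value;
	++a->page_list[p].used;
	return 0;
}

static void paged_array_clear(struct paged_array *a, int index)
{
	int p = index / SLOTS_PER_PAGE;

	assert(a != NULL && index >= 0);
	if (--a->page_list[p].used == 0) {
		/* Last user of the page: release it entirely. */
		free(a->page_list[p].page);
		a->page_list[p].page = NULL;
	} else {
		/* The fix: other slots remain, so clear just this one. */
		a->page_list[p].page[index & (SLOTS_PER_PAGE - 1)] = NULL;
	}
}
```

Without the else branch, a get on the cleared index would keep returning the old pointer for as long as any other slot on the same page stayed in use.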
Re: [openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users
Sean Hefty wrote:
> I was trying to ask if there was any way for the processes to generate unique
> addresses. For example, what TCP port number do the processes listen on when
> establishing their out of band connections? Is there some way that you can map
> the addresses that are used for out of band communication to a multicast IP
> address, such that the processes get unique addresses? From reading down into
> your mail, it doesn't sound like this would help much.

Not without breaking many layers of abstraction.. although TCP is all we support for OOB right now, the framework is in place for supporting other (non-TCP/IP) protocols in the future. I'm asking some of our runtime developers if there's anything I could use.. doesn't look like it right now.

> I think the same basic API can be exposed in userspace. It may be possible to
> expose a couple of extra helper functions to simplify creating and joining a
> group, but I'm not sure if they will be worth it.

The existing interface seems reasonable - I don't see how adding extra functions would improve anything.

> This doesn't end up working well for userspace apps. To get a callback, the
> library ends up needing to create a thread to poll for events from the kernel.
> It makes more sense to give the application control over the threading, and let
> it poll for the events.

I figured you would say that. So this would be a separate polling interface from a CQ or what the RDMA CM provides?

> Well, after looking at the code, an MGID of 0 doesn't currently work. The
> implementation doesn't handle it. I worked on a design to add support for MGID
> 0 to the multicast module, and will start on it in the next day or so.

Okay, I look forward to seeing the patch.

> Another thought I had is to allow ib_get_mcmember_rec() be called with an MGID
> of 0. Doing so would return an MCMemberRecord with reasonable default values
> that could be used when creating a group. (The returned values would either be
> hard-coded or copy those from the first join on a given port, if one had
> occurred. In almost all cases, the first join would come from ipoib.)

This would be very good - it would allow for adjusting such values before the group is actually joined.

I see a possible race condition though - consider two processes calling ib_get_mcmember_rec(). Both of them return from this before either can call ib_join_multicast() and create the multicast group. Is it possible for the same MGID to be returned from ib_get_mcmember_rec() in this scenario?

> There is no way to do this. Note that there may be a delay between a node
> joining a group and the programming of the switch tables.

Thought I'd try. Are you saying that just because a join has completed, that doesn't imply the network is fully ready for handling multicast messages for that group?

Andrew
Re: [openib-general] controlling IPoIB debug
> I can't disable CONFIG_INFINIBAND_IPOIB_DEBUG, that is i was
> expecting to be able to press "n" on the "IP-over-InfiniBand debugging"
> submenu of "IP-over-InfiniBand" and it does not have any impact.

Debugging is forced to be on unless you set EMBEDDED=y. This is so that
everyone will ship modules with debugging enabled, so that when someone
has a problem we can actually debug it.

 - R.
[openib-general] new user level branch for OFED 1.1
Hi All,

Toward the OFED 1.1 release I have created the 1.1 branch:
https://openib.org/svn/gen2/branches/1.1/

This branch includes src/userspace/ based on trunk r8680, and all the
other OFED stuff.

Tziporet
[openib-general] [Bug 181] New: HPL test always failed
http://openib.org/bugzilla/show_bug.cgi?id=181

           Summary: HPL test always failed
           Product: OpenFabrics Windows
           Version: unspecified
          Platform: X86-64
        OS/Version: Other
            Status: NEW
          Severity: blocker
          Priority: P2
         Component: WSD
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]
                CC: [EMAIL PROTECTED]

We tried to run without RDMA read, and with low level driver HCA MT25208.

command line:
mpiexec -hosts 4 hostname1 2 hostname2 2. hpl.exe

example of error msg:

job aborted:
rank: node: exit code: message
0: parker6: terminated
1: parker6: terminated
2: parker7: terminated
3: parker7: terminated
4: parker8: fatal error: Fatal error in MPI_Send: Internal MPI error!, error stack:
MPI_Send(172)...: MPI_Send(buf=0x02B1A9B8, count=17820, MPI_DOUBLE, dest=3, tag=1001, comm=0x8402) failed
MPIDI_CH3I_Progress(165): handle_sock_op failed
handle_new_message_read(422):
MPIDI_CH3U_Handle_recv_pkt(1359): received unknown packet type (type=1071575908)
5: parker8: terminated
6: parker9: terminated
7: parker9: terminated
error analysis -
4: mpi has detected a fatal error and aborted hpl.exe run on parker8
error analysis -

example of HPL.dat

--HPL.dat---
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
5100 3000 3400 3500 Ns
4            # of NBs
100 97 95 90 NBs
0            PMAP process mapping (0=Row-,1=Column-major)
3            # of process grids (P x Q)
2 4 4        Ps
4 2 2        Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
---
[openib-general] controlling IPoIB debug
Roland,

This will probably turn out to be a newbie question, but anyway: I can't
disable CONFIG_INFINIBAND_IPOIB_DEBUG. I was expecting to be able to press
"n" on the "IP-over-InfiniBand debugging" submenu of "IP-over-InfiniBand",
but doing so has no effect.

Also, with CONFIG_INFINIBAND_IPOIB_DEBUG set, no "*" is marked next to it,
so the menu looks like:

    IP-over-InfiniBand
--- IP-over-InfiniBand debugging
[ ] IP-over-InfiniBand data path debugging

Am I missing something here, or is there a problem?

My .config is attached.

Or.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.18-rc2
# Wed Jul 26 17:21:21 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
# CONFIG_TASK_DELAY_ACCT is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_RT_MUTEXES=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_HT=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_NODES_SHIFT=6
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y
CONFIG_NR_CPUS=32
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x20
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_REORDER is not set
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y
CONFIG_GENERIC_PENDING_IRQ=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_LEGACY is not set
# CONFIG_PM_DEBUG is not set
CONFIG_SOFTWARE_SUSPEND=y
CONFIG_PM_STD_PARTITION=""
CONFIG_SUSPEND_SMP=y

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
CONFIG_ACPI_SLEEP_PROC_SLEEP=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_VIDEO is not set
# CONFIG_ACPI_HOTKEY is not set
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
CONF
Re: [openib-general] [PATCH 0/4] Dispatch communication relatedevents to the IB CM
> Quoting r. Sean Hefty <[EMAIL PROTECTED]>:
> > Subject: RE: [PATCH 0/4] Dispatch communication relatedevents to the IB CM
> >
> > >Perhaps we should pursue changing this in the IBTA spec. Being able to
> > >post to the SQ while in RTR makes handling of the Comm Est/RTU race with
> > >the CQ callback much easier to handle.
> > >
> > >It would be better if the IB spec permitted posting to the SQ in RTR but
> > >indicated the SQ would not be processed until the QP moved to RTS. I
> > >believe the present Mellanox silicon/firmware implements such behavior.
> >
> > I think it would be simpler to transition the QP to RTS after sending a
> > REP, with the restriction that a user may not post sends until an RTU is
> > received, a communication establish event occurs, or a receive message
> > completes on the QP.

This approach will not work. If the QP is in RTS the Communication
established event will never be generated. Hence the lost RTU case would
not be properly handled and the ULP would need to take on the burden.
It's much better to isolate the solution to the CM and let the ULP post to
the send Q in RTR.

Todd Rimmer
Chief Systems Architect
SilverStorm Technologies
Voice: 610-233-4852 Fax: 610-233-4777
[EMAIL PROTECTED] www.SilverStorm.com
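The handling described elsewhere in this thread, where the CM treats a
COMM_EST async event as if the RTU had arrived, can be modelled as a small
state machine. This is a toy model under stated assumptions and not the ib_cm
code: the enum names, struct conn and conn_handle() are made up for
illustration, and only the passive side after sending a REP is covered.

```c
#include <assert.h>
#include <stddef.h>

/* Passive-side connection states in this toy model. */
enum cm_state { CM_REP_SENT, CM_ESTABLISHED };

/* Inputs that can move the connection forward. */
enum cm_input {
    IN_RTU,       /* the RTU message arrived */
    IN_COMM_EST   /* async event: data arrived on the QP before the RTU */
};

struct conn {
    enum cm_state state;
    int established_events;  /* times "established" was reported upward */
};

/* Whichever of RTU or COMM_EST comes first establishes the connection. */
void conn_handle(struct conn *c, enum cm_input in)
{
    assert(c != NULL);

    switch (c->state) {
    case CM_REP_SENT:
        if (in == IN_RTU || in == IN_COMM_EST) {
            c->state = CM_ESTABLISHED;
            c->established_events++;   /* report exactly once */
        }
        break;
    case CM_ESTABLISHED:
        /* A late RTU or a duplicate COMM_EST changes nothing. */
        break;
    }
}
```

In this model a lost RTU is harmless as long as the COMM_EST event is
delivered, which is why the thread is concerned with whether that event is
generated at all in a given QP state.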
Re: [openib-general] restore missing PCI registers after reset
Quoting r. Greg KH <[EMAIL PROTECTED]>:
> Subject: [patch 02/45] IB/mthca: restore missing PCI registers after reset
> --
> mthca does not restore the following PCI-X/PCI Express registers after reset:
>   PCI-X device: PCI-X command register
>   PCI-X bridge: upstream and downstream split transaction registers
>   PCI Express : PCI Express device control and link control registers
>
> This causes instability and/or bad performance on systems where one of
> these registers is set to a non-default value by BIOS.
>
> Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
> Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

By the way, Greg, this code is completely generic, and the same seems to
apply to all PCI-X/PCI-Express devices - shouldn't pci_restore_state and
friends really know about these registers as well?

What do you think?

--
MST
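The fix quoted above amounts to saving a few config registers before the
reset and writing them back afterwards. A minimal user-space sketch of that
pattern, assuming a fake device whose config space is a plain array: struct
fake_dev, fake_reset() and reset_preserving() are hypothetical names, the
register indices are arbitrary, and none of this is the mthca code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_WORDS 64  /* size of the fake config space, in 16-bit words */

struct fake_dev {
    uint16_t cfg[CFG_WORDS];
};

/* Model a reset that returns every register to its default (zero). */
static void fake_reset(struct fake_dev *d)
{
    memset(d->cfg, 0, sizeof(d->cfg));
}

/* Reset the device but keep the registers listed in regs[0..n-1]. */
void reset_preserving(struct fake_dev *d, const unsigned *regs, size_t n)
{
    uint16_t saved[CFG_WORDS];
    size_t i;

    assert(d != NULL && regs != NULL);
    assert(n <= CFG_WORDS);

    /* Save the selected registers before the reset wipes them. */
    for (i = 0; i < n; i++) {
        assert(regs[i] < CFG_WORDS);
        saved[i] = d->cfg[regs[i]];
    }

    fake_reset(d);

    /* Write the saved values back so BIOS settings survive. */
    for (i = 0; i < n; i++)
        d->cfg[regs[i]] = saved[i];
}
```

The question raised in the reply is whether this list of registers belongs in
each driver or in a generic save/restore helper shared by all devices.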
[openib-general] connection loss handling in MTHCA
Hello all,

I have a query about "connection loss" handling in the mthca driver.

Consider the following situation: during a data transfer between two
connected endpoints, one endpoint (at the HCA level) detects that the
connection is lost for some reason, such as "receive queue empty on remote
end" or "TPT error for data buffer on remote end".

How is this handled in the mthca implementation? What happens to the WRs
which are in progress and outstanding? Is there any asynchronous event
generated corresponding to that?

-Mahesh
Re: [openib-general] [PATCH] RFC: srp filesystem data corruption problem/work-around
Quoting r. Vu Pham <[EMAIL PROTECTED]>:
> > Right now this workaround affects all targets unconditionally.
> >
>
> Can we rework the patch to have mellanox_workarounds=0 by
> default?

Hmm ... since this is a data corruption issue, seems to me the safe
setting should be the default one. No?

--
MST
Re: [openib-general] [PATCH] lockdep: don't pull in includes when lockdep disabled
On Wed, Jul 26, 2006 at 08:33:19AM +0200, Arjan van de Ven wrote:
> On Wed, 2006-07-26 at 09:26 +0300, Michael S. Tsirkin wrote:
> > Ingo, does the following look good to you?
> >
> > Do not pull in various includes through lockdep.h if lockdep is disabled.
>
> Hi,
>
> can you tell us what this fixes? Eg is there a specific problem?

[raises hand] Zillions of warnings on m68k allmodconfig. And, yes, patch
removes them.

In file included from ...
                 from ...
include/linux/list.h: In function `__list_add_rcu':
include/linux/list.h:89: warning: implicit declaration of function `smp_wmb'

> I mean... we're adding ifdefs so there better be a real good reason for
> them
> fixing something real would be such a reason ;-)