Re: [openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
Thanks for the comments.

>> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
>> instead of the struct neighbour dev one.
>
> It seems that in this design, if multiple ipoib interfaces are present, we
> might get an skb such that skb->dev will be different from the new dev field
> in struct ipoib_neigh.
>
> It seems that the result will be that the packet will be sent on a wrong
> interface. Right?
>
I don't see how. The dev field in ipoib_neigh doesn't take part in interface
selection. As I see it, the skb travels this path:

1. It is passed to bond_dev->hard_start_xmit.
2. bond_dev->hard_start_xmit chooses the current active interface, changes
   skb->dev and queues the skb again for transmission.

>> In addition, if an IPoIB device is removed before bonding is unloaded, bond0
>> neighbours (neighbours that point to bond0) may still exist after the IPoIB
>> device no longer exists. This is why a neighbour cleanup is required during
>> device cleanup. This cleanup scans the ARP cache and the ndisc cache to find
>> the neighbours of bond0 which also refer to the relevant ibX. Also, when the
>> ib_ipoib module is unloaded, the neighbour destructor must be set to NULL
>> because the destructor function lives in ib_ipoib.
>> For this neigh table cleanup, the symbol nd_tbl must be exported, just like
>> the symbol arp_tbl is.
>
> I wonder about this: is it really true that any allocated neighbour is always
> in either arp_tbl or nd_tbl? For example, could some code have called
> neigh_hold and retained a neighbour that is not in either one of these tables?
>
I based the assumption that neighbours live in one of these two tables on
observation and code reading. I preferred it over keeping track of all
ipoib_neighs in a list. However, I could do that instead of scanning the
neigh_table. Do you think it's better?
As for the example, I didn't understand it. Could you please explain?
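The two-step path above can be modelled in a few lines of userspace C. This is a hypothetical sketch, not kernel code: struct bond, bond_xmit() and ipoib_neigh_init() are made-up stand-ins, and the bond is assumed to run in active-backup mode. It shows that the outgoing slave is chosen from the bond's own state, and that the dev recorded in the ipoib_neigh is only written, never read, on that path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct net_device {
    const char *name;
    struct net_device *master;   /* NULL for a device that is not enslaved */
};

struct sk_buff {
    struct net_device *dev;
};

struct ipoib_neigh {
    struct net_device *dev;      /* slave that created this entry */
};

/* Hypothetical active-backup bond: its only state is the active slave. */
struct bond {
    struct net_device *dev;
    struct net_device *active_slave;
};

/* Step 2 of the path: the bond picks the active slave and rewrites
 * skb->dev before handing the skb to that slave's xmit routine. */
static struct net_device *bond_xmit(struct bond *b, struct sk_buff *skb)
{
    skb->dev = b->active_slave;
    return skb->dev;
}

/* IPoIB side: the new ipoib_neigh records skb->dev (the slave that is
 * actually transmitting) instead of neighbour->dev (the bond). */
static void ipoib_neigh_init(struct ipoib_neigh *n, const struct sk_buff *skb)
{
    n->dev = skb->dev;
}
```

After a failover (active_slave switched from ib0 to ib1), skb->dev and the recorded dev differ, which is the situation raised in the review; in this model that difference does not change which slave transmits.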
>> During my tests I found that when running
>>
>> 1. modprobe -r ib_mthca (to delete IPoIB interfaces)
>> 2. ping somewhere on the subnet of bond0
>>
>> I get this stack dump (which ends with kernel death):
>> skb_under_panic+0x5c/0x60
>> :ib_ipoib:ipoib_hard_header+0xa6/0xc0
>> arp_create+0x120/0x226
>> arp_send+0x25/0x3b
>> arp_solicit+0x186/0x195
>> neigh_timer_handler+0x2b5/0x309
>> neigh_timer_handler+0x0/0x309
>> run_timer_softirq+0x130/0x19e
>> __do_softirq+0x55/0xc3
>> call_softirq+0x1c/0x28
>> do_softirq+0x2c/0x7d
>> smp_apic_timer_interrupt+0x57/0x6a
>> mwait_idle+0x0/0x45
>> apic_timer_interrupt+0x66/0x70
>> mwait_idle+0x42/0x45
>> cpu_idle+0x8b/0xae
>> start_secondary+0x47f/0x48f
>>
>> The only way I found to avoid this (for now) is to check skb headroom in
>> ipoib_hard_header. I guess that this safety check doesn't harm regular IPoIB
>> operation and it seems to solve my problem. However, I would be happy to
>> hear what others think of this last issue.
>
> As I said, this seems to indicate a problem in the bonding code.
> But what will happen after you error out in ipoib_hard_header?
> Is the packet dropped? What might break as a result?
>
I will check the hard_header_len issue in the bonding code more carefully.
At first look it seems that bonding does borrow the slave's hard_header_len.
Also, my checks show that it is safe to return an error from hard_header().
For example, in neigh_connected_output:

	err = dev->hard_header(skb, dev, ntohs(skb->protocol),
			       neigh->ha, NULL, skb->len);
	read_unlock_bh(&neigh->lock);
	if (err >= 0)
		err = neigh->ops->queue_xmit(skb);
	else {
		err = -EINVAL;
		kfree_skb(skb);
	}

>> I would really appreciate comments.
>>
>> thanks
>>
>> -MoniS
>
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
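A userspace sketch of the headroom guard and of the caller's error path quoted above. Everything here is a simplified stand-in: the sk_buff has only a headroom counter, skb_push_ok() aborts where the kernel would call skb_under_panic(), and IPOIB_ENCAP_LEN is assumed to be 4 bytes. It illustrates that returning an error from hard_header() makes the caller drop the packet instead of crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define IPOIB_ENCAP_LEN 4   /* assumption: size of the IPoIB header */

/* Hypothetical, simplified skb: only the headroom matters here. */
struct sk_buff {
    unsigned int headroom;
    int freed;
};

/* Models skb_push(): reserving header space that is not there is the
 * skb_under_panic() case from the stack dump. */
static int skb_push_ok(struct sk_buff *skb, unsigned int len)
{
    if (skb->headroom < len)
        abort();            /* stands in for skb_under_panic() */
    skb->headroom -= len;
    return 0;
}

/* The proposed guard in ipoib_hard_header(): refuse instead of pushing
 * past the start of the buffer. */
static int hard_header(struct sk_buff *skb)
{
    if (skb->headroom < IPOIB_ENCAP_LEN)
        return -EINVAL;
    return skb_push_ok(skb, IPOIB_ENCAP_LEN);
}

/* Caller side, mirroring neigh_connected_output(): a negative return
 * from hard_header() means the packet is freed, not transmitted. */
static int connected_output(struct sk_buff *skb)
{
    if (hard_header(skb) < 0) {
        skb->freed = 1;     /* kfree_skb(skb) */
        return -EINVAL;
    }
    return 0;               /* neigh->ops->queue_xmit(skb) would run here */
}
```

As the review points out, such a guard hides a too-small hard_header_len on the bond device rather than fixing it; the sketch only shows that the failure becomes a dropped packet.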
[openib-general] [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB
Hi,

This post follows a previous one regarding the changes required in IPoIB to
enable it to work with bonding. Please find it here:
http://openib.org/pipermail/openib-general/2007-February/032598.html

This patch version addresses the comments from Michael Tsirkin on the last
post.

IPoIB uses a two-layer neighbouring scheme: for each struct neighbour whose
device is an IPoIB one, there is a struct ipoib_neigh buddy, which is created on
demand in the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. In the tx flow, the bonding code gets an skb
whose skb->dev points to the master device; it changes the skb to point to the
slave device and calls the slave's hard_start_xmit function.

Combining these two flows, there is a hole: some code in IPoIB
(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
is an IPoIB device, so that, for example, netdev_priv(n->dev) is of type
struct ipoib_dev_priv.

To fix this, this patch adds a dev field to struct ipoib_neigh, which is used
instead of the dev field of struct neighbour.

In addition, if an IPoIB device is removed before bonding is unloaded, bond0
neighbours (neighbours that point to bond0) may still exist after the IPoIB
device no longer exists. This is why a neighbour cleanup is required during
device cleanup. This cleanup scans the ARP cache and the ndisc cache to find
the neighbours of bond0 which also refer to the relevant ibX. Also, when the
ib_ipoib module is unloaded, the neighbour destructor must be set to NULL
because the destructor function lives in ib_ipoib.
For this neigh table cleanup, the symbol nd_tbl must be exported, just like the
symbol arp_tbl is.

During my tests I found that when running

1. modprobe -r ib_mthca (to delete IPoIB interfaces)
2. ping somewhere on the subnet of bond0

I get this stack dump (which ends with kernel death):

skb_under_panic+0x5c/0x60
:ib_ipoib:ipoib_hard_header+0xa6/0xc0
arp_create+0x120/0x226
arp_send+0x25/0x3b
arp_solicit+0x186/0x195
neigh_timer_handler+0x2b5/0x309
neigh_timer_handler+0x0/0x309
run_timer_softirq+0x130/0x19e
__do_softirq+0x55/0xc3
call_softirq+0x1c/0x28
do_softirq+0x2c/0x7d
smp_apic_timer_interrupt+0x57/0x6a
mwait_idle+0x0/0x45
apic_timer_interrupt+0x66/0x70
mwait_idle+0x42/0x45
cpu_idle+0x8b/0xae
start_secondary+0x47f/0x48f

The only way I have found to avoid this (for now) is to check the skb headroom
in ipoib_hard_header. I believe this safety check doesn't harm regular IPoIB
operation, and it seems to solve my problem. However, I would be happy to hear
what others think of this last issue.

I would really appreciate comments.

thanks

-MoniS

--
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 07deee8..31bc6d8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -216,6 +216,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 	struct neighbour   *neighbour;
+	struct net_device  *dev;
 	struct list_head    list;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 extern struct workqueue_struct *ipoib_workqueue;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 705eb1d..0e3953e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,8 @@
 #include
 #include
 #include
+#include
+#include
 #define IPOIB_QPN(ha) (be32_to_cpup((__be32 *) ha) & 0xff)
@@ -70,6 +72,7 @@ module_param_named(debug_level, ipoib_de
 MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0");
 #endif
+static int ipoib_at_exit = 0;
 struct ipoib_path_iter {
 	struct net_device *dev;
 	struct ipoib_path  path;
@@ -490,7 +493,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -735,6 +738,9 @@ static int ipoib_hard_header(struct sk_b
 {
 	struct ipoib_header *header;
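The cleanup described in this post (scan the ARP and ndisc tables for bond0 neighbours whose ipoib_neigh refers to the departing ibX, and clear the destructor) could look roughly like the following. It is a hypothetical userspace model over a plain array: the real code would walk arp_tbl and nd_tbl under the table locks, and all field and function names here are illustrative only. Neighbours on ibX itself are assumed to be handled by the stack's normal device-teardown path and are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's neigh_table API. */
struct net_device {
    struct net_device *master;      /* bond0 for an enslaved ibX */
};

struct ipoib_neigh {
    struct net_device *dev;         /* ibX that created this entry */
};

struct neighbour {
    struct net_device *dev;         /* bond0 for bonded traffic */
    struct ipoib_neigh *ipoib;      /* what to_ipoib_neigh() would hold */
    void (*destructor)(struct neighbour *);
    int dead;
};

/* Walk one table (arp_tbl or nd_tbl in the kernel) and retire every
 * neighbour of the slave's master whose ipoib_neigh was created by this
 * slave. Returns how many entries were cleaned. */
static int cleanup_master_neighs(struct neighbour *tbl, size_t len,
                                 struct net_device *slave)
{
    int cleaned = 0;

    for (size_t i = 0; i < len; i++) {
        struct neighbour *n = &tbl[i];

        if (n->dead || !slave->master || n->dev != slave->master)
            continue;
        if (!n->ipoib || n->ipoib->dev != slave)
            continue;
        n->ipoib = NULL;            /* the ipoib_neigh would be freed here */
        n->destructor = NULL;       /* its code lives in ib_ipoib */
        n->dead = 1;
        cleaned++;
    }
    return cleaned;
}
```

Matching on both the master and the recorded slave keeps bond0 neighbours that belong to another, still-present IPoIB slave untouched.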
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
Michael S. Tsirkin wrote:
>> Quoting Moni Shoua <[EMAIL PROTECTED]>:
>> Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of
>> linux neighbour
>>
>>
>>> Another concern: assume that one device goes away (e.g. hotplug).
>>> It seems that neighbours whose dev field points to another device will not
>>> be destroyed.
>>> Correct?
>> I agree.
>>
>>> Therefore in your design, it seems that to_ipoib_neigh()->dev
>>> will get us a pointer to a device that has been removed already.
>>>
>> I agree that this is a problem.
>
> I think we can solve this if we track all ipoib neighbours, like we do for
> old kernels, and then flush ipoib neighbours on any hotplug event.
> Roland, does this sound too awful?
>
>> I think it would be best to prevent an IPoIB device from disappearing, or
>> ib_ipoib from being unloaded, as long as the IPoIB device is a slave.
>> Unfortunately, I don't see how this can be done just by fixing something in
>> bonding or IPoIB.
>
> So hotplug is blocked potentially forever?
> This does not sound good.
OK, so I'm dropping this thought.
>
>> However, any slave knows it has a master (dev->master).
>> What do you think about a solution where IPoIB first tries to clean up the
>> neighbours that belong to its master before deleting the IPoIB device?
>
> How?
Let me know what you think about this; I hope it makes sense. In IPoIB, before
calling unregister_netdev:

	for each kernel neighbour n
		if n->dev == ib_dev->master
			delete n

Michael, as I see it, we have to deal with two cases:

1. The IPoIB device is deleted (unregister_netdev): the IPoIB netdev is no
   longer in the kernel's address space, so we have to make sure that no one
   holds a pointer to it after it is deleted.
2. The ib_ipoib module is unloaded (modprobe -r): ipoib_neigh_destructor is no
   longer in the kernel's address space, so we have to make sure no one calls
   it after the module is unloaded.

I think that if nothing prevents the execution of the code above, it serves
both cases.
Do you see any problem with that? Do I have to maintain my own list of
neighbours, or can I use the kernel's ARP table for that? I am studying the
neighbour cleanup function to do something similar, but I would be happy to
learn from others as well.

>>>> Furthermore, bond_setup_by_slave is called only for non-Ethernet devices
>>>> (we are considering changing the logic to "called only for IPoIB devices",
>>>> just for safety).
>>> Why is this necessary, BTW?
>>>
>> If we don't do that, we get a memory leak, because the neigh destructor will
>> never be called for non-IPoIB devices although they carry an ipoib_neigh
>> with them.
>
> How can this happen? If it does, I think we are back to where we started:
> to_ipoib_neigh is broken for non-IPoIB devices.
> I thought you said only devices of the same type can be paired?
>
The scenario is:

1. The kernel allocates a neighbour structure for bond0, puts it on an skb and
   passes it to the bond xmit function.
2. bond0 passes the skb to IPoIB.
3. IPoIB allocates an ipoib_neigh and hangs it on the Linux neighbour.
4. Some time later, the kernel wants to destroy the neighbour (cleanup) but
   doesn't call ipoib_neigh_destructor, because the destructor is registered
   by the neigh_setup of the ibX device, not of bond0.
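The four-step scenario can be reproduced with a small counter-based model. This is a hypothetical sketch: neigh_create(), ipoib_attach() and neigh_release() stand in for the stack's neighbour lifecycle, and the destructor is assumed to be installed only through the device's neigh_setup hook. A bond that does not borrow the slave's neigh_setup leaks the ipoib_neigh; a bond that does borrow it frees the ipoib_neigh.

```c
#include <assert.h>
#include <stddef.h>

struct neighbour;

/* Hypothetical model of per-device neighbour setup. */
struct net_device {
    void (*neigh_setup)(struct neighbour *);
};

struct neighbour {
    struct net_device *dev;
    void (*destructor)(struct neighbour *);
    int has_ipoib_neigh;
};

static int live_ipoib_neighs;   /* outstanding ipoib_neigh allocations */

static void ipoib_neigh_destructor(struct neighbour *n)
{
    if (n->has_ipoib_neigh) {
        n->has_ipoib_neigh = 0;
        live_ipoib_neighs--;
    }
}

/* What ibX's neigh_setup does in this model: install the destructor. */
static void ipoib_neigh_setup(struct neighbour *n)
{
    n->destructor = ipoib_neigh_destructor;
}

/* Step 1: the stack creates the neighbour for bond0 and runs bond0's
 * neigh_setup hook, if it has one. */
static void neigh_create(struct neighbour *n, struct net_device *dev)
{
    n->dev = dev;
    n->destructor = NULL;
    n->has_ipoib_neigh = 0;
    if (dev->neigh_setup)
        dev->neigh_setup(n);
}

/* Steps 2-3: the IPoIB slave hangs an ipoib_neigh on the neighbour. */
static void ipoib_attach(struct neighbour *n)
{
    n->has_ipoib_neigh = 1;
    live_ipoib_neighs++;
}

/* Step 4: the stack releases the neighbour. */
static void neigh_release(struct neighbour *n)
{
    if (n->destructor)
        n->destructor(n);
}
```

Running the sequence once against a bond without a neigh_setup hook leaves one ipoib_neigh outstanding; running it against a bond that borrowed ipoib_neigh_setup leaves the count unchanged.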
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
> Another concern: assume that one device goes away (e.g. hotplug).
> It seems that neighbours whose dev field points to another device will not be
> destroyed.
> Correct?
I agree.
>
> Therefore in your design, it seems that to_ipoib_neigh()->dev
> will get us a pointer to a device that has been removed already.
>
I agree that this is a problem.
I think it would be best to prevent an IPoIB device from disappearing, or
ib_ipoib from being unloaded, as long as the IPoIB device is a slave.
Unfortunately, I don't see how this can be done just by fixing something in
bonding or IPoIB.
However, any slave knows it has a master (dev->master).
What do you think about a solution where IPoIB first tries to clean up the
neighbours that belong to its master before deleting the IPoIB device?

>> Furthermore, bond_setup_by_slave is called only for non-Ethernet devices
>> (we are considering changing the logic to "called only for IPoIB devices",
>> just for safety).
>
> Why is this necessary, BTW?
>
If we don't do that, we get a memory leak, because the neigh destructor will
never be called for non-IPoIB devices although they carry an ipoib_neigh with
them.
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
Michael S. Tsirkin wrote:
>>--
>>IPoIB uses a two-layer neighbouring scheme: for each struct neighbour whose
>>device is an IPoIB one, there is a struct ipoib_neigh buddy, which is created
>>on demand in the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call.
>>
>>When using the bonding driver, neighbours are created by the net stack on
>>behalf of the bonding (master) device. In the tx flow, the bonding code gets an
>>skb whose skb->dev points to the master device; it changes the skb to point to
>>the slave device and calls the slave's hard_start_xmit function.
>>
>>Combining these two flows, there is a hole: some code in IPoIB
>>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets,
>>n->dev is an IPoIB device, so that, for example, netdev_priv(n->dev) is of
>>type struct ipoib_dev_priv.
>
>
> Could you please elaborate how ipoib_neigh_destructor comes to be called at
> all? At what point does ipoib_neigh_setup_dev get called?
>
>
The bond device uses its slave's neigh_setup function. Please look at line 19
below, from the bonding code.

   static void bond_setup_by_slave(struct net_device *bond_dev,
11 +                                struct net_device *slave_dev)
12 +{
13 +        bond_dev->hard_header = slave_dev->hard_header;
14 +        bond_dev->rebuild_header = slave_dev->rebuild_header;
15 +        bond_dev->hard_header_cache = slave_dev->hard_header_cache;
16 +        bond_dev->header_cache_update = slave_dev->header_cache_update;
17 +        bond_dev->hard_header_parse = slave_dev->hard_header_parse;
18 +
19 +        bond_dev->neigh_setup = slave_dev->neigh_setup;
20 +
21 +        bond_dev->type = slave_dev->type;
22 +        bond_dev->hard_header_len = slave_dev->hard_header_len;
23 +        bond_dev->addr_len = slave_dev->addr_len;
24 +
25 +        memcpy(bond_dev->broadcast, slave_dev->broadcast,
26 +               slave_dev->addr_len);
27 +}

>>To fix this, this patch adds a dev field to struct ipoib_neigh, which is used
>>instead of the dev field of struct neighbour.
>
>
> What I am concerned with is: if the master is not an IPoIB device,
> what guarantee do we have that to_ipoib_neigh will return 0
> and not part of an actual hardware address?
>
> Without bonding, the reason is that dev points to an ipoib device,
> so we know the hw address is 20 bytes.
>
I guess you meant "if the slave is not an IPoIB device"...
The bond device doesn't allow devices of different types to be grouped together
as its slaves. Furthermore, bond_setup_by_slave is called only for non-Ethernet
devices (we are considering changing the logic to "called only for IPoIB
devices", just for safety).
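The concern about to_ipoib_neigh() comes down to where the ipoib_neigh pointer is stored: right after the 20-byte IB hardware address inside neighbour->ha, rounded up to pointer alignment. The sketch below reproduces that arithmetic on a cut-down, hypothetical struct neighbour. ALIGN_UP is a local stand-in for the kernel's ALIGN, and read_slot() is an illustrative helper. If a device's address is longer than 20 bytes, the same bytes hold address data and the slot reads as a non-NULL pointer, which is the failure mode described in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INFINIBAND_ALEN 20
#define MAX_ADDR_LEN    32
#define ALIGN_UP(x, a)  (((x) + (a) - 1) / (a) * (a))

struct ipoib_neigh;

/* Hypothetical cut-down neighbour: a leading pointer gives it pointer
 * alignment, and ha is sized the way the kernel sizes it. */
struct neighbour {
    struct neighbour *next;
    unsigned char ha[ALIGN_UP(MAX_ADDR_LEN, sizeof(unsigned long))];
};

/* Same arithmetic as to_ipoib_neigh(): the slot starts after the 20-byte
 * IB address, rounded up to sizeof(void *). */
static struct ipoib_neigh **to_ipoib_neigh(struct neighbour *n)
{
    size_t off = ALIGN_UP(offsetof(struct neighbour, ha) + INFINIBAND_ALEN,
                          sizeof(void *));
    return (struct ipoib_neigh **)((char *)n + off);
}

/* Reads the slot through memcpy to avoid type-punning the byte array. */
static void *read_slot(struct neighbour *n)
{
    void *p;
    memcpy(&p, to_ipoib_neigh(n), sizeof p);
    return p;
}
```

For a real IPoIB neighbour the bytes after the 20-byte address start out zeroed, so the slot reads as NULL until ipoib_neigh_alloc() fills it; for a device with a longer address, those bytes are whatever the address contains.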
[openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
Michael, Roland,

I'd appreciate it if you would take a look at this and give your comments.

The patch here refers to this thread about adding bonding support for IPoIB
interfaces and is necessary for it to work properly:
http://openib.org/pipermail/openib-general/2007-January/031934.html

The patch here is for the upstream kernel; there is also a version of the patch
for OFED (for kernels up to 2.6.16):
http://openib.org/pipermail/openib-general/2007-January/031935.html

thanks

- MoniS

--

IPoIB uses a two-layer neighbouring scheme: for each struct neighbour whose
device is an IPoIB one, there is a struct ipoib_neigh buddy, which is created on
demand in the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. In the tx flow, the bonding code gets an skb
whose skb->dev points to the master device; it changes the skb to point to the
slave device and calls the slave's hard_start_xmit function.

Combining these two flows, there is a hole: some code in IPoIB
(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
is an IPoIB device, so that, for example, netdev_priv(n->dev) is of type
struct ipoib_dev_priv.

To fix this, this patch adds a dev field to struct ipoib_neigh, which is used
instead of the dev field of struct neighbour.
Signed-off-by: Moni Shoua <[EMAIL PROTECTED]>
Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
---
 ipoib.h           |    4 +++-
 ipoib_main.c      |   23 +--
 ipoib_multicast.c |    2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-22 12:11:25.0 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-22 12:18:06.101698456 +0200
@@ -216,6 +216,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 	struct neighbour   *neighbour;
+	struct net_device  *dev;
 	struct list_head    list;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 extern struct workqueue_struct *ipoib_workqueue;
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-22 12:11:33.0 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-22 12:34:57.599156580 +0200
@@ -490,7 +490,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -769,32 +769,34 @@ static void ipoib_set_mcast_list(struct
 static void ipoib_neigh_destructor(struct neighbour *n)
 {
 	struct ipoib_neigh *neigh;
-	struct ipoib_dev_priv *priv = netdev_priv(n->dev);
+	struct ipoib_dev_priv *priv;
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
-	ipoib_dbg(priv,
-		  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
-		  IPOIB_QPN(n->ha),
-		  IPOIB_GID_RAW_ARG(n->ha + 4));
-
-	spin_lock_irqsave(&priv->lock, flags);
 	neigh = *to_ipoib_neigh(n);
 	if (neigh) {
+		priv = netdev_priv(neigh->dev);
+		ipoib_dbg(priv,
+			  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
+			  IPOIB_QPN(n->ha),
+			  IPOIB_GID_RAW_ARG(n->ha + 4));
+
+		spin_lock_irqsave(&priv->lock, flags);
 		if (neigh->ah)
 			ah = neigh->ah;
 		list_del(&neigh->list);
 		ipoib_neigh_free(n->dev, neigh);
+		spin_unlock_irqrestore(&priv->lock, flags);
 	}
-	spin_unlock_irqrestore(&priv->lock, flags);
 	if (ah)
 		ipoib_put_ah(ah);
 }
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struc
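A minimal model of what the patched ipoib_neigh_destructor relies on: priv is derived from the dev stored in the ipoib_neigh, not from n->dev, and the lock is only taken when an ipoib_neigh exists. The types and the dev_kind tag are hypothetical. netdev_priv() in the kernel performs no such type check, so the assertion in ipoib_priv() exists only to make the original bug visible in this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag so the model can tell a bond from an IPoIB device. */
enum dev_kind { DEV_BOND, DEV_IPOIB };

struct ipoib_dev_priv { int freed; };   /* counts ipoib_neigh frees */

struct net_device {
    enum dev_kind kind;
    void *priv;                 /* an ipoib_dev_priv only for DEV_IPOIB */
};

struct ipoib_neigh { struct net_device *dev; };

struct neighbour {
    struct net_device *dev;     /* bond0 when created for the master */
    struct ipoib_neigh *ipoib;  /* stands in for *to_ipoib_neigh(n) */
};

static struct ipoib_dev_priv *ipoib_priv(struct net_device *dev)
{
    assert(dev->kind == DEV_IPOIB);
    return dev->priv;
}

static void neigh_destructor(struct neighbour *n)
{
    struct ipoib_neigh *neigh = n->ipoib;
    struct ipoib_dev_priv *priv;

    if (!neigh)
        return;                 /* nothing to free, no lock needed */
    priv = ipoib_priv(neigh->dev);  /* not ipoib_priv(n->dev) */
    /* spin_lock_irqsave(&priv->lock, flags) would go here */
    priv->freed++;
    n->ipoib = NULL;
    /* spin_unlock_irqrestore(&priv->lock, flags) */
}
```

Calling ipoib_priv(n.dev) for a neighbour created on the bond would trip the assertion in this model, since n.dev is the bond rather than an IPoIB device.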
Re: [openib-general] IB/mthca: question about HCA profile module parameters
Dotan Barak wrote:
> Hi Moni.
>
> I tried to use the mthca module parameters: for example, I tried to change
> the number of QPs.
>
> I got several failures when I used the HCA 25204:
> * sometimes I got the following error message (when using big values,
> for example 512K QPs):
> ib_mthca: :0c: INIT_HCA command failed aborting.
> ib_mthca: probe of :0c: failed with error -16
> * when I tried to use a small number of QPs (1024) the machine just hung
> and I noticed a kernel oops message on the console
>
OK. So I ran more tests on my setup, which now includes:
- Dual x86_64 processor (Intel Xeon)
- 1GB RAM
- 25204 HCA
- fw_ver=1.1.0

In the range of 16K to 256K for num_qp I got no errors. For lower and higher
values I got errors from INIT_HCA and (not always, and only for very low
values) a machine hang. Do you have the oops saved somewhere? Can you post it
here, please?
>
> Did you verify the HCA profile module parameter feature?
As I mentioned earlier, I verified that non-default values can be assigned and
that the HCA works for some selected values. I also noticed that illegal values
cause the driver to write a message to the kernel log. However, I didn't test
the exact behaviour for all possible values of each profile variable.
> Is there any known limitation for the values that should be used?
> (for example: only values which are powers of two)
>
>
I guess it is clear that there are hardware limitations that don't allow
setting just any value. Unfortunately, even after looking for them in the PRM,
I couldn't figure out what they are. The software limits the value to a power
of 2 and corrects the user if they try to set a wrong value (to the nearest
power of 2). In that case a warning message is written to the kernel log.
> thanks
> Dotan
>
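The power-of-two correction described above might look like the following. This is a sketch, not the mthca code: validate_num_qp() is hypothetical, it rounds up (whether the driver rounds up or to the nearest power of two is not confirmed here), and it encodes none of the hardware limits, which remain unknown. In this sketch, both 1024 and 512K are already powers of two, so a check of this kind alone would not catch the two failing cases in the report.

```c
#include <assert.h>
#include <stdio.h>

/* Round v up to the next power of two; v must be in 1..2^31. */
static unsigned int roundup_pow2(unsigned int v)
{
    assert(v > 0 && v <= 0x80000000u);
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}

/* Hypothetical profile check: correct a non-power-of-two request and
 * warn, the way the reply describes the driver behaving. */
static unsigned int validate_num_qp(unsigned int requested)
{
    if (requested & (requested - 1)) {
        unsigned int fixed = roundup_pow2(requested);
        fprintf(stderr, "num_qp %u is not a power of 2, using %u\n",
                requested, fixed);
        return fixed;
    }
    return requested;
}
```

For example, a request of 100000 QPs would be corrected to 131072 with a warning, while 65536 passes through unchanged.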
Re: [openib-general] IB/mthca: question about HCA profile module parameters
Dotan Barak wrote:
> Hi Moni.
>
> I tried to use the mthca module parameters: for example, I tried to change
> the number of QPs.
>
> I got several failures when I used the HCA 25204:
> * sometimes I got the following error message (when using big values,
> for example 512K QPs):
> ib_mthca: :0c: INIT_HCA command failed aborting.
> ib_mthca: probe of :0c: failed with error -16
> * when I tried to use a small number of QPs (1024) the machine just hung
> and I noticed a kernel oops message on the console
>
>
> Did you verify the HCA profile module parameter feature?
> Is there any known limitation for the values that should be used?
> (for example: only values which are powers of two)
>
>
> thanks
> Dotan
>
Hi Dotan,

I verified the profile feature up to the level of a successful modprobe.
I am now looking into your report.

thanks
Re: [openib-general] Add bonding suuport to OFED
Vladimir Sokolovsky wrote:
> Hi Moni,
> Please review the following patch to ib-bonding.spec:
>
> Use %{_prefix} in RPM spec file instead of hard-coded /usr/local/ofed.
>
> Signed-off-by: Vladimir Sokolovsky <[EMAIL PROTECTED]>
> ---
>
> diff --git a/ib-bonding.spec b/ib-bonding.spec
> index db02fe8..77e51e0 100644
> --- a/ib-bonding.spec
> +++ b/ib-bonding.spec
> @@ -5,6 +5,8 @@
>
> %define _build_name_fmt %%{ARCH}/%%{NAME}-%%{VERSION}-%%{RELEASE}-%%{DISTRIBUTION}-%%{ARCH}.rpm
>
> +%{!?_prefix: %define _prefix /usr/local/ofed}
> +
> Summary : ib_bonding patch and modules.
> Name: %{name}
> Version : %{version}
> @@ -39,11 +41,11 @@ fi
> %install
> [ "${RPM_BUILD_ROOT}" != "/" -a -d ${RPM_BUILD_ROOT} ] && rm -rf ${RPM_BUILD_ROOT}
> mkdir -p ${RPM_BUILD_ROOT}/lib/modules/%{kversion}/kernel/drivers/net/bonding/
> -mkdir -p ${RPM_BUILD_ROOT}/usr/local/ofed/bin
> -mkdir -p ${RPM_BUILD_ROOT}/usr/local/ofed/docs
> +mkdir -p ${RPM_BUILD_ROOT}%{_prefix}/bin
> +mkdir -p ${RPM_BUILD_ROOT}%{_prefix}/docs
> install -m 755 linux/drivers/net/bonding/bonding.ko ${RPM_BUILD_ROOT}/lib/modules/%{kversion}/kernel/drivers/net/bonding/
> -install -m 755 bin/bond-init.sh ${RPM_BUILD_ROOT}/usr/local/ofed/bin
> -install -m 755 docs/ib-bonding.txt ${RPM_BUILD_ROOT}/usr/local/ofed/docs
> +install -m 755 bin/bond-init.sh ${RPM_BUILD_ROOT}%{_prefix}/bin
> +install -m 755 docs/ib-bonding.txt ${RPM_BUILD_ROOT}%{_prefix}/docs
>
>
>
> @@ -51,7 +53,7 @@ install -m 755 docs/ib-bonding.txt ${RP
> if [ ! -z $STACK_PREFIX ] ; then
> backup_dir=$STACK_PREFIX/backup
> else
> -backup_dir=/usr/local/ofed/backup
> +backup_dir=%{_prefix}/backup
> fi
>
>
> @@ -69,7 +71,7 @@ STACK_PREFIX=$(test -x /etc/infiniband/i
> if [ ! -z $STACK_PREFIX ] ; then
> backup_dir=$STACK_PREFIX/backup
> else
> -backup_dir=/usr/local/ofed/backup
> +backup_dir=%{_prefix}/backup
> fi
> cd $backup_dir
> found_file=$(find -name bonding.ko)
> @@ -81,6 +83,6 @@ fi
>
> %files
> /lib/modules/%{kversion}/kernel/drivers/net/bonding/bonding.ko
> -/usr/local/ofed/bin/bond-init.sh
> -/usr/local/ofed/docs/ib-bonding.txt
> +%{_prefix}/bin/bond-init.sh
> +%{_prefix}/docs/ib-bonding.txt
>
>
>
Thanks. I applied that.
[openib-general] The neigh_setup patch for upstream
Hi,

This is the upstream version of the patch that I sent for OFED.
Please comment.

thanks

- MoniS

IPoIB uses a two-layer neighbouring scheme: for each struct neighbour whose
device is an IPoIB one, there is a struct ipoib_neigh buddy, which is created on
demand in the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. In the tx flow, the bonding code gets an skb
whose skb->dev points to the master device; it changes the skb to point to the
slave device and calls the slave's hard_start_xmit function.

Combining these two flows, there is a hole: some code in IPoIB
(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
is an IPoIB device, so that, for example, netdev_priv(n->dev) is of type
struct ipoib_dev_priv.

To fix this, this patch adds a dev field to struct ipoib_neigh, which is used
instead of the dev field of struct neighbour.

Signed-off-by: Moni Shoua <[EMAIL PROTECTED]>
Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
---
 ipoib.h           |    4 +++-
 ipoib_main.c      |   23 +--
 ipoib_multicast.c |    2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-22 12:11:25.0 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-22 12:18:06.101698456 +0200
@@ -216,6 +216,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 	struct neighbour   *neighbour;
+	struct net_device  *dev;
 	struct list_head    list;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+				      struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 extern struct workqueue_struct *ipoib_workqueue;
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-22 12:11:33.0 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-22 12:34:57.599156580 +0200
@@ -490,7 +490,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -769,32 +769,34 @@ static void ipoib_set_mcast_list(struct
 static void ipoib_neigh_destructor(struct neighbour *n)
 {
 	struct ipoib_neigh *neigh;
-	struct ipoib_dev_priv *priv = netdev_priv(n->dev);
+	struct ipoib_dev_priv *priv;
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
-	ipoib_dbg(priv,
-		  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
-		  IPOIB_QPN(n->ha),
-		  IPOIB_GID_RAW_ARG(n->ha + 4));
-
-	spin_lock_irqsave(&priv->lock, flags);
 	neigh = *to_ipoib_neigh(n);
 	if (neigh) {
+		priv = netdev_priv(neigh->dev);
+		ipoib_dbg(priv,
+			  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
+			  IPOIB_QPN(n->ha),
+			  IPOIB_GID_RAW_ARG(n->ha + 4));
+
+		spin_lock_irqsave(&priv->lock, flags);
 		if (neigh->ah)
 			ah = neigh->ah;
 		list_del(&neigh->list);
 		ipoib_neigh_free(n->dev, neigh);
+		spin_unlock_irqrestore(&priv->lock, flags);
 	}
-	spin_unlock_irqrestore(&priv->lock, flags);
 	if (ah)
 		ipoib_put_ah(ah);
 }
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struct ipoib_neigh *neigh;
@@ -803,6 +805,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
 		return NULL;
 	neigh->neighbour = neighbour;
+	neigh->dev = dev;
 	*to_ipoib_neigh(neighbour) = neigh;
 	skb_queue_head_init(&neigh->queue);
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===
Re: [openib-general] Add bonding suuport to OFED
Hi Vlad,

Can you please pull this into OFED-1.2? I guess this requires some changes in
the build scripts and configuration files. I'd be happy to help in any way I
can. Please let me know.

thanks

- MoniS

Moni Shoua wrote:
> Originally, bonding is a high-availability solution for Ethernet network
> interfaces. It is a module that implements a virtual network device (not bound
> to hardware) and enslaves "real" devices. The bonding device controls its
> slaves according to the bonding policy and the slaves' health.
>
> I am adding a bonding device which works for IPoIB interfaces. Feel free to
> install it and send comments.
>
> You just have to build the source RPM, rebuild it and install the binary.
>
> For now, I have tested the module under RH4-UP3 and SLES10 with OFED-1.1.
>
> HOW TO BUILD THE SOURCE RPM
> ===
> git clone git://staging.openfabrics.org/~monis/ofed-bond-pkg.git mydir
> cd mydir/
> ./build_rpm.sh
> ./build_rpm.sh OR ./build_rpm.sh --git-url
>
>
> After installing the binary RPM, read the instructions in
> /usr/local/ofed/docs/ib-bonding.txt
>
> Note: Using ib-bonding requires applying a patch to IPoIB and replacing
> ib_ipoib.ko. Please find the patch in the following message.
> Please also note that the patch should be applied after
> ipoib_8111_to_2_6_16.patch.
>
> - MoniS
Re: [openib-general] [PATCH] IB/ipoib: Add field dev to struct ipoib_neigh
Michael S. Tsirkin wrote:
>>>
>>> Just to clarify - you previously mentioned you saw problems with the 2.6.16
>>> backport. Is this an issue you see with 2.6.20 as well?
>>
>> Yes, the same thing happens with kernel 2.6.20. However, the patch for 2.6.20
>> looks a little bit different. I will post it today or tomorrow.
>
>
> Let's see that first. I prefer to first look at upstream code, then think
> about backporting.
>
OK, I will post this patch today.

> But this would hardly help if the ipoib module is unloaded while a neighbour
> for the bonding device is still around and has a pointer to
> ipoib_neigh_destructor.
>
>
>> For later kernels, the bond device "borrows" the slave's neigh_setup
>> function in the bond's setup function.
>>
>> ==> bond_dev->neigh_setup = slave_dev->neigh_setup;
>>
>> So even if the neighbour points to the bond device, ipoib_neigh_destructor
>> will be called.
>
>
> Same applies here.
>
This is a good point. The right solution, in my opinion, is to enforce a
correct order of unloading the modules: first bonding and then IPoIB. We still
have to think about how we want to implement this.

> Further, in both cases, it seems that accessing data at to_ipoib_neigh on a
> neighbour for a non-ipoib device can cause a crash if the hardware address is
> != 0 at offset 20.
>
I don't see such a risk. ipoib_neigh_destructor is called only for neighbours
that were passed as an argument to ipoib_neigh_alloc (for kernels <= 2.6.16)
or for devices that set their neigh_setup function to ipoib_neigh_setup_dev
(for later kernels). The only one (besides IPoIB, of course) that does that is
bonding, and bonding cannot enslave devices of different types. So once bonding
sets its neigh_setup to ipoib_neigh_setup_dev, it means it enslaves an IPoIB
device and won't enslave devices of other types.
However, it might be a good idea to change the condition under which bonding
"borrows" the neigh_setup function. Currently it is (slave_type != Ethernet),
but it should be (slave_type == IPoIB).
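The stricter borrowing condition can be expressed against ARP hardware types. The constants match <linux/if_arp.h> (ARPHRD_ETHER = 1, ARPHRD_IEEE1394 = 24, ARPHRD_INFINIBAND = 32); the two predicate functions are hypothetical helpers, not bonding code. The two conditions differ only for slave types that are neither Ethernet nor InfiniBand, such as IEEE 1394.

```c
#include <assert.h>

/* ARP hardware types, values as in <linux/if_arp.h>. */
#define ARPHRD_ETHER       1
#define ARPHRD_IEEE1394   24
#define ARPHRD_INFINIBAND 32

/* Current condition described in the thread: borrow header ops and
 * neigh_setup from any non-Ethernet slave. */
static int borrows_today(unsigned short slave_type)
{
    return slave_type != ARPHRD_ETHER;
}

/* Proposed, stricter condition: borrow only from IPoIB slaves. */
static int borrows_proposed(unsigned short slave_type)
{
    return slave_type == ARPHRD_INFINIBAND;
}
```

With the stricter test, a bond over a non-IB, non-Ethernet slave would no longer pick up ipoib_neigh_setup_dev by accident.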
___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/ipoib: Add field dev to struct ipoib_neigh
>
>
> Just to clarify - you previously mentioned you saw problems with the 2.6.16
> backport. Is this an issue you see with 2.6.20 as well?
Yes, the same thing happens with kernel 2.6.20. However, the patch for 2.6.20 looks a little bit different. I will post it today or tomorrow.
>
> Also - in your approach, what prevents the device from going away while there
> are still ipoib_neigh objects around?
Nothing prevents it. You can modprobe -r bonding whenever you want (even when IPoIB is up) and still be safe from leaks. I think my answer to that is below.
> Also - if neigh does not point to ipoib device, our neigh destructor won't be called
> for it, will it? What will clean the ipoib neigh then?
>
With kernels up to 2.6.16, the patch ipoib_8111_to_2_6_16 adds this to ipoib_neigh_alloc:
==> neigh->neighbour->ops->destructor = ipoib_neigh_destructor;
So I guess there is no such problem here.
For later kernels, the bond device "borrows" the slave's neigh_setup function in the bond's setup function:
==> bond_dev->neigh_setup = slave_dev->neigh_setup;
So even if the neighbour points to the bond device, ipoib_neigh_destructor will still be called.
[openib-general] [PATCH] IB/ipoib: Add field dev to struct ipoib_neigh
IPoIB uses a two-layer neighbouring scheme: for each struct neighbour whose device is an IPoIB one, there is a struct ipoib_neigh buddy, which is created on demand in the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call.
When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. In the tx flow, the bonding code gets an skb such that skb->dev points to the master device. It changes this skb to point to the slave device and calls the slave's hard_start_xmit function.
Combining these two flows, there is a hole if some code in ipoib (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev is an IPoIB device, so that, for example, netdev_priv(n->dev) is of type struct ipoib_dev_priv.
To fix it, this patch adds a dev field to struct ipoib_neigh, which is used instead of the dev field of struct neighbour.
Signed-off-by: Moni Shoua <[EMAIL PROTECTED]>
Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
 ipoib.h           |    3 ++-
 ipoib_main.c      |   22 +++---
 ipoib_multicast.c |    2 +-
 3 files changed, 14 insertions(+), 13 deletions(-)
---
Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-10 17:53:02.744225722 +0200
+++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-10 17:55:04.121544018 +0200
@@ -218,6 +218,7 @@ struct ipoib_neigh {
 	struct sk_buff_head queue;
 	struct neighbour   *neighbour;
+	struct net_device *dev;
 	struct list_head    all_neigh_list;
 	struct list_head    list;
@@ -235,7 +236,7 @@ static inline struct ipoib_neigh **to_ip
 				     INFINIBAND_ALEN, sizeof(void *));
 }
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,struct net_device *dev);
 void ipoib_neigh_free(struct ipoib_neigh *neigh);
 extern struct workqueue_struct *ipoib_workqueue;
Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-10 17:53:02.717230544 +0200
+++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_main.c	2007-01-10 17:58:55.531209253 +0200
@@ -516,7 +516,7 @@ static void neigh_add_path(struct sk_buf
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
-	neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+	neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
 	if (!neigh) {
 		++priv->stats.tx_dropped;
 		dev_kfree_skb_any(skb);
@@ -799,7 +799,7 @@ static void ipoib_set_mcast_list(struct
 static void ipoib_neigh_destructor(struct neighbour *n)
 {
 	struct ipoib_neigh *neigh;
-	struct ipoib_dev_priv *priv = netdev_priv(n->dev);
+	struct ipoib_dev_priv *priv;
 	unsigned long flags;
 	struct ipoib_ah *ah = NULL;
@@ -808,12 +808,14 @@ static void ipoib_neigh_destructor(struc
 	list_for_each_entry(tn, &ipoib_all_neigh_list, all_neigh_list)
 		if (tn->neighbour == n) {
 			nn = tn;
+			neigh = *to_ipoib_neigh(n);
 			break;
 		}
 	spin_unlock(&ipoib_all_neigh_list_lock);
-	if (!nn)
+	if (!nn || !neigh)
 		return;
+	priv = netdev_priv(neigh->dev);
 	ipoib_dbg(priv, "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
 		  be32_to_cpup((__be32 *) n->ha),
@@ -821,13 +823,9 @@ static void ipoib_neigh_destructor(struc
 	spin_lock_irqsave(&priv->lock, flags);
-	neigh = *to_ipoib_neigh(n);
-	if (neigh) {
-		if (neigh->ah)
-			ah = neigh->ah;
-		list_del(&neigh->list);
-		ipoib_neigh_free(neigh);
-	}
+	ah = neigh->ah;
+	list_del(&neigh->list);
+	ipoib_neigh_free(neigh);
 	spin_unlock_irqrestore(&priv->lock, flags);
@@ -835,7 +833,8 @@ static void ipoib_neigh_destructor(struc
 		ipoib_put_ah(ah);
 }
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+				      struct net_device *dev)
 {
 	struct ipoib_neigh *neigh;
@@ -849,6 +848,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
 	spin_lock(&ipoib_all_neigh_list_lock);
 	list_add_tail(&neigh->all_neigh_list, &ipoib_all_neigh_list);
 	neigh->neighbour->ops->destructor = ipoib_neigh_destructor;
+	neigh->
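The effect of the new field can be modelled in userspace. The sketch below is a simplified, hypothetical stand-in for the kernel structures. Only the fields the patch touches are kept, and is_ipoib is an invented flag. It shows that with bonding, the neighbour's dev is the master, while the ipoib_neigh remembers the slave that skb->dev pointed to at allocation time.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures; only the fields
 * relevant to the patch are modelled. */
struct net_device {
	const char *name;
	int is_ipoib;            /* 1 if netdev_priv() would be an ipoib_dev_priv */
};

struct neighbour {
	struct net_device *dev;  /* device the net stack created it for (may be bond0) */
};

struct ipoib_neigh {
	struct neighbour *neighbour;
	struct net_device *dev;  /* the field added by the patch: the IPoIB slave */
};

/* Mirrors the patched signature: the caller passes skb->dev, which bonding
 * has already switched to the slave before calling the slave's xmit. */
struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *n, struct net_device *dev)
{
	struct ipoib_neigh *neigh = malloc(sizeof(*neigh));

	if (!neigh)
		return NULL;
	neigh->neighbour = n;
	neigh->dev = dev;
	return neigh;
}

/* Device whose private data the destructor should use: neigh->dev after
 * the patch, instead of n->dev before it. */
struct net_device *destructor_priv_dev(const struct ipoib_neigh *neigh)
{
	return neigh->dev;
}
```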
[openib-general] Add bonding support to OFED
Bonding was originally a high-availability solution for Ethernet network interfaces. It is a module that implements a virtual network device (not bound to hardware) that enslaves "real" devices. The bonding device controls its slaves according to the bonding policy and the slaves' health. I am contributing a bonding module that works with IPoIB interfaces. Feel free to install it and send comments. You just have to build the source RPM, rebuild it and install the binary. So far, I have tested the module under RH4-UP3 and SLES10 with OFED-1.1.
HOW TO BUILD THE SOURCE RPM
===
git clone git://staging.openfabrics.org/~monis/ofed-bond-pkg.git mydir
cd mydir/
./build_rpm.sh
./build_rpm.sh OR ./build_rpm.sh --git-url
After installing the binary RPM, read the instructions in /usr/local/ofed/docs/ib-bonding.txt.
Note: Using ib-bonding requires applying a patch to IPoIB and replacing ib_ipoib.ko. You can find the patch in the following message. Also note that the patch should be applied after ipoib_8111_to_2_6_16.patch.
- MoniS
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling
> Unless you can come up with a way that makes sure that all skbs are
> completed even in low-traffic situations, I don't think this is
> mergeable -- it's just too much of a usability nightmare to have a
> flag that is essentially "break some workloads in a mysterious way to
> make some benchmarks run a little faster."
Thanks for the comment. My idea for addressing this issue is to add a periodic task that checks whether there are uncompleted sends beyond some threshold. If there are, the task sets a flag that makes the IPoIB tx logic request a signal on the next post, and it sends a packet that is practically a NO-OP. This packet could be, for example, a unicast ARP (reply) whose source and destination are both this interface's IP.
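The proposed periodic task could look roughly like the sketch below. Everything in it is hypothetical: struct tx_state, tx_post(), tx_complete(), tx_periodic_check() and the force_signal flag illustrate the description and are not code from the patch. The sketch relies on in-order completion, so one signaled completion retires every earlier send.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the TX state; counters are free-running. */
struct tx_state {
	unsigned head;          /* number of WRs posted so far */
	unsigned tail;          /* number of WRs known to be completed */
	unsigned since_signal;  /* posts since the last signaled one */
	unsigned ratio;         /* request a completion every `ratio` posts */
	bool force_signal;      /* set by the periodic check */
};

/* Post one send; returns true if this WR requests a completion. */
bool tx_post(struct tx_state *s)
{
	bool signaled;

	assert(s->ratio > 0);
	s->since_signal++;
	signaled = s->force_signal || s->since_signal >= s->ratio;
	if (signaled) {
		s->since_signal = 0;
		s->force_signal = false;
	}
	s->head++;
	return signaled;
}

/* Completion for WR number `wr`: because the send queue executes in order,
 * everything up to and including `wr` is done. */
void tx_complete(struct tx_state *s, unsigned wr)
{
	s->tail = wr + 1;
}

/* Periodic check: if more than `threshold` sends still wait for a
 * completion, force the next post to be signaled. Returns true when the
 * caller should queue the NO-OP packet (e.g. the self-addressed ARP reply). */
bool tx_periodic_check(struct tx_state *s, unsigned threshold)
{
	if (s->head - s->tail > threshold) {
		s->force_signal = true;
		return true;
	}
	return false;
}
```

In the scenario exercised below, five unsignaled posts stay outstanding until the check forces the next post (the NO-OP packet) to be signaled. Its completion then retires all six.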
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling
> Thinking about this more - why does this patch help some benchmarks?
> The amount of work it takes for the hardware to generate a completion
> is likely negligible, and we still are scanning the same amount
> of TX WRs in a loop to unmap/free them.
This makes sense, but I think you should also consider that tx_lock is taken once per tx completion, so with the patch the driver spends less time under the lock.
> If you think about it this way, it becomes clear that your workload,
> for some reason, hits a path where you get an event very fast
> after the first completion and there is only a small number of completions
> to handle. So your patch helps just by delaying the event handler until
> there's more work to do. And I expect it wouldn't help TCP much if at all
> as there are RX WRs per each couple of TX WRs.
>
This is a good point to check. I hope I can get to it and spend some time on it next week.
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling - performance measurements
Michael S. Tsirkin wrote:
>>>> Tests with iperf and netperf for unicast and multicast destinations show an improvement in the ability of user applications to xmit packets.
>>>> Examples: Number of successful writes as reported by 30 seconds UDP_STREAM of 100 byte packets. Tested with netperf on Dual CPU (64bit Intel Xeon 3GHz) running linux-2.6.20-rc1 (sender) and OFED-1.1 (receiver)
>>>
>>> IMO netperf reporting is actually not too informative without stats settings.
>>> Try running with e.g. -i 10,2 -I 99,5 - you might discover that your numbers are
>>> only accurate within 30%
>>
>> I tried that and I am getting a warning about confidence level not being
>> achieved. I am still trying to learn about that and trying to understand why
>> (any ideas?) but for the meantime can you explain why do I need statistics when
>> I am only trying to count the number of successful writes?
>
> Otherwise your results could be just noise.
I'm sorry, but I don't understand how it can be noise. I am not measuring an average, PPS or BW, but a true counter (the total number of sent packets), so confidence seems irrelevant here. Anyway, the port counters and device counters show the same number as netperf, so I guess this is the real confidence.
>
>>>> Note that the results below show improvement only for TX so we see an end to end packet loss.
>>>
>>> Hmm, as long as packet drops increase, BW improvements in UDP don't sound
>>> too convincing, do they? You can get infinite BW at 100% drop ...
>>>
>>>> Improving the receiver (NAPI) will reduce the packet loss.
>>>
>>> Needs testing with NAPI patch then?
>>
>> I tried NAPI and I get better results for the receiver but my opinion is that
>> the receiver side is less important here since all I'm trying to improve is
>> the ability to send packets. Am I right?
>
> Only if you are sure something else is not dropping the packets (e.g.
> buffer overruns triggered).
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling - performance measurements
Michael S. Tsirkin wrote:
>>>> Tests with iperf and netperf for unicast and multicast destinations show an improvement in the ability of user applications to xmit packets.
>>>> Examples: Number of successful writes as reported by 30 seconds UDP_STREAM of 100 byte packets. Tested with netperf on Dual CPU (64bit Intel Xeon 3GHz) running linux-2.6.20-rc1 (sender) and OFED-1.1 (receiver)
>>>
>>> IMO netperf reporting is actually not too informative without stats settings.
>>> Try running with e.g. -i 10,2 -I 99,5 - you might discover that your numbers are
>>> only accurate within 30%
>>
>> I tried that and I am getting a warning about confidence level not being
>> achieved. I am still trying to learn about that and trying to understand why
>> (any ideas?) but for the meantime can you explain why do I need statistics when
>> I am only trying to count the number of successful writes?
>
> Otherwise your results could be just noise.
>
>>>> Note that the results below show improvement only for TX so we see an end to end packet loss.
>>>
>>> Hmm, as long as packet drops increase, BW improvements in UDP don't sound
>>> too convincing, do they? You can get infinite BW at 100% drop ...
>>>
>>>> Improving the receiver (NAPI) will reduce the packet loss.
>>>
>>> Needs testing with NAPI patch then?
>>
>> I tried NAPI and I get better results for the receiver but my opinion is that
>> the receiver side is less important here since all I'm trying to improve is
>> the ability to send packets. Am I right?
>
> Only if you are sure something else is not dropping the packets (e.g.
> buffer overruns triggered).
>
The number of sent packets reported by netperf is equal to the number reported by the netdev stats (from running ifconfig before and after netperf) and to the number reported by the port (perfquery).
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling - performance measurements
Michael S. Tsirkin wrote:
>> Tests with iperf and netperf for unicast and multicast destinations show
>> an improvement in the ability of user applications to xmit packets.
>>
>> Examples: Number of successful writes as reported by 30 seconds UDP_STREAM of
>> 100 byte packets.
>> Tested with netperf on Dual CPU (64bit Intel Xeon 3GHz) running
>> linux-2.6.20-rc1 (sender) and
>> OFED-1.1 (receiver)
>
> IMO netperf reporting is actually not too informative without stats settings.
> Try running with e.g. -i 10,2 -I 99,5 - you might discover that your numbers are
> only accurate within 30%
I tried that and I am getting a warning about the confidence level not being achieved. I am still trying to learn about that and to understand why (any ideas?), but in the meantime, can you explain why I need statistics when I am only trying to count the number of successful writes?
>
>> Note that the results below show improvement only for TX so we see an end to
>> end packet loss.
>
> Hmm, as long as packet drops increase, BW improvements in UDP don't sound
> too convincing, do they? You can get infinite BW at 100% drop ...
>
>> Improving the receiver (NAPI) will reduce the packet loss.
>
> Needs testing with NAPI patch then?
>
I tried NAPI and I get better results for the receiver, but in my opinion the receiver side is less important here, since all I'm trying to improve is the ability to send packets. Am I right?
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling
Michael S. Tsirkin wrote:
>> I don't think that holding the skb too long is a trigger for something.
>
> Are you sure? We are not talking about too long here - unsignalled TX packet
> will never get a completion. As far as I can see, __kfree_skb will
> 1. call dst_release - so this patch might keep a reference on dst
> indefinitely?
I don't think that holding dst too long is unsafe. Imagine a constant stream of packets to the same destination. In this case there will always be a reference to a dst struct.
> 2. call skb->destructor if not NULL - this is responsible for socket buffer
> accounting
I addressed the issue of socket buffer accounting in the opening message. I see it not as a problem but as a note to the user. Don't you think?
> 3. Releases reference to lots of other objects related to netfiltering
>
> Are you sure keeping all these references indefinitely is safe?
I can't say I'm 100% sure, but please see my comment below.
>
A comment regarding the word "indefinitely": I understand that, theoretically, there is a chance that no packet will be sent through the IB interface, causing unnecessary resource allocation as you described. However, I think the chance of that is very small and that the price is worth the performance gain. This is true, of course, only if the penalty is just resource allocation and not system safety. In this context, I can say that my tests didn't cause any bad system behavior, and my intuition tells me there shouldn't be any. However, I would be glad to learn more from those who know more.
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling
Michael S. Tsirkin wrote:
>> This patch implements selective tx signaling for IPoIB.
>
> Let's assume that the last tx packet you have sent is marked unsignalled.
> Since you never free the skb, won't the TX watchdog get triggered?
>
AFAIK, tx_timeout is called when (jiffies - dev->trans_start) > dev->watchdog_timeo. I don't think that holding the skb too long is a trigger for something. Anyway, I never saw ipoib_timeout being called during my tests.
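The watchdog condition quoted above can be written as a predicate. This is a userspace sketch, not kernel code. The function names are hypothetical, and after() only imitates the wrap-safe comparison of the kernel's time_after(). The queue_stopped argument reflects that the 2.6-era dev_watchdog() also requires the TX queue to be stopped before it calls tx_timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical model of the watchdog test. Note that nothing here looks at
 * how many skbs the driver is still holding. */
bool tx_timeout_due(unsigned long jiffies, unsigned long trans_start,
		    unsigned long watchdog_timeo, bool queue_stopped)
{
	return queue_stopped && after(jiffies, trans_start + watchdog_timeo);
}
```

Under this model, held skbs do not fire the watchdog by themselves. They would matter only if they kept the ring full long enough for the queue to stay stopped.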
Re: [openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling - performance measurements
Tests with iperf and netperf for unicast and multicast destinations show an improvement in the ability of user applications to xmit packets.
Examples: number of successful writes as reported by a 30-second UDP_STREAM test of 100-byte packets. Tested with netperf on a dual-CPU machine (64-bit Intel Xeon 3GHz) running linux-2.6.20-rc1 (sender) and OFED-1.1 (receiver).
Note that the results below show improvement only for TX, so we see end-to-end packet loss. Improving the receiver (NAPI) will reduce the packet loss.
--
Without the patch
PPS=230507
linux:~ # netperf -H 192.168.11.234 -t UDP_STREAM -l 30 -- -m 100
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.11.234 (192.168.11.234) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
262144     100   30.00     6915225      0     184.40
135168           30.00     6366068            169.75
--
tx_signal_rate=1
PPS=244116
linux:~ # netperf -H 192.168.11.234 -t UDP_STREAM -l 30 -- -m 100
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.11.234 (192.168.11.234) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
262144     100   30.00     7323482      0     195.27
135168           30.00     6905764            184.13
--
tx_signal_rate=4
PPS=254748
linux:~ # netperf -H 192.168.11.234 -t UDP_STREAM -l 30 -- -m 100
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.11.234 (192.168.11.234) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
262144     100   30.00     7642461      0     203.77
135168           30.00     6741080            179.74
--
tx_signal_rate=8
PPS=278458
linux:~ # netperf -H 192.168.11.234 -t UDP_STREAM -l 30 -- -m 100
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.11.234 (192.168.11.234) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
262144     100   30.01     8353760      0     222.73
135168           30.01     6884056            183.54
--
tx_signal_rate=16
PPS=316418
linux:~ # netperf -H 192.168.11.234 -t UDP_STREAM -l 30 -- -m 100
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.11.234 (192.168.11.234) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
262144     100   30.00     9492551      0     253.11
135168           30.00     6501771            173.37
--
tx_signal_rate=32
PPS=328316
linux:~ # netperf -H 192.168.11.234 -t UDP_STREAM -l 30 -- -m 100
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.11.234 (192.168.11.234) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
262144     100   30.00     9849480      0     262.62
135168           30.00     6006394            160.15
--
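The PPS figures above appear to be the sender's "Messages Okay" count divided by the elapsed time (6915225 / 30 is about 230507). The gap between the sender row and the receiver row is the end-to-end loss mentioned in the note. The helper names in this small sketch of that arithmetic are hypothetical.

```c
#include <assert.h>

/* Messages per second, as used for the PPS figures above. */
double pps(unsigned long messages, double elapsed_secs)
{
	assert(elapsed_secs > 0);
	return messages / elapsed_secs;
}

/* netperf's UDP_STREAM throughput column, in 10^6 bits/sec. */
double mbits_per_sec(unsigned long messages, unsigned msg_size, double elapsed_secs)
{
	assert(elapsed_secs > 0);
	return messages * (double)msg_size * 8.0 / elapsed_secs / 1e6;
}

/* Fraction of sent messages that the receiver did not count. */
double loss_fraction(unsigned long sent, unsigned long received)
{
	assert(sent > 0 && received <= sent);
	return (double)(sent - received) / sent;
}
```

For tx_signal_rate=32, the receiver row (6006394 of 9849480) corresponds to roughly 39% loss, versus roughly 8% without the patch. This is the trade-off discussed in this thread.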
[openib-general] [PATCH/RFC] IB/ipoib: add selective tx signaling
This patch implements selective tx signaling for IPoIB. It lets the user set the ratio between the number of sent packets and the number of TX completion signals. This optimization has the following advantages:
+ it increases the packets-per-second (PPS) rate
+ it reduces the number of interrupts related to IPoIB tx completions
Since the IB HCA HW executes the work requests posted to a QP in order, we can assume that the completion of a work request means that all the work requests posted before it have also completed, and hence their associated resources (skbs in this context) can be recycled.
The current driver implementation asks for a completion signal for every sent packet (a ratio of 1). This patch enables the user to set a higher ratio. Asking for a completion signal only once every n (>1) packets gives the following savings:
1. fewer interrupts to the host
2. a lower amortized cost of tx completion handling
3. the tx_lock is taken less often
The cost of selective signaling is in the average amount of memory that the IPoIB driver consumes, since skbs are freed in the TX completion handler (which now runs less often). So, while the current driver holds only a few skbs at any given time (and normally not more than one), the new driver holds up to n skbs (n being the ratio between sent packets and tx completions). A reasonable value of n can lead to overconsumption of a few tens of Kbytes, but the real issue is elsewhere. Applications that set the socket buffer to a small size (with setsockopt()) may suffer from ENOBUFS failures when calling sendto() or sendmsg(). A good example of this is ping with a signaling ratio of 16 packets to 1 completion request. In this case, a few successful pings are followed by an endless sequence of errors (until ping restarts). The solution is to choose n with the specific user applications in mind and to use setsockopt() with care (ping, for instance, can be run with -S).
Another issue is related to the ipoib_ib_dev_stop() operation.
This function checks whether the tail and head of the tx_ring are equal. If they are not, it assumes that there are uncompleted work requests. With this patch, it is normal for the tail and head of the tx_ring to differ, since we do not always ask for a completion notification. I don't see a way to tell whether the tail/head gap is normal or due to a failure, so I only reduce the message severity from warn to dbg if the condition for an expected gap is true. However, I still see a tiny chance that a completion notification arrives after the timeout in ipoib_ib_dev_stop() has expired and it has started freeing the skbs in the tx_ring. Possible solutions are:
1. protect the code with a lock (but I started by trying to avoid locks)
2. reduce the hazard by increasing the timeout and calling test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags) in the TX completion handler to check whether ipoib_ib_dev_stop() has started.
I would be happy to get comments on this last issue, and on the rest of the patch of course.
thanks
MoniS
 ipoib.h       |    2 ++
 ipoib_ib.c    |   39 ---
 ipoib_main.c  |   10 +-
 ipoib_verbs.c |    4 ++--
 4 files changed, 41 insertions(+), 14 deletions(-)
Signed-off-by: Moni Shoua <[EMAIL PROTECTED]>
---
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-07 15:39:49.421190295 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h	2007-01-07 15:42:33.768824668 +0200
@@ -164,6 +164,7 @@ struct ipoib_dev_priv {
 	struct ipoib_tx_buf *tx_ring;
 	unsigned tx_head;
 	unsigned tx_tail;
+	unsigned tx_completion_mark;
 	struct ib_sge tx_sge;
 	struct ib_send_wr tx_wr;
@@ -335,6 +336,7 @@ static inline void ipoib_unregister_debu
 extern int ipoib_sendq_size;
 extern int ipoib_recvq_size;
+extern int num_unsignal_tx;
 extern struct ib_sa_client ipoib_sa_client;
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2007-01-07 15:39:49.443186365 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2007-01-07 19:29:21.885896644 +0200
@@ -256,29 +256,32 @@ static void ipoib_ib_handle_tx_wc(struct
 		return;
 	}
-	tx_req = &priv->tx_ring[wr_id];
+	do {
+		tx_req = &priv->tx_ring[wr_id];
-	ib_dma_unmap_single(priv->ca, tx_req->mapping,
-			    tx_req->skb->len, DMA_TO_DEVICE);
+		ib_dma_unmap_single(priv->ca, tx_req->mapping,
+				    tx_req->skb->len, DMA_TO_DEVICE);
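A small model can check two things described in this patch: the batching argument (in-order execution, so one completion retires every earlier WR) and the expected head/tail gap in ipoib_ib_dev_stop(). It is a sketch under stated assumptions. RING_SIZE, struct tx_ring and the ring_* functions are hypothetical. The counters are free-running rather than the driver's slot indices, and `completions` simply counts handler passes, each of which would take tx_lock once in the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 64u  /* hypothetical ring size; a power of two */

struct tx_ring {
	bool in_use[RING_SIZE];  /* stands in for the skb held in each slot */
	unsigned head;           /* free-running count of posted WRs */
	unsigned tail;           /* free-running count of retired WRs */
	unsigned ratio;          /* request a completion every `ratio` posts */
	unsigned completions;    /* completion-handler passes */
};

/* Post one send; returns true if this WR requests a completion. */
bool ring_post(struct tx_ring *r)
{
	assert(r->ratio > 0 && r->head - r->tail < RING_SIZE);
	r->in_use[r->head & (RING_SIZE - 1)] = true;
	r->head++;
	return r->head % r->ratio == 0;
}

/* Completion for WR `wr_id`: since the HCA executes the send queue in
 * order, every slot from tail up to and including wr_id is retired in
 * one pass. */
void ring_complete(struct tx_ring *r, unsigned wr_id)
{
	r->completions++;
	while (r->tail != wr_id + 1) {
		r->in_use[r->tail & (RING_SIZE - 1)] = false;
		r->tail++;
	}
}

/* With signaling every `ratio` posts, up to ratio - 1 unsignaled WRs may
 * legitimately stay outstanding, so such a gap is not an error. */
bool ring_gap_is_expected(const struct tx_ring *r)
{
	return r->head - r->tail < r->ratio;
}
```

With a ratio of 16, 40 posts produce 2 completion events instead of 40, and 8 WRs remain outstanding. This is the kind of gap ipoib_ib_dev_stop() would then have to treat as normal.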
[openib-general] [PATCH v2] IB_mthca HCA profile module parameters
From: Leonid Arsh <[EMAIL PROTECTED]>
Adds module parameters that enable setting some of the HCA profile values.
Signed-off-by: Leonid Arsh <[EMAIL PROTECTED]>
Signed-off-by: Moni Shoua <[EMAIL PROTECTED]>
---
 mthca_main.c |  139 ++-
 1 files changed, 128 insertions(+), 11 deletions(-)
--- mthca_main.c.orig	2006-11-14 22:07:58.0 -0500
+++ mthca_main.c	2006-11-16 11:27:17.683513163 -0500
@@ -80,21 +80,134 @@ module_param(tune_pci, int, 0444);
 MODULE_PARM_DESC(tune_pci, "increase PCI burst from the default set by BIOS if nonzero");
+#define MTHCA_DEFAULT_NUM_QP            (1 << 16)
+#define MTHCA_DEFAULT_RDB_PER_QP        (1 << 2)
+#define MTHCA_DEFAULT_NUM_CQ            (1 << 16)
+#define MTHCA_DEFAULT_NUM_MCG           (1 << 13)
+#define MTHCA_DEFAULT_NUM_MPT           (1 << 17)
+#define MTHCA_DEFAULT_NUM_MTT           (1 << 20)
+#define MTHCA_DEFAULT_NUM_UDAV          (1 << 15)
+#define MTHCA_DEFAULT_NUM_RESERVED_MTTS (1 << 18)
+#define MTHCA_DEFAULT_NUM_UARC_SIZE     (1 << 18)
+
+static struct mthca_profile default_profile = {
+	.num_qp            = MTHCA_DEFAULT_NUM_QP,
+	.rdb_per_qp        = MTHCA_DEFAULT_RDB_PER_QP,
+	.num_cq            = MTHCA_DEFAULT_NUM_CQ,
+	.num_mcg           = MTHCA_DEFAULT_NUM_MCG,
+	.num_mpt           = MTHCA_DEFAULT_NUM_MPT,
+	.num_mtt           = MTHCA_DEFAULT_NUM_MTT,
+	.num_udav          = MTHCA_DEFAULT_NUM_UDAV,          /* Tavor only */
+	.fmr_reserved_mtts = MTHCA_DEFAULT_NUM_RESERVED_MTTS, /* Tavor only */
+	.uarc_size         = MTHCA_DEFAULT_NUM_UARC_SIZE,     /* Arbel only */
+};
+
+module_param_named(num_qp, default_profile.num_qp, int, 0444);
+MODULE_PARM_DESC(num_qp, "maximum number of available QPs per HCA");
+
+module_param_named(rdb_per_qp, default_profile.rdb_per_qp, int, 0444);
+MODULE_PARM_DESC(rdb_per_qp, "number of RDB buffers per QP");
+
+module_param_named(num_cq, default_profile.num_cq, int, 0444);
+MODULE_PARM_DESC(num_cq, "maximum number of CQs per HCA");
+
+module_param_named(num_mcg, default_profile.num_mcg, int, 0444);
+MODULE_PARM_DESC(num_mcg, "maximum number of multicast groups per HCA");
+
+module_param_named(num_mpt, default_profile.num_mpt, int, 0444);
+MODULE_PARM_DESC(num_mpt,
+		 "maximum number of memory protection pable entries per HCA");
+
+module_param_named(num_mtt, default_profile.num_mtt, int, 0444);
+MODULE_PARM_DESC(num_mtt,
+		 "maximum number of memory translation table segments per HCA");
+/* Tavor only */
+module_param_named(num_udav, default_profile.num_udav, int, 0444);
+MODULE_PARM_DESC(num_udav, "maximum number of UD address vectors per HCA");
+
+/* Tavor only */
+module_param_named(fmr_reserved_mtts, default_profile.fmr_reserved_mtts, int, 0444);
+MODULE_PARM_DESC(fmr_reserved_mtts,
+		 "number of memory translation table segments reserved for FMR");
+
 static const char mthca_version[] __devinitdata =
 	DRV_NAME ": Mellanox InfiniBand HCA driver v"
 	DRV_VERSION " (" DRV_RELDATE ")\n";
-static struct mthca_profile default_profile = {
-	.num_qp            = 1 << 16,
-	.rdb_per_qp        = 4,
-	.num_cq            = 1 << 16,
-	.num_mcg           = 1 << 13,
-	.num_mpt           = 1 << 17,
-	.num_mtt           = 1 << 20,
-	.num_udav          = 1 << 15,	/* Tavor only */
-	.fmr_reserved_mtts = 1 << 18,	/* Tavor only */
-	.uarc_size         = 1 << 18,	/* Arbel only */
-};
+#define is_power_of_2(x) (!(x & (x - 1)))
+
+static int __devinit mthca_check_profile_value(int* pval,int pval_default){
+	/* value must be positive and power of 2 */
+	int old_pval = *pval;
+	if (old_pval <= 0) {
+		*pval = pval_default;
+	} else if (!is_power_of_2(old_pval)) {
+		*pval = roundup_pow_of_two(old_pval);
+	}
+	return old_pval-*pval;
+}
+
+static int __devinit mthca_validate_profile(struct mthca_dev *mdev,
+					    struct mthca_profile *profile)
+{
+	if (mthca_check_profile_value(&default_profile.num_qp,
+				      MTHCA_DEFAULT_NUM_QP)){
+		mthca_warn(mdev,"invalid num_qp passed. changed to %d.\n",
+			   default_profile.num_qp);
+	}
+
+	if (mthca_check_profile_value(&default_profile.rdb_per_qp,
+				      MTHCA_DEFAULT_RDB_PER_QP)){
+		mthca_warn(mdev,"invalid rdb_per_qp passed. changed to %d\n",
+			   default_profile.rdb_per_qp);
+	}
+
+	if (mthca_check_profile_value(&d
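For reference, the contract of the v2 helper can be exercised outside the kernel. In this sketch, is_pow2() and roundup_pow2() are hypothetical userspace stand-ins (the latter for the kernel's roundup_pow_of_two()). The upper-bound assert is an added assumption that keeps the rounding inside int.

```c
#include <assert.h>
#include <stdbool.h>

/* True exactly for 1, 2, 4, ...; zero and negative values are rejected. */
static bool is_pow2(int x)
{
	return x > 0 && (x & (x - 1)) == 0;
}

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static int roundup_pow2(int x)
{
	int p = 1;

	assert(x > 0 && x <= (1 << 30));  /* keep the result within int */
	while (p < x)
		p <<= 1;
	return p;
}

/* Same contract as mthca_check_profile_value() in the patch: fix *pval in
 * place and return old - new, so a non-zero result means "value changed". */
int check_profile_value(int *pval, int pval_default)
{
	int old = *pval;

	if (old <= 0)
		*pval = pval_default;
	else if (!is_pow2(old))
		*pval = roundup_pow2(old);
	return old - *pval;
}
```

A non-zero return means the value was changed, which is when mthca_validate_profile() prints its warning.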
Re: [openib-general] [PATCH] IB/mthca: HCA profile module parameters
Roland Dreier wrote:
> The patch is line-wrapped and bizarrely corrupted and won't apply, eg:
>
> > +	mthca_warn(mdev, "num_qp rounded to power of 2 (%d).\n",
> > +		   default_profile.num_qp); +}
>
> This is completely unnecessary:
>
> > +#define to_up_power_of_2(x) (x = roundup_pow_of_two(x))
>
> ...just open code this.
>
> And this seems strange:
>
> > +#define is_power_of_2(x) (x>0 &&(x & (x - 1)))
>
> so there's no warning if someone passes in a negative value?? and
> it's backwards too, (x & (x - 1)) is 0 precisely for the powers of 2.
> Was this patch tested at all?
>
> Anyway, all this
>
> > +	if (!is_power_of_2(default_profile.num_qp)){
> > +		to_up_power_of_2(default_profile.num_qp);
> > +		mthca_warn(mdev, "num_qp rounded to power of 2 (%d).\n",
> > +			   default_profile.num_qp); +}
>
> seems very repetitive. Can't it be wrapped up in a function so we just
> do something like
>
> 	mthca_check_profile_value(&default_profile.num_qp);
> 	mthca_check_profile_value(&default_profile.rdb_per_qp);
> 	mthca_check_profile_value(&default_profile.num_cq);
>
> etc.
>
> - R.
Thanks for the comments.
The lines became wrapped because I used the "wrong" email client. I'll re-submit with another client, but it will be in a new thread, because I still have problems reading mail with that client and therefore can't reply to this thread. Sorry for the bother...
The patch was tested, but unfortunately I sent the wrong one (not the final version). The new version is the one I should have sent, plus changes based on the comments here.
thanks
MoniS
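Roland's point about the v1 macro is easy to check. The first macro below is copied from the v1 patch. The second is a hypothetical corrected form, with the argument parenthesised, that is true exactly for positive powers of 2.

```c
#include <assert.h>

/* The v1 macro, verbatim: true for positive values that are NOT powers of 2. */
#define is_power_of_2_v1(x) (x>0 &&(x & (x - 1)))

/* What the name promises, with the argument parenthesised so it is also
 * safe when passed an expression. */
#define is_power_of_2_fixed(x) ((x) > 0 && ((x) & ((x) - 1)) == 0)
```

The v1 code uses the macro as `if (!is_power_of_2(...))`. As a result, it would round and warn for values that are already powers of 2 and silently accept values that are not.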
[openib-general] [PATCH] IB/mthca: HCA profile module parameters
From: Leonid Arsh <[EMAIL PROTECTED]>

Adds module parameters that enable setting some of the HCA profile values.

Signed-off-by: Leonid Arsh <[EMAIL PROTECTED]>
Signed-off-by: Moni Shoua <[EMAIL PROTECTED]>
---
 mthca_main.c |  104 +--
 1 files changed, 101 insertions(+), 3 deletions(-)

--- mthca_main.c.orig	2006-11-14 22:07:58.0 -0500
+++ mthca_main.c	2006-11-15 09:42:30.151093815 -0500
@@ -80,9 +80,6 @@ module_param(tune_pci, int, 0444);
 MODULE_PARM_DESC(tune_pci, "increase PCI burst from the default set by BIOS if nonzero");
 
-static const char mthca_version[] __devinitdata =
-	DRV_NAME ": Mellanox InfiniBand HCA driver v"
-	DRV_VERSION " (" DRV_RELDATE ")\n";
 
 static struct mthca_profile default_profile = {
 	.num_qp            = 1 << 16,
@@ -96,6 +93,103 @@
 	.uarc_size         = 1 << 18,	/* Arbel only */
 };
 
+module_param_named(num_qp, default_profile.num_qp, int, 0444);
+MODULE_PARM_DESC(num_qp, "maximum number of available QPs per HCA");
+
+module_param_named(rdb_per_qp, default_profile.rdb_per_qp, int, 0444);
+MODULE_PARM_DESC(rdb_per_qp, "number of RDB buffers per QP");
+
+module_param_named(num_cq, default_profile.num_cq, int, 0444);
+MODULE_PARM_DESC(num_cq, "maximum number of CQs per HCA");
+
+module_param_named(num_mcg, default_profile.num_mcg, int, 0444);
+MODULE_PARM_DESC(num_mcg, "maximum number of multicast groups per HCA");
+
+module_param_named(num_mpt, default_profile.num_mpt, int, 0444);
+MODULE_PARM_DESC(num_mpt,
+		 "maximum number of memory protection table entries per HCA");
+
+module_param_named(num_mtt, default_profile.num_mtt, int, 0444);
+MODULE_PARM_DESC(num_mtt,
+		 "maximum number of memory translation table segments per HCA");
+/* Tavor only */
+module_param_named(num_udav, default_profile.num_udav, int, 0444);
+MODULE_PARM_DESC(num_udav, "maximum number of UD address vectors per HCA");
+
+/* Tavor only */
+module_param_named(fmr_reserved_mtts, default_profile.fmr_reserved_mtts, int, 0444);
+MODULE_PARM_DESC(fmr_reserved_mtts,
+		 "number of memory translation table segments reserved for FMR");
+
+static const char mthca_version[] __devinitdata =
+	DRV_NAME ": Mellanox InfiniBand HCA driver v"
+	DRV_VERSION " (" DRV_RELDATE ")\n";
+
+/* true only for powers of 2: x & (x - 1) clears the lowest set bit */
+#define is_power_of_2(x) ((x) > 0 && !((x) & ((x) - 1)))
+#define to_up_power_of_2(x) ((x) = roundup_pow_of_two(x))
+static int __devinit mthca_validate_profile(struct mthca_dev *mdev,
+					     struct mthca_profile *profile)
+{
+	if (!is_power_of_2(default_profile.num_qp)) {
+		to_up_power_of_2(default_profile.num_qp);
+		mthca_warn(mdev, "num_qp rounded to power of 2 (%d).\n",
+			   default_profile.num_qp);
+	}
+
+	if (!is_power_of_2(default_profile.rdb_per_qp)) {
+		to_up_power_of_2(default_profile.rdb_per_qp);
+		mthca_warn(mdev, "rdb_per_qp rounded to power of 2 (%d)\n",
+			   default_profile.rdb_per_qp);
+	}
+
+	if (!is_power_of_2(default_profile.num_cq)) {
+		to_up_power_of_2(default_profile.num_cq);
+		mthca_warn(mdev, "num_cq rounded to power of 2 (%d)\n",
+			   default_profile.num_cq);
+	}
+
+	if (!is_power_of_2(default_profile.num_mcg)) {
+		to_up_power_of_2(default_profile.num_mcg);
+		mthca_warn(mdev, "num_mcg rounded to power of 2 (%d)\n",
+			   default_profile.num_mcg);
+	}
+	if (!is_power_of_2(default_profile.num_mpt)) {
+		to_up_power_of_2(default_profile.num_mpt);
+		mthca_warn(mdev, "num_mpt rounded to power of 2 (%d)\n",
+			   default_profile.num_mpt);
+	}
+
+	if (!is_power_of_2(default_profile.num_mtt)) {
+		to_up_power_of_2(default_profile.num_mtt);
+		mthca_warn(mdev, "num_mtt rounded to power of 2 (%d)\n",
+			   default_profile.num_mtt);
+	}
+
+	/* num_udav and fmr_reserved_mtts are Tavor (non-memfree) only */
+	if (!mthca_is_memfree(mdev)) {
+		if (!is_power_of_2(default_profile.num_udav)) {
+			to_up_power_of_2(default_profile.num_udav);
+			mthca_warn(mdev, "num_udav rounded to power of 2 (%d)\n",
+				   default_profile.num_udav);
+		}
+
+		if (!is_power_of_2(default_profile.fmr_reserved_mtts)) {
+			to_up_power_of_2(default_profile.fmr_reserved_mtts);
+			mthca_warn(mdev, "fmr_reserved_mtts rounded to power of 2 (%d)\n",
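A power-of-two test is easy to invert by accident. x & (x - 1) is zero exactly when x is a power of two or zero, so the test has to negate that result and exclude zero. The standalone sketch below shows the check and the round-up step that mthca_validate_profile() applies to each parameter. The names is_pow2(), round_up_pow2() and validate_pow2() are hypothetical. round_up_pow2() is a userspace stand-in for the kernel's roundup_pow_of_two(). Falling back to a default for non-positive values is an assumption of this sketch; the posted patch does not do it.

```c
#include <stdbool.h>

/* True for 1, 2, 4, 8, ...: a power of two has exactly one bit set, and
 * x & (x - 1) clears the lowest set bit, so the result is zero only then.
 * Zero is excluded explicitly because 0 & (0 - 1) is also zero. */
static bool is_pow2(unsigned int x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Smallest power of two >= x; a userspace stand-in for the kernel's
 * roundup_pow_of_two(). Assumes 1 <= x <= (1u << 30) so nothing overflows. */
static unsigned int round_up_pow2(unsigned int x)
{
	unsigned int r = 1;

	while (r < x)
		r <<= 1;
	return r;
}

/* Hypothetical per-parameter check: returns true if *val had to be changed
 * (so the caller can print a warning), false if it was already valid.
 * Non-positive values are replaced by dflt (an assumption of this sketch). */
static bool validate_pow2(int *val, int dflt)
{
	if (*val <= 0) {
		*val = dflt;
		return true;
	}
	if (is_pow2((unsigned int)*val))
		return false;
	*val = (int)round_up_pow2((unsigned int)*val);
	return true;
}
```

Putting the condition in one helper means each module parameter gets the same test. This avoids copying the expression once per field.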
[openib-general] Add module params to mthca to control the HCA profile
Hi,

A few months ago, Leonid Arsh submitted a patch to mthca that makes it possible to control some of the HCA profile values. The patch was discussed here (see references below), but it wasn't accepted and somehow got lost, so I'd like to re-submit it.

http://openib.org/pipermail/openib-general/2006-May/021821.html
http://openib.org/pipermail/openib-general/2006-May/022424.html

MoniS
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Add module params to mthca to control the HCA profile
Moni Shoua wrote:
>Hi,
>A few months ago, Leonid Arsh submitted a patch to mthca that enables to
>control some of the HCA profile values.
>This patch was discussed here (see references below) but wasn't accepted
>and somehow got lost and I'd like to re-submit it.
>
>http://openib.org/pipermail/openib-general/2006-May/021821.html
>http://openib.org/pipermail/openib-general/2006-May/022424.html

Sorry, submitted to the wrong place.
Re: [openib-general] ipoib mtu problem with UDP
Michael S. Tsirkin wrote:
>I tried using ifconfig to limit the ipoib mtu.
>Once I do this on *either* both server and client, or only on the client side,
>UDP seems to stop working:
>
>#ifconfig ib0 mtu 512
>#netperf -c -C -H 11.4.3.68 -f M -t UDP_STREAM
>UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68
>(11.4.3.68) port 0 AF_INET : demo
>Socket  Message  Elapsed      Messages                   CPU      Service
>Size    Size     Time         Okay Errors   Throughput   Util     Demand
>bytes   bytes    secs            #      #   MBytes/sec % SS     us/KB
>
>118784   65507   10.00       27582      0      172.2     26.33    inf
>118784           10.00           0             0.0     23.40    inf
>
>Things work fine if the mtu on the client side is 2044:
># ifconfig ib0 mtu 2044
># netperf -c -C -H 11.4.3.68 -f M -t UDP_STREAM
>UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>11.4.3.68 (11.4.3.68) port 0 AF_INET : demo
>Socket  Message  Elapsed      Messages                   CPU      Service
>Size    Size     Time         Okay Errors   Throughput   Util     Demand
>bytes   bytes    secs            #      #   MBytes/sec % SS     us/KB
>
>118784   65507   10.00       78488      0      490.1     25.31    2.310
>118784           10.00       68534             428.0     24.55    2.241
>
>Tested with kernel 2.6.19-rc4 and netperf 2.4.2.

I get the same results with iperf.
However, they succeed with smaller datagrams (netperf uses 65507 by default):

dodly5:/home/shared/testing-tools/x86_64/netperf/netperf-2.4.1 # ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.11.235  Bcast:192.168.11.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:512  Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14077513 errors:0 dropped:5 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:5776 (5.6 Kb)  TX bytes:6717604780 (6406.4 Mb)

dodly5:/home/shared/testing-tools/x86_64/netperf/netperf-2.4.1 # ./netperf -H 192.168.11.233 -t UDP_STREAM -- -m 3
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.11.233 (192.168.11.233) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

262144       3   10.00       52533      0    1260.59
262144           10.00       22956           550.86

dodly5:/home/shared/testing-tools/x86_64/iperf-2.0.2 # ./iperf -uc 192.168.11.233 -l 65000
Client connecting to 192.168.11.233, UDP port 5001
Sending 65000 byte datagrams
UDP buffer size: 256 KByte (default)
[  3] local 192.168.11.235 port 32769 connected with 192.168.11.233 port 5001
[  3]  0.0-10.9 sec  1.36 MBytes  1.05 Mbits/sec
[  3] Sent 22 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.

dodly5:/home/shared/testing-tools/x86_64/iperf-2.0.2 # ./iperf -uc 192.168.11.233
Client connecting to 192.168.11.233, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 256 KByte (default)
[  3] local 192.168.11.235 port 32769 connected with 192.168.11.233 port 5001
[  3]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec
[  3] Sent 893 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec  0.002 ms    0/  893 (0%)
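The sketch below puts rough numbers on why datagram size matters here. It counts the IPv4 fragments that one UDP datagram needs at a given MTU, treating the configured interface MTU as the IP MTU. It assumes a 20-byte IPv4 header without options and an 8-byte UDP header. By this count, netperf's default 65507-byte message becomes 135 fragments at MTU 512 but 33 at MTU 2044. iperf's default 1470-byte datagram needs 4 and 1 fragments respectively. This only quantifies the fragmentation; it does not by itself explain the failure. udp_fragments() is a hypothetical helper.

```c
/* Approximate number of IPv4 fragments for one UDP datagram of `payload`
 * bytes when the interface MTU is `mtu` (treated as the IP MTU).
 * Assumptions: 20-byte IPv4 header without options, 8-byte UDP header,
 * and every fragment but the last carrying a multiple of 8 bytes, since
 * IPv4 fragment offsets are expressed in 8-byte units. Requires mtu > 27. */
static unsigned int udp_fragments(unsigned int payload, unsigned int mtu)
{
	unsigned int ip_payload = payload + 8;          /* UDP header + data */
	unsigned int per_frag = (mtu - 20) / 8 * 8;     /* usable, 8-byte aligned */

	return (ip_payload + per_frag - 1) / per_frag;  /* round up */
}
```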
Re: [openib-general] OFED 1.1 Build Issue
Vladimir Sokolovsky wrote:
>
> Ramachandra K wrote:
>
>> Moni Shoua wrote:
>>
>>> We already tried to go this way and found that a local
>>> Module.symvers is not always generated (but we might have missed
>>> something though).
>>> I suggest that you check that this alternative way works under all
>>> OSs compilation (SuSE and RedHat to be precise)...
>>>
>> I think Module.symvers generation for external modules was added
>> sometime around 2.6.16, so its not generated on the older kernels
>> (for eg 2.6.9 kernels on RHEL)
>>
>> In this scenario, when there is no Module.symvers file, I guess the
>> other option is to use a single Kbuild file to build both modules,
>> as explained in section 7.3 of Documentation/kbuild/modules.txt.
>>
>> But this may not be feasible always. Come to think of it, why does the
>> OFED installation procedure not update the kernel Module.symvers file
>> when it replaces the old kernel modules present in /lib/modules/
>> with the new ones ?
>>
>>> BTW, Why not updating the kernel Module.symvers when kernel-ib-devel
>>> is installed? This will free the developer from copying it to
>>> his/hers private directory.
>>>
>> It might be a good idea to update the Module.symvers file as part of the
>> normal installation and not only kernel-ib-devel. Because if the kernel
>> modules are being replaced (or new modules are being added), shouldn't
>> the Module.symvers file also be updated ?
>> Regards,
>> Ram
>
> Agree,
> Module.symvers should be updated by kernel-ib RPM.
> So, need to implement Moni's suggestion with light changes: update
> kernel-ib RPM %post and %preun sections instead of kernel-ib-devel RPM
> %pre and %postun.
>
> Regards,
> Vladimir
>

I agree, although there is no use in updating Module.symvers when the devel RPM is not installed.
Here is part of the shell script that updates Module.symvers. You can use it if you don't find a way to generate Module.symvers on 2.4 kernels:

# collect the __crc_ symbols of every IB module
for mod in $(find . -name '*.ko') ; do
    nm -o "$mod" | grep __crc >> /tmp/syms
    n_mods=$((n_mods+1))
done
n_syms=$(wc -l /tmp/syms | cut -f1 -d" ")
echo found $n_syms InfiniBand symbols in $n_mods InfiniBand modules
n=1
MOD_SYMVERS_IB=./Module.symvers.ib
MOD_SYMVERS_PATCH=./Module.symvers.patch
# locate the kernel's Module.symvers
if [ -f /lib/modules/$K_VER/source/Module.symvers ] ; then
    MOD_SYMVERS_KERNEL=/lib/modules/$K_VER/source/Module.symvers
elif [ -f /lib/modules/$K_VER/build/Module.symvers ] ; then
    MOD_SYMVERS_KERNEL=/lib/modules/$K_VER/build/Module.symvers
else
    echo file Module.symvers not found
fi
if [ -n "$MOD_SYMVERS_KERNEL" ] ; then
    rm -f $MOD_SYMVERS_IB
    # turn each "path:addr A __crc_sym" line into "0xcrc<TAB>sym<TAB>module"
    while [ $n -le $n_syms ] ; do
        line=$(head -$n /tmp/syms | tail -1)
        line1=$(echo $line | cut -f1 -d:)
        line2=$(echo $line | cut -f2 -d:)
        file=$(echo $line1 | cut -f6- -d/)
        file=$(echo $file | cut -f1 -d.)
        crc=$(echo $line2 | cut -f1 -d" ")
        crc=${crc:8}    # keep the low 8 hex digits of the 64-bit value
        sym=$(echo $line2 | cut -f3 -d" ")
        sym=${sym:6}    # strip the "__crc_" prefix
        echo -e "0x$crc\t$sym\t$file" >> $MOD_SYMVERS_IB
        if [ -z "$allsyms" ] ; then
            allsyms=$sym
        else
            allsyms="$allsyms|$sym"
        fi
        n=$((n+1))
    done
    # keep the kernel entries that the IB symbols do not override
    egrep -v "$allsyms" $MOD_SYMVERS_KERNEL >> $MOD_SYMVERS_IB
    # patch the kernel copy in place and keep the patch for a later restore
    diff -u $MOD_SYMVERS_KERNEL $MOD_SYMVERS_IB > $MOD_SYMVERS_PATCH
    patch -d $(dirname $MOD_SYMVERS_KERNEL) < $MOD_SYMVERS_PATCH
    mkdir -p /usr/voltaire/backup
    cp $MOD_SYMVERS_PATCH /usr/voltaire/backup
fi
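The script parses each line by cutting on ':' and ' ' and then applying ${crc:8} and ${sym:6}. The sketch below does the same parsing in C. It assumes 64-bit nm output, where the address column has 16 hex digits and the CRC sits in the low 8. It takes the module name from the file's basename, which simplifies the script's cut -f6- -d/ path handling. nm_line_to_symvers() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn one line of `nm -o module.ko | grep __crc`
 * output, e.g. "./core/ib_core.ko:000000001a2b3c4d A __crc_ib_register_device",
 * into a Module.symvers entry "0x<crc>\t<symbol>\t<module>".
 * Returns 0 on success, -1 if the line does not have the expected shape
 * or the output buffer is too small. */
static int nm_line_to_symvers(const char *line, char *out, size_t outlen)
{
	char path[256], addr[32], type[4], sym[128];
	const char *colon = strchr(line, ':');
	const char *base, *dot;
	size_t pathlen, baselen;
	int n;

	if (!colon || (size_t)(colon - line) >= sizeof(path))
		return -1;
	pathlen = (size_t)(colon - line);
	memcpy(path, line, pathlen);
	path[pathlen] = '\0';

	if (sscanf(colon + 1, "%31s %3s %127s", addr, type, sym) != 3)
		return -1;
	if (strlen(addr) != 16 || strncmp(sym, "__crc_", 6) != 0)
		return -1;

	/* module name = basename without the ".ko" extension */
	base = strrchr(path, '/');
	base = base ? base + 1 : path;
	dot = strrchr(base, '.');
	baselen = dot ? (size_t)(dot - base) : strlen(base);

	/* addr + 8 skips the high half; sym + 6 skips the "__crc_" prefix */
	n = snprintf(out, outlen, "0x%s\t%s\t%.*s", addr + 8, sym + 6,
		     (int)baselen, base);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```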
Re: [openib-general] OFED 1.1 Build Issue
Vladimir Sokolovsky wrote:
> The alternative way to resolve this issue is the following:
> Save Modules.symvers file generated by OFED kernel modules compilation
> (drivers/infiniband/Modules.symvers).
> It can be added to the kernel-ib-devel RPM in the next OFED release.
> Then in order to compile external module copy this Modules.symvers to
> the directory where external module is build.
>
> Regards,
> Vladimir
>
>
> Moni Shoua wrote:
>
>> We managed to avoid rebuilding the kernel to solve this issue.
>>
>> Before building any IB dependant modules (out of OFED) it is required to
>> update the Module.symvers.
>> The new values for the symbol CRCs can be taken from the modules
>> themselves ( nm IB_MODULE |grep __crc_)
>> When Module.symvers is up-to-date, there should be no problem building
>> and installing the IB dependant modules.
>>
>> The solution step-by-step
>> 1. The procedure should run after installing the kerne-ib-devel RPM. It
>> is possible to run it in %pre section of the spec file.
>> 2. Foreach IB module (ko) which is listed in $(rpm -ql kernel-ib) -
>> 2.1 take out the __crc_ sybbols
>> 2.2 extract the symbol name and it's CRC value (simple parsing)
>> 2.3 add it (or replace the existing) to Module.symvers (usually
>> under /lib/modules/$(uname -r)/build/ or /lib/modules/$(uname -r)/source/ )
>> 3. Save the diff of the current Module.symvers from the original (for
>> future restore)
>> 4. When kernel-ib-devel RPM is uninstalled use the patch from (3) to
>> restore Module.symvers. This can be done in the %postun of the spec
>> file)
>>
>> I'd be glad to get comments about this.
>>
>> -Original Message-
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Tucker
>> Sent: Friday, October 27, 2006 5:30 PM
>> To: openib-general
>> Subject: [openib-general] OFED 1.1 Build Issue
>>
>> I've been testing some code against the OFED 1.1 release and noticed
>> that if you build anything that depends on IB (RNFS in this case) into
>> the kernel, that the OFED kit doesn't work correctly. This is because
>> the dependent modules (ib_core, etc...) get sucked into the kernel
>> automagically and will cause the subsequent modprobe of the OFED module
>> to fail.
>>
>> I don't think you can fix this without rebuilding the kernel so it
>> should probably be listed in the OFED_release_notes as a known issue.
>> Providing a mechanism to rebuild the kernel as part of the OFED install
>> would be great too, sorry if it's already there and I missed it.
>>
>> Tom
>

We already tried to go this way and found that a local Module.symvers is not always generated (but we might have missed something, though). I suggest that you check that this alternative way works when compiling on all OSs (SuSE and RedHat, to be precise)...

BTW, why not update the kernel Module.symvers when kernel-ib-devel is installed? This would free the developer from copying it to his/her private directory.
Re: [openib-general] OFED 1.1 Build Issue
We managed to avoid rebuilding the kernel to solve this issue.

Before building any IB-dependent modules (outside of OFED), Module.symvers must be updated. The new values for the symbol CRCs can be taken from the modules themselves (nm IB_MODULE | grep __crc_). When Module.symvers is up to date, there should be no problem building and installing the IB-dependent modules.

The solution, step by step:
1. The procedure should run after installing the kernel-ib-devel RPM. It is possible to run it in the %pre section of the spec file.
2. For each IB module (.ko) listed in $(rpm -ql kernel-ib):
   2.1 take out the __crc_ symbols;
   2.2 extract the symbol name and its CRC value (simple parsing);
   2.3 add it to Module.symvers, or replace the existing entry (usually under /lib/modules/$(uname -r)/build/ or /lib/modules/$(uname -r)/source/).
3. Save the diff of the current Module.symvers against the original (for a future restore).
4. When the kernel-ib-devel RPM is uninstalled, use the patch from (3) to restore Module.symvers. This can be done in the %postun section of the spec file.

I'd be glad to get comments about this.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Tucker
Sent: Friday, October 27, 2006 5:30 PM
To: openib-general
Subject: [openib-general] OFED 1.1 Build Issue

I've been testing some code against the OFED 1.1 release and noticed that if you build anything that depends on IB (RNFS in this case) into the kernel, that the OFED kit doesn't work correctly. This is because the dependent modules (ib_core, etc...) get sucked into the kernel automagically and will cause the subsequent modprobe of the OFED module to fail.

I don't think you can fix this without rebuilding the kernel so it should probably be listed in the OFED_release_notes as a known issue. Providing a mechanism to rebuild the kernel as part of the OFED install would be great too, sorry if it's already there and I missed it.

Tom
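Step 2.3 is an insert-or-replace keyed on the symbol name. Below is a minimal in-memory sketch of that step. The struct symvers_entry type, the symvers_upsert() function and the field sizes are hypothetical and only for illustration. Reading and writing the actual Module.symvers file is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one Module.symvers line:
 * "0x<crc>\t<symbol>\t<module>". */
struct symvers_entry {
	char crc[16];
	char sym[64];
	char mod[64];
};

/* If the symbol is already listed, overwrite its CRC and module;
 * otherwise append it. Returns the new entry count, or -1 if the
 * table is full. */
static int symvers_upsert(struct symvers_entry *tab, int n, int cap,
			  const char *crc, const char *sym, const char *mod)
{
	int i;

	for (i = 0; i < n; i++)
		if (strcmp(tab[i].sym, sym) == 0)
			break;          /* existing entry: replace in place */
	if (i == n) {
		if (n == cap)
			return -1;
		n++;                    /* new entry goes at index i */
	}
	snprintf(tab[i].crc, sizeof tab[i].crc, "%s", crc);
	snprintf(tab[i].sym, sizeof tab[i].sym, "%s", sym);
	snprintf(tab[i].mod, sizeof tab[i].mod, "%s", mod);
	return n;
}
```

The lookup is keyed on the symbol name. Re-running the procedure after an upgrade therefore overwrites stale CRCs instead of adding duplicate lines for the same symbol.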
[openib-general] Re: [openfabrics-ewg] IPoIB ifconfig HWaddr blank on RHEL4 U3?
Scott Weitzenkamp (sweitzen) wrote:

OFED 1.0 rc3 on RHEL4 U3.

IPoIB is working, but I just noticed the HWaddr is 00-00-00-00-00-00-00-00-00-00-00-00-00-00-0, shouldn't this have the GID?

[EMAIL PROTECTED] ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:13:72:50:B7:D1
          inet addr:172.29.238.49  Bcast:172.29.239.255  Mask:255.255.252.0
          inet6 addr: fe80::213:72ff:fe50:b7d1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:31232 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13122 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20539406 (19.5 MiB)  TX bytes:1415914 (1.3 MiB)
          Base address:0xdcc0 Memory:dfae-dfb0

ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-0
          inet addr:192.168.2.49  Bcast:192.168.3.255  Mask:255.255.252.0
          inet6 addr: fe80::202:c902:21:51d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:839425 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4384118 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:44110930 (42.0 MiB)  TX bytes:8046551416 (7.4 GiB)

ib1       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-0
          inet addr:192.168.4.49  Bcast:192.168.5.255  Mask:255.255.254.0
          inet6 addr: fe80::202:c902:21:51e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:364 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:46824 (45.7 KiB)  TX bytes:408 (408.0 b)

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
___
openfabrics-ewg mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openfabrics-ewg

That's probably a bug in how ifconfig handles such a long address. Run "ip address show" and you'll see that the correct HW address is shown.
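For reference on the length involved: an IPoIB hardware address is 20 bytes. It consists of a flags byte and a 24-bit QPN, followed by the 16-byte port GID. The ifconfig output in this thread shows no more than 16 byte positions. The sketch below formats all 20 bytes as colon-separated hex, the style ip uses for link addresses. format_hw_addr() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Format len address bytes as colon-separated lowercase hex into out and
 * return the number of characters written (excluding the NUL). A 20-byte
 * IPoIB address needs 20 * 3 - 1 = 59 characters plus the NUL. If out is
 * too small, output stops after the last complete byte and stays
 * NUL-terminated. */
static size_t format_hw_addr(const unsigned char *addr, size_t len,
			     char *out, size_t outlen)
{
	size_t i, pos = 0;

	if (outlen == 0)
		return 0;
	out[0] = '\0';
	for (i = 0; i < len; i++) {
		int n = snprintf(out + pos, outlen - pos,
				 i ? ":%02x" : "%02x", (unsigned int)addr[i]);

		if (n < 0 || (size_t)n >= outlen - pos) {
			out[pos] = '\0';	/* drop the partial byte */
			break;
		}
		pos += (size_t)n;
	}
	return pos;
}
```

With a 60-byte buffer, a full 20-byte address fits exactly: 59 characters plus the terminator.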
[openib-general] Calculation of maximum number of FMR remaps in mthca driver
Hi,

Below is a suggested patch that makes ib_query_device (more precisely, mthca_query_device) return the value of max_map_per_fmr instead of zero. ib_create_fmr_pool then uses this value, instead of the constant IB_FMR_MAX_REMAPS, as the maximum number of allowed FMR remaps.
Since this is only a suggestion for now, I haven't taken care of the other drivers yet. I would be happy to get feedback from the Mellanox driver people on the correctness of the calculation.

thanks
Moni S.

Index: infiniband/core/fmr_pool.c
===
--- infiniband/core/fmr_pool.c	(revision 8504)
+++ infiniband/core/fmr_pool.c	(working copy)
@@ -214,6 +214,7 @@
 {
 	struct ib_device   *device;
 	struct ib_fmr_pool *pool;
+	struct ib_device_attr device_attr;
 	int i;
 	int ret;
 
@@ -228,6 +229,12 @@
 		return ERR_PTR(-ENOSYS);
 	}
 
+	ret = ib_query_device(device, &device_attr);
+	if (ret) {
+		printk(KERN_WARNING "couldn't query device");
+		return ERR_PTR(ret);
+	}
+
 	pool = kmalloc(sizeof *pool, GFP_KERNEL);
 	if (!pool) {
 		printk(KERN_WARNING "couldn't allocate pool struct");
@@ -279,7 +286,7 @@
 		struct ib_pool_fmr *fmr;
 		struct ib_fmr_attr attr = {
 			.max_pages  = params->max_pages_per_fmr,
-			.max_maps   = IB_FMR_MAX_REMAPS,
+			.max_maps   = device_attr.max_map_per_fmr,
 			.page_shift = params->page_shift
 		};
 
Index: infiniband/hw/mthca/mthca_provider.c
===
--- infiniband/hw/mthca/mthca_provider.c	(revision 8504)
+++ infiniband/hw/mthca/mthca_provider.c	(working copy)
@@ -116,6 +116,8 @@
 	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
 					   props->max_mcast_grp;
 
+	props->max_map_per_fmr = (1 << (32 - long_log2(mdev->limits.num_mpts))) - 1;
+
 	err = 0;
  out:
 	kfree(in_mad);
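Read as arithmetic, the mthca expression splits the 32-bit memory key into two parts: the bits needed to index the MPT, and the remaining bits, which can change on each remap. The userspace sketch below follows that reading. It also adds a fallback for drivers that report 0, since the message notes that the other drivers are not updated yet. floor_log2() stands in for long_log2(). FALLBACK_MAX_REMAPS stands in for IB_FMR_MAX_REMAPS, with an assumed value of 32. pool_max_remaps() is hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Stand-in for IB_FMR_MAX_REMAPS, the constant the patch replaces; the
 * value 32 is an assumption of this sketch. */
#define FALLBACK_MAX_REMAPS 32u

/* floor(log2(x)) for x >= 1; a userspace stand-in for long_log2(). */
static unsigned int floor_log2(uint32_t x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* log2(num_mpts) of the 32 key bits index the MPT; the rest can change on
 * each remap, giving 2^(32 - log2(num_mpts)) distinct keys per entry, so
 * one fewer than that many remaps before a key value repeats. */
static uint32_t max_map_per_fmr(uint32_t num_mpts)
{
	unsigned int index_bits = floor_log2(num_mpts);

	if (index_bits == 0)		/* 1 << 32 would overflow */
		return UINT32_MAX;
	return (UINT32_C(1) << (32 - index_bits)) - 1;
}

/* Hypothetical pool-side fallback for drivers that leave
 * max_map_per_fmr at 0. */
static uint32_t pool_max_remaps(uint32_t dev_max_map_per_fmr)
{
	return dev_max_map_per_fmr ? dev_max_map_per_fmr : FALLBACK_MAX_REMAPS;
}
```

For example, with num_mpts = 1 << 17 the formula gives 32767 remaps. Without a fallback like pool_max_remaps(), a driver that leaves the attribute at zero would end up creating FMRs with max_maps of 0.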
Re: [openib-general] Please give 1.0 RC1 a whirl
James Lentini wrote:

Arlin,

This email was garbled. I'm 99% certain it was from you, but the from field reads "Moni Shoua":

http://openib.org/pipermail/openib-general/2006-March/018318.html

The patch was also mangled. Could you resend please?

Thanks,
james

On Wed, 15 Mar 2006, Moni Shoua wrote:

Davis, Arlin R wrote:

James,

I am in the process of building the autotools stuff for DAT and DAPL so it builds exactly like the rest of OpenIB user libraries. I should have something by the end of the day or tomorrow first thing.

-arlin

-Original Message-
From: James Lentini [mailto:[EMAIL PROTECTED]
Sent: Monday, March 13, 2006 10:30 AM
To: Woodruff, Robert J
Cc: Bryan O'Sullivan; openib-general@openib.org; Davis, Arlin R
Subject: RE: [openib-general] Please give 1.0 RC1 a whirl

There are two parts of udapl, the registry and the provider.

There is a provider .spec file at

https://openib.org/svn/gen2/trunk/src/userspace/dapl/dat/udat/linux/dat-registry-1.1.spec

If you build the dat registry with "make rpm" an rpm will be automatically created. I need to put together a .spec file for the provider.

Do we need to do anything else for 1.0 packaging purposes?

On Thu, 9 Mar 2006, Woodruff, Robert J wrote:

James/Arlin ?

woody

-Original Message-
From: Bryan O'Sullivan [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 09, 2006 3:00 PM
To: Woodruff, Robert J
Cc: openib-general@openib.org; Davis, Arlin R; 'James Lentini'
Subject: RE: [openib-general] Please give 1.0 RC1 a whirl

On Thu, 2006-03-09 at 14:53 -0800, Bob Woodruff wrote:

Where are the uDAPL RPMs ?

Nobody has fixed uDAPL to be autostools buildable or written a .spec.in file for it. That will be up to someone other than me to do :-)

This is a patch that automates the build of dat and udapl.
It also modifies the packaging of librdmacm
to build dat/dapl one should run the following commands in src/userspace/dapl/dat and src/userspace/dapl/dapl
# sh ./autogen.sh
# ./configure
# make dist-gzip
# rpmbuild -ta *.gz
build of dapl requires that RPMs libibverns-devel librdmacm and dat are installed.
=
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/autogen.sh openib.org/src/userspace/dapl/dapl/autogen.sh
--- openib.org.fresh/src/userspace/dapl/dapl/autogen.sh	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/autogen.sh	2006-03-14 17:03:45.0 +0200
@@ -0,0 +1,9 @@
+#! /bin/sh
+
+set -x
+aclocal -I config
+libtoolize --force --copy
+autoheader
+automake --foreign --add-missing --copy
+autoconf
+
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/configure.in openib.org/src/userspace/dapl/dapl/configure.in
--- openib.org.fresh/src/userspace/dapl/dapl/configure.in	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/configure.in	2006-03-14 17:03:45.0 +0200
@@ -0,0 +1,41 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(dapl, 0.9.0, openib-general@openib.org)
+AC_CONFIG_SRCDIR([udapl/dapl_init.c])
+AC_CONFIG_AUX_DIR(config)
+AM_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE(dapl, 0.9.0)
+
+dnl Checks for programs
+AC_PROG_CXX
+AC_PROG_CC
+AC_PROG_CPP
+AC_PROG_INSTALL
+AC_PROG_LN_S
+AC_PROG_MAKE_SET
+AM_PROG_LIBTOOL
+
+dnl Checks for header files.
+AC_HEADER_STDC
+
+dnl Checks for library functions
+AC_TYPE_SIGNAL
+AC_FUNC_VPRINTF
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+AC_C_INLINE
+AC_STRUCT_TM
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then
+ac_cv_version_script=yes
+else
+ac_cv_version_script=no
+fi)
+
+AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes")
+
+AC_CONFIG_FILES([Makefile dapl.spec])
+AC_OUTPUT
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/dapl.spec.in openib.org/src/userspace/dapl/dapl/dapl.spec.in
--- openib.org.fresh/src/userspace/dapl/dapl/dapl.spec.in	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/dapl.spec.in	2006-03-15 15:25:53.0 +0200
@@ -0,0 +1,41 @@
+# $Id: ipoibcfg.spec.in 28 2004-04-07 20:00:33Z roland $
+
+%define prefix /usr
+%define ver @VERSION@
+%define RELEASE 1
+%define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE}
+
+
+Summary: This package contains the U
Re: [openib-general] Please give 1.0 RC1 a whirl
Davis, Arlin R wrote:

James,

I am in the process of building the autotools stuff for DAT and DAPL so it builds exactly like the rest of OpenIB user libraries. I should have something by the end of the day or tomorrow first thing.

-arlin

-Original Message-
From: James Lentini [mailto:[EMAIL PROTECTED]
Sent: Monday, March 13, 2006 10:30 AM
To: Woodruff, Robert J
Cc: Bryan O'Sullivan; openib-general@openib.org; Davis, Arlin R
Subject: RE: [openib-general] Please give 1.0 RC1 a whirl

There are two parts of udapl, the registry and the provider.

There is a provider .spec file at

https://openib.org/svn/gen2/trunk/src/userspace/dapl/dat/udat/linux/dat-registry-1.1.spec

If you build the dat registry with "make rpm" an rpm will be automatically created. I need to put together a .spec file for the provider.

Do we need to do anything else for 1.0 packaging purposes?

On Thu, 9 Mar 2006, Woodruff, Robert J wrote:

James/Arlin ?

woody

-Original Message-
From: Bryan O'Sullivan [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 09, 2006 3:00 PM
To: Woodruff, Robert J
Cc: openib-general@openib.org; Davis, Arlin R; 'James Lentini'
Subject: RE: [openib-general] Please give 1.0 RC1 a whirl

On Thu, 2006-03-09 at 14:53 -0800, Bob Woodruff wrote:

Where are the uDAPL RPMs ?

Nobody has fixed uDAPL to be autostools buildable or written a .spec.in file for it. That will be up to someone other than me to do :-)

Sending the patch again, this time as an attachment.

This patch automates the build of dat and udapl. It also modifies the packaging of librdmacm.
To build dat/dapl, run the following commands in src/userspace/dapl/dat and src/userspace/dapl/dapl:
# sh ./autogen.sh
# ./configure
# make dist-gzip
# rpmbuild -ta *.gz
Building dapl requires that the libibverbs-devel, librdmacm and dat RPMs are installed.
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/autogen.sh openib.org/src/userspace/dapl/dapl/autogen.sh
--- openib.org.fresh/src/userspace/dapl/dapl/autogen.sh	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/autogen.sh	2006-03-14 17:03:45.0 +0200
@@ -0,0 +1,9 @@
+#! /bin/sh
+
+set -x
+aclocal -I config
+libtoolize --force --copy
+autoheader
+automake --foreign --add-missing --copy
+autoconf
+
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/configure.in openib.org/src/userspace/dapl/dapl/configure.in
--- openib.org.fresh/src/userspace/dapl/dapl/configure.in	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/configure.in	2006-03-14 17:03:45.0 +0200
@@ -0,0 +1,41 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(dapl, 0.9.0, openib-general@openib.org)
+AC_CONFIG_SRCDIR([udapl/dapl_init.c])
+AC_CONFIG_AUX_DIR(config)
+AM_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE(dapl, 0.9.0)
+
+dnl Checks for programs
+AC_PROG_CXX
+AC_PROG_CC
+AC_PROG_CPP
+AC_PROG_INSTALL
+AC_PROG_LN_S
+AC_PROG_MAKE_SET
+AM_PROG_LIBTOOL
+
+dnl Checks for header files.
+AC_HEADER_STDC
+
+dnl Checks for library functions
+AC_TYPE_SIGNAL
+AC_FUNC_VPRINTF
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+AC_C_INLINE
+AC_STRUCT_TM
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then
+ac_cv_version_script=yes
+else
+ac_cv_version_script=no
+fi)
+
+AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes")
+
+AC_CONFIG_FILES([Makefile dapl.spec])
+AC_OUTPUT
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/dapl.spec.in openib.org/src/userspace/dapl/dapl/dapl.spec.in
--- openib.org.fresh/src/userspace/dapl/dapl/dapl.spec.in	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/dapl.spec.in	2006-03-15 15:25:53.0 +0200
@@ -0,0 +1,41 @@
+# $Id: ipoibcfg.spec.in 28 2004-04-07 20:00:33Z roland $
+
+%define prefix /usr
+%define ver @VERSION@
+%define RELEASE 1
+%define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE}
+
+
+Summary: This package contains the User Direct Access Programming Library (uDAPL)
+Name: dapl
+Version: %ver
+Release: %{rel}%{?dist}
+License: GPL/BSD
+Group: Applications/System
+BuildRoot: %{_tmppath}/%{name}-%{version}-root
+Source: http://openib.org/downloads/%{name}-%{version}.tar.gz
+Url: http://openib.org/
+
+%description
+udat is
+
+%prep
+%setup -q
+
+%build
+%configure
+pwd
+make -C udapl clean
+make -C udapl
+
+%install
+make -C udapl PREFIX=${RPM_BUILD_ROOT} install
+
+%clean
+rm -rf
Re: [openib-general] Please give 1.0 RC1 a whirl
Davis, Arlin R wrote: James, I am in the process of building the autotools stuff for DAT and DAPL so it builds exactly like the rest of OpenIB user libraries. I should have something by the end of the day or tomorrow first thing. -arlin -Original Message- From: James Lentini [mailto:[EMAIL PROTECTED] Sent: Monday, March 13, 2006 10:30 AM To: Woodruff, Robert J Cc: Bryan O'Sullivan; openib-general@openib.org; Davis, Arlin R Subject: RE: [openib-general] Please give 1.0 RC1 a whirl There are two parts of udapl, the registry and the provider. There is a provider .spec file at https://openib.org/svn/gen2/trunk/src/userspace/dapl/dat/udat/linux/dat -registry-1.1.spec If you build the dat registry with "make rpm" an rpm will be automatically created. I need to put together a .spec file for the provider. Do we need to do anything else for 1.0 packaging purposes? On Thu, 9 Mar 2006, Woodruff, Robert J wrote: James/Arlin ? woody -Original Message- From: Bryan O'Sullivan [mailto:[EMAIL PROTECTED] Sent: Thursday, March 09, 2006 3:00 PM To: Woodruff, Robert J Cc: openib-general@openib.org; Davis, Arlin R; 'James Lentini' Subject: RE: [openib-general] Please give 1.0 RC1 a whirl On Thu, 2006-03-09 at 14:53 -0800, Bob Woodruff wrote: Where are the uDAPL RPMs ? Nobody has fixed uDAPL to be autostools buildable or written a .spec.in file for it. That will be up to someone other than me to do :-) ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general This is a patch that automates the build of dat and udapl. It also modifies the packaging of librdmacm to build dat/dapl one should run the following commands in src/userspace/dapl/dat and src/userspace/dapl/dapl # sh ./autogen.sh # ./configure # make dist-gzip # rpmbuild -ta *.gz build of dapl requires that RPMs libibverns-devel librdmacm and dat are installed. 
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/autogen.sh openib.org/src/userspace/dapl/dapl/autogen.sh
--- openib.org.fresh/src/userspace/dapl/dapl/autogen.sh	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/autogen.sh	2006-03-14 17:03:45.0 +0200
@@ -0,0 +1,9 @@
+#! /bin/sh
+
+set -x
+aclocal -I config
+libtoolize --force --copy
+autoheader
+automake --foreign --add-missing --copy
+autoconf
+
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/configure.in openib.org/src/userspace/dapl/dapl/configure.in
--- openib.org.fresh/src/userspace/dapl/dapl/configure.in	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/configure.in	2006-03-14 17:03:45.0 +0200
@@ -0,0 +1,41 @@
+dnl Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.57)
+AC_INIT(dapl, 0.9.0, openib-general@openib.org)
+AC_CONFIG_SRCDIR([udapl/dapl_init.c])
+AC_CONFIG_AUX_DIR(config)
+AM_CONFIG_HEADER(config.h)
+AM_INIT_AUTOMAKE(dapl, 0.9.0)
+
+dnl Checks for programs
+AC_PROG_CXX
+AC_PROG_CC
+AC_PROG_CPP
+AC_PROG_INSTALL
+AC_PROG_LN_S
+AC_PROG_MAKE_SET
+AM_PROG_LIBTOOL
+
+dnl Checks for header files.
+AC_HEADER_STDC
+
+dnl Checks for library functions
+AC_TYPE_SIGNAL
+AC_FUNC_VPRINTF
+
+dnl Checks for typedefs, structures, and compiler characteristics.
+AC_C_CONST
+AC_C_INLINE
+AC_STRUCT_TM
+
+AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
+if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then
+ac_cv_version_script=yes
+else
+ac_cv_version_script=no
+fi)
+
+AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes")
+
+AC_CONFIG_FILES([Makefile dapl.spec])
+AC_OUTPUT
diff --exclude=.svn -urN openib.org.fresh/src/userspace/dapl/dapl/dapl.spec.in openib.org/src/userspace/dapl/dapl/dapl.spec.in
--- openib.org.fresh/src/userspace/dapl/dapl/dapl.spec.in	1970-01-01 02:00:00.0 +0200
+++ openib.org/src/userspace/dapl/dapl/dapl.spec.in	2006-03-15 15:25:53.0 +0200
@@ -0,0 +1,41 @@
+# $Id: ipoibcfg.spec.in 28 2004-04-07 20:00:33Z roland $
+
+%define prefix /usr
+%define ver @VERSION@
+%define RELEASE 1
+%define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE}
+
+
+Summary: This package contains the User Direct Access Programming Library (uDAPL)
+Name: dapl
+Version: %ver
+Release: %{rel}%{?dist}
+License: GPL/BSD
+Group: Applications/System
+BuildRoot: %{_tmppath}/%{name}-%{version}-root
+Source: http://openib.org/downloads/%{name}-%{version}.tar.gz
+Url: http://openib.org/
+
+%description
+udat is
+
+%prep
+%setup -q
+
+%build
+%configure
+pwd
+make -C udapl clean
+make -C udapl
+
+%install
+make -C udapl PREFIX=${RPM_BUILD_ROOT} instal
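The AC_CACHE_CHECK in configure.in decides whether the linker supports version scripts by grepping its --help output, and AM_CONDITIONAL exports the result as HAVE_LD_VERSION_SCRIPT. A minimal standalone sketch of the same probe, useful for seeing what configure will conclude on a given build host, follows; the function name and its optional linker argument are hypothetical conveniences, while the grep pattern is the one configure.in uses.

```shell
#!/bin/sh
# Sketch: the same probe as the AC_CACHE_CHECK in configure.in.
# Prints "yes" if the linker's --help output mentions version-script, else "no".
# The optional first argument (default: $LD, else ld) is an assumption added
# for convenience; configure itself always uses $LD.
ld_accepts_version_script() {
    ld_cmd=${1:-${LD:-ld}}
    # Same pattern as configure.in: plain "version-script", stdin from /dev/null,
    # stderr discarded so a missing linker simply yields "no".
    if "$ld_cmd" --help </dev/null 2>/dev/null | grep -q version-script; then
        echo yes
    else
        echo no
    fi
}
```

On a host with GNU ld, ld_accepts_version_script should print yes, which corresponds to ac_cv_version_script=yes in config.log.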
Re: [openib-general] Please give 1.0 RC1 a whirl
Davis, Arlin R wrote:

James,

I am in the process of building the autotools stuff for DAT and DAPL so it builds exactly like the rest of the OpenIB user libraries. I should have something by the end of the day or tomorrow first thing.

-arlin

I have also done some work in automating the build of dat and udapl. I hope to send a patch for this soon.
[openib-general] ib1 takes ib0 configuration
After building and installing the openib stack on SUSE Linux Enterprise Server 9.90 Beta5, I noticed that the ib1 interface has an IP configuration identical to that of ib0:

ib0   Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
      inet addr:192.68.3.238  Bcast:192.68.255.255  Mask:255.255.0.0

ib1   Link encap:UNSPEC  HWaddr 00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00
      inet addr:192.68.3.238  Bcast:192.68.255.255  Mask:255.255.0.0

This causes problems in the machine's IP configuration even if the link of ib1 is down. The existence of a network script for ib1 makes no difference: ib1 still takes the ib0 configuration. When querying the ib1 status (with /sbin/ifstatus ib1) I get this:

ib1 device: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode) (rev a0)
ib1 configuration: ib0
ib1 is up
10: ib1: mtu 1500 qdisc pfifo_fast qlen 128
    link/infiniband 00:00:04:05:fe:80:00:00:00:00:00:00:00:08:f1:04:03:97:08:ea brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.68.3.238/16 brd 192.68.255.255 scope global ib1
ib1 IP address: 192.68.3.238/16

While looking for the reason for this behavior, I found that when the network is started, ifup calls getcfg like this:

/sbin/getcfg -d . -f ifcfg- -- ib1

and one of the lines in the output is HWD_CONFIG_0="ib0". ifup parses the HWD_CONFIG_0 variable and sets the variable CONFIG to its value, so ib1 is brought up with the ib0 configuration.

None of the above happens on SUSE Linux Enterprise Server 9, and I suspect that the difference lies in the version of the sysconfig rpm (0.50.6 vs. 0.31.0), which packages getcfg.

Does anyone have an idea how to solve this? Is this a bug in getcfg, a bug in IPoIB, or just a wrong IP configuration?
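To check other machines for the same symptom, the two observations in the report (getcfg mapping ib1 to HWD_CONFIG_0="ib0", and two interfaces carrying the same address) can be tested with small filters. This is a hedged sketch: the function names are hypothetical, and the parsing assumes the KEY="value" getcfg output and the one-line-per-address format of ip -o -4 addr show, as seen in the output quoted above.

```shell
#!/bin/sh
# Diagnostics sketch for the ib0/ib1 problem (hypothetical helpers).
# Both read text on stdin, so they can be fed live output, e.g.
#   /sbin/getcfg -d . -f ifcfg- -- ib1 | getcfg_config_name
#   ip -o -4 addr show | duplicate_ipv4

# Print the value of HWD_CONFIG_0 (the configuration name ifup will use),
# with the surrounding double quotes removed.
getcfg_config_name() {
    sed -n 's/^HWD_CONFIG_0=//p' | tr -d '"'
}

# Print each IPv4 address assigned to more than one interface, followed by
# the interfaces that carry it. In "ip -o -4 addr show" output, field 2 is
# the interface name and field 4 is address/prefix.
duplicate_ipv4() {
    awk '$3 == "inet" {
             split($4, a, "/")
             owners[a[1]] = owners[a[1]] " " $2
             count[a[1]]++
         }
         END {
             for (ip in count)
                 if (count[ip] > 1)
                     print ip ":" owners[ip]
         }'
}
```

On the machine described above, getcfg_config_name would print ib0 for the ib1 query, and duplicate_ipv4 would print 192.68.3.238: ib0 ib1. Any interface whose configuration name differs from its own name is a candidate for this getcfg mapping issue.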
thanks

____
Moni Shoua | +972-9-9718630 (o) | +072-52-8232979 (m)
SW Engineer, Mainstream IB host stack
Voltaire – The Grid Backbone
www.voltaire.com