Re: [openib-general] prototype version of ebus driver
On Wed, Oct 26, 2005 at 04:56:08PM +0200, IBMEHCA DD wrote:
> on kernel 2.6.13 and 14 an "ebus" driver is needed to enable the ehca
> driver on power5.
> I just uploaded a prototype patch to gen2/users/ehca svn 3879

Please get some responses from the PPC64 maintainers, or possibly
linux-kernel. I'd like to see ehca get reviewed as well, but it may be
a little early for that ;)

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [git pull] InfiniBand updates for 2.6.14
    Andrew> That would be suitable, I guess. It's a bit of a hassle,
    Andrew> but some bugs will likely be found, and useful suggestions
    Andrew> will be made.

No objections here... the more people I can get reading patches, the
better. I'll see about scripting something to make it a
semi-automatic part of my workflow.

 - R.
Re: [openib-general] Re: ehca testing
On Thu, Oct 27, 2005 at 10:03:17AM -0700, Roland Dreier wrote:
> OK, looks like you have two problems. First of all, you seem to have
> two versions of ib_mthca, one of which gets picked up by hotplug on
> boot and one of which gets picked up by modprobe. Notice how you
> don't see the
>
>     dev->ib_dev.node_type = 1
>
> line when mthca runs on boot? The only explanation I can come up with
> for that would be that you have an old version of it in an initrd or
> something that's screwing things up.

Whoops, that's exactly what's going on. Now to figure out how to not
have IB stuff included in my initrd.
[openib-general] Re: [git pull] InfiniBand updates for 2.6.14
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
>     Andrew> a) arrange for the current infiniband devel tree to be
>     Andrew> included in -mm and
>
> Sure. How do you want to handle that? The way I've been working
> lately is to merge things onto my "upstream" branch when I intend for
> them to go to Linus eventually, and merge that onto the "for-linus"
> branch when I'm going to ask Linus to pull. I guess it would make
> sense for you to grab the upstream branch for -mm.

That suits. I'll include
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git#upstream

>     Andrew> b) arrange for infiniband patches to get wider review than this?
>
> No objection from me. How do you suggest I do that? Post things to
> linux-kernel as I merge them into git?

That would be suitable, I guess. It's a bit of a hassle, but some bugs
will likely be found, and useful suggestions will be made.
[openib-general] Re: [git pull] InfiniBand updates for 2.6.14
    Andrew> a) arrange for the current infiniband devel tree to be
    Andrew> included in -mm and

Sure. How do you want to handle that? The way I've been working
lately is to merge things onto my "upstream" branch when I intend for
them to go to Linus eventually, and merge that onto the "for-linus"
branch when I'm going to ask Linus to pull. I guess it would make
sense for you to grab the upstream branch for -mm.

    Andrew> b) arrange for infiniband patches to get wider review than this?

No objection from me. How do you suggest I do that? Post things to
linux-kernel as I merge them into git?

Thanks,
  Roland
[openib-general] Re: [git pull] InfiniBand updates for 2.6.14
Roland Dreier <[EMAIL PROTECTED]> wrote:
>
>  43 files changed, 2675 insertions(+), 1773 deletions(-)

That's rather a lot of code. AFAIK it hasn't been past linux-kernel. It
hasn't been in -mm.

Can we please

a) arrange for the current infiniband devel tree to be included in -mm and

b) arrange for infiniband patches to get wider review than this?

Thanks.
[openib-general] [git pull] InfiniBand updates for 2.6.14
Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

The pull will get the following changes:

Jack Morgenstein:
      [IB] Add checks to multicast attach and detach
      [IB] mthca: Report correct atomic capability
      [IB] mthca: Fill in more fields in query_port method
      [IB] mthca: Better limit checking and reporting
      [IB] mthca: Don't enter QP into MCG more than once.

Roland Dreier:
      [IB] uverbs: ABI-breaking fixes for userspace verbs
      [IB] uverbs: Fix up resource creation error paths
      [IB] uverbs: Add device-specific ABI version attribute
      [IB] uverbs: reject invalid memory registration permission flags
      [IB] Check port number in ib_query_port()/ib_modify_port()
      [IB] mthca: SRQ limit reached events
      [IB] mthca: detect SRQ overflow
      [IB] Fix leak on MAD initialization failure
      [IPoIB] Rename ipoib_create_qp() -> ipoib_init_qp() and fix error cleanup
      [IB] uverbs: unlock correctly in error paths
      [IB] fail SA queries if device initialization failed
      [IB] uverbs: Add a mask of device methods allowed for userspace
      [IB] uverbs: Add ABI structures for more commands
      [IB] uverbs: Implement more commands
      [IB] ucm: quiet sparse warnings
      [IPoIB] Improve ipoib_timeout() output
      [IB] mthca: Use enum in mthca_alloc_db() prototype
      [IB] mthca: Add struct pci_driver.owner field
      [IB] Fail sysfs queries after device is unregistered
      [IB] cm: Add missing break in switch
      [IB] user_mad: trivial coding style fixes
      [IB] user_mad: Use class_device.devt
      [IB] mthca: Always re-arm EQs in mthca_tavor_interrupt()
      Merge master.kernel.org:/.../torvalds/linux-2.6
      [IB] Add idr_destroy() calls on module unload
      Manual merge of for-linus to upstream (fix conflicts in drivers/infiniband/core/ucm.c)
      [IB] mthca: correct modify QP attribute masks for UC
      [IB] simplify mad_rmpp.c:alloc_response_msg()
      [IB] mthca: first pass at catastrophic error reporting
      [IB] ib_umad: fix crash when freeing send buffers
      [IPoIB] Drop RX packets when out of memory
      [IB] umad: Fix device lifetime problems
      [IB] uverbs: Fix device lifetime problems
      Merge master.kernel.org:/.../torvalds/linux-2.6
      [IB] fix up class_device_create() calls

Sean Hefty:
      [IB] merge ucm.h into ucm.c
      [IB] CM: bind IDs to a specific device
      [IB] CM: Fix initialization of QP attributes for UC QPs.
      [IB] Fix MAD layer DMA mappings to avoid touching data buffer once mapped
      [IB] ib_umad: various cleanups

 drivers/infiniband/core/agent.c              |  301 ++---
 drivers/infiniband/core/agent.h              |   13
 drivers/infiniband/core/agent_priv.h         |   62 --
 drivers/infiniband/core/cm.c                 |  217 +++
 drivers/infiniband/core/cm_msgs.h            |    1
 drivers/infiniband/core/device.c             |   12
 drivers/infiniband/core/mad.c                |  329 +-
 drivers/infiniband/core/mad_priv.h           |    8
 drivers/infiniband/core/mad_rmpp.c           |  112 ++-
 drivers/infiniband/core/mad_rmpp.h           |    2
 drivers/infiniband/core/sa_query.c           |  272
 drivers/infiniband/core/smi.h                |    2
 drivers/infiniband/core/sysfs.c              |   16
 drivers/infiniband/core/ucm.c                |  267 ++--
 drivers/infiniband/core/ucm.h                |   83 ---
 drivers/infiniband/core/user_mad.c           |  403 ++--
 drivers/infiniband/core/uverbs.h             |   62 +-
 drivers/infiniband/core/uverbs_cmd.c         |  858 +-
 drivers/infiniband/core/uverbs_main.c        |  503 ++-
 drivers/infiniband/core/verbs.c              |   18 -
 drivers/infiniband/hw/mthca/Makefile         |    3
 drivers/infiniband/hw/mthca/mthca_catas.c    |  153 +
 drivers/infiniband/hw/mthca/mthca_cmd.c      |   11
 drivers/infiniband/hw/mthca/mthca_dev.h      |   22 +
 drivers/infiniband/hw/mthca/mthca_eq.c       |   21 +
 drivers/infiniband/hw/mthca/mthca_mad.c      |   72 --
 drivers/infiniband/hw/mthca/mthca_main.c     |   11
 drivers/infiniband/hw/mthca/mthca_mcg.c      |   11
 drivers/infiniband/hw/mthca/mthca_memfree.c  |    3
 drivers/infiniband/hw/mthca/mthca_memfree.h  |    3
 drivers/infiniband/hw/mthca/mthca_provider.c |   49 +
 drivers/infiniband/hw/mthca/mthca_qp.c       |   16
 drivers/infiniband/hw/mthca/mthca_srq.c      |   43 +
 drivers/infiniband/hw/mthca/mthca_user.h     |    6
 drivers/infiniband/ulp/ipoib/ipoib.h         |   23 -
 drivers/infiniband/ulp/ipoib/ipoib_ib.c      |  122 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c    |   15
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c   |    9
 include/rdma/ib_cm.h                         |   10
 include/rdma/ib_mad.h                        |   66 +-
 include/rdma/ib_user_cm.h                    |   10
 include/rdma/ib_user_verbs.h
Re: uDAPL Problem : [WasRe: [openib-general] OpenSM crash with today's trunk
Aniruddha Bohra wrote:
> Now, I have a problem with udapl: the following is a code snippet from
> dapl_ib_dto.h:
>
>     for (i = 0; i < segments; i++) {
>         if (!local_iov[i].segment_length)
>             continue;
>
>         ds_array_p->addr = (uint64_t) local_iov[i].virtual_address;
>         ds_array_p->length = local_iov[i].segment_length;
>         ds_array_p->lkey = local_iov[i].lmr_context;
>
>         dapl_dbg_log(DAPL_DBG_TYPE_EP,
>                      " post_snd: lkey 0x%x va %p len %d \n",
>                      ds_array_p->lkey, ds_array_p->addr,
>                      ds_array_p->length);
>
>         total_len += ds_array_p->length;
>         wr.num_sge++;
>         ds_array_p++;
>     }
>
> The following is the relevant part of the log with DAPL_DBG_TYPE=0x
>
>   dapl_ep_post_send (0x8087110, 2, 0x80f9910, %P, b5f395bc)
>   post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x80f9910 r_iov 0xbfc29060 f 0
>   post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x80f9910
>   post_snd: lkey 0x10de003b va 0xb5f3976c len 0
>   post_snd: lkey 0x10de003b va 0xb5f39924 len 0
>
> From the above loop, how is this possible? If local_iov[i].segment_length
> == 0, it should not be printed. And if the assignment is successful, len
> must not be 0. Any ideas?
>
> Of course, following this, the ep is disconnected in the next step :(

local_iov (LMR) length is 64 bits and the ibv_sge (ds_array) length is
32 bits, so it truncates. Sounds like you set up a transfer greater than
4GB-1? If you query the device via uDAPL you will see the max limits
(2GB):

  query_hca: (a0.0) ep 64512 ep_q 65535 evd 65408 evd_q 131071
  query_hca: msg 2147483648 rdma 2147483648 iov 59 lmr 131056 rmr 0

-arlin

> Also a minor patch: you can see that %P is printed as %P and not used
> as a format character.
> Index: common/dapl_ep_post_rdma_write.c
> ===
> --- common/dapl_ep_post_rdma_write.c	(revision 3892)
> +++ common/dapl_ep_post_rdma_write.c	(working copy)
> @@ -78,7 +78,7 @@
>      DAT_RETURN dat_status;
>  
>      dapl_dbg_log (DAPL_DBG_TYPE_API,
> -                  "dapl_ep_post_rdma_write (%p, %d, %p, %P, %p, %x)\n",
> +                  "dapl_ep_post_rdma_write (%p, %d, %p, %p, %p, %x)\n",
>                    ep_handle,
>                    num_segments,
>                    local_iov,
> Index: common/dapl_ep_post_send.c
> ===
> --- common/dapl_ep_post_send.c	(revision 3892)
> +++ common/dapl_ep_post_send.c	(working copy)
> @@ -75,7 +75,7 @@
>      DAT_RETURN dat_status;
>  
>      dapl_dbg_log (DAPL_DBG_TYPE_API,
> -                  "dapl_ep_post_send (%p, %d, %p, %P, %x)\n",
> +                  "dapl_ep_post_send (%p, %d, %p, %p, %x)\n",
>                    ep_handle,
>                    num_segments,
>                    local_iov,
> Index: common/dapl_srq_post_recv.c
> ===
> --- common/dapl_srq_post_recv.c	(revision 3892)
> +++ common/dapl_srq_post_recv.c	(working copy)
> @@ -79,7 +79,7 @@
>      DAT_RETURN dat_status;
>  
>      dapl_dbg_log (DAPL_DBG_TYPE_API,
> -                  "dapl_srq_post_recv (%p, %d, %p, %P)\n",
> +                  "dapl_srq_post_recv (%p, %d, %p, %p)\n",
>                    srq_handle,
>                    num_segments,
>                    local_iov,
> Index: common/dapl_ep_post_recv.c
> ===
> --- common/dapl_ep_post_recv.c	(revision 3892)
> +++ common/dapl_ep_post_recv.c	(working copy)
> @@ -79,7 +79,7 @@
>      DAT_RETURN dat_status;
>  
>      dapl_dbg_log (DAPL_DBG_TYPE_API,
> -                  "dapl_ep_post_recv (%p, %d, %p, %P, %x)\n",
> +                  "dapl_ep_post_recv (%p, %d, %p, %p, %x)\n",
>                    ep_handle,
>                    num_segments,
>                    local_iov,
>
> Thanks
> Aniruddha
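The truncation Arlin describes can be reproduced in a few lines of user-space C. This is a minimal sketch, not uDAPL code: struct iov_sketch and struct sge_sketch are hypothetical stand-ins for the local IOV entry (64-bit segment_length) and for struct ibv_sge (32-bit length), and naive_copy_len() and fill_sge_checked() are hypothetical helpers. The checked variant assumes the caller already knows the device's maximum message size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the local IOV carries a 64-bit segment
 * length, while the verbs scatter/gather entry keeps only 32 bits. */
struct iov_sketch { uint64_t segment_length; };
struct sge_sketch { uint32_t length; };

/* What the loop in dapl_ib_dto.h effectively does: a nonzero 64-bit
 * length passes the "!segment_length" test, but the assignment keeps
 * only the low 32 bits. */
static uint32_t naive_copy_len(const struct iov_sketch *iov)
{
	return (uint32_t)iov->segment_length;
}

/* Defensive variant (an assumption, not the uDAPL fix): refuse any
 * segment that does not fit the 32-bit SGE length or that exceeds the
 * device's maximum message size. Returns 0 on success, -1 otherwise. */
static int fill_sge_checked(struct sge_sketch *sge,
			    const struct iov_sketch *iov,
			    uint64_t max_msg_size)
{
	assert(sge != NULL && iov != NULL);
	if (iov->segment_length > UINT32_MAX ||
	    iov->segment_length > max_msg_size)
		return -1;
	sge->length = (uint32_t)iov->segment_length;
	return 0;
}
```

A segment of exactly 4 GB (0x100000000) passes the zero check yet yields a 32-bit length of 0, which would be consistent with the "len 0" lines if the requested segments were multiples of 4 GB. Passing 2147483648 (the msg limit in the query_hca output) as max_msg_size rejects such a segment instead of silently posting it.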
[openib-general] [PATCH] fix umad object lifetime stuff
I just committed the following patch for user_mad.c, which fixes
various issues with possibly freeing various data structures before
the last reference is gone. For example, cdev_del() might return
before the last reference to the cdev is gone, so freeing a structure
containing the cdev is wrong at that point. (Side note: it's
essentially impossible to use cdev_init() safely unless the cdev in
question is statically allocated as part of the module).

Something like this is probably required for ucm and anything else
that exports a character device, since everyone seems to have copied
my bad user_mad code. But I haven't had a chance to do anything beyond
user_mad and uverbs so far...

 - R.

--- infiniband/core/user_mad.c	(revision 3890)
+++ infiniband/core/user_mad.c	(working copy)
@@ -64,18 +64,39 @@ enum {
 	IB_UMAD_MINOR_BASE = 0
 };
 
+/*
+ * Our lifetime rules for these structs are the following: each time a
+ * device special file is opened, we look up the corresponding struct
+ * ib_umad_port by minor in the umad_port[] table while holding the
+ * port_lock.  If this lookup succeeds, we take a reference on the
+ * ib_umad_port's struct ib_umad_device while still holding the
+ * port_lock; if the lookup fails, we fail the open().  We drop these
+ * references in the corresponding close().
+ *
+ * In addition to references coming from open character devices, there
+ * is one more reference to each ib_umad_device representing the
+ * module's reference taken when allocating the ib_umad_device in
+ * ib_umad_add_one().
+ *
+ * When destroying an ib_umad_device, we clear all of its
+ * ib_umad_ports from umad_port[] while holding port_lock before
+ * dropping the module's reference to the ib_umad_device.  This is
+ * always safe because any open() calls will either succeed and obtain
+ * a reference before we clear the umad_port[] entries, or fail after
+ * we clear the umad_port[] entries.
+ */
+
 struct ib_umad_port {
-	int                    devnum;
-	struct cdev            dev;
-	struct class_device    class_dev;
-
-	int                    sm_devnum;
-	struct cdev            sm_dev;
-	struct class_device    sm_class_dev;
+	struct cdev           *dev;
+	struct class_device   *class_dev;
+
+	struct cdev           *sm_dev;
+	struct class_device   *sm_class_dev;
 	struct semaphore       sm_sem;
 
 	struct ib_device      *ib_dev;
 	struct ib_umad_device *umad_dev;
+	int                    dev_num;
 	u8                     port_num;
 };
 
@@ -102,13 +123,25 @@ struct ib_umad_packet {
 	struct ib_user_mad mad;
 };
 
+static struct class *umad_class;
+
 static const dev_t base_dev = MKDEV(IB_UMAD_MAJOR, IB_UMAD_MINOR_BASE);
-static spinlock_t map_lock;
+
+static DEFINE_SPINLOCK(port_lock);
+static struct ib_umad_port *umad_port[IB_UMAD_MAX_PORTS];
 static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS * 2);
 
 static void ib_umad_add_one(struct ib_device *device);
 static void ib_umad_remove_one(struct ib_device *device);
 
+static void ib_umad_release_dev(struct kref *ref)
+{
+	struct ib_umad_device *dev =
+		container_of(ref, struct ib_umad_device, ref);
+
+	kfree(dev);
+}
+
 static int queue_packet(struct ib_umad_file *file,
 			struct ib_mad_agent *agent,
 			struct ib_umad_packet *packet)
@@ -534,13 +567,23 @@ static long ib_umad_ioctl(struct file *f
 
 static int ib_umad_open(struct inode *inode, struct file *filp)
 {
-	struct ib_umad_port *port =
-		container_of(inode->i_cdev, struct ib_umad_port, dev);
+	struct ib_umad_port *port;
 	struct ib_umad_file *file;
 
+	spin_lock(&port_lock);
+	port = umad_port[iminor(inode) - IB_UMAD_MINOR_BASE];
+	if (port)
+		kref_get(&port->umad_dev->ref);
+	spin_unlock(&port_lock);
+
+	if (!port)
+		return -ENXIO;
+
 	file = kzalloc(sizeof *file, GFP_KERNEL);
-	if (!file)
+	if (!file) {
+		kref_put(&port->umad_dev->ref, ib_umad_release_dev);
 		return -ENOMEM;
+	}
 
 	spin_lock_init(&file->recv_lock);
 	init_rwsem(&file->agent_mutex);
@@ -556,6 +599,7 @@ static int ib_umad_open(struct inode *in
 static int ib_umad_close(struct inode *inode, struct file *filp)
 {
 	struct ib_umad_file *file = filp->private_data;
+	struct ib_umad_device *dev = file->port->umad_dev;
 	struct ib_umad_packet *packet, *tmp;
 	int i;
 
@@ -570,6 +614,8 @@ static int ib_umad_close(struct inode *i
 
 	kfree(file);
 
+	kref_put(&dev->ref, ib_umad_release_dev);
+
 	return 0;
 }
 
@@ -586,30 +632,46 @@ static struct file_operations umad_fops
 
 static int ib_umad_sm_open(struct inode *inode, struct file *filp)
 {
-	struct ib_umad_port *port =
-		container_of(inode->i_cdev, struct ib_umad_port, sm_dev);
+	struct ib_umad_p
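The lifetime rules in the comment at the top of the patch (look up the port by minor under a lock, take a reference before dropping the lock, unpublish the ports before dropping the module's reference) can be modelled in user space. This is a minimal sketch under stated assumptions: a pthread mutex stands in for port_lock, a C11 atomic counter stands in for the kref, and every name below (the *_sketch types, dev_get(), dev_put(), device_add(), device_remove(), port_open(), port_close()) is hypothetical rather than part of user_mad.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SKETCH_MAX_PORTS 4	/* hypothetical table size */

struct dev_sketch;

/* Stand-in for struct ib_umad_port; it lives inside its device. */
struct port_sketch {
	struct dev_sketch *dev;
};

/* Stand-in for struct ib_umad_device; refcount plays the kref's role. */
struct dev_sketch {
	atomic_int refcount;
	struct port_sketch port;
};

static pthread_mutex_t port_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static struct port_sketch *port_table[SKETCH_MAX_PORTS];
static atomic_int devices_freed;	/* for illustration only */

static void dev_get(struct dev_sketch *dev)
{
	atomic_fetch_add(&dev->refcount, 1);
}

/* Like kref_put(): whoever drops the last reference frees the device. */
static void dev_put(struct dev_sketch *dev)
{
	if (atomic_fetch_sub(&dev->refcount, 1) == 1) {
		free(dev);
		atomic_fetch_add(&devices_freed, 1);
	}
}

/* Like ib_umad_add_one(): start with the module's reference, then
 * publish the port in the lookup table. */
static struct dev_sketch *device_add(int minor)
{
	struct dev_sketch *dev;

	assert(minor >= 0 && minor < SKETCH_MAX_PORTS);
	dev = calloc(1, sizeof *dev);
	if (!dev)
		return NULL;
	atomic_init(&dev->refcount, 1);
	dev->port.dev = dev;

	pthread_mutex_lock(&port_lock_sketch);
	port_table[minor] = &dev->port;
	pthread_mutex_unlock(&port_lock_sketch);
	return dev;
}

/* Like ib_umad_remove_one(): unpublish under the lock first, then drop
 * the module's reference; open files keep the device alive. */
static void device_remove(struct dev_sketch *dev, int minor)
{
	pthread_mutex_lock(&port_lock_sketch);
	port_table[minor] = NULL;
	pthread_mutex_unlock(&port_lock_sketch);
	dev_put(dev);
}

/* Like ib_umad_open(): look up by minor and take a reference while
 * still holding the lock. NULL corresponds to returning -ENXIO. */
static struct port_sketch *port_open(int minor)
{
	struct port_sketch *port;

	if (minor < 0 || minor >= SKETCH_MAX_PORTS)
		return NULL;
	pthread_mutex_lock(&port_lock_sketch);
	port = port_table[minor];
	if (port)
		dev_get(port->dev);
	pthread_mutex_unlock(&port_lock_sketch);
	return port;
}

/* Like ib_umad_close(): drop the reference taken at open time. */
static void port_close(struct port_sketch *port)
{
	dev_put(port->dev);
}
```

Removing the device while a file is open only drops the module's reference; the memory is released by the final port_close(). An open attempted after removal finds an empty slot and fails, which mirrors the -ENXIO path added to ib_umad_open().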
uDAPL Problem : [WasRe: [openib-general] OpenSM crash with today's trunk
Roland Dreier wrote:
>  > OK so, what options do I have right now -- compile a new kernel and
>  > apply patches and
>  > continue, or is there some patch that I can apply?
>
> I don't think anyone has prepared a kzalloc() patch, but just adding
> something like
>
>     static void *kzalloc(size_t size, unsigned int flags)
>     {
>             void *ret = kmalloc(size, flags);
>             if (ret)
>                     memset(ret, 0, size);
>             return ret;
>     }
>
> to files that use kzalloc() should let you use 2.6.13 (assuming there
> are no other incompatibilities).

Thanks, that works.

Now, I have a problem with udapl: the following is a code snippet from
dapl_ib_dto.h:

    for (i = 0; i < segments; i++) {
        if (!local_iov[i].segment_length)
            continue;

        ds_array_p->addr = (uint64_t) local_iov[i].virtual_address;
        ds_array_p->length = local_iov[i].segment_length;
        ds_array_p->lkey = local_iov[i].lmr_context;

        dapl_dbg_log(DAPL_DBG_TYPE_EP,
                     " post_snd: lkey 0x%x va %p len %d \n",
                     ds_array_p->lkey, ds_array_p->addr,
                     ds_array_p->length);

        total_len += ds_array_p->length;
        wr.num_sge++;
        ds_array_p++;
    }

The following is the relevant part of the log with DAPL_DBG_TYPE=0x

  dapl_ep_post_send (0x8087110, 2, 0x80f9910, %P, b5f395bc)
  post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x80f9910 r_iov 0xbfc29060 f 0
  post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x80f9910
  post_snd: lkey 0x10de003b va 0xb5f3976c len 0
  post_snd: lkey 0x10de003b va 0xb5f39924 len 0

From the above loop, how is this possible? If local_iov[i].segment_length
== 0, it should not be printed. And if the assignment is successful, len
must not be 0. Any ideas?

Of course, following this, the ep is disconnected in the next step :(

Also a minor patch: you can see that %P is printed as %P and not used as
a format character.
Index: common/dapl_ep_post_rdma_write.c
===
--- common/dapl_ep_post_rdma_write.c	(revision 3892)
+++ common/dapl_ep_post_rdma_write.c	(working copy)
@@ -78,7 +78,7 @@
     DAT_RETURN dat_status;
 
     dapl_dbg_log (DAPL_DBG_TYPE_API,
-                  "dapl_ep_post_rdma_write (%p, %d, %p, %P, %p, %x)\n",
+                  "dapl_ep_post_rdma_write (%p, %d, %p, %p, %p, %x)\n",
                   ep_handle,
                   num_segments,
                   local_iov,
Index: common/dapl_ep_post_send.c
===
--- common/dapl_ep_post_send.c	(revision 3892)
+++ common/dapl_ep_post_send.c	(working copy)
@@ -75,7 +75,7 @@
     DAT_RETURN dat_status;
 
     dapl_dbg_log (DAPL_DBG_TYPE_API,
-                  "dapl_ep_post_send (%p, %d, %p, %P, %x)\n",
+                  "dapl_ep_post_send (%p, %d, %p, %p, %x)\n",
                   ep_handle,
                   num_segments,
                   local_iov,
Index: common/dapl_srq_post_recv.c
===
--- common/dapl_srq_post_recv.c	(revision 3892)
+++ common/dapl_srq_post_recv.c	(working copy)
@@ -79,7 +79,7 @@
     DAT_RETURN dat_status;
 
     dapl_dbg_log (DAPL_DBG_TYPE_API,
-                  "dapl_srq_post_recv (%p, %d, %p, %P)\n",
+                  "dapl_srq_post_recv (%p, %d, %p, %p)\n",
                   srq_handle,
                   num_segments,
                   local_iov,
Index: common/dapl_ep_post_recv.c
===
--- common/dapl_ep_post_recv.c	(revision 3892)
+++ common/dapl_ep_post_recv.c	(working copy)
@@ -79,7 +79,7 @@
     DAT_RETURN dat_status;
 
     dapl_dbg_log (DAPL_DBG_TYPE_API,
-                  "dapl_ep_post_recv (%p, %d, %p, %P, %x)\n",
+                  "dapl_ep_post_recv (%p, %d, %p, %p, %x)\n",
                   ep_handle,
                   num_segments,
                   local_iov,

Thanks
Aniruddha
[openib-general] Re: [PATCH] [SRP] srp_cm_handler expanded response handling
Thanks, applied.

 - R.
Re: [openib-general] OpenSM crash with today's trunk
 > OK so, what options do I have right now -- compile a new kernel and
 > apply patches and
 > continue, or is there some patch that I can apply?

I don't think anyone has prepared a kzalloc() patch, but just adding
something like

	static void *kzalloc(size_t size, unsigned int flags)
	{
		void *ret = kmalloc(size, flags);
		if (ret)
			memset(ret, 0, size);
		return ret;
	}

to files that use kzalloc() should let you use 2.6.13 (assuming there
are no other incompatibilities).

 - R.
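The contract the fallback needs to honour (allocate, zero the memory only if the allocation succeeded, and return NULL otherwise) can be checked outside the kernel with a user-space analogue. In this sketch malloc() stands in for kmalloc(), the flags argument is dropped because it has no user-space meaning, and zalloc_sketch() and all_zero() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* User-space analogue of the kzalloc() fallback: malloc() stands in
 * for kmalloc(). */
static void *zalloc_sketch(size_t size)
{
	void *ret = malloc(size);

	if (ret)		/* only clear memory we actually got */
		memset(ret, 0, size);
	return ret;
}

/* Helper for checking the result: returns 1 if every byte is zero. */
static int all_zero(const void *p, size_t size)
{
	const unsigned char *b = p;
	size_t i;

	assert(p != NULL || size == 0);
	for (i = 0; i < size; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}
```

In a kernel tree, a fallback like this is commonly wrapped in a version check such as `#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,14)` (from <linux/version.h>), so that it is compiled only on kernels that do not already provide kzalloc().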
Re: [openib-general] OpenSM crash with today's trunk
Roland Dreier wrote:
>  > With 3892 I now get the following warnings on compilation:
>  > WARNING:
>  > /lib/modules/2.6.13bohra/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
>  > needs unknown symbol kzalloc
>  > WARNING:
>  > /lib/modules/2.6.13bohra/kernel/drivers/infiniband/core/ib_umad.ko
>  > needs unknown symbol kzalloc
>
> Yes, kzalloc() was added in 2.6.14. Now that 2.6.14 has been
> released, the subversion trunk is targeted against that kernel rather
> than the old 2.6.13 release.
>
>  - R.

OK so, what options do I have right now -- compile a new kernel and
apply patches and continue, or is there some patch that I can apply?

Thanks
Aniruddha
Re: [openib-general] OpenSM crash with today's trunk
 > With 3892 I now get the following warnings on compilation:
 > WARNING:
 > /lib/modules/2.6.13bohra/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
 > needs unknown symbol kzalloc
 > WARNING:
 > /lib/modules/2.6.13bohra/kernel/drivers/infiniband/core/ib_umad.ko
 > needs unknown symbol kzalloc

Yes, kzalloc() was added in 2.6.14. Now that 2.6.14 has been released,
the subversion trunk is targeted against that kernel rather than the
old 2.6.13 release.

 - R.
Re: [openib-general] OpenSM crash with today's trunk
Roland Dreier wrote:
>  > Now there is an OOPS in the dmesg:
>
> This really looks like the bug I fixed in r3889. What svn rev are
> your kernel modules built from?
>
>  - R.

And of course, the module does not load:

Oct 28 16:21:57 hora-3 kernel: ib_mthca: Unknown symbol kzalloc
Oct 28 16:21:58 hora-3 kernel: ib_umad: Unknown symbol kzalloc

Aniruddha
Re: [openib-general] OpenSM crash with today's trunk
Roland Dreier wrote:
>  > Now there is an OOPS in the dmesg:
>
> This really looks like the bug I fixed in r3889. What svn rev are
> your kernel modules built from?
>
>  - R.

With 3892 I now get the following warnings on compilation:

WARNING:
/lib/modules/2.6.13bohra/kernel/drivers/infiniband/hw/mthca/ib_mthca.ko
needs unknown symbol kzalloc
WARNING:
/lib/modules/2.6.13bohra/kernel/drivers/infiniband/core/ib_umad.ko
needs unknown symbol kzalloc

Aniruddha
Re: [openib-general] OpenSM causes kernel trap
    Jay> I also looked into "user_mad.c" and see that you don't have
    Jay> the same compatibility defines for kzalloc that you used in
    Jay> sdp.

Right, now that 2.6.14 is out, we won't try to maintain backward
compatibility with 2.6.13 in the main subversion trunk.

 - R.
Re: [openib-general] OpenSM causes kernel trap
Roland Dreier wrote:
>     Jay> I am using the source downloaded from kernel.org for 2.6.14
>     Jay> with only the sk98lin and infiniband patches. What newer
>     Jay> headers are you referring to? Where are they supposed to be
>     Jay> located?
>
> If you link a subversion tree into your kernel's drivers/infiniband
> subdirectory, then you have to rm -rf include/rdma in your kernel
> tree, or else the build will pick up the old headers from the kernel
> tree instead of the new headers from the subversion tree.
>
>  - R.

Thanks. I'll try that.

I also looked into "user_mad.c" and see that you don't have the same
compatibility defines for kzalloc that you used in sdp. I've attempted
to duplicate them and am trying the recompile with 2.6.13 right now.

-Jay
Re: [openib-general] cq callback question
    Steve> This may seem like a dumb question, but can a kernel ULP
    Steve> assume that after returning from ib_destroy_qp(), there
    Steve> will be no more callbacks for that QP on the associated cq
    Steve> event handler?

No, I don't think that's a valid assumption, at least with the current
code. Also, there's no requirement in
Documentation/infiniband/core_locking.txt that destroy QP operations
synchronize against CQ callbacks.

It is valid to assume that no callbacks will happen after
ib_destroy_cq() returns.

 - R.
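Given that answer, a kernel ULP's CQ handler has to cope with completions that arrive after ib_destroy_qp() returns, and any state the handler dereferences should be freed only after ib_destroy_cq() returns. The single-threaded model below is a sketch of that ordering only; it does not model real concurrency, and struct ulp_ctx_sketch, struct cq_sketch, ulp_cq_handler(), cq_event(), qp_destroy() and cq_destroy() are hypothetical stand-ins, not verbs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical ULP state that the CQ completion handler dereferences. */
struct ulp_ctx_sketch {
	bool qp_alive;		/* cleared when the QP is destroyed */
	int completions;	/* completions handled while the QP was alive */
	int late_callbacks;	/* callbacks that arrived after QP destroy */
};

/* Hypothetical CQ model: the handler may run until the CQ is destroyed. */
struct cq_sketch {
	void (*handler)(struct ulp_ctx_sketch *ctx);
	struct ulp_ctx_sketch *ctx;
	bool alive;
};

/* The handler must tolerate being called after the QP is gone. */
static void ulp_cq_handler(struct ulp_ctx_sketch *ctx)
{
	if (!ctx->qp_alive) {
		ctx->late_callbacks++;
		return;
	}
	ctx->completions++;
}

/* Simulates the driver delivering a completion event to the handler. */
static void cq_event(struct cq_sketch *cq)
{
	assert(cq->handler != NULL);
	if (cq->alive)
		cq->handler(cq->ctx);
}

/* Stand-ins for ib_destroy_qp() and ib_destroy_cq(). */
static void qp_destroy(struct ulp_ctx_sketch *ctx) { ctx->qp_alive = false; }
static void cq_destroy(struct cq_sketch *cq) { cq->alive = false; }
```

In this model, freeing the context between qp_destroy() and cq_destroy() would leave cq_event() dereferencing freed memory; waiting until cq_destroy() has run closes that window.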
Re: [openib-general] OpenSM causes kernel trap
    Jay> I am using the source downloaded from kernel.org for 2.6.14
    Jay> with only the sk98lin and infiniband patches. What newer
    Jay> headers are you referring to? Where are they supposed to be
    Jay> located?

If you link a subversion tree into your kernel's drivers/infiniband
subdirectory, then you have to rm -rf include/rdma in your kernel tree,
or else the build will pick up the old headers from the kernel tree
instead of the new headers from the subversion tree.

 - R.
Re: [openib-general] OpenSM causes kernel trap
Sean Hefty wrote:
> Jay Higley wrote:
>
>> I updated to version 3891 and tried it with the 2.6.13.4 kernel that
>> I was using and got unresolved symbol errors for kzalloc. I upgraded
>> the kernel to 2.6.14 and tried again and got the below compile
>> errors.
>>
>> As an aside, when I was running the unpatched openSM on a
>> single-processor system I occasionally got it to start up, but it
>> would hang and the port state would never change. The same sort of
>> behavior as in the "OpenSM crash with today's trunk" thread.
>
> It looks like you have old header files (possibly the original ones
> shipped with 2.6.14).
>
> I'm updating my systems to 2.6.14 at the moment, and will start
> testing this once done.
>
> - Sean

I am using the source downloaded from kernel.org for 2.6.14 with only
the sk98lin and infiniband patches. What newer headers are you
referring to? Where are they supposed to be located?

-Jay
Re: [openib-general] OpenSM crash with today's trunk
 > Now there is an OOPS in the dmesg:

This really looks like the bug I fixed in r3889. What svn rev are your
kernel modules built from?

 - R.
[openib-general] Re: [PATCH] add node_guid to struct ib_device
    Sean> Thanks. I forgot to include the changes to sysfs.c in my
    Sean> previous patch. Not sure if we want to wait on this until
    Sean> the other drivers have been updated. We'll probably want to
    Sean> remove node_guid from the device attributes as well.

Yes, I think that needs to wait until you or someone else updates ipath
and ehca.

 - R.
Re: [openib-general] OpenSM causes kernel trap
Jay Higley wrote:
> I updated to version 3891 and tried it with the 2.6.13.4 kernel that I
> was using and got unresolved symbol errors for kzalloc. I upgraded the
> kernel to 2.6.14 and tried again and got the below compile errors.
>
> As an aside, when I was running the unpatched openSM on a
> single-processor system I occasionally got it to start up, but it
> would hang and the port state would never change. The same sort of
> behavior as in the "OpenSM crash with today's trunk" thread.

It looks like you have old header files (possibly the original ones
shipped with 2.6.14).

I'm updating my systems to 2.6.14 at the moment, and will start testing
this once done.

- Sean
Re: [openib-general] OpenSM causes kernel trap
Roland Dreier wrote:
> BTW, Jay, can you confirm that this patch fixes your problem too?
>
> Thanks,
>   Roland

I updated to version 3891 and tried it with the 2.6.13.4 kernel that I
was using and got unresolved symbol errors for kzalloc. I upgraded the
kernel to 2.6.14 and tried again and got the below compile errors.

As an aside, when I was running the unpatched openSM on a
single-processor system I occasionally got it to start up, but it would
hang and the port state would never change. The same sort of behavior
as in the "OpenSM crash with today's trunk" thread.

-Jay Higley

  CC [M]  drivers/infiniband/core/addr.o
  CC [M]  net/sched/em_text.o
  CC [M]  net/sctp/outqueue.o
  CC [M]  net/sunrpc/xprt.o
drivers/infiniband/core/addr.c:330: warning: initialization from incompatible pointer type
  CC [M]  net/sctp/ulpqueue.o
  CC [M]  drivers/infiniband/core/at.o
drivers/infiniband/core/at.c:1547: warning: initialization from incompatible pointer type
  CC [M]  drivers/infiniband/core/cm.o
  CC [M]  net/sctp/command.o
drivers/infiniband/core/cm.c: In function `cm_alloc_msg':
drivers/infiniband/core/cm.c:179: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function)
drivers/infiniband/core/cm.c:179: error: (Each undeclared identifier is reported only once
drivers/infiniband/core/cm.c:179: error: for each function it appears in.)
drivers/infiniband/core/cm.c:180: error: too few arguments to function `ib_creat e_send_mad' drivers/infiniband/core/cm.c:187: error: structure has no member named `ah' drivers/infiniband/core/cm.c:188: error: structure has no member named `retries' drivers/infiniband/core/cm.c: In function `cm_alloc_response_msg': drivers/infiniband/core/cm.c:209: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function) drivers/infiniband/core/cm.c:210: error: too few arguments to function `ib_creat e_send_mad' drivers/infiniband/core/cm.c:215: error: structure has no member named `ah' drivers/infiniband/core/cm.c: In function `cm_free_msg': drivers/infiniband/core/cm.c:222: error: structure has no member named `ah' drivers/infiniband/core/cm.c: In function `cm_insert_listen': drivers/infiniband/core/cm.c:371: error: structure has no member named `device' drivers/infiniband/core/cm.c:371: error: structure has no member named `device' drivers/infiniband/core/cm.c:374: error: structure has no member named `device' drivers/infiniband/core/cm.c:374: error: structure has no member named `device' drivers/infiniband/core/cm.c:376: error: structure has no member named `device' drivers/infiniband/core/cm.c:376: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `cm_find_listen': drivers/infiniband/core/cm.c:398: error: structure has no member named `device' drivers/infiniband/core/cm.c:401: error: structure has no member named `device' drivers/infiniband/core/cm.c:403: error: structure has no member named `device' drivers/infiniband/core/cm.c: At top level: drivers/infiniband/core/cm.c:543: error: conflicting types for 'ib_create_cm_id' include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was h ere drivers/infiniband/core/cm.c:543: error: conflicting types for 'ib_create_cm_id' include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was h ere drivers/infiniband/core/cm.c: In function `ib_create_cm_id': 
drivers/infiniband/core/cm.c:553: error: structure has no member named `device'
drivers/infiniband/core/cm.c: In function `ib_destroy_cm_id':
drivers/infiniband/core/cm.c:681: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast
drivers/infiniband/core/cm.c:692: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast
drivers/infiniband/core/cm.c:709: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast
drivers/infiniband/core/cm.c: In function `ib_send_cm_req':
drivers/infiniband/core/cm.c:935: error: structure has no member named `timeout_ms'
drivers/infiniband/core/cm.c:944: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type
drivers/infiniband/core/cm.c:944: error: too few arguments to function `ib_post_send_mad'
drivers/infiniband/core/cm.c: In function `cm_issue_rej':
drivers/infiniband/core/cm.c:989: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type
drivers/infiniband/core/cm.c:989: error: too few arguments to function `ib_post_send_mad'
drivers/infiniband/core/cm.c: In function `cm_dup_req_handler':
drivers/infiniband/core/cm.c:1197: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type
drivers/infiniband/core/cm.c:1197: error: too few arguments to function `ib_post_send_mad'
drivers/infiniband/core/cm.c: In function `cm_match_req':
drivers/infiniband/core/cm.c:1237: error: structure has no member named `device'
drivers/infiniband/core/cm.c: In function `ib_send_cm_rep':
drivers/inf
Re: [openib-general] OpenSM crash with today's trunk
Hal Rosenstock wrote:
> Or perhaps something crashed and didn't clean up properly. Does this
> occur immediately after a boot?

After a fresh reboot of the machines on the switch, I get the log at http://www.cs.rutgers.edu/~bohra/osm-v2.log

The opensm process does not crash but hangs. The state of the port never changes. Now there is an oops in dmesg:

Oct 28 13:52:13 hora-3 OpenSM[5168]: OpenSM Rev:openib-1.1.0
Oct 28 13:52:14 hora-3 kernel: Unable to handle kernel paging request at virtual address 0910
Oct 28 13:52:14 hora-3 kernel: printing eip:
Oct 28 13:52:14 hora-3 kernel: f883f12d
Oct 28 13:52:14 hora-3 kernel: *pde =
Oct 28 13:52:14 hora-3 kernel: Oops: [#1]
Oct 28 13:52:14 hora-3 kernel: SMP
Oct 28 13:52:14 hora-3 kernel: Modules linked in: ib_uverbs ib_umad ipv6 i2c_dev i2c_core sunrpc dm_mod video button battery ac uhci_hcd hw_random ib_mthca ib_mad ib_core e1000 floppy
Oct 28 13:52:14 hora-3 kernel: CPU:1
Oct 28 13:52:14 hora-3 kernel: EIP:0060:[] Not tainted VLI
Oct 28 13:52:14 hora-3 kernel: EFLAGS: 00010286 (2.6.13bohra)
Oct 28 13:52:14 hora-3 kernel: EIP is at ib_post_send_mad+0x1c/0x1b1 [ib_mad]
Oct 28 13:52:14 hora-3 kernel: eax: 0900 ebx: c1a7d900 ecx: c1a7d918 edx:
Oct 28 13:52:14 hora-3 kernel: esi: c1a7d918 edi: f6571f68 ebp: f6571efc esp: f6571ed8
Oct 28 13:52:14 hora-3 kernel: ds: 007b es: 007b ss: 0068
Oct 28 13:52:14 hora-3 kernel: Process opensm (pid: 5224, threadinfo=f657 task=f7dfb020)
Oct 28 13:52:14 hora-3 kernel: Stack: f883ef5a c1a7d800 080bd018 f6571efc f6a42900 a0f684f6
Oct 28 13:52:14 hora-3 kernel: f6571f68 f6571f74 f88f1728 0018 00e8 00d0 f6a42948
Oct 28 13:52:14 hora-3 kernel: f68bda24 0009 a0f684f6 0009 c1a7d918 0100
Oct 28 13:52:14 hora-3 kernel: Call Trace:
Oct 28 13:52:14 hora-3 kernel: [] show_stack+0x7c/0x92
Oct 28 13:52:14 hora-3 kernel: [] show_registers+0x152/0x1ca
Oct 28 13:52:14 hora-3 kernel: [] die+0xf4/0x16f
Oct 28 13:52:14 hora-3 kernel: [] do_page_fault+0x463/0x649
Oct 28 13:52:14 hora-3 kernel: [] error_code+0x4f/0x54
Oct 28 13:52:14 hora-3 kernel: [] ib_umad_write+0x2d0/0x30e [ib_umad]
Oct 28 13:52:14 hora-3 kernel: [] vfs_write+0x155/0x15a
Oct 28 13:52:14 hora-3 kernel: [] sys_write+0x3d/0x64
Oct 28 13:52:14 hora-3 kernel: [] sysenter_past_esp+0x54/0x75
Oct 28 13:52:14 hora-3 kernel: Code: e8 d8 63 af c7 89 d8 83 c4 0c 5b 5e 5f 5d c3 55 89 e5 57 56 89 c6 53 83 ec 18 85 f6 89 55 f0 0f 84 ff 00 00 00 8b 46 08 8d 5e e8 <8b> 50 10 8b 7b 14 85 d2 0f 84 7c 01 00 00 8b 4e 18 85 c9 74 0b

Thanks
Aniruddha
Re: [openib-general] OpenSM crash with today's trunk
Hal Rosenstock wrote:
> Or perhaps something crashed and didn't clean up properly. Does this
> occur immediately after a boot?

This is after a clean reboot. There are two systems on the switch and this is the only active one. I will reboot both and check again.

Thanks
Aniruddha
[openib-general] RE: [PATCH] add node_guid to struct ib_device
> Thanks, I applied the following version (doesn't add a private kzalloc()
> now that 2.6.14 is out and doesn't rename cap_mask_mutex).

Thanks. I forgot to include the changes to sysfs.c in my previous patch. Not sure if we want to wait on this until the other drivers have been updated. We'll probably want to remove node_guid from the device attributes as well.

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>

Index: sysfs.c
===
--- sysfs.c	(revision 3892)
+++ sysfs.c	(working copy)
@@ -622,21 +622,15 @@
 static ssize_t show_node_guid(struct class_device *cdev, char *buf)
 {
 	struct ib_device *dev = container_of(cdev, struct ib_device, class_dev);
-	struct ib_device_attr attr;
-	ssize_t ret;
 
 	if (!ibdev_is_alive(dev))
 		return -ENODEV;
 
-	ret = ib_query_device(dev, &attr);
-	if (ret)
-		return ret;
-
 	return sprintf(buf, "%04x:%04x:%04x:%04x\n",
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[0]),
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[1]),
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[2]),
-		       be16_to_cpu(((__be16 *) &attr.node_guid)[3]));
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[0]),
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[1]),
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[2]),
+		       be16_to_cpu(((__be16 *) &dev->node_guid)[3]));
 }
 
 static CLASS_DEVICE_ATTR(node_type, S_IRUGO, show_node_type, NULL);
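The new show_node_guid() reads dev->node_guid, which is kept in network (big-endian) byte order, and prints it as four 16-bit groups. The user-space sketch below mimics that formatting on a GUID held as eight raw bytes. The function name format_node_guid and the sample GUID are hypothetical; assembling each group from two bytes, most significant first, stands in for the kernel's be16_to_cpu() on a __be16 pointer, and the trailing newline that sysfs prints is omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of show_node_guid(): format a GUID
 * stored in network (big-endian) byte order as "xxxx:xxxx:xxxx:xxxx".
 * Building each group high byte first gives the same value that
 * be16_to_cpu() returns, independent of host endianness.
 * Returns the snprintf() result (19 on success with a large enough buffer).
 */
static int format_node_guid(const uint8_t guid_be[8], char *out, size_t len)
{
	unsigned int group[4];
	int i;

	for (i = 0; i < 4; i++)
		group[i] = ((unsigned int) guid_be[2 * i] << 8) | guid_be[2 * i + 1];

	return snprintf(out, len, "%04x:%04x:%04x:%04x",
			group[0], group[1], group[2], group[3]);
}
```

Because the groups are assembled from bytes rather than by casting the buffer to a 64-bit integer, the output does not depend on whether the host is big- or little-endian.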
Re: [openib-general] IB traffic generators
On Fri, Oct 28, 2005 at 08:40:09AM -0400, Suresh Shelvapille wrote:
> can you please point me to some traffic generators out there.

Hypothetically, one could use IPoIB and the pktgen driver to generate UDP-like traffic. Someone more experienced than I could rewrite the pktgen driver to use the OpenIB verbs API to produce "raw" IB traffic. An ib_pktgen would be a cool ULP to have for testing.

grant
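As a rough illustration of the IPoIB route, the sketch below is a plain user-space UDP sender. It is not pktgen, which runs in the kernel and is driven through /proc/net/pktgen; it simply pushes datagrams through the normal socket stack, so pointing it at the IP address of an IPoIB interface would carry the traffic over IB. The function name send_udp_burst, its parameters, and the 1472-byte buffer limit are hypothetical choices for this sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical minimal UDP traffic generator: send `count` datagrams of
 * `payload_len` bytes to dst_ip:dst_port. Each datagram starts with its
 * sequence number so a receiver can spot loss or reordering.
 * Returns the number of datagrams accepted by sendto(), or -1 on a setup
 * error (bad address, oversized payload, socket failure).
 */
static int send_udp_burst(const char *dst_ip, unsigned short dst_port,
			  int count, size_t payload_len)
{
	struct sockaddr_in dst;
	char payload[1472];	/* arbitrary upper bound for this sketch */
	int fd, i, sent = 0;

	if (payload_len > sizeof payload)
		return -1;

	memset(&dst, 0, sizeof dst);
	dst.sin_family = AF_INET;
	dst.sin_port = htons(dst_port);
	if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	for (i = 0; i < count; i++) {
		memset(payload, 0, payload_len);
		if (payload_len >= sizeof i)
			memcpy(payload, &i, sizeof i);	/* sequence number */
		if (sendto(fd, payload, payload_len, 0,
			   (struct sockaddr *) &dst, sizeof dst) == (ssize_t) payload_len)
			sent++;
	}

	close(fd);
	return sent;
}
```

Unlike pktgen, every packet here crosses the system-call boundary, so the achievable rate is far lower; it only shows the shape of "UDP-like traffic over IPoIB", not raw IB traffic.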
RE: [openib-general] OpenSM crash with today's trunk
Or perhaps something crashed and didn't clean up properly. Does this occur immediately after a boot?

From: [EMAIL PROTECTED] on behalf of Sean Hefty
Sent: Fri 10/28/2005 12:01 PM
To: Aniruddha Bohra
Cc: openib-general@openib.org
Subject: Re: [openib-general] OpenSM crash with today's trunk

Aniruddha Bohra wrote:
>> Oh well, I guess this is a different bug. Is there an oops or
>> anything in your kernel log, or is this just a userspace crash?
>>
> This is what I see:
> Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
>
> Is this useful?

Is there any chance opensm is already running on the system? It sounds like something has already registered to receive the same MADs that opensm wants to receive.

- Sean
Re: [openib-general] OpenSM crash with today's trunk
Aniruddha Bohra wrote:
>> Oh well, I guess this is a different bug. Is there an oops or
>> anything in your kernel log, or is this just a userspace crash?
>>
> This is what I see:
> Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
>
> Is this useful?

Is there any chance opensm is already running on the system? It sounds like something has already registered to receive the same MADs that opensm wants to receive.

- Sean
RE: [openib-general] OpenSM crash with today's trunk
This means you have another SM or application already registered to handle Subnet Management packets, so OpenSM fails to start (it cannot register as the handler for such requests). The crash is a bug that should be solved.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -Original Message-
> From: Aniruddha Bohra [mailto:[EMAIL PROTECTED]]
> Sent: Friday, October 28, 2005 5:28 PM
> To: Roland Dreier
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] OpenSM crash with today's trunk
>
> Roland Dreier wrote:
>
> > Aniruddha> I tried with r3888 and r3891 with the same result.
> >
> > Oh well, I guess this is a different bug. Is there an oops or
> > anything in your kernel log, or is this just a userspace crash?
>
> This is what I see:
> Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
> Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
> Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM
>
> Is this useful?
>
> Aniruddha
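Sean's and Eitan's explanation is that a given management method can have only one registered owner, so when a second SM (or a stale registration left by something that crashed without cleaning up, as Hal suggests) already holds it, OpenSM's registration is refused and the kernel logs "Method 1 already in use" (method 1 is Get, IB_MGMT_METHOD_GET). The sketch below models that rule with a hypothetical in-memory table keyed by port, management class, class version and method. It is not the ib_mad implementation; the names mad_register and mad_unregister and the return codes are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical model of MAD method ownership; not the ib_mad code. */
struct mad_reg {
	int     agent_id;
	uint8_t port, mgmt_class, class_version, method;
};

struct mad_table {
	struct mad_reg regs[32];
	int n;
};

/*
 * Register agent_id for the given methods. Returns 0 on success, -1 if any
 * method is already owned for this port/class/version (the "already in use"
 * case), -2 if the table is full. All-or-nothing: on failure nothing changes.
 */
static int mad_register(struct mad_table *t, int agent_id, uint8_t port,
			uint8_t mgmt_class, uint8_t class_version,
			const uint8_t *methods, int nmethods)
{
	int i, j;

	for (j = 0; j < nmethods; j++)
		for (i = 0; i < t->n; i++)
			if (t->regs[i].port == port &&
			    t->regs[i].mgmt_class == mgmt_class &&
			    t->regs[i].class_version == class_version &&
			    t->regs[i].method == methods[j])
				return -1;

	if (t->n + nmethods > (int) (sizeof t->regs / sizeof t->regs[0]))
		return -2;

	for (j = 0; j < nmethods; j++) {
		struct mad_reg *r = &t->regs[t->n++];

		r->agent_id = agent_id;
		r->port = port;
		r->mgmt_class = mgmt_class;
		r->class_version = class_version;
		r->method = methods[j];
	}
	return 0;
}

/* Drop every registration held by agent_id (what a clean exit would do). */
static void mad_unregister(struct mad_table *t, int agent_id)
{
	int i, k = 0;

	for (i = 0; i < t->n; i++)
		if (t->regs[i].agent_id != agent_id)
			t->regs[k++] = t->regs[i];
	t->n = k;
}
```

In this model, a second agent asking for Get on the same port and class fails until the first owner calls mad_unregister(), which is why a leftover SM process, or one that died without releasing its agent, would keep OpenSM from starting.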
Re: [openib-general] OpenSM crash with today's trunk
Roland Dreier wrote:
> Aniruddha> I tried with r3888 and r3891 with the same result.
>
> Oh well, I guess this is a different bug. Is there an oops or
> anything in your kernel log, or is this just a userspace crash?

This is what I see:
Oct 27 22:03:34 hora-3 OpenSM[7995]: OpenSM Rev:openib-1.1.0
Oct 27 22:03:34 hora-3 kernel: ib_mad: Method 1 already in use
Oct 27 22:03:34 hora-3 OpenSM[7995]: Exiting SM

Is this useful?

Aniruddha
[openib-general] cq callback question
This may seem like a dumb question, but can a kernel ULP assume that after returning from ib_destroy_qp(), there will be no more callbacks for that QP on the associated CQ event handler?
[openib-general] Re: [PATCH] add node_guid to struct ib_device
Thanks, I applied the following version (doesn't add a private kzalloc() now that 2.6.14 is out and doesn't rename cap_mask_mutex).

By the way, the ipath and ehca drivers will need something similar.

 - R.

--- include/rdma/ib_verbs.h	(revision 3861)
+++ include/rdma/ib_verbs.h	(working copy)
@@ -951,6 +951,7 @@
 	u64 uverbs_cmd_mask;
 	int uverbs_abi_ver;
 
+	__be64 node_guid;
 	u8 node_type;
 	u8 phys_port_cnt;
 };
--- hw/mthca/mthca_provider.c	(revision 3830)
+++ hw/mthca/mthca_provider.c	(working copy)
@@ -45,6 +45,14 @@
 #include "mthca_user.h"
 #include "mthca_memfree.h"
 
+static void init_query_mad(struct ib_smp *mad)
+{
+	mad->base_version  = 1;
+	mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
+	mad->class_version = 1;
+	mad->method        = IB_MGMT_METHOD_GET;
+}
+
 static int mthca_query_device(struct ib_device *ibdev,
 			      struct ib_device_attr *props)
 {
@@ -55,7 +63,7 @@
 
 	u8 status;
 
-	in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
@@ -64,12 +72,8 @@
 
 	props->fw_ver = mdev->fw_ver;
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version  = 1;
-	in_mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version = 1;
-	in_mad->method        = IB_MGMT_METHOD_GET;
-	in_mad->attr_id       = IB_SMP_ATTR_NODE_INFO;
+	init_query_mad(in_mad);
+	in_mad->attr_id = IB_SMP_ATTR_NODE_INFO;
 
 	err = mthca_MAD_IFC(mdev, 1, 1,
 			    1, NULL, NULL, in_mad, out_mad,
@@ -127,20 +131,16 @@
 	int err = -ENOMEM;
 	u8 status;
 
-	in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
 
 	memset(props, 0, sizeof *props);
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version  = 1;
-	in_mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version = 1;
-	in_mad->method        = IB_MGMT_METHOD_GET;
-	in_mad->attr_id       = IB_SMP_ATTR_PORT_INFO;
-	in_mad->attr_mod      = cpu_to_be32(port);
+	init_query_mad(in_mad);
+	in_mad->attr_id  = IB_SMP_ATTR_PORT_INFO;
+	in_mad->attr_mod = cpu_to_be32(port);
 
 	err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1,
 			    port, NULL, NULL, in_mad, out_mad,
@@ -219,18 +219,14 @@
 	int err = -ENOMEM;
 	u8 status;
 
-	in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version  = 1;
-	in_mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version = 1;
-	in_mad->method        = IB_MGMT_METHOD_GET;
-	in_mad->attr_id       = IB_SMP_ATTR_PKEY_TABLE;
-	in_mad->attr_mod      = cpu_to_be32(index / 32);
+	init_query_mad(in_mad);
+	in_mad->attr_id  = IB_SMP_ATTR_PKEY_TABLE;
+	in_mad->attr_mod = cpu_to_be32(index / 32);
 
 	err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1,
 			    port, NULL, NULL, in_mad, out_mad,
@@ -258,18 +254,14 @@
 	int err = -ENOMEM;
 	u8 status;
 
-	in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL);
+	in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL);
 	out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
 	if (!in_mad || !out_mad)
 		goto out;
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version  = 1;
-	in_mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version = 1;
-	in_mad->method        = IB_MGMT_METHOD_GET;
-	in_mad->attr_id       = IB_SMP_ATTR_PORT_INFO;
-	in_mad->attr_mod      = cpu_to_be32(port);
+	init_query_mad(in_mad);
+	in_mad->attr_id  = IB_SMP_ATTR_PORT_INFO;
+	in_mad->attr_mod = cpu_to_be32(port);
 
 	err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1,
 			    port, NULL, NULL, in_mad, out_mad,
@@ -283,13 +275,9 @@
 
 	memcpy(gid->raw, out_mad->data + 8, 8);
 
-	memset(in_mad, 0, sizeof *in_mad);
-	in_mad->base_version  = 1;
-	in_mad->mgmt_class    = IB_MGMT_CLASS_SUBN_LID_ROUTED;
-	in_mad->class_version = 1;
-	in_mad->method
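The mthca change replaces kmalloc() plus memset() with kzalloc() and moves the four header fields shared by every Get query into init_query_mad(), so each caller sets only attr_id and attr_mod. The user-space sketch below mirrors that pattern with calloc() standing in for kzalloc(). struct smp_hdr is a hypothetical, simplified stand-in for struct ib_smp (not its real layout), byte-order conversion such as cpu_to_be32() is left out, and build_pkey_query() is an illustrative helper. 0x0016 is the P_KeyTable attribute ID from the InfiniBand specification; each P_KeyTable block holds 32 entries, which is why the patch uses index / 32 as the attribute modifier.

```c
#include <stdint.h>
#include <stdlib.h>

#define MGMT_CLASS_SUBN_LID_ROUTED 0x01	/* LID-routed SMP class */
#define MGMT_METHOD_GET            0x01
#define SMP_ATTR_PKEY_TABLE        0x0016

/* Hypothetical, simplified SMP header: NOT the real struct ib_smp layout;
 * fields are kept in host byte order for clarity. */
struct smp_hdr {
	uint8_t  base_version;
	uint8_t  mgmt_class;
	uint8_t  class_version;
	uint8_t  method;
	uint16_t attr_id;
	uint32_t attr_mod;
};

/* Same role as init_query_mad() in the patch: set the fields common to
 * every Get query; everything else is already zero from calloc(). */
static void init_query_mad(struct smp_hdr *mad)
{
	mad->base_version  = 1;
	mad->mgmt_class    = MGMT_CLASS_SUBN_LID_ROUTED;
	mad->class_version = 1;
	mad->method        = MGMT_METHOD_GET;
}

/* Build a query for the P_KeyTable block containing P_Key `index`.
 * Returns NULL on allocation failure; the caller frees the result. */
static struct smp_hdr *build_pkey_query(unsigned int index)
{
	struct smp_hdr *mad = calloc(1, sizeof *mad);	/* kzalloc() analogue */

	if (!mad)
		return NULL;
	init_query_mad(mad);
	mad->attr_id  = SMP_ATTR_PKEY_TABLE;
	mad->attr_mod = index / 32;	/* 32 P_Keys per table block */
	return mad;
}
```

Zeroing at allocation time removes the chance of a caller forgetting the memset(), and keeping the shared header setup in one helper means a later change to, say, the class version only has to be made once.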
Re: [openib-general] OpenSM crash with today's trunk
Aniruddha> I tried with r3888 and r3891 with the same result.

Oh well, I guess this is a different bug. Is there an oops or anything in your kernel log, or is this just a userspace crash? If it's just opensm crashing then I'm not much use in debugging.

 - R.
Re: [openib-general] Boot over IB - support in Bproc status?
Hb Chen <[EMAIL PROTECTED]> writes:

> Hi,
> Can anyone point out the current status of Boot over IB - support in Bproc?

We have it working here :) kexec appears to work fine with the OpenIB stack. The raw packet interfaces are a little more difficult to use in the kernel because of the long MAC address, but there are no real problems.

> Also what is the other solution about "mass boot over IB" now? (openSM,
> SRP...)

Eric
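The "long MAC address" is IPoIB's link-layer address, which RFC 4391 defines as 20 octets: one reserved/flags octet, a 24-bit queue pair number (QPN), and the port's 128-bit GID, versus the 6-octet Ethernet address that raw-packet code usually assumes. The sketch below splits such an address into its parts; the struct ipoib_hwaddr type and the ipoib_parse_hwaddr() function are hypothetical names, not kernel APIs.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN   6	/* Ethernet MAC address length */
#define IPOIB_ADDR_LEN 20	/* IPoIB link-layer address length (RFC 4391) */

/* Hypothetical decoded form of an IPoIB hardware address. */
struct ipoib_hwaddr {
	uint8_t  flags;		/* first octet: reserved / flag bits */
	uint32_t qpn;		/* 24-bit queue pair number */
	uint8_t  gid[16];	/* 128-bit port GID, network byte order */
};

static void ipoib_parse_hwaddr(const uint8_t raw[IPOIB_ADDR_LEN],
			       struct ipoib_hwaddr *out)
{
	out->flags = raw[0];
	/* The QPN is carried big-endian in octets 1..3. */
	out->qpn = ((uint32_t) raw[1] << 16) | ((uint32_t) raw[2] << 8) | raw[3];
	memcpy(out->gid, raw + 4, sizeof out->gid);
}
```

Any in-kernel boot or raw-packet path that sizes buffers or compares addresses with a fixed 6-byte length would need to handle the extra 14 bytes, which is one concrete reason such interfaces are awkward over IPoIB.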
RE: [openib-general] IB traffic generators
Hi Suri,

The only traffic generator I am aware of is from Agilent (E2950 series), but they discontinued their IB support a while ago. I'm not sure if it is still available from them.

-- Hal

From: [EMAIL PROTECTED] on behalf of Suresh Shelvapille
Sent: Fri 10/28/2005 8:40 AM
To: openib-general@openib.org
Subject: [openib-general] IB traffic generators

Folks:

can you please point me to some traffic generators out there.

Thanks,
Suri
[openib-general] IB traffic generators
Folks:

can you please point me to some traffic generators out there.

Thanks,
Suri
Re: [openib-general] OpenSM crash with today's trunk
Roland Dreier wrote:
> I believe that this is in r3889.
>
> - R.

I tried with r3888 and r3891 with the same result.

Aniruddha
RE: [openib-general] SRQ limit reached async event.
Title: RE: [openib-general] SRQ limit reached async event. Which HCA are you using? Till lately SRQ limit event was supported only for mem-free HCAs. Now it is supported for full-mem too but you need a special FW for this (4.7.400 release will be next week) In gen2 it is implemented already for both types of HCAs and if you have the correct FW it will work. in VAPI (gen1) you need an update of VAPI for this since we blocked it for full-mem cards. Tziporet -Original Message- From: Galen M. Shipman [mailto:[EMAIL PROTECTED]] Sent: Friday, October 28, 2005 12:34 AM To: openib-general@openib.org Subject: [openib-general] SRQ limit reached async event. Hello, Does anyone now if openib supports the SRQ limit asynchronous event? I am working with mellanox verbs right now and it doesn't seem to support this. I say this because I have to set the srq_limit attribute via VAPI_modify_srq in order to get the event, unfortunately when I call VAPI_modify_srq I get: error in VAPI_modify_srq: Not implemented Any insight is appreciated. Thanks, Galen ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general