Re: [PATCH] librdmacm: Do not modify qp_init_attr in rdma_get_request
Hefty, Sean sean.he...@... writes:

> I added a while(1) loop to rdma_server to allow clients to connect repeatedly, and this worked for me. Jonathan, can you see if this works for your testing as well? If so, I'll commit.

Yesterday I tried setting attr->send/recv_cq = NULL in rdma_get_request(), which fixes the bug in a somewhat ugly manner. Passing a copy of the attributes is a much tidier solution, and your patch works for me.

Many Thanks,
Jonathan.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
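For readers following along, the "pass a copy" idea can be sketched in plain C. This is only an illustration under stated assumptions: `demo_qp_attr`, `demo_create_qp()` and `demo_get_request()` are hypothetical stand-ins, not the librdmacm types or functions, and the sketch assumes the bug was that a helper filled in fields of the caller's attribute structure, so the next connection saw stale values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a QP attribute structure (not the librdmacm type). */
struct demo_qp_attr {
	void *send_cq;
	void *recv_cq;
	int   max_send_wr;
};

static int demo_cq_storage; /* dummy object the helper points the CQ fields at */

/* Stand-in for a library helper that fills in fields the caller left unset. */
static void demo_create_qp(struct demo_qp_attr *attr)
{
	assert(attr != NULL);
	if (!attr->send_cq)
		attr->send_cq = &demo_cq_storage;
	if (!attr->recv_cq)
		attr->recv_cq = &demo_cq_storage;
}

/*
 * Work on a private copy so the caller's template is left untouched and
 * can be reused unchanged for the next connection request.
 */
static void demo_get_request(const struct demo_qp_attr *template_attr,
			     struct demo_qp_attr *used)
{
	struct demo_qp_attr copy = *template_attr;

	demo_create_qp(&copy);
	*used = copy;
}
```

With this shape, a server loop can call `demo_get_request()` repeatedly with the same template and never needs to reset the CQ fields by hand.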
RE: ib mad definitions
Works for me.

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Hefty, Sean
Sent: Monday, October 18, 2010 6:25 PM
To: linux-rdma@vger.kernel.org; Sasha Khapyorsky
Subject: ib mad definitions

This has probably been discussed before, but is there a strong reason why ib_types.h can't be moved from opensm/include/iba to libibumad/include/infiniband?

This appears to be the only place where IB MAD definitions are available for user space applications, and having them available at the libibumad level makes sense to me. (I'm trying to port madeye to user space as a diag, and want all IB MAD definitions.)

- Sean
Re: [PATCH 0/2] svcrdma: NFSRDMA Server fixes for 2.6.37
On Tue, Oct 12, 2010 at 03:33:46PM -0500, Tom Tucker wrote:
> Hi Bruce,
>
> These fixes are ready for 2.6.37. They fix two bugs in the server-side NFSRDMA transport.

Both applied and pushed out, thanks.

--b.

> Thanks,
> Tom
>
> ---
>
> Tom Tucker (2):
>       svcrdma: Cleanup DMA unmapping in error paths.
>       svcrdma: Change DMA mapping logic to avoid the page_address kernel API
>
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   19 ---
>  net/sunrpc/xprtrdma/svc_rdma_sendto.c    |   82 ++
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   41 +++
>  3 files changed, 92 insertions(+), 50 deletions(-)
>
> --
> Signed-off-by: Tom Tucker t...@ogc.us
Re: ib mad definitions
On Mon, Oct 18, 2010 at 6:24 PM, Hefty, Sean sean.he...@intel.com wrote:
> This has probably been discussed before,

Yes, several times AFAIR.

> but is there a strong reason why ib_types.h can't be moved from opensm/include/iba to libibumad/include/infiniband?

Why does this need to be moved?

> This appears to be the only place where IB MAD definitions are available for user space applications, and having them available at the libibumad level makes sense to me. (I'm trying to port madeye to user space as a diag, and want all IB MAD definitions.)

There already are diags including ib_types.h (saquery for one).

-- Hal

> - Sean
[PATCH v2.6.36-rc7] infiniband: update workqueue usage
* ib_wq is added, which is used as the common workqueue for infiniband instead of the system workqueue. All system workqueue usages including flush_scheduled_work() callers are converted to use and flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo t...@kernel.org
---
Hello,

I think this patch is safe but don't have any experience with or access to infiniband stuff and it's only compile tested. Also, while looking through the code, I got curious about several things.

* Can any of the works in infiniband be used during memory reclaim?

* qib_cq_wq is a separate singlethread workqueue. Does the queue require strict single thread execution ordering? IOW, does each work have to be executed in the exact queued order and no two works should execute in parallel? Or was the singlethreadedness chosen just to reduce the number of workers?

* The same question for ipoib_workqueue.

Thank you.

 drivers/infiniband/core/cache.c                |    4 +--
 drivers/infiniband/core/device.c               |   11 --
 drivers/infiniband/core/sa_query.c             |    4 +--
 drivers/infiniband/core/umem.c                 |    2 -
 drivers/infiniband/hw/ipath/ipath_driver.c     |    2 -
 drivers/infiniband/hw/ipath/ipath_user_pages.c |    2 -
 drivers/infiniband/hw/qib/qib_iba7220.c        |    7 ++
 drivers/infiniband/hw/qib/qib_iba7322.c        |   14 ++---
 drivers/infiniband/hw/qib/qib_init.c           |   26 +++--
 drivers/infiniband/hw/qib/qib_qsfp.c           |    9 +++-
 drivers/infiniband/hw/qib/qib_verbs.h          |    5 +---
 drivers/infiniband/ulp/srp/ib_srp.c            |    4 +--
 include/rdma/ib_verbs.h                        |    3 ++
 13 files changed, 41 insertions(+), 52 deletions(-)

Index: work/drivers/infiniband/core/cache.c
===
--- work.orig/drivers/infiniband/core/cache.c
+++ work/drivers/infiniband/core/cache.c
@@ -308,7 +308,7 @@ static void ib_cache_event(struct ib_eve
 			INIT_WORK(&work->work, ib_cache_task);
 			work->device = event->device;
 			work->port_num = event->element.port_num;
-			schedule_work(&work->work);
+			queue_work(ib_wq, &work->work);
 		}
 	}
 }
@@ -368,7 +368,7 @@ static void ib_cache_cleanup_one(struct
 	int p;
 
 	ib_unregister_event_handler(&device->cache.event_handler);
-	flush_scheduled_work();
+	flush_workqueue(ib_wq);
 
 	for (p = 0; p <= end_port(device) - start_port(device); ++p) {
 		kfree(device->cache.pkey_cache[p]);
Index: work/drivers/infiniband/core/device.c
===
--- work.orig/drivers/infiniband/core/device.c
+++ work/drivers/infiniband/core/device.c
@@ -38,7 +38,6 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
-#include <linux/workqueue.h>
 
 #include "core_priv.h"
 
@@ -52,6 +51,9 @@ struct ib_client_data {
 	void *data;
 };
 
+struct workqueue_struct *ib_wq;
+EXPORT_SYMBOL_GPL(ib_wq);
+
 static LIST_HEAD(device_list);
 static LIST_HEAD(client_list);
 
@@ -718,6 +720,10 @@ static int __init ib_core_init(void)
 {
 	int ret;
 
+	ib_wq = alloc_workqueue("infiniband", 0, 0);
+	if (!ib_wq)
+		return -ENOMEM;
+
 	ret = ib_sysfs_setup();
 	if (ret)
 		printk(KERN_WARNING "Couldn't create InfiniBand device class\n");
@@ -726,6 +732,7 @@ static int __init ib_core_init(void)
 	if (ret) {
 		printk(KERN_WARNING "Couldn't set up InfiniBand P_Key/GID cache\n");
 		ib_sysfs_cleanup();
+		destroy_workqueue(ib_wq);
 	}
 
 	return ret;
@@ -736,7 +743,7 @@ static void __exit ib_core_cleanup(void)
 	ib_cache_cleanup();
 	ib_sysfs_cleanup();
 	/* Make sure that any pending umem accounting work is done. */
-	flush_scheduled_work();
+	destroy_workqueue(ib_wq);
 }
 
 module_init(ib_core_init);
Index: work/drivers/infiniband/core/sa_query.c
===
--- work.orig/drivers/infiniband/core/sa_query.c
+++ work/drivers/infiniband/core/sa_query.c
@@ -422,7 +422,7 @@ static void ib_sa_event(struct ib_event_
 			port->sm_ah = NULL;
 		spin_unlock_irqrestore(&port->ah_lock, flags);
 
-		schedule_work(&sa_dev->port[event->element.port_num -
+		queue_work(ib_wq, &sa_dev->port[event->element.port_num -
 					    sa_dev->start_port].update_task);
 	}
 }
@@ -1068,7 +1068,7 @@ static void ib_sa_remove_one(struct ib_d
RE: ib mad definitions
>> but is there a strong reason why ib_types.h can't be moved from opensm/include/iba to libibumad/include/infiniband?
>
> Why does this need to be moved ?

The dependency should be on libibumad, not opensm. libibumad is pretty much useless without these definitions. Why wouldn't you move them?

> There already are diags including ib_types.h (saquery for one).

Yes, but we're either stuck with everything that needs ib_types.h to be part of the management.git tree, or the app needs to depend on opensm. Currently, ibacm duplicates definitions because they aren't available anywhere else.
Re: [PATCH] SIW: Documentation (initial)
Randy,

...back from vacation. Many thanks! I'll take it all over.

Bernard.

Randy Dunlap randy.dun...@oracle.com wrote on 10/15/2010 12:57:03 AM:

snip

> > +
> > +User Interface
> > +--
> > +All fast path operations such as posting of work requests and
> > +reaping of work completions currently involve a system call into
> > +the siw module. Kernel/user-mapped send and receive as well as
>
> I didn't find the system call(s). Are they new syscalls or just (socket) reads/writes? (I was probably looking for new syscalls.)

I will have to clarify. Currently all operations are using the infiniband/core infrastructure (e.g. via uverbs write file operation). There is no private interface between libsiw and siw kernel module in place.

snip
Re: ib mad definitions
On Tue, Oct 19, 2010 at 11:28 AM, Hefty, Sean sean.he...@intel.com wrote:
>>> but is there a strong reason why ib_types.h can't be moved from opensm/include/iba to libibumad/include/infiniband?
>>
>> Why does this need to be moved ?
>
> The dependency should be on libibumad, not opensm. libibumad is pretty much useless without these definitions. Why wouldn't you move them?

Off the top of my head, OpenSM is layered on top of libibumad but doesn't need/use libibmad. I think that was the main reason although that could be changed if ib_types.h were to be moved. I'm not sure what other reasons came up in the previous discussions.

>> There already are diags including ib_types.h (saquery for one).
>
> Yes, but we're either stuck with everything that needs ib_types.h to be part of the management.git tree, or the app needs to depend on opensm. Currently, ibacm duplicates definitions because they aren't available anywhere else.

I agree ib_types.h is more generic than opensm. Moving to libibmad and making opensm depend on this is probably better than all the duplication. There have been viewpoints that libibumad and libibmad shouldn't be separate (as they are small) but they were never combined into a single library.

-- Hal
RE: ib mad definitions
> I agree ib_types.h is more generic than opensm. Moving to libibmad and making opensm depend on this is probably better than all the duplication. There have been viewpoints that libibumad and libibmad shouldn't be separate (as they are small) but they were never combined into a single library.

My motivation with these changes is for ibacm to receive and use notification of CM timeouts to update its path record cache. ibacm already defines the basic mad structure, multicast record, and path record. It would also need the CM mad format. I'd happily remove these definitions if they were already available. Porting madeye to user space is a side benefit to the proposed kernel changes.

ibacm only depends on libibumad. The madeye port also only depends on libibumad. Honestly, I find the libibmad APIs confusing. I'd much rather libibumad provide mad definitions.

Sasha/Ira, do either of you have opinions on this?
Re: ib mad definitions
On Tue, Oct 19, 2010 at 12:48 PM, Hefty, Sean sean.he...@intel.com wrote:
>> I agree ib_types.h is more generic than opensm. Moving to libibmad and making opensm depend on this is probably better than all the duplication. There have been viewpoints that libibumad and libibmad shouldn't be separate (as they are small) but they were never combined into a single library.

The other thing I just recalled was the OpenSM portability issue. ib_types.h is needed here and libibmad/libibumad is not in all those environments. As you're all too well aware, this was even the case in Windows until very recently. There may still be others we care about where moving ib_types.h might be problematic.

-- Hal

> My motivation with these changes is for ibacm to receive and use notification of CM timeouts to update its path record cache. ibacm already defines the basic mad structure, multicast record, and path record. It would also need the CM mad format. I'd happily remove these definitions if they were already available. Porting madeye to user space is a side benefit to the proposed kernel changes.
>
> ibacm only depends on libibumad. The madeye port also only depends on libibumad. Honestly, I find the libibmad APIs confusing. I'd much rather libibumad provide mad definitions.
>
> Sasha/Ira, do either of you have opinions on this?
Re: [PATCH v2.6.36-rc7] infiniband: update workqueue usage
On Tue, 2010-10-19 at 08:24 -0700, Tejun Heo wrote:
> * qib_cq_wq is a separate singlethread workqueue. Does the queue require strict single thread execution ordering? IOW, does each work have to be executed in the exact queued order and no two works should execute in parallel? Or was the singlethreadedness chosen just to reduce the number of workers?

The work functions need to be called in-order and single threaded, or memory will be freed multiple times and other bad things will happen.
Re: ib mad definitions
On Tue, 19 Oct 2010 08:43:22 -0700
Hal Rosenstock hal.rosenst...@gmail.com wrote:

> On Tue, Oct 19, 2010 at 11:28 AM, Hefty, Sean sean.he...@intel.com wrote:
>>>> but is there a strong reason why ib_types.h can't be moved from opensm/include/iba to libibumad/include/infiniband?
>>>
>>> Why does this need to be moved ?
>>
>> The dependency should be on libibumad, not opensm. libibumad is pretty much useless without these definitions. Why wouldn't you move them?
>
> Off the top of my head, OpenSM is layered on top of libibumad but doesn't need/use libibmad. I think that was the main reason although that could be changed if ib_types.h were to be moved. I'm not sure what other reasons came up in the previous discussions.

I think ib_types.h should be part of ibumad. Everything depends on libibumad at some point.[*] Therefore common mad definitions should be in ib_types.h and packaged with libibumad.

[*] ok OpenSM does not strictly, see below.

>>> There already are diags including ib_types.h (saquery for one).
>>
>> Yes, but we're either stuck with everything that needs ib_types.h to be part of the management.git tree, or the app needs to depend on opensm. Currently, ibacm duplicates definitions because they aren't available anywhere else.
>
> I agree ib_types.h is more generic than opensm. Moving to libibmad and making opensm depend on this is probably better than all the duplication. There have been viewpoints that libibumad and libibmad shouldn't be separate (as they are small) but they were never combined into a single library.

The opposing view is that libibumad is only an interface to the kernel umad module, where libibmad is more abstract.

As far as moving ib_types, I suggested this a while back.

http://www.mail-archive.com/gene...@lists.openfabrics.org/msg27439.html

Let's see if I can summarize the thread.

- Sean was working on libibacm and redefined ib_types.h definitions.
- I suggested moving ib_types.h to umad so he would not have a dependency on OpenSM.
- Sean brought up that ib_types.h is large and probably should be split
- I agreed, and asked Sasha if such a patch would be acceptable, or create a new library to deal with the inline functions in ib_types.h
- Hal said that ibutils requires ib_types.h but does not want a dependency on libibumad...
- I suggested a separate library to solve this problem.
- Hal corrected himself saying that ibutils requires osm_vendor_ibumad. However, OpenSM does not always use libibumad (depending on the underlying stack) so it would need to get ib_types somewhere else. Hal was also concerned about a library with little more than a header file in it.
- Jason chimed in with "Please no more libraries... :-)" (and digressed with Sean into PR queries, MPI, and other useful, but unrelated, stuff)
- Sean says libibumad is pretty useless without some network structure definitions.
- I state that it looks like the ibutils dependency is on the static functions in ib_types.h only.
- Hal says yes ibutils depends on OpenSM for the vendor layer and that Mellanox is better able to answer questions regarding ibutils support.
- Hal says he thinks ib_types is more akin to what is in libibmad rather than libibumad
- Sean finds that ib_types.h includes complib headers.
- I submit a rough hack to remove complib headers.
- Jason, Sean, and myself discuss ugly byteswapping functions.
- Sasha agrees that he is not sure that umad is the right place for ib_types
- Sean says we should split the file up and at least some of the definitions should be in umad...

We all get busy...

I think we need to move ib_types (mad definitions) to umad. Basic MAD definitions should be provided at the lowest possible level so all software can use them.

The issues (solutions) are:

- ib_types depends on complib at the moment (fixable)
- ibutils depends on OpenSM (it will anyway -- non-issue)
- some things in ib_types are ugly, byteswapping (non-issue; deal with it later)
- OpenSM may _not_ include umad and therefore miss defines. (fixable?)

As for this last item, would it be a big deal to require umad for the header only? Does umad not compile somewhere that other vendor layers are used? I think it is much better for OpenSM to require umad than for other MAD processing software to require OpenSM. Also, would splitting ib_types help this at all?

Ira

> -- Hal

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
Re: [PATCH v2.6.36-rc7] infiniband: update workqueue usage
On Tue, Oct 19, 2010 at 5:24 PM, Tejun Heo t...@kernel.org wrote:
> [ ... ]
> This is to prepare for deprecation of flush_scheduled_work().
> [ ... ]
> Index: work/include/rdma/ib_verbs.h
> [ ... ]
> +extern struct workqueue_struct *ib_wq;
> [ ... ]

This patch adds a declaration of a global variable to a public header file. That might be unavoidable, but it doesn't make me happy.

Bart.
RE: ib mad definitions
> - ib_types depends on complib at the moment (fixable)
> - ibutils depends on OpenSM (it will anyway -- non-issue)
> - some things in ib_types are ugly, byteswapping (non-issue; deal with it later)
> - OpenSM may _not_ include umad and therefore miss defines. (fixable?)
>
> As for this last item, would it be a big deal to require umad for the header only? Does umad not compile somewhere that other vendor layers are used? I think it is much better for OpenSM to require umad than for other MAD processing software to require OpenSM. Also, would splitting ib_types help this at all?

I'll propose the following:

1. Add to libibumad/include/infiniband:
	umad_types.h - basic mad, rmpp headers
	umad_sa.h    - SA attributes
	umad_cm.h    - CM messages
2. Include umad_types.h and umad_sa.h from ib_types.h
3. Include umad_cm.h from ib_cm_types.h

We start with a minimal set of definitions to umad and add/move other definitions later as needed, creating new header files where appropriate (umad_smi.h, umad_pm.h, etc.)

If we can get some basic agreement on this, I'll start on the patches immediately. In an ideal world, the new header files would work on any platform.

- Sean
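To make the "basic mad" part of the proposal concrete, here is a rough sketch of the kind of definition a header such as umad_types.h might carry. It is not taken from any posted patch: the names `demo_mad_hdr` and `demo_mad_hdr_init()` are hypothetical, the multi-byte fields are assumed to be stored in network byte order, and `<arpa/inet.h>` is a POSIX assumption that would need a different include on other platforms. The layout follows the 24-byte common MAD header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* htons(); POSIX-only assumption */

/* Hypothetical sketch of a common MAD header; field names are illustrative. */
struct demo_mad_hdr {
	uint8_t  base_version;
	uint8_t  mgmt_class;
	uint8_t  class_version;
	uint8_t  method;
	uint16_t status;         /* network byte order */
	uint16_t class_specific; /* network byte order */
	uint64_t tid;            /* transaction ID */
	uint16_t attr_id;        /* network byte order */
	uint16_t resv;
	uint32_t attr_mod;       /* network byte order */
};

/* The fields are naturally aligned, so no padding is expected. */
_Static_assert(sizeof(struct demo_mad_hdr) == 24,
	       "common MAD header must be 24 bytes");

/* Zero the header and fill in the fields a sender always sets. */
static void demo_mad_hdr_init(struct demo_mad_hdr *hdr, uint8_t mgmt_class,
			      uint8_t class_version, uint8_t method,
			      uint16_t attr_id)
{
	assert(hdr != NULL);
	memset(hdr, 0, sizeof(*hdr));
	hdr->base_version  = 1;
	hdr->mgmt_class    = mgmt_class;
	hdr->class_version = class_version;
	hdr->method        = method;
	hdr->attr_id       = htons(attr_id);
}
```

A structure like this can be overlaid directly on a receive buffer, which is the "casting structure" style; whether that or a pack/unpack style is preferable is exactly what the follow-up messages debate.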
Re: ib mad definitions
On Tue, 19 Oct 2010 11:50:46 -0700
Hefty, Sean sean.he...@intel.com wrote:

>> - ib_types depends on complib at the moment (fixable)
>> - ibutils depends on OpenSM (it will anyway -- non-issue)
>> - some things in ib_types are ugly, byteswapping (non-issue; deal with it later)
>> - OpenSM may _not_ include umad and therefore miss defines. (fixable?)
>>
>> As for this last item, would it be a big deal to require umad for the header only? Does umad not compile somewhere that other vendor layers are used? I think it is much better for OpenSM to require umad than for other MAD processing software to require OpenSM. Also, would splitting ib_types help this at all?
>
> I'll propose the following:
>
> 1. Add to libibumad/include/infiniband:
>	umad_types.h - basic mad, rmpp headers
>	umad_sa.h    - SA attributes
>	umad_cm.h    - CM messages
> 2. Include umad_types.h and umad_sa.h from ib_types.h
> 3. Include umad_cm.h from ib_cm_types.h
>
> We start with a minimal set of definitions to umad and add/move other definitions later as needed, creating new header files where appropriate (umad_smi.h, umad_pm.h, etc.)
>
> If we can get some basic agreement on this, I'll start on the patches immediately. In an ideal world, the new header files would work on any platform.

I agree,
Ira

> - Sean

--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
RE: [PATCH] Hang in dat_ia_open()
Thanks! Applied

-----Original Message-----
From: Pradeep Satyanarayana [mailto:prade...@linux.vnet.ibm.com]
Sent: Monday, October 18, 2010 1:23 PM
To: Davis, Arlin R
Cc: linux-rdma
Subject: [PATCH] Hang in dat_ia_open()

Hi Arlin,

During some error case testing we discovered a hang in dat_ia_open(). A colleague wrote a test program that duplicates the issue. Here is the trace of the hang:

# ./testUdaplDyn
coralxib40:6122: open_hca: rdma_bind ERR Cannot assign requested address. Is ib1 configured?

Executable hangs here:

Stack:
(gdb) where
#0 0x2b5906a8 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1 0x2b58e3ba in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2 0x2b7bd82d in rdma_destroy_id () from /usr/lib64/librdmacm.so.1
#3 0x2b6b0144 in ?? () from /usr/lib64/libdaplofa.so.2
#4 0x2b6a7a03 in ?? () from /usr/lib64/libdaplofa.so.2
#5 0x2b3703fb in dat_ia_openv () from /usr/lib64/libdat2.so
#6 0x004009c6 in isDatDeviceValidDyn(char*) ()
#7 0x00400b87 in main ()
(gdb)

I checked (the code in) several versions of dapl-2.0 and this problem exists in all of them including dapl-2.0.30. In this case I happened to use dapl-2.0.27. The hang is caused by the erroneous invocation of rdma_destroy_id() twice in a row.

---
Signed-off-by: Pradeep Satyanarayana prade...@linux.vnet.ibm.com

$ diff -Nup dapl-2.0.27/dapl/openib_cma/device.c.orig dapl-2.0.27/dapl/openib_cma/device.c
--- dapl-2.0.27/dapl/openib_cma/device.c.orig	2010-10-15 17:19:06.572503024 -0400
+++ dapl-2.0.27/dapl/openib_cma/device.c	2010-10-15 17:19:16.013082441 -0400
@@ -358,7 +358,6 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_N
 	}
 	ret = rdma_bind_addr(cm_id, (struct sockaddr *)&hca_ptr->hca_address);
 	if ((ret) || (cm_id->verbs == NULL)) {
-		rdma_destroy_id(cm_id);
 		dapl_log(DAPL_DBG_TYPE_ERR,
 			 "open_hca: rdma_bind ERR %s. Is %s configured?\n",
 			 strerror(errno), hca_name);
$
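The general pattern behind this fix, releasing a resource in exactly one place on the error path, can be sketched as follows. This is a standalone illustration, not dapl or librdmacm code: `demo_id`, `demo_bind()`, `demo_destroy_id()` and `demo_open()` are hypothetical names, and the destroy call only counts invocations so that a double destroy would be visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource; 'destroyed' counts how often it was torn down. */
struct demo_id {
	int destroyed;
};

static void demo_destroy_id(struct demo_id *id)
{
	assert(id != NULL);
	id->destroyed++;
}

/* Simulated bind step: returns 0 on success, -1 when asked to fail. */
static int demo_bind(struct demo_id *id, int should_fail)
{
	(void)id;
	return should_fail ? -1 : 0;
}

/*
 * Open path with a single cleanup label. The failing branch does not
 * destroy the id itself; it only jumps to the shared error exit, so the
 * id is destroyed exactly once no matter which step failed.
 */
static int demo_open(struct demo_id *id, int bind_fails)
{
	if (demo_bind(id, bind_fails))
		goto err;

	return 0;

err:
	demo_destroy_id(id);
	return -1;
}
```

In the patch above the equivalent change is simply dropping the early rdma_destroy_id() call so that only the later cleanup releases the id.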
Re: ib mad definitions
On Tue, Oct 19, 2010 at 11:50:46AM -0700, Hefty, Sean wrote:

> We start with a minimal set of definitions to umad and add/move other definitions later as needed, creating new header files where appropriate (umad_smi.h, umad_pm.h, etc.)
>
> If we can get some basic agreement on this, I'll start on the patches immediately. In an ideal world, the new header files would work on any platform.

Can we at least agree on the usage of these structures first?

Are the constants going to be in host or network byte order?

Are you going to make something like the kernel where there is a native structure and pack/unpack function set?
Something macro-based like foo = GET_MEMBER(*pr,preference)?
Network byte order casting structures?
Host byte order casting structures? (my favorite)
bitfields?

For years now I've had a set of data files that describe all the IB structures' bitfield layouts. I think I can contribute the data files but not the generator script.

Since they all have various merits, maybe the smartest thing is to just codegen all of the above permutations from a single data source? ie

  // network endian bitfield casting structure
  struct MADHeader_NE x = {};
  x.status = htons(1);

  // host endian bitfield casting structure
  struct MADHeader_HE x = {};
  x.status = 1;
  to_network(x,sizeof(x)); // x[i] = htonl(x[i]) for i in len/4

  /* Non-bitfield macro access structure (using the 1 byte = 1 bit
     helper structure technique) */
  struct MADHeader_M x = {};
  SET_MEMBER(x,status,1);

  // Pack/unpack function structure
  struct MADHeader_UP x = {};
  x.status = htons(1);
  pack_MADHeader(x,mad_buf,sizeof(mad_buf));

I'd like to think we don't need the last one, but people seem to like that scheme ..

I also like to codegen structure printing functions, that is surprisingly useful - and implements a good chunk of madeye.

What do you think?

I've also very recently been thinking that I'd like python bindings for MADs for some projects. I was planning on building it out with the code gen scheme.

Ira, I think the cleanest answer is that OSM keeps its type file, and umad gets a new one that is cleaner, more capable and probably incompatible. I'd hate to see us stick to the OSM scheme for umad just for code compatibility.

Jason
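As one possible reading of the pack/unpack option in the list above, the sketch below keeps a host-byte-order structure and converts to and from the wire buffer explicitly. Everything here is an assumption made for illustration: the `demo_` names are hypothetical, only two fields are handled, and the byte offsets (status at 4, attribute ID at 16, 24-byte header) are those of the common MAD header. It avoids htons() so it does not depend on any platform header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none come from libibumad or OpenSM. */
enum {
	DEMO_MAD_HDR_LEN     = 24, /* size of the common MAD header */
	DEMO_MAD_STATUS_OFF  = 4,  /* byte offset of the 16-bit status */
	DEMO_MAD_ATTR_ID_OFF = 16  /* byte offset of the 16-bit attribute ID */
};

/* Host-byte-order view of two header fields. */
struct demo_mad_fields {
	uint16_t status;
	uint16_t attr_id;
};

/* Store a 16-bit value big-endian (network order) at p. */
static void demo_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Load a big-endian 16-bit value from p. */
static uint16_t demo_get_be16(const uint8_t *p)
{
	return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Write host-order fields into a wire buffer; -1 if the buffer is too short. */
static int demo_pack(const struct demo_mad_fields *f, uint8_t *buf, size_t len)
{
	assert(f != NULL && buf != NULL);
	if (len < DEMO_MAD_HDR_LEN)
		return -1;
	demo_put_be16(buf + DEMO_MAD_STATUS_OFF, f->status);
	demo_put_be16(buf + DEMO_MAD_ATTR_ID_OFF, f->attr_id);
	return 0;
}

/* Read host-order fields back out of a wire buffer; -1 if it is too short. */
static int demo_unpack(struct demo_mad_fields *f, const uint8_t *buf, size_t len)
{
	assert(f != NULL && buf != NULL);
	if (len < DEMO_MAD_HDR_LEN)
		return -1;
	f->status  = demo_get_be16(buf + DEMO_MAD_STATUS_OFF);
	f->attr_id = demo_get_be16(buf + DEMO_MAD_ATTR_ID_OFF);
	return 0;
}
```

The trade-off is an extra copy per MAD in exchange for application code that never touches network-order values, which is part of what the thread is weighing against casting structures.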
RE: Opensm crash with OFED 1.5
Just want to let you all know that OpenSM seems to work fine with Centos5.5 on the same HW.

Thanks,
Suri

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Suresh Shelvapille
Sent: Wednesday, October 13, 2010 3:07 PM
To: 'Linux RDMA list'; 'Tziporet Koren'
Subject: RE: Opensm crash with OFED 1.5

I tried 1.5.2 and that did not help, same kernel oops.

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Suresh Shelvapille
Sent: Tuesday, October 12, 2010 7:22 PM
To: 'Linux RDMA list'
Subject: Opensm crash with OFED 1.5

Folks:

I have a multi-processor machine, running FedoraCore 12. I have installed OFED 1.5. Everything seems to come up ok, I can look at the ibstat and it shows that the Mellanox card stats etc... As soon as I start opensm, I get the following kernel oops and the machine locks up. Any ideas?

Thanks,
Suri

--
Oct 12 17:19:38 localhost OpenSM[2617]: OpenSM 3.3.5#012
Oct 12 17:19:38 localhost OpenSM[2617]: Entering DISCOVERING state#012
Oct 12 17:20:20 localhost kernel: ib0: ib_query_gid() failed
Oct 12 17:20:30 localhost kernel: ib0: ib_query_port failed
Oct 12 17:20:52 localhost kernel: BUG: soft lockup - CPU#15 stuck for 61s! [opensm:2637]
Oct 12 17:20:52 localhost kernel: Modules linked in: fuse sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables cpufreq_ondemand acpi_cpufreq freq_table rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mlx4_en mlx4_ib ib_mthca ib_mad ib_core dm_multipath uinput mlx4_core igb i2c_i801 joydev dca i2c_core iTCO_wdt iTCO_vendor_support mpt2sas scsi_transport_sas [last unloaded: microcode]
Oct 12 17:20:52 localhost kernel: CPU 15:
Oct 12 17:20:52 localhost kernel: Modules linked in: fuse sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables cpufreq_ondemand acpi_cpufreq freq_table rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mlx4_en mlx4_ib ib_mthca ib_mad ib_core dm_multipath uinput mlx4_core igb i2c_i801 joydev dca i2c_core iTCO_wdt iTCO_vendor_support mpt2sas scsi_transport_sas [last unloaded: microcode]
Oct 12 17:20:52 localhost kernel: Pid: 2637, comm: opensm Not tainted 2.6.31.5-127.fc12.x86_64 #1 X8DTH-i/6/iF/6F
Oct 12 17:20:52 localhost kernel: RIP: 0010:[81203558] [81203558] __bitmap_empty+0x0/0x64
Oct 12 17:20:52 localhost kernel: RSP: 0018:880c174bbd90 EFLAGS: 0246
Oct 12 17:20:52 localhost kernel: RAX: RBX: 880c174bbdd8 RCX: 0001
Oct 12 17:20:52 localhost kernel: RDX: 818ba920 RSI: 0100 RDI: 818ba918
Oct 12 17:20:52 localhost kernel: RBP: 8101286e R08: R09: 0004
Oct 12 17:20:52 localhost kernel: R10: 0004 R11: 0206 R12: 880c174bbdd8
Oct 12 17:20:52 localhost kernel: R13: 8101286e R14: 810dc920 R15: 880c174bbcf8
Oct 12 17:20:52 localhost kernel: FS: 7ff2d02e7710() GS:c90001e0() knlGS:
Oct 12 17:20:52 localhost kernel: CS: 0010 DS: ES: CR0: 80050033
Oct 12 17:20:52 localhost kernel: CR2: 0041f0c0 CR3: 000c19074000 CR4: 06e0
Oct 12 17:20:52 localhost kernel: DR0: DR1: DR2:
Oct 12 17:20:52 localhost kernel: DR3: DR6: 0ff0 DR7: 0400
Oct 12 17:20:52 localhost kernel: Call Trace:
Oct 12 17:20:52 localhost kernel: [810383f2] ? native_flush_tlb_others+0xc3/0xf2
Oct 12 17:20:52 localhost kernel: [8103859d] ? flush_tlb_mm+0x6f/0x76
Oct 12 17:20:52 localhost kernel: [810debbc] ? mprotect_fixup+0x480/0x611
Oct 12 17:20:52 localhost kernel: [810da81d] ? free_pgtables+0xa9/0xcc
Oct 12 17:20:52 localhost kernel: [810f185d] ? virt_to_head_page+0xe/0x2f
Oct 12 17:20:52 localhost kernel: [810deee9] ? sys_mprotect+0x19c/0x227
Oct 12 17:20:52 localhost kernel: [81011cf2] ? system_call_fastpath+0x16/0x1b
RE: ib mad definitions
Ira Weiny wrote:
> On Tue, 19 Oct 2010 11:50:46 -0700
> Hefty, Sean sean.he...@intel.com wrote:
>
>>> - ib_types depends on complib at the moment (fixable)
>>> - ibutils depends on OpenSM (it will anyway -- non-issue)
>>> - some things in ib_types are ugly, byteswapping (non-issue; deal with it later)
>>> - OpenSM may _not_ include umad and therefore miss defines. (fixable?)
>>>
>>> As for this last item, would it be a big deal to require umad for the header only? Does umad not compile somewhere that other vendor layers are used? I think it is much better for OpenSM to require umad than for other MAD processing software to require OpenSM. Also, would splitting ib_types help this at all?
>>
>> I'll propose the following:
>>
>> 1. Add to libibumad/include/infiniband:
>>	umad_types.h - basic mad, rmpp headers
>>	umad_sa.h    - SA attributes
>>	umad_cm.h    - CM messages
>> 2. Include umad_types.h and umad_sa.h from ib_types.h
>> 3. Include umad_cm.h from ib_cm_types.h
>>
>> We start with a minimal set of definitions to umad and add/move other definitions later as needed, creating new header files where appropriate (umad_smi.h, umad_pm.h, etc.)
>>
>> If we can get some basic agreement on this, I'll start on the patches immediately. In an ideal world, the new header files would work on any platform.
>
> I agree,
> Ira

Just to be painfully clear ... A user-mode application would then only need to include ib_types.h + CM flavor of choice .h files?

>> - Sean
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
RE: ib mad definitions
Just to be painfully clear ... A user-mode application would then only need to include ib_types.h + CM flavor of choice .h files ? For compatibility, ib_types.h would include whatever files any definitions were moved to. An application that includes ib_types.h today wouldn't need additional includes.
RE: ib mad definitions
Can we at least agree on the usage of these structures first? Are the constants going to be in host or network byte order?

I was simply suggesting to 'move' some of the existing structures and defines.

Are you going to make something like the kernel where there is a native structure and pack/unpack function set? This would not be my preference.
Something macro-based like foo = GET_MEMBER(*pr,preference)
Network byte order casting structures?
Host byte order casting structures? (my favorite)
bitfields? again - not my preference

Ira, I think the cleanest answer is that OSM keeps its type file, and umad gets a new one that is cleaner, more capable and probably incompatible. I'd hate to see us stick to the OSM scheme for umad just for code compatibility.

Whatever is done must fit within the Windows development framework that we use.
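[Editorial illustration, not part of the original message.] Two of the options listed above, a network byte order casting structure and a macro-based accessor in the style of GET_MEMBER(*pr,preference), can be sketched together as follows. This is only a rough sketch under stated assumptions: struct demo_rec_wire, its fields and the DEMO_GET* macros are hypothetical names invented for illustration and are not taken from ib_types.h or libibumad. Only ntohs/ntohl from POSIX arpa/inet.h are real calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs, ntohl (POSIX) */

/* Hypothetical record laid out exactly as on the wire, i.e. every
 * multi-byte field is kept in network (big-endian) byte order.
 * The field names are illustrative only. */
struct demo_rec_wire {
    uint16_t dlid;        /* network byte order */
    uint16_t slid;        /* network byte order */
    uint32_t flow_label;  /* network byte order */
    uint8_t  preference;
    uint8_t  reserved[3];
};

/* Macro-based accessors: the caller never sees the byte swap.
 * One macro per field width keeps the sketch plain C99. */
#define DEMO_GET8(rec, member)  ((uint8_t)(rec).member)
#define DEMO_GET16(rec, member) ((uint16_t)ntohs((rec).member))
#define DEMO_GET32(rec, member) ((uint32_t)ntohl((rec).member))

/* Overlay the wire structure on a received buffer. memcpy is used
 * instead of a pointer cast so that an unaligned buffer is safe. */
static struct demo_rec_wire demo_rec_overlay(const void *buf, size_t len)
{
    struct demo_rec_wire rec;

    assert(buf != NULL && len >= sizeof rec);
    memcpy(&rec, buf, sizeof rec);
    return rec;
}
```

With this shape a caller writes foo = DEMO_GET8(*pr, preference) and gets a host-order value, while the structure itself can still be copied straight to or from a MAD buffer. The cost is that reading a member directly, without the macro, silently yields a byte-swapped value on little-endian hosts.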
Re: ib mad definitions
On Tue, Oct 19, 2010 at 06:00:51PM -0700, Hefty, Sean wrote: Can we at least agree on the usage of these structures first? Are the constants going to be in host or network byte order? I was simply suggesting to 'move' some of the existing structures and defines. But they are horrible and little used outside opensm right now, you really want to commit to that forever? Jason
Re: ib mad definitions
On Tue, 19 Oct 2010 18:00:51 -0700 Hefty, Sean sean.he...@intel.com wrote:

Can we at least agree on the usage of these structures first? Are the constants going to be in host or network byte order?

I was simply suggesting to 'move' some of the existing structures and defines.

Are you going to make something like the kernel where there is a native structure and pack/unpack function set? This would not be my preference.
Something macro-based like foo = GET_MEMBER(*pr,preference)
Network byte order casting structures?
Host byte order casting structures? (my favorite)
bitfields? again - not my preference

Ira, I think the cleanest answer is that OSM keeps its type file, and umad gets a new one that is cleaner, more capable and probably incompatible. I'd hate to see us stick to the OSM scheme for umad just for code compatibility.

Whatever is done must fit within the Windows development framework that we use.

I am all for cleaner, more capable... but why incompatible? If we want to start fresh and then convert OpenSM later, fine. But _don't_ forget to go back and convert OpenSM, because if you leave ib_types.h out there someone is going to use it and we are back to where we started... :-( Same for ibmad, when these definitions become available in umad, mad can be simplified.

What I would like right now is to get the definitions in 1 place!

Right now there are 3 headers I find path record in.
libibverbs: sa.h
libibmad: mad.h
opensm: ib_types.h

Node type is defined in:
libibverbs: verbs.h
opensm: ib_types.h
libibmad: mad.h

I could go on.

What Sean is offering to do is move ib_types to umad. From there I can use those definitions in mad (thus removing them from mad and consolidating at least 2 of the 3 above). Perhaps use them in ibverbs as well? As a first step I think we should take Sean up on his offer to start cleaning things up. But we have to remove stuff as we go or we will just be defining yet another place to look for these.

After this we can look at making things cleaner (perhaps even combining mad and umad, and including some of the ideas you have above). As Sean said in another email, after this change, including ib_types.h will be the same for anyone using it. The exception is that we have simplified the code. I think this is a win-win with minimal work.

Ira

-- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov
Re: ib mad definitions
On Tue, 19 Oct 2010 18:09:58 -0700 Jason Gunthorpe jguntho...@obsidianresearch.com wrote:
On Tue, Oct 19, 2010 at 06:00:51PM -0700, Hefty, Sean wrote:

Can we at least agree on the usage of these structures first? Are the constants going to be in host or network byte order?

I was simply suggesting to 'move' some of the existing structures and defines.

But they are horrible and little used outside opensm right now, you really want to commit to that forever?

Not everything is horrible. And if it is we can fix it. But I think defining yet another header with the same functionality is worse. Like it or not ib_types is there. If you don't remove/fix it, someone will find it and use it. How does that make things cleaner just because there is something clean somewhere else? Someone will find ib_types and use it. I still feel this is the best first step at getting rid of ib_types.h (at least as it currently stands).

Ira

Jason

-- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov
Re: ib mad definitions
On Tue, Oct 19, 2010 at 06:32:57PM -0700, Ira Weiny wrote:
On Tue, 19 Oct 2010 18:09:58 -0700 Jason Gunthorpe jguntho...@obsidianresearch.com wrote:
On Tue, Oct 19, 2010 at 06:00:51PM -0700, Hefty, Sean wrote:

Can we at least agree on the usage of these structures first? Are the constants going to be in host or network byte order?

I was simply suggesting to 'move' some of the existing structures and defines.

But they are horrible and little used outside opensm right now, you really want to commit to that forever?

Not everything is horrible. And if it is we can fix it. But I think defining yet another header with the same functionality is worse. Like it or

libibumad is a system library. It needs to have a stable ABI, low churn and ideally be 'complete'. My database of IB structs has 117 structures, all with wacky alignment and all manner of strangeness. IMHO, it is infeasible to keep with the ad hoc approach in ib_types.h and generate a complete header set without a lot of churn. This is why it is horrible.

There are things worse than 'yet another' header - for instance a system library being churned again and again for cleanups. Figure out what you want, do it once, do it right, be done.

If we could all agree what these structs should look like I can provide my database and someone can write the codegen AND WE CAN BE DONE FOREVER. How is this not much better?? Don't treat the API of a system library as some casual thing. :(

Jason
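[Editorial illustration, not part of the original message.] The message above does not say what the struct database or the generated code would look like. One plausible shape, shown here purely as an assumption, is a generated table of field descriptors (bit offset and bit length) plus a single generic getter, which copes with fields that do not fall on byte boundaries. Every name below (demo_field, demo_fields, demo_get_field, the three example fields and their offsets) is hypothetical and does not describe any real IB structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per field, as a code generator might emit from a database.
 * Offsets count bits from the start of the buffer, most significant
 * bit first, matching network bit ordering. */
struct demo_field {
    const char *name;
    unsigned    bit_off;
    unsigned    bit_len;   /* 1..32 */
};

enum demo_field_id { DEMO_F_LID, DEMO_F_SL, DEMO_F_MTU, DEMO_F_COUNT };

/* Hypothetical layout: 16-bit lid, 4-bit sl, 6-bit mtu. */
static const struct demo_field demo_fields[DEMO_F_COUNT] = {
    [DEMO_F_LID] = { "lid",  0, 16 },
    [DEMO_F_SL]  = { "sl",  16,  4 },
    [DEMO_F_MTU] = { "mtu", 20,  6 },
};

/* Extract one field from a big-endian buffer and return it in host
 * order. Bit-by-bit extraction is slow but handles any alignment. */
static uint32_t demo_get_field(const uint8_t *buf, size_t len,
                               enum demo_field_id id)
{
    const struct demo_field *f = &demo_fields[id];
    uint32_t val = 0;
    unsigned i;

    assert(buf != NULL);
    assert(f->bit_len >= 1 && f->bit_len <= 32);
    assert((f->bit_off + f->bit_len + 7) / 8 <= len);

    for (i = 0; i < f->bit_len; i++) {
        unsigned bit = f->bit_off + i;
        unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;

        val = (val << 1) | b;
    }
    return val;
}
```

The point of the table form is that adding a structure means adding rows rather than hand-written accessors, so a complete header set could be regenerated without editing code by hand.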
Re: ib mad definitions
On Tue, Oct 19, 2010 at 06:12:56PM -0700, Ira Weiny wrote:

I am all for cleaner, more capable... but why incompatible? If we want to start fresh and then convert OpenSM later, fine. But _don't_ forget to go back and convert OpenSM, because if you leave ib_types.h out there someone is going to use it and we are back to where we started... :-( Same for ibmad, when these definitions become available in umad, mad can be simplified.

ib_types.h should not be installed in /usr/include, stop doing that and that risk goes away. ibmad can't really be changed, it is a system library with a defined API. Maybe ibmad.2 or something, I don't know. I tried to use some of the PR APIs in it, and I've found them not useful :( For instance we can't just abandon the mad_get_fields approach because we have real, usable field access in umad, it has to stay.

Right now there are 3 headers I find path record in. libibverbs: sa.h

This isn't a MAD path record, this is the kernel version, which is unpacked. What we really need is MAD-to-kernel and vice versa conversion in a library. I already have code that does this in several places :(

libibmad: mad.h

You mean mad_get_fields IB_SA_PR_DGID_F, etc? It doesn't even have all the fields :(

opensm: ib_types.h

Yep.

Node type is defined in: libibverbs: verbs.h opensm: ib_types.h libibmad: mad.h I could go on.

Keep in mind that for the most part libibmad is someone's attempt to make a set of accessors and structures for mads. It is incomplete. It is largely unusable. I certainly haven't been able to use its PR structure parsing functions for any real app. Was it just pulled out of opensm? I don't know, I'd just as soon see that part of it be discarded, and a complete set of structures added to umad. opensm has unique problems because they want to remain independent of the OFA stack, I don't think they have a choice but to duplicate.

Jason