Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Roland Dreier wrote: Or> Sure, thanks for the clarification. As for the CMA merge, you Or> prefer to be on the safe side and do it **before** and not Or> **with** iSER? Yes, that's what I'm planning on. Sure, better safe than sorry is a good habit! It's just this short two-week time frame for three serialized pushes (iSCSI && CMA -> iSER) that worries me a little; I guess there's nothing we can do about it. Or. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [resend][RFC][PATCH] adding call to madvise
On Thu, May 18, 2006 at 07:24:27AM +0300, Michael S. Tsirkin wrote: > Quoting r. Gleb Natapov <[EMAIL PROTECTED]>: > > @@ -187,8 +194,8 @@ int ibv_lock_range(void *base, size_t si > > > > > > if (node->refcnt++ == 0) { > > - ret = mlock((void *) node->start, > > - node->end - node->start + 1); > > + ret = madvise((void *) node->start, > > + node->end - node->start + 1, MADV_DONTFORK); > > if (ret) > > goto out; > > } > > Will this break libibverbs on older kernels that don't have madvise? > Maybe test MADV_DONTFORK during library startup and set a flag? > madvise is always there, but older kernels will return EINVAL, and we don't check the return value of ibv_lock_range() in ibv_reg_mr(), so no harm is done. It is possible to test for MADV_DONTFORK support during libibverbs init and disable all madvise paths if it is not available, but then we would have two different configurations to test with not much gain. -- Gleb.
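MST's suggestion (probe MADV_DONTFORK once at startup and set a flag) and Gleb's observation (older kernels just return EINVAL) can be sketched as below. This is a minimal illustration, not libibverbs code: `probe_madv_dontfork()`, `advise_dontfork()` and the `dontfork_supported` flag are hypothetical names. The probe uses a freshly mapped anonymous page because madvise() requires a page-aligned start address; the madvise(2) man page lists MADV_DONTFORK as available since Linux 2.6.16.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1          /* MAP_ANONYMOUS and MADV_DONTFORK */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical startup probe. Returns 1 if the kernel accepts
 * MADV_DONTFORK, 0 if it rejects the advice with EINVAL (older
 * kernels), and -1 if the probe itself could not run.
 */
static int probe_madv_dontfork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf;
    int ret, saved_errno;

    if (page <= 0)
        return -1;

    /* madvise() needs a page-aligned address, so probe a fresh page. */
    buf = mmap(NULL, (size_t) page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    ret = madvise(buf, (size_t) page, MADV_DONTFORK);
    saved_errno = errno;            /* munmap() may overwrite errno */
    munmap(buf, (size_t) page);

    if (ret == 0)
        return 1;
    return saved_errno == EINVAL ? 0 : -1;
}

/* Hypothetical library-wide flag: -1 means "not probed yet". */
static int dontfork_supported = -1;

/*
 * Apply MADV_DONTFORK only when the probe succeeded; on kernels without
 * support, skip the call and report success instead of EINVAL.
 */
static int advise_dontfork(void *start, size_t len)
{
    if (dontfork_supported < 0)
        dontfork_supported = probe_madv_dontfork();
    if (dontfork_supported != 1)
        return 0;
    return madvise(start, len, MADV_DONTFORK);
}
```

The lazy probe in advise_dontfork() is not thread-safe; a library would more likely set the flag once in its initialization routine. As Gleb notes, the trade-off is a second code path to test.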
Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Or> Sure, thanks for the clarification. As for the CMA merge, you Or> prefer to be on the safe side and do it **before** and not Or> **with** iSER? Yes, that's what I'm planning on. Or> Do you have any estimate when the 2.6.18 merge window opens? Right after 2.6.17 is released. - R.
Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Roland Dreier wrote: Or> Just to make sure... does "not ready to queue in for-2.6.18" Or> wrt iSER relates to the dependency on the 2.6.18 iSCSI updates Or> (as the CMA can [should ?] be pushed with iSER), or you see Or> any further issues which needs to be fixed before the code is Or> ready? The only issue is that iser can't be merged until both James Bottomley and I have merged other stuff upstream first. Sure, thanks for the clarification. As for the CMA merge, you prefer to be on the safe side and do it **before** and not **with** iSER? Do you have any estimate of when the 2.6.18 merge window opens? Or.
Re: [openib-general] Re: testing IB with unreleased kernels
Grant Grundler wrote: On Wed, May 17, 2006 at 07:40:07AM -0700, Roland Dreier wrote: Yes, I agree. That's why I think we should get rid of the "linux-kernel" part of the svn tree entirely. Because everyone who wants to test new code seems to run last stable kernel + svn drivers instead of the new development kernel. That's because openib guarantee SVN drivers will build with last stable kernel. Change that policy and document the steps that folks should follow. I'd be willing to occasionally try newer kernels if you think that's what we should be doing. Please note that neither of the approaches suggested above will force testing of the latest IB code with the under-development kernel... This is because most of the code (specifically the code already in-tree) has no backport to the latest stable kernel. E.g., the kernel portion of OFED, which is targeted for 2.6.16, is based on the for-2.6.17 branch of Roland's git tree (except for the components not there yet, which are checked out from SVN), but OFED is not tested with (does not support) 2.6.17-rcX. The same "trick" would also work with Grant's approach. So there's no replacement for testing done at least by the openib maintainers (and distros!!! when they start moving to IB...) for: +1 next-kernel-RC-versions downloaded from kernel.org (e.g. 2.6.17-RCX) +2 next-next-kernel-branches of infiniband.git (Roland's tree). Of course people are busy, and testing is driven by needs... for example, the iSER maintainers (...) are now testing with what's closest to 2.6.18, and I guess the ipath maintainers are testing with 2.6.17-rc4. But at some point in the cycle, it's a must that each maintainer test his/her code with next-kernel-RC-versions from kernel.org. Or.
Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
On Wed, 17 May 2006, Dave Olson wrote: | On Wed, 17 May 2006, Roland Dreier wrote: | | | Am I understanding correctly that you see a hang or watchdog timeout | | even with the mthca driver? | | Yes. That is, the symptoms are the same, although the cause | may be different. | | | Is there any possibility of posting the test case to reproduce this? | | It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed | to do messaging rate), running 8 copies per dual-core 4-socket opteron, | both on InfiniPath MPI, and MVAPICH (built for gen2). Here's the typical case where the watchdog fires (with InfiniPath MPI), on FC4 2.6.16 2108 (without kprobes; with kprobes things are slightly different, but not much. I'm running without them since we were often in the kprobes code from the exit code, but I think that's just a red herring). The sysrq p was some seconds prior to the watchdog. It's almost as though something is looping far too many times during the close cleanup. The other 7 exiting processes are typically in sys_exit_group -> do_exit -> __up_red --> __spin_lock_irqsave -> __up_read (or __down_read) (from what sysrq t prints). They are all runnable on the other 7 processors. The infinipath driver does mmap both memory and device pages for each of these processes. 
SysRq : Show Regs CPU 0: Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U) Pid: 23788, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1 RIP: 0010:[] {__do_softirq+81} RSP: 0018:8048d368 EFLAGS: 0206 RAX: 0022 RBX: 0022 RCX: 0080 RDX: RSI: 00c0 RDI: 81007f1fd0c0 RBP: 80528f80 R08: 0200 R09: 0002 R10: 804a6a38 R11: R12: 80577c80 R13: R14: 000a R15: 2aaabba6c000 FS: 2b32ffa0() GS:80511000() knlGS:f7fc86c0 CS: 0010 DS: ES: CR0: 8005003b CR2: 5565ebe8 CR3: 7ac6d000 CR4: 06e0 Call Trace: {call_softirq+30} {do_softirq+44} {apic_timer_interrupt+132} {_write_unlock_irq+14} {__set_page_dirty_nobuffers+183} {unmap_vmas+1042} {exit_mmap+124} {mmput+37} {do_exit+584} {__dequeue_signal+459} {sys_exit_group+0} {get_signal_to_deliver+1568} {do_signal+116} {__pollwait+0} {sys_select+934} {sysret_signal+28} {ptregscall_common+103} [ perhaps 20 or 30 seconds later, NMI fires; we had already been sort of stuck for 60 seconds or so when I did the sysrq p above ] NMI Watchdog detected LOCKUP on CPU 1 CPU 1 Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U) Pid: 23789, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1 RIP: 0010:[] {_raw_write_lock+161} RSP: 0018:81007c5b5c18 EFLAGS: 0086 RAX: 8f02e600 RBX: 
810037cec680 RCX: 002c2671 RDX: 00927190 RSI: 0001 RDI: 810037cec680 RBP: 810037cec668 R08: 810002d6b500 R09: fffa R10: 0003 R11: 80165922 R12: 810037cec680 R13: 2c20 R14: 810002d6b540 R15: 2aaabba6c000 FS: 2aae6080() GS:81011fc466c0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 0033f38bdaf0 CR3: 7c296000 CR4: 06e0 Process mpi_multibw (pid: 23789, threadinfo 81007c5b4000, task 8100030557a0) Stack: 810002d6b540 8016596b 75ad5067 2c1b4000 81007d451da0 8016cc80 81007c5b5d38 Call Trace: {__set_page_dirty_nobuffers+73} {unmap_vmas+1042} {exit_mmap+124} {mmput+37} {do_exit+584} {__dequeue_signal+459} {sys_exit_group+0} {get_signal_to_deliver+1568} {do_s
Re: [openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure
On Wednesday 17 May 2006 21:25, Sean Hefty wrote: > Dotan Barak wrote: > > Added checks to the return values of all of the functions that may fail > > (in order to add this test to the regression system). > > Thanks - applied with one minor change. > > > + int rc; > > Changed 'rc' to 'ret' to match the rest of the code. > > - Sean > Great, thanks (next time I will pay attention to this issue). Dotan
Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
Bryan> Wow. I have no idea where that extra "goto bail" came Bryan> from. It's not supposed to be there. Even without it you still leak the work structure, because there's no schedule_work(). Now that I look at it, in uverbs_mem.c, the mm will be leaked if the kmalloc fails... - R.
Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Or> Just to make sure... does "not ready to queue in for-2.6.18" Or> wrt iSER relates to the dependency on the 2.6.18 iSCSI updates Or> (as the CMA can [should ?] be pushed with iSER), or you see Or> any further issues which needs to be fixed before the code is Or> ready? The only issue is that iser can't be merged until both James Bottomley and I have merged other stuff upstream first. - R.
Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
On Wed, 2006-05-17 at 21:55 -0700, Roland Dreier wrote: > So with the "goto bail" you skip the code which does something with > the work you allocate, which means that you leak not only the work > structure but also the reference to the task's mm that you took. Wow. I have no idea where that extra "goto bail" came from. It's not supposed to be there.
Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Roland Dreier wrote: Or> Can you spare few words whats the difference between the Or> for-2.6.18 and for-mm branches of your git tree? for-mm is what Andrew pulls to get patches for -mm. It has things that I think should be seen in -mm, but which I am not ready to queue in for-2.6.18. You can use git show-branch or gitk to visualize exactly how the branches relate. Just to make sure... does "not ready to queue in for-2.6.18" wrt iSER relates to the dependency on the 2.6.18 iSCSI updates (as the CMA can [should ?] be pushed with iSER), or you see any further issues which needs to be fixed before the code is ready? I will try git show-branch, thanks. Or> When you say the code is pushed into master.kernel.org are you Or> referring to the mm tree of Andrew Morton? i don't see he has Or> one under kernel.org/git? No, I mean it's in my tree on master.kernel.org, rather than just sitting on my local hard disk. OK, thanks for the clarification. Or.
RE: [openib-general] OFED RC4 also can't support >2000 connections
After executing the command 'SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129', there are still some problems: (1) the server (running squid) sometimes hits a kernel panic; (2) the client never connects to the server successfully. When using ttcp.aio to test, the error on the client is: [EMAIL PROTECTED] ~]# ./ttcp.aio -t 193.12.10.14 ttcp-t: buflen = 8192 nbuf = 2048 align = 16384/0 port = 5001 193.12.10.14 ttcp-t: socket ttcp-t: connect: Cannot allocate memory errno=12 [EMAIL PROTECTED] ~]# How can I solve this problem? tks zhu --- Eitan Zahavi <[EMAIL PROTECTED]> wrote: > Hi Zhu, > > If you are using libsdp.conf to select which ports > should map to SDP and > which to TCP you might run out of resources for > tracking the opened > sockets. > > Try increasing the following constant in libsdp: > libsdp/src/port.c line 48: > #define MAPPED_SOCKET_MAX 1024 > to something like: > #define MAPPED_SOCKET_MAX 1 > > Or, if you can use SDP sockets only (your config > file is empty anyway): > SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so squid -d 10 -f > squid.conf > SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 > -n 2000 -X > 193.12.10.14:3129 > > Hope this fixes the issue you see > > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -Original Message- > > From: [EMAIL PROTECTED] > [mailto:openib-general- > > [EMAIL PROTECTED] On Behalf Of zhu shi song > > Sent: Wednesday, May 17, 2006 3:17 PM > > To: openib-general@openib.org > > Subject: [openib-general] OFED RC4 also can't > support >2000 > connections > > > > I have installed OFED RC4 on my RHEL 4.3(2.6.9-34 > > kernel). I use the same method I told in previous > > mail. When increasing concurrent sdp connection > to > > 2000. sdp refuse connection in server side. And > client > > can't connect to server through sdp connection > > forever. 
> > > > OS: RHEL 4.3 (2.6.9-34) > > IB: OFED RC4 > > Test Method: > > Server: LD_PRELOAD=libsdp.so squid -d 10 -f > > squid.conf( sdp listening on IB0: > 193.12.10.14:3129) > > Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 > -n > > 2000 -X 193.12.10.14:3129 > > http://www.google.com/index.html ( IB0: > 193.12.10.24) > > > > > > Who know what's wrong with sdp many concurrent > > connections? I have bought the cards for about 3 > > weeks, but I can't make them work correctly. > Urgent! > > > > tks > > zhu > > __ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > > protection around > > http://mail.yahoo.com
Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
Dave> We did discover one possible problem today, which is shared Dave> between our device code and the core openib code, and that's Dave> doing some memory freeing and accounting from a work thread Dave> (updating mm->locked_vm and cleaning up from earlier Dave> get_user_pages); the code in our driver was copied from the Dave> openib core code, it's not literally shared. Dave> I have a strong suspicion that at least sometimes, it's Dave> executing after the current->mm has gone away. I'm looking Dave> at that more right now. It doesn't seem likely to me. In uverbs_mem.c, ib_umem_release_on_close() does get_task_mm() and gives up if it can't take a reference to the task's mm. The mmput() doesn't happen until ib_umem_account() runs in the work thread. I do see obvious bugs in ipath_user_pages.c, though. In ipath_release_user_pages_on_close(), you have:

	mm = get_task_mm(current);
	if (!mm)
		goto bail;

	work = kmalloc(sizeof(*work), GFP_KERNEL);
	if (!work)
		goto bail_mm;

	goto bail;

	INIT_WORK(&work->work, user_pages_account, work);
	work->mm = mm;
	work->num_pages = num_pages;

bail_mm:
	mmput(mm);
bail:
	return;

So with the "goto bail" you skip the code which does something with the work you allocate, which means that you leak not only the work structure but also the reference to the task's mm that you took. Even without the "goto bail" the code still wouldn't actually schedule the work, so the work structure would be leaked, although you would do mmput(). I'm not sure what you were trying to do here. - R.
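Roland's point is an ownership rule: the mm reference taken by get_task_mm() must either be dropped on the error path, or handed together with the work item to the deferred accounting function, which then drops both. The model below illustrates that rule in userspace; it is not driver code. `fake_get_mm()`, `fake_mmput()`, `fake_schedule_work()` and `fake_flush_work()` are hypothetical stand-ins for get_task_mm(), mmput(), schedule_work() and the workqueue running the item, and `fail_alloc` simulates a kmalloc() failure.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct mm_struct. */
struct fake_mm {
    int users;          /* models mm->mm_users */
    long locked_vm;     /* models mm->locked_vm */
};

/* Models get_task_mm(): take a reference, or return NULL if there is no mm. */
static struct fake_mm *fake_get_mm(struct fake_mm *task_mm)
{
    if (task_mm)
        task_mm->users++;
    return task_mm;
}

/* Models mmput(): drop the reference. */
static void fake_mmput(struct fake_mm *mm)
{
    mm->users--;
}

struct account_work {
    struct fake_mm *mm;
    unsigned long num_pages;
};

/* Deferred accounting: the only place the work item and mm reference are released. */
static void account_worker(struct account_work *work)
{
    work->mm->locked_vm -= (long) work->num_pages;
    fake_mmput(work->mm);
    free(work);
}

/* Single-slot queue standing in for schedule_work() and the workqueue thread. */
static struct account_work *pending;

static void fake_schedule_work(struct account_work *work)
{
    pending = work;
}

static void fake_flush_work(void)
{
    if (pending) {
        account_worker(pending);
        pending = NULL;
    }
}

/*
 * Corrected control flow: no stray "goto bail" after the allocation, and
 * the work is actually queued. Returns 1 if work was queued, 0 otherwise.
 */
static int release_pages_on_close(struct fake_mm *task_mm,
                                  unsigned long num_pages, int fail_alloc)
{
    struct fake_mm *mm = fake_get_mm(task_mm);
    struct account_work *work;

    if (!mm)
        return 0;                       /* no mm: nothing to account */

    work = fail_alloc ? NULL : malloc(sizeof(*work));
    if (!work) {
        fake_mmput(mm);                 /* error path drops our reference */
        return 0;
    }

    work->mm = mm;                      /* reference now owned by the work */
    work->num_pages = num_pages;
    fake_schedule_work(work);
    return 1;
}
```

After the worker runs, the reference count is back where it started, and the allocation-failure path leaves it unchanged. The original code misses both properties.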
Re: [openib-general] OpenIB 1.0 RC + PathScale problem
On Thu, 2006-05-18 at 00:11 -0400, Tim Miller wrote: > But when I try to run my main application, I get the > following error: > > libibverbs: Warning: couldn't load driver > /usr/local/lib/infiniband/libipathverbs.so: > /usr/local/lib/infiniband/libipathverbs.so: undefined symbol: > ibv_cmd_poll_cq > > Does anyone know what might cause this error? No. We don't see this problem here. Can you provide some more information, please? Running ldd on /usr/local/lib/infiniband/libipathverbs.so would be a good place to start, so you can see exactly which libibverbs.so is being linked against. Also, if you could post the relevant nm output for both libraries, that would be good. Thanks,
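Besides ldd and nm, you can check whether a specific libibverbs build exports ibv_cmd_poll_cq by loading it with dlopen() and looking the symbol up with dlsym(). The sketch below assumes you pass the exact libibverbs path that ldd reports for the application. `lib_exports()` is a hypothetical helper. On glibc older than 2.34, link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Returns 1 if `lib` can be loaded and exports `sym`, 0 if it loads but
 * does not export it, and -1 if it cannot be loaded at all.
 */
static int lib_exports(const char *lib, const char *sym)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    int found;

    if (!handle) {
        fprintf(stderr, "dlopen(%s): %s\n", lib, dlerror());
        return -1;
    }

    dlerror();                          /* clear any stale error state */
    found = dlsym(handle, sym) != NULL; /* evaluate before unloading */
    dlclose(handle);
    return found;
}
```

For example, `lib_exports(path_from_ldd, "ibv_cmd_poll_cq")` (with `path_from_ldd` being whatever ldd printed) returning 0 would suggest that an older libibverbs is being picked up at run time, even though the source tree you built from has the symbol.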
[openib-general] Re: [resend][RFC][PATCH] adding call to madvise
Quoting r. Gleb Natapov <[EMAIL PROTECTED]>: > @@ -187,8 +194,8 @@ int ibv_lock_range(void *base, size_t si > > > if (node->refcnt++ == 0) { > - ret = mlock((void *) node->start, > - node->end - node->start + 1); > + ret = madvise((void *) node->start, > + node->end - node->start + 1, MADV_DONTFORK); > if (ret) > goto out; > } Will this break libibverbs on older kernels that don't have madvise? Maybe test MADV_DONTFORK during library startup and set a flag? -- MST
Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
On Wed, 17 May 2006, Roland Dreier wrote: | Dave> We are seeing a bug (with both our driver native MPI | Dave> processes and mthca mvapic), where when 8 processes using | Dave> "simultaneously exit", we get watchdogs and/or hangs in the | Dave> close routines. Moving the freeing outside the mutex was an | Dave> attempt to see if we were running into some VM issues by | Dave> doing lots of page unlocking and freeing with the mutex | Dave> held. It seemed to help somewhat, but not to solve the | Dave> problem. | | Am I understanding correctly that you see a hang or watchdog timeout | even with the mthca driver? Yes. That is, the symptoms are the same, although the cause may be different. | Is there any possibility of posting the test case to reproduce this? It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed to do messaging rate), running 8 copies per dual-core 4-socket opteron, both on InfiniPath MPI, and MVAPICH (built for gen2). We ship the source with our upcoming release, and will probably make it available outside our release. We did discover one possible problem today, which is shared between our device code and the core openib code, and that's doing some memory freeing and accounting from a work thread (updating mm->locked_vm and cleaning up from earlier get_user_pages); the code in our driver was copied from the openib core code, it's not literally shared. I have a strong suspicion that at least sometimes, it's executing after the current->mm has gone away. I'm looking at that more right now. | It doesn't seem likely that ipath changes are going to fix a generic | bug like this... It wasn't an attempt to fix it, so much as to work around it, while I worked on other higher priority stuff. As I mentioned, it also helps a bit in allowing multiple processes to be in the open and close code simultaneously, when you have multiple cpus, so even on that basis, I'd probably leave it as it now is. 
Dave Olson [EMAIL PROTECTED] http://www.unixfolk.com/dave
[openib-general] OpenIB 1.0 RC + PathScale problem
Hi All, I'm trying to test the 1.0 RC branch from subversion with PathScale InfiniPath HT-460 (I've used previous versions with some success). The code compiles successfully, and I can run ibv_rc_pingpong and even a simple MPI program. But when I try to run my main application, I get the following error: libibverbs: Warning: couldn't load driver /usr/local/lib/infiniband/libipathverbs.so: /usr/local/lib/infiniband/libipathverbs.so: undefined symbol: ibv_cmd_poll_cq Does anyone know what might cause this error? I ran an nm on libipathverbs.so and saw ibv_cmd_poll_cq and I found it in the libibverbs source, too, so I'm a bit confused about what the root cause of this is. My apologies if this has already been raised. I took a quick look in the archives and did not see anything off hand that matches this. Thanks, Tim M. -- Tim Miller System Administrator -- Laboratory of Computational Biology National Institutes of Health -- Bldg. 50 Rm. 3309-- 301-402-0618
RE: [openib-general] iSER Status
Hi Mohit, Linux kernel 2.6.16 does not include ISER. ISER (initiator only) is scheduled for kernel 2.6.18. The ISER initiator code from openIB trunk is stable, and works with the open-iscsi initiator. The ISER target code is the seed of a project aimed at providing an iSCSI/ISER target. It is in early development. The code itself is stable, but there is no iSCSI target you can interface it with. We plan to interface it with the stgt project. Dan > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Mohit Katiyar > Sent: Thursday, May 18, 2006 4:22 AM > To: openib-general@openib.org > Subject: [openib-general] iSER Status > > Hi all, > Can anyone tell me whether the latest stable release of > Linux(2.6.16.16) contains both iSER initiator and target code or only > the initiator code? The open-iser target code available at > https://openfabrics.org/svn/gen2/ulps/open-iser-target/ is stable or > not? > > Thanks > Mohit
[openib-general] iSER Status
Hi all, Can anyone tell me whether the latest stable release of Linux (2.6.16.16) contains both the iSER initiator and target code, or only the initiator code? Is the open-iser target code available at https://openfabrics.org/svn/gen2/ulps/open-iser-target/ stable or not? Thanks Mohit
Re: [openib-general][PATCH] srp: param sg_tablesize,
@@ -1914,6 +1920,11 @@ static int __init srp_init_module(void)
 {
 	int ret;

Thanks. Should we add a check and put a cap on the srp_sg_tablesize value, i.e.:

+	srp_sg_tablesize = max(1, srp_sg_tablesize);
+	srp_sg_tablesize = min(srp_sg_tablesize, SRP_MAX_SG_TABLESIZE);
+	srp_template.sg_tablesize = srp_sg_tablesize;
+	srp_max_iu_len = (sizeof (struct srp_cmd) +
+			  sizeof (struct srp_indirect_buf) +
+			  srp_sg_tablesize * 16);
+	SRP_MAX_LUN		= 512,
-	SRP_MAX_IU_LEN		= 256,
+	SRP_DEF_SG_TABLESIZE	= 12,
+	SRP_MAX_SG_TABLESIZE	= 128,
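The proposed cap and the IU length it feeds reduce to simple arithmetic. This standalone sketch takes the constants 12, 128 and the per-entry size 16 from the patch. `cmd_size` and `indirect_size` are hypothetical parameters that stand in for sizeof(struct srp_cmd) and sizeof(struct srp_indirect_buf), whose values do not appear in this thread.

```c
#include <stddef.h>

/* Values taken from the patch under discussion. */
enum {
    SRP_DEF_SG_TABLESIZE = 12,
    SRP_MAX_SG_TABLESIZE = 128,
    SRP_DESC_SIZE        = 16   /* per-entry size: "srp_sg_tablesize * 16" */
};

/* Clamp a user-supplied module parameter into [1, SRP_MAX_SG_TABLESIZE]. */
static int srp_clamp_sg_tablesize(int requested)
{
    if (requested < 1)
        return 1;
    if (requested > SRP_MAX_SG_TABLESIZE)
        return SRP_MAX_SG_TABLESIZE;
    return requested;
}

/*
 * IU length = command header + indirect buffer header + one descriptor
 * per scatter/gather entry. cmd_size and indirect_size are placeholders
 * for the real structure sizes.
 */
static size_t srp_iu_len(int sg_tablesize, size_t cmd_size, size_t indirect_size)
{
    return cmd_size + indirect_size + (size_t) sg_tablesize * SRP_DESC_SIZE;
}
```

With illustrative header sizes of 48 and 20 bytes, the default of 12 entries gives 48 + 20 + 12 * 16 = 260 bytes, and the 128-entry maximum gives 2116 bytes. Clamping before the length is computed prevents a bad module parameter from producing a zero-entry table or an oversized IU.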
[openib-general] [PATCH] opensm: remove cl_mem* stuff from diags [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines]
On 01:02 Thu 18 May , Sasha Khapyorsky wrote: > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > as these routines are part of ISO C > > > > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]> > > Following Hal's cleanup And even more: This cleans cl_mem*() wrappers from diags sources Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]> --- diags/src/saquery.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) d1950d51d8a6ada9b69ed194cd8cc4b2e9aa7902 diff --git a/diags/src/saquery.c b/diags/src/saquery.c index 5526bff..7c07253 100644 --- a/diags/src/saquery.c +++ b/diags/src/saquery.c @@ -42,6 +42,7 @@ #include #include #include #include +#include #define _GNU_SOURCE #include @@ -203,8 +204,8 @@ get_all_records(osm_bind_handle_t bind_h osmv_query_req_t req; osmv_user_query_t user; - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); user.attr_id = query_id; user.attr_offset = attr_offset; -- 1.3.2
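The saquery.c hunk is a mechanical substitution: each cl_memclr(p, n) becomes memset(p, 0, n). The following self-contained sketch shows the resulting initialization pattern. `query_req_t`, `user_query_t` and `init_query()` are hypothetical stand-ins for osmv_query_req_t, osmv_user_query_t and the start of get_all_records(), not the OpenSM types.

```c
#include <string.h>

/* Hypothetical stand-ins for osmv_query_req_t and osmv_user_query_t. */
typedef struct {
    int timeout_ms;
    void *context;
} query_req_t;

typedef struct {
    unsigned short attr_id;
    unsigned attr_offset;
} user_query_t;

/* Zero both requests with ISO C memset() (formerly cl_memclr()), then fill in fields. */
static void init_query(query_req_t *req, user_query_t *user,
                       unsigned short query_id, unsigned attr_offset)
{
    memset(req, 0, sizeof(*req));
    memset(user, 0, sizeof(*user));
    user->attr_id = query_id;
    user->attr_offset = attr_offset;
}
```

Using sizeof(*req) rather than a type name keeps the size tied to the object being cleared, just as the patch's sizeof( req ) does.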
Re: [openib-general] [PATCH] Replace cl_memory.h by string.h
On Wed, 2006-05-17 at 18:20, Sasha Khapyorsky wrote: > On 15:05 Wed 17 May , Roland Dreier wrote: > > Just curious -- what's the reason behind changes like: > > > > > --- a/osm/complib/cl_event_wheel.c > > > +++ b/osm/complib/cl_event_wheel.c > > > @@ -40,6 +40,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > #include > > > #include > > > > It seems including cl_memory.h in more places is a step backwards, or > > am I missing the point here? > > It is necessary for explicit prototyping yet used cl_malloc(), > cl_free(). I guess this will be removed with next wave of Hal's cleanup. Yes, that's what I expect too. When cl_malloc*/cl_free get removed, this will go away... -- Hal > Sasha
[openib-general] [PATCH] opensm: remove unused cl_memory_osd.h [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines]
On 01:02 Thu 18 May , Sasha Khapyorsky wrote: > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > as these routines are part of ISO C > > > > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]> > > Following Hal's cleanup And more: This removes unused cl_memory_osd.h file from complib Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]> --- osm/complib/Makefile.am |1 osm/include/Makefile.am |1 osm/include/complib/cl_memory_osd.h | 79 --- 3 files changed, 0 insertions(+), 81 deletions(-) delete mode 100644 osm/include/complib/cl_memory_osd.h 95ce6332a6531ae1c7dab4060bfa5800e1b8f4ec diff --git a/osm/complib/Makefile.am b/osm/complib/Makefile.am index ecbd8e2..809a404 100644 --- a/osm/complib/Makefile.am +++ b/osm/complib/Makefile.am @@ -51,7 +51,6 @@ libosmcompinclude_HEADERS = $(srcdir)/.. $(srcdir)/../include/complib/cl_map.h \ $(srcdir)/../include/complib/cl_math.h \ $(srcdir)/../include/complib/cl_memory.h \ - $(srcdir)/../include/complib/cl_memory_osd.h \ $(srcdir)/../include/complib/cl_memtrack.h \ $(srcdir)/../include/complib/cl_packoff.h \ $(srcdir)/../include/complib/cl_packon.h \ diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index c7054ad..b23b1de 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -124,7 +124,6 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_state_mgr_ctrl.h \ $(srcdir)/complib/cl_thread_osd.h \ $(srcdir)/complib/cl_packon.h \ - $(srcdir)/complib/cl_memory_osd.h \ $(srcdir)/complib/cl_atomic_osd.h \ $(srcdir)/complib/cl_spinlock.h \ $(srcdir)/complib/cl_passivelock.h \ diff --git a/osm/include/complib/cl_memory_osd.h b/osm/include/complib/cl_memory_osd.h deleted file mode 100644 index 9ef17e0..000 --- a/osm/include/complib/cl_memory_osd.h +++ /dev/null @@ -1,79 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. 
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - *copyright notice, this list of conditions and the following - *disclaimer. - * - * - Redistributions in binary form must reproduce the above - *copyright notice, this list of conditions and the following - *disclaimer in the documentation and/or other materials - *provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - - -/* - * Abstract: - * Defines sized datatypes for Linux Kernel and User mode - * exported sizes are int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t - * int64_t, uint64_t. uintn_t is a polymorphic type, size is native size and - * also size of the pointer. 
- * - * Environment: - * Linux User and Kernel Mode - * - * $Revision: 1.2 $ - */ - -#ifndef _CL_MEMORY_OSD_H_ -#define _CL_MEMORY_OSD_H_ - -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -#ifndef __WIN__ - -static inline uint32_t -cl_get_pagesize( void ) -{ - return getpagesize(); -} - -#endif - -END_C_DECLS - -#endif /* _CL_MEMORY_OSD_H_ */ -- 1.3.2 ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
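The only real content of the deleted header was cl_get_pagesize(), a thin wrapper around getpagesize(). If a caller ever needs the page size again, one portable option is POSIX sysconf(_SC_PAGESIZE), which replaces the legacy getpagesize(). The sketch below assumes a POSIX system. page_size() and page_align() are hypothetical names, not complib functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper, not part of complib: ask POSIX sysconf() for the
 * page size instead of calling the legacy getpagesize(). */
static uint32_t page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	/* sysconf() returns -1 on failure; fall back to 4 KiB (an assumption). */
	return sz > 0 ? (uint32_t)sz : 4096u;
}

/* Round len up to the next multiple of page, which must be a power of two. */
static size_t page_align(size_t len, uint32_t page)
{
	assert(page != 0 && (page & (page - 1)) == 0);
	return (len + page - 1) & ~((size_t)page - 1);
}
```

A call such as page_align(len, page_size()) rounds a length up to a whole number of pages. The 4 KiB fallback used when sysconf() fails is an arbitrary choice made for this illustration.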
Re: [openib-general][PATCH] srp: param sg_tablesize,
Thanks, applied in slightly tweaked form as below: diff-tree 7c0543697efa99b2f1d308c415b0b2f3c0810f74 (from fbd15762bd05491db039ecd0ea57ee5f848759b0) Author: Vu Pham <[EMAIL PROTECTED]> Date: Wed May 17 15:21:41 2006 -0700 IB/srp: Allow sg_tablesize to be adjusted Make the sg_tablesize used by SRP adjustable at module load time via a module parameter. Calculate the corresponding IU length required to support this. Signed-off-by: Vu Pham <[EMAIL PROTECTED]> Signed-off-by: Roland Dreier <[EMAIL PROTECTED]> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 72b61cd..4dd6e6a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -62,6 +62,13 @@ MODULE_DESCRIPTION("InfiniBand SCSI RDMA "v" DRV_VERSION " (" DRV_RELDATE ")"); MODULE_LICENSE("Dual BSD/GPL"); +static int srp_sg_tablesize = SRP_DEF_SG_TABLESIZE; +static int srp_max_iu_len; + +module_param(srp_sg_tablesize, int, 0444); +MODULE_PARM_DESC(srp_sg_tablesize, +"Max number of gather/scatter entries per I/O (default is 12)"); + static int topspin_workarounds = 1; module_param(topspin_workarounds, int, 0444); @@ -311,7 +318,7 @@ static int srp_send_req(struct srp_targe req->priv.opcode= SRP_LOGIN_REQ; req->priv.tag = 0; - req->priv.req_it_iu_len = cpu_to_be32(SRP_MAX_IU_LEN); + req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16); @@ -953,7 +960,7 @@ static int srp_queuecommand(struct scsi_ goto err; dma_sync_single_for_cpu(target->srp_host->dev->dev->dma_device, iu->dma, - SRP_MAX_IU_LEN, DMA_TO_DEVICE); + srp_max_iu_len, DMA_TO_DEVICE); req = list_entry(target->free_reqs.next, struct srp_request, list); @@ -986,7 +993,7 @@ static int srp_queuecommand(struct scsi_ } dma_sync_single_for_device(target->srp_host->dev->dev->dma_device, iu->dma, - SRP_MAX_IU_LEN, DMA_TO_DEVICE); 
+ srp_max_iu_len, DMA_TO_DEVICE); if (__srp_post_send(target, iu, len)) { printk(KERN_ERR PFX "Send failed\n"); @@ -1018,7 +1025,7 @@ static int srp_alloc_iu_bufs(struct srp_ for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { target->tx_ring[i] = srp_alloc_iu(target->srp_host, - SRP_MAX_IU_LEN, + srp_max_iu_len, GFP_KERNEL, DMA_TO_DEVICE); if (!target->tx_ring[i]) goto err; @@ -1436,7 +1443,6 @@ static struct scsi_host_template srp_tem .eh_host_reset_handler = srp_reset_host, .can_queue = SRP_SQ_SIZE, .this_id= -1, - .sg_tablesize = SRP_MAX_INDIRECT, .cmd_per_lun= SRP_SQ_SIZE, .use_clustering = ENABLE_CLUSTERING, .shost_attrs= srp_host_attrs @@ -1914,6 +1920,11 @@ static int __init srp_init_module(void) { int ret; + srp_template.sg_tablesize = srp_sg_tablesize; + srp_max_iu_len = (sizeof (struct srp_cmd) + + sizeof (struct srp_indirect_buf) + + srp_sg_tablesize * 16); + ret = class_register(&srp_class); if (ret) { printk(KERN_ERR PFX "couldn't register class infiniband_srp\n"); diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index c071c30..033a447 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -56,7 +56,7 @@ enum { SRP_DLID_REDIRECT = 2, SRP_MAX_LUN = 512, - SRP_MAX_IU_LEN = 256, + SRP_DEF_SG_TABLESIZE= 12, SRP_RQ_SHIFT= 6, SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT, @@ -71,9 +71,6 @@ enum { }; #define SRP_OP_RECV(1 << 31) -#define SRP_MAX_INDIRECT ((SRP_MAX_IU_LEN - \ - sizeof (struct srp_cmd) - \ - sizeof (struct srp_indirect_buf)) / 16) enum srp_target_state { SRP_TARGET_LIVE,
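After this change the IU length is no longer a fixed 256 bytes. It is now derived from the srp_sg_tablesize module parameter: a fixed header (struct srp_cmd plus struct srp_indirect_buf) followed by one 16-byte descriptor for each scatter/gather entry. The sketch below reproduces that arithmetic in user space, together with the inverse computation that the removed SRP_MAX_INDIRECT macro performed. The header sizes of 48 and 20 bytes are assumptions made for illustration. The driver itself uses sizeof on the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration only; ib_srp.c uses sizeof on the
 * real structs. Each indirect descriptor is 16 bytes, as in the patch. */
#define SRP_CMD_SIZE          48u
#define SRP_INDIRECT_BUF_SIZE 20u
#define SRP_DIRECT_BUF_SIZE   16u

/* IU length needed for sg_tablesize descriptors, mirroring the
 * srp_max_iu_len formula in srp_init_module(). */
static size_t srp_iu_len(unsigned sg_tablesize)
{
	return SRP_CMD_SIZE + SRP_INDIRECT_BUF_SIZE +
	       (size_t)sg_tablesize * SRP_DIRECT_BUF_SIZE;
}

/* Inverse: how many descriptors fit in a fixed IU length, as the
 * removed SRP_MAX_INDIRECT macro computed for the old 256-byte limit. */
static unsigned srp_max_indirect(size_t iu_len)
{
	assert(iu_len >= SRP_CMD_SIZE + SRP_INDIRECT_BUF_SIZE);
	return (unsigned)((iu_len - SRP_CMD_SIZE - SRP_INDIRECT_BUF_SIZE) /
			  SRP_DIRECT_BUF_SIZE);
}
```

With these assumed sizes, the old 256-byte IU held 11 descriptors, while the new default of 12 needs 260 bytes. Because the parameter is registered with mode 0444, it can only be set at module load time (for example, srp_sg_tablesize=32). After loading it is read-only in sysfs.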
Re: [openib-general] [PATCH] Replace cl_memory.h by string.h
On 15:05 Wed 17 May , Roland Dreier wrote: > Just curious -- what's the reason behind changes like: > > > --- a/osm/complib/cl_event_wheel.c > > +++ b/osm/complib/cl_event_wheel.c > > @@ -40,6 +40,7 @@ # include > > #endif /* HAVE_CONFIG_H */ > > > > #include > > +#include > > #include > > #include > > It seems including cl_memory.h in more places is a step backwards, or > am I missing the point here? It is needed for the explicit prototypes of cl_malloc() and cl_free(), which are still used. I guess the include will be removed with the next wave of Hal's cleanup. Sasha
[openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > Right now, there is memory > tracking code implemented. Doesn't MALLOC_CHECK_ do what you want? -- MST
Re: [openib-general] [PATCH] Replace cl_memory.h by string.h
Just curious -- what's the reason behind changes like: > --- a/osm/complib/cl_event_wheel.c > +++ b/osm/complib/cl_event_wheel.c > @@ -40,6 +40,7 @@ # include > #endif /* HAVE_CONFIG_H */ > > #include > +#include > #include > #include It seems including cl_memory.h in more places is a step backwards, or am I missing the point here? - R.
[openib-general] RE: [PATCH2] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug
>-Original Message- >From: Arlin Davis [mailto:[EMAIL PROTECTED] >Sent: Wednesday, May 17, 2006 12:17 PM >To: 'James Lentini' >Cc: openib-general >Subject: [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv >bug > >James, > >Fix for uCMA provider to return the correct event as a result of rejects. >Also, ran into a segv bug >with dapl_ep_create when creating without a conn_evd. > >Thanks, > >-arlin > > Signed-off by: Arlin Davis <[EMAIL PROTECTED]> Sorry, the last patch was wrong. Try again... -arlin Signed-off by: Arlin Davis <[EMAIL PROTECTED]> Index: dapl/common/dapl_ep_create.c === --- dapl/common/dapl_ep_create.c(revision 7299) +++ dapl/common/dapl_ep_create.c(working copy) @@ -310,7 +310,10 @@ dapl_ep_create ( * * N.B. This should really be done by a util routine. */ -dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); +if (connect_evd_handle != DAT_HANDLE_NULL) +{ + dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); +} /* Optional handles */ if (recv_evd_handle != DAT_HANDLE_NULL) { Index: dapl/openib_cma/dapl_ib_cm.c === --- dapl/openib_cma/dapl_ib_cm.c(revision 7299) +++ dapl/openib_cma/dapl_ib_cm.c(working copy) @@ -287,14 +287,24 @@ static void dapli_cm_active_cb(struct da NULL, conn->ep); break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; + dapl_dbg_log( DAPL_DBG_TYPE_WARN, " dapli_cm_active_handler: REJECTED reason=%d\n", event->status); - dapl_evd_connection_callback(conn, IB_CME_DESTINATION_REJECT, -NULL, conn->ep); + + dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); + break; - + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, @@ -383,6 +393,14 @@ static void dapli_cm_passive_cb(struct d break; case RDMA_CM_EVENT_REJECTED: + { + 
ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; dapl_dbg_log( DAPL_DBG_TYPE_WARN, @@ -397,10 +415,11 @@ static void dapli_cm_passive_cb(struct d &ipaddr->dst_addr)->sin_addr.s_addr), ntohs(((struct sockaddr_in *) &ipaddr->dst_addr)->sin_port)); - - dapls_cr_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->sp); + + dapls_cr_callback(conn, cm_event, NULL, conn->sp); + break; + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM,
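The same reject-to-event mapping now appears in both dapli_cm_active_cb() and dapli_cm_passive_cb(). The following is a minimal sketch of moving it into a single helper. dapli_map_reject() is a hypothetical name, and the enum is a stand-in for the real ib_cm_events_t. Like the patch, the sketch assumes the transport is InfiniBand, where reject reason 28 means a consumer-defined reject.

```c
/* Stand-in for the provider's ib_cm_events_t; the real enum has many
 * more members. */
typedef enum {
	IB_CME_DESTINATION_REJECT,
	IB_CME_DESTINATION_REJECT_PRIVATE_DATA
} ib_cm_events_t;

/* IB CM reject reason for a consumer-defined reject, as in the patch. */
#define IB_CM_REJ_CONSUMER_DEFINED 28

/* Hypothetical helper: translate the status of an RDMA_CM_EVENT_REJECTED
 * event into a uDAPL CM event. Assumes an InfiniBand transport, as the
 * patch does, because reject reasons are transport specific. */
static ib_cm_events_t dapli_map_reject(int status)
{
	if (status == IB_CM_REJ_CONSUMER_DEFINED)
		return IB_CME_DESTINATION_REJECT_PRIVATE_DATA;
	return IB_CME_DESTINATION_REJECT;
}
```

Each callback would then only need cm_event = dapli_map_reject(event->status); before it invokes its connection callback.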
[openib-general] Re: multcast join failed
On Wed, 2006-05-17 at 15:29, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > Subject: Re: multcast join failed > > > > On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > > > Subject: Re: multcast join failed > > > > > > > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > > > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > > > > > Subject: Re: multcast join failed > > > > > > > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > > > > Hi, Roland! > > > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, > > > > > > > status -22 > > > > > > > > > > > > > > and I can't ping this machine over ipoib. > > > > > > > Any idea? > > > > > > > > > > > > What SM are you using ? > > > > > > > > > > > > If OpenSM, are there any errors in the osm.log ? > > > > > > > > > > > > -- Hal > > > > > > > > > > > > > > > > > > > > > opensm > > > > > I see these > > > > > > > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > > failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > > failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > > failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > > failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 
17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > > failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > > > > Are you attempting to join a 4x group from a 1x port (or perhaps there > > > > is a MTU mismatch between the port and the group) ? > > > > > > > > -- Hal > > > > > > > > > > Yes, for some reason it came up 1x. but why? > > > > If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it > > is an autonegotiation thing. Perhaps you have a bad cable ? > > OKay, I'll check, but why isn't ipoib working? Why is the mcast group 4x? It defaults to 4x. If you want the group to be 1x, do something like the following in /etc/osm-partitions.conf Default=0x7fff,ipoib,rate=2:ALL=full; You can check osm/doc/partition-config.txt for more config info. > ITs a back-to-back configuration ... OK. That shouldn't matter. -- Hal
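In the rate=2 setting above, the value is an InfiniBand rate code, not a link width: 2 means 2.5 Gb/s (1x SDR) and 3 means 10 Gb/s (4x SDR). The sketch below decodes these codes and gives a simplified model of why a group created at 4x rejects a port that trained at 1x. The helper names are hypothetical. OpenSM's real validation (__validate_port_caps) also checks MTU and other fields.

```c
#include <stdbool.h>

/* InfiniBand rate codes (as in MCMemberRecord/PathRecord Rate fields).
 * Returns the rate in units of 0.5 Gb/s to stay in integers; 0 = unknown. */
static unsigned ib_rate_half_gbps(unsigned rate)
{
	switch (rate) {
	case 2:  return 5;    /* 2.5 Gb/s, 1x SDR */
	case 3:  return 20;   /* 10 Gb/s, 4x SDR */
	case 4:  return 60;   /* 30 Gb/s, 12x SDR */
	case 5:  return 10;   /* 5 Gb/s */
	case 6:  return 40;   /* 20 Gb/s */
	case 7:  return 80;   /* 40 Gb/s */
	case 8:  return 120;  /* 60 Gb/s */
	case 9:  return 160;  /* 80 Gb/s */
	case 10: return 240;  /* 120 Gb/s */
	default: return 0;
	}
}

/* Simplified model of the SM check: a port may join only if its link
 * is at least as fast as the group's rate. */
static bool port_can_join(unsigned port_rate, unsigned group_rate)
{
	unsigned p = ib_rate_half_gbps(port_rate);
	unsigned g = ib_rate_half_gbps(group_rate);

	return p != 0 && g != 0 && p >= g;
}
```

Under this model, port_can_join(2, 3) is false, which corresponds to the 1x port and the default 4x group in this thread. Lowering the group to rate=2 makes the join succeed.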
[openib-general] Re: multcast join failed
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > Subject: Re: multcast join failed > > On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > > Subject: Re: multcast join failed > > > > > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > > > > Subject: Re: multcast join failed > > > > > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > > > Hi, Roland! > > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, > > > > > > status -22 > > > > > > > > > > > > and I can't ping this machine over ipoib. > > > > > > Any idea? > > > > > > > > > > What SM are you using ? > > > > > > > > > > If OpenSM, are there any errors in the osm.log ? > > > > > > > > > > -- Hal > > > > > > > > > > > > > > > > > opensm > > > > I see these > > > > > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > > 
failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > > Are you attempting to join a 4x group from a 1x port (or perhaps there > > > is a MTU mismatch between the port and the group) ? > > > > > > -- Hal > > > > > > > Yes, for some reason it came up 1x. but why? > > If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it > is an autonegotiation thing. Perhaps you have a bad cable ? Okay, I'll check, but why isn't ipoib working? Why is the mcast group 4x? It's a back-to-back configuration ... -- MST
[openib-general] Re: multcast join failed
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > Yes, for some reason it came up 1x. but why? > > If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it > is an autonegotiation thing. Perhaps you have a bad cable ? Hmm. -- MST
[openib-general] Re: multcast join failed
On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > Subject: Re: multcast join failed > > > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > > > Subject: Re: multcast join failed > > > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > > Hi, Roland! > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, status > > > > > -22 > > > > > > > > > > and I can't ping this machine over ipoib. > > > > > Any idea? > > > > > > > > What SM are you using ? > > > > > > > > If OpenSM, are there any errors in the osm.log ? > > > > > > > > -- Hal > > > > > > > > > > > > > opensm > > > I see these > > > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 > > > failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > Are you attempting to join a 4x group from a 1x port (or perhaps there > > is a MTU mismatch between the port and the group) 
? > > > > -- Hal > > > > Yes, for some reason it came up 1x. but why? If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it is an autonegotiation thing. Perhaps you have a bad cable ? -- Hal
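LinkWidthEnabled and LinkWidthActive in PortInfo are bit masks (1 = 1x, 2 = 4x, 4 = 8x, 8 = 12x), so "1x/4x" is the value 3. The sketch below is a rough model of the autonegotiation Hal mentions. It assumes that both ends settle on the widest width they have in common, and the helper name is hypothetical. The model deliberately ignores lane training, and lane training is where a bad cable can force a fallback to 1x even when both ends enable 4x.

```c
#include <stdint.h>

/* PortInfo LinkWidth bits. */
#define LW_1X  0x1
#define LW_4X  0x2
#define LW_8X  0x4
#define LW_12X 0x8

/* Simplified negotiation model: pick the widest width enabled on both
 * ends. Returns a single width bit, or 0 if there is no common width. */
static uint8_t negotiated_width(uint8_t local_enabled, uint8_t remote_enabled)
{
	uint8_t common = local_enabled & remote_enabled & 0x0f;
	uint8_t bit;

	for (bit = LW_12X; bit != 0; bit >>= 1)
		if (common & bit)
			return bit;
	return 0;
}
```

When both ends report 1x/4x, the model predicts 4x. An active width of 1x in that situation therefore points at the physical link (cable or connector) rather than at the configuration.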
[openib-general] [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug
James, Fix for uCMA provider to return the correct event as a result of rejects. Also, ran into a segv bug with dapl_ep_create when creating without a conn_evd. Thanks, -arlin Signed-off by: Arlin Davis <[EMAIL PROTECTED]> Index: dapl/common/dapl_ep_create.c === --- dapl/common/dapl_ep_create.c(revision 7140) +++ dapl/common/dapl_ep_create.c(working copy) @@ -310,7 +310,10 @@ dapl_ep_create ( * * N.B. This should really be done by a util routine. */ -dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); +if (connect_evd_handle != DAT_HANDLE_NULL) +{ + dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); +} /* Optional handles */ if (recv_evd_handle != DAT_HANDLE_NULL) { Index: dapl/openib_cma/dapl_ib_cm.c === --- dapl/openib_cma/dapl_ib_cm.c(revision 7140) +++ dapl/openib_cma/dapl_ib_cm.c(working copy) @@ -285,14 +285,24 @@ static void dapli_cm_active_cb(struct da NULL, conn->ep); break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; + dapl_dbg_log( DAPL_DBG_TYPE_WARN, " dapli_cm_active_handler: REJECTED reason=%d\n", event->status); - dapl_evd_connection_callback(conn, IB_CME_DESTINATION_REJECT, -NULL, conn->ep); + + dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); + break; - + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, @@ -381,6 +391,14 @@ static void dapli_cm_passive_cb(struct d break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; dapl_dbg_log( DAPL_DBG_TYPE_WARN, @@ -395,10 +413,11 @@ static void dapli_cm_passive_cb(struct d 
&ipaddr->dst_addr)->sin_addr.s_addr), ntohs(((struct sockaddr_in *) &ipaddr->dst_addr)->sin_port)); - - dapls_cr_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->sp); + + dapl_cr_callback(conn, cm_event, NULL, conn->sp); + break; + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM,
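In dapl_ep_create(), the fix guards the reference-count increment because the connect EVD handle is optional. The N.B. comment in that function asks for a utility routine to do this. Below is a minimal sketch of one. It uses C11 atomics in place of dapl_os_atomic_inc() and a stripped-down stand-in for DAPL_EVD. Both substitutions are assumptions, and so is the name evd_get_optional().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stripped-down stand-in for DAPL_EVD: only the reference count. */
struct evd {
	atomic_int ref_count;
};

/* Hypothetical utility routine: take a reference on an optional EVD
 * handle. A NULL handle (e.g. an EP created without a connect EVD) is
 * ignored instead of being dereferenced. */
static void evd_get_optional(struct evd *evd)
{
	if (evd != NULL)
		atomic_fetch_add(&evd->ref_count, 1);
}
```

dapl_ep_create() could then call the helper for each optional EVD handle instead of repeating the NULL check inline.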
[openib-general] Re: multcast join failed
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > Subject: Re: multcast join failed > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > > Subject: Re: multcast join failed > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > Hi, Roland! > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, status > > > > -22 > > > > > > > > and I can't ping this machine over ipoib. > > > > Any idea? > > > > > > What SM are you using ? > > > > > > If OpenSM, are there any errors in the osm.log ? > > > > > > -- Hal > > > > > > > > > opensm > > I see these > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > Are you attempting to join a 4x group from a 1x port (or perhaps there > is a MTU mismatch between the port and the group) ? > > -- Hal > Yes, for some reason it came up 1x. but why? 
-- MST
[openib-general] Re: multcast join failed
On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > Subject: Re: multcast join failed > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > Hi, Roland! > > > With svn trunk, I started getting the following on one machine: > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, status -22 > > > > > > and I can't ping this machine over ipoib. > > > Any idea? > > > > What SM are you using ? > > > > If OpenSM, are there any errors in the osm.log ? > > > > -- Hal > > > > > opensm > I see these > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID Are you attempting to join a 4x group from a 1x port (or perhaps there is a MTU mismatch between the port and the group) ? -- Hal
Re: [openib-general] Re: multcast join failed
On 21:23 Wed 17 May , Michael S. Tsirkin wrote: > > opensm > I see these > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID Is it a 1x port? Sasha
Re: [openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure
Dotan Barak wrote: Added checks to the return values of all of the functions that may fail (in order to add this test to the regression system). Thanks - applied with one minor change. + int rc; Changed 'rc' to 'ret' to match the rest of the code. - Sean
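For a regression harness, the cmatose change matters because a failure anywhere must show up as a nonzero exit status. The following is a minimal sketch of that pattern. It assumes that each test phase is a function returning 0 on success. The step functions and run_steps() are hypothetical and do not reflect cmatose's actual structure. The variable name follows the 'ret' convention Sean mentions.

```c
#include <stddef.h>
#include <stdio.h>

typedef int (*step_fn)(void);

/* Illustrative steps only; cmatose's real phases differ. */
static int step_ok(void) { return 0; }
static int step_fail(void) { return -1; }

/* Run each step in order. Stop at the first failure and return its code
 * so that the caller can turn it into a nonzero exit status. */
static int run_steps(const step_fn *steps, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = steps[i]();
		if (ret) {
			fprintf(stderr, "step %zu failed: %d\n", i, ret);
			return ret;
		}
	}
	return 0;
}
```

The shell sees only the low 8 bits of the exit status. If main() returned an errno-style value directly, some failures could therefore wrap around to 0. Returning run_steps(...) ? 1 : 0 from main() is the safer choice.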
[openib-general] Re: multcast join failed
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > Subject: Re: multcast join failed > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > Hi, Roland! > > With svn trunk, I started getting the following on one machine: > > > > ib0: multicast join failed for ff12:401b::0:0:0::, status -22 > > > > and I can't ping this machine over ipoib. > > Any idea? > > What SM are you using ? > > If OpenSM, are there any errors in the osm.log ? > > -- Hal > opensm I see these May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID -- MST
Re: [openib-general][PATCH] srp: throttle command per lun,
Thanks, applied.
Re: [openib-general] Re: multcast join failed
On 13:39 Wed 17 May , Hal Rosenstock wrote: > On Wed, 2006-05-17 at 13:38, Eitan Zahavi wrote: > > You are probably running with no partition policy file. > > Assuming he is running with OpenSM from the trunk > > > I think you need one to get the default partition setup for IPoIB. > > > > Hal, is this correct? > > Assuming the above is true, he either needs to run: > > opensm -N > > or create a partition configuration file /etc/osm-partitions.conf > > Default=0x7fff,ipoib:ALL=full; > > and run: > > opensm > > assuming all he cares about is the default partition Without a partition policy file, OpenSM should configure the Default partition with full membership (pkey=0x) for all ports and pre-create the IPoIB MCG (this is equivalent to "Default=0x7fff,ipoib:ALL=full;" in Hal's example). The actual P_Key table contents can be checked with: $ smpquery pkeys [port number] Sasha
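A P_Key packs two pieces of information into 16 bits. Bit 15 is the membership type (1 = full, 0 = limited), and the low 15 bits are the partition number. "Default=0x7fff,...:ALL=full" therefore means base 0x7fff with the full-membership bit set on every port. The sketch below shows this encoding and the matching rule: two ports can communicate only if their bases agree and at least one of them is a full member. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* P_Key layout: bit 15 = membership type, bits 0-14 = partition number. */
#define PKEY_FULL_MEMBER 0x8000u
#define PKEY_BASE_MASK   0x7fffu

static uint16_t pkey_make(uint16_t base, bool full)
{
	return (uint16_t)((base & PKEY_BASE_MASK) |
			  (full ? PKEY_FULL_MEMBER : 0u));
}

static bool pkey_is_full(uint16_t pkey)
{
	return (pkey & PKEY_FULL_MEMBER) != 0;
}

/* Two P_Keys match if their bases agree and at least one side is a full
 * member; two limited members cannot talk to each other. */
static bool pkey_match(uint16_t a, uint16_t b)
{
	return (a & PKEY_BASE_MASK) == (b & PKEY_BASE_MASK) &&
	       (pkey_is_full(a) || pkey_is_full(b));
}
```

This rule explains why limited membership on both ends of a link is not enough for two ports to communicate, while full membership on either side is.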
Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
Dave> We are seeing a bug (with both our driver native MPI Dave> processes and mthca mvapic), where when 8 processes using Dave> "simultaneously exit", we get watchdogs and/or hangs in the Dave> close routines. Moving the freeing outside the mutex was an Dave> attempt to see if we were running into some VM issues by Dave> doing lots of page unlocking and freeing with the mutex Dave> held. It seemed to help somewhat, but not to solve the Dave> problem. Am I understanding correctly that you see a hang or watchdog timeout even with the mthca driver? Is there any possibility of posting the test case to reproduce this? It doesn't seem likely that ipath changes are going to fix a generic bug like this... - R.
[openib-general] Re: SRP [PATCH] Cleaning in srp_remove_one
Thanks. I had already merged some changes from Matthew Wilcox that clean up that loop a little bit, but I merged the rest of your patch too. - R.
Re: [openib-general] multcast join failed
On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > Hi, Roland! > With svn trunk, I started getting the following on one machine: > > ib0: multicast join failed for ff12:401b::0:0:0::, status -22 > > and I can't ping this machine over ipoib. > Any idea? What SM are you using ? If OpenSM, are there any errors in the osm.log ? -- Hal
[openib-general] Re: multcast join failed
Michael> When can we get EINVAL from multicast join? If the SM returned a bad status I think.
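Roland's answer is consistent with how the Linux SA client treats responses. A nonzero MAD status from the SA reaches the caller only as -EINVAL (EINVAL is 22 on Linux, which explains the status -22 in the ipoib message). The specific reason is therefore visible only on the SM side, as in the IB_SA_MAD_STATUS_REQ_INVALID log lines above. The sketch below decodes the SA class-specific status code, which occupies bits 8-15 of the host-order status field. It also reproduces the collapse of every nonzero status to -EINVAL. The function names are hypothetical, and the errno mapping is a simplified model, not the kernel code.

```c
#include <errno.h>
#include <stdint.h>

/* SA class-specific status codes (bits 8-15 of the MAD status field,
 * taken here in host byte order). */
enum sa_status_code {
	SA_ERR_NO_RESOURCES                = 1,
	SA_ERR_REQ_INVALID                 = 2,
	SA_ERR_NO_RECORDS                  = 3,
	SA_ERR_TOO_MANY_RECORDS            = 4,
	SA_ERR_REQ_INVALID_GID             = 5,
	SA_ERR_REQ_INSUFFICIENT_COMPONENTS = 6
};

/* Extract the SA class-specific code from a host-order MAD status. */
static unsigned sa_status_class(uint16_t status)
{
	return (status >> 8) & 0xffu;
}

/* Simplified model of what a consumer sees: any nonzero status,
 * whatever its cause, collapses to -EINVAL. */
static int sa_status_to_errno(uint16_t status)
{
	return status ? -EINVAL : 0;
}
```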
RE: [openib-general] Re: multcast join failed
On Wed, 2006-05-17 at 13:38, Eitan Zahavi wrote: > You are probably running with no partition policy file. Assuming he is running with OpenSM from the trunk > I think you need one to get the default partition setup for IPoIB. > > Hal, is this correct? Assuming the above is true, he either needs to run: opensm -N or create a partition configuration file /etc/osm-partitions.conf Default=0x7fff,ipoib:ALL=full; and run: opensm assuming all he cares about is the default partition -- Hal > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -Original Message- > > From: [EMAIL PROTECTED] [mailto:openib-general- > > [EMAIL PROTECTED] On Behalf Of Michael S. Tsirkin > > Sent: Wednesday, May 17, 2006 7:45 PM > > To: Roland Dreier > > Cc: openib-general@openib.org > > Subject: [openib-general] Re: multcast join failed > > > > Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > > > Subject: Re: multcast join failed > > > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, > status -22 > > > > > > > > and I can't ping this machine over ipoib. > > > > Any idea? > > > > > > No, nothing of significance has changed in ipoib for a while. > > > > When can we get EINVAL from multicast join? > > > > -- > > MST
[openib-general] Re: [PATCH] opensm: make more statics
On Wed, 2006-05-17 at 12:44, Sasha Khapyorsky wrote: > This makes local functions to be static in osm_link_mgr.c. > > Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]> Thanks. Applied to both trunk and 1.0 branch. -- Hal
Re: [openib-general] OFED-1.0-rc4 need db-devel
Why not use db4-devel? db-devel seems obsolete. zhu --- Vladimir Sokolovsky <[EMAIL PROTECTED]> wrote: > Scott Weitzenkamp (sweitzen) wrote: > >> db-devel package is required to build open_iscsi > package RPM. > >> This package is not relevant for RHEL 4.3. > >> There are two options to install OFED-1.0-rc4 on > RHEL 4.3 without > >> open_iscsi: > >> 1. Select "Custom installation" and don't choose > to install > >> open_iscsi. > >> 2. Edit ofed.conf (created automatically under > OFED-1.0-rc4 directory > >> when you run install.sh or build.sh) and set > *open_iscsi=n*. > >> Then run: > >> ./install.sh -c ofed.conf > >> > > > > Why don't we ignore these packages on RHEL4 U3, > just like we ignore > > uDAPL on ppc64? > > > > Scott > > > > > We will do this in OFED-1.0-rc5. > > Vladimir
RE: [openib-general] Re: multcast join failed
You are probably running with no partition policy file. I think you need one to get the default partition setup for IPoIB. Hal, is this correct? Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -Original Message- > From: [EMAIL PROTECTED] [mailto:openib-general- > [EMAIL PROTECTED] On Behalf Of Michael S. Tsirkin > Sent: Wednesday, May 17, 2006 7:45 PM > To: Roland Dreier > Cc: openib-general@openib.org > Subject: [openib-general] Re: multcast join failed > > Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > > Subject: Re: multcast join failed > > > > > With svn trunk, I started getting the following on one machine: > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, status -22 > > > > > > and I can't ping this machine over ipoib. > > > Any idea? > > > > No, nothing of significance has changed in ipoib for a while. > > When can we get EINVAL from multicast join? > > -- > MST > ___ > openib-general mailing list > openib-general@openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines
On Wed, 2006-05-17 at 13:11, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > * > > * SEE ALSO > > -* Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, > > cl_memcmp > > +* Memory Management, cl_free, cl_malloc > > **/ > > Next: cl_malloc/cl_free? Yes, I didn't want to hold this part up for that. There will be a separate patch for that, but I'm not sure when. Right now, there is memory tracking code implemented. -- Hal
Re: [openib-general] Re:
On Wed, May 17, 2006 at 07:40:07AM -0700, Roland Dreier wrote: > Yes, I agree. That's why I think we should get rid of the > "linux-kernel" part of the svn tree entirely. Because everyone who > wants to test new code seems to run last stable kernel + svn drivers > instead of the new development kernel. That's because openib guarantees SVN drivers will build with the last stable kernel. Change that policy and document the steps that folks should follow. I'd be willing to occasionally try newer kernels if you think that's what we should be doing. thanks, grant
[openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > * > * SEE ALSO > -*Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, > cl_memcmp > +*Memory Management, cl_free, cl_malloc > **/ Next: cl_malloc/cl_free? -- MST
Re: [openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
On 08:11 Wed 17 May , Hal Rosenstock wrote: > On Wed, 2006-05-17 at 07:41, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>: > > > Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code > > > be compiled out > > > > > > OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out > > > > > > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]> > > > > > > > > > @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb( > > > lid. > > >*/ > > >/* For now - do not add the alternate dr path to the release */ > > > - if (0) > > > -if ( p_madw->mad_addr.dest_lid != 0x ) > > > +#if 0 > > > + if ( p_madw->mad_addr.dest_lid != 0x ) > > > > In my experience, if you compile with -O, gcc does a good enough job of > > dead code elimination. > > But not all builds are that way though. Also "#if 0" makes temporarily disabled code more "visible" (for future improvements). Sasha.
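One concrete difference between the two forms, shown as a minimal sketch with hypothetical names (struct mad_addr, SKETCH_LID, release_alt_path are not OpenSM's): code under `if (0)` is still parsed and type-checked on every build, and whether it is dropped is left to the compiler, while code under `#if 0` never reaches the compiler at all.

```c
#include <stdint.h>

/* Hypothetical stand-in for the MAD address; not OpenSM's osm_mad_addr_t. */
struct mad_addr {
	uint16_t dest_lid;
};

/* Hypothetical LID value used only by this sketch. */
#define SKETCH_LID 1

/* Returns how many "alternate path" actions were taken; always 0 here,
 * because both branches below are disabled. */
static int release_alt_path(const struct mad_addr *addr)
{
	int actions = 0;

	/* Runtime-dead code: parsed and type-checked on every build, so it
	 * must keep compiling as the surrounding types change. Removing it
	 * from the object file is up to the compiler and its settings. */
	if (0) {
		if (addr->dest_lid != SKETCH_LID)
			actions++;
	}

	/* Preprocessor-disabled code: stripped before compilation on every
	 * build, and easy to find again with grep for later rework. */
#if 0
	if (addr->dest_lid != SKETCH_LID)
		actions++;
#endif

	return actions;
}
```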
[openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
On Mon, 15 May 2006, Roland Dreier wrote: | This looks like a pastiche of several patches. Why can't it be split | up into logical pieces? | | > Call dma_free_coherent without ipath_mutex held. | | Why? Doesn't freeing work with the mutex held? Sure, that's the way the previous code worked. We are seeing a bug (with both our driver's native MPI processes and mthca mvapic) where, when 8 processes exit simultaneously, we get watchdogs and/or hangs in the close routines. Moving the freeing outside the mutex was an attempt to see if we were running into some VM issues by doing lots of page unlocking and freeing with the mutex held. It seemed to help somewhat, but not to solve the problem. It also allows other processes to open and close in a somewhat more timely fashion. Dave Olson [EMAIL PROTECTED] http://www.unixfolk.com/dave
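A userspace sketch of the pattern Dave describes: detach the buffers from the shared structure while holding the lock, then release them after unlocking. Here a pthread mutex stands in for ipath_mutex and free() for dma_free_coherent(); struct port_ctx and close_port() are hypothetical names, not the ipath driver's.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-port state guarded by one mutex (standing in for
 * ipath_mutex); free() stands in for dma_free_coherent(). */
struct port_ctx {
	pthread_mutex_t lock;
	void *rcvhdr_buf;	/* hypothetical buffers owned by an open port */
	void *egr_buf;
	int open;
};

/* Hold the mutex only long enough to unhook the buffers, then free them
 * unlocked, so other opens and closes are not serialized behind a
 * potentially slow release. */
static void close_port(struct port_ctx *ctx)
{
	void *hdr, *egr;

	pthread_mutex_lock(&ctx->lock);
	hdr = ctx->rcvhdr_buf;
	egr = ctx->egr_buf;
	ctx->rcvhdr_buf = NULL;
	ctx->egr_buf = NULL;
	ctx->open = 0;
	pthread_mutex_unlock(&ctx->lock);

	/* No other thread can reach these pointers any more. */
	free(hdr);
	free(egr);
}
```

As the message notes, this shortens lock hold time; by itself it did not cure the exit hangs.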
[openib-general] Re: multcast join failed
Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > Subject: Re: multcast join failed > > > With svn trunk, I started getting the following on one machine: > > > > ib0: multicast join failed for ff12:401b::0:0:0::, status -22 > > > > and I can't ping this machine over ipoib. > > Any idea? > > No, nothing of significance has changed in ipoib for a while. When can we get EINVAL from multicast join? -- MST
[openib-general] [PATCH] opensm: make more statics
This makes local functions to be static in osm_link_mgr.c. Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]> --- osm/opensm/osm_link_mgr.c | 14 +++--- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c index 5d9ab7d..2b0d2de 100644 --- a/osm/opensm/osm_link_mgr.c +++ b/osm/opensm/osm_link_mgr.c @@ -111,8 +111,8 @@ osm_link_mgr_init( /** **/ -void -osm_link_mgr_set_physp_pi( +static void +__osm_link_mgr_set_physp_pi( IN osm_link_mgr_t* const p_mgr, IN osm_physp_t* const p_physp, IN uint8_t const port_state ) @@ -129,7 +129,7 @@ osm_link_mgr_set_physp_pi( boolean_t send_set = FALSE; osm_physp_t *p_remote_physp; - OSM_LOG_ENTER( p_mgr->p_log, osm_link_mgr_set_physp_pi ); + OSM_LOG_ENTER( p_mgr->p_log, __osm_link_mgr_set_physp_pi ); CL_ASSERT( p_physp ); CL_ASSERT( osm_physp_is_valid( p_physp ) ); @@ -151,7 +151,7 @@ osm_link_mgr_set_physp_pi( if (! p_switch ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "osm_link_mgr_set_physp_pi: ERR 4201: " + "__osm_link_mgr_set_physp_pi: ERR 4201: " "Cannot find switch by guid: 0x%" PRIx64 "\n", cl_ntoh64( p_node->node_info.node_guid ) ); goto Exit; @@ -165,7 +165,7 @@ osm_link_mgr_set_physp_pi( if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) { osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "osm_link_mgr_set_physp_pi: " + "__osm_link_mgr_set_physp_pi: " "Skipping port 0, GUID = 0x%016" PRIx64 "\n", cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); } @@ -366,7 +366,7 @@ osm_link_mgr_set_physp_pi( /** **/ -osm_signal_t +static osm_signal_t __osm_link_mgr_process_port( IN osm_link_mgr_t* const p_mgr, IN osm_port_t* const p_port, @@ -419,7 +419,7 @@ __osm_link_mgr_process_port( (current_state < link_state) ) { p_mgr->send_set_reqs = FALSE; -osm_link_mgr_set_physp_pi( +__osm_link_mgr_set_physp_pi( p_mgr, p_physp, link_state );
Re: [openib-general] ib_mthca fails to load with old firmware
Hi Scott - On Wed, 17 May 2006 at 08:40:50 -0700, Scott Weitzenkamp wrote: > What kind of blade systems are these? For some blade systems, Cisco > provides HCA firmware that has been configured to provide better signal > integrity. > > If you run /usr/local/ofed/sbin/tvflash -i, I can then tell which > firmware you need. The blade systems are all Dell 1855's. Here's the output you requested:
---8<---
blade9:/usr/local/ofed/sbin # ./tvflash -i
HCA #0: MT25208 Tavor Compat, DLGL, revision A0
  Primary image is v4.6.000 build 3.0.0.160, with label 'HCA.DLGL.A0'
  Secondary image is v4.6.000 build 3.0.0.160, with label 'HCA.DLGL.A0'

  Vital Product Data
    Product Name: DLGL
    P/N: 99-00063-03
    E/C: Rev: A8
    S/N: 57O1771
    Freq/Power: PW=10W;PCIe 8X
    Date Code: 3105
    Checksum: Ok
--->8---
Regards, -- Ken L Johnson <[EMAIL PROTECTED]>
[openib-general] Re: compilation warning in diags tools
On Wed, 2006-05-17 at 11:47, Dotan Barak wrote: > Hi. > > Here is a compilation warning when using gcc 3.4.5: > > src/grouping.c: In function `get_router_slot': > src/grouping.c:213: warning: implicit declaration of function `calloc' > /bin/sh ./libtool --tag=CC --mode=link gcc -m64 -L../libibcommon -libcommon > -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../os > m/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o > src/ibnetdiscover src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-gro > uping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la > ../libibmad/libibmad.la > > (i think that stdlib.h should be included to prevent this warning) Fixed in r7290. Can you update and try it to be sure? Thanks. -- Hal > > thanks > Dotan
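Why the missing header matters beyond the warning: without `#include <stdlib.h>`, a C89-era gcc implicitly declares `calloc()` as returning `int`, and on an `-m64` build the returned pointer can be truncated to 32 bits. A minimal sketch of the fixed form; struct slot_info and alloc_slots() are hypothetical, not the diags code.

```c
#include <stdlib.h>	/* declares calloc()/free(); without it, calloc is assumed to return int */

/* Hypothetical per-slot record, standing in for what get_router_slot() allocates. */
struct slot_info {
	int slotnum;
	int nports;
};

/* With the real prototype in scope, the full 64-bit pointer is returned.
 * calloc() also zero-fills, so every slot starts out empty. */
static struct slot_info *alloc_slots(size_t n)
{
	return calloc(n, sizeof(struct slot_info));
}
```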
RE: [openib-general] [PATCH] IB: Make needlessly global ib_mad_cache static
>Any reason not to apply this? Looks fine to apply by me. - Sean
[openib-general] SRP [PATCH] Cleaning in srp_remove_one
3 changes in the same place:
1) The if statement is redundant.
2) There is no need to save the flags: the code runs inside a mutex_lock, so it is in process context with interrupts enabled, and spin_lock_irq is enough.
3) We hold the mutex for the list and we are not deleting from the list, so there is no need for list_for_each_entry_safe.

Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]>

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-05-14 14:22:12.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-05-14 14:26:54.0 +0300
@@ -1750,7 +1750,6 @@ static void srp_remove_one(struct ib_dev
 	struct srp_host *host, *tmp_host;
 	LIST_HEAD(target_list);
 	struct srp_target_port *target, *tmp_target;
-	unsigned long flags;
 
 	dev_list = ib_get_client_data(device, &srp_client);
 
@@ -1767,12 +1766,10 @@ static void srp_remove_one(struct ib_dev
 		 * commands and don't try to reconnect.
 		 */
 		mutex_lock(&host->target_mutex);
-		list_for_each_entry_safe(target, tmp_target,
-					 &host->target_list, list) {
-			spin_lock_irqsave(target->scsi_host->host_lock, flags);
-			if (target->state != SRP_TARGET_REMOVED)
-				target->state = SRP_TARGET_REMOVED;
-			spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
+		list_for_each_entry(target, &host->target_list, list) {
+			spin_lock_irq(target->scsi_host->host_lock);
+			target->state = SRP_TARGET_REMOVED;
+			spin_unlock_irq(target->scsi_host->host_lock);
 		}
 		mutex_unlock(&host->target_mutex);
--
Ishai Rabinovitz
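Points 1 and 3 of the cleanup can be illustrated in plain C (the interrupt-flags change in point 2 is kernel-specific and not modeled). A minimal sketch using a hypothetical singly linked list; struct target and mark_all_removed() are not the SRP driver's names.

```c
#include <stddef.h>

enum target_state { TARGET_LIVE, TARGET_REMOVED };

/* Hypothetical singly linked stand-in for struct srp_target_port. */
struct target {
	enum target_state state;
	struct target *next;
};

/* Mark every target removed. The body only writes a field and never
 * unlinks or frees the node, so reading t->next after the body is fine;
 * a "_safe" iterator, which caches the next pointer before the body runs,
 * is only needed when the body may free the current node. The store is
 * unconditional because "if (state != REMOVED) state = REMOVED" has
 * exactly the same effect. */
static void mark_all_removed(struct target *head)
{
	for (struct target *t = head; t != NULL; t = t->next)
		t->state = TARGET_REMOVED;
}
```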
[openib-general] Re:
Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > Michael> But, I think it's still useful to make it possible for > Michael> people to test development snapshots on stable kernels > Michael> simply because we'll get more testing and feedback this > Michael> way. > > It's fine except when API changes force us to diverge from upstream. > Then it becomes a hassle. Yes. Still, it's mostly easy. -- MST
Re: [openib-general] SRP [PATCH] Looks like a potantial bug
Yes, good catch. Thanks, applied.
[openib-general] Re:
Michael> Yea, we are going that way. Soon all we'll need will be Michael> a git tree that we can use for development. BTW, how Michael> easy is it to get an account at kernel.org? It's not hard if you have some history as a kernel developer. Of course, hosting a git tree is pretty easy as well. Michael> But, I think it's still useful to make it possible for Michael> people to test development snapshots on stable kernels Michael> simply because we'll get more testing and feedback this Michael> way. It's fine except when API changes force us to diverge from upstream. Then it becomes a hassle. - R.
[openib-general] Re: multcast join failed
> With svn trunk, I started getting the following on one machine: > > ib0: multicast join failed for ff12:401b::0:0:0::, status -22 > > and I can't ping this machine over ipoib. > Any idea? No, nothing of significance has changed in ipoib for a while. - R.
Re: [openib-general] SRP [PATCH] Looks like a potantial bug
On Wed, May 17, 2006 at 06:40:04PM +0300, Ishai Rabinovitz wrote: > Hi, > > While doing a code review I found a potential bug. > I did not manage to execute a test to check this code. > Please take a look: Sorry, I made a mistake in the patch. Please look at this one instead. srp_reconnect_target() already uses req->scmnd->scsi_done(req->scmnd), which is what this patch now does. Ishai > Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> > -- > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > === > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 > 16:24:24.0 +0300 > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 > 17:13:47.0 +0300 > @@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_ > list_for_each_entry_safe(req, tmp, &target->req_queue, list) > if (req->scmnd->device == scmnd->device) { > req->scmnd->result = DID_RESET << 16; > - scmnd->scsi_done(scmnd); > + req->scmnd->scsi_done(req->scmnd); > srp_remove_req(target, req); > } > > -- > Ishai Rabinovitz > ___ > openib-general mailing list > openib-general@openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Ishai Rabinovitz
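A userspace sketch of what the corrected line does: each queued request aimed at the reset device is completed with its own command. The structs, count_done() and reset_device() are hypothetical stand-ins for struct scsi_cmnd, struct srp_request and srp_reset_device(); DID_RESET_SKETCH assumes the kernel's DID_RESET host byte of 0x08. Unlike the driver, the sketch does not remove requests from the queue.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_cmnd. */
struct cmd {
	int device;			/* which device the command targets */
	int result;
	int done_calls;			/* counts completions, for the sketch */
	void (*scsi_done)(struct cmd *);
};

/* Hypothetical stand-in for struct srp_request on target->req_queue. */
struct request {
	struct cmd *scmnd;
	struct request *next;
};

#define DID_RESET_SKETCH 0x08		/* assumed value of the kernel's DID_RESET */

static void count_done(struct cmd *c)
{
	c->done_calls++;
}

/* Complete every queued request for the reset device with *its own*
 * command. Calling reset_cmd->scsi_done(reset_cmd) instead would complete
 * the reset command once per match and leave the queued ones pending. */
static int reset_device(struct request *queue, const struct cmd *reset_cmd)
{
	int completed = 0;

	for (struct request *req = queue; req != NULL; req = req->next) {
		if (req->scmnd->device == reset_cmd->device) {
			req->scmnd->result = DID_RESET_SKETCH << 16;
			req->scmnd->scsi_done(req->scmnd);
			completed++;
		}
	}
	return completed;
}
```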
[openib-general] multcast join failed
Hi, Roland! With svn trunk, I started getting the following on one machine: ib0: multicast join failed for ff12:401b::0:0:0::, status -22 and I can't ping this machine over ipoib. Any idea? -- MST
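The "status -22" follows the kernel convention of reporting failures as negative errno values; on Linux, EINVAL is 22, which is why the thread asks where EINVAL comes from. A tiny sketch for decoding such a status, assuming a Linux host; status_is_einval() is a hypothetical helper.

```c
#include <errno.h>

/* Kernel-style status: 0 on success, -errno on failure.
 * Returns nonzero if the status means "invalid argument". */
static int status_is_einval(int status)
{
	return status < 0 && -status == EINVAL;
}
```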
[openib-general] compilation warning in diags tools
Hi. Here is a compilation warning when using gcc 3.4.5:
src/grouping.c: In function `get_router_slot':
src/grouping.c:213: warning: implicit declaration of function `calloc'
/bin/sh ./libtool --tag=CC --mode=link gcc -m64 -L../libibcommon -libcommon -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../osm/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o src/ibnetdiscover src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-grouping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la ../libibmad/libibmad.la
(I think that stdlib.h should be included to prevent this warning.)
thanks Dotan
[openib-general] Re:
Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > Subject: Re: > That's why I think we should get rid of the > "linux-kernel" part of the svn tree entirely. Because everyone who > wants to test new code seems to run last stable kernel + svn drivers > instead of the new development kernel. > > - R. Yea, we are going that way. Soon all we'll need will be a git tree that we can use for development. BTW, how easy is it to get an account at kernel.org? But, I think it's still useful to make it possible for people to test development snapshots on stable kernels simply because we'll get more testing and feedback this way. One way would be to put snapshots under https://openib.org/downloads/ -- MST
[openib-general] SRP [PATCH] Looks like a potantial bug
Hi, While doing a code review I found a potential bug. I did not manage to execute a test to check this code. Please take a look:

Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]>
--
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-05-17 16:24:24.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-05-17 17:13:47.0 +0300
@@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_
 	list_for_each_entry_safe(req, tmp, &target->req_queue, list)
 		if (req->scmnd->device == scmnd->device) {
 			req->scmnd->result = DID_RESET << 16;
-			scmnd->scsi_done(scmnd);
+			req->scmnd->scsi_done(scmnd);
 			srp_remove_req(target, req);
 		}
--
Ishai Rabinovitz
RE: [openib-general] ib_mthca fails to load with old firmware
> Ken> I'm running into a problem when I try to use the OFED RC4 > Ken> release on some blade systems that have TopSpin HCA daughter > Ken> cards installed (actually Mellanox). I'm trying to figure out > Ken> how to update the firmware to the latest [ > Ken> http://mellanox.com/support/firmware_table.php ] but it seems > Ken> I must know the PSID so I can grab the right firmware > Ken> image. Can anyone point me in the right direction here? > > For blade HCAs you should contact the HCA vendor for firmware updates. > > You could try passing the module option "fw_cmd_doorbell=0" to > ib_mthca. That may work around things. > > - R. What kind of blade systems are these? For some blade systems, Cisco provides HCA firmware that has been configured to provide better signal integrity. If you run /usr/local/ofed/sbin/tvflash -i, I can then tell which firmware you need. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems
Re: [openib-general] OFED-1.0-rc4 need db-devel
Scott Weitzenkamp (sweitzen) wrote: db-devel package is required to build open_iscsi package RPM. This package is not relevant for RHEL 4.3. There are two options to install OFED-1.0-rc4 on RHEL 4.3 without open_iscsi: 1. Select "Custom installation" and don't choose to install open_iscsi. 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory when you run install.sh or build.sh) and set *open_iscsi=n*. Then run: ./install.sh -c ofed.conf Why don't we ignore these packages on RHEL4 U3, just like we ignore uDAPL on ppc64? Scott We will do this in OFED-1.0-rc5. Vladimir
[openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure
Added checks to the return values of all of the functions that may fail (in order to add this test to the regression system). Signed-off-by: Dotan Barak <[EMAIL PROTECTED]> Index: last_stable/src/userspace/librdmacm/examples/cmatose.c === --- last_stable.orig/src/userspace/librdmacm/examples/cmatose.c 2006-05-17 18:30:35.0 +0300 +++ last_stable/src/userspace/librdmacm/examples/cmatose.c 2006-05-17 18:31:35.0 +0300 @@ -219,7 +219,7 @@ static void connect_error(void) test.connects_left--; } -static void addr_handler(struct cmatest_node *node) +static int addr_handler(struct cmatest_node *node) { int ret; @@ -228,9 +228,10 @@ static void addr_handler(struct cmatest_ printf("cmatose: resolve route failed: %d\n", ret); connect_error(); } + return ret; } -static void route_handler(struct cmatest_node *node) +static int route_handler(struct cmatest_node *node) { struct rdma_conn_param conn_param; int ret; @@ -252,9 +253,10 @@ static void route_handler(struct cmatest printf("cmatose: failure connecting: %d\n", ret); goto err; } - return; + return 0; err: connect_error(); + return ret; } static int connect_handler(struct rdma_cm_id *cma_id) @@ -305,10 +307,10 @@ static int cma_handler(struct rdma_cm_id switch (event->event) { case RDMA_CM_EVENT_ADDR_RESOLVED: - addr_handler(cma_id->context); + ret = addr_handler(cma_id->context); break; case RDMA_CM_EVENT_ROUTE_RESOLVED: - route_handler(cma_id->context); + ret = route_handler(cma_id->context); break; case RDMA_CM_EVENT_CONNECT_REQUEST: ret = connect_handler(cma_id); @@ -420,35 +422,45 @@ static int poll_cqs(void) return 0; } -static void connect_events(void) +static int connect_events(void) { struct rdma_cm_event *event; - int err = 0; + int err = 0, ret = 0; while (test.connects_left && !err) { err = rdma_get_cm_event(test.channel, &event); if (!err) { cma_handler(event->id, event); rdma_ack_cm_event(event); + } else { + printf("cmatose: failure in rdma_get_cm_event in connect events\n"); + ret = err; } } + + return ret; } 
-static void disconnect_events(void) +static int disconnect_events(void) { struct rdma_cm_event *event; - int err = 0; + int err = 0, ret = 0; while (test.disconnects_left && !err) { err = rdma_get_cm_event(test.channel, &event); if (!err) { cma_handler(event->id, event); rdma_ack_cm_event(event); + } else { + printf("cmatose: failure in rdma_get_cm_event in disconnect events\n"); + ret = err; } } + + return ret; } -static void run_server(void) +static int run_server(void) { struct rdma_cm_id *listen_id; int i, ret; @@ -457,7 +469,7 @@ static void run_server(void) ret = rdma_create_id(test.channel, &listen_id, &test); if (ret) { printf("cmatose: listen request failed\n"); - return; + return ret; } test.src_in.sin_family = PF_INET; @@ -465,7 +477,7 @@ static void run_server(void) ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { printf("cmatose: bind address failed: %d\n", ret); - return; + return ret; } ret = rdma_listen(listen_id, 0); @@ -474,16 +486,21 @@ static void run_server(void) goto out; } - connect_events(); + ret = connect_events(); + if (ret) + goto out; if (message_count) { printf("initiating data transfers\n"); - for (i = 0; i < connections; i++) - if (post_sends(&test.nodes[i])) + for (i = 0; i < connections; i++) { + ret = post_sends(&test.nodes[i]); + if (ret) goto out; + } printf("receiving data transfers\n"); - if (poll_cqs()) + ret = poll_cqs(); + if (ret) goto out; printf("data transfers complete\n"); @@ -497,10 +514,13 @@ static void run_server(void) rdma_disconnect(test.nodes[i].cma_id); } - disconnect_events(); + ret = disconnect_events(); + printf("disconnected\n"); + out: rdma_dest
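The pattern this patch applies to connect_events() and disconnect_events() can be sketched in isolation: loop on the event source, stop at the first error, and hand that error back so main() can turn it into a nonzero exit status for the regression system. get_event_fn, drain_events() and fake_get_event() are hypothetical; the real code calls rdma_get_cm_event() and rdma_ack_cm_event().

```c
#include <stdio.h>

/* Hypothetical event source standing in for rdma_get_cm_event(): returns 0
 * and decrements *pending on success, or a nonzero error code. */
typedef int (*get_event_fn)(int *pending, void *arg);

/* Drain events until none are pending or one fails, and return the
 * failure instead of swallowing it. */
static int drain_events(int *pending, get_event_fn get_event, void *arg)
{
	int ret = 0;

	while (*pending > 0) {
		ret = get_event(pending, arg);
		if (ret) {
			fprintf(stderr, "failure getting event: %d\n", ret);
			break;
		}
	}
	return ret;
}

/* Fake event source for the sketch: *arg is how many events succeed
 * before it starts failing with -1. */
static int fake_get_event(int *pending, void *arg)
{
	int *ok_left = arg;

	if (*ok_left == 0)
		return -1;
	(*ok_left)--;
	(*pending)--;
	return 0;
}
```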
RE: [openib-general] OFED-1.0-rc4 need db-devel
> db-devel package is required to build open_iscsi package RPM. > This package is not relevant for RHEL 4.3. > There are two options to install OFED-1.0-rc4 on RHEL 4.3 without > open_iscsi: > 1. Select "Custom installation" and don't choose to install > open_iscsi. > 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory > when you run install.sh or build.sh) and set *open_iscsi=n*. > Then run: > ./install.sh -c ofed.conf Why don't we ignore these packages on RHEL4 U3, just like we ignore uDAPL on ppc64? Scott
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
On Wed, May 17, 2006 at 02:55:57AM +0300, Roland Dreier wrote: > Hmm, this doesn't seem right to me. If I try this, then I get a crash > because the scsi_host is already gone after the first put. I verified > that the reference count is 1 before these puts, and with the > unmodified module I don't see anything left in /sys/class/scsi_host > after unloading the module. > > What kernel are you seeing problems with? I'm testing with an > up-to-date git kernel, although I doubt it makes a difference (did > SCSI reference counting change recently??). > > I do think there are some extra scsi_host_put() calls in > srp_remove_work() -- I think the double scsi_host_put() dates back to > a version (which I may never even have checked in) where there was a > scsi_host_get() to avoid the scsi_host going away between the > schedule_work() and srp_remove_work() actually running. > > So the patch below seems correct to me. > > What do you think? I could not reproduce the problem again, so this patch works for me. -- Ishai Rabinovitz
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
On Wed, May 17, 2006 at 02:56:58AM +0300, Roland Dreier wrote: > BTW, I think the patch below is correct as well. This avoids problems > where the SRP driver waits forever for a completion, for example if > sending the DREQ fails because the connection has already been > disconnected by the target. > > Does this scenario seem like the deadlock you thought you saw? > > --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 7245) > +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) > @@ -342,7 +342,10 @@ static void srp_disconnect_target(struct > /* XXX should send SRP_I_LOGOUT request */ > > init_completion(&target->done); > - ib_send_cm_dreq(target->cm_id, NULL, 0); > + if (ib_send_cm_dreq(target->cm_id, NULL, 0)) { > + printk(KERN_DEBUG PFX "Sending CM DREQ failed\n"); > + return; > + } > wait_for_completion(&target->done); > } > I don't think this caused the deadlock I had. Still, it looks like an important patch. -- Ishai Rabinovitz
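The idea behind Roland's change, as a userspace sketch: if the request that would eventually trigger the completion was never sent, return instead of waiting, because nothing will ever signal it. The kernel's struct completion is emulated here with pthreads; disconnect_target(), send_dreq_fn and the two test senders are hypothetical names, not the SRP driver's.

```c
#include <pthread.h>

/* Minimal userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical sender: returns 0 if the DREQ went out (the peer's reply
 * will later call complete()), nonzero if sending failed. */
typedef int (*send_dreq_fn)(struct completion *done);

static int disconnect_target(struct completion *done, send_dreq_fn send_dreq)
{
	init_completion(done);
	if (send_dreq(done))
		return -1;	/* no reply will ever arrive, so do not wait */
	wait_for_completion(done);
	return 0;
}

/* Test doubles: one sender fails, the other "replies" immediately. */
static int send_fails(struct completion *done) { (void)done; return -1; }
static int send_and_reply(struct completion *done) { complete(done); return 0; }
```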
Re: [openib-general] Re: ib_mthca fails to load with old firmware
OK, I put this into my 2.6.17 branch:

diff-tree 1db76c14d215c8b26024dd532de3dcaf66ea30f7 (from 032ebf2620ef99a4fedaa0f77dc2272095ac5863)
Author: Roland Dreier <[EMAIL PROTECTED]>
Date:   Wed May 17 07:48:07 2006 -0700

    IB/mthca: Make fw_cmd_doorbell default to 0

    Setting fw_cmd_doorbell allows FW command to be queued using posted
    writes instead of requiring polling on a "go" bit, so it should be a
    performance boost. However, the option causes problems with at least
    some device/firmware combinations, so set the default to 0 until we
    understand what's going on better.

    Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 1985b5d..798e13e 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -182,7 +182,7 @@ struct mthca_cmd_context {
 	u8 status;
 };
 
-static int fw_cmd_doorbell = 1;
+static int fw_cmd_doorbell = 0;
 module_param(fw_cmd_doorbell, int, 0644);
 MODULE_PARM_DESC(fw_cmd_doorbell, "post FW commands through doorbell page if nonzero "
 		 "(and supported by FW)");
[openib-general] Re: NOP problem in ib_mthca on OFED RC4
Quoting r. [EMAIL PROTECTED] <[EMAIL PROTECTED]>: > Subject: Re: NOP problem in ib_mthca on OFED RC4 > > > Michael, > > > > > Which FW revision do you have? > > > > > > > The "ibstat" command shows: > > > > > > CA type: MT25204 > > > Number of ports: 1 > > > Firmware version: 1.0.800 > > > Hardware version: a0 > > > Node GUID: 0x0002c90200216dc4 > > > System image GUID: 0x0002c90200216dc7 > > > > > > -Don Albert- > > > > > > > Yes, that's the latest revision. Hmm. > > > > What about the other thing I mentioned in my first message: the "lspci" > command complains about the board slot that the HCA is plugged into: > >pcilib: Resource 2 in /sys/bus/pci/devices/:03:00.0/resource has a > 64-bit address, ignoring > >03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] > (rev 20) > > I also found out that on this machine the HCA is plugged into a 16X PCI-e > slot, which is different than the other machine which is working, where the > slot is 8X. > > Bear in mind, however, that both machines were previously working with the > 2.6.9-34 kernel with the backport patches and the OpenIB svn 6500 code. Did > something happen in 2.6.16, or am I missing a patch? > > -Don Albert- > Could you please give more detail on the exact system that had/has this problem? Model, chipset revision, full lspci -v output, etc. Also, is there some way to log in to such a system remotely? Thanks a bunch, -- MST
[openib-general] [PATCH] IB: Make needlessly global ib_mad_cache static
Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
---
Any reason not to apply this?

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 5ad41a6..92c7362 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -45,8 +45,7 @@ MODULE_DESCRIPTION("kernel IB MAD API");
 MODULE_AUTHOR("Hal Rosenstock");
 MODULE_AUTHOR("Sean Hefty");
 
-
-kmem_cache_t *ib_mad_cache;
+static kmem_cache_t *ib_mad_cache;
 
 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index b4fa28d..d147f3b 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -212,8 +212,6 @@ struct ib_mad_port_private {
        struct ib_mad_qp_info qp_info[IB_MAD_QPS_CORE];
 };
 
-extern kmem_cache_t *ib_mad_cache;
-
 int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr);
 
 struct ib_mad_send_wr_private *
[openib-general] Re: ib_mthca fails to load with old firmware
Michael> Hmm. There have been recent reports on configurations
Michael> which have trouble working with fw_cmd_doorbell=1, and
Michael> not all of them old FW. I never saw this in the lab.

Michael> Roland, should we change fw_cmd_doorbell to 0 by default,
Michael> until we figure out what is going on?

Yes, it's looking like that option is causing problems.  I will put a
patch changing the default to 0 into 2.6.17.

 - R.
[openib-general] Re:
Or> The impression i was getting from those responses and the luck
Or> of others is that (say) almost no one of the openib
Or> maintainers test infiniband with the "next" kernel which is
Or> not released yet.

Yes, I agree.  That's why I think we should get rid of the
"linux-kernel" part of the svn tree entirely, because everyone who
wants to test new code seems to run the last stable kernel plus svn
drivers instead of the new development kernel.

 - R.
Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Or> Can you spare few words whats the difference between the
Or> for-2.6.18 and for-mm branches of your git tree?

for-mm is what Andrew pulls to get patches for -mm.  It has things that
I think should be seen in -mm, but which I am not ready to queue in
for-2.6.18.  You can use git show-branch or gitk to visualize exactly
how the branches relate.

Or> When you say the code is pushed into master.kernel.org are you
Or> referring to the mm tree of Andrew Morton? i don't see he has
Or> one under kernel.org/git?

No, I mean it's in my tree on master.kernel.org, rather than just
sitting on my local hard disk.

 - R.
[openib-general] (was: slab error while removing ib_mad) testing IB of a kernel before its release
Roland Dreier wrote:
> Or> I think you were on vacation when i posted this, there were
> Or> two responses saying they were not able to reproduce it, but
> Or> no one was trying 2.6.17-X
>
> Not sure why you expect me to solve this -- other than the fact that I
> am a great debugger ;)

Let me clarify a little: the test case itself (probing out a module
loaded by the PCI hotplug subsystem) is fairly rare and is not the issue
(I do it when replacing the IB stack with newer code).

When I posted the original report, I got responses from two people, both
saying they had tried it with some flavor of the current stable kernel
(2.6.16) and that the problem did not reproduce (sure...). The impression
I got from those responses, and from the lack of others, is that almost
none of the openib maintainers test InfiniBand with the "next" kernel,
which is not released yet. And if we don't test it, we surely can't
expect it to work.

That's the point I wanted to make later in that thread. I chose to defer
this discussion until you were around, which is why I am writing it only
now.

Or.
[openib-general] The AF_INET_RDS value
Hello,

I noticed that in ulp/rds/rds_inet.h, the value for AF_INET_RDS is:

#define AF_INET_RDS 30

But in include/linux/socket.h, AF_TIPC already has the same value:

#define AF_WANPIPE      25      /* Wanpipe API Sockets */
#define AF_LLC          26      /* Linux LLC                    */
#define AF_TIPC         30      /* TIPC sockets                 */
#define AF_BLUETOOTH    31      /* Bluetooth sockets            */
#define AF_MAX          32      /* For now.. */

Just wondering whether the AF_INET_RDS value should be changed.

Thanks.
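A clash like this can be caught mechanically by comparing the private constant with the families the system headers already define. A minimal sketch, assuming the value 30 quoted from rds_inet.h and a libc whose <sys/socket.h> defines AF_TIPC (glibc on Linux does); the function name is hypothetical and not part of the RDS code:

```c
#include <assert.h>
#include <sys/socket.h>

/* Value quoted from ulp/rds/rds_inet.h in the message. */
#define AF_INET_RDS 30

/* Hypothetical check: returns 1 if AF_INET_RDS equals the AF_TIPC value
 * from the system headers, 0 if it differs or AF_TIPC is not defined. */
int rds_family_collides_with_tipc(void)
{
#ifdef AF_TIPC
    return AF_INET_RDS == AF_TIPC;
#else
    return 0;
#endif
}

/* The same comparison can be turned into a hard build failure in a header:
 *   #if defined(AF_TIPC) && AF_INET_RDS == AF_TIPC
 *   #error "AF_INET_RDS collides with AF_TIPC"
 *   #endif
 */
```

The #error form is usually preferable in a header, since the collision is then reported at build time rather than when a socket() call unexpectedly reaches the wrong protocol family.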
RE: [openib-general] OFED RC4 also can't support >2000 connections
Hi Zhu,

If you are using libsdp.conf to select which ports should map to SDP and which to TCP, you might run out of resources for tracking the opened sockets. Try increasing the following constant in libsdp (libsdp/src/port.c, line 48):

#define MAPPED_SOCKET_MAX 1024

to something like:

#define MAPPED_SOCKET_MAX 1

Or, if you can use SDP sockets only (your config file is empty anyway):

SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so squid -d 10 -f squid.conf
SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129

Hope this fixes the issue you see.

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of zhu shi song
> Sent: Wednesday, May 17, 2006 3:17 PM
> To: openib-general@openib.org
> Subject: [openib-general] OFED RC4 also can't support >2000 connections
>
> I have installed OFED RC4 on my RHEL 4.3(2.6.9-34
> kernel). I use the same method I told in previous
> mail. When increasing concurrent sdp connection to
> 2000. sdp refuse connection in server side. And client
> can't connect to server through sdp connection
> forever.
>
> OS: RHEL 4.3 (2.6.9-34)
> IB: OFED RC4
> Test Method:
> Server: LD_PRELOAD=libsdp.so squid -d 10 -f
> squid.conf( sdp listening on IB0: 193.12.10.14:3129)
> Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n
> 2000 -X 193.12.10.14:3129
> http://www.google.com/index.html ( IB0: 193.12.10.24)
>
> Who know what's wrong with sdp many concurrent
> connections? I have bought the cards for about 3
> weeks, but I can't make them work correctly. Urgent!
>
> tks
> zhu
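To see why a fixed constant can cap the number of connections, consider a tracking table with a compile-time capacity. The sketch below is a hypothetical model, not libsdp's actual code: only the name MAPPED_SOCKET_MAX and its current value 1024 come from the message, while struct socket_map and socket_map_add() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity taken from the value quoted for libsdp/src/port.c. */
#define MAPPED_SOCKET_MAX 1024

/* Hypothetical fixed-size table of descriptors redirected to SDP. */
struct socket_map {
    int fds[MAPPED_SOCKET_MAX];
    size_t count;
};

/* Record fd in the table; returns 0 on success, -1 once the table is full,
 * which is the "out of tracking resources" case described above. */
int socket_map_add(struct socket_map *map, int fd)
{
    if (map->count >= MAPPED_SOCKET_MAX)
        return -1;
    map->fds[map->count++] = fd;
    return 0;
}
```

In a model like this, a 2000-connection benchmark exhausts the table long before reaching its target, which is consistent with the two suggestions above: raise the constant, or bypass the per-port mapping entirely with SIMPLE_LIBSDP=1.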
[openib-general] Re: Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> When you say the code is pushed into master.kernel.org are you referring
> to the mm tree of Andrew Morton? i don't see he has one under
> kernel.org/git?

Andrew does not use git for development.

--
MST
Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
Roland Dreier wrote:
> Or> I don't see either of the two iscsi updates for 2.6.18
> Or> (both sent by Mike Christie) in your git tree; I was looking
> Or> for them all over (in the for-2.6.18, for-mm, master and for-linus
> Or> branches ...). Am I missing anything, or were you waiting for
> Or> my repost of the patches to pull the iscsi updates?
>
> Yeah, I haven't pushed it out yet.  I will be putting iSER into an
> iser branch of my tree, which I'll ask Linus to pull once the SCSI
> changes are in his tree.

OK, I have tested iSCSI/iSER with a kernel built from the for-mm branch
of your git tree and it works fine!

Can you spare a few words on the difference between the for-2.6.18 and
for-mm branches of your git tree?

> Or> OK, thanks. Let me know when you have the branch, so I will be
> Or> able to test it with this exact code configuration.
>
> It's there and pushed to master.kernel.org

When you say the code is pushed into master.kernel.org, are you referring
to the mm tree of Andrew Morton? I don't see that he has one under
kernel.org/git.

Or.
[openib-general] OFED RC4 also can't support >2000 connections
I have installed OFED RC4 on my RHEL 4.3 (2.6.9-34 kernel) and used the same method I described in my previous mail. When I increase the number of concurrent SDP connections to 2000, SDP refuses connections on the server side, and the client can never connect to the server through SDP.

OS: RHEL 4.3 (2.6.9-34)
IB: OFED RC4
Test method:
  Server: LD_PRELOAD=libsdp.so squid -d 10 -f squid.conf (SDP listening on IB0: 193.12.10.14:3129)
  Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129 http://www.google.com/index.html (IB0: 193.12.10.24)

Does anyone know what goes wrong with SDP under many concurrent connections? I bought the cards about 3 weeks ago, but I can't make them work correctly. Urgent!

tks
zhu
[openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
On Wed, 2006-05-17 at 07:41, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be
> > compiled out
> >
> > OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
> >
> > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
> >
> > @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb(
> >      lid.
> >   */
> >   /* For now - do not add the alternate dr path to the release */
> > -  if (0)
> > -    if ( p_madw->mad_addr.dest_lid != 0x )
> > +#if 0
> > +  if ( p_madw->mad_addr.dest_lid != 0x )
>
> In my experience, if you compile with -O, gcc does a good enough job of
> dead code elimination.

But not all builds are compiled that way.

-- Hal
[openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be
> compiled out
>
> OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
>
> Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
> @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb(
>      lid.
>   */
>   /* For now - do not add the alternate dr path to the release */
> -  if (0)
> -    if ( p_madw->mad_addr.dest_lid != 0x )
> +#if 0
> +  if ( p_madw->mad_addr.dest_lid != 0x )

In my experience, if you compile with -O, gcc does a good enough job of
dead code elimination.

--
MST
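The two forms differ in more than the generated code. Code under `if (0)` is still parsed and type-checked, so everything it references must be declared; the optimizer then typically drops it. Code under `#if 0` is removed by the preprocessor and never reaches the compiler at all. That is presumably also why the patch wraps the `p_physp` declaration in `#if 0`: once its only uses are preprocessed away, keeping the declaration would draw an unused-variable warning under -Wall. A minimal illustration; `dead_code_demo()` and `undeclared_helper()` are hypothetical names, not OpenSM code:

```c
#include <assert.h>

int dead_code_demo(int lid)
{
    int taken = 0;

    if (0) {
        /* Never executed, but still compiled: may only use declared names. */
        taken = lid + 1;
    }

#if 0
    /* Stripped by the preprocessor; undeclared_helper() is deliberately
     * fictitious and would be a compile error outside this block. */
    taken = undeclared_helper(lid);
#endif

    return taken;   /* always 0: neither dead branch ever runs */
}
```

So `if (0)` keeps dead code compile-checked (useful while it may be revived), whereas `#if 0` lets it rot silently but guarantees it is absent regardless of optimization level.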
[openib-general] [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

Index: opensm/osm_sm_mad_ctrl.c
===================================================================
--- opensm/osm_sm_mad_ctrl.c	(revision 7202)
+++ opensm/osm_sm_mad_ctrl.c	(working copy)
@@ -803,7 +803,9 @@ __osm_sm_mad_ctrl_send_err_cb(
   IN osm_madw_t *p_madw )
 {
   osm_sm_mad_ctrl_t* p_ctrl = (osm_sm_mad_ctrl_t*)bind_context;
+#if 0
   osm_physp_t* p_physp;
+#endif
   ib_api_status_t status;
   ib_smp_t* p_smp;
 
@@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb(
      lid.
   */
   /* For now - do not add the alternate dr path to the release */
-  if (0)
-    if ( p_madw->mad_addr.dest_lid != 0x )
+#if 0
+  if ( p_madw->mad_addr.dest_lid != 0x )
+  {
+    p_physp =
+      osm_get_physp_by_mad_addr(p_ctrl->p_log,
+                                p_ctrl->p_subn,
+                                &(p_madw->mad_addr));
+    if (!p_physp)
     {
-      p_physp =
-        osm_get_physp_by_mad_addr(p_ctrl->p_log,
-                                  p_ctrl->p_subn,
-                                  &(p_madw->mad_addr));
-      if (! p_physp)
-      {
-        osm_log( p_ctrl->p_log, OSM_LOG_ERROR,
-                 "__osm_sm_mad_ctrl_send_err_cb: ERR 3114: "
-                 "Failed to find the corresponding phys port\n");
-      }
-      else
-      {
-        osm_physp_replace_dr_path_with_alternate_dr_path(
-          p_ctrl->p_log, p_ctrl->p_subn, p_physp, p_madw->h_bind );
-      }
+      osm_log( p_ctrl->p_log, OSM_LOG_ERROR,
+               "__osm_sm_mad_ctrl_send_err_cb: ERR 3114: "
+               "Failed to find the corresponding phys port\n");
     }
+    else
+    {
+      osm_physp_replace_dr_path_with_alternate_dr_path(
+        p_ctrl->p_log, p_ctrl->p_subn, p_physp, p_madw->h_bind );
+    }
+  }
+#endif
 
   /*
     An error occurred.  No response was received to a request MAD.
RE: [openib-general] opensm segfault?
On Wed, 2006-05-17 at 02:10, Eitan Zahavi wrote:
> cl_memcpy should have some debug capabilities on top of memcpy.

I don't see any. Did I miss something?

..

> cl memory management provide means to track all memory allocations, etc.

Yes, there is extra memory tracking code for malloc and free. This is a
separable item in my mind right now.

-- Hal

> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:openib-general-
> > [EMAIL PROTECTED] On Behalf Of Sasha Khapyorsky
> > Sent: Wednesday, May 17, 2006 2:11 AM
> > To: Troy Benjegerdes
> > Cc: openib-general@openib.org
> > Subject: Re: [openib-general] opensm segfault?
> >
> > Hi Troy,
> >
> > On 14:41 Tue 16 May , Troy Benjegerdes wrote:
> > > I got this after an indeterminate amount of time running opensm..
> >
> > Is this reproducible? Or is it a completely random failure?
> >
> > > (gdb) bt
> > > #0  0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
> > >     count=64) at cl_memory_osd.c:87
> > > #1  0x00415053 in osm_pkey_tbl_sync_new_blocks (
> > >     p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
> > > #2  0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> > >     at osm_pkey_mgr.c:407
> > > #3  0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> > >     signal=3) at osm_state_mgr.c:2243
> > > #4  0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
> > >     context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> > > #5  0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> > >     at cl_dispatcher.c:108
> > > #6  0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> > >     at cl_threadpool.c:78
> > > #7  0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> > >     cl_thread.c:61
> > > #8  0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> > > #9  0x2b90b12c8273 in clone () from /lib/libc.so.6
> > >
> > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> > > just seems like excessive unneeded abstraction.
> >
> > Absolutely agree with you.
> >
> > Sasha.
> >
> > > I'm running opensm from subversion rev 7091..
> > >
> > > May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn 6251:7091M
> > >
> > > the only local changes are as follows:
> > >
> > > [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff
> > > Index: osm/opensm/osm_port_info_rcv.c
> > > ===================================================================
> > > --- osm/opensm/osm_port_info_rcv.c (revision 7091)
> > > +++ osm/opensm/osm_port_info_rcv.c (working copy)
> > > @@ -469,9 +469,14 @@
> > >        goto Exit;
> > >      }
> > >
> > > +#if 0
> > >      /* Check for IBM eHCA firmware defect in reporting partition
> > >       * enforcement cap */
> > >      if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == IBM_VENDOR_ID)
> > >        p_switch->switch_info.enforce_cap = 0;
> > > +#endif
> > > +    /* Check for busted divergenet switch on ameslab network */
> > > +    if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152)
> > > +      p_switch->switch_info.enforce_cap = 0;
> > >
> > >      /* Bail out if this is a switch with no partition enforcement
> > >       * capability */
> > >      if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)
Re: [openib-general] OFED-1.0-rc4 need db-devel
Zhu,

The db-devel package is required to build the open_iscsi package RPM. This package is not relevant for RHEL 4.3.

There are two options to install OFED-1.0-rc4 on RHEL 4.3 without open_iscsi:

1. Select "Custom installation" and don't choose to install open_iscsi.
2. Edit ofed.conf (created automatically under the OFED-1.0-rc4 directory when you run install.sh or build.sh) and set *open_iscsi=n*. Then run:
   ./install.sh -c ofed.conf

Regards,
Vladimir

zhu shi song wrote:
> I have downloaded OFED-1.0-rc4 for my RHEL 4.3. But I can't build all
> modules because it needs db-devel. RHEL 4.3 just have db4-devel there is
> no db-devel. Is there anything I don't know?
>
> tks
> zhu
[openib-general] OFED-1.0-rc4 need db-devel
I have downloaded OFED-1.0-rc4 for my RHEL 4.3, but I can't build all modules because it needs db-devel. RHEL 4.3 only has db4-devel; there is no db-devel. Is there something I'm missing?

tks
zhu