date:20060517

Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Or Gerlitz


Roland Dreier wrote:

> Or> Sure, thanks for the clarification. As for the CMA merge, you
> Or> prefer to be on the safe side and do it **before** and not
> Or> **with** iSER?
>
> Yes, that's what I'm planning on.


Sure, better safe than sorry is a good habit! It's just that this short
two-week time frame for three serialized pushes (iscsi && cma -> iser)
worries me a little. I guess there's nothing we can do about it.


Or.

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[openib-general] Re: [resend][RFC][PATCH] adding call to madvise

2006-05-17 Thread Gleb Natapov

On Thu, May 18, 2006 at 07:24:27AM +0300, Michael S. Tsirkin wrote:
> Quoting r. Gleb Natapov <[EMAIL PROTECTED]>:
> > @@ -187,8 +194,8 @@ int ibv_lock_range(void *base, size_t si
> >  
> >  
> > 	if (node->refcnt++ == 0) {
> > -		ret = mlock((void *) node->start,
> > -			    node->end - node->start + 1);
> > +		ret = madvise((void *) node->start,
> > +			      node->end - node->start + 1, MADV_DONTFORK);
> > 		if (ret)
> > 			goto out;
> > 	}
> 
> Will this break libibverbs on older kernels that don't have madvise?
> Maybe test MADV_DONTFORK during library startup and set a flag?
> 
madvise is always there, but older kernels will return EINVAL, and we
don't check the return value of ibv_lock_range() in ibv_reg_mr(), so no harm
is done. It is possible to test for MADV_DONTFORK support during libibverbs
init and disable all madvise paths if it is not available, but then we
would have two different configurations to test for not much gain.

--
Gleb.
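
The startup test discussed above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not code from
libibverbs: probe_madv_dontfork() is a hypothetical helper name, and the
sketch assumes a Linux system where madvise() and anonymous mmap() are
available. It maps one scratch page, tries MADV_DONTFORK on it, and
reports whether the kernel accepted the advice.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libibverbs): returns 1 if the running
 * kernel accepts MADV_DONTFORK, 0 otherwise.
 */
int probe_madv_dontfork(void)
{
#ifdef MADV_DONTFORK
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int ret;

	assert(page > 0);

	/* Scratch page used only for the probe. */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 0;

	/* Kernels without MADV_DONTFORK are expected to fail with EINVAL. */
	ret = madvise(p, (size_t)page, MADV_DONTFORK);
	munmap(p, (size_t)page);
	return ret == 0;
#else
	/* Headers too old to define the flag: treat as unsupported. */
	return 0;
#endif
}
```

A library could call such a probe once at init and store the result in a
flag. As noted in the reply above, that creates a second code path to
test, which is the argument for simply ignoring EINVAL instead.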

Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Roland Dreier

Or> Sure, thanks for the clarification. As for the CMA merge, you
Or> prefer to be on the safe side and do it **before** and not
Or> **with** iSER?

Yes, that's what I'm planning on.

Or> Do you have any estimate when the 2.6.18 merge window opens?

Right after 2.6.17 is released.

 - R.

Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Or Gerlitz


Roland Dreier wrote:

> Or> Just to make sure... does "not ready to queue in for-2.6.18"
> Or> wrt iSER relate to the dependency on the 2.6.18 iSCSI updates
> Or> (as the CMA can [should ?] be pushed with iSER), or do you see
> Or> any further issues which need to be fixed before the code is
> Or> ready?
>
> The only issue is that iser can't be merged until both James Bottomley
> and I have merged other stuff upstream first.


Sure, thanks for the clarification. As for the CMA merge, you prefer to 
be on the safe side and do it **before** and not **with** iSER?


Do you have any estimate when the 2.6.18 merge window opens?

Or.



Re: [openib-general] Re: testing IB with unreleased kernels

2006-05-17 Thread Or Gerlitz


Grant Grundler wrote:

> On Wed, May 17, 2006 at 07:40:07AM -0700, Roland Dreier wrote:
>
> > Yes, I agree.  That's why I think we should get rid of the
> > "linux-kernel" part of the svn tree entirely.  Because everyone who
> > wants to test new code seems to run last stable kernel + svn drivers
> > instead of the new development kernel.
>
> That's because openib guarantees SVN drivers will build with the last
> stable kernel. Change that policy and document the steps
> that folks should follow. I'd be willing to occasionally try
> newer kernels if you think that's what we should be doing.


Please note that neither approach suggested above will force anyone to
test the latest IB code with the under-development kernel...


This is because most of the code (specifically the code already in-tree)
needs zero backporting to the latest stable kernel. For example, the kernel
portion of OFED, which is targeted for 2.6.16, is based on the for-2.6.17
branch of Roland's GIT tree (except for the components not there yet, which
are checked out from SVN), but OFED is not tested with (does not support)
2.6.17-rcX.


The same "trick" would also work with Grant's approach.

So there's no replacement for testing done at least by the openib
maintainers (and distros!!! when they start moving to IB...) for:


1) next-kernel-RC-versions downloaded from kernel.org (e.g. 2.6.17-rcX)
2) next-next-kernel-branches of infiniband.git (Roland's tree)

Of course people are busy, and testing is derived from needs. For
example, the iSER maintainers (...) are now testing with what's closest to
2.6.18, and I guess the ipath maintainers are testing with 2.6.17-rc4.


But at some point in the cycle, it's a must that each maintainer
test his/her code with next-kernel-RC-versions from kernel.org.


Or.



Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

2006-05-17 Thread Dave Olson

On Wed, 17 May 2006, Dave Olson wrote:

| On Wed, 17 May 2006, Roland Dreier wrote:
| 
| | Am I understanding correctly that you see a hang or watchdog timeout
| | even with the mthca driver?
| 
| Yes.   That is, the symptoms are the same, although the cause
| may be different.
| 
| | Is there any possibility of posting the test case to reproduce this?
| 
| It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
| to do messaging rate), running 8 copies per dual-core 4-socket opteron,
| both on InfiniPath MPI, and MVAPICH (built for gen2).

Here's the typical case where the watchdog fires (with infinipath MPI),
on FC4 2.6.16 2108 (without kprobes, with kprobes things are slightly
different, but not much; I'm running without since we were often in
the kprobes code from the exit code, but I think that's just a red-herring).

The sysrq p was some seconds prior to the watchdog.  It's almost as
though something is looping far too many times during the close cleanup.

The other 7 exiting processes are typically in
sys_exit_group -> do_exit -> __up_read -> __spin_lock_irqsave ->
__up_read (or __down_read)
(from what sysrq t prints).  They are all runnable on the other 7
processors.

The infinipath driver does mmap both memory and device pages for each of
these processes.

SysRq : Show Regs
CPU 0:   
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) 
ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) 
exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) 
battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) 
dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) 
libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23788, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1  
 
RIP: 0010:[] {__do_softirq+81}
RSP: 0018:8048d368  EFLAGS: 0206  
RAX: 0022 RBX: 0022 RCX: 0080
RDX:  RSI: 00c0 RDI: 81007f1fd0c0
RBP: 80528f80 R08: 0200 R09: 0002
R10: 804a6a38 R11:  R12: 80577c80
R13:  R14: 000a R15: 2aaabba6c000
FS:  2b32ffa0() GS:80511000() knlGS:f7fc86c0
CS:  0010 DS:  ES:  CR0: 8005003b   
CR2: 5565ebe8 CR3: 7ac6d000 CR4: 06e0
 
Call Trace:  {call_softirq+30}
   {do_softirq+44} 
{apic_timer_interrupt+132} 
   {_write_unlock_irq+14} 
{__set_page_dirty_nobuffers+183}
   {unmap_vmas+1042} {exit_mmap+124}
   {mmput+37} {do_exit+584} 
   {__dequeue_signal+459} 
{sys_exit_group+0}
   {get_signal_to_deliver+1568} 
{do_signal+116}
   {__pollwait+0} {sys_select+934}  
   
   {sysret_signal+28} 
{ptregscall_common+103}

[ perhaps 20 or 30 seconds later, NMI fires; we had already been sort of
stuck for 60 seconds or so when I did the sysrq p above ]

NMI Watchdog detected LOCKUP on CPU 1   
 
CPU 1
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) 
ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) 
exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) 
battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) 
dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) 
libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23789, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1  
 
RIP: 0010:[] {_raw_write_lock+161}
RSP: 0018:81007c5b5c18  EFLAGS: 0086  
RAX: 8f02e600 RBX: 810037cec680 RCX: 002c2671
RDX: 00927190 RSI: 0001 RDI: 810037cec680
RBP: 810037cec668 R08: 810002d6b500 R09: fffa
R10: 0003 R11: 80165922 R12: 810037cec680
R13: 2c20 R14: 810002d6b540 R15: 2aaabba6c000
FS:  2aae6080() GS:81011fc466c0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b   
CR2: 0033f38bdaf0 CR3: 7c296000 CR4: 06e0
Process mpi_multibw (pid: 23789, threadinfo 81007c5b4000, task 
8100030557a0)
Stack: 810002d6b540 8016596b 75ad5067 2c1b4000  

   81007d451da0 8016cc80  81007c5b5d38 
       
Call Trace: {__set_page_dirty_nobuffers+73}
   {unmap_vmas+1042} {exit_mmap+124}
   {mmput+37} {do_exit+584} 
   {__dequeue_signal+459} 
{sys_exit_group+0}
   {get_signal_to_deliver+1568} 
{do_s

Re: [openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure

2006-05-17 Thread Dotan Barak

On Wednesday 17 May 2006 21:25, Sean Hefty wrote:
> Dotan Barak wrote:
> > Added checks to the return values of all of the functions that may fail
> > (in order to add this test to the regression system).
> 
> Thanks - applied with one minor change.
> 
> > +   int rc;
> 
> Changed 'rc' to 'ret' to match the rest of the code.
> 
> - Sean
> 

Great, thanks (next time I will pay attention to this issue).

Dotan

Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

2006-05-17 Thread Roland Dreier

Bryan> Wow.  I have no idea where that extra "goto bail" came
Bryan> from.  It's not supposed to be there.

Even without it you still leak the work structure, because there's no
schedule_work().

Now that I look at it, in uverbs_mem.c, the mm will be leaked if the
kmalloc fails...

 - R.

Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Roland Dreier

Or> Just to make sure... does "not ready to queue in for-2.6.18"
Or> wrt iSER relate to the dependency on the 2.6.18 iSCSI updates
Or> (as the CMA can [should ?] be pushed with iSER), or do you see
Or> any further issues which need to be fixed before the code is
Or> ready?

The only issue is that iser can't be merged until both James Bottomley
and I have merged other stuff upstream first.

 - R.

Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

2006-05-17 Thread Bryan O'Sullivan

On Wed, 2006-05-17 at 21:55 -0700, Roland Dreier wrote:

> So with the "goto bail" you skip the code which does something with
> the work you allocate, which means that you leak not only the work
> structure but also the reference to the task's mm that you took.

Wow.  I have no idea where that extra "goto bail" came from.  It's not
supposed to be there.


Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Or Gerlitz


Roland Dreier wrote:

> Or> Can you spare few words whats the difference between the
> Or> for-2.6.18 and for-mm branches of your git tree?
>
> for-mm is what Andrew pulls to get patches for -mm.  It has things
> that I think should be seen in -mm, but which I am not ready to queue
> in for-2.6.18.  You can use git show-branch or gitk to visualize
> exactly how the branches relate.


Just to make sure... does "not ready to queue in for-2.6.18" wrt iSER
relate to the dependency on the 2.6.18 iSCSI updates (as the CMA can
[should ?] be pushed with iSER), or do you see any further issues which
need to be fixed before the code is ready?


I will try git show-branch, thanks.


> Or> When you say the code is pushed into master.kernel.org are you
> Or> referring to the mm tree of Andrew Morton? i don't see he has
> Or> one under kernel.org/git?
>
> No, I mean it's in my tree on master.kernel.org, rather than just
> sitting on my local hard disk.


OK, thanks for the clarification.

Or.


RE: [openib-general] OFED RC4 also can't support >2000 connections

2006-05-17 Thread zhu shi song

After executing the command 'SIMPLE_LIBSDP=1
LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X
193.12.10.14:3129', there are still some problems:
 (1) Sometimes the server (running squid) gets a kernel
panic.
 (2) The client never connects to the server successfully.  If
I use ttcp.aio to test, the error on the client
is:
  [EMAIL PROTECTED] ~]# ./ttcp.aio -t 193.12.10.14
ttcp-t: buflen = 8192 nbuf = 2048 align = 16384/0 port
= 5001  193.12.10.14
ttcp-t: socket
ttcp-t: connect: Cannot allocate memory
errno=12
[EMAIL PROTECTED] ~]#

 How can I solve the problem?
  Thanks,
  zhu



--- Eitan Zahavi <[EMAIL PROTECTED]> wrote:

> Hi Zhu,
> 
> If you are using libsdp.conf to select which ports
> should map to SDP and
> which to TCP you might run out of resources for
> tracking the opened
> sockets. 
> 
> Try increasing the following constant in libsdp:
> libsdp/src/port.c line 48:
> #define MAPPED_SOCKET_MAX 1024
> to something like:
> #define MAPPED_SOCKET_MAX 1
> 
> Or, if you can use SDP sockets only (your config
> file is empty anyway):
> SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so squid -d 10 -f
> squid.conf
> SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000
> -n 2000 -X
> 193.12.10.14:3129
> 
> Hope this fixes the issue you see
> 
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:openib-general-[EMAIL PROTECTED]
> > On Behalf Of zhu shi song
> > Sent: Wednesday, May 17, 2006 3:17 PM
> > To: openib-general@openib.org
> > Subject: [openib-general] OFED RC4 also can't support >2000 connections
> > 
> > I have installed OFED RC4 on my RHEL 4.3(2.6.9-34
> > kernel). I use the same method I told in previous
> > mail.  When increasing concurrent sdp connection to
> > 2000. sdp refuse connection in server side. And client
> > can't connect to server through sdp connection
> > forever.
> > 
> > OS: RHEL 4.3 (2.6.9-34)
> > IB: OFED RC4
> > Test Method:
> > Server: LD_PRELOAD=libsdp.so squid -d 10 -f
> > squid.conf( sdp listening on IB0: 193.12.10.14:3129)
> > Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n
> > 2000 -X 193.12.10.14:3129
> > http://www.google.com/index.html ( IB0: 193.12.10.24)
> > 
> > 
> > Who know what's wrong with sdp many concurrent
> > connections?  I have bought the cards for about 3
> > weeks, but I can't make them work correctly. Urgent!
> > 
> > tks
> > zhu
> > 
> > 
> > 
> > 
> > __
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam
> > protection around
> > http://mail.yahoo.com
> > 



Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

2006-05-17 Thread Roland Dreier

Dave> We did discover one possible problem today, which is shared
Dave> between our device code and the core openib code, and that's
Dave> doing some memory freeing and accounting from a work thread
Dave> (updating mm->locked_vm and cleaning up from earlier
Dave> get_user_pages); the code in our driver was copied from the
Dave> openib core code, it's not literally shared.

Dave> I have a strong suspicion that at least sometimes, it's
Dave> executing after the current->mm has gone away.  I'm looking
Dave> at that more right now.

It doesn't seem likely to me.  In uverbs_mem.c,
ib_umem_release_on_close() does get_task_mm() and gives up if it can't
take a reference to the task's mm.  The mmput() doesn't happen until
ib_umem_account() runs in the work thread.

I do see obvious bugs in ipath_user_pages.c, though.  In
ipath_release_user_pages_on_close(), you have:

	mm = get_task_mm(current);
	if (!mm)
		goto bail;

	work = kmalloc(sizeof(*work), GFP_KERNEL);
	if (!work)
		goto bail_mm;

	goto bail;

	INIT_WORK(&work->work, user_pages_account, work);
	work->mm = mm;
	work->num_pages = num_pages;

bail_mm:
	mmput(mm);
bail:
	return;

So with the "goto bail" you skip the code which does something with
the work you allocate, which means that you leak not only the work
structure but also the reference to the task's mm that you took.

Even without the "goto bail" the code still wouldn't actually schedule
the work, so the work structure would be leaked, although you would do
mmput().

I'm not sure what you were trying to do here.

 - R.
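
The ownership rules described in the message above can be modelled in
plain userspace C. This is a rough sketch and not the ipath or uverbs
code: every name in it (mm_model, account_work, model_schedule and so
on) is a hypothetical stand-in, malloc() replaces kmalloc(), and the
"scheduled" work simply runs immediately. The point it illustrates is
that the failure path must drop the mm reference, while the success path
must hand both the work item and the reference to the deferred handler.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel objects; all names are hypothetical. */
struct mm_model {
	int users;               /* reference count on the address space */
	unsigned long locked_vm; /* pages accounted as locked */
};

struct account_work {
	struct mm_model *mm;
	unsigned long num_pages;
};

/* Take a reference, mirroring the role of get_task_mm(). */
struct mm_model *model_get_mm(struct mm_model *mm)
{
	if (!mm)
		return NULL;
	mm->users++;
	return mm;
}

/* Drop a reference, mirroring the role of mmput(). */
void model_put_mm(struct mm_model *mm)
{
	assert(mm->users > 0);
	mm->users--;
}

/* Deferred handler: does the accounting, then releases what the work owns. */
void model_account(struct account_work *work)
{
	work->mm->locked_vm -= work->num_pages;
	model_put_mm(work->mm);
	free(work);
}

/* Stand-in for scheduling the work: here it just runs right away. */
void model_schedule(struct account_work *work)
{
	model_account(work);
}

/* Returns 0 if the work was handed off, -1 if nothing was scheduled. */
int release_pages_on_close(struct mm_model *task_mm, unsigned long num_pages)
{
	struct mm_model *mm = model_get_mm(task_mm);
	struct account_work *work;

	if (!mm)
		return -1;

	work = malloc(sizeof(*work));
	if (!work) {
		/* Failure path: drop the reference taken above. */
		model_put_mm(mm);
		return -1;
	}

	work->mm = mm;
	work->num_pages = num_pages;

	/* Success path: the handler now owns both work and the mm reference. */
	model_schedule(work);
	return 0;
}
```

In this model, the reference count returns to its starting value after
every call, whichever path is taken. In the code quoted above that is not
the case: the stray "goto bail" skips both the hand-off and the mmput().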

Re: [openib-general] OpenIB 1.0 RC + PathScale problem

2006-05-17 Thread Bryan O'Sullivan

On Thu, 2006-05-18 at 00:11 -0400, Tim Miller wrote:

> But when I try to run my main application, I get the 
> following error:
> 
> libibverbs: Warning: couldn't load driver
> /usr/local/lib/infiniband/libipathverbs.so: 
> /usr/local/lib/infiniband/libipathverbs.so: undefined symbol: 
> ibv_cmd_poll_cq
> 
> Does anyone know what might cause this error?

No.  We don't see this problem here.

Can you provide some more information, please?  Running ldd
on /usr/local/lib/infiniband/libipathverbs.so would be a good place to
start, so you can see exactly which libibverbs.so is being linked
against.  Also, if you could post the relevant nm output for both
libraries, that would be good.

Thanks,


[openib-general] Re: [resend][RFC][PATCH] adding call to madvise

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Gleb Natapov <[EMAIL PROTECTED]>:
> @@ -187,8 +194,8 @@ int ibv_lock_range(void *base, size_t si
>  
>  
>   if (node->refcnt++ == 0) {
> - ret = mlock((void *) node->start,
> - node->end - node->start + 1);
> + ret = madvise((void *) node->start,
> + node->end - node->start + 1, MADV_DONTFORK);
>   if (ret)
>   goto out;
>   }

Will this break libibverbs on older kernels that don't have madvise?
Maybe test MADV_DONTFORK during library startup and set a flag?

-- 
MST

Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

2006-05-17 Thread Dave Olson

On Wed, 17 May 2006, Roland Dreier wrote:

| Dave> We are seeing a bug (with both our driver native MPI
| Dave> processes and mthca mvapic), where when 8 processes using
| Dave> "simultaneously exit", we get watchdogs and/or hangs in the
| Dave> close routines.  Moving the freeing outside the mutex was an
| Dave> attempt to see if we were running into some VM issues by
| Dave> doing lots of page unlocking and freeing with the mutex
| Dave> held.  It seemed to help somewhat, but not to solve the
| Dave> problem.
| 
| Am I understanding correctly that you see a hang or watchdog timeout
| even with the mthca driver?

Yes.   That is, the symptoms are the same, although the cause
may be different.

| Is there any possibility of posting the test case to reproduce this?

It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
to do messaging rate), running 8 copies per dual-core 4-socket opteron,
both on InfiniPath MPI, and MVAPICH (built for gen2).

We ship the source with our upcoming release, and will probably make
it available outside our release.

We did discover one possible problem today, which is shared between
our device code and the core openib code, and that's doing some 
memory freeing and accounting from a work thread (updating mm->locked_vm
and cleaning up from earlier get_user_pages); the code in our driver
was copied from the openib core code, it's not literally shared.

I have a strong suspicion that at least sometimes, it's executing after
the current->mm has gone away.   I'm looking at that more right now.

| It doesn't seem likely that ipath changes are going to fix a generic
| bug like this...

It wasn't an attempt to fix it, so much as to work around it, while
I worked on other higher priority stuff.   As I mentioned, it also helps
a bit in allowing multiple processes to be in the open and close code
simultaneously, when you have multiple cpus, so even on that basis,
I'd probably leave it as it now is.

Dave Olson
[EMAIL PROTECTED]
http://www.unixfolk.com/dave

[openib-general] OpenIB 1.0 RC + PathScale problem

2006-05-17 Thread Tim Miller


Hi All,

I'm trying to test the 1.0 RC branch from subversion with PathScale 
InfiniPath HT-460 (I've used previous versions with some success). The 
code compiles successfully, and I can run ibv_rc_pingpong and even a 
simple MPI program. But when I try to run my main application, I get the 
following error:


libibverbs: Warning: couldn't load driver
/usr/local/lib/infiniband/libipathverbs.so: 
/usr/local/lib/infiniband/libipathverbs.so: undefined symbol: 
ibv_cmd_poll_cq


Does anyone know what might cause this error? I ran an nm on 
libipathverbs.so and saw ibv_cmd_poll_cq and I found it in the libibverbs 
source, too, so I'm a bit confused about what the root cause of this is.


My apologies if this has already been raised. I took a quick look in the 
archives and did not see anything off hand that matches this.


Thanks,
Tim M.

--
Tim Miller
System Administrator -- Laboratory of Computational Biology
National Institutes of Health   --   Bldg. 50 Rm. 3309-- 301-402-0618

RE: [openib-general] iSER Status

2006-05-17 Thread Dan Bar Dov

Hi Mohit,

Linux kernel 2.6.16 does not include iSER. iSER (initiator only) is scheduled
for kernel 2.6.18.
The iSER initiator code from the openIB trunk is stable, and works with the
open-iscsi initiator.

The iSER target code is the seed of a project aimed at providing an iSCSI/iSER
target. It is in early development. The code itself is stable, but there is no
iSCSI target you can interface it with.
We plan to interface it with the stgt project.

Dan

 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Mohit Katiyar
> Sent: Thursday, May 18, 2006 4:22 AM
> To: openib-general@openib.org
> Subject: [openib-general] iSER Status
> 
> Hi all,
> Can anyone tell me whether the latest stable release of
> Linux (2.6.16.16) contains both iSER initiator and target code or only
> the initiator code? Is the open-iser target code available at
> https://openfabrics.org/svn/gen2/ulps/open-iser-target/ stable or
> not?
> 
> Thanks
> Mohit

[openib-general] iSER Status

2006-05-17 Thread Mohit Katiyar


Hi all,
Can anyone tell me whether the latest stable release of
Linux (2.6.16.16) contains both iSER initiator and target code or only
the initiator code? Is the open-iser target code available at
https://openfabrics.org/svn/gen2/ulps/open-iser-target/ stable or
not?

Thanks
Mohit

Re: [openib-general][PATCH] srp: param sg_tablesize,

2006-05-17 Thread Vu Pham





@@ -1914,6 +1920,11 @@ static int __init srp_init_module(void)
 {
int ret;
 



Thanks. Should we do a check and put some cap on the
srp_sg_tablesize value, i.e.:


+	srp_sg_tablesize = max(1, srp_sg_tablesize);
+	srp_sg_tablesize = min(srp_sg_tablesize, SRP_MAX_SG_TABLESIZE);



+   srp_template.sg_tablesize = srp_sg_tablesize;
+   srp_max_iu_len = (sizeof (struct srp_cmd) +
+ sizeof (struct srp_indirect_buf) +
+ srp_sg_tablesize * 16);
+


 
 	SRP_MAX_LUN		= 512,

-   SRP_MAX_IU_LEN  = 256,
+   SRP_DEF_SG_TABLESIZE= 12,


+   SRP_MAX_SG_TABLESIZE= 128,
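
The capping suggested above, together with the IU length calculation from
the patch, might look roughly like this as standalone C. It is a sketch
under assumptions, not the SRP initiator code: the helper names
srp_clamp_sg_tablesize() and srp_iu_len() are hypothetical, and the
structure sizes are passed in as parameters because struct srp_cmd and
struct srp_indirect_buf are not defined here. The 16 bytes per
descriptor and the two enum constants are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

enum {
	SRP_DEF_SG_TABLESIZE = 12,
	SRP_MAX_SG_TABLESIZE = 128,
};

/* Clamp a user-supplied value into [1, SRP_MAX_SG_TABLESIZE]. */
int srp_clamp_sg_tablesize(int requested)
{
	if (requested < 1)
		return 1;
	if (requested > SRP_MAX_SG_TABLESIZE)
		return SRP_MAX_SG_TABLESIZE;
	return requested;
}

/*
 * cmd_size and indirect_size stand in for sizeof(struct srp_cmd) and
 * sizeof(struct srp_indirect_buf); each table entry adds 16 bytes, as in
 * the patch.
 */
size_t srp_iu_len(size_t cmd_size, size_t indirect_size, int sg_tablesize)
{
	assert(sg_tablesize >= 1 && sg_tablesize <= SRP_MAX_SG_TABLESIZE);
	return cmd_size + indirect_size + (size_t)sg_tablesize * 16;
}
```

Clamping before the length is computed keeps a zero, negative, or very
large module parameter from producing a nonsensical IU length.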


[openib-general] [PATCH] opensm: remove cl_mem* stuff from diags [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines]

2006-05-17 Thread Sasha Khapyorsky

On 01:02 Thu 18 May , Sasha Khapyorsky wrote:
> On 12:14 Wed 17 May , Hal Rosenstock wrote:
> > OpenSM: Use memory routines directly and eliminate cl_mem* routines
> > as these routines are part of ISO C
> > 
> > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
> 
> Following Hal's cleanup

And even more:

This cleans cl_mem*() wrappers from diags sources

Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>


---

 diags/src/saquery.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

d1950d51d8a6ada9b69ed194cd8cc4b2e9aa7902
diff --git a/diags/src/saquery.c b/diags/src/saquery.c
index 5526bff..7c07253 100644
--- a/diags/src/saquery.c
+++ b/diags/src/saquery.c
@@ -42,6 +42,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 
 #define _GNU_SOURCE
 #include 
@@ -203,8 +204,8 @@ get_all_records(osm_bind_handle_t bind_h
osmv_query_req_t  req;
osmv_user_query_t user;
 
-   cl_memclr( &req, sizeof( req ) );
-   cl_memclr( &user, sizeof( user ) );
+   memset( &req, 0, sizeof( req ) );
+   memset( &user, 0, sizeof( user ) );
 
user.attr_id = query_id;
user.attr_offset = attr_offset;
-- 
1.3.2


Re: [openib-general] [PATCH] Replace cl_memory.h by string.h

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 18:20, Sasha Khapyorsky wrote:
> On 15:05 Wed 17 May , Roland Dreier wrote:
> > Just curious -- what's the reason behind changes like:
> > 
> >  > --- a/osm/complib/cl_event_wheel.c
> >  > +++ b/osm/complib/cl_event_wheel.c
> >  > @@ -40,6 +40,7 @@ #  include 
> >  >  #endif /* HAVE_CONFIG_H */
> >  >  
> >  >  #include 
> >  > +#include 
> >  >  #include 
> >  >  #include 
> > 
> > It seems including cl_memory.h in more places is a step backwards, or
> > am I missing the point here?
> 
> > It is needed to provide explicit prototypes for cl_malloc() and
> > cl_free(), which are still in use. I guess this will be removed with
> > the next wave of Hal's cleanup.

Yes, that's what I expect too. When cl_malloc*/cl_free get removed, this
will go away...

-- Hal

> Sasha


[openib-general] [PATCH] opensm: remove unused cl_memory_osd.h [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines]

2006-05-17 Thread Sasha Khapyorsky

On 01:02 Thu 18 May , Sasha Khapyorsky wrote:
> On 12:14 Wed 17 May , Hal Rosenstock wrote:
> > OpenSM: Use memory routines directly and eliminate cl_mem* routines
> > as these routines are part of ISO C
> > 
> > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
> 
> Following Hal's cleanup

And more:

This removes unused cl_memory_osd.h file from complib

Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>


---

 osm/complib/Makefile.am |1 
 osm/include/Makefile.am |1 
 osm/include/complib/cl_memory_osd.h |   79 ---
 3 files changed, 0 insertions(+), 81 deletions(-)
 delete mode 100644 osm/include/complib/cl_memory_osd.h

95ce6332a6531ae1c7dab4060bfa5800e1b8f4ec
diff --git a/osm/complib/Makefile.am b/osm/complib/Makefile.am
index ecbd8e2..809a404 100644
--- a/osm/complib/Makefile.am
+++ b/osm/complib/Makefile.am
@@ -51,7 +51,6 @@ libosmcompinclude_HEADERS = $(srcdir)/..
$(srcdir)/../include/complib/cl_map.h \
$(srcdir)/../include/complib/cl_math.h \
$(srcdir)/../include/complib/cl_memory.h \
-   $(srcdir)/../include/complib/cl_memory_osd.h \
$(srcdir)/../include/complib/cl_memtrack.h \
$(srcdir)/../include/complib/cl_packoff.h \
$(srcdir)/../include/complib/cl_packon.h \
diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am
index c7054ad..b23b1de 100644
--- a/osm/include/Makefile.am
+++ b/osm/include/Makefile.am
@@ -124,7 +124,6 @@ EXTRA_DIST = \
$(srcdir)/opensm/osm_state_mgr_ctrl.h \
$(srcdir)/complib/cl_thread_osd.h \
$(srcdir)/complib/cl_packon.h \
-   $(srcdir)/complib/cl_memory_osd.h \
$(srcdir)/complib/cl_atomic_osd.h \
$(srcdir)/complib/cl_spinlock.h \
$(srcdir)/complib/cl_passivelock.h \
diff --git a/osm/include/complib/cl_memory_osd.h b/osm/include/complib/cl_memory_osd.h
deleted file mode 100644
index 9ef17e0..000
--- a/osm/include/complib/cl_memory_osd.h
+++ /dev/null
@@ -1,79 +0,0 @@
-/*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- *
- * $Id$
- */
-
-
-
-/*
- * Abstract:
- * Defines sized datatypes for Linux Kernel and User mode
- *  exported sizes are int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t
- *  int64_t, uint64_t. uintn_t is a polymorphic type, size is native size and
- *  also size of the pointer.
- *
- * Environment:
- * Linux User and Kernel Mode
- *
- * $Revision: 1.2 $
- */
-
-#ifndef _CL_MEMORY_OSD_H_
-#define _CL_MEMORY_OSD_H_
-
-#include 
-
-#ifdef __cplusplus
-#  define BEGIN_C_DECLS extern "C" {
-#  define END_C_DECLS   }
-#else /* !__cplusplus */
-#  define BEGIN_C_DECLS
-#  define END_C_DECLS
-#endif /* __cplusplus */
-
-BEGIN_C_DECLS
-
-#ifndef __WIN__
-
-static inline uint32_t
-cl_get_pagesize( void )
-{
-   return getpagesize();
-}
-
-#endif
-
-END_C_DECLS
-
-#endif /* _CL_MEMORY_OSD_H_ */
-- 
1.3.2


Re: [openib-general][PATCH] srp: param sg_tablesize,

2006-05-17 Thread Roland Dreier

Thanks, applied in slightly tweaked form as below:

diff-tree 7c0543697efa99b2f1d308c415b0b2f3c0810f74 (from fbd15762bd05491db039ecd0ea57ee5f848759b0)
Author: Vu Pham <[EMAIL PROTECTED]>
Date:   Wed May 17 15:21:41 2006 -0700

IB/srp: Allow sg_tablesize to be adjusted

Make the sg_tablesize used by SRP adjustable at module load time via a
module parameter.  Calculate the corresponding IU length required to
support this.

Signed-off-by: Vu Pham <[EMAIL PROTECTED]>
Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 72b61cd..4dd6e6a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -62,6 +62,13 @@ MODULE_DESCRIPTION("InfiniBand SCSI RDMA
   "v" DRV_VERSION " (" DRV_RELDATE ")");
 MODULE_LICENSE("Dual BSD/GPL");
 
+static int srp_sg_tablesize = SRP_DEF_SG_TABLESIZE;
+static int srp_max_iu_len;
+
+module_param(srp_sg_tablesize, int, 0444);
+MODULE_PARM_DESC(srp_sg_tablesize,
+"Max number of gather/scatter entries per I/O (default is 12)");
+
 static int topspin_workarounds = 1;
 
 module_param(topspin_workarounds, int, 0444);
@@ -311,7 +318,7 @@ static int srp_send_req(struct srp_targe
 
req->priv.opcode= SRP_LOGIN_REQ;
req->priv.tag   = 0;
-   req->priv.req_it_iu_len = cpu_to_be32(SRP_MAX_IU_LEN);
+   req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len);
req->priv.req_buf_fmt   = cpu_to_be16(SRP_BUF_FORMAT_DIRECT |
  SRP_BUF_FORMAT_INDIRECT);
memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16);
@@ -953,7 +960,7 @@ static int srp_queuecommand(struct scsi_
goto err;
 
dma_sync_single_for_cpu(target->srp_host->dev->dev->dma_device, iu->dma,
-   SRP_MAX_IU_LEN, DMA_TO_DEVICE);
+   srp_max_iu_len, DMA_TO_DEVICE);
 
req = list_entry(target->free_reqs.next, struct srp_request, list);
 
@@ -986,7 +993,7 @@ static int srp_queuecommand(struct scsi_
}
 
dma_sync_single_for_device(target->srp_host->dev->dev->dma_device, iu->dma,
-  SRP_MAX_IU_LEN, DMA_TO_DEVICE);
+  srp_max_iu_len, DMA_TO_DEVICE);
 
if (__srp_post_send(target, iu, len)) {
printk(KERN_ERR PFX "Send failed\n");
@@ -1018,7 +1025,7 @@ static int srp_alloc_iu_bufs(struct srp_
 
for (i = 0; i < SRP_SQ_SIZE + 1; ++i) {
target->tx_ring[i] = srp_alloc_iu(target->srp_host,
- SRP_MAX_IU_LEN,
+ srp_max_iu_len,
  GFP_KERNEL, DMA_TO_DEVICE);
if (!target->tx_ring[i])
goto err;
@@ -1436,7 +1443,6 @@ static struct scsi_host_template srp_tem
.eh_host_reset_handler  = srp_reset_host,
.can_queue  = SRP_SQ_SIZE,
.this_id= -1,
-   .sg_tablesize   = SRP_MAX_INDIRECT,
.cmd_per_lun= SRP_SQ_SIZE,
.use_clustering = ENABLE_CLUSTERING,
.shost_attrs= srp_host_attrs
@@ -1914,6 +1920,11 @@ static int __init srp_init_module(void)
 {
int ret;
 
+   srp_template.sg_tablesize = srp_sg_tablesize;
+   srp_max_iu_len = (sizeof (struct srp_cmd) +
+ sizeof (struct srp_indirect_buf) +
+ srp_sg_tablesize * 16);
+
ret = class_register(&srp_class);
if (ret) {
printk(KERN_ERR PFX "couldn't register class infiniband_srp\n");
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index c071c30..033a447 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -56,7 +56,7 @@ enum {
SRP_DLID_REDIRECT   = 2,
 
SRP_MAX_LUN = 512,
-   SRP_MAX_IU_LEN  = 256,
+   SRP_DEF_SG_TABLESIZE= 12,
 
SRP_RQ_SHIFT= 6,
SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT,
@@ -71,9 +71,6 @@ enum {
 };
 
 #define SRP_OP_RECV(1 << 31)
-#define SRP_MAX_INDIRECT   ((SRP_MAX_IU_LEN -  \
- sizeof (struct srp_cmd) - \
- sizeof (struct srp_indirect_buf)) / 16)
 
 enum srp_target_state {
SRP_TARGET_LIVE,
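
The patch above reverses the direction of one calculation: the old code fixed the IU length at 256 and derived the table size from it, while the new code takes the table size from the module parameter and derives the IU length. A minimal sketch of both directions follows. The two header sizes are assumptions for illustration (the driver uses sizeof (struct srp_cmd) and sizeof (struct srp_indirect_buf)), and the function names are hypothetical, not part of ib_srp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Size of one indirect descriptor entry; the patch hard-codes this as 16. */
#define SRP_DESC_SIZE 16

/*
 * Assumed header sizes, for illustration only. The driver uses
 * sizeof (struct srp_cmd) and sizeof (struct srp_indirect_buf);
 * check these placeholders against the real headers.
 */
#define ASSUMED_SRP_CMD_SIZE      48
#define ASSUMED_INDIRECT_BUF_SIZE 20

/* New direction: choose the table size, derive the IU length. */
static size_t iu_len_for_tablesize(size_t sg_tablesize)
{
	return ASSUMED_SRP_CMD_SIZE + ASSUMED_INDIRECT_BUF_SIZE +
	       sg_tablesize * SRP_DESC_SIZE;
}

/* Old direction: fixed IU length, derive the table size (rounds down). */
static size_t tablesize_for_iu_len(size_t iu_len)
{
	assert(iu_len >= ASSUMED_SRP_CMD_SIZE + ASSUMED_INDIRECT_BUF_SIZE);
	return (iu_len - ASSUMED_SRP_CMD_SIZE - ASSUMED_INDIRECT_BUF_SIZE) /
	       SRP_DESC_SIZE;
}
```

Under these assumed sizes, the old fixed length of 256 leaves room for 11 entries, while the new default of 12 entries needs a slightly longer IU. Deriving the length from the table size avoids the rounding loss of the old macro.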

Re: [openib-general] [PATCH] Replace cl_memory.h by string.h

2006-05-17 Thread Sasha Khapyorsky

On 15:05 Wed 17 May , Roland Dreier wrote:
> Just curious -- what's the reason behind changes like:
> 
>  > --- a/osm/complib/cl_event_wheel.c
>  > +++ b/osm/complib/cl_event_wheel.c
>  > @@ -40,6 +40,7 @@ #  include 
>  >  #endif /* HAVE_CONFIG_H */
>  >  
>  >  #include 
>  > +#include 
>  >  #include 
>  >  #include 
> 
> It seems including cl_memory.h in more places is a step backwards, or
> am I missing the point here?

It is needed to provide explicit prototypes for cl_malloc() and cl_free(),
which are still in use. I guess it will be removed with the next wave of
Hal's cleanup.

Sasha

[openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Right now, there is memory
> tracking code implemented.

Doesn't MALLOC_CHECK_ do what you want?

-- 
MST

Re: [openib-general] [PATCH] Replace cl_memory.h by string.h

2006-05-17 Thread Roland Dreier

Just curious -- what's the reason behind changes like:

 > --- a/osm/complib/cl_event_wheel.c
 > +++ b/osm/complib/cl_event_wheel.c
 > @@ -40,6 +40,7 @@ #  include 
 >  #endif /* HAVE_CONFIG_H */
 >  
 >  #include 
 > +#include 
 >  #include 
 >  #include 

It seems including cl_memory.h in more places is a step backwards, or
am I missing the point here?

 - R.

[openib-general] RE: [PATCH2] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug

2006-05-17 Thread Arlin Davis


>-Original Message-
>From: Arlin Davis [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, May 17, 2006 12:17 PM
>To: 'James Lentini'
>Cc: openib-general
>Subject: [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv 
>bug
>
>James,
>
>Fix for uCMA provider to return the correct event as a result of rejects. 
>Also, ran into a segv bug
>with dapl_ep_create when creating without a conn_evd.
>
>Thanks,
>
>-arlin
>
> Signed-off by: Arlin Davis <[EMAIL PROTECTED]>


Sorry, the last patch was wrong. Try again...

-arlin


Signed-off-by: Arlin Davis <[EMAIL PROTECTED]>


Index: dapl/common/dapl_ep_create.c
===
--- dapl/common/dapl_ep_create.c(revision 7299)
+++ dapl/common/dapl_ep_create.c(working copy)
@@ -310,7 +310,10 @@ dapl_ep_create (
  *
  * N.B. This should really be done by a util routine.
  */
-dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count);
+if (connect_evd_handle != DAT_HANDLE_NULL)
+{
+   dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count);
+}
 /* Optional handles */
 if (recv_evd_handle != DAT_HANDLE_NULL)
 {
Index: dapl/openib_cma/dapl_ib_cm.c
===
--- dapl/openib_cma/dapl_ib_cm.c(revision 7299)
+++ dapl/openib_cma/dapl_ib_cm.c(working copy)
@@ -287,14 +287,24 @@ static void dapli_cm_active_cb(struct da
 NULL, conn->ep);
break;
case RDMA_CM_EVENT_REJECTED:
+   {
+   ib_cm_events_t cm_event;
+
+   /* no device type specified so assume IB for now */
+   if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */
+   cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA;
+   else 
+   cm_event = IB_CME_DESTINATION_REJECT;
+
dapl_dbg_log(
DAPL_DBG_TYPE_WARN,
" dapli_cm_active_handler: REJECTED reason=%d\n",   
event->status);
-   dapl_evd_connection_callback(conn, IB_CME_DESTINATION_REJECT,
-NULL, conn->ep);
+   
+   dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep);
+   
break;
-
+   }
case RDMA_CM_EVENT_ESTABLISHED:

dapl_dbg_log(DAPL_DBG_TYPE_CM, 
@@ -383,6 +393,14 @@ static void dapli_cm_passive_cb(struct d
break;
 
case RDMA_CM_EVENT_REJECTED:
+   {
+   ib_cm_events_t cm_event;
+
+   /* no device type specified so assume IB for now */
+   if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */
+   cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA;
+   else 
+   cm_event = IB_CME_DESTINATION_REJECT;
 
dapl_dbg_log(
DAPL_DBG_TYPE_WARN, 
@@ -397,10 +415,11 @@ static void dapli_cm_passive_cb(struct d
&ipaddr->dst_addr)->sin_addr.s_addr),
ntohs(((struct sockaddr_in *)
&ipaddr->dst_addr)->sin_port));
-
-   dapls_cr_callback(conn, IB_CME_DESTINATION_REJECT, 
- NULL, conn->sp);
+   
+   dapls_cr_callback(conn, cm_event, NULL, conn->sp);
+   
break;
+   }
case RDMA_CM_EVENT_ESTABLISHED:

dapl_dbg_log(DAPL_DBG_TYPE_CM, 
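
Both REJECTED cases in the patch apply the same rule: a reject whose status is 28 (IB_CM_REJ_CONSUMER_DEFINED) is reported as a reject carrying private data, and any other status is reported as a plain reject. A minimal sketch of that rule as one helper follows. The enum and function names are hypothetical stand-ins, not the uDAPL types, and like the patch it assumes an IB device.

```c
/*
 * Hypothetical stand-ins for the provider's event type; only the two
 * reject events used by the patch are modelled here.
 */
enum sketch_cm_event {
	SKETCH_DESTINATION_REJECT,
	SKETCH_DESTINATION_REJECT_PRIVATE_DATA
};

/* Reject reason the patch tests for (IB_CM_REJ_CONSUMER_DEFINED). */
#define SKETCH_REJ_CONSUMER_DEFINED 28

/* Map an rdma_cm reject status to the event handed to the callback. */
static enum sketch_cm_event sketch_reject_event(int status)
{
	if (status == SKETCH_REJ_CONSUMER_DEFINED)
		return SKETCH_DESTINATION_REJECT_PRIVATE_DATA;
	return SKETCH_DESTINATION_REJECT;
}
```

Factoring the mapping out this way would let the active and passive callbacks share one copy of the test instead of repeating it.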


[openib-general] Re: multcast join failed

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 15:29, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: multcast join failed
> > 
> > On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > Subject: Re: multcast join failed
> > > > 
> > > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote:
> > > > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > > > Subject: Re: multcast join failed
> > > > > > 
> > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote:
> > > > > > > Hi, Roland!
> > > > > > > With svn trunk, I started getting the following on one machine:
> > > > > > > 
> > > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, 
> > > > > > > status -22
> > > > > > > 
> > > > > > > and I can't ping this machine over ipoib.
> > > > > > > Any idea?
> > > > > > 
> > > > > > What SM are you using ?
> > > > > > 
> > > > > > If OpenSM, are there any errors in the osm.log ?
> > > > > > 
> > > > > > -- Hal
> > > > > > 
> > > > > 
> > > > > 
> > > > > opensm
> > > > > I see these
> > > > > 
> > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > > failed,
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > > failed,
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > > failed,
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > > failed,
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > > failed,
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > 
> > > > Are you attempting to join a 4x group from a 1x port (or perhaps there
> > > > is a MTU mismatch between the port and the group) ?
> > > > 
> > > > -- Hal
> > > > 
> > > 
> > > Yes, for some reason it came up 1x. but why?
> > 
> > If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it
> > is an autonegotiation thing. Perhaps you have a bad cable ?
> 
> Okay, I'll check, but why isn't ipoib working? Why is the mcast group 4x?

It defaults to 4x. If you want the group to be 1x, do something like the
following in /etc/osm-partitions.conf

Default=0x7fff,ipoib,rate=2:ALL=full;

You can check osm/doc/partition-config.txt for more config info.

> It's a back-to-back configuration ...

OK. That shouldn't matter.

-- Hal
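
The failure discussed above comes down to a rate comparison: a port that came up 1x cannot join a group created at the 4x rate. A minimal sketch of such a check follows. It is an illustration only, not OpenSM's validation code; the function names are hypothetical, and only three of the IB rate encodings are mapped (2 is 2.5 Gb/s, which is why rate=2 gives a 1x group).

```c
/*
 * Translate an IB rate encoding to Mb/s. Only a few encodings are
 * mapped in this sketch; anything else is reported as unknown (-1).
 */
static int rate_enc_to_mbps(int enc)
{
	switch (enc) {
	case 2:  return 2500;   /* 1x SDR */
	case 3:  return 10000;  /* 4x SDR */
	case 4:  return 30000;  /* 12x SDR */
	default: return -1;
	}
}

/*
 * Return 1 if a port running at port_rate_enc can join a group that
 * requires group_rate_enc, 0 otherwise (including unknown encodings).
 */
static int port_can_join(int port_rate_enc, int group_rate_enc)
{
	int port = rate_enc_to_mbps(port_rate_enc);
	int group = rate_enc_to_mbps(group_rate_enc);

	if (port < 0 || group < 0)
		return 0;
	return port >= group;
}
```

The encodings are translated to a bandwidth before comparing because the encoded values are not ordered by speed, so comparing them directly would give wrong answers for some pairs.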


[openib-general] Re: multcast join failed

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: multcast join failed
> 
> On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > Subject: Re: multcast join failed
> > > 
> > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote:
> > > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > > Subject: Re: multcast join failed
> > > > > 
> > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote:
> > > > > > Hi, Roland!
> > > > > > With svn trunk, I started getting the following on one machine:
> > > > > > 
> > > > > > ib0: multicast join failed for ff12:401b::0:0:0::, 
> > > > > > status -22
> > > > > > 
> > > > > > and I can't ping this machine over ipoib.
> > > > > > Any idea?
> > > > > 
> > > > > What SM are you using ?
> > > > > 
> > > > > If OpenSM, are there any errors in the osm.log ?
> > > > > 
> > > > > -- Hal
> > > > > 
> > > > 
> > > > 
> > > > opensm
> > > > I see these
> > > > 
> > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > failed,
> > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > failed,
> > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > failed,
> > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > failed,
> > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > > failed,
> > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > 
> > > Are you attempting to join a 4x group from a 1x port (or perhaps there
> > > is a MTU mismatch between the port and the group) ?
> > > 
> > > -- Hal
> > > 
> > 
> > Yes, for some reason it came up 1x. but why?
> 
> If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it
> is an autonegotiation thing. Perhaps you have a bad cable ?

Okay, I'll check, but why isn't ipoib working? Why is the mcast group 4x?
It's a back-to-back configuration ...

-- 
MST

[openib-general] Re: multcast join failed

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Yes, for some reason it came up 1x. but why?
> 
> If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it
> is an autonegotiation thing. Perhaps you have a bad cable ?

Hmm.

-- 
MST

[openib-general] Re: multcast join failed

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: multcast join failed
> > 
> > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > Subject: Re: multcast join failed
> > > > 
> > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote:
> > > > > Hi, Roland!
> > > > > With svn trunk, I started getting the following on one machine:
> > > > > 
> > > > > ib0: multicast join failed for ff12:401b::0:0:0::, status 
> > > > > -22
> > > > > 
> > > > > and I can't ping this machine over ipoib.
> > > > > Any idea?
> > > > 
> > > > What SM are you using ?
> > > > 
> > > > If OpenSM, are there any errors in the osm.log ?
> > > > 
> > > > -- Hal
> > > > 
> > > 
> > > 
> > > opensm
> > > I see these
> > > 
> > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > failed,
> > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > failed,
> > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > failed,
> > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > failed,
> > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 
> > > failed,
> > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > 
> > Are you attempting to join a 4x group from a 1x port (or perhaps there
> > is a MTU mismatch between the port and the group) ?
> > 
> > -- Hal
> > 
> 
> Yes, for some reason it came up 1x. but why?

If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it
is an autonegotiation thing. Perhaps you have a bad cable ?

-- Hal



[openib-general] [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug

2006-05-17 Thread Arlin Davis

James,

Fix for the uCMA provider to return the correct event as a result of rejects.
Also fixes a segv in dapl_ep_create when an endpoint is created without a
conn_evd.

Thanks,

-arlin

Signed-off-by: Arlin Davis <[EMAIL PROTECTED]>

Index: dapl/common/dapl_ep_create.c
===
--- dapl/common/dapl_ep_create.c(revision 7140)
+++ dapl/common/dapl_ep_create.c(working copy)
@@ -310,7 +310,10 @@ dapl_ep_create (
  *
  * N.B. This should really be done by a util routine.
  */
-dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count);
+if (connect_evd_handle != DAT_HANDLE_NULL)
+{
+   dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count);
+}
 /* Optional handles */
 if (recv_evd_handle != DAT_HANDLE_NULL)
 {
Index: dapl/openib_cma/dapl_ib_cm.c
===
--- dapl/openib_cma/dapl_ib_cm.c(revision 7140)
+++ dapl/openib_cma/dapl_ib_cm.c(working copy)
@@ -285,14 +285,24 @@ static void dapli_cm_active_cb(struct da
 NULL, conn->ep);
break;
case RDMA_CM_EVENT_REJECTED:
+   {
+   ib_cm_events_t cm_event;
+
+   /* no device type specified so assume IB for now */
+   if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */
+   cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA;
+   else 
+   cm_event = IB_CME_DESTINATION_REJECT;
+
dapl_dbg_log(
DAPL_DBG_TYPE_WARN,
" dapli_cm_active_handler: REJECTED reason=%d\n",   
event->status);
-   dapl_evd_connection_callback(conn, IB_CME_DESTINATION_REJECT,
-NULL, conn->ep);
+   
+   dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep);
+   
break;
-
+   }
case RDMA_CM_EVENT_ESTABLISHED:

dapl_dbg_log(DAPL_DBG_TYPE_CM, 
@@ -381,6 +391,14 @@ static void dapli_cm_passive_cb(struct d
break;
 
case RDMA_CM_EVENT_REJECTED:
+   {
+   ib_cm_events_t cm_event;
+
+   /* no device type specified so assume IB for now */
+   if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */
+   cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA;
+   else 
+   cm_event = IB_CME_DESTINATION_REJECT;
 
dapl_dbg_log(
DAPL_DBG_TYPE_WARN, 
@@ -395,10 +413,11 @@ static void dapli_cm_passive_cb(struct d
&ipaddr->dst_addr)->sin_addr.s_addr),
ntohs(((struct sockaddr_in *)
&ipaddr->dst_addr)->sin_port));
-
-   dapls_cr_callback(conn, IB_CME_DESTINATION_REJECT, 
- NULL, conn->sp);
+   
+   dapl_cr_callback(conn, cm_event, NULL, conn->sp);
+   
break;
+   }
case RDMA_CM_EVENT_ESTABLISHED:

dapl_dbg_log(DAPL_DBG_TYPE_CM, 


[openib-general] Re: multcast join failed

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: multcast join failed
> 
> On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > Subject: Re: multcast join failed
> > > 
> > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote:
> > > > Hi, Roland!
> > > > With svn trunk, I started getting the following on one machine:
> > > > 
> > > > ib0: multicast join failed for ff12:401b::0:0:0::, status 
> > > > -22
> > > > 
> > > > and I can't ping this machine over ipoib.
> > > > Any idea?
> > > 
> > > What SM are you using ?
> > > 
> > > If OpenSM, are there any errors in the osm.log ?
> > > 
> > > -- Hal
> > > 
> > 
> > 
> > opensm
> > I see these
> > 
> > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> > sending IB_SA_MAD_STATUS_REQ_INVALID
> > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> > sending IB_SA_MAD_STATUS_REQ_INVALID
> > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> > sending IB_SA_MAD_STATUS_REQ_INVALID
> > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> > sending IB_SA_MAD_STATUS_REQ_INVALID
> > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> > sending IB_SA_MAD_STATUS_REQ_INVALID
> 
> Are you attempting to join a 4x group from a 1x port (or perhaps there
> is a MTU mismatch between the port and the group) ?
> 
> -- Hal
> 

Yes, for some reason it came up 1x. But why?

-- 
MST

[openib-general] Re: multcast join failed

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: multcast join failed
> > 
> > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote:
> > > Hi, Roland!
> > > With svn trunk, I started getting the following on one machine:
> > > 
> > > ib0: multicast join failed for ff12:401b::0:0:0::, status -22
> > > 
> > > and I can't ping this machine over ipoib.
> > > Any idea?
> > 
> > What SM are you using ?
> > 
> > If OpenSM, are there any errors in the osm.log ?
> > 
> > -- Hal
> > 
> 
> 
> opensm
> I see these
> 
> May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID

Are you attempting to join a 4x group from a 1x port (or perhaps there
is a MTU mismatch between the port and the group) ?

-- Hal


Re: [openib-general] Re: multcast join failed

2006-05-17 Thread Sasha Khapyorsky

On 21:23 Wed 17 May , Michael S. Tsirkin wrote:
> 
> opensm
> I see these
> 
> May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID
> May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
> __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
> sending IB_SA_MAD_STATUS_REQ_INVALID

Is it 1x port?

Sasha

Re: [openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure

2006-05-17 Thread Sean Hefty


Dotan Barak wrote:

Added checks to the return values of all of the functions that may fail
(in order to add this test to the regression system).


Thanks - applied with one minor change.


+   int rc;


Changed 'rc' to 'ret' to match the rest of the code.

- Sean
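
For a regression system, a test program has to report the first failure through its exit status instead of always returning 0. A minimal sketch of that pattern follows, using 'ret' as in the applied patch. The step functions and run_steps() are hypothetical placeholders, not cmatose code.

```c
#include <stddef.h>

/* A test step returns 0 on success and non-zero on failure. */
typedef int (*step_fn)(void);

/* Hypothetical placeholder steps, used only to exercise the sketch. */
static int step_ok(void)   { return 0; }
static int step_fail(void) { return -1; }

/*
 * Run the steps in order and stop at the first failure.
 * Returns 0 if every step succeeded, else the failing step's value.
 */
static int run_steps(const step_fn *steps, size_t count)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		ret = steps[i]();
		if (ret)
			break;
	}
	return ret;
}
```

A main() that simply returns the value of run_steps() then gives the regression system a non-zero exit status whenever any step fails.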

[openib-general] Re: multcast join failed

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: multcast join failed
> 
> On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote:
> > Hi, Roland!
> > With svn trunk, I started getting the following on one machine:
> > 
> > ib0: multicast join failed for ff12:401b::0:0:0::, status -22
> > 
> > and I can't ping this machine over ipoib.
> > Any idea?
> 
> What SM are you using ?
> 
> If OpenSM, are there any errors in the osm.log ?
> 
> -- Hal
> 


opensm
I see these

May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
__validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
sending IB_SA_MAD_STATUS_REQ_INVALID
May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
__validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
sending IB_SA_MAD_STATUS_REQ_INVALID
May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
__validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
sending IB_SA_MAD_STATUS_REQ_INVALID
May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
__validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
sending IB_SA_MAD_STATUS_REQ_INVALID
May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12:
__validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed,
sending IB_SA_MAD_STATUS_REQ_INVALID

-- 
MST

Re: [openib-general][PATCH] srp: throttle command per lun,

2006-05-17 Thread Roland Dreier

Thanks, applied.

Re: [openib-general] Re: multcast join failed

2006-05-17 Thread Sasha Khapyorsky

On 13:39 Wed 17 May , Hal Rosenstock wrote:
> On Wed, 2006-05-17 at 13:38, Eitan Zahavi wrote:
> > You are probably running with no partition policy file.
> 
> Assuming he is running with OpenSM from the trunk
> 
> > I think you need one to get the default partition setup for IPoIB.
> > 
> > Hal, is this correct?
> 
> Assuming the above is true, he either needs to run:
> 
> opensm -N
> 
> or create a partition configuration file /etc/osm-partitions.conf 
> 
> Default=0x7fff,ipoib:ALL=full;
> 
> and run:
> 
> opensm
> 
> assuming all he cares about is the default partition

Without a partition policy file, OpenSM should configure the Default
partition with full membership (pkey=0x) for all ports and precreate the
IPoIB MCG (this is equivalent to "Default=0x7fff,ipoib:ALL=full;" as in
Hal's example).

The actual pkey table contents can be checked with:

 $ smpquery pkeys  [port number]

Sasha
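
The partition file gives the pkey without its membership bit, and the most significant bit of the 16-bit P_Key marks full membership. A minimal sketch of that arithmetic follows. The helper names are hypothetical and are not taken from OpenSM.

```c
#include <stdint.h>

/* Most significant bit of a 16-bit P_Key: set for full membership. */
#define SKETCH_PKEY_FULL_MEMBER_BIT 0x8000u

/* Return the table entry for a full member of the given partition. */
static uint16_t pkey_with_full_membership(uint16_t base)
{
	return (uint16_t)(base | SKETCH_PKEY_FULL_MEMBER_BIT);
}

/* Return 1 if the entry has the full-membership bit set, else 0. */
static int pkey_is_full_member(uint16_t pkey)
{
	return (pkey & SKETCH_PKEY_FULL_MEMBER_BIT) != 0;
}

/* Strip the membership bit to recover the 15-bit partition number. */
static uint16_t pkey_base(uint16_t pkey)
{
	return (uint16_t)(pkey & 0x7fffu);
}
```

When reading smpquery output, pkey_base() of an entry can be compared against the value in the partition file, and pkey_is_full_member() tells whether the port is a full or limited member.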

Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

2006-05-17 Thread Roland Dreier

Dave> We are seeing a bug (with both our driver native MPI
Dave> processes and mthca mvapic), where when 8 processes using
Dave> "simultaneously exit", we get watchdogs and/or hangs in the
Dave> close routines.  Moving the freeing outside the mutex was an
Dave> attempt to see if we were running into some VM issues by
Dave> doing lots of page unlocking and freeing with the mutex
Dave> held.  It seemed to help somewhat, but not to solve the
Dave> problem.

Am I understanding correctly that you see a hang or watchdog timeout
even with the mthca driver?

Is there any possibility of posting the test case to reproduce this?
It doesn't seem likely that ipath changes are going to fix a generic
bug like this...

 - R.

[openib-general] Re: SRP [PATCH] Cleaning in srp_remove_one

2006-05-17 Thread Roland Dreier

Thanks.  I had already merged some changes from Matthew Wilcox that
clean up that loop a little bit, but I merged the rest of your patch
too.

 - R.

Re: [openib-general] multcast join failed

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote:
> Hi, Roland!
> With svn trunk, I started getting the following on one machine:
> 
> ib0: multicast join failed for ff12:401b::0:0:0::, status -22
> 
> and I can't ping this machine over ipoib.
> Any idea?

What SM are you using ?

If OpenSM, are there any errors in the osm.log ?

-- Hal


[openib-general] Re: multcast join failed

2006-05-17 Thread Roland Dreier

Michael> When can we get EINVAL from multicast join?

If the SM returned a bad status I think.
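
For reference, "status -22" in the log line is -EINVAL on Linux, where
EINVAL is 22. A minimal sketch of the kind of mapping described above,
assuming that any non-zero status from the SM is collapsed into -EINVAL.
The function name is hypothetical and this is not the actual ipoib or SA
client code:

```c
#include <errno.h>

/*
 * Hypothetical helper: turn the status field of an SA response into a
 * Linux-style return code.  Zero means success; anything else is
 * reported to the caller as -EINVAL.
 */
int join_status_to_errno(unsigned int sa_status)
{
    return sa_status ? -EINVAL : 0;
}
```

Under that assumption, a rejected join such as the REQ_INVALID responses
in the osm.log would reach the caller as -EINVAL.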

RE: [openib-general] Re: multcast join failed

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 13:38, Eitan Zahavi wrote:
> You are probably running with no partition policy file.

Assuming he is running with OpenSM from the trunk

> I think you need one to get the default partition setup for IPoIB.
> 
> Hal, is this correct?

Assuming the above is true, he either needs to run:

opensm -N

or create a partition configuration file /etc/osm-partitions.conf 

Default=0x7fff,ipoib:ALL=full;

and run:

opensm

assuming all he cares about is the default partition

-- Hal

> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:openib-general-
> > [EMAIL PROTECTED] On Behalf Of Michael S. Tsirkin
> > Sent: Wednesday, May 17, 2006 7:45 PM
> > To: Roland Dreier
> > Cc: openib-general@openib.org
> > Subject: [openib-general] Re: multcast join failed
> > 
> > Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> > > Subject: Re: multcast join failed
> > >
> > >  > With svn trunk, I started getting the following on one machine:
> > >  >
> > >  > ib0: multicast join failed for ff12:401b::0:0:0::,
> status -22
> > >  >
> > >  > and I can't ping this machine over ipoib.
> > >  > Any idea?
> > >
> > > No, nothing of significance has changed in ipoib for a while.
> > 
> > When can we get EINVAL from multicast join?
> > 
> > --
> > MST
> > ___
> > openib-general mailing list
> > openib-general@openib.org
> > http://openib.org/mailman/listinfo/openib-general
> > 
> > To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: [PATCH] opensm: make more statics

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 12:44, Sasha Khapyorsky wrote:
> This makes local functions to be static in osm_link_mgr.c.
> 
> Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>

Thanks. Applied to both trunk and 1.0 branch.

-- Hal


Re: [openib-general] OFED-1.0-rc4 need db-devel

2006-05-17 Thread zhu shi song

Why not use db4-devel?  db-devel seems obsolete.

zhu

--- Vladimir Sokolovsky <[EMAIL PROTECTED]> wrote:

> Scott Weitzenkamp (sweitzen) wrote:
> >> db-devel package is required to build open_iscsi package RPM.
> >> This package is not relevant for RHEL 4.3.
> >> There are two options to install OFED-1.0-rc4 on RHEL 4.3 without
> >> open_iscsi:
> >> 1. Select "Custom installation" and don't choose to install
> >> open_iscsi.
> >> 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory
> >> when you run install.sh or build.sh) and set *open_iscsi=n*.
> >> Then run:
> >> ./install.sh -c ofed.conf
> >
> > Why don't we ignore these packages on RHEL4 U3, just like we ignore
> > uDAPL on ppc64?
> >
> > Scott
>
> We will do this in OFED-1.0-rc5.
>
> Vladimir



RE: [openib-general] Re: multcast join failed

2006-05-17 Thread Eitan Zahavi

You are probably running with no partition policy file.
I think you need one to get the default partition setup for IPoIB.

Hal, is this correct?

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, May 17, 2006 7:45 PM
> To: Roland Dreier
> Cc: openib-general@openib.org
> Subject: [openib-general] Re: multcast join failed
> 
> Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> > Subject: Re: multcast join failed
> >
> >  > With svn trunk, I started getting the following on one machine:
> >  >
> >  > ib0: multicast join failed for ff12:401b::0:0:0::,
status -22
> >  >
> >  > and I can't ping this machine over ipoib.
> >  > Any idea?
> >
> > No, nothing of significance has changed in ipoib for a while.
> 
> When can we get EINVAL from multicast join?
> 
> --
> MST
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general

[openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 13:11, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> >  *
> >  * SEE ALSO
> > -*  Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, 
> > cl_memcmp
> > +*  Memory Management, cl_free, cl_malloc
> >  **/
> 
> Next: cl_malloc/cl_free?

Yes, I didn't want to hold this part up for that. There will be a
separate patch for that but not sure when. Right now, there is memory
tracking code implemented.

-- Hal


Re: [openib-general] Re:

2006-05-17 Thread Grant Grundler

On Wed, May 17, 2006 at 07:40:07AM -0700, Roland Dreier wrote:
> Yes, I agree.  That's why I think we should get rid of the
> "linux-kernel" part of the svn tree entirely.  Because everyone who
> wants to test new code seems to run last stable kernel + svn drivers
> instead of the new development kernel.

That's because openib guarantees that SVN drivers will build with the
last stable kernel. Change that policy and document the steps
that folks should follow. I'd be willing to occasionally try
newer kernels if you think that's what we should be doing.

thanks,
grant

[openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
>  *
>  * SEE ALSO
> -*Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, 
> cl_memcmp
> +*Memory Management, cl_free, cl_malloc
>  **/

Next: cl_malloc/cl_free?

-- 
MST

Re: [openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out

2006-05-17 Thread Sasha Khapyorsky

On 08:11 Wed 17 May , Hal Rosenstock wrote:
> On Wed, 2006-05-17 at 07:41, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code 
> > > be compiled out
> > > 
> > > OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
> > > 
> > > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
> > 
> > 
> > 
> > > @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb(
> > >   lid.
> > >*/
> > >/* For now - do not add the alternate dr path to the release */
> > > -  if (0)
> > > -if ( p_madw->mad_addr.dest_lid != 0x )
> > > +#if 0
> > > +  if ( p_madw->mad_addr.dest_lid != 0x )
> > 
> > In my experience, if you compile with -O, gcc does a good enough job of
> > dead code elimination.
> 
> But not all builds are that way though.

Also "#if 0" makes temporarily disabled code more "visible" (for future
improvements).

Sasha.
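
To illustrate the difference discussed in this thread, here is a minimal
sketch. The function names are hypothetical and are not taken from
osm_sm_mad_ctrl.c. With "if (0)" the disabled code is still parsed and
type-checked, and whether it ends up in the object file depends on the
optimization level; with "#if 0" the preprocessor drops it before the
compiler ever sees it:

```c
/* Hypothetical stand-in for the disabled alternate-path handling. */
static int handle_alternate_path(int dest_lid)
{
    return dest_lid + 1;
}

/*
 * Variant 1: constant condition.  The call is compiled (and must stay
 * syntactically and semantically valid), but it can never execute.
 */
int send_err_runtime_guard(int dest_lid)
{
    int handled = 0;

    if (0)
        handled = handle_alternate_path(dest_lid);
    return handled;
}

/*
 * Variant 2: preprocessor guard.  The call is removed before
 * compilation, regardless of optimization flags.
 */
int send_err_preprocessor_guard(int dest_lid)
{
    int handled = 0;

#if 0
    handled = handle_alternate_path(dest_lid);
#else
    (void) dest_lid;    /* avoid an unused-parameter warning */
#endif
    return handled;
}
```

Both variants behave the same at run time. The trade-off is that code
under "#if 0" is no longer checked by the compiler, so it can silently
stop building as the surrounding code changes.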

[openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

2006-05-17 Thread Dave Olson

On Mon, 15 May 2006, Roland Dreier wrote:

| This looks like a pastiche of several patches.  Why can't it be split
| up into logical pieces?
| 
|  > Call dma_free_coherent without ipath_mutex held.
| 
| Why?  Doesn't freeing work with the mutex held?

Sure, that's the way the previous code worked.

We are seeing a bug (with both our driver native MPI processes and mthca
mvapic), where when 8 processes using "simultaneously exit", we get
watchdogs and/or hangs in the close routines.  Moving the freeing outside
the mutex was an attempt to see if we were running into some VM issues by
doing lots of page unlocking and freeing with the mutex held.  It seemed to
help somewhat, but not to solve the problem.

It also allows other processes to open and close in a somewhat more timely
fashion.

Dave Olson
[EMAIL PROTECTED]
http://www.unixfolk.com/dave
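
The general pattern described above (detach the resources while holding
the lock, release them after dropping it) can be sketched in user space
as follows. This is only an illustration under stated assumptions: it
uses a pthread mutex and free() in place of ipath_mutex and
dma_free_coherent(), and all names are hypothetical:

```c
#include <pthread.h>
#include <stdlib.h>

struct buf {
    struct buf *next;
};

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct buf *pool;    /* shared list, protected by pool_lock */

/* Add one buffer to the shared list. */
void pool_add(struct buf *b)
{
    pthread_mutex_lock(&pool_lock);
    b->next = pool;
    pool = b;
    pthread_mutex_unlock(&pool_lock);
}

/*
 * Detach the whole list while holding the lock, then free the buffers
 * after the lock has been dropped, so that other threads are not
 * blocked behind the (potentially slow) freeing.  Returns the number
 * of buffers freed.
 */
int pool_release_all(void)
{
    struct buf *list, *next;
    int freed = 0;

    pthread_mutex_lock(&pool_lock);
    list = pool;
    pool = NULL;
    pthread_mutex_unlock(&pool_lock);

    for (; list != NULL; list = next) {
        next = list->next;
        free(list);
        freed++;
    }
    return freed;
}
```

The lock is held only for the pointer swap, which is what lets other
openers and closers make progress in the meantime.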

[openib-general] Re: multcast join failed

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: multcast join failed
> 
>  > With svn trunk, I started getting the following on one machine:
>  > 
>  > ib0: multicast join failed for ff12:401b::0:0:0::, status -22
>  > 
>  > and I can't ping this machine over ipoib.
>  > Any idea?
> 
> No, nothing of significance has changed in ipoib for a while.

When can we get EINVAL from multicast join?

-- 
MST

[openib-general] [PATCH] opensm: make more statics

2006-05-17 Thread Sasha Khapyorsky


This makes local functions to be static in osm_link_mgr.c.

Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>
---

 osm/opensm/osm_link_mgr.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c
index 5d9ab7d..2b0d2de 100644
--- a/osm/opensm/osm_link_mgr.c
+++ b/osm/opensm/osm_link_mgr.c
@@ -111,8 +111,8 @@ osm_link_mgr_init(
 
 /**
  **/
-void
-osm_link_mgr_set_physp_pi(
+static void
+__osm_link_mgr_set_physp_pi(
   IN osm_link_mgr_t*   const p_mgr,
   IN osm_physp_t*  const p_physp,
   IN uint8_t   const port_state )
@@ -129,7 +129,7 @@ osm_link_mgr_set_physp_pi(
   boolean_tsend_set = FALSE;
   osm_physp_t *p_remote_physp;
 
-  OSM_LOG_ENTER( p_mgr->p_log, osm_link_mgr_set_physp_pi );
+  OSM_LOG_ENTER( p_mgr->p_log, __osm_link_mgr_set_physp_pi );
 
   CL_ASSERT( p_physp );
   CL_ASSERT( osm_physp_is_valid( p_physp ) );
@@ -151,7 +151,7 @@ osm_link_mgr_set_physp_pi(
 if (! p_switch )
 {
   osm_log( p_mgr->p_log, OSM_LOG_ERROR,
-   "osm_link_mgr_set_physp_pi: ERR 4201: "
+   "__osm_link_mgr_set_physp_pi: ERR 4201: "
"Cannot find switch by guid: 0x%" PRIx64 "\n",
cl_ntoh64( p_node->node_info.node_guid ) );
   goto Exit;
@@ -165,7 +165,7 @@ osm_link_mgr_set_physp_pi(
   if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) )
   {
 osm_log( p_mgr->p_log, OSM_LOG_DEBUG,
- "osm_link_mgr_set_physp_pi: "
+ "__osm_link_mgr_set_physp_pi: "
  "Skipping port 0, GUID = 0x%016" PRIx64 "\n",
  cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) );
   }
@@ -366,7 +366,7 @@ osm_link_mgr_set_physp_pi(
 
 /**
  **/
-osm_signal_t
+static osm_signal_t
 __osm_link_mgr_process_port(
   IN osm_link_mgr_t* const p_mgr,
   IN osm_port_t* const p_port,
@@ -419,7 +419,7 @@ __osm_link_mgr_process_port(
   (current_state < link_state) )
   {
 p_mgr->send_set_reqs = FALSE;
-osm_link_mgr_set_physp_pi(
+__osm_link_mgr_set_physp_pi(
   p_mgr,
   p_physp,
   link_state );

Re: [openib-general] ib_mthca fails to load with old firmware

2006-05-17 Thread Ken L Johnson

Hi Scott -

On Wed, 17 May 2006 at 08:40:50 -0700, Scott Weitzenkamp wrote:

> What kind of blade systems are these?  For some blade systems, Cisco
> provides HCA firmware that has been configured to provide better signal
> integrity.
>
> If you run /usr/local/ofed/sbin/tvflash -i, I can then tell which
> firmware you need.

The blade systems are all Dell 1855's. Here's the output you requested:

---8<---

blade9:/usr/local/ofed/sbin # ./tvflash -i
HCA #0: MT25208 Tavor Compat, DLGL, revision A0
  Primary image is v4.6.000 build 3.0.0.160, with label 'HCA.DLGL.A0'
  Secondary image is v4.6.000 build 3.0.0.160, with label 'HCA.DLGL.A0'

  Vital Product Data
Product Name: DLGL
P/N: 99-00063-03
E/C: Rev: A8
S/N: 57O1771
Freq/Power: PW=10W;PCIe 8X
Date Code: 3105
Checksum: Ok

--->8---

Regards,
-- 
Ken L Johnson  <[EMAIL PROTECTED]>

[openib-general] Re: compilation warning in diags tools

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 11:47, Dotan Barak wrote:
> Hi.
> 
> Here is a compilation warning when using gcc 3.4.5:
> 
> src/grouping.c: In function `get_router_slot':
> src/grouping.c:213: warning: implicit declaration of function `calloc'
> /bin/sh ./libtool --tag=CC --mode=link gcc  -m64  -L../libibcommon -libcommon 
> -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../os
> m/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o 
> src/ibnetdiscover  src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-gro
> uping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la 
> ../libibmad/libibmad.la
> 
> (i think that stdlib.h should be included to prevent this warning)

Fixed in r7290. Can you update and try to be sure ? Thanks.

-- Hal

> 
> thanks
> Dotan


RE: [openib-general] [PATCH] IB: Make needlessly global ib_mad_cache static

2006-05-17 Thread Sean Hefty

>Any reason not to apply this?

Looks fine to apply by me.

- Sean

[openib-general] SRP [PATCH] Cleaning in srp_remove_one

2006-05-17 Thread Ishai Rabinovitz

3 changes in the same place:

1) The if statement is redundant.
2) There is no need to save the flags - it is inside a mutex_lock.
3) We hold the mutex for the list and we are not deleting from the list so 
   there is no need for list_for_each_entry_safe.

Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]>
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 14:22:12.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 14:26:54.0 +0300
@@ -1750,7 +1750,6 @@ static void srp_remove_one(struct ib_dev
struct srp_host *host, *tmp_host;
LIST_HEAD(target_list);
struct srp_target_port *target, *tmp_target;
-   unsigned long flags;
 
dev_list = ib_get_client_data(device, &srp_client);
 
@@ -1767,12 +1766,10 @@ static void srp_remove_one(struct ib_dev
 * commands and don't try to reconnect.
 */
mutex_lock(&host->target_mutex);
-   list_for_each_entry_safe(target, tmp_target,
-&host->target_list, list) {
-   spin_lock_irqsave(target->scsi_host->host_lock, flags);
-   if (target->state != SRP_TARGET_REMOVED)
-   target->state = SRP_TARGET_REMOVED;
-   spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
+   list_for_each_entry(target, &host->target_list, list) {
+   spin_lock_irq(target->scsi_host->host_lock);
+   target->state = SRP_TARGET_REMOVED;
+   spin_unlock_irq(target->scsi_host->host_lock);
}
mutex_unlock(&host->target_mutex);
 
-- 
Ishai Rabinovitz
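
Point 3 above can be illustrated with a small user-space sketch. It
uses a hand-rolled singly linked list rather than the kernel's list.h
macros, and all names are hypothetical. A plain traversal is enough when
nodes are only modified; the "safe" form, which saves the next pointer
first, is needed only when the current node may be unlinked or freed
inside the loop:

```c
#include <stddef.h>
#include <stdlib.h>

enum { TARGET_LIVE = 0, TARGET_REMOVED = 1 };

struct target {
    int state;
    struct target *next;
};

/* Plain iteration: fine here because no node is unlinked or freed. */
void mark_all_removed(struct target *head)
{
    struct target *t;

    for (t = head; t != NULL; t = t->next)
        t->state = TARGET_REMOVED;
}

/*
 * "Safe" iteration: the next pointer is saved before the current node
 * is freed, because t->next must not be read after free(t).
 */
void free_all(struct target *head)
{
    struct target *t, *tmp;

    for (t = head; t != NULL; t = tmp) {
        tmp = t->next;
        free(t);
    }
}
```

In the patch, the loop only sets target->state, which corresponds to
mark_all_removed() here.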

[openib-general] Re:

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Michael> But, I think it's still useful to make it possible for
> Michael> people to test development snapshots on stable kernels
> Michael> simply because we'll get more testing and feedback this
> Michael> way.
> 
> It's fine except when API changes force us to diverge from upstream.
> Then it becomes a hassle.

Yes. Still, it's mostly easy.

-- 
MST

Re: [openib-general] SRP [PATCH] Looks like a potantial bug

2006-05-17 Thread Roland Dreier

Yes, good catch.  Thanks, applied.

[openib-general] Re:

2006-05-17 Thread Roland Dreier

Michael> Yea, we are going that way.  Soon all we'll need will be
Michael> a git tree that we can use for development.  BTW, how
Michael> easy is it to get an account at kernel.org?

It's not hard if you have some history as a kernel developer.  Of
course hosting a git tree is pretty easy as well.

Michael> But, I think it's still useful to make it possible for
Michael> people to test development snapshots on stable kernels
Michael> simply because we'll get more testing and feedback this
Michael> way.

It's fine except when API changes force us to diverge from upstream.
Then it becomes a hassle.

 - R.

[openib-general] Re: multcast join failed

2006-05-17 Thread Roland Dreier

 > With svn trunk, I started getting the following on one machine:
 > 
 > ib0: multicast join failed for ff12:401b::0:0:0::, status -22
 > 
 > and I can't ping this machine over ipoib.
 > Any idea?

No, nothing of significance has changed in ipoib for a while.

 - R.

Re: [openib-general] SRP [PATCH] Looks like a potantial bug

2006-05-17 Thread Ishai Rabinovitz

On Wed, May 17, 2006 at 06:40:04PM +0300, Ishai Rabinovitz wrote:
> Hi,
> 
> While doing a code review I found a potential bug.
> I did not manage to execute a test to check this code.
> Please take a look:

Sorry, I made a mistake in the patch.
Please look at this one.

In srp_reconnect_target it uses req->scmnd->scsi_done(req->scmnd); (like in
the patch)

Ishai

> Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]>
> --
> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
> ===
> --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c  2006-05-17 
> 16:24:24.0 +0300
> +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c   2006-05-17 
> 17:13:47.0 +0300
> @@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_
>   list_for_each_entry_safe(req, tmp, &target->req_queue, list)
>   if (req->scmnd->device == scmnd->device) {
>   req->scmnd->result = DID_RESET << 16;
> - scmnd->scsi_done(scmnd);
> + req->scmnd->scsi_done(req->scmnd);
>   srp_remove_req(target, req);
>   }
>  
> -- 
> Ishai Rabinovitz
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

-- 
Ishai Rabinovitz
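
The bug pattern fixed by this patch can be sketched outside the kernel
as follows. This is a simplified illustration with hypothetical types
and names, not the ib_srp code: when completing every queued request for
a device, the completion callback has to be invoked on the queued
command itself, not on the command that triggered the reset.

```c
#include <stddef.h>

struct cmd {
    int device;                     /* hypothetical device id */
    int result;
    int done_calls;                 /* times this command was completed */
    void (*done)(struct cmd *);
};

struct req {
    struct cmd *scmnd;
};

/* Example completion callback: just count the completions. */
void count_done(struct cmd *c)
{
    c->done_calls++;
}

/*
 * Complete every queued request aimed at the same device as `reset`.
 * Each request's own command is completed; `reset` is only used to
 * select the device.  Returns the number of completed requests.
 */
int complete_device_reqs(struct req *reqs, size_t n,
                         const struct cmd *reset, int result)
{
    int completed = 0;

    for (size_t i = 0; i < n; i++) {
        struct cmd *c = reqs[i].scmnd;

        if (c->device != reset->device)
            continue;
        c->result = result;
        c->done(c);         /* the queued command, not the reset command */
        completed++;
    }
    return completed;
}
```

Calling the callback on the reset command inside the loop would
complete that one command once per matching request and leave the
queued commands uncompleted.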

[openib-general] multcast join failed

2006-05-17 Thread Michael S. Tsirkin

Hi, Roland!
With svn trunk, I started getting the following on one machine:

ib0: multicast join failed for ff12:401b::0:0:0::, status -22

and I can't ping this machine over ipoib.
Any idea?

-- 
MST

[openib-general] compilation warning in diags tools

2006-05-17 Thread Dotan Barak

Hi.

Here is a compilation warning when using gcc 3.4.5:

src/grouping.c: In function `get_router_slot':
src/grouping.c:213: warning: implicit declaration of function `calloc'
/bin/sh ./libtool --tag=CC --mode=link gcc  -m64  -L../libibcommon -libcommon -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../osm/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o src/ibnetdiscover  src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-grouping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la ../libibmad/libibmad.la

(i think that stdlib.h should be included to prevent this warning)

thanks
Dotan
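
A minimal sketch of the suggested fix, with a hypothetical helper rather
than the actual code from grouping.c. Without the prototype from
stdlib.h, a C89 compiler assumes calloc() returns int, which on a 64-bit
build (note the -m64 above) can truncate the returned pointer:

```c
#include <stdlib.h>     /* declares calloc() and free() */

/* Hypothetical helper: allocate a zeroed table of `count` slot numbers. */
int *alloc_slot_table(size_t count)
{
    return calloc(count, sizeof(int));
}
```

With the include in place, the warning goes away and the return value
is treated as a pointer.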

[openib-general] Re:

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re:
> That's why I think we should get rid of the
> "linux-kernel" part of the svn tree entirely.  Because everyone who
> wants to test new code seems to run last stable kernel + svn drivers
> instead of the new development kernel.
> 
>  - R.

Yea, we are going that way.
Soon all we'll need will be a git tree that we can use for development.
BTW, how easy is it to get an account at kernel.org?

But, I think it's still useful to make it possible for people to test
development snapshots on stable kernels simply because we'll get
more testing and feedback this way.

One way would be to put snapshots under
https://openib.org/downloads/

-- 
MST

[openib-general] SRP [PATCH] Looks like a potantial bug

2006-05-17 Thread Ishai Rabinovitz

Hi,

While doing a code review I found a potential bug.
I did not manage to execute a test to check this code.
Please take a look:
Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]>
--
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 16:24:24.0 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 17:13:47.0 +0300
@@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_
list_for_each_entry_safe(req, tmp, &target->req_queue, list)
if (req->scmnd->device == scmnd->device) {
req->scmnd->result = DID_RESET << 16;
-   scmnd->scsi_done(scmnd);
+   req->scmnd->scsi_done(scmnd);
srp_remove_req(target, req);
}
 
-- 
Ishai Rabinovitz

RE: [openib-general] ib_mthca fails to load with old firmware

2006-05-17 Thread Scott Weitzenkamp (sweitzen)

> Ken> I'm running into a problem when I try to use the OFED RC4
> Ken> release on some blade systems that have TopSpin HCA daughter
> Ken> cards installed (actually Mellanox). I'm trying to figure out
> Ken> how to update the firmware to the latest [
> Ken> http://mellanox.com/support/firmware_table.php ] but it seems
> Ken> I must know the PSID so I can grab the right firmware
> Ken> image. Can anyone point me in the right direction here?
> 
> For blade HCAs you should contact the HCA vendor for firmware updates.
> 
> You could try passing the module option "fw_cmd_doorbell=0" to
> ib_mthca.  That may work around things.
> 
>  - R.

What kind of blade systems are these?  For some blade systems, Cisco
provides HCA firmware that has been configured to provide better signal
integrity.

If you run /usr/local/ofed/sbin/tvflash -i, I can then tell which
firmware you need.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

Re: [openib-general] OFED-1.0-rc4 need db-devel

2006-05-17 Thread Vladimir Sokolovsky


Scott Weitzenkamp (sweitzen) wrote:
>> db-devel package is required to build open_iscsi package RPM.
>> This package is not relevant for RHEL 4.3.
>> There are two options to install OFED-1.0-rc4 on RHEL 4.3 without
>> open_iscsi:
>> 1. Select "Custom installation" and don't choose to install
>> open_iscsi.
>> 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory
>> when you run install.sh or build.sh) and set *open_iscsi=n*.
>> Then run:
>> ./install.sh -c ofed.conf
>
> Why don't we ignore these packages on RHEL4 U3, just like we ignore
> uDAPL on ppc64?
>
> Scott

We will do this in OFED-1.0-rc5.

Vladimir

[openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure

2006-05-17 Thread Dotan Barak

Added checks to the return values of all of the functions that may fail
(in order to add this test to the regression system).

Signed-off-by: Dotan Barak <[EMAIL PROTECTED]>

Index: last_stable/src/userspace/librdmacm/examples/cmatose.c
===
--- last_stable.orig/src/userspace/librdmacm/examples/cmatose.c 2006-05-17 18:30:35.0 +0300
+++ last_stable/src/userspace/librdmacm/examples/cmatose.c  2006-05-17 18:31:35.0 +0300
@@ -219,7 +219,7 @@ static void connect_error(void)
test.connects_left--;
 }
 
-static void addr_handler(struct cmatest_node *node)
+static int addr_handler(struct cmatest_node *node)
 {
int ret;
 
@@ -228,9 +228,10 @@ static void addr_handler(struct cmatest_
printf("cmatose: resolve route failed: %d\n", ret);
connect_error();
}
+   return ret;
 }
 
-static void route_handler(struct cmatest_node *node)
+static int route_handler(struct cmatest_node *node)
 {
struct rdma_conn_param conn_param;
int ret;
@@ -252,9 +253,10 @@ static void route_handler(struct cmatest
printf("cmatose: failure connecting: %d\n", ret);
goto err;
}
-   return;
+   return 0;
 err:
connect_error();
+   return ret;
 }
 
 static int connect_handler(struct rdma_cm_id *cma_id)
@@ -305,10 +307,10 @@ static int cma_handler(struct rdma_cm_id
 
switch (event->event) {
case RDMA_CM_EVENT_ADDR_RESOLVED:
-   addr_handler(cma_id->context);
+   ret = addr_handler(cma_id->context);
break;
case RDMA_CM_EVENT_ROUTE_RESOLVED:
-   route_handler(cma_id->context);
+   ret = route_handler(cma_id->context);
break;
case RDMA_CM_EVENT_CONNECT_REQUEST:
ret = connect_handler(cma_id);
@@ -420,35 +422,45 @@ static int poll_cqs(void)
return 0;
 }
 
-static void connect_events(void)
+static int connect_events(void)
 {
struct rdma_cm_event *event;
-   int err = 0;
+   int err = 0, ret = 0;
 
while (test.connects_left && !err) {
err = rdma_get_cm_event(test.channel, &event);
if (!err) {
cma_handler(event->id, event);
rdma_ack_cm_event(event);
+   } else {
+   printf("cmatose: failure in rdma_get_cm_event in connect events\n");
+   ret = err;
}
}
+
+   return ret;
 }
 
-static void disconnect_events(void)
+static int disconnect_events(void)
 {
struct rdma_cm_event *event;
-   int err = 0;
+   int err = 0, ret = 0;
 
while (test.disconnects_left && !err) {
err = rdma_get_cm_event(test.channel, &event);
if (!err) {
cma_handler(event->id, event);
rdma_ack_cm_event(event);
+   } else {
+   printf("cmatose: failure in rdma_get_cm_event in disconnect events\n");
+   ret = err;
}
}
+
+   return ret;
 }
 
-static void run_server(void)
+static int run_server(void)
 {
struct rdma_cm_id *listen_id;
int i, ret;
@@ -457,7 +469,7 @@ static void run_server(void)
ret = rdma_create_id(test.channel, &listen_id, &test);
if (ret) {
printf("cmatose: listen request failed\n");
-   return;
+   return ret;
}
 
test.src_in.sin_family = PF_INET;
@@ -465,7 +477,7 @@ static void run_server(void)
ret = rdma_bind_addr(listen_id, test.src_addr);
if (ret) {
printf("cmatose: bind address failed: %d\n", ret);
-   return;
+   return ret;
}
 
ret = rdma_listen(listen_id, 0);
@@ -474,16 +486,21 @@ static void run_server(void)
goto out;
}
 
-   connect_events();
+   ret = connect_events();
+   if (ret)
+   goto out;
 
if (message_count) {
printf("initiating data transfers\n");
-   for (i = 0; i < connections; i++)
-   if (post_sends(&test.nodes[i]))
+   for (i = 0; i < connections; i++) {
+   ret = post_sends(&test.nodes[i]);
+   if (ret)
goto out;
+   }
 
printf("receiving data transfers\n");
-   if (poll_cqs())
+   ret = poll_cqs();
+   if (ret)
goto out;
printf("data transfers complete\n");
 
@@ -497,10 +514,13 @@ static void run_server(void)
rdma_disconnect(test.nodes[i].cma_id);
}
 
-   disconnect_events();
+   ret = disconnect_events();
+
printf("disconnected\n");
+
 out:
rdma_dest
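
The error-propagation pattern this patch introduces (the diff is truncated in this archive) can be sketched in isolation. The following is a minimal sketch, not cmatose code: `struct event_source`, `get_event()` and `handle_event()` are hypothetical stand-ins for the event channel, `rdma_get_cm_event()` and `cma_handler()`. Unlike the hunks above, the sketch also returns the handler's result, which is an assumption about what one might want rather than something the patch does.

```c
#include <stdio.h>

/* Hypothetical stand-in for the test state and event channel. */
struct event_source {
    int pending;     /* events still expected (like connects_left) */
    int fail_after;  /* inject an error after this many events; -1 = never */
    int delivered;   /* events handed out so far */
};

/* Stand-in for rdma_get_cm_event(): 0 on success, negative on error. */
static int get_event(struct event_source *src, int *event)
{
    if (src->fail_after >= 0 && src->delivered >= src->fail_after)
        return -1;
    *event = src->delivered++;
    return 0;
}

/* Stand-in for cma_handler(): consumes one pending event. */
static int handle_event(struct event_source *src, int event)
{
    (void)event;
    src->pending--;
    return 0;
}

/* Drain events until none are pending; return the first error seen. */
static int drain_events(struct event_source *src)
{
    int ret = 0;

    while (src->pending && !ret) {
        int event;

        ret = get_event(src, &event);
        if (ret) {
            fprintf(stderr, "failure getting event: %d\n", ret);
            break;
        }
        ret = handle_event(src, event);
    }
    return ret;
}
```

A caller would then check the result the same way `run_server()` does in the patch: `ret = drain_events(&src); if (ret) goto out;`.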

RE: [openib-general] OFED-1.0-rc4 need db-devel

2006-05-17 Thread Scott Weitzenkamp (sweitzen)

> db-devel package is required to build open_iscsi package RPM.
> This package is not relevant for RHEL 4.3.
> There are two options to install OFED-1.0-rc4 on RHEL 4.3 without 
> open_iscsi:
> 1. Select "Custom installation" and don't choose to install 
> open_iscsi.
> 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory 
> when you run install.sh or build.sh) and set *open_iscsi=n*.
> Then run:
> ./install.sh -c ofed.conf

Why don't we ignore these packages on RHEL4 U3, just like we ignore
uDAPL on ppc64?

Scott
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading

2006-05-17 Thread Ishai Rabinovitz

On Wed, May 17, 2006 at 02:55:57AM +0300, Roland Dreier wrote:
> Hmm, this doesn't seem right to me.  If I try this, then I get a crash
> because the scsi_host is already gone after the first put.  I verified
> that the reference count is 1 before these puts, and with the
> unmodified module I don't see anything left in /sys/class/scsi_host
> after unloading the module.
> 
> What kernel are you seeing problems with?  I'm testing with an
> up-to-date git kernel, although I doubt it makes a difference (did
> SCSI reference counting change recently??).
> 
> I do think there are some extra scsi_host_put() calls in
> srp_remove_work() -- I think the double scsi_host_put() dates back to
> a version (which I may never even have checked in) where there was a
> scsi_host_get() to avoid the scsi_host going away between the
> schedule_work() and srp_remove_work() actually running.
> 
> So the patch below seems correct to me.
> 
> What do you think?

I could not reproduce the problem again, so this patch works for me.

-- 
Ishai Rabinovitz

Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading

2006-05-17 Thread Ishai Rabinovitz

On Wed, May 17, 2006 at 02:56:58AM +0300, Roland Dreier wrote:
> BTW, I think the patch below is correct as well.  This avoids problems
> where the SRP driver waits forever for a completion, for example if
> sending the DREQ fails because the connection has already been
> disconnected by the target.
> 
> Does this scenario seem like the deadlock you thought you saw?
> 
> --- linux-kernel/infiniband/ulp/srp/ib_srp.c  (revision 7245)
> +++ linux-kernel/infiniband/ulp/srp/ib_srp.c  (working copy)
> @@ -342,7 +342,10 @@ static void srp_disconnect_target(struct
>   /* XXX should send SRP_I_LOGOUT request */
>  
>   init_completion(&target->done);
> - ib_send_cm_dreq(target->cm_id, NULL, 0);
> + if (ib_send_cm_dreq(target->cm_id, NULL, 0)) {
> + printk(KERN_DEBUG PFX "Sending CM DREQ failed\n");
> + return;
> + }
>   wait_for_completion(&target->done);
>  }
>  

I don't think this caused the deadlock I had.
Still it looks like an important patch.
-- 
Ishai Rabinovitz

Re: [openib-general] Re: ib_mthca fails to load with old firmware

2006-05-17 Thread Roland Dreier

OK, I put this into my 2.6.17 branch:

diff-tree 1db76c14d215c8b26024dd532de3dcaf66ea30f7 (from 032ebf2620ef99a4fedaa0f77dc2272095ac5863)
Author: Roland Dreier <[EMAIL PROTECTED]>
Date:   Wed May 17 07:48:07 2006 -0700

IB/mthca: Make fw_cmd_doorbell default to 0

Setting fw_cmd_doorbell allows FW command to be queued using posted
writes instead of requiring polling on a "go" bit, so it should be a
performance boost.  However, the option causes problems with at least
some device/firmware combinations, so set the default to 0 until we
understand what's going on better.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 1985b5d..798e13e 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -182,7 +182,7 @@ struct mthca_cmd_context {
u8 status;
 };
 
-static int fw_cmd_doorbell = 1;
+static int fw_cmd_doorbell = 0;
 module_param(fw_cmd_doorbell, int, 0644);
 MODULE_PARM_DESC(fw_cmd_doorbell, "post FW commands through doorbell page if nonzero "
 "(and supported by FW)");

[openib-general] Re: NOP problem in ib_mthca on OFED RC4

2006-05-17 Thread Michael S. Tsirkin

Quoting r. [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> Subject: Re: NOP problem in ib_mthca on OFED RC4
> 
> 
> Michael,
> 
> > > > Which FW revision do you have?
> > > >
> > > The "ibstat" command shows:
> > >
> > > CA type: MT25204
> > > Number of ports: 1
> > > Firmware version: 1.0.800
> > > Hardware version: a0
> > > Node GUID: 0x0002c90200216dc4
> > > System image GUID: 0x0002c90200216dc7
> > >
> > >   -Don Albert-
> > >
> >
> > Yes, that's the latest revision. Hmm.
> >
> 
> What about the other thing I mentioned in my first message:  the "lspci" 
> command complains about the board slot that the HCA is plugged into:
> 
>pcilib: Resource 2 in /sys/bus/pci/devices/:03:00.0/resource has a 64-bit address, ignoring
>
>03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
> 
> I also found out that on this machine the HCA is plugged into a 16X PCI-e 
> slot, which is different than the other machine which is working, where the 
> slot is 8X.
> 
> Bear in mind, however, that both machines were previously working with the 
> 2.6.9-34 kernel with the backport patches and the OpenIB svn 6500 code.  Did 
> something happen in 2.6.16, or am I missing a patch?
> 
> -Don Albert-
> 

Could you please give more detail on the exact system that had/has
this problem? Model, chipset revision, full lspci -v output, etc.

Also, is there some way to login to such a system there remotely?

Thanks a bunch,


-- 
MST

[openib-general] [PATCH] IB: Make needlessly global ib_mad_cache static

2006-05-17 Thread Roland Dreier

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

Any reason not to apply this?

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 5ad41a6..92c7362 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -45,8 +45,7 @@ MODULE_DESCRIPTION("kernel IB MAD API");
 MODULE_AUTHOR("Hal Rosenstock");
 MODULE_AUTHOR("Sean Hefty");
 
-
-kmem_cache_t *ib_mad_cache;
+static kmem_cache_t *ib_mad_cache;
 
 static struct list_head ib_mad_port_list;
 static u32 ib_mad_client_id = 0;
diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h
index b4fa28d..d147f3b 100644
--- a/drivers/infiniband/core/mad_priv.h
+++ b/drivers/infiniband/core/mad_priv.h
@@ -212,8 +212,6 @@ struct ib_mad_port_private {
struct ib_mad_qp_info qp_info[IB_MAD_QPS_CORE];
 };
 
-extern kmem_cache_t *ib_mad_cache;
-
 int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr);
 
 struct ib_mad_send_wr_private *

[openib-general] Re: ib_mthca fails to load with old firmware

2006-05-17 Thread Roland Dreier

Michael> Hmm. There have been recent reports on configurations
Michael> which have trouble working with fw_cmd_doorbell=1, and
Michael> not all of them old FW. I never saw this in the lab.
Michael> Roland, should we change fw_cmd_doorbell to 0 by default,
Michael> until we figure out what is going on?

Yes, it's looking like that option is causing problems.  I will put a
patch changing the default to 0 into 2.6.17.

 - R.

[openib-general] Re:

2006-05-17 Thread Roland Dreier

Or> The impression i was getting from those responses and the luck
Or> of others is that (say) almost no one of the openib
Or> maintainers test infiniband with the "next" kernel which is
Or> not released yet.

Yes, I agree.  That's why I think we should get rid of the
"linux-kernel" part of the svn tree entirely.  Because everyone who
wants to test new code seems to run last stable kernel + svn drivers
instead of the new development kernel.

 - R.

Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Roland Dreier

Or> Can you spare few words whats the difference between the
Or> for-2.6.18 and for-mm branches of your git tree?

for-mm is what Andrew pulls to get patches for -mm.  It has things
that I think should be seen in -mm, but which I am not ready to queue
in for-2.6.18.  You can use git show-branch or gitk to visualize
exactly how the branches relate.

Or> When you say the code is pushed into master.kernel.org are you
Or> referring to the mm tree of Andrew Morton? i don't see he has
Or> one under kernel.org/git?

No, I mean it's in my tree on master.kernel.org, rather than just
sitting on my local hard disk.

 - R.

[openib-general] (was: slab error while removing ib_mad) testing IB of a kernel before its release

2006-05-17 Thread Or Gerlitz


Roland Dreier wrote:

Or> I think you were on vacation when i posted this, there were
Or> two responses saying they were not able to reproduce it, but
Or> no one was trying 2.6.17-X

Not sure why you expect me to solve this -- other than the fact that I
am a great debugger ;)


Let me clarify a little:

The test case itself (removing a module that was loaded by the PCI hotplug 
subsystem) is fairly rare, and it is not the issue (I do it when 
replacing the IB stack with newer code).

When I posted the original report, I got responses from two people, both 
saying they had tried it with this or that flavor of the current stable 
kernel (2.6.16) and that the problem does not reproduce (sure...).

The impression I got from those responses, and from the lack of others, 
is that almost none of the openib maintainers test InfiniBand 
with the "next" kernel, the one that is not released yet.

And if we don't test it, we certainly can't expect it to work.

That's the point I wanted to make later in that thread. I chose to 
defer this discussion until you were around, which is why I am 
only writing it now.


Or.

[openib-general] The AF_INET_RDS value

2006-05-17 Thread John Blackwood


Hello,

I noticed that in ulp/rds/rds_inet.h, the value for AF_INET_RDS is:

#define AF_INET_RDS 30


But in include/linux/socket.h, there is already a AF_TIPC with the
same value:

#define AF_WANPIPE  25  /* Wanpipe API Sockets */
#define AF_LLC  26  /* Linux LLC*/
#define AF_TIPC 30  /* TIPC sockets */
#define AF_BLUETOOTH31  /* Bluetooth sockets*/
#define AF_MAX  32  /* For now.. */


Just wondering if the AF_INET_RDS value should be changed?

Thanks.
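
One way to catch this kind of clash early is a small lookup over the already-assigned values. The following is a minimal sketch, assuming only the four values visible in the socket.h excerpt above; `struct af_entry`, `known_families` and `af_collision()` are hypothetical names for illustration and are not part of the kernel or of the RDS code.

```c
#include <stddef.h>
#include <string.h>

/* One assigned address family: its macro name and numeric value. */
struct af_entry {
    const char *name;
    int value;
};

/* Values copied from the include/linux/socket.h excerpt quoted above. */
static const struct af_entry known_families[] = {
    { "AF_WANPIPE",   25 },
    { "AF_LLC",       26 },
    { "AF_TIPC",      30 },
    { "AF_BLUETOOTH", 31 },
};

/*
 * Return the name of the family that already uses `value`,
 * or NULL if the value is free in this (partial) table.
 */
static const char *af_collision(int value)
{
    size_t i;

    for (i = 0; i < sizeof(known_families) / sizeof(known_families[0]); i++)
        if (known_families[i].value == value)
            return known_families[i].name;
    return NULL;
}
```

With the value from ulp/rds/rds_inet.h, `af_collision(30)` reports `AF_TIPC`, which is the clash described above. The table is only as complete as the excerpt, so a NULL result does not prove a value is unused.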

RE: [openib-general] OFED RC4 also can't support >2000 connections

2006-05-17 Thread Eitan Zahavi

Hi Zhu,

If you are using libsdp.conf to select which ports should map to SDP and
which to TCP you might run out of resources for tracking the opened
sockets. 

Try increasing the following constant in libsdp:
libsdp/src/port.c line 48:
#define MAPPED_SOCKET_MAX   1024
to something like:
#define MAPPED_SOCKET_MAX   1

Or, if you can use SDP sockets only (your config file is empty anyway):
SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so squid -d 10 -f squid.conf
SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129

Hope this fixes the issue you see
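
The failure mode described above, a fixed-size tracking table that runs out of slots, can be sketched as follows. This is a minimal illustration and not libsdp's actual code: `TABLE_MAX`, `struct socket_table` and `table_track()` are hypothetical names, and the limit is deliberately tiny so the overflow is easy to see.

```c
#include <stddef.h>

/* Deliberately small; plays the role of a compile-time socket limit. */
#define TABLE_MAX 4

/* Fixed-capacity table of tracked socket descriptors. */
struct socket_table {
    int fds[TABLE_MAX];
    size_t used;
};

/*
 * Record fd in the table.
 * Returns 0 on success, -1 once all TABLE_MAX slots are taken.
 */
static int table_track(struct socket_table *t, int fd)
{
    if (t->used >= TABLE_MAX)
        return -1;
    t->fds[t->used++] = fd;
    return 0;
}
```

Once the table is full every further `table_track()` call fails, which from the application's point of view looks like connections being refused past a fixed count. Raising the compile-time limit moves that ceiling but does not remove it.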

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of zhu shi song
> Sent: Wednesday, May 17, 2006 3:17 PM
> To: openib-general@openib.org
> Subject: [openib-general] OFED RC4 also can't support >2000 connections
> 
> I have installed OFED RC4 on my RHEL 4.3(2.6.9-34
> kernel). I use the same method I told in previous
> mail.  When increasing concurrent sdp connection to
> 2000. sdp refuse connection in server side. And client
> can't connect to server through sdp connection
> forever.
> 
> OS: RHEL 4.3 (2.6.9-34)
> IB: OFED RC4
> Test Method:
> Server: LD_PRELOAD=libsdp.so squid -d 10 -f
> squid.conf( sdp listening on IB0: 193.12.10.14:3129)
> Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n
> 2000 -X 193.12.10.14:3129
> http://www.google.com/index.html ( IB0: 193.12.10.24)
> 
> 
> Who know what's wrong with sdp many concurrent
> connections?  I have bought the cards for about 3
> weeks, but I can't make them work correctly.  Urgent!
> 
> tks
> zhu
> 
> 
> 
> 

[openib-general] Re: Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> When you say the code is pushed into master.kernel.org are you referring 
> to the mm tree of Andrew Morton? i don't see he has one under 
> kernel.org/git?

Andrew does not use git for development.

-- 
MST

Re: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator

2006-05-17 Thread Or Gerlitz

Roland Dreier wrote:

Or> I don't see either of the two iSCSI updates for 2.6.18
Or> (both sent by Mike Christie) in your git tree. I was looking
Or> for them all over (in the for-2.6.18, for-mm, master, for-linus
Or> branches ...). Am I missing anything, or were you waiting for
Or> my repost of the patches to pull the iSCSI updates?

Yeah, I haven't pushed it out yet.

I will be putting iSER into an iser branch of my tree, which I'll ask
Linus to pull once the SCSI changes are in his tree.

OK, I have tested iSCSI/iSER with a kernel built from the for-mm 
branch of your git tree, and it works fine!

Could you spare a few words on the difference between the for-2.6.18 and 
for-mm branches of your git tree?

> Or> OK, thanks. Let me know when you have the branch, so i will be
> Or> able to test it with this exact code configuration.
>
> It's there and pushed to master.kernel.org

When you say the code is pushed to master.kernel.org, are you referring 
to Andrew Morton's mm tree? I don't see that he has one under 
kernel.org/git.

Or.


[openib-general] OFED RC4 also can't support >2000 connections

2006-05-17 Thread zhu shi song

I have installed OFED RC4 on my RHEL 4.3 (2.6.9-34
kernel). I used the same method I described in my previous
mail. When I increase the number of concurrent SDP connections to
2000, SDP refuses connections on the server side, and from then
on the client can never connect to the server over SDP.

OS: RHEL 4.3 (2.6.9-34)
IB: OFED RC4
Test Method:
Server: LD_PRELOAD=libsdp.so squid -d 10 -f squid.conf (SDP listening on IB0: 193.12.10.14:3129)
Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129 http://www.google.com/index.html (IB0: 193.12.10.24)


Does anyone know what is wrong with many concurrent SDP
connections? I bought the cards about 3 weeks ago,
but I still can't get them to work correctly. Urgent!

tks
zhu





[openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 07:41, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be 
> > compiled out
> > 
> > OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
> > 
> > Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>
> 
> 
> 
> > @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb(
> >   lid.
> >*/
> >/* For now - do not add the alternate dr path to the release */
> > -  if (0)
> > -if ( p_madw->mad_addr.dest_lid != 0x )
> > +#if 0
> > +  if ( p_madw->mad_addr.dest_lid != 0x )
> 
> In my experience, if you compile with -O, gcc does a good enough job of
> dead code elimination.

But not all builds are that way though.

-- Hal
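
The difference between the two ways of disabling code in this exchange can be shown with a small sketch. The names `alternate_path()`, `guarded_by_if0()` and `guarded_by_pp()` are hypothetical and are not taken from OpenSM. With `if (0)` the body is still parsed and type-checked, and whether it ends up in the object code is left to the compiler and its flags. With `#if 0` the preprocessor discards the text before the compiler sees it, which is also why the patch has to guard the `p_physp` declaration separately.

```c
/* Counts how often the disabled path actually ran. */
static int calls;

static int alternate_path(void)
{
    return ++calls;
}

/* Runtime guard: the call is compiled and type-checked, never executed. */
static int guarded_by_if0(void)
{
    int r = 0;

    if (0)
        r = alternate_path();
    return r;
}

/* Preprocessor guard: the compiler never sees the call at all. */
static int guarded_by_pp(void)
{
    int r = 0;

#if 0
    r = alternate_path();
#endif
    return r;
}
```

Both functions behave identically at run time; the choice mainly affects what the compiler checks and what a non-optimized build may emit.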



[openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out

2006-05-17 Thread Michael S. Tsirkin

Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be 
> compiled out
> 
> OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out
> 
> Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>



> @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb(
>   lid.
>*/
>/* For now - do not add the alternate dr path to the release */
> -  if (0)
> -if ( p_madw->mad_addr.dest_lid != 0x )
> +#if 0
> +  if ( p_madw->mad_addr.dest_lid != 0x )

In my experience, if you compile with -O, gcc does a good enough job of
dead code elimination.

-- 
MST

[openib-general] [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out

2006-05-17 Thread Hal Rosenstock

OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out

Signed-off-by: Hal Rosenstock <[EMAIL PROTECTED]>

Index: opensm/osm_sm_mad_ctrl.c
===
--- opensm/osm_sm_mad_ctrl.c(revision 7202)
+++ opensm/osm_sm_mad_ctrl.c(working copy)
@@ -803,7 +803,9 @@ __osm_sm_mad_ctrl_send_err_cb(
   IN osm_madw_t *p_madw )
 {
   osm_sm_mad_ctrl_t* p_ctrl = (osm_sm_mad_ctrl_t*)bind_context;
+#if 0
   osm_physp_t* p_physp;
+#endif
   ib_api_status_t status;
   ib_smp_t* p_smp;
 
@@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb(
  lid.
   */
   /* For now - do not add the alternate dr path to the release */
-  if (0)
-if ( p_madw->mad_addr.dest_lid != 0x )
+#if 0
+  if ( p_madw->mad_addr.dest_lid != 0x )
+  {
+p_physp =
+  osm_get_physp_by_mad_addr(p_ctrl->p_log,
+p_ctrl->p_subn,
+&(p_madw->mad_addr));
+if (!p_physp)
 {
-  p_physp =
-osm_get_physp_by_mad_addr(p_ctrl->p_log,
-  p_ctrl->p_subn,
-  &(p_madw->mad_addr));
-  if (! p_physp)
-  {
-osm_log( p_ctrl->p_log, OSM_LOG_ERROR,
- "__osm_sm_mad_ctrl_send_err_cb: ERR 3114: "
- "Failed to find the corresponding phys port\n");
-  }
-  else
-  {
-osm_physp_replace_dr_path_with_alternate_dr_path(
-  p_ctrl->p_log, p_ctrl->p_subn, p_physp, p_madw->h_bind );
-  }
+  osm_log( p_ctrl->p_log, OSM_LOG_ERROR,
+   "__osm_sm_mad_ctrl_send_err_cb: ERR 3114: "
+   "Failed to find the corresponding phys port\n");
 }
+else
+{
+  osm_physp_replace_dr_path_with_alternate_dr_path(
+  p_ctrl->p_log, p_ctrl->p_subn, p_physp, p_madw->h_bind );
+}
+  }
+#endif
 
   /*
 An error occurred.  No response was received to a request MAD.




RE: [openib-general] opensm segfault?

2006-05-17 Thread Hal Rosenstock

On Wed, 2006-05-17 at 02:10, Eitan Zahavi wrote:
> cl_memcpy  should have some debug capabilities on top of memcpy .

I don't see any. Did I miss something ?
..
> cl memory management provide means to track all memory allocations, etc.

Yes, there is extra memory tracking code for malloc and free. This is a
separable item in my mind right now.

-- Hal
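
For context, frame #0 of the backtrace quoted below shows cl_memcpy being called with p_src=0x0. A wrapper that adds a debug check on top of memcpy could look like the following minimal sketch. `checked_memcpy()` is a hypothetical helper written for illustration; it is not part of complib, and nothing in this thread says cl_memcpy does or should behave this way.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy count bytes from src to dest, refusing NULL pointers.
 * Returns 0 on success (including count == 0), -1 on a NULL argument.
 */
static int checked_memcpy(void *dest, const void *src, size_t count)
{
    if (count == 0)
        return 0;
    if (dest == NULL || src == NULL)
        return -1;
    memcpy(dest, src, count);
    return 0;
}
```

A check like this turns the crash into an error the caller has to handle. It does not explain why the source pointer was NULL in the first place, which is the actual bug to find.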

> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
> 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:openib-general-
> > [EMAIL PROTECTED] On Behalf Of Sasha Khapyorsky
> > Sent: Wednesday, May 17, 2006 2:11 AM
> > To: Troy Benjegerdes
> > Cc: openib-general@openib.org
> > Subject: Re: [openib-general] opensm segfault?
> > 
> > Hi Troy,
> > 
> > On 14:41 Tue 16 May , Troy Benjegerdes wrote:
> > > I got this after an indeterminate amount of time running opensm..
> > 
> > May this be reproducible? Or it is completely random failure?
> > 
> > > (gdb) bt
> > > #0  0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
> > > count=64) at cl_memory_osd.c:87
> > > #1  0x00415053 in osm_pkey_tbl_sync_new_blocks (
> > > p_pkey_tbl=0x2ad99228) at osm_pkey.c:127
> > > #2  0x00416687 in osm_pkey_mgr_process (p_osm=0x580e40)
> > > at osm_pkey_mgr.c:407
> > > #3  0x0043bb22 in osm_state_mgr_process (p_mgr=0x581ad8,
> > > signal=3)
> > > at osm_state_mgr.c:2243
> > > #4  0x0043c88f in __osm_state_mgr_ctrl_disp_callback (
> > > context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70
> > > #5  0x2b90b0db9437 in __cl_disp_worker (context=0x5831f0)
> > > at cl_dispatcher.c:108
> > > #6  0x2b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268)
> > > at cl_threadpool.c:78
> > > #7  0x2b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at
> > > cl_thread.c:61
> > > #8  0x2b90b0fe3b1c in start_thread () from /lib/libpthread.so.0
> > > #9  0x2b90b12c8273 in clone () from /lib/libc.so.6
> > >
> > >
> > >
> > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This
> > > just seems like excessive uneeded abstraction.
> > 
> > Absolutely agree with you.
> > 
> > Sasha.
> > 
> > > I'm running opensm from subversion rev 7091..
> > >
> > > May 10 16:27:53 145969 [] -> OpenSM Rev:openib-1.2.0 OpenIB svn
> > > 6251:7091M
> > >
> > > the only local changes are as follows:
> > >
> > > [EMAIL PROTECTED]:/usr/src/openib-src/userspace/management$ svn diff
> > > Index: osm/opensm/osm_port_info_rcv.c
> > > ===
> > > --- osm/opensm/osm_port_info_rcv.c  (revision 7091)
> > > +++ osm/opensm/osm_port_info_rcv.c  (working copy)
> > > @@ -469,9 +469,14 @@
> > >goto Exit;
> > >  }
> > >
> > > +#if 0
> > >  /* Check for IBM eHCA firmware defect in reporting partition
> > >  * enforcement cap */
> > >  if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == IBM_VENDOR_ID)
> > >p_switch->switch_info.enforce_cap = 0;
> > > +#endif
> > > +/* Check for busted divergenet switch on ameslab network */
> > > +if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e000152)
> > > +   p_switch->switch_info.enforce_cap = 0;
> > >
> > >  /* Bail out if this is a switch with no partition enforcement
> > >  * capability */
> > >  if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0)

Re: [openib-general] OFED-1.0-rc4 need db-devel

2006-05-17 Thread Vladimir Sokolovsky


Zhu,
db-devel package is required to build open_iscsi package RPM.
This package is not relevant for RHEL 4.3.
There are two options to install OFED-1.0-rc4 on RHEL 4.3 without 
open_iscsi:

1. Select "Custom installation" and don't choose to install open_iscsi.
2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory 
when you run install.sh or build.sh) and set *open_iscsi=n*.

   Then run:
   ./install.sh -c ofed.conf

Regards,
Vladimir

zhu shi song wrote:

I have downloaded OFED-1.0-rc4 for my RHEL 4.3.  But I
can't build all modules because it needs db-devel.
RHEL 4.3 just have db4-devel there is no db-devel.  Is
there anything I don't know?

  tks
  zhu


[openib-general] OFED-1.0-rc4 need db-devel

2006-05-17 Thread zhu shi song

I have downloaded OFED-1.0-rc4 for my RHEL 4.3, but I
can't build all the modules because the build needs db-devel.
RHEL 4.3 only has db4-devel; there is no db-devel. Is
there something I am missing?

  tks
  zhu
