Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-10-01 Thread Sean Hefty
>> This is correct.  Note that the number of DREQ retries was changed to 15 now.
>
>do you mean changed internally in the CM, or somehow controlled from
>the outside by uDAPL?

I meant the number of retries set by RDMA CM.

- Sean




Re: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

2006-10-01 Thread Scott Weitzenkamp (sweitzen)
$ uname -a
Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
$ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
svbu-qa1850-4
svbu-qa1850-3
$ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency

The last command just hangs.  Can I try your binary RPMs?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: Aviram Gutman [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, October 01, 2006 2:29 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: OpenFabricsEWG; openib; [EMAIL PROTECTED]
> Subject: Re: [openfabrics-ewg] problems running MVAPICH on 
> OFED 1.1 rc6 with SLES10 x86_64
> 
> Can you please elaborate on the MVAPICH issues? Can you send the
> command line? We ran it here on 32 Opteron nodes, each quad core, and
> also ran rigorous tests on many other nodes.
> 
> 
> 
> Scott Weitzenkamp (sweitzen) wrote:
> > We are just getting started with OFED testing on SLES10, first 
> > platform is x86_64.
> >  
> > IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> > MVAPICH with OSU benchmarks just hang.  This same hardware works 
> > fine with OFED and RHEL4 U3.
> >  
> > Has anyone else seen this?
> >  
> > Scott Weitzenkamp
> > SQA and Release Manager
> > Server Virtualization Business Unit
> > Cisco Systems
> >  
> > 
> 




Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ

2006-10-01 Thread Or Gerlitz
On 9/28/06, Sean Hefty <[EMAIL PROTECTED]> wrote:
> Or Gerlitz wrote:
> > My understanding is that without this patch the side that sends the DREQ
> > would do a few DREQ resends, because the first DREPs are lost and no
> > DREPs are sent once the id at the peer side has left the timewait state, correct?
>
> This is correct.  Note that the number of DREQ retries was changed to 15 now.

do you mean changed internally in the CM, or somehow controlled from
the outside by uDAPL?

> > Can you please share what the implications were with Intel MPI running a
> > 64-node (128 ranks?) job? Was the issue here just making the ***job
> > termination time*** longer?
>
> The job termination time was taking about a minute waiting for the DREQ to
> time out.  When running a series of tests, this becomes a fairly large issue.

Just something you might want to verify with the Intel MPI team. Does
their termination code look like:

for (i = 0; i < N; i++)
    dat_ep_disconnect(ep[i], ...)

j = 0
while (j < N) {
    dat_evd_wait(conn_evd, ...)
    /* verify it's a disconnected event on ep[i] for some 0 <= i <= N-1 */
    j++
}

and not

for (i = 0; i < N; i++) {
    dat_ep_disconnect(ep[i], ...)
    dat_evd_wait(conn_evd, ...)
}
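
For what it's worth, here is a minimal C sketch of the first pattern using
the uDAPL calls named above (the handles, the DAT_CLOSE_DEFAULT flag, and
the omitted error handling are my assumptions, not the actual Intel MPI
code):

#include <dat/udat.h>

/* Disconnect all endpoints first, then reap one DISCONNECTED event per
 * endpoint from the shared connection EVD, in whatever order they arrive. */
static void disconnect_all(DAT_EP_HANDLE *ep, DAT_EVD_HANDLE conn_evd, int n)
{
        DAT_EVENT event;
        DAT_COUNT nmore;
        int i, reaped = 0;

        for (i = 0; i < n; i++)
                dat_ep_disconnect(ep[i], DAT_CLOSE_DEFAULT);

        while (reaped < n) {
                if (dat_evd_wait(conn_evd, DAT_TIMEOUT_INFINITE, 1,
                                 &event, &nmore) != DAT_SUCCESS)
                        break;
                if (event.event_number == DAT_CONNECTION_EVENT_DISCONNECTED)
                        reaped++;
        }
}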

Or.




Re: [openib-general] [PATCH] IB/SRP: Enable multichannel

2006-10-01 Thread Lakshmanan, Madhu
> Quoting r. Michael S. Tsirkin <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
> >Quoting r. Lakshmanan, Madhu <[EMAIL PROTECTED]>:
> >Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
> >
> >Quoting r.  Vu Pham [EMAIL PROTECTED]:
> > >Subject: Re: [PATCH] IB/SRP: Enable multichannel
> > >What is the advantage of having multiple connections/qps on the same
> > >physical port to the same target? The disadvantages are wasted
> > >resources, instability, and no fail-over on physical port error...
> >
> >The advantage is if the target in question is an IOC that connects to a
> >FC SAN for example. In this case, the host is physically connected to
> >the same IOC, but can maintain independent logical connections to
> >specific storage devices on the SAN that are "behind" the IOC.
>
> We could just let the user specify the Id Ext when adding the device.
> How does this sound?
> --
> MST

I agree. That was exactly what I had in mind. I'll work on the patch
that does that.

Madhu





Re: [openib-general] [PATCH] IB/SRP: Enable multichannel

2006-10-01 Thread Michael S. Tsirkin
Quoting r. Lakshmanan, Madhu <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
> 
> Quoting r.  Vu Pham [EMAIL PROTECTED]:
> > Subject: Re: [PATCH] IB/SRP: Enable multichannel
> > What is the advantage of having multiple connections/qps on the same
> >physical port to the same target? The disadvantages are wasted
> >resources, instability, and no fail-over on physical port error...
> 
> The advantage is if the target in question is an IOC that connects to a
> FC SAN for example. In this case, the host is physically connected to
> the same IOC, but can maintain independent logical connections to
> specific storage devices on the SAN that are "behind" the IOC.

We could just let the user specify the Id Ext when adding the device.
How does this sound?

-- 
MST




Re: [openib-general] Kernel Oops in user-mad, mad

2006-10-01 Thread Michael S. Tsirkin
Quoting r. Jack Morgenstein <[EMAIL PROTECTED]>:
> Subject: Kernel Oops in user-mad, mad
> 
> We received the following kernel Oops while running regression tests
> (see console picture attached).
> 
> This looks like a possible race condition between handling umad send
> completions and ib_unregister_mad_agent.
> 
> The Oops is at the list_del line of dequeue_send (user_mad.c:186).
> Note that ib_unregister_mad_agent invokes unregister_mad_agent ->
> cancel_mads -> agent send handler.
> 
> Is there a possibility that there is a double deletion from a list somewhere?
> 
> Jack
> 
> 
> 
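
For context, a generic illustration of the double-deletion failure mode in
question: list_del() poisons the entry's pointers, so a second list_del()
on the same entry dereferences the poison values and oopses. A common guard
is list_del_init() plus a list_empty() check (sketch only, not the actual
user_mad.c code, and it assumes both paths take the same lock):

#include <linux/list.h>

struct pending_send {
        struct list_head list;
        /* ... payload ... */
};

/* Safe to call from both the completion path and the cancel path, provided
 * both callers hold the same lock: the second caller sees a self-linked
 * (empty) entry and skips the deletion instead of oopsing on poison. */
static void dequeue_once(struct pending_send *s)
{
        if (!list_empty(&s->list))
                list_del_init(&s->list);
}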

Was this during module unload?
-- 
MST




Re: [openib-general] [PATCH] IB/SRP: Enable multichannel

2006-10-01 Thread Lakshmanan, Madhu
Quoting r. Vu Pham [EMAIL PROTECTED]:
> Subject: Re: [PATCH] IB/SRP: Enable multichannel
> What is the advantage of having multiple connections/qps on the same
> physical port to the same target? The disadvantages are wasted
> resources, instability, and no fail-over on physical port error...

The advantage is if the target in question is an IOC that connects to a
FC SAN for example. In this case, the host is physically connected to
the same IOC, but can maintain independent logical connections to
specific storage devices on the SAN that are "behind" the IOC.

Quoting r. Michael S. Tsirkin <[EMAIL PROTECTED]>
>> Subject: Re: [PATCH] IB/SRP: Enable multichannel
>> 
>> Maybe we should just use the port GUID instead of the node GUID to
>> form the initiator ID?  That would solve this pretty cleanly I think.

> Sounds good.
> I think we should also stick the pkey into the identifier extension -
> I think it's nice for each partition to be able to act as a separate
> virtual network, not affecting others.

> What do you think?
> -- 
> MST

Sticking the pkey into the identifier extension may once again restrict
the ability of the host to have multiple logical connections to an SRP
IOC target. 

The most flexible approach appears to be:
    Identifier GUID = Port GUID
    Identifier Extension = user specified

Ishai's IB-SRP patch of 09/27 appears to accomplish the above.
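
For illustration only, here is a minimal sketch of how such a 16-byte SRP
initiator port identifier could be assembled from a user-supplied identifier
extension and the local port GUID, assuming the usual SRP layout of an
8-byte identifier extension followed by the 8-byte GUID (the function and
argument names are made up for the example, not taken from ib_srp):

#include <stdint.h>

static void compose_initiator_port_id(uint8_t port_id[16],
                                      uint64_t id_ext, uint64_t port_guid)
{
        int i;

        /* Both halves go on the wire big-endian: id_ext fills bytes 0-7,
         * the GUID fills bytes 8-15. */
        for (i = 0; i < 8; i++) {
                port_id[i]     = (uint8_t)(id_ext    >> (56 - 8 * i));
                port_id[8 + i] = (uint8_t)(port_guid >> (56 - 8 * i));
        }
}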

Madhu





Re: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

2006-10-01 Thread Aviram Gutman
Can you please elaborate on the MVAPICH issues? Can you send the command
line? We ran it here on 32 Opteron nodes, each quad core, and also ran
rigorous tests on many other nodes.



Scott Weitzenkamp (sweitzen) wrote:
> We are just getting started with OFED testing on SLES10, first 
> platform is x86_64.
>  
> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.  
> MVAPICH with OSU benchmarks just hang.  This same hardware works 
> fine with OFED and RHEL4 U3.
>  
> Has anyone else seen this?
>  
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 





Re: [openib-general] FW: Mstflint - not working on ppc64 and when driver is not loaded on AMD

2006-10-01 Thread Michael S. Tsirkin
Quoting r. Tseng-Hui (Frank) Lin <[EMAIL PROTECTED]>:
> Subject: RE: FW: Mstflint - not working on ppc64 and when driver is not
> loaded on AMD
> 
> The ppc64 problem is actually in pci_64.c. Here is the patch:
> 
>  cut here =
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 4c4449b..490403c 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_
>   if (hose == 0)
>   return NULL;/* should never happen */
>  
> - /* If memory, add on the PCI bridge address offset */
>   if (mmap_state == pci_mmap_mem) {
> - *offset += hose->pci_mem_offset;
>   res_bit = IORESOURCE_MEM;
>   } else {
>   io_offset = (unsigned long)hose->io_base_virt - pci_io_base;
> = end cut =
> 
> The mmap() system call on resource0 does not work on ppc64 without this
> patch. PowerMAC G5 got away with this because its hose->pci_mem_offset
> was set to 0.
> 
> The fix was made on 8/21. It may be able to make it into 2.6.19, but it
> certainly won't get into SLES10, SLES9-SP3, or RHEL4-U4, which have
> already been released. 
> 
> To cover both cases, with and without the fix, my patch tries to
> mmap /sys/bus/pci//resource0 first. If that fails, it tries to
> mmap /proc/bus/pci/. If that fails too, we have no choice but to fall
> back to using PCI config space.
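
For illustration, a minimal sketch of that fallback order (the path strings
and error handling here are placeholders, not mstflint's actual code; the
final fallback to config-space accesses is left to the caller):

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try mmap() of the sysfs resource0 file first, then the /proc/bus/pci
 * device file; return NULL so the caller can fall back to PCI config-space
 * accesses if neither mapping works. */
static void *map_bar0(const char *sysfs_path, const char *proc_path,
                      size_t len)
{
        const char *paths[2] = { sysfs_path, proc_path };
        int i;

        for (i = 0; i < 2; i++) {
                int fd = open(paths[i], O_RDWR | O_SYNC);
                void *p;

                if (fd < 0)
                        continue;
                p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                close(fd);
                if (p != MAP_FAILED)
                        return p;
        }
        return NULL;
}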

OK, so for OFED just mmap'ing from /proc/bus/pci/ should be a sufficient
work-around - it will make things work when the driver is loaded.
Correct?

-- 
MST




[openib-general] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64

2006-10-01 Thread Scott Weitzenkamp (sweitzen)



We are just getting started with OFED testing on SLES10, first platform is
x86_64.

IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
MVAPICH with OSU benchmarks just hang.  This same hardware works fine with
OFED and RHEL4 U3.

Has anyone else seen this?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems