Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ
>> This is correct. Note that the number of DREQ retries was changed to 15 now.
>
> do you mean internally changed in the CM or somehow controlled from
> the outside by uDAPL?

I meant the number of retries set by the RDMA CM.

- Sean

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
$ uname -a
Linux svbu-qa1850-3 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux

$ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 hostname
svbu-qa1850-4
svbu-qa1850-3

$ /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 192.168.2.46 192.168.2.49 /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency

The last command just hangs. Can I try your binary RPMs?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: Aviram Gutman [mailto:[EMAIL PROTECTED]
> Sent: Sunday, October 01, 2006 2:29 AM
> To: Scott Weitzenkamp (sweitzen)
> Cc: OpenFabricsEWG; openib; [EMAIL PROTECTED]
> Subject: Re: [openfabrics-ewg] problems running MVAPICH on
> OFED 1.1 rc6 with SLES10 x86_64
>
> Can you please elaborate on the MVAPICH issues? Can you send the
> command line? We ran it here on 32 Opteron nodes, each quad core, and
> also ran rigorous tests on many other nodes.
>
> Scott Weitzenkamp (sweitzen) wrote:
> > We are just getting started with OFED testing on SLES10; the first
> > platform is x86_64.
> >
> > IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> > MVAPICH with the OSU benchmarks just hangs. This same hardware works
> > fine with OFED and RHEL4 U3.
> >
> > Has anyone else seen this?
> >
> > Scott Weitzenkamp
> > SQA and Release Manager
> > Server Virtualization Business Unit
> > Cisco Systems
Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ
On 9/28/06, Sean Hefty <[EMAIL PROTECTED]> wrote:
> Or Gerlitz wrote:
> > My understanding is that without this patch the side that sends the DREQ
> > would do a few DREQ resends as the "first" DREPs are lost, and no
> > DREPs are sent once the id at the peer side has left the timewait state,
> > correct?
>
> This is correct. Note that the number of DREQ retries was changed to 15 now.

do you mean internally changed in the CM or somehow controlled from
the outside by uDAPL?

> > Can you please share what the implications were with Intel MPI running a
> > 64 node (128 ranks?) job? Was the issue here just making the ***job
> > termination time*** bigger?
>
> The job termination time was taking about a minute waiting for the DREQ to
> timeout. When running a series of tests, this becomes a fairly large issue.

Just something you might want to verify with the Intel MPI team: does their
teardown code look like

    for (i = 0; i < N; i++)
        dat_ep_disconnect(ep[i], ...);

    j = 0;
    while (j < N) {
        dat_evd_wait(conn_evd, ...);
        /* verify it is a disconnected event on ep[i] for some 0 <= i <= N-1 */
        j++;
    }

and not

    for (i = 0; i < N; i++) {
        dat_ep_disconnect(ep[i], ...);
        dat_evd_wait(conn_evd, ...);
    }

Or.
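To make Or's point concrete, here is a toy timing model (not Intel MPI's actual code; all names are illustrative) of why the ordering matters: if all disconnects are issued up front, the waits for the N endpoints overlap and teardown takes roughly the single longest completion time, whereas disconnecting and waiting one endpoint at a time serializes every DREQ timeout.

```c
/* t[i] models how long endpoint i takes to finish disconnecting:
 * near-zero if the DREP arrives, or a full DREQ retry timeout if
 * the DREP is lost. Toy model only - not DAT/uDAPL code. */

/* Issue all disconnects first, then collect N events: the waits
 * overlap, so teardown takes about max over t[i]. */
static double batched_teardown(const double *t, int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++)
        if (t[i] > worst)
            worst = t[i];
    return worst;
}

/* Disconnect and wait one endpoint at a time: teardown takes the
 * sum of all t[i], so one lost DREP per endpoint adds up quickly. */
static double serialized_teardown(const double *t, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += t[i];
    return total;
}
```

With, say, four endpoints where one peer's DREP is lost and hits a 60-second timeout, the batched pattern finishes in about 60 seconds while the serialized pattern pays that timeout on top of every other wait; with 128 ranks the gap grows accordingly.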
Re: [openib-general] [PATCH] IB/SRP: Enable multichannel
> Quoting r. Michael S. Tsirkin <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
>
> > Quoting r. Lakshmanan, Madhu <[EMAIL PROTECTED]>:
> > Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
> >
> > > Quoting r. Vu Pham [EMAIL PROTECTED]:
> > > Subject: Re: [PATCH] IB/SRP: Enable multichannel
> > >
> > > What is the advantage of having multiple connections/QPs on the same
> > > physical port to the same target? The disadvantages are wasted
> > > resources, instability, no fail-over on physical port error...
> >
> > The advantage is if the target in question is an IOC that connects to a
> > FC SAN, for example. In this case, the host is physically connected to
> > the same IOC, but can maintain independent logical connections to
> > specific storage devices on the SAN that are "behind" the IOC.
>
> We could just let the user specify the ID extension when adding the device.
> How does this sound?
> --
> MST

I agree. That was exactly what I had in mind. I'll work on a patch that
does that.

Madhu
Re: [openib-general] [PATCH] IB/SRP: Enable multichannel
Quoting r. Lakshmanan, Madhu <[EMAIL PROTECTED]>:
> Subject: RE: [openib-general] [PATCH] IB/SRP: Enable multichannel
>
> > Quoting r. Vu Pham [EMAIL PROTECTED]:
> > Subject: Re: [PATCH] IB/SRP: Enable multichannel
> >
> > What is the advantage of having multiple connections/QPs on the same
> > physical port to the same target? The disadvantages are wasted
> > resources, instability, no fail-over on physical port error...
>
> The advantage is if the target in question is an IOC that connects to a
> FC SAN, for example. In this case, the host is physically connected to
> the same IOC, but can maintain independent logical connections to
> specific storage devices on the SAN that are "behind" the IOC.

We could just let the user specify the ID extension when adding the device.
How does this sound?

--
MST
Re: [openib-general] Kernel Oops in user-mad, mad
Quoting r. Jack Morgenstein <[EMAIL PROTECTED]>:
> Subject: Kernel Oops in user-mad, mad
>
> We received the following kernel Oops while running regression tests
> (see console picture attached).
>
> This looks like a possible race condition between handling umad send
> completions and ib_unregister_mad_agent.
>
> The Oops is at the list_del line of dequeue_send (user_mad.c:186).
> Note that ib_unregister_mad_agent invokes unregister_mad_agent ->
> cancel_mads -> agent send handler.
>
> Is there a possibility that there is a double deletion from a list
> somewhere?
>
> Jack

Was this during module unload?

--
MST
Re: [openib-general] [PATCH] IB/SRP: Enable multichannel
Quoting r. Vu Pham [EMAIL PROTECTED]:
> Subject: Re: [PATCH] IB/SRP: Enable multichannel
>
> What is the advantage of having multiple connections/QPs on the same
> physical port to the same target? The disadvantages are wasted
> resources, instability, no fail-over on physical port error...

The advantage is if the target in question is an IOC that connects to a
FC SAN, for example. In this case, the host is physically connected to
the same IOC, but can maintain independent logical connections to
specific storage devices on the SAN that are "behind" the IOC.

Quoting r. Michael S. Tsirkin <[EMAIL PROTECTED]>:
>> Subject: Re: [PATCH] IB/SRP: Enable multichannel
>>
>> Maybe we should just use the port GUID instead of the node GUID to
>> form the initiator ID? That would solve this pretty cleanly, I think.
>
> Sounds good.
> I think we should also stick the pkey into the identifier extension -
> I think it's nice for each partition to be able to act as a separate
> virtual network, not affecting others.
> What do you think?
> --
> MST

Sticking the pkey into the identifier extension may once again restrict
the ability of the host to have multiple logical connections to an SRP
IOC target. The most flexible approach appears to be:

    Identifier ID = Port GUID
    Identifier Extension = User specified

Ishai's IB-SRP patch of 09/27 appears to accomplish the above.

Madhu
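For reference, the SRP initiator port identifier is a 16-byte field: an 8-byte identifier extension followed by an 8-byte GUID, both in big-endian byte order. A minimal sketch of packing it under the "port GUID + user-specified extension" scheme discussed above (the function name is made up for illustration):

```c
#include <stdint.h>

/* Pack a 16-byte SRP initiator port identifier: the 8-byte identifier
 * extension followed by the 8-byte (port) GUID, both big-endian.
 * Sketch only - srp_pack_initiator_id is a hypothetical name. */
static void srp_pack_initiator_id(uint8_t id[16], uint64_t id_ext,
                                  uint64_t port_guid)
{
    for (int i = 0; i < 8; i++) {
        id[i]     = (uint8_t)(id_ext    >> (56 - 8 * i));
        id[8 + i] = (uint8_t)(port_guid >> (56 - 8 * i));
    }
}
```

Letting the user supply `id_ext` while deriving `port_guid` from the HCA port is what makes several distinct logical connections to the same IOC possible.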
Re: [openib-general] [openfabrics-ewg] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
Can you please elaborate on the MVAPICH issues? Can you send the command
line? We ran it here on 32 Opteron nodes, each quad core, and also ran
rigorous tests on many other nodes.

Scott Weitzenkamp (sweitzen) wrote:
> We are just getting started with OFED testing on SLES10; the first
> platform is x86_64.
>
> IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
> MVAPICH with the OSU benchmarks just hangs. This same hardware works
> fine with OFED and RHEL4 U3.
>
> Has anyone else seen this?
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
Re: [openib-general] FW: Mstflint - not working on ppc64 and when driver is not loaded on AMD
Quoting r. Tseng-Hui (Frank) Lin <[EMAIL PROTECTED]>:
> Subject: RE: FW: Mstflint - not working on ppc64 and when driver is not
> loaded on AMD
>
> The ppc64 problem is actually in pci_64.c. Here is the patch:
>
> = cut here =
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 4c4449b..490403c 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_
>  	if (hose == 0)
>  		return NULL;	/* should never happen */
>  
> -	/* If memory, add on the PCI bridge address offset */
>  	if (mmap_state == pci_mmap_mem) {
> -		*offset += hose->pci_mem_offset;
>  		res_bit = IORESOURCE_MEM;
>  	} else {
>  		io_offset = (unsigned long)hose->io_base_virt - pci_io_base;
> = end cut =
>
> The mmap() system call on resource0 does not work on ppc64 without this
> patch. PowerMAC G5 got away with this because its hose->pci_mem_offset
> was set to 0.
>
> The fix was made on 8/21. It may be able to make it into 2.6.19, but it
> certainly won't get into SLES10, SLES9-SP3, or RHEL4-U4, which have
> already been released.
>
> To cover both cases, with and without the fix, my patch tries to
> mmap /sys/bus/pci//resource0 first. If that fails, it tries to
> mmap /proc/bus/pci/ If that fails again, we have no choice but to fall
> back to using PCI config space.

OK, so for OFED, just mmap from /proc/bus/pci/ should be a sufficient
work-around - it will make things work when the driver is loaded. Correct?

--
MST
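The fallback order described above can be sketched as follows. This is an illustration, not mstflint's actual code: `build_pci_paths` and `open_pci_bar` are invented names, and the `/proc/bus/pci/BB/SS.F` layout (domain dropped, one directory per bus) is the common convention assumed here.

```c
#include <stdio.h>
#include <fcntl.h>

/* Build both candidate paths for a device given in DDDD:BB:SS.F form.
 * sysfs:  /sys/bus/pci/devices/DDDD:BB:SS.F/resource0
 * procfs: /proc/bus/pci/BB/SS.F  (domain dropped, ':' becomes '/') */
static void build_pci_paths(const char *bdf,
                            char *sysfs, size_t slen,
                            char *procfs, size_t plen)
{
    snprintf(sysfs, slen, "/sys/bus/pci/devices/%s/resource0", bdf);
    snprintf(procfs, plen, "/proc/bus/pci/%s", bdf + 5); /* skip "DDDD:" */
    for (char *c = procfs; *c; c++)
        if (*c == ':')
            *c = '/';
}

/* Try the sysfs resource0 file first (needs the 8/21 ppc64 fix to be
 * mmap-able there), then the /proc/bus/pci node; a negative return
 * tells the caller to fall back to PCI config-space accesses. */
static int open_pci_bar(const char *bdf)
{
    char sysfs[128], procfs[128];
    build_pci_paths(bdf, sysfs, sizeof sysfs, procfs, sizeof procfs);

    int fd = open(sysfs, O_RDWR | O_SYNC);
    if (fd < 0)
        fd = open(procfs, O_RDWR | O_SYNC);
    return fd;
}
```

The caller would then mmap() the returned descriptor; only if both opens fail does it drop to the slow config-space path, matching the order Frank describes.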
[openib-general] problems running MVAPICH on OFED 1.1 rc6 with SLES10 x86_64
We are just getting started with OFED testing on SLES10; the first
platform is x86_64.

IPoIB, SDP, SRP, Open MPI, HP MPI, and Intel MPI are working so far.
MVAPICH with the OSU benchmarks just hangs. This same hardware works
fine with OFED and RHEL4 U3.

Has anyone else seen this?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems