Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB

2006-09-05 Thread Michael S. Tsirkin
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> Subject: getting LOC_QP_OP_ERR with IPoIB
> 
> Hi,
> 
> While doing some work to make the Linux bonding driver work on top
> of IPoIB, I have run into LOC_QP_OP_ERR with vendor (Mellanox PCI-X HCA) error 62.
> 
>   ib0: failed send event (status=2, wrid=52 vend_err 62)
> 
> What does this vendor error mean? It's the same system on which I saw the
> QP modify error.

vend_err 0x62 is a WQE-fetch failure, caused by either a non-existent WQE region or a PD mismatch.
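To make the log line above easier to read, here is a minimal C sketch that splits an IPoIB "failed send event" warning into its fields. It assumes that status and wrid are printed in decimal and vend_err in hex (which is why the log's "62" is vendor error 0x62), and that status codes follow the order of the kernel's `enum ib_wc_status`, where 2 is LOC_QP_OP_ERR. The helper names are hypothetical, and the only vendor code decoded is the one explained in this reply.

```c
#include <stdio.h>
#include <string.h>

/* Names for the first few work-completion status codes, in the order of the
 * kernel's enum ib_wc_status (IB_WC_SUCCESS = 0, ...). Other values are
 * reported as "other". */
static const char *wc_status_name(int status)
{
    static const char *const names[] = {
        "SUCCESS", "LOC_LEN_ERR", "LOC_QP_OP_ERR",
        "LOC_EEC_OP_ERR", "LOC_PROT_ERR", "WR_FLUSH_ERR",
    };
    if (status >= 0 && status < (int)(sizeof names / sizeof names[0]))
        return names[status];
    return "other";
}

/* Parse a warning such as
 *   "ib0: failed send event (status=2, wrid=52 vend_err 62)"
 * ASSUMPTION: vend_err is printed in hex, status and wrid in decimal.
 * Returns 0 on success, -1 if the line does not have this shape. */
static int parse_send_event(const char *line, int *status, int *wrid,
                            unsigned int *vend_err)
{
    const char *p = strstr(line, "failed send event"); /* skip "ib0: " prefix */
    if (p == NULL)
        return -1;
    if (sscanf(p, "failed send event (status=%d, wrid=%d vend_err %x)",
               status, wrid, vend_err) != 3)
        return -1;
    return 0;
}

/* Hint for mthca vendor errors: only the code explained in this thread. */
static const char *mthca_vend_err_hint(unsigned int vend_err)
{
    if (vend_err == 0x62)
        return "WQE fetch failure: WQE region does not exist or PD mismatch";
    return "unknown vendor error";
}
```

Feeding it the line from Or's log yields status 2 (LOC_QP_OP_ERR), wrid 52 and vendor error 0x62.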

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Leonid Arsh
Hi list,

I have a question regarding the guid2lid cache file.

The file is read by OpenSM on startup.
OpenSM may reassign LIDs according to the LIDs saved in this file.
This isn't always acceptable.

Is this the right policy? Am I missing anything here?
Is there a way to disable reading the file on startup?

Regards,
   Leonid




Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB

2006-09-05 Thread Or Gerlitz
Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:

>> While doing some work to make the Linux bonding driver work on top
>> of IPoIB, I have run into LOC_QP_OP_ERR with vendor (Mellanox PCI-X HCA) error 62.
>>  ib0: failed send event (status=2, wrid=52 vend_err 62)
>> What does this vendor error mean? It's the same system on which I saw the
>> QP modify error.


> vend_err 0x62 is a WQE-fetch failure, caused by either a non-existent WQE region
> or a PD mismatch.

Thanks.

So what do you think, am I running into some bogus IPoIB scenario?

Or.





Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB

2006-09-05 Thread Michael S. Tsirkin
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> Subject: Re: getting LOC_QP_OP_ERR with IPoIB
> 
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> 
> >> While doing some work to make the Linux bonding driver work on top
> >> of IPoIB, I have run into LOC_QP_OP_ERR with vendor (Mellanox PCI-X HCA)
> >> error 62.
> >>ib0: failed send event (status=2, wrid=52 vend_err 62)
> >> What does this vendor error mean? It's the same system on which I saw
> >> the QP modify error.
> 
> 
> > vend_err 0x62 is a WQE-fetch failure, caused by either a non-existent WQE
> > region or a PD mismatch.
> 
> Thanks.
> 
> So what do you think, am I running into some bogus IPoIB scenario?
> 
> Or.

Dunno, it looks really weird. Could you try firmware 3.5.0, please?

-- 
MST




Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Hal Rosenstock
Hi Leonid,

On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> Hi list,
> 
> I have a question regarding the guid2lid cache file.
> 
> The file is read by OpenSM on startup.
> OpenSM may reassign LIDs according to the LIDs saved in this file.
> This isn't always acceptable.
> 
> Is this the right policy? Am I missing anything here?
> Is there a way to disable reading the file on startup?

There is the -r (--reassign_lids) option for this, but it is not the
default behavior of OpenSM.

-- Hal

> 
> Regards,
>Leonid





Re: [openib-general] MPI Broadcast doubt

2006-09-05 Thread Hal Rosenstock
John,

On Mon, 2006-09-04 at 08:56, john t wrote:
> Hi,
>  
> I have 3 nodes connected via IB as shown below:
>  
> node1 ---> switch1 ---> node2
>               |--> node3
> 
> If node1 sends a broadcast message to node2 and node3, I want to know
> whether the message is delivered to the switch twice (first time for node2
> and second time for node3) or just once (where the switch will know by
> looking at some headers or so that it's a broadcast message and will
> send it on all the outgoing ports).

Assuming nodes 1, 2, and 3 are part of the same multicast group, the
multicast send is sent once from node 1. When received at the switch, it
is replicated to all ports which have members in the same group (in this
case, nodes 2 and 3). The switch knows by the header (specifically the
LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable
to determine on which ports to forward it. However, IB multicast is
unreliable so to create reliable multicast, it is sometimes "emulated"
in that the sender tracks the group members and may use serial unicast
sends or augment a multicast send with unicast sends to the receivers
and track their acknowledgements of receipt.
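The replication step described above can be sketched as a lookup in a toy MulticastForwardingTable. This is a model under stated assumptions, not switch firmware: the table size and port count are hypothetical, each entry is treated as a bitmask of output ports indexed by (MLID - 0xC000), and the sketch assumes the switch never sends a copy back out of the arrival port.

```c
#include <stdint.h>

#define MLID_BASE 0xC000u  /* first multicast LID */
#define MLID_TOP  0xFFFEu  /* last multicast LID (0xFFFF is the permissive LID) */
#define MFT_SIZE  64       /* hypothetical: a tiny table for this sketch */
#define NUM_PORTS 8        /* hypothetical radix; port 0 = management port */

/* One entry per MLID: bit n set means "forward a copy to port n". */
typedef struct {
    uint16_t port_mask[MFT_SIZE];
} mft_t;

/* Fill out_ports with the ports a packet to `dlid` is replicated to,
 * skipping the arrival port. Returns the number of copies, or -1 if dlid
 * is not a multicast LID or lies beyond this sketch's small table. */
static int mcast_forward(const mft_t *mft, uint16_t dlid, int in_port,
                         int out_ports[NUM_PORTS])
{
    if (dlid < MLID_BASE || dlid > MLID_TOP || dlid - MLID_BASE >= MFT_SIZE)
        return -1;
    uint16_t mask = mft->port_mask[dlid - MLID_BASE];
    int copies = 0;
    for (int port = 1; port < NUM_PORTS; port++) {
        if ((mask & (1u << port)) && port != in_port)
            out_ports[copies++] = port;
    }
    return copies;
}
```

With node1, node2 and node3 on switch ports 1, 2 and 3 and all three in the group, one packet arriving from port 1 leaves as two copies, on ports 2 and 3.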

-- Hal

> Regards,
> John T.





Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Leonid Arsh
Hi Hal,

  Thank you for your reply.

Probably I wasn't clear.

I have a problem when OpenSM, on startup, reads an out-of-date guid2lid file.
OpenSM changes LIDs in this case.
I don't want the LIDs to be changed.
As I understand it, the '-r' option, on the contrary, causes the SM to
reassign all the LIDs.

I could just remove the file to handle the problem.
I'd like to know if there is a way to do it without touching the file.

Thanks,
Leonid

On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Leonid,
>
> On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> > Hi list,
> >
> > I have a question regarding the guid2lid cache file.
> >
> > The file is read by OpenSM on startup.
> > OpenSM may reassign LIDs according to the LIDs saved in this file.
> > This isn't always acceptable.
> >
> > Is this the right policy? Am I missing anything here?
> > Is there a way to disable reading the file on startup?
>
> There is the -r (--reassign_lids) option for this, but it is not the
> default behavior of OpenSM.
>
> -- Hal
>
> >
> > Regards,
> >Leonid




Re: [openib-general] MPI Broadcast doubt

2006-09-05 Thread Dotan Barak
Hal Rosenstock wrote:
> John,
>
> On Mon, 2006-09-04 at 08:56, john t wrote:
>   
>> Hi,
>>  
>> I have 3 nodes connected via IB as shown below:
>>  
>> node1 ---> switch1 ---> node2
>>               |--> node3
>> 
>> If node1 sends a broadcast message to node2 and node3, I want to know
>> whether the message is delivered to the switch twice (first time for node2
>> and second time for node3) or just once (where the switch will know by
>> looking at some headers or so that it's a broadcast message and will
>> send it on all the outgoing ports).
>> 
>
> Assuming nodes 1, 2, and 3 are part of the same multicast group, the
> multicast send is sent once from node 1. When received at the switch, it
> is replicated to all ports which have members in the same group (in this
> case, nodes 2 and 3). The switch knows by the header (specifically the
> LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable
> to determine on which ports to forward it. However, IB multicast is
> unreliable so to create reliable multicast, it is sometimes "emulated"
> in that the sender tracks the group members and may use serial unicast
> sends or augment a multicast send with unicast sends to the receivers
> and track their acknowledgements of receipt.
>
> -- Hal
>   
All of the above is true for IB multicast (there isn't any broadcast in IB).

If the question was "what happens when one sends a message using
MPI_Bcast?", then the answer is: it depends on the MPI implementation.
I know that in MVAPICH, the MPI library handles the duplication itself by default
(so the switch gets two messages, not one).
There is an option in that MPI to use IB multicast, but it is disabled by
default.
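The difference between the two cases can be illustrated with a flat "serial unicast" broadcast, which also tracks acknowledgements the way Hal describes for emulated reliable multicast. This is a hypothetical sketch, not MVAPICH code: real MPI libraries often use tree-shaped algorithms, and `unicast_fn` and `counting_send` are made-up stand-ins for a transport. With three ranks and rank 0 as root, two messages leave the root and cross the root-to-switch link, whereas a hardware multicast would send one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport hook: send `len` bytes to rank `dest` and return
 * true once the receiver has acknowledged them. */
typedef bool (*unicast_fn)(int dest, const void *buf, size_t len, void *ctx);

/* Flat broadcast emulated with serial unicasts from `root`. acked[r] records
 * which ranks confirmed receipt, so a caller could retry the others.
 * Returns how many messages left the root, i.e. crossed root -> switch. */
static int serial_unicast_bcast(int root, int nprocs, const void *buf,
                                size_t len, unicast_fn send_fn, void *ctx,
                                bool acked[])
{
    int sent = 0;
    for (int r = 0; r < nprocs; r++) {
        if (r == root) {
            acked[r] = true;            /* the root already has the data */
            continue;
        }
        acked[r] = send_fn(r, buf, len, ctx);
        sent++;
    }
    return sent;
}

/* Stand-in transport for trying the sketch: counts calls and always "acks". */
static bool counting_send(int dest, const void *buf, size_t len, void *ctx)
{
    (void)dest;
    (void)buf;
    (void)len;
    ++*(int *)ctx;
    return true;
}
```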

Dotan




Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Hal Rosenstock
Hi Leonid,

On Tue, 2006-09-05 at 08:11, Leonid Arsh wrote:
> Hi Hal,
> 
>   Thank you for your reply.
> 
> Probably I wasn't clear.
> 
> I have a problem when OpenSM, on startup, reads an out-of-date guid2lid
> file.
> OpenSM changes LIDs in this case.

How do you know the file is "out of date" ?

> I don't want the LIDs to be changed.

Oh, it's the other way you were asking about.

> As I understand it, the '-r' option, on the contrary, causes the SM to
> reassign all the LIDs.
> 
> I could just remove the file to handle the problem.

or move it aside.

> I'd like to know if there is a way to do it without touching the file.

Not currently. There is the -x (--honor_guid2lid) option, which will do this
(ignore the guid2lid file) when OpenSM is coming out of STANDBY, though.

-- Hal

> Thanks,
> Leonid
> 
> On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi Leonid,
> >
> > On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> > > Hi list,
> > >
> > > I have a question regarding the guid2lid cache file.
> > >
> > > The file is read by OpenSM on startup.
> > > OpenSM may reassign LIDs according to the LIDs saved in this file.
> > > This isn't always acceptable.
> > >
> > > Is this the right policy? Am I missing anything here?
> > > Is there a way to disable reading the file on startup?
> >
> > There is the -r (--reassign_lids) option for this, but it is not the
> > default behavior of OpenSM.
> >
> > -- Hal
> >
> > >
> > > Regards,
> > >Leonid





Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question

2006-09-05 Thread Or Gerlitz
Michael S. Tsirkin wrote:
> Donnu, it looks really weird. Could you try firmware 3.5.0 please?

I just noticed that you cannot work with mstflint if the mthca driver is
not loaded. I think this was not the case with the gen1 tools; am I correct?

Is this related to this message

ACPI: PCI interrupt for device :02:00.0 disabled

which I see once the mthca driver is unloaded?

Or.

> dill:/tmp # modprobe -r ib_mthca

> dill:/tmp # ./mstflint -d 00:02:00.0 q
> *** ERROR *** Read a corrupted device id (0x). Probably HW/PCI access problem
> *** ERROR *** Device type 65535 not supported.
> *** ERROR *** Can not get flash type using device 00:02:00.0

> dill:/tmp # modprobe ib_mthca

> dill:/tmp # ./mstflint -d 00:02:00.0 q
> Image type:      Failsafe
> I.S. Version:    1
> Chip Revision:   A1
> GUID Des:        Node             Port1            Port2            Sys image
> GUIDs:           0008f104039651dc 0008f104039651dd 0008f104039651de 0008f104039651df
> Board ID:        (VLT0010010001)
> VSD:
> PSID:            VLT0010010001

> dill:/tmp # dmesg

> ACPI: PCI interrupt for device :02:00.0 disabled

> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
> ib_mthca: Initializing :02:00.0
> PCI: Enabling device :02:00.0 (0110 -> 0112)
> ACPI: PCI Interrupt :02:00.0[A] -> GSI 29 (level, low) -> IRQ 193





Re: [openib-general] problems to register memory as a regular

2006-09-05 Thread Tziporet Koren
Dhabaleswar Panda wrote:
> Christian - Thanks for sending instructions for running mvapich2-0.9.5
> to Tziporet.
>
> Tziporet - Thanks for looking into this problem on SLES9 environment.
>
> Please note that a detailed user guide for running and tuning MVAPICH2
> 0.9.5 is available from the following URL:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html
>
> DK
>   
Thanks to all,
We found the bug; it was in the memory registration flow on SLES9 only.
A fix will be available in OFED 1.1 RC4.

Tziporet




Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Leonid Arsh
Thanks,

On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > I have a problem when OpenSM, on startup, reads an out-of-date guid2lid
> > file.
> > OpenSM changes LIDs in this case.
>
> How do you know the file is "out of date" ?
>
Actually, the LIDs were assigned by another SM.
When I start my new OpenSM, the old SM is already dead.
Before starting the new OpenSM, the ibnetdiscover utility shows LIDs different
from the ones in the file.
When I start OpenSM, the LIDs are reassigned on the fabric.
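A stale cache like this could be spotted before starting OpenSM by comparing the file against the LIDs currently on the fabric. The C sketch below is a hypothetical helper, not part of OpenSM. It assumes each guid2lid line reads `0x<guid> 0x<min_lid> 0x<max_lid>` (check this against the file your OpenSM version writes) and that the caller has already collected the current GUID-to-LID pairs, for example from ibnetdiscover output.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* A LID currently seen on the fabric for a port GUID (e.g. from ibnetdiscover). */
typedef struct {
    uint64_t guid;
    uint16_t lid;
} guid_lid_t;

/* Look up the observed LID for `guid`; returns 1 if found, 0 otherwise. */
static int find_observed_lid(const guid_lid_t *obs, size_t n, uint64_t guid,
                             uint16_t *lid)
{
    for (size_t i = 0; i < n; i++) {
        if (obs[i].guid == guid) {
            *lid = obs[i].lid;
            return 1;
        }
    }
    return 0;
}

/* Count cache entries whose LID differs from what the fabric shows now.
 * ASSUMPTION: each line is "0x<guid> 0x<min_lid> 0x<max_lid>".
 * GUIDs that are not currently on the fabric are ignored. */
static int count_stale_entries(FILE *cache, const guid_lid_t *obs, size_t n)
{
    uint64_t guid;
    unsigned int min_lid, max_lid;
    int stale = 0;
    while (fscanf(cache, "%" SCNx64 " %x %x", &guid, &min_lid, &max_lid) == 3) {
        uint16_t cur;
        if (find_observed_lid(obs, n, guid, &cur) && cur != min_lid)
            stale++;
    }
    return stale;
}
```

A non-zero count would suggest that the file was written under a different LID assignment, as in the case above where another SM assigned the LIDs.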




[openib-general] [Bug 131] working with huge pages may crash the kernel on Suse10

2006-09-05 Thread bugzilla-daemon
http://openib.org/bugzilla/show_bug.cgi?id=131


[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||FIXED




--- Comment #1 from [EMAIL PROTECTED]  2006-09-05 06:16 ---
was fixed in 1.1-rc3




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.



[openib-general] [Bug 145] IB Core unable to communicate IPoIB on Fedora Core 4

2006-09-05 Thread bugzilla-daemon
http://openib.org/bugzilla/show_bug.cgi?id=145


[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||WONTFIX




--- Comment #2 from [EMAIL PROTECTED]  2006-09-05 06:18 ---
this is not a bug in OFED







Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Hal Rosenstock
Leonid,

On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
> Thanks,
> 
> On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > I have a problem when OpenSM, on startup, reads an out-of-date
> > > guid2lid file.
> > > OpenSM changes LIDs in this case.
> >
> > How do you know the file is "out of date" ?
> >
> Actually, the LIDs were assigned by another SM.

Different (vendor) SMs have different LID assignment and pathing
(routing) policies. It is inadvisable to fail over across vendor SMs for
this and other reasons.

-- Hal

> When I start my new OpenSM, the old SM is already dead.
> Before starting the new OpenSM, the ibnetdiscover utility shows LIDs
> different from the ones in the file.
> When I start OpenSM, the LIDs are reassigned on the fabric.





Re: [openib-general] [PATCH] opensm: osm_log_init_v2() - new osm_log initializer

2006-09-05 Thread Hal Rosenstock
On Mon, 2006-09-04 at 13:20, Sasha Khapyorsky wrote:
> There is a new osm_log initializer, osm_log_init_v2(); it is wrapped
> by osm_log_init() in order to preserve the existing API.
> 
> Signed-off-by: Sasha Khapyorsky <[EMAIL PROTECTED]>

Thanks. Applied (to trunk and 1.1).

-- Hal
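The wrapping described in the patch is a common way to extend a C API without breaking existing callers: the old entry point keeps its signature and forwards to the new one, supplying a default for the added argument. The sketch below shows only that pattern. The names `example_log_init` and `example_log_init_v2` and the extra `max_size_mb` parameter are made up for illustration and do not reproduce OpenSM's real prototypes.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    FILE *out;
    bool flush;
    unsigned long max_size_mb;   /* hypothetical setting added by the v2 API */
} example_log_t;

/* New initializer: takes the extra parameter explicitly. */
static int example_log_init_v2(example_log_t *log, FILE *out, bool flush,
                               unsigned long max_size_mb)
{
    if (log == NULL || out == NULL)
        return -1;
    log->out = out;
    log->flush = flush;
    log->max_size_mb = max_size_mb;
    return 0;
}

/* Old initializer, kept for existing callers: same signature as before,
 * forwarding to v2 with a default for the new parameter (0 = no limit). */
static int example_log_init(example_log_t *log, FILE *out, bool flush)
{
    return example_log_init_v2(log, out, flush, 0);
}
```

Existing code keeps calling the old function unchanged, while new code can opt into the extra setting.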





Re: [openib-general] OpenSM - guid2lid cache file questions

2006-09-05 Thread Eitan Zahavi
Hi Leonid,

The best approach when switching from another vendor SM to 
OpenSM is to delete the /var/cache/osm/guid2lid file.
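Hal's alternative of moving the file aside instead of deleting it can be scripted before OpenSM is started. Below is a minimal C sketch using `rename()`, assuming the cache path quoted in this thread; the `.old` suffix and the function name are arbitrary choices for illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Cache path quoted in this thread; other installations may use another dir. */
#define GUID2LID_PATH "/var/cache/osm/guid2lid"

/* Move the cache file aside (path -> path.old) so that OpenSM starts without
 * it, while keeping a copy for later comparison. Returns 0 if the file was
 * moved or did not exist, -1 on any other error. */
static int move_guid2lid_aside(const char *path)
{
    char backup[4096];
    int len = snprintf(backup, sizeof backup, "%s.old", path);
    if (len < 0 || (size_t)len >= sizeof backup)
        return -1;                       /* path too long for the buffer */
    if (rename(path, backup) != 0)
        return errno == ENOENT ? 0 : -1; /* nothing to move is not an error */
    return 0;
}
```

Calling `move_guid2lid_aside(GUID2LID_PATH)` just before launching OpenSM keeps the old assignments available for inspection without OpenSM reading them.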

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Hal Rosenstock
> Sent: Tuesday, September 05, 2006 4:18 PM
> To: Leonid Arsh
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] OpenSM - guid2lid cache file questions
> 
> Leonid,
> 
> On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
> > Thanks,
> >
> > On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > I have a problem when OpenSM, on startup, reads an out-of-date
> > > > guid2lid file.
> > > > OpenSM changes LIDs in this case.
> > >
> > > How do you know the file is "out of date" ?
> > >
> > Actually, the LIDs were assigned by another SM.
> 
> Different (vendor) SMs have different LID assignment and pathing
> (routing) policies. It is inadvisable to fail over across vendor SMs for
> this and other reasons.
> 
> -- Hal
> 
> > When I start my new OpenSM, the old SM is already dead.
> > Before starting the new OpenSM, the ibnetdiscover utility shows LIDs
> > different from the ones in the file.
> > When I start OpenSM, the LIDs are reassigned on the fabric.





Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint question

2006-09-05 Thread Michael S. Tsirkin
Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> Subject: Re: getting LOC_QP_OP_ERR with IPoIB - mstflint question
> 
> Michael S. Tsirkin wrote:
> > Donnu, it looks really weird. Could you try firmware 3.5.0 please?
> 
> I just noticed that you cannot work with mstflint if the mthca driver is
> not loaded. I think this was not the case with the gen1 tools; am I correct?

Yes, recent kernels disable device access once the driver is unloaded:

mstflint -d 08:00.0 q
*** ERROR *** Read a corrupted device id (0x). Probably HW/PCI access problem
*** ERROR *** Device type 65535 not supported.
*** ERROR *** Can not get flash type using device 08:00.0

mstflint should work without the driver if you use /proc:
mstflint -d /proc/bus/pci/08/00.0 q
Image type:      Failsafe
I.S. Version:    1
Chip Revision:   A0
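The /proc/bus/pci/BB/DD.F files expose a device's PCI configuration space, which is little-endian: bytes 0-1 hold the vendor ID and bytes 2-3 the device ID. The C sketch below reads those two fields from such a file. It does not reproduce how mstflint accesses the flash; it only illustrates what the /proc path exposes. It also assumes that an all-ones value (0xffff) means the device did not respond, which matches the "device type 65535" symptom in the error output above.

```c
#include <stdint.h>
#include <stdio.h>

/* Read vendor and device IDs from a PCI config-space file such as
 * /proc/bus/pci/08/00.0. Returns 0 on success, -1 on error. */
static int read_pci_ids(const char *path, uint16_t *vendor, uint16_t *device)
{
    unsigned char hdr[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    if (got != sizeof hdr)
        return -1;
    /* Config space is little-endian regardless of the host CPU. */
    *vendor = (uint16_t)(hdr[0] | (hdr[1] << 8));
    *device = (uint16_t)(hdr[2] | (hdr[3] << 8));
    return 0;
}

/* ASSUMPTION: all-ones reads mean the device did not answer the access. */
static int pci_ids_look_dead(uint16_t vendor, uint16_t device)
{
    return vendor == 0xffff || device == 0xffff;
}
```

For a Mellanox HCA, the vendor ID read this way should be 0x15b3.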


In gen1, flint had a separate driver that you had to load.
I am not sure whether that would work on 2.6.18.

> Is this related to this message
> 
>   ACPI: PCI interrupt for device :02:00.0 disabled
> 
> which I see once the mthca driver is unloaded?
> 
> Or.

Probably not.

-- 
MST




[openib-general] libibcm can't connect/talk to libibcm on another machine.

2006-09-05 Thread Bub Thomas
I’m still in the process of migrating my gen1 application to gen2.

Actually, I CAN connect a gen2 application to a gen2 listener application on the same machine, but NOT to a gen2 listener on another machine.

Any hints on where to look?

Is there anything in the architecture that might prevent a libibcm connection to another machine?

I’m using an old Voltaire switch to connect the machines. Could this be the reason?

The switch didn’t cause problems with gen1 clients.

Thanks

Thomas Bub



Re: [openib-general] libibcm can't connect/talk to libibcm on another machine.

2006-09-05 Thread Dotan Barak
Hi Bub.


Bub Thomas wrote:
>
> I’m still in the process of migrating my gen1 application to gen2.
>
> Actually, I CAN connect a gen2 application to a gen2 listener
> application on the same machine, but NOT to a gen2 listener on another
> machine.
>
> Any hints on where to look?
>
> Is there anything in the architecture that might prevent a libibcm 
> connection to another machine?
>
> I’m using an old Voltaire switch to connect the machines. Can this be 
> the reason?
>
> The switch didn’t cause problems using gen1 clients.
>
What is the problem that you see?
There are some examples that come with libibcm that show
how to use the library.

There can be several reasons for your problem:
1) side A sends a REQ when side B is not ready, and the connection fails with a timeout
2) the ib_ucm kernel module is loaded only on side A
3) the SM is not working (well)
4) host A cannot reach host B over IB
5) endianness issues?
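Item 2 is easy to rule out on both hosts by checking whether ib_ucm appears in /proc/modules, where the module name is the first field of each line. Below is a small C sketch of that check; the function name is hypothetical, and running `lsmod` on each host gives the same information.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `module` is listed in a /proc/modules-style file (first
 * whitespace-separated field of a line), 0 if not, -1 if unreadable. */
static int module_loaded(const char *modules_path, const char *module)
{
    FILE *f = fopen(modules_path, "r");
    if (f == NULL)
        return -1;
    char line[512], name[128];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%127s", name) == 1 && strcmp(name, module) == 0) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Usage: `module_loaded("/proc/modules", "ib_ucm")` should return 1 on both the connecting side and the listening side.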

I tried libibcm and didn't have any problem (but I don't
have a Voltaire switch, so I can't check your scenario).

Dotan




Re: [openib-general] libibcm can't connect/talk to libibcm on another machine.

2006-09-05 Thread Hal Rosenstock
Hi Bub,

On Tue, 2006-09-05 at 10:22, Bub Thomas wrote:
> I’m still in the process of migrating my gen1 application to gen2.
> 
> Actually, I CAN connect a gen2 application to a gen2 listener
> application on the same machine, but NOT to a gen2 listener on another
> machine.
> 
> Any hints on where to look?

What are you using for SM ? OpenSM or vendor SM ?

> Is there anything in the architecture that might prevent a libibcm
> connection to another machine?

I don't think this is an architectural issue.

-- Hal

> I’m using an old Voltaire switch to connect the machines. Can this be
> the reason?
> 
> The switch didn’t cause problems using gen1 clients.
> 
> Thanks
> 
> Thomas Bub



Re: [openib-general] libibcm can't connect/talk to libibcm on another machine.

2006-09-05 Thread Bub Thomas
Dotan,
the ibv_rc_pingpong example works for me, so I can rule out the
architecture.
I never got the libibcm example to compile.
Which example do you use, and which architecture (x86 or x86_64) did you
compile it for?
Can you share your libibcm example code (if it is not the standard
one that I can't get to compile)?
Thomas

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dotan Barak
Sent: Tuesday, September 05, 2006 5:12 PM
To: Bub Thomas
Cc: openib-general@openib.org
Subject: Re: [openib-general] libibcm can't connect/talk to libicm on
other machine.

Hi bub.


Bub Thomas wrote:
>
> I'm still in the process of migrating my gen1 application to gen2.
>
> Actually, I CAN connect a gen2 application to a gen2 listener 
> application on the same machine, but NOT to a gen2 listener on another
> machine.
>
> Any hints on where to look?
>
> Is there anything in the architecture that might prevent a libibcm 
> connection to another machine?
>
> I'm using an old Voltaire switch to connect the machines. Can this be 
> the reason?
>
> The switch didn't cause problems using gen1 clients.
>
What is the problem that you see?
There are some examples that come with libibcm that show
how to use the library.

There can be several reasons for your problem:
1) side A sends a REQ when side B is not ready, and the connection fails with a timeout
2) the ib_ucm kernel module is loaded only on side A
3) the SM is not working (well)
4) host A cannot reach host B over IB
5) endianness issues?

I tried libibcm and didn't have any problem (but I don't
have a Voltaire switch, so I can't check your scenario).

Dotan







[openib-general] New development tool for boot-time drivers (FCode, IEEE-1275, IBM/Sun)

2006-09-05 Thread David L Paktor

If anyone is interested in developing boot-time device drivers for plug-in
devices, conformant to the IEEE-1275 (Open Firmware) specification, using
FCode (tokenized Forth source), which is compatible with both IBM and Sun
platforms (and is platform-independent, so that a driver written once is
compatible with all Open Firmware platforms ... but you already know all
this if you're using Open Firmware), then you will need a Tokenizer to
translate from your Forth source to FCode tokens, which are the "medium
of exchange" between the device and the platform.

I am writing to announce that a new FCode tokenizer, capable of running
on IBM equipment (and of being compiled on any other host that supports
the GNU C compiler, among others), is freely available on the website
of the OpenBIOS project, www.openbios.org (just follow the links
about the new FCode suite).

If you have any questions, please direct them to the OpenBIOS Mailing List.

Thank you.

-

David L. Paktor  System Firmware Developer
System and Technology Group  Global Firmware Division
[EMAIL PROTECTED]  David L Paktor/Almaden/[EMAIL PROTECTED]

18880 Homestead Rd.  Building 9945
Cupertino CA 95014   Room 1026
408-342-6110 T/L 560-6110

"The Bug Stops Here"

Re: [openib-general] libibcm can't connect/talk to libibcm on another machine.

2006-09-05 Thread Sean Hefty
Bub Thomas wrote:
> Dotan,
> the ibv_rc_pingpong example works for me, so I can rule out the
> architecture.
> I never got the libibcm example to compile.
> Which example do you use, and which architecture (x86 or x86_64) did you
> compile it for?
> Can you share your libibcm example code (if it is not the standard
> one that I can't get to compile)?
> Thomas

Did you try applying the following patch?

http://openib.org/pipermail/openib-general/2006-August/025005.html

I should also mention that I have a version of cmpost that works with the new 
libibsa, but I am waiting for the review of the kernel sa_query changes before 
committing.

- Sean




Re: [openib-general] libibcm can't connect/talk to libibcm on another machine.

2006-09-05 Thread JWM
I know this sounds simple, but have you checked the routing tables?
JW

  - Original Message -
  From: Bub Thomas
  To: openib-general@openib.org
  Sent: Tuesday, September 05, 2006 9:22 AM
  Subject: [openib-general] libibcm can't connect/talk to libibcm on another machine.

  I’m still in the process of migrating my gen1 application to gen2.
  Actually, I CAN connect a gen2 application to a gen2 listener application on the same machine, but NOT to a gen2 listener on another machine.
  Any hints on where to look?
  Is there anything in the architecture that might prevent a libibcm connection to another machine?
  I’m using an old Voltaire switch to connect the machines. Could this be the reason?
  The switch didn’t cause problems with gen1 clients.
  Thanks
  Thomas Bub


Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert,

Here is a slightly modified patch for your attributes issue. Can you give it a 
try?

Signed-off-by: Arlin Davis [EMAIL PROTECTED]

Index: dapl/openib/dapl_ib_util.c
===
--- dapl/openib/dapl_ib_util.c  (revision 9106)
+++ dapl/openib/dapl_ib_util.c  (working copy)
@@ -446,6 +446,7 @@
return(dapl_convert_errno(errno,"ib_query_hca"));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
@@ -470,7 +471,12 @@
/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
-   ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+   ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
ia_attr->max_evds = dev_attr.max_cq;
ia_attr->max_evd_qlen = dev_attr.max_cqe;
ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -501,6 +507,7 @@
}

if (ep_attr != NULL) {
+   (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
ep_attr->max_mtu_size = port_attr.max_msg_sz;
ep_attr->max_rdma_size= port_attr.max_msg_sz;
ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_cma/dapl_ib_util.c
===
--- dapl/openib_cma/dapl_ib_util.c  (revision 9106)
+++ dapl/openib_cma/dapl_ib_util.c  (working copy)
@@ -424,6 +424,7 @@
return(dapl_convert_errno(errno,"ib_query_hca"));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
@@ -446,6 +447,8 @@
ia_attr->hardware_version_major = dev_attr.hw_ver;
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
@@ -481,6 +484,7 @@
}

if (ep_attr != NULL) {
+   (void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
ep_attr->max_mtu_size = port_attr.max_msg_sz;
ep_attr->max_rdma_size= port_attr.max_msg_sz;
ep_attr->max_recv_dtos= dev_attr.max_qp_wr;
Index: dapl/openib_scm/dapl_ib_util.c
===
--- dapl/openib_scm/dapl_ib_util.c  (revision 9106)
+++ dapl/openib_scm/dapl_ib_util.c  (working copy)
@@ -373,6 +373,7 @@
return(dapl_convert_errno(errno,"ib_query_hca"));
 
if (ia_attr != NULL) {
+   (void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
ia_attr->ia_address_ptr = 
(DAT_IA_ADDRESS_PTR)&hca_ptr->hca_address;
@@ -390,7 +391,12 @@
/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
ia_attr->max_eps  = dev_attr.max_qp;
ia_attr->max_dto_per_ep   = dev_attr.max_qp_wr;
-   ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+   ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+   ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
ia_attr->max_evds = de

Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Arlin Davis wrote:
> Robert,
> 
> Here is a slightly modified patch for your attributes issue. Can you give it 
> a try?
> 

I'll give it a spin this afternoon: it looks quite a bit more
comprehensive than the small patch I did.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP3sXfzvnpzTd9fxAQLwwAf+IOIsC+gqb9Juzt8rwJJlnSW1PjZFrRGi
NrCnRXvn52tsgclNNHGSzqOgkIntZ2TqxwEJJeTou3UhUQ5laJWEkQgwrvFTazcn
+IQH3BGDLFyZJJQO0WSi2685dEKOH5by6Zp9yVo9sy3Odu6jod2v/uCOjdGkR8ys
CvQW+y70qDmom1SJ9P2XQ4/dxxX/v2IFYOWMoVzMlDZsNnvnti/Uspwc1KpQeP6F
RRwWImlDyuuAW6+JX6atM5Lne797T5IO7MugW6d/+0oAMVU7H3oiDBdX+9tVwBci
IBJJ/PdQ8e7a7x4uOg+LKOSDH16IFVNaua4XhBfVmQEjf1y41KepDw==
=1zt8
-END PGP SIGNATURE-




[openib-general] Question about interrupt generation

2006-09-05 Thread harish
Hi All,

I tried the following simple experiment and am not able to understand the results.

I counted the number of interrupts generated by the InfiniBand NIC (with little or no traffic to it) over a period of 10 seconds and saw around 10-20 interrupts/sec. Then I ran a netperf test and saw 100K+ interrupts/sec, which overwhelmed my host machine.

To reduce the impact of the interrupts, I added a callback, scheduled to run every few microseconds, that masks the IRQ line used by the NIC and unmasks it a little later. With this in place, I see anywhere between 30K and 50K interrupts/sec with no traffic, and around 120K+ interrupts/sec with netperf traffic.

I am new to InfiniBand, so I do not understand why so many interrupts are generated while my callback is being called periodically. Could it be that the InfiniBand adapter supports MSI? Are these IPIs? Or does InfiniBand generate interrupts based on types of events, and masking/unmasking the interrupt line is one such event?

Any information/suggestions would be useful.

Thanks in advance,
harish
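The rates above are deltas of a cumulative interrupt counter divided by the sampling interval. A minimal C sketch of that arithmetic, assuming the Linux /proc/interrupts layout (an IRQ label, a colon, one decimal count per CPU, then the controller and device names); the helper names `sum_irq_counts` and `irq_rate` and the sample line in the comment are hypothetical, so check the actual line for your HCA:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Sum the per-CPU counters on one /proc/interrupts-style line, for example
 * " 169:   12345   6789   PCI-MSI  ib_mthca" (illustrative only).
 * Returns -1 when the line has no "label:" prefix. */
static long long sum_irq_counts(const char *line)
{
    const char *p = line;
    long long total = 0;

    /* Skip the IRQ label ("169", "NMI", ...) up to and including the colon. */
    while (*p != '\0' && *p != ':')
        p++;
    if (*p != ':')
        return -1;
    p++;

    /* Per-CPU counts are whitespace-separated decimal fields; stop at the
     * first non-numeric token (the interrupt controller name). */
    for (;;) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}

/* Average interrupts per second between two samples taken `seconds` apart. */
static double irq_rate(long long before, long long after, double seconds)
{
    assert(seconds > 0.0);
    return (double)(after - before) / seconds;
}
```

Read the HCA's line once, wait 10 seconds, read it again, and pass both sums to `irq_rate(before, after, 10.0)`. Summing across all CPU columns gives the device's total rate regardless of which CPU handled each interrupt.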

Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Woodruff, Robert J
Robert Walsh wrote,
>I'll give it a spin this afternoon: it looks quite a bit more
>comprehensive than the small patch I did.

I also just tried running the ib_rdma_bw test and it seems to
be flaky if you stress it. If you just run the defaults, it seems to
work, but if you crank up the iterations and the message size,
it sometimes fails with:

[EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
4730:main: Completion with error at client:
4730:main: Failed status 9: wr_id 3
4730:main: scnt=7584, ccnt=6584
[EMAIL PROTECTED] bin]$  

woody
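`Failed status 9` is the numeric completion status reported in the work completion. A small C lookup table makes such logs easier to read. This is a sketch transcribed from the order of `enum ibv_wc_status` in libibverbs' `verbs.h` (check it against your installed header), and the helper names are hypothetical; later libibverbs releases also provide `ibv_wc_status_str()` for the same purpose. By this table, status 9 is `IBV_WC_REM_INV_REQ_ERR`. Separately, scnt - ccnt = 7584 - 6584 = 1000, which equals the tx_depth=1000 shown in the banner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Completion status names in the order of libibverbs' enum ibv_wc_status,
 * so the array index equals the numeric status in a work completion. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",            /*  0 */
    "IBV_WC_LOC_LEN_ERR",        /*  1 */
    "IBV_WC_LOC_QP_OP_ERR",      /*  2 */
    "IBV_WC_LOC_EEC_OP_ERR",     /*  3 */
    "IBV_WC_LOC_PROT_ERR",       /*  4 */
    "IBV_WC_WR_FLUSH_ERR",       /*  5 */
    "IBV_WC_MW_BIND_ERR",        /*  6 */
    "IBV_WC_BAD_RESP_ERR",       /*  7 */
    "IBV_WC_LOC_ACCESS_ERR",     /*  8 */
    "IBV_WC_REM_INV_REQ_ERR",    /*  9 */
    "IBV_WC_REM_ACCESS_ERR",     /* 10 */
    "IBV_WC_REM_OP_ERR",         /* 11 */
    "IBV_WC_RETRY_EXC_ERR",      /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR",  /* 13 */
    "IBV_WC_LOC_RDD_VIOL_ERR",   /* 14 */
    "IBV_WC_REM_INV_RD_REQ_ERR", /* 15 */
    "IBV_WC_REM_ABORT_ERR",      /* 16 */
    "IBV_WC_INV_EECN_ERR",       /* 17 */
    "IBV_WC_INV_EEC_STATE_ERR",  /* 18 */
    "IBV_WC_FATAL_ERR",          /* 19 */
    "IBV_WC_RESP_TIMEOUT_ERR",   /* 20 */
    "IBV_WC_GENERAL_ERR",        /* 21 */
};

#define WC_STATUS_COUNT (sizeof(wc_status_names) / sizeof(wc_status_names[0]))

/* Numeric status -> name; NULL for values outside the table. */
static const char *wc_status_name(int status)
{
    if (status < 0 || (size_t)status >= WC_STATUS_COUNT)
        return NULL;
    return wc_status_names[status];
}

/* Name -> numeric status (useful when a log prints only the name); -1 if unknown. */
static int wc_status_from_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < WC_STATUS_COUNT; i++)
        if (strcmp(wc_status_names[i], name) == 0)
            return (int)i;
    return -1;
}
```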




Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups

2006-09-05 Thread Roland Dreier
Thanks, I've rolled this up in the amso1100 patch I have queued up.

 > - #if 0 the following unused global function:
 >  - c2_mq.c: c2_mq_count()

Tom/Steve, any reason to keep c2_mq_count() at all?

 - R.




Re: [openib-general] [PATCH] for-2.6.19 cma: protect against adding device during destruction

2006-09-05 Thread Roland Dreier
Thanks, queued for 2.6.19.




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis
Robert Walsh wrote:

>-BEGIN PGP SIGNED MESSAGE-
>Hash: SHA1
>
>Arlin Davis wrote:
>  
>
>>Robert,
>>
>>Here is a slightly modified patch for your attributes issue. Can you give it 
>>a try?
>>
>>
>>
>
>I'll give it a spin this afternoon: it looks quite a bit more
>comprehensive than the small patch I did.
>
>Regards,
> Robert.
>  
>

Just added all appropriate RDMA in/out fields and some code to zero out 
the structure to avoid uninitialized data fields.

-arlin
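The zeroing change follows a common C pattern: clear the whole attribute struct, then fill in only the fields the device query provides, so anything left unset reads as 0 rather than stale memory. A minimal sketch with a hypothetical, cut-down struct (not the real `DAT_IA_ATTR`), using plain `memset()` in place of `dapl_os_memzero()`, which is assumed here to do the same thing:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down attribute struct; the real DAT_IA_ATTR has many
 * more members. */
struct ia_attr_sketch {
    unsigned int max_eps;
    unsigned int max_dto_per_ep;
    unsigned int max_rdma_read_per_ep_in;
    unsigned int max_rdma_read_per_ep_out;
    unsigned int unfilled_field;   /* hypothetical: never set by the provider */
};

/* Zero the whole struct, then fill only what the device query reports.
 * Without the memset, unfilled_field would keep whatever bytes the caller's
 * (often stack-allocated) struct happened to contain. */
static void fill_ia_attr(struct ia_attr_sketch *attr, unsigned int max_qp,
                         unsigned int max_qp_wr, unsigned int max_qp_rd_atom)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->max_eps = max_qp;
    attr->max_dto_per_ep = max_qp_wr;
    attr->max_rdma_read_per_ep_in = max_qp_rd_atom;
    attr->max_rdma_read_per_ep_out = max_qp_rd_atom;
}
```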




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> Just added all appropriate RDMA in/out fields and some code to zero out
> the structure to avoid uninitialized data fields.

Yup.  By "comprehensive", I meant "better" :-)
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP32hfzvnpzTd9fxAQJnMwgAgcyxQpxdbk/eLEECXTnAOAYjyv3seTpE
Ir1s+K7JEYL2Rbyk9h9CzbK67YSYe4QeIE52pTopEVFw8mnSLaz+ZIOmvdRUiHSS
FiwEyfbXEPrFKZfyXu/REsigWx5vn7vCZid3hUIdx1vbt9eVAiVPGbAO1ALI8en9
/xc7iTGpYxwBwNOYbdhW0cOCjvobV98Fp6UJebvxd9xiRUS6c2JeZKLYdQyRO5rm
JV7L8HqJr1dS8nbAiPG7DSjCv7/3SFdQVr+Tgt5MQpVfD56z41eBBuXzEfeqsg5E
HHSxUOTdqizpscMyLudAWGAr5DZwOAQ4Z90zAL8gc2YYbjbOT3D6bA==
=JKRU
-END PGP SIGNATURE-




Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups

2006-09-05 Thread Steve Wise

It's old debug code that isn't used anywhere.  It would be nice to keep
it around, but if you really don't want it, nuke it...




On Tue, 2006-09-05 at 14:57 -0700, Roland Dreier wrote:
> Thanks, I've rolled this up in the amso1100 patch I have queued up.
> 
>  > - #if 0 the following unused global function:
>  >  - c2_mq.c: c2_mq_count()
> 
> Tom/Steve, any reason to keep c2_mq_count() at all?
> 
>  - R.





Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Arlin Davis wrote:
> Robert,
> 
> Here is a slightly modified patch for your attributes issue. Can you give it 
> a try?

Oddly enough, I'm back to the same problem with your new patch as I saw
with the unpatched version:

  $ mpiexec -n 2 ./a.out
  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
  will use rdma configuration
  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
  Hello world: rank 0 of 2 running on ib-idev-05
  rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9

Still tracking this one down.  I noticed in the patch you removed a
couple of lines, too:

  - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;

Any particular reason why you did this?

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP37QvzvnpzTd9fxAQI79wf6Anc3/Ve7tg3x31hE4i5qa9bB01qEYmEv
9xx4FQqXNbhMos9hHEQAWJ9S0sKccr+yCNekkIX6GzlaVDv+AKDzZF6uzA8Prrhr
CEcf28c1Pw7gflg8MMfVcnAHr2YG/hXyd+ve9m6cGv0rxgPqY6lWmHjghKDxKO7h
f/SaDOaVAuN6kEJMRgIrKIxDyFSVl4z1tGXAK3yHVhslvPqNqGwDqNfFMV6UQK+V
NNfKVVKVCttUWdzcVELzi3zkiat5xDdqIcwQr8xs2YaXHfAGeD4NurWowil887Sn
bRuh5soVdBaKW9mAtQWuAECt9VLDvyYReLWkEq6ikgilPGCeJluDEw==
=TNaE
-END PGP SIGNATURE-




Re: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/: possible cleanups

2006-09-05 Thread Roland Dreier
Steve> Its old debug code that isn't used anywhere.  It would be
Steve> nice to keep it around, but if you really don't want it,
Steve> nuke it...

No, that's fine, I'll leave it inside the #if 0.

 - R.




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Arlin Davis


>Oddly enough, I'm back to the same problem with your new patch as I saw
>with the unpatched version:
 
Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your 
adapter and it worked.

Did you ever pick up the Intel MPI 3.0 beta?

>
>  $ mpiexec -n 2 ./a.out
>  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
>  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from registry: OpenIB-cma
>  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use rdma configuration
>  will use rdma configuration
>  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma: could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>  Hello world: rank 0 of 2 running on ib-idev-05
>  rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
>  exit status of rank 1: killed by signal 9
>
>Still tracking this one down.  I noticed in the patch you removed a
>couple of lines, too:
>
>  - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
>
>Any particular reason why you did this?

max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. 

Look at dat.h line #369

/* To support backwards compatibility for DAPL-1.0 */
#define max_rdma_read_per_ep    max_rdma_read_per_ep_in
#define DAT_IA_FIELD_IA_MAX_DTO_PER_OP  DAT_IA_FIELD_IA_MAX_DTO_PER_EP_IN

/* To support backwards compatibility for DAPL-1.0 & DAPL-1.1 */
#define max_mtu_size max_message_size


-arlin




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

>> Oddly enough, I'm back to the same problem with your new patch as I saw
>> with the unpatched version:
>  
> Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your 
> adapter and it worked.

Weird - it's not working for me at all.  Maybe I'm messing up somewhere.
 I've got a meeting for the next hour or so - I'll check again when I
get back.

> Did you ever pick up the Intel MPI 3.0 beta?

Yup.

> max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. 

Ah - fair enough.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP4DLPzvnpzTd9fxAQJ3nwgAiO+dLDRQv22RrBHYqHcodDwC2ZakxzFh
pXBn9j5kwzA2EmnXCvex14v7K168Alqr9lgUpfaGr6StZsCdBU0FY2TRjok41VFl
h+fYu78QFgDjleTMkp17Hl7RG9/r8AWzKzTG1LDn1YqwHrn9ngeZlqFfy1BP1tfB
pkkW+Nj7HQXbXUNiDc/V9HKW7eBOjwCvkfDI7Knbrfp2QVBI/9ABpWGO4bJf3P7X
n9ZzlEBN0SCOHKtGAa1gspQrmJGMHw0qyajUA6Yuyp1dWRygbl8L+ahF2BJFwZSx
KGyhoBRZexpP8m0AJASnKgAVjGf6JR31dL7O8WAOjD4QpFEofMSqqA==
=yDmH
-END PGP SIGNATURE-




[openib-general] [Bug 218] New: Call usage verifier is detecting reinitialization of spinlocks already in use

2006-09-05 Thread bugzilla-daemon
http://openib.org/bugzilla/show_bug.cgi?id=218

   Summary: Call usage verifier is detecting reinitialization of
spinlocks already in use
   Product: OpenFabrics Windows
   Version: unspecified
  Platform: X86
OS/Version: Other
Status: NEW
  Severity: major
  Priority: P2
 Component: mthca driver
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]


I built a debug version of revision 467 and turned on call usage verifier (CUV)
for the mthca driver. It's detecting many cases of spinlocks being initialized
after they have already been used. This is usually bad. To build with CUV all
you have to do is add the following line to the sources file.

VERIFIER_DDK_EXTENSIONS=1

My experience is CUV tends to detect a different set of bugs from driver
verifier, and it might be useful to turn on CUV for all the drivers and see
what's reported.

CUV Driver Error: Calling KeInitializeSpinLock(...) at File
k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h, Line 57
  The Spin lock specified as parameter 1 [0x87840EDC]
  has been previously initialized and used as
  a In-Stack Queued Spin lock by this driver.
Break, Ignore, Zap, Remove, Disable all, H for help (bizrdh)? b
b
Breaking in... (press g to return to assert menu)
Break instruction exception - code 8003 (first chance)
nt!DbgBreakPoint:
8075cc00 cc   int 3
0: kd> k 50
ChildEBP RetAddr  
f7926438 baeab189 nt!DbgBreakPoint
f7926450 baeaa814 mthca!DDKExtPrompt+0x10a
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\messages.cpp @ 709]
f7926468 baea990e mthca!DDKExtVInitializeItem+0x98
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\validate.cpp @ 195]
f7926490 bae81635 mthca!DDK_KeInitializeSpinLock+0x35
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\locks.cpp @ 298]
f79264a4 baea42ee mthca!spin_lock_init+0x15
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h @ 58]
f79264b0 baea4057 mthca!mthca_wq_init+0xe
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 383]
f792653c bae7eaac mthca!mthca_modify_qp+0xe97
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 853]
f7926550 bae76eaa mthca!ibv_modify_qp+0x1c
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_verbs.c @ 467]
f7926628 ba99e0f3 mthca!mlnx_modify_qp+0x11a
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\hca_verbs.c @ 955]
f792673c ba99df12 ibbus!al_modify_qp+0x113
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1346]
f7926760 ba99d7b8 ibbus!modify_qp+0x502
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1313]
f7926778 ba99eef5 ibbus!ib_modify_qp+0x18
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1288]
f7926848 ba99ec9e ibbus!init_dgrm_svc+0x175
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1453]
f7926870 ba96d005 ibbus!ib_init_dgrm_svc+0x73e
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1395]
f7926c4c ba969fd8 ibbus!create_spl_qp_svc+0x18a5
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 718]
f7926c78 ba969a45 ibbus!spl_qp_agent_pnp+0x128
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 476]
f7926c8c ba98f071 ibbus!spl_qp0_agent_pnp_cb+0x95
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 429]
f7926cf4 ba98f2e8 ibbus!__pnp_notify_user+0x561
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 523]
f7926d38 ba990e7c ibbus!__pnp_port_notify+0x118
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 612]
f7926d70 ba94d8a4 ibbus!__pnp_process_add_ca+0x2dc
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 943]
f7926d8c ba953b94 ibbus!__cl_async_proc_worker+0x94
[k:\windows-openib\src\winib-467b\core\complib\cl_async_proc.c @ 153]
f7926da0 ba955c4c ibbus!__cl_thread_pool_routine+0x54
[k:\windows-openib\src\winib-467b\core\complib\cl_threadpool.c @ 67]
f7926dac 80a07678 ibbus!__thread_callback+0x2c
[k:\windows-openib\src\winib-467b\core\complib\kernel\cl_thread.c @ 49]
f7926ddc 80781346 nt!PspSystemThreadStartup+0x2e
  nt!KiThreadStartup+0x16
0: kd> g
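The trace shows `spin_lock_init()` being reached from `mthca_wq_init()` inside `mthca_modify_qp()`, that is, the work-queue lock is re-initialized on a QP state change after it has already been used. One common way to avoid this is to initialize the lock once when the queue is created and have the reset path clear only the indices. The sketch below shows that split in user-space C, with a pthread mutex standing in for the kernel spinlock; the struct and function names are hypothetical and this is not the actual mthca fix:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical work-queue state; not the mthca structure. */
struct wq_sketch {
    pthread_mutex_t lock;   /* initialized once, at creation */
    unsigned int head;
    unsigned int tail;
    unsigned int next_ind;
};

/* Called once when the queue is created: the only place the lock is
 * initialized. Returns 0 on success, as pthread_mutex_init() does. */
static int wq_create(struct wq_sketch *wq)
{
    assert(wq != NULL);
    wq->head = wq->tail = wq->next_ind = 0;
    return pthread_mutex_init(&wq->lock, NULL);
}

/* Called on a state transition (e.g. back to RESET): clears the indices
 * under the existing lock and never re-initializes the lock itself, which
 * another thread may be holding or waiting on. */
static void wq_reset(struct wq_sketch *wq)
{
    pthread_mutex_lock(&wq->lock);
    wq->head = 0;
    wq->tail = 0;
    wq->next_ind = 0;
    pthread_mutex_unlock(&wq->lock);
}

/* Called once when the queue is torn down. */
static void wq_destroy(struct wq_sketch *wq)
{
    pthread_mutex_destroy(&wq->lock);
}
```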




--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.



Re: [openib-general] Question about interrupt generation

2006-09-05 Thread harish
Hi,

One more question: what kind of event mask helps mask the interrupts?

Thanks,
harish

On 9/5/06, harish <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I tried the following simple experiment and am not able to understand the results.
>
> I counted the number of interrupts generated by the InfiniBand NIC (with little or no traffic to it) over a period of 10 seconds and saw around 10-20 interrupts/sec. Then I ran a netperf test and saw 100K+ interrupts/sec, which overwhelmed my host machine.
>
> To reduce the impact of the interrupts, I added a callback, scheduled to run every few microseconds, that masks the IRQ line used by the NIC and unmasks it a little later. With this in place, I see anywhere between 30K and 50K interrupts/sec with no traffic, and around 120K+ interrupts/sec with netperf traffic.
>
> I am new to InfiniBand, so I do not understand why so many interrupts are generated while my callback is being called periodically. Could it be that the InfiniBand adapter supports MSI? Are these IPIs? Or does InfiniBand generate interrupts based on types of events, and masking/unmasking the interrupt line is one such event?
>
> Any information/suggestions would be useful.
>
> Thanks in advance,
> harish



Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> I'll give it a spin this afternoon: it looks quite a bit more
>> comprehensive than the small patch I did.
> 
> I also just tried running the ib_rdma_bw test and it seems to
> be flaky if you stress it. If you just run the defaults, it seems to
> work, but if you crank up the iterations and the message size,
> it sometimes fails with.
> 
> [EMAIL PROTECTED] bin]$ ./ib_rdma_bw -n 1 -t 1000 -s 200 rkl-12
> 4730: | port=18515 | ib_port=1 | size=200 | tx_depth=1000 | iters=1 | duplex=0 | cma=0 |
> 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x2a95dd3480
> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x2a95c85480
> 4730:main: Completion with error at client:
> 4730:main: Failed status 9: wr_id 3
> 4730:main: scnt=7584, ccnt=6584
> [EMAIL PROTECTED] bin]$  

This looks like a known bug, the fix to which didn't make it into OFED
1.1-RC3.  Hopefully we can still get this into 1.1-RC4.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP4aOfzvnpzTd9fxAQKAEggAlZC5hYi9kdxLkj9Mfl/BwHJQxWUwsKcG
K2ck3jtrP6PVa04FdVI/TNL2XE7R3eu69vTfBaTS26pw2CVM6av0ztFiWEV2r5Fu
8FXGJBOuDOYxnwuA0o3yHSMVFtrRW6Jgn2G/JQPZ8IDAK7GrPj3VebvyclPwF5+d
KMPIFXJaTzjoJl2JEGFLiSlf+tFMOEs3vazrRwkZpQezKRcs3F1E6TQImtN7kuYK
0/IKxeS4ZOduXpczsJZgsPs6Y9kYi94XN0E4JeJJAh9Miq+bXkxhxbrafieNl7xW
n9m7i/phcFcngSzDwjBNXE2ZuQjujDpz94SRnkVedomYNbr5zKXBgQ==
=NurT
-END PGP SIGNATURE-




Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready

2006-09-05 Thread Robert Walsh
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

> Here is a slightly modified patch for your attributes issue. Can you give it 
> a try?

I rebuilt OFED from scratch with the patch, and ran successfully on
Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
If you have a later beta version you could send me, that would be great,
too.

Regards,
 Robert.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP4ijvzvnpzTd9fxAQIqeggAkJ4OQ3GrkpqyJUbHImgqbob6npINOv5L
lBUANcHZZ8DMFIq5hP4H+OYX2s/yoS3AKDGf0x8kHoVsTDFTFNe69bsGzJMT3znP
YDmq3ETN4aSGOgKX2NFzWs+mYG0pEN9uDt/SmEYmccYiIuK3lTlb8jxON6mqqJFL
nfitAp7WaLn7OS8A3CfVrAbWwYJ4U6UWPD/rB5sJTg8nTxECc94JaOhPZ90smB6H
9xk8OihEoTxodFLzcpaz/ORS4EPAle69Uw2tP3myjr/4w/SzLGJT6DFVpGQ0BaWC
jVXFYVKyVW4JmFMcW1X29ogmVNH8gEDBUfbG1P5Wd8sLzMMB18tINA==
=X/q7
-END PGP SIGNATURE-




Re: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready (how do I enable madeye)?

2006-09-05 Thread Scott Weitzenkamp (sweitzen)
> 5. Added Madeye utility

How do I build madeye?  I don't see any reference to it in install.sh.
Is there any documentation for madeye?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 




[openib-general] [Bug 218] Call usage verifier is detecting reinitialization of spinlocks already in use

2006-09-05 Thread bugzilla-daemon
http://openib.org/bugzilla/show_bug.cgi?id=218


[EMAIL PROTECTED] changed:

   What|Removed |Added

 AssignedTo|[EMAIL PROTECTED] |[EMAIL PROTECTED]





