Re: [openib-general] question - mapping QPIDs back to ptrs

2006-11-17 Thread Roland Dreier
 > The Chelsio driver is hogging lots of memory right now for mapping
 > PDIDs, QPIDs, CQIDs, and STAG IDs back to their respective kernel
 > structures.  This is done via an array of pointers, indexed by the ID.
 > The critical performance mapping is finding a QP struct from the QPID in
 > the poll path.

mthca rolls its own two-level sparse arrays (the mthca_array_xxx
stuff), but it would probably be smarter to use the kernel's radix tree
code.  I've been meaning to benchmark mthca after converting to radix
trees for those tables, to see if it makes a difference.

 - R.
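
For readers unfamiliar with the approach, a two-level sparse array can be
sketched roughly as follows. This is an illustrative Python model and not
the mthca code: the class name TwoLevelArray, the page size of 256 slots,
and the method names are all assumptions made for the example. The point
is that second-level pages are allocated only when an ID in their range is
first stored, so a sparse ID space does not cost one pointer slot per
possible ID.

```python
class TwoLevelArray:
    """Sparse ID -> object map with lazily allocated second-level pages."""

    PAGE_BITS = 8                  # hypothetical choice: 256 slots per page
    PAGE_SIZE = 1 << PAGE_BITS
    PAGE_MASK = PAGE_SIZE - 1

    def __init__(self, max_ids):
        # First level: one entry per page, all unallocated at the start.
        num_pages = (max_ids + self.PAGE_SIZE - 1) // self.PAGE_SIZE
        self.pages = [None] * num_pages
        self.used = [0] * num_pages    # live entries per page

    def set(self, index, value):
        if value is None:
            raise ValueError("use clear() to remove an entry")
        p = index >> self.PAGE_BITS
        if self.pages[p] is None:
            self.pages[p] = [None] * self.PAGE_SIZE
        slot = index & self.PAGE_MASK
        if self.pages[p][slot] is None:
            self.used[p] += 1
        self.pages[p][slot] = value

    def get(self, index):
        page = self.pages[index >> self.PAGE_BITS]
        return None if page is None else page[index & self.PAGE_MASK]

    def clear(self, index):
        p = index >> self.PAGE_BITS
        page = self.pages[p]
        slot = index & self.PAGE_MASK
        if page is None or page[slot] is None:
            return
        page[slot] = None
        self.used[p] -= 1
        if self.used[p] == 0:
            self.pages[p] = None       # free the page once it is empty
```

A lookup costs two index operations, which is why this layout suits a hot
path such as finding a QP from its QPID while polling.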

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] Question about the query QP mask

2006-11-16 Thread Roland Dreier
 > What should be the expected behavior?
 > Should this description be changed, or should the low-level drivers
 > for mthca and ipath be changed?

The mask is used as a hint to the low-level driver about which
attributes the consumer cares about.  The driver may fill in more
fields, but it can use the mask to optimize some calls, if filling in
a particular field is expensive and that field is not requested by the
consumer.

I guess we should update the documentation to reflect this.

 - R.
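
The "mask as a hint" behaviour described above can be sketched like this.
It is a toy model only: the flag names in QpAttrMask, the query_qp()
function and the read_timeout_from_hw() helper are hypothetical and are
not the real verbs API. The driver always returns the cheap fields and
performs the expensive read only when the consumer asked for that field.

```python
from enum import IntFlag


class QpAttrMask(IntFlag):
    # Hypothetical flag names for illustration only.
    STATE = 1 << 0
    PATH_MTU = 1 << 1
    TIMEOUT = 1 << 2


def read_timeout_from_hw(calls):
    """Stand-in for an expensive firmware or hardware query."""
    calls.append("timeout")
    return 14


def query_qp(mask, calls):
    # Cheap fields are filled in whether or not they were requested.
    attr = {"state": "RTS", "path_mtu": 2048}
    # The expensive field is filled in only when the mask asks for it.
    if mask & QpAttrMask.TIMEOUT:
        attr["timeout"] = read_timeout_from_hw(calls)
    return attr
```

Under this reading a consumer should rely only on the fields it named in
the mask, even though the driver may happen to fill in more.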




Re: [openib-general] Question about multicast GIDs

2006-11-16 Thread Robert Walsh
Robert Walsh wrote:
> Roland Dreier wrote:
>>  > Is there a registration authority for multicast GIDs?  Or at least a
>>  > safe way of assigning a range of GIDs to a vendor?
>>
>> I don't think so.  Perhaps RFC 3307 would be of some use...
> 
> Ah - looks exactly like what I was looking for.  Thanks.

Hmm - spoke too soon.  This seems to be related to IPv6 multicast
addresses, but not IB.  The idea is similar, but the allocation mechanism
is entirely arbitrary (but consistent) and I don't think it would map
from IPv6 to IB in any meaningful way.

I'll talk to the folks here who are on the various IB committees and see 
if they have any thoughts on this.

Regards,
  Robert.




Re: [openib-general] Question about multicast GIDs

2006-11-15 Thread Robert Walsh
Roland Dreier wrote:
>  > Is there a registration authority for multicast GIDs?  Or at least a 
>  > safe way of assigning a range of GIDs to a vendor?
> 
> I don't think so.  Perhaps RFC 3307 would be of some use...

Ah - looks exactly like what I was looking for.  Thanks.




Re: [openib-general] Question about multicast GIDs

2006-11-15 Thread Roland Dreier
 > Is there a registration authority for multicast GIDs?  Or at least a 
 > safe way of assigning a range of GIDs to a vendor?

I don't think so.  Perhaps RFC 3307 would be of some use...

 - R.




Re: [openib-general] question on QoS support

2006-11-06 Thread Hal Rosenstock
On Mon, 2006-11-06 at 13:13, Wang, Feiyi wrote:
> Hal -
> 
> Please see the output for active port 1 (although there are two ports on
> this HCA, the second one is disabled now).
> 
> #smpquery portinfo 8 1
> # Port info: Lid 8 port 1
> Mkey: 0x
> GidPrefix: 0xfe80
> Lid: 0x0008
> SMLid: 0x0001
> CapMask: 0x2510a68
>     IsTrapSupported
>     IsAutomaticMigrationSupported
>     IsSLMappingSupported
>     IsLedInfoSupported
>     IsSystemImageGUIDsupported
>     IsCommunicatonManagementSupported
>     IsVendorClassSupported
>     IsCapabilityMaskNoticeSupported
>     IsClientRegistrationSupported
> DiagCode: 0x
> MkeyLeasePeriod: 0
> LocalPort: 1
> LinkWidthEnabled: 1X or 4X
> LinkWidthSupported: 1X or 4X
> LinkWidthActive: 4X
> LinkSpeedSupported: 2.5 or 5.0 Gbps
> LinkState: Active
> PhysLinkState: LinkUp
> LinkDownDefState: Polling
> ProtectBits: 0
> LMC: 0
> LinkSpeedActive: 2.5 Gbps
> LinkSpeedEnabled: 2.5 or 5.0 Gbps
> NeighborMTU: 2048
> SMSL: 0
> VLCap: VL0-7
> InitType: 0x00
> VLHighLimit: 255

OK; this is pretty conclusive.

> VLArbHighCap: 8
> VLArbLowCap: 8
> InitReply: 0x00
> MtuCap: 2048
> VLStallCount: 7
> HoqLife: 31
> OperVLs: VL0-7
> PartEnforceInb: 0
> PartEnforceOutb: 0
> FilterRawInb: 0
> FilterRawOutb: 0
> MkeyViolations: 0
> PkeyViolations: 0
> QkeyViolations: 0
> GuidCap: 32
> ClientReregister: 0
> SubnetTimeout: 18
> RespTimeVal: 16
> LocalPhysErr: 8
> OverrunErr: 8
> MaxCreditHint: 0
> RoundTrip: 0

Do you have an IB analyzer ?

-- Hal

> Feiyi
> 
> -----Original Message-----
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> Sent: Friday, November 03, 2006 3:58 PM
> To: Wang, Feiyi
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] question on QoS support
> 
> On Fri, 2006-11-03 at 15:56, Wang, Feiyi wrote:
> > 255 
> > 
> > I think I tested with default 0 before, that is send at most one packet
> > before give low priority table the chance according to IBA. It doesn't
> > seem to make a difference though.
> 
> I was hoping you would say 0 as that means 1 packet before looking at
> low priority.
> 
> 255 means unbounded packets on high priority. Can you send me the
> results of smpquery portinfo on that port to ensure that it is being set
> properly ?
> 
> -- Hal 
> 
> > Feiyi
> > 
> > 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> > Sent: Friday, November 03, 2006 3:51 PM
> > To: Wang, Feiyi
> > Cc: openib-general@openib.org
> > Subject: RE: [openib-general] question on QoS support
> > 
> > On Fri, 2006-11-03 at 15:43, Wang, Feiyi wrote:
> > > The test is done on two hosts, say A and B. A has 4x SDR (run ib_rdma_bw
> > > as server), B has 4x DDR (run more than one thread of ib_rdma_bw as
> > > clients). The sl2vl table read as:
> > > 
> > > smpquery sl2vl 7
> > > # SL2VL table: Lid 7
> > > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > > ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> > > 
> > > smpquery vlarb  7
> > > # VLArbitration tables: Lid 7 port 0 LowCap 8 HighCap 8
> > > # Low priority VL Arbitration Table:
> > > VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> > > WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
> > > # High priority VL Arbitration Table:

Re: [openib-general] question on QoS support

2006-11-06 Thread Wang, Feiyi
Hal -

Please see the output for active port 1 (although there are two ports on
this HCA, the second one is disabled now).

#smpquery portinfo 8 1
# Port info: Lid 8 port 1
Mkey: 0x
GidPrefix: 0xfe80
Lid: 0x0008
SMLid: 0x0001
CapMask: 0x2510a68
    IsTrapSupported
    IsAutomaticMigrationSupported
    IsSLMappingSupported
    IsLedInfoSupported
    IsSystemImageGUIDsupported
    IsCommunicatonManagementSupported
    IsVendorClassSupported
    IsCapabilityMaskNoticeSupported
    IsClientRegistrationSupported
DiagCode: 0x
MkeyLeasePeriod: 0
LocalPort: 1
LinkWidthEnabled: 1X or 4X
LinkWidthSupported: 1X or 4X
LinkWidthActive: 4X
LinkSpeedSupported: 2.5 or 5.0 Gbps
LinkState: Active
PhysLinkState: LinkUp
LinkDownDefState: Polling
ProtectBits: 0
LMC: 0
LinkSpeedActive: 2.5 Gbps
LinkSpeedEnabled: 2.5 or 5.0 Gbps
NeighborMTU: 2048
SMSL: 0
VLCap: VL0-7
InitType: 0x00
VLHighLimit: 255
VLArbHighCap: 8
VLArbLowCap: 8
InitReply: 0x00
MtuCap: 2048
VLStallCount: 7
HoqLife: 31
OperVLs: VL0-7
PartEnforceInb: 0
PartEnforceOutb: 0
FilterRawInb: 0
FilterRawOutb: 0
MkeyViolations: 0
PkeyViolations: 0
QkeyViolations: 0
GuidCap: 32
ClientReregister: 0
SubnetTimeout: 18
RespTimeVal: 16
LocalPhysErr: 8
OverrunErr: 8
MaxCreditHint: 0
RoundTrip: 0

Feiyi

-----Original Message-----
From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 03, 2006 3:58 PM
To: Wang, Feiyi
Cc: openib-general@openib.org
Subject: RE: [openib-general] question on QoS support

On Fri, 2006-11-03 at 15:56, Wang, Feiyi wrote:
> 255 
> 
> I think I tested with default 0 before, that is send at most one packet
> before give low priority table the chance according to IBA. It doesn't
> seem to make a difference though.

I was hoping you would say 0 as that means 1 packet before looking at
low priority.

255 means unbounded packets on high priority. Can you send me the
results of smpquery portinfo on that port to ensure that it is being set
properly ?

-- Hal 

> Feiyi
> 
> 
> -----Original Message-----
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> Sent: Friday, November 03, 2006 3:51 PM
> To: Wang, Feiyi
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] question on QoS support
> 
> On Fri, 2006-11-03 at 15:43, Wang, Feiyi wrote:
> > The test is done on two hosts, say A and B. A has 4x SDR (run ib_rdma_bw
> > as server), B has 4x DDR (run more than one thread of ib_rdma_bw as
> > clients). The sl2vl table read as:
> > 
> > smpquery sl2vl 7
> > # SL2VL table: Lid 7
> > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> > 
> > smpquery vlarb  7
> > # VLArbitration tables: Lid 7 port 0 LowCap 8 HighCap 8
> > # Low priority VL Arbitration Table:
> > VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> > WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
> > # High priority VL Arbitration Table:
> > VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> > WEIGHT: |0x1 |0x0 |0x8 |0x0 |0x0 |0x0 |0x0 |0x0 |
> > 
> > Low priority table entries are all zero to skip.
> > High priority table give VL 0 and VL 2 different weight.
> > 
> > The SL is specified on command line, one thread with SL 0, the other
> > thread with SL 2.
> > 
> > Thanks for looking into this, and let me know if more info is needed.
> 
> What's the limit of high priority ?
> 
> -- Hal
> 
> > Feiyi
> > 
> > 
> > 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> > Sent: Friday

Re: [openib-general] question on QoS support

2006-11-03 Thread Hal Rosenstock
On Fri, 2006-11-03 at 15:56, Wang, Feiyi wrote:
> 255 
> 
> I think I tested with default 0 before, that is send at most one packet
> before give low priority table the chance according to IBA. It doesn't
> seem to make a difference though.

I was hoping you would say 0 as that means 1 packet before looking at
low priority.

255 means unbounded packets on high priority. Can you send me the
results of smpquery portinfo on that port to ensure that it is being set
properly ?

-- Hal 
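
To make the two special values concrete, here is a small sketch of how
VLHighLimit might be interpreted. The 4096-byte unit is my reading of the
IBA text and should be treated as an assumption, as should the function
names; only the meanings of 0 and 255 come from the message above.

```python
HIGH_LIMIT_UNIT_BYTES = 4096   # assumed unit; check the IBA spec before relying on it
UNBOUNDED = 255


def high_priority_budget(vl_high_limit):
    """Return the high-priority byte budget, or None when it is unbounded.

    A return value of 0 stands for "a single packet" rather than "nothing".
    """
    if not 0 <= vl_high_limit <= 255:
        raise ValueError("VLHighLimit is an 8-bit field")
    if vl_high_limit == UNBOUNDED:
        return None
    return vl_high_limit * HIGH_LIMIT_UNIT_BYTES


def describe(vl_high_limit):
    budget = high_priority_budget(vl_high_limit)
    if budget is None:
        return "unbounded high-priority traffic"
    if budget == 0:
        return "one high-priority packet, then the low-priority table gets a turn"
    return f"up to {budget} bytes of high priority, then the low-priority table gets a turn"
```

With the reported VLHighLimit of 255, describe(255) reports the unbounded
case, which matches the qos_high_limit 255 setting used in the test.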

> Feiyi
> 
> 
> -----Original Message-----
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> Sent: Friday, November 03, 2006 3:51 PM
> To: Wang, Feiyi
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] question on QoS support
> 
> On Fri, 2006-11-03 at 15:43, Wang, Feiyi wrote:
> > The test is done on two hosts, say A and B. A has 4x SDR (run ib_rdma_bw
> > as server), B has 4x DDR (run more than one thread of ib_rdma_bw as
> > clients). The sl2vl table read as:
> > 
> > smpquery sl2vl 7
> > # SL2VL table: Lid 7
> > # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> > ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> > 
> > smpquery vlarb  7
> > # VLArbitration tables: Lid 7 port 0 LowCap 8 HighCap 8
> > # Low priority VL Arbitration Table:
> > VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> > WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
> > # High priority VL Arbitration Table:
> > VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> > WEIGHT: |0x1 |0x0 |0x8 |0x0 |0x0 |0x0 |0x0 |0x0 |
> > 
> > Low priority table entries are all zero to skip.
> > High priority table give VL 0 and VL 2 different weight.
> > 
> > The SL is specified on command line, one thread with SL 0, the other
> > thread with SL 2.
> > 
> > Thanks for looking into this, and let me know if more info is needed.
> 
> What's the limit of high priority ?
> 
> -- Hal
> 
> > Feiyi
> > 
> > 
> > 
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> > Sent: Friday, November 03, 2006 3:27 PM
> > To: Wang, Feiyi
> > Cc: openib-general@openib.org
> > Subject: Re: [openib-general] question on QoS support
> > 
> > On Fri, 2006-11-03 at 15:12, Feiyi Wang wrote:
> > > In our test at the ORNL - it appears you can "turn off" the traffic by
> > > giving every VL weight 0.
> > 
> > A weight of 0 indicates to skip that entry.
> > 
> > >  As soon as you assign non-zero VL weight,
> > > the traffic starts to flow, however, VL with more weight doesn't have
> > > expected preference treatment. In other words, traffic shaping didn't
> > > take place. smpquery vlarb verified the mapping table was there.
> > 
> > correctly ?
> > 
> > Is it high or low priority or both ?
> > 
> > What about SL2VLMapping table ? Is it setup correctly ?
> > 
> > What's your topology for this ?
> > 
> > Can you send your SL2VLMapping and VLarbitration configuration ?
> > 
> > > I believe the scenario described below 'should' be able to generate
> > > congestion point ... but it would be helpful if someone can elaborate
> > > a way to "look into" how/if scheduling/arbitration take place.
> > 
> > The only ways I know would be to look at either the packets on the wire
> > or what you are doing with multiple streams which seems valid to me.
> > 
> > Have you read section 7.6.9.2 (p. 189-190) in IBA 1.2 volume 1 to
> > understand how to configure this ?
> > 
> > -- Hal
> > 
> > > Best,
> > > 
> > > Feiyi
> > > 
> > > 
> > > On 02 Nov 2006 10:49:04 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > > Hi Oliver,
> > > >
> > > > On Thu, 2006-11-02 at 10:20, Oliver wrote:
> > > > > Hi, Hal -
> > > > >
> > > > > > How is this being observed/measured ?
> > > > >
> > > > > Host A, B, with 4x DDR both connected to Flextronic switch.
> > > > > A single process of ibv_read_bw gives about 1415 MB/s average
> > > > > bandwidth. Two concurrent processes report 714.45 MB/s each, dead even.
> > > > > Now if I bump up one process with a different SL, then I expect to see
> > > > > shaping to take place. Please let me know if the scenario makes sense.
> > > >

Re: [openib-general] question on QoS support

2006-11-03 Thread Wang, Feiyi

255 

I think I tested with default 0 before, that is send at most one packet
before give low priority table the chance according to IBA. It doesn't
seem to make a difference though.

Feiyi


-----Original Message-----
From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 03, 2006 3:51 PM
To: Wang, Feiyi
Cc: openib-general@openib.org
Subject: RE: [openib-general] question on QoS support

On Fri, 2006-11-03 at 15:43, Wang, Feiyi wrote:
> The test is done on two hosts, say A and B. A has 4x SDR (run ib_rdma_bw
> as server), B has 4x DDR (run more than one thread of ib_rdma_bw as
> clients). The sl2vl table read as:
> 
> smpquery sl2vl 7
> # SL2VL table: Lid 7
> # SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> 
> smpquery vlarb  7
> # VLArbitration tables: Lid 7 port 0 LowCap 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
> # High priority VL Arbitration Table:
> VL: |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x1 |0x0 |0x8 |0x0 |0x0 |0x0 |0x0 |0x0 |
> 
> Low priority table entries are all zero to skip.
> High priority table give VL 0 and VL 2 different weight.
> 
> The SL is specified on command line, one thread with SL 0, the other
> thread with SL 2.
> 
> Thanks for looking into this, and let me know if more info is needed.

What's the limit of high priority ?

-- Hal

> Feiyi
> 
> 
> 
> -----Original Message-----
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED] 
> Sent: Friday, November 03, 2006 3:27 PM
> To: Wang, Feiyi
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] question on QoS support
> 
> On Fri, 2006-11-03 at 15:12, Feiyi Wang wrote:
> > In our test at the ORNL - it appears you can "turn off" the traffic by
> > giving every VL weight 0.
> 
> A weight of 0 indicates to skip that entry.
> 
> >  As soon as you assign non-zero VL weight,
> > the traffic starts to flow, however, VL with more weight doesn't have
> > expected preference treatment. In other words, traffic shaping didn't
> > take place. smpquery vlarb verified the mapping table was there.
> 
> correctly ?
> 
> Is it high or low priority or both ?
> 
> What about SL2VLMapping table ? Is it setup correctly ?
> 
> What's your topology for this ?
> 
> Can you send your SL2VLMapping and VLarbitration configuration ?
> 
> > I believe the scenario described below 'should' be able to generate
> > congestion point ... but it would be helpful if someone can elaborate
> > a way to "look into" how/if scheduling/arbitration take place.
> 
> The only ways I know would be to look at either the packets on the wire
> or what you are doing with multiple streams which seems valid to me.
> 
> Have you read section 7.6.9.2 (p. 189-190) in IBA 1.2 volume 1 to
> understand how to configure this ?
> 
> -- Hal
> 
> > Best,
> > 
> > Feiyi
> > 
> > 
> > On 02 Nov 2006 10:49:04 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > Hi Oliver,
> > >
> > > On Thu, 2006-11-02 at 10:20, Oliver wrote:
> > > > Hi, Hal -
> > > >
> > > > > How is this being observed/measured ?
> > > >
> > > > Host A, B, with 4x DDR both connected to Flextronic switch.
> > > > A single process of ibv_read_bw gives about 1415 MB/s average
> > > > bandwidth. Two concurrent processes report 714.45 MB/s each, dead even.
> > > > Now if I bump up one process with a different SL, then I expect to see
> > > > shaping to take place. Please let me know if the scenario makes sense.
> > >
> > > It makes sense. However, if the higher priority traffic does not fill
> > > the scheduling, the low priority can take up the slack so I'm not sure
> > > if this is what you are seeing or something else.
> > >
> > > It might be interesting to try the same thing at SDR speeds.
> > >
> > > -- Hal
> > >
> > > > > Yes, 8 VLs should be supported in your subnet. You can verify this with
> > > > > smpquery portinfo on the HCA port and examine OperVLs assuming the port
> > > > > is ACTIVE.
> > > >
> > > > yes, I verified the data VL support, it is 8. I will poke for more
> > > > info with suggested commands by Sasha.
> > > >
> > > > > > A related question is, if I modify qos setting in SM, 

Re: [openib-general] question on QoS support

2006-11-03 Thread Hal Rosenstock
On Fri, 2006-11-03 at 15:12, Feiyi Wang wrote:
> In our test at the ORNL - it appears you can "turn off" the traffic by
> giving every VL weight 0.

A weight of 0 indicates to skip that entry.

>  As soon as you assign non-zero VL weight,
> the traffic starts to flow, however, VL with more weight doesn't have
> expected preference treatment. In other words, traffic shaping didn't
> take place. smpquery vlarb verified the mapping table was there.

correctly ?

Is it high or low priority or both ?

What about SL2VLMapping table ? Is it setup correctly ?

What's your topology for this ?

Can you send your SL2VLMapping and VLarbitration configuration ?

> I believe the scenario described below 'should' be able to generate
> congestion point ... but it would be helpful if someone can elaborate
> a way to "look into" how/if scheduling/arbitration take place.

The only ways I know would be to look at either the packets on the wire
or what you are doing with multiple streams which seems valid to me.

Have you read section 7.6.9.2 (p. 189-190) in IBA 1.2 volume 1 to
understand how to configure this ?

-- Hal
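
For anyone trying to reason about what the VL arbitration table should do,
here is an idealized sketch of a single weighted table in which entries
with weight 0 are skipped. It is a toy model under stated assumptions: the
arbitrate() function and its abstract "units" are made up for the example,
and it ignores the high/low table split, VLHighLimit, and the fact that
real hardware sends whole packets (so packet size relative to weight can
matter). It is not a description of any particular HCA or switch.

```python
from collections import Counter


def arbitrate(table, backlog, budget):
    """Serve up to `budget` units from per-VL queues using one weighted table.

    table   -- list of (vl, weight) entries, visited in order
    backlog -- dict mapping vl -> units waiting to be sent
    budget  -- total units the link can carry during this run
    """
    queued = dict(backlog)                           # leave the caller's dict alone
    entries = [(vl, w) for vl, w in table if w > 0]  # weight 0: skip that entry
    sent = Counter()
    while budget > 0:
        progressed = False
        for vl, weight in entries:
            # Send at most `weight` units for this entry, limited by what is
            # queued on the VL and by what the link can still carry.
            n = min(weight, queued.get(vl, 0), budget)
            if n > 0:
                queued[vl] -= n
                sent[vl] += n
                budget -= n
                progressed = True
        if not progressed:
            break                                    # nothing left that may be sent
    return sent
```

With VL 0 at weight 1 and VL 2 at weight 8 and both queues saturated, this
model sends in a 1:8 ratio; with every weight set to 0 it sends nothing,
which matches the "turn off the traffic" observation.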

> Best,
> 
> Feiyi
> 
> 
> On 02 Nov 2006 10:49:04 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > Hi Oliver,
> >
> > On Thu, 2006-11-02 at 10:20, Oliver wrote:
> > > Hi, Hal -
> > >
> > > > How is this being observed/measured ?
> > >
> > > Host A, B, with 4x DDR both connected to Flextronic switch.
> > > A single process of ibv_read_bw gives about 1415 MB/s average
> > > bandwidth. Two concurrent processes report 714.45 MB/s each, dead even.
> > > Now if I bump up one process with a different SL, then I expect to see
> > > shaping to take place. Please let me know if the scenario makes sense.
> >
> > It makes sense. However, if the higher priority traffic does not fill
> > the scheduling, the low priority can take up the slack so I'm not sure
> > if this is what you are seeing or something else.
> >
> > It might be interesting to try the same thing at SDR speeds.
> >
> > -- Hal
> >
> > > > Yes, 8 VLs should be supported in your subnet. You can verify this with
> > > > smpquery portinfo on the HCA port and examine OperVLs assuming the port
> > > > is ACTIVE.
> > >
> > > yes, I verified the data VL support, it is 8. I will poke for more
> > > info with suggested commands by Sasha.
> > >
> > > > > A related question is, if I modify qos setting in SM, do I need to
> > > > > restart SA on each hosts for it to see the changes? (I am hoping not,
> > > > > as I tried in the test, it doesn't seem to make a difference)
> > > >
> > > > Not sure what you mean. SA is tightly coupled with the OpenSM. Do you
> > > > mean SA client ? The client hosts don't need restarting but did you
> > > > restart OpenSM with your QoS configuration ?
> > >
> > > I mean client SA. yes, I understand OpenSM needs to be restarted.
> > >
> > > > BTW, which OpenSM are you running ?
> > >
> > > OFED 1.1 based.
> > >
> > > thanks
> > >
> > > - Oliver
> >
> >





Re: [openib-general] question on QoS support

2006-11-03 Thread Feiyi Wang
In our test at the ORNL - it appears you can "turn off" the traffic by
giving every VL weight 0. As soon as you assign non-zero VL weight,
the traffic starts to flow, however, VL with more weight doesn't have
expected preference treatment. In other words, traffic shaping didn't
take place. smpquery vlarb verified the mapping table was there.

I believe the scenario described below 'should' be able to generate
congestion point ... but it would be helpful if someone can elaborate
a way to "look into" how/if scheduling/arbitration take place.

Best,

Feiyi


On 02 Nov 2006 10:49:04 -0500, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Oliver,
>
> On Thu, 2006-11-02 at 10:20, Oliver wrote:
> > Hi, Hal -
> >
> > > How is this being observed/measured ?
> >
> > Host A, B, with 4x DDR both connected to Flextronic switch.
> > A single process of ibv_read_bw gives about 1415 MB/s average
> > bandwidth. Two concurrent processes report 714.45 MB/s each, dead even.
> > Now if I bump up one process with a different SL, then I expect to see
> > shaping to take place. Please let me know if the scenario makes sense.
>
> It makes sense. However, if the higher priority traffic does not fill
> the scheduling, the low priority can take up the slack so I'm not sure
> if this is what you are seeing or something else.
>
> It might be interesting to try the same thing at SDR speeds.
>
> -- Hal
>
> > > Yes, 8 VLs should be supported in your subnet. You can verify this with
> > > smpquery portinfo on the HCA port and examine OperVLs assuming the port
> > > is ACTIVE.
> >
> > yes, I verified the data VL support, it is 8. I will poke for more
> > info with suggested commands by Sasha.
> >
> > > > A related question is, if I modify qos setting in SM, do I need to
> > > > restart SA on each hosts for it to see the changes? (I am hoping not,
> > > > as I tried in the test, it doesn't seem to make a difference)
> > >
> > > Not sure what you mean. SA is tightly coupled with the OpenSM. Do you
> > > mean SA client ? The client hosts don't need restarting but did you
> > > restart OpenSM with your QoS configuration ?
> >
> > I mean client SA. yes, I understand OpenSM needs to be restarted.
> >
> > > BTW, which OpenSM are you running ?
> >
> > OFED 1.1 based.
> >
> > thanks
> >
> > - Oliver
>
>




Re: [openib-general] Question on ucma

2006-11-03 Thread Steve Wise
Sean posted 7 patches that include the ucma support.

You'll need those + the one librdmacm patch he posted.

Steve.
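
A quick way to check whether ucma is present before running the tests is
to look for the sysfs entry mentioned in the question below. This is a
minimal sketch: the function name and the sysfs_root parameter are made up
for the example, and the abi_version file name under
/sys/class/misc/rdma_cm is my assumption about what librdmacm reads when
it prints that error.

```python
import os


def rdma_cm_abi_version(sysfs_root="/sys"):
    """Return the rdma_cm ABI version as an int, or None if it is unavailable."""
    path = os.path.join(sysfs_root, "class", "misc", "rdma_cm", "abi_version")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Missing file (ucma not loaded or not built) or unexpected contents.
        return None
```

If this returns None on the target machine, the kernel most likely lacks
the ucma patches and librdmacm will fail in the way described.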


On Fri, 2006-11-03 at 13:59 +0530, Krishna Kumar2 wrote:
> Hi,
> 
> I installed the 2.6.19-rc3 bits, and when I try to run
> perftest/rdma_bw (with '-c' option), I get the error :
> "librdmacm: Couldnt open rdma_cm ABI version".
> 
> I found that this is due to ucma not being present in
> mainline kernel bits (which creates /sys/class/misc/rdma_cm).
> So how can I resolve this and run these tests ?
> 
> Thanks,
> 
> - KK
> 
> 
> 





Re: [openib-general] question on QoS support

2006-11-02 Thread Hal Rosenstock
Hi Oliver,

On Thu, 2006-11-02 at 10:20, Oliver wrote:
> Hi, Hal -
> 
> > How is this being observed/measured ?
> 
> Host A, B, with 4x DDR both connected to Flextronic switch.
> A single process of ibv_read_bw gives about 1415 MB/s average
> bandwidth. Two concurrent processes report 714.45 MB/s each, dead even.
> Now if I bump up one process with a different SL, then I expect to see
> shaping to take place. Please let me know if the scenario makes sense.

It makes sense. However, if the higher priority traffic does not fill
the scheduling, the low priority can take up the slack so I'm not sure
if this is what you are seeing or something else.

It might be interesting to try the same thing at SDR speeds.

-- Hal
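
The "low priority takes up the slack" point can be put into numbers with a
very small model. This sketch assumes strict priority between the two
classes and ignores VLHighLimit and per-VL weights; the function name and
the use of the quoted 1415 MB/s single-process figure as the link capacity
are illustrative choices, not measurements.

```python
def split_bandwidth(capacity, high_demand, low_demand):
    """Work-conserving strict priority: high goes first, low gets the rest."""
    high = min(high_demand, capacity)
    low = min(low_demand, capacity - high)
    return high, low
```

For example, split_bandwidth(1415, 700, 1415) gives (700, 715): a
high-priority stream that offers only 700 MB/s leaves 715 MB/s for the
low-priority one, so the two can look nearly even although priorities are
in effect. Only when the high-priority stream can fill the link, as in
split_bandwidth(1415, 2000, 1415), does the low-priority stream drop to 0.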

> > Yes, 8 VLs should be supported in your subnet. You can verify this with
> > smpquery portinfo on the HCA port and examine OperVLs assuming the port
> > is ACTIVE.
> 
> yes, I verified the data VL support, it is 8. I will poke for more
> info with suggested commands by Sasha.
> 
> > > A related question is, if I modify qos setting in SM, do I need to
> > > restart SA on each hosts for it to see the changes? (I am hoping not,
> > > as I tried in the test, it doesn't seem to make a difference)
> >
> > Not sure what you mean. SA is tightly coupled with the OpenSM. Do you
> > mean SA client ? The client hosts don't need restarting but did you
> > restart OpenSM with your QoS configuration ?
> 
> I mean client SA. yes, I understand OpenSM needs to be restarted.
> 
> > BTW, which OpenSM are you running ?
> 
> OFED 1.1 based.
> 
> thanks
> 
> - Oliver





Re: [openib-general] question on QoS support

2006-11-02 Thread Oliver
Hi, Hal -

> How is this being observed/measured ?

Host A, B, with 4x DDR both connected to Flextronic switch.
A single process of ibv_read_bw gives about 1415 MB/s average
bandwidth. Two concurrent processes report 714.45 MB/s each, dead even.
Now if I bump up one process with a different SL, then I expect to see
shaping to take place. Please let me know if the scenario makes sense.



> Yes, 8 VLs should be supported in your subnet. You can verify this with
> smpquery portinfo on the HCA port and examine OperVLs assuming the port
> is ACTIVE.

yes, I verified the data VL support, it is 8. I will poke for more
info with suggested commands by Sasha.

> > A related question is, if I modify qos setting in SM, do I need to
> > restart SA on each hosts for it to see the changes? (I am hoping not,
> > as I tried in the test, it doesn't seem to make a difference)
>
> Not sure what you mean. SA is tightly coupled with the OpenSM. Do you
> mean SA client ? The client hosts don't need restarting but did you
> restart OpenSM with your QoS configuration ?

I mean client SA. yes, I understand OpenSM needs to be restarted.

> BTW, which OpenSM are you running ?

OFED 1.1 based.

thanks

- Oliver




Re: [openib-general] question on QoS support

2006-11-02 Thread Hal Rosenstock
On Thu, 2006-11-02 at 09:15, Makia Minich wrote:
> Hal Rosenstock wrote:
> > Makia,
> > 
> > On Wed, 2006-11-01 at 17:42, Makia Minich wrote:
> >> It just so happens that we've started looking at this here at ORNL as
> >> well.  I had a question about the options.  The manpage makes it seem
> >> that you can set these qos options (e.g. qos_high_limit) from the
> >> command line, but I haven't been overly successful.
> > 
> > What are you referring to in the man page ?
> 
> OK, re-reading the man page section on qos, I now realize that I didn't
> understand the statement "cached options file" on my initial read
> through.  So, now I've got it.
> 
> > Which OpenSM are you using (trunk or 1.1 based) ?
> 
> 1.1 based
> 
> >> Is there an example of this being done?
> > 
> > Yes in both the man page under QOS CONFIGURATION or under
> > osm/doc/qos-config.txt in the repository.
> 
> I see that that file doesn't install in the doc directory with OFED,
> perhaps that should be added (so that I can find it in the ${OFED}/doc
> directory).

I used that doc and put it pretty much verbatim into the man page so IMO
this is somewhat redundant but it could be added to the next release if
you think this adds value (having the separate docs).

-- Hal

> >>   Or is changing the /var/cache/osm/opensm.opts file
> >> the preferred method of changing the options?
> > 
> > I think it's the only way but it is imperative QoS is enabled for this
> > to have any effect.
> > 
> > -- Hal
> 
> That part I've got set in the opensm.opts file:
> 
> no_qos FALSE
> 
> >> Sasha Khapyorsky wrote:
> >>> On 16:52 Wed 01 Nov , Oliver wrote:
> >>>> Hi, folks -
> >>>>
> >>>> I am trying to verify and evaluate IB QoS support, running openSM as
> >>>> subnet manager. The perftest program is extended to set SL as command
> >>>> line options instead of default 0, and by modifying VL arbitration
> >>>> tables, I am expecting to see the traffic shaping can actually take
> >>>> place, but it did not.  More details on configuration:
> >>>>
> >>>> in opensm.opts:
> >>>> # QoS default options
> >>>> qos_high_limit 255 # disable low priority table
> >>>> qos_vlarb_high: 0:4,1:4,2:8,3:0, 4:0  # this is to give VL 2
> >>>> (corresponding to SL 2) a higher weight 8
> >>>> qos_sl2vl 0,1,2,3,4, ... # no changes here
> >>>>
> >>>> I think (though not verified) the Voltaire HCA we are using can
> >>>> support 8 data VLs. I don't have much more information to go on why
> >>>> qos shaping is not taking place, any suggestions?
> >>> You can verify actual port's parameters with smpquery (from diags), you
> >>> will need to run to get QoS related parameters:
> >>>
> >>>   smpquery portinfo ...
> >>>   smpquery vlarb ...
> >>>   smpquery sl2vl ...
> >>>
> >>> Sasha
> >>>
> >>>> A related question is, if I modify qos setting in SM, do I need to
> >>>> restart SA on each hosts for it to see the changes? (I am hoping not,
> >>>> as I tried in the test, it doesn't seem to make a difference)
> >>>>
> >>>> Thanks for help.
> >>>> -- 
> >>>> Oliver
> 
> > 
> > 





Re: [openib-general] question on QoS support

2006-11-02 Thread Makia Minich
Hal Rosenstock wrote:
> Makia,
> 
> On Wed, 2006-11-01 at 17:42, Makia Minich wrote:
>> It just so happens that we've started looking at this here at ORNL as
>> well.  I had a question about the options.  The manpage makes it seem
>> that you can set these qos options (e.g. qos_high_limit) from the
>> command line, but I haven't been overly successful.
> 
> What are you referring to in the man page ?

OK, re-reading the man page section on qos, I now realize that I didn't
understand the statement "cached options file" on my initial read
through.  So, now I've got it.

> Which OpenSM are you using (trunk or 1.1 based) ?

1.1 based

>> Is there an example of this being done?
> 
> Yes in both the man page under QOS CONFIGURATION or under
> osm/doc/qos-config.txt in the repository.

I see that that file doesn't install in the doc directory with OFED,
perhaps that should be added (so that I can find it in the ${OFED}/doc
directory).

>>   Or is changing the /var/cache/osm/opensm.opts file
>> the preferred method of changing the options?
> 
> I think it's the only way but it is imperative QoS is enabled for this
> to have any effect.
> 
> -- Hal

That part I've got set in the opensm.opts file:

no_qos FALSE

>> Sasha Khapyorsky wrote:
>>> On 16:52 Wed 01 Nov , Oliver wrote:
>>>> Hi, folks -
>>>>
>>>> I am trying to verify and evaluate IB QoS support, running openSM as
>>>> subnet manager. The perftest program is extended to set SL as command
>>>> line options instead of default 0, and by modifying VL arbitration
>>>> tables, I am expecting to see the traffic shaping can actually take
>>>> place, but it did not.  More details on configuration:
>>>>
>>>> in opensm.opts:
>>>> # QoS default options
>>>> qos_high_limit 255 # disable low priority table
>>>> qos_vlarb_high: 0:4,1:4,2:8,3:0, 4:0  # this is to give VL 2
>>>> (corresponding to SL 2) a higher weight 8
>>>> qos_sl2vl 0,1,2,3,4, ... # no changes here
>>>>
>>>> I think (though not verified) the Voltaire HCA we are using can
>>>> support 8 data VLs. I don't have much more information to go on why
>>>> qos shaping is not taking place, any suggestions?
>>> You can verify actual port's parameters with smpquery (from diags), you
>>> will need to run to get QoS related parameters:
>>>
>>>   smpquery portinfo ...
>>>   smpquery vlarb ...
>>>   smpquery sl2vl ...
>>>
>>> Sasha
>>>
>>>> A related question is, if I modify qos setting in SM, do I need to
>>>> restart SA on each hosts for it to see the changes? (I am hoping not,
>>>> as I tried in the test, it doesn't seem to make a difference)
>>>>
>>>> Thanks for help.
>>>> -- 
>>>> Oliver

> 
> 

-- 
Makia Minich <[EMAIL PROTECTED]>
National Center for Computation Science
Oak Ridge National Laboratory
Phone: 865.574.7460




Re: [openib-general] question on QoS support

2006-11-02 Thread Hal Rosenstock
Makia,

On Wed, 2006-11-01 at 17:42, Makia Minich wrote:
> It just so happens that we've started looking at this here at ORNL as
> well.  I had a question about the options.  The manpage makes it seem
> that you can set these qos options (e.g. qos_high_limit) from the
> command line, but I haven't been overly successful.

What are you referring to in the man page ?

Which OpenSM are you using (trunk or 1.1 based) ?

> Is there an example of this being done?

Yes in both the man page under QOS CONFIGURATION or under
osm/doc/qos-config.txt in the repository.

>   Or is changing the /var/cache/osm/opensm.opts file
> the preferred method of changing the options?

I think it's the only way but it is imperative QoS is enabled for this
to have any effect.

-- Hal

> Sasha Khapyorsky wrote:
> > On 16:52 Wed 01 Nov , Oliver wrote:
> >> Hi, folks -
> >>
> >> I am trying to verify and evaluate IB QoS support, running openSM as
> >> subnet manager. The perftest program is extended to set SL as command
> >> line options instead of default 0, and by modifying VL arbitration
> >> tables, I am expecting to see the traffic shaping can actually take
> >> place, but it did not.  More details on configuration:
> >>
> >> in opensm.opts:
> >> # QoS default options
> >> qos_high_limit 255 # disable low priority table
> >> qos_vlarb_high: 0:4,1:4,2:8,3:0, 4:0  # this is to give VL 2
> >> (corresponding to SL 2) a higher weight 8
> >> qos_sl2vl 0,1,2,3,4, ... # no changes here
> >>
> >> I think (though not verified) the Voltaire HCA we are using can
> >> support 8 data VLs. I don't have much more information to go on why
> >> qos shaping is not taking place, any suggestions?
> > 
> > You can verify actual port's parameters with smpquery (from diags), you
> > will need to run to get QoS related parameters:
> > 
> >   smpquery portinfo ...
> >   smpquery vlarb ...
> >   smpquery sl2vl ...
> > 
> > Sasha
> > 
> >> A related question is, if I modify qos setting in SM, do I need to
> >> restart SA on each hosts for it to see the changes? (I am hoping not,
> >> as I tried in the test, it doesn't seem to make a difference)
> >>
> >> Thanks for help.
> >> -- 
> >> Oliver
> >>





Re: [openib-general] question on QoS support

2006-11-02 Thread Hal Rosenstock
Hi Oliver,

On Wed, 2006-11-01 at 16:52, Oliver wrote:
> Hi, folks -
> 
> I am trying to verify and evaluate IB QoS support, running openSM as
> subnet manager. The perftest program is extended to set SL as command
> line options instead of default 0, and by modifying VL arbitration
> tables, I am expecting to see the traffic shaping can actually take
> place,

How is this being observed/measured ?

>  but it did not.  More details on configuration:
> 
> in opensm.opts:
> # QoS default options
> qos_high_limit 255 # disable low priority table

This doesn't disable it but it won't be scheduled unless there are no
high priority packets to send.

> qos_vlarb_high: 0:4,1:4,2:8,3:0, 4:0  # this is to give VL 2
> (corresponding to SL 2) a higher weight 8
> qos_sl2vl 0,1,2,3,4, ... # no changes here
> 
> I think (though not verified) the Voltaire HCA we are using can
> support 8 data VLs.

Yes, 8 VLs should be supported in your subnet. You can verify this with
smpquery portinfo on the HCA port and examine OperVLs assuming the port
is ACTIVE.

>  I don't have much more information to go on why
> qos shaping is not taking place, any suggestions?

Sasha's email is a good start. We can go from there.

> A related question is, if I modify qos setting in SM, do I need to
> restart SA on each hosts for it to see the changes? (I am hoping not,
> as I tried in the test, it doesn't seem to make a difference)

Not sure what you mean. SA is tightly coupled with the OpenSM. Do you
mean SA client ? The client hosts don't need restarting but did you
restart OpenSM with your QoS configuration ?

BTW, which OpenSM are you running ?

-- Hal

> Thanks for help.





Re: [openib-general] question on QoS support

2006-11-01 Thread Sasha Khapyorsky
On 17:42 Wed 01 Nov , Makia Minich wrote:
> It just so happens that we've started looking at this here at ORNL as
> well.  I had a question about the options.  The manpage makes it seem
> that you can set these qos options (e.g. qos_high_limit) from the
> command line,

AFAIK there is an option -Q which enables/disables QoS configuration; it
has nothing to do with the qos_high_limit parameter in particular.
Configuration parameters (qos_max_vls, qos_high_limit, qos_vlarb_high,
qos_vlarb_low and qos_sl2vl templates) should be specified in the
opensm.opts file (or another OpenSM configuration file, which does not
exist yet).

> but I haven't been overly successful.  Is there an example
> of this being done?  Or is changing the /var/cache/osm/opensm.opts file
> the preferred method of changing the options?

Yes, you need to specify QoS parameters in opensm.opts file.

There is a readme file, osm/doc/qos-config.txt, which describes the
details (I think the man page has a similar section too).


Ah, an important note: with OFED, QoS is disabled by default in OpenSM, so
the -Q option should be used, which for OFED means --qos. OpenSM from trunk
enables QoS configuration by default, and there the -Q option disables it
(it means --no-qos). This can be confusing, I know.

Sasha


> 
> Sasha Khapyorsky wrote:
> > On 16:52 Wed 01 Nov , Oliver wrote:
> >> Hi, folks -
> >>
> >> I am trying to verify and evaluate IB QoS support, running openSM as
> >> subnet manager. The perftest program is extended to set SL as command
> >> line options instead of default 0, and by modifying VL arbitration
> >> tables, I am expecting to see the traffic shaping can actually take
> >> place, but it did not.  More details on configuration:
> >>
> >> in opensm.opts:
> >> # QoS default options
> >> qos_high_limit 255 # disable low priority table
> >> qos_vlarb_high: 0:4,1:4,2:8,3:0, 4:0  # this is to give VL 2
> >> (corresponding to SL 2) a higher weight 8
> >> qos_sl2vl 0,1,2,3,4, ... # no changes here
> >>
> >> I think (though not verified) the Voltaire HCA we are using can
> >> support 8 data VLs. I don't have much more information to go on why
> >> qos shaping is not taking place, any suggestions?
> > 
> > You can verify actual port's parameters with smpquery (from diags), you
> > will need to run to get QoS related parameters:
> > 
> >   smpquery portinfo ...
> >   smpquery vlarb ...
> >   smpquery sl2vl ...
> > 
> > Sasha
> > 
> >> A related question is, if I modify qos setting in SM, do I need to
> >> restart SA on each hosts for it to see the changes? (I am hoping not,
> >> as I tried in the test, it doesn't seem to make a difference)
> >>
> >> Thanks for help.
> >> -- 
> >> Oliver
> >>
> 
> -- 
> Makia Minich <[EMAIL PROTECTED]>
> National Center for Computation Science
> Oak Ridge National Laboratory
> Phone: 865.574.7460




Re: [openib-general] question on QoS support

2006-11-01 Thread Makia Minich
It just so happens that we've started looking at this here at ORNL as
well.  I had a question about the options.  The manpage makes it seem
that you can set these qos options (e.g. qos_high_limit) from the
command line, but I haven't been overly successful.  Is there an example
of this being done?  Or is changing the /var/cache/osm/opensm.opts file
the preferred method of changing the options?

Sasha Khapyorsky wrote:
> On 16:52 Wed 01 Nov , Oliver wrote:
>> Hi, folks -
>>
>> I am trying to verify and evaluate IB QoS support, running openSM as
>> subnet manager. The perftest program is extended to set SL as command
>> line options instead of default 0, and by modifying VL arbitration
>> tables, I am expecting to see the traffic shaping can actually take
>> place, but it did not.  More details on configuration:
>>
>> in opensm.opts:
>> # QoS default options
>> qos_high_limit 255 # disable low priority table
>> qos_vlarb_high: 0:4,1:4,2:8,3:0, 4:0  # this is to give VL 2
>> (corresponding to SL 2) a higher weight 8
>> qos_sl2vl 0,1,2,3,4, ... # no changes here
>>
>> I think (though not verified) the Voltaire HCA we are using can
>> support 8 data VLs. I don't have much more information to go on why
>> qos shaping is not taking place, any suggestions?
> 
> You can verify actual port's parameters with smpquery (from diags), you
> will need to run to get QoS related parameters:
> 
>   smpquery portinfo ...
>   smpquery vlarb ...
>   smpquery sl2vl ...
> 
> Sasha
> 
>> A related question is, if I modify qos setting in SM, do I need to
>> restart SA on each hosts for it to see the changes? (I am hoping not,
>> as I tried in the test, it doesn't seem to make a difference)
>>
>> Thanks for help.
>> -- 
>> Oliver
>>

-- 
Makia Minich <[EMAIL PROTECTED]>
National Center for Computation Science
Oak Ridge National Laboratory
Phone: 865.574.7460




Re: [openib-general] question on QoS support

2006-11-01 Thread Sasha Khapyorsky
On 16:52 Wed 01 Nov , Oliver wrote:
> Hi, folks -
> 
> I am trying to verify and evaluate IB QoS support, running openSM as
> subnet manager. The perftest program is extended to set SL as command
> line options instead of default 0, and by modifying VL arbitration
> tables, I am expecting to see the traffic shaping can actually take
> place, but it did not.  More details on configuration:
> 
> in opensm.opts:
> # QoS default options
> qos_high_limit 255 # disable low priority table
> qos_vlarb_high: 0:4,1:4,2:8,3:0, 4:0  # this is to give VL 2
> (corresponding to SL 2) a higher weight 8
> qos_sl2vl 0,1,2,3,4, ... # no changes here
> 
> I think (though not verified) the Voltaire HCA we are using can
> support 8 data VLs. I don't have much more information to go on why
> qos shaping is not taking place, any suggestions?

You can verify the port's actual parameters with smpquery (from diags). To
get the QoS-related parameters you will need to run:

  smpquery portinfo ...
  smpquery vlarb ...
  smpquery sl2vl ...

Sasha

> A related question is, if I modify qos setting in SM, do I need to
> restart SA on each hosts for it to see the changes? (I am hoping not,
> as I tried in the test, it doesn't seem to make a difference)
> 
> Thanks for help.
> -- 
> Oliver
> 




Re: [openib-general] Question about ehca CQ handling

2006-10-02 Thread Christoph Raisch
> While looking over the ehca driver from the perspective of adding a
> "peek CQ" operation, I noticed some code that looked funny.
>
> In hipz_set_cqx_n0() and hipz_set_cqx_n1(), what is the point of the
> calls to hipz_galpa_load_cq()?  The return value is discarded.  I see
> that hipz_galpa_load_cq() dereferences a volatile pointer internally,
> so I'm guessing this is some sort of ordering constraint.  But would
> it be just as good to do "barrier()" there?
>
>  - R.

No, barrier() won't help: the I/O bus connection is theoretically allowed
to reorder and aggregate writes in a defined pattern.
The recommended way to ensure that the ehca chip has actually seen the
write is to do a read on the same address.

Gruss / Regards . . . Christoph R






Re: [openib-general] Question about interrupt generation

2006-09-05 Thread harish
Hi,

One more question. What kind of event mask helps mask the interrupts?

thanks
harish

On 9/5/06, harish <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I tried the following simple experiment and am not able to understand
> the results:
>
> I calculated the number of interrupts generated by the InfiniBand NIC
> [with little or no traffic to the NIC] over a period of 10 seconds and
> saw around 10-20 interrupts/sec. Then I ran a netperf test and saw
> around 100+ K interrupts/sec. This screwed up my host machine. To reduce
> the impact of the interrupts, I added a callback, scheduled to run
> periodically every few microseconds, that masks the irq line used by the
> NIC and a little later unmasks it. I noticed that with no traffic, I
> see anywhere between 30-50K interrupts/sec. With the netperf traffic, I
> see around 120K+ interrupts/sec.
>
> I am a newbie to InfiniBand technology and so do not understand why so
> many interrupts are generated when my callback is called periodically.
> Could it be that the InfiniBand adapter supports MSI? Or is what I am
> seeing IPIs? Or does InfiniBand generate interrupts based on types of
> events, and what I am doing by masking/unmasking the interrupt line is
> one such event?
>
> Any information/suggestions would be useful.
>
> Thanks in advance,
> harish



Re: [openib-general] question: ib_umem page_size

2006-08-23 Thread Roland Dreier
 > > It gives the page size for the user memory described by the struct.
 > > The idea was that if/when someone tries to optimize for huge pages,
 > > then the low-level driver can know that a region is using huge pages
 > > without having to walk through the page list and search for the
 > > minimum physically contiguous size.
 > 
 > Hmm, mthca_reg_user_mr seems to do:
 > 
 > len = sg_dma_len(&chunk->page_list[j]) >> shift
 > 
 > which means that dma_len must be a multiple of page size.
 > 
 > Is this intentional?

Yes, it's intentional I think.  I'm probably missing something, but
the upper layer has just told mthca_reg_user_mr() that the page size
for this region is (1

Re: [openib-general] question: ib_umem page_size

2006-08-23 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: question: ib_umem page_size
> 
> Michael> Roland, could you please clarify what the page_size
> Michael> field in struct ib_umem does?
> 
> It gives the page size for the user memory described by the struct.
> The idea was that if/when someone tries to optimize for huge pages,
> then the low-level driver can know that a region is using huge pages
> without having to walk through the page list and search for the
> minimum physically contiguous size.

Hmm, mthca_reg_user_mr seems to do:

len = sg_dma_len(&chunk->page_list[j]) >> shift

which means that dma_len must be a multiple of page size.

Is this intentional?

-- 
MST




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-22 Thread Sean Hefty
>Cool, I would go for XOR-ing a random value with the **local id** .
>
>Sean, my understanding it can be narrowed for doing so in:
>
>1) cm_alloc_id() after calling idr_get_new_above()
>2) cm_free_id() before calling idr_remove()
>3) cm_get_id() before calling idr_find()
>
>and initializing the random value we XOR in ib_cm_init()
>
>What do you think?

I like this approach as well.  I need to see what else I have in my queue first,
but will work on a patch, since it seems straightforward.

- Sean




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-22 Thread Or Gerlitz
Sean Hefty wrote:
> When a new REQ is received, we enter its timewait structure into two trees: one
> sorted by remote ID, one sorted by remote QPN.  If the REQ is new, both would
> succeed, and timewait_info would be NULL.  Since timewait_info is not NULL, we
> are dealing with a REQ that re-uses the same remote ID or same remote QPN.  If
> the new REQ has the same remote ID (get_cm_id() returns non-NULL), we treat it
> as a duplicate, otherwise it's marked as stale.

OK, thanks for clarifying this.

Or.





Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-22 Thread Or Gerlitz
Roland Dreier wrote:
> Sean> If we record a base offset, we can start at any random
> Sean> number.  We just need to always add/subtract the base when
> Sean> getting a value from the IDR.
> 
> Good point -- or better still, we could XOR in a random bit pattern.
> That way we don't have to keep straight when to add and when to subtract.

Cool, I would go for XOR-ing a random value with the **local id** .

Sean, my understanding is that this can be narrowed down to doing so in:

1) cm_alloc_id() after calling idr_get_new_above()
2) cm_free_id() before calling idr_remove()
3) cm_get_id() before calling idr_find()

and initializing the random value we XOR in ib_cm_init()

What do you think?

Or.






Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Roland Dreier
Sean> If we record a base offset, we can start at any random
Sean> number.  We just need to always add/subtract the base when
Sean> getting a value from the IDR.

Good point -- or better still, we could XOR in a random bit pattern.
That way we don't have to keep straight when to add and when to subtract.

 - R.




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Sean Hefty
>> If we get here, this means that the REQ was a new REQ and not a
>> duplicate, but the remote_id or remote_qpn is already in use.  We need
>> to reject the new REQ as containing stale data.
>
>I don't follow, if we get to the else case its as of cm_get_id()
>returning NULL. This holds when idr_find() returns NULL or when the
>entry returned is associated with a different remote_id, so what makes
>you to conclude that "the remote_id or remote_qpn is already in use"???

When a new REQ is received, we enter its timewait structure into two trees: one
sorted by remote ID, one sorted by remote QPN.  If the REQ is new, both would
succeed, and timewait_info would be NULL.  Since timewait_info is not NULL, we
are dealing with a REQ that re-uses the same remote ID or same remote QPN.  If
the new REQ has the same remote ID (get_cm_id() returns non-NULL), we treat it
as a duplicate, otherwise it's marked as stale.

- Sean




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Sean Hefty
>> Just to emphasize what Sean has pointed out, you are asking how can a CM
>> consumer know that a **local** QPN is not in the timewait state
>> according to the **remote** CM. Since the issue is with the remote CM,
>> it seems to me that pushing down timewait into verbs is not the correct
>> direction to look at.

We should still ensure that we don't give a user a local QPN that we know is in
timewait.  For example, user 1 connects over a QP, transfers some data, then
destroys the QP.  User 2 allocates a new QP.  Can user 2 get the same QP as
user 1?  If so, user 2 is likely to see a stale connection.  An option at this
point is for user 2 to destroy the QP and allocate a new one.  If they do this,
will they get the same QP again?

Now imagine if user 1 had created 1000 connections.  I believe that we should
make things as easy on user 2 as possible, including reducing the chance of
giving them a QP that the remote side is likely to have in timewait.

- Sean




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Sean Hefty
>How about (for the meantime, till this rework is designed && done) going
>to projecting the initial random local id into the range of (say)
>[0-1022] (i think 1023 is prime, if not choose a prime near it) this way
>with very good probability and with very little overhead on memory
>consumption a client connect/reboot/"reconnect" would work.

If we record a base offset, we can start at any random number.  We just need to
always add/subtract the base when getting a value from the IDR.

- Sean




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Roland Dreier
Or> How about (for the meantime, till this rework is designed &&
Or> done) going to projecting the initial random local id into the
Or> range of (say) [0-1022] (i think 1023 is prime, if not choose
Or> a prime near it) this way with very good probability and with
Or> very little overhead on memory consumption a client
Or> connect/reboot/"reconnect" would work.

Of course 1023 is not prime -- since (a^2 - b^2) = (a - b) * (a + b),
it follows 2 ^ 10 - 1 = (2^5 - 1) * (2^5 + 1) = 31 * 33.

I don't see why you care about the range being prime, but the closest
primes to 1024 are 1021 and 1031.

 - R.




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
This email appears in the archive but seems not to have been distributed
to the subscribers, so I am reposting it.


Or Gerlitz wrote:
> Sean Hefty wrote:
>> Even if we pushed timewait handling under verbs, a user could always 
>> get a QP that the remote side thinks is connected.  The original 
>> connection could fail to disconnect because of lost DREQs.  So, 
>> locally, the QP could have exited timewait, while the remote side 
>> still thinks that it's connected.
> 
> Sean,
> 
> If you don't mind (also related to the patch you have sent Eric of 
> randomizing the initial local cm id) to get into this deeper, can we do 
> here a quick code review of the REQ matching logic? I wrote what i 
> understand below.
> 
>> static struct cm_id_private * cm_match_req(struct cm_work *work,
>> +  struct cm_id_private *cm_id_priv)
>> +{
>> +   struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
>> +   struct cm_timewait_info *timewait_info;
>> +   struct cm_req_msg *req_msg;
>> +   unsigned long flags;
>> +
>> +   req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
>> +
>> +   /* Check for duplicate REQ and stale connections. */
>> +   spin_lock_irqsave(&cm.lock, flags);
>> +   timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
>> +   if (!timewait_info)
>> +           timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
> 
> This if() holds when  entry is present in 
> remote_id_table OR  entry is present in 
> remote_qpn_table
> 
>> +   if (timewait_info) {
>> +           cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
>> +                                      timewait_info->work.remote_id);
>> +           spin_unlock_irqrestore(&cm.lock, flags);
>> +           if (cur_cm_id_priv) {
>> +                   cm_dup_req_handler(work, cur_cm_id_priv);
>> +                   cm_deref_id(cur_cm_id_priv);
> 
>  entry exists in local_id_table, looking on 
> dup_req_handler() i see it sends REP when the id is in "MRA sent" and 
> sends a STALE_CONN REJ when the id is in timewait state, else it does 
> nothing.
> 
>> +           } else
>> +                   cm_issue_rej(work->port, work->mad_recv_wc,
>> +                                IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
>> +                                NULL, 0);
> 
> what is this case? there is no  entry but there is 
> remote  or  entries???
> 
>> +           goto error;
>> +   }
> 
> Or.
> 






Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
This email appears in the archive but seems not to have been distributed
to the subscribers, so I am reposting it.

Or Gerlitz wrote:
> Arlin Davis wrote:
>> We are running into connection reject issues (IB_CM_REJ_STALE_CONN) 
>> with our application under heavy load and lots of connections.
>>
>> We occassionally get a reject based on the QP being in timewait state 
>> leftover from a prior connection. It appears that the CM keeps track 
>> of the QP's in timewait state on both sides of the connection, 
> 
> How did you verify that? the CM generated REJ with IB_CM_REJ_STALE_CONN 
> in two flows for the passive side (ie rejecting a REQ) and one flow for 
> the active side (ie rejecting a REP).
> 
>> How can a consumer know for sure that the new QP will not be in a 
>> timewait state according to the CM? Does it make sense to push the 
>> timewait functionality down into verbs? If not, is there a way for the 
>> CM to hold a reference to the QP until the timewait expires?
> 
> Just to emphasize what Sean has pointed out, you are asking how a CM 
> consumer can know that a **local** QPN is not in the timewait state 
> according to the **remote** CM. Since the issue is with the remote CM, 
> it seems to me that pushing timewait down into verbs is not the right 
> direction to look in.
> 
> Or.
> 






Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
>>> +   } else
>>> +   cm_issue_rej(work->port, work->mad_recv_wc,
>>> +IB_CM_REJ_STALE_CONN, 
>>> CM_MSG_RESPONSE_REQ,
>>> +NULL, 0);
>>
>>
>> what is this case? there is no  entry but there is 
>> remote  or  entries???

> If we get here, this means that the REQ was a new REQ and not a 
> duplicate, but the remote_id or remote_qpn is already in use.  We need 
> to reject the new REQ as containing stale data.

I don't follow. If we get to the else case, it is because cm_get_id() 
returned NULL. That happens when idr_find() returns NULL or when the 
entry returned is associated with a different remote_id, so what makes 
you conclude that "the remote_id or remote_qpn is already in use"?

> +static struct cm_id_private * cm_get_id(__be32 local_id, __be32 remote_id)
> +{
> +   struct cm_id_private *cm_id_priv;
> +
> +   cm_id_priv = idr_find(&cm.local_id_table, (__force int) local_id);
> +   if (cm_id_priv) {
> +   if (cm_id_priv->id.remote_id == remote_id)
> +   atomic_inc(&cm_id_priv->refcount);
> +   else
> +   cm_id_priv = NULL;
> +   }
> +
> +   return cm_id_priv;
> +}
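
For readers following the argument, here is a small user-space model of the lookup quoted above. It is only a sketch: `struct conn`, `table` and `model_get_id()` are made-up stand-ins for `cm_id_private`, the IDR and `cm_get_id()`, and a fixed array replaces the IDR. It shows the two distinct ways the lookup can come back NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cm_id_private. */
struct conn {
    uint32_t local_id;
    uint32_t remote_id;
    int refcount;
};

#define TABLE_SIZE 8

/* Hypothetical stand-in for cm.local_id_table: slot i holds local_id i. */
static struct conn *table[TABLE_SIZE];

/*
 * Model of cm_get_id(): returns the entry only when local_id is known
 * AND its recorded remote_id matches.  NULL therefore means either
 * "no such local_id" or "local_id exists but belongs to another remote_id".
 */
static struct conn *model_get_id(uint32_t local_id, uint32_t remote_id)
{
    struct conn *c;

    if (local_id >= TABLE_SIZE)
        return NULL;
    c = table[local_id];
    if (c == NULL)
        return NULL;            /* case 1: lookup miss */
    if (c->remote_id != remote_id)
        return NULL;            /* case 2: remote_id mismatch */
    c->refcount++;              /* models atomic_inc(&refcount) */
    return c;
}
```

Neither NULL case says anything by itself about whether a remote_id or remote_qpn is in use; that information comes from the earlier insert into the remote tables.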

Or.





Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-20 Thread Or Gerlitz
Sean Hefty wrote:
> Or Gerlitz wrote:

>> If you don't mind (also related to the patch you have sent Eric of 
>> randomizing the initial local cm id) to get into this deeper, can we do 

> There's an issue trying to randomize the initial local CM ID.  The way 
> the IDR works, if you start at a high value, then the IDR size grows up 
> to the size of the first value, which can result in memory allocation 
> failures.  In my tests, using a random value would frequently result in 
> connection failures because of low memory.

> My conclusion is that the local ID assignment in the IB CM needs to be 
> reworked, or we will run into a condition that after X number of 
> connections have been established, we will be unable to create any new 
> connections, even if the previous connections have all been destroyed.

How about (for the meantime, until this rework is designed and done) 
projecting the initial random local ID into the range of (say) 
[0-1022] (I think 1023 is prime; if not, choose a prime near it)? This way, 
with very good probability and very little memory overhead, a client 
connect/reboot/"reconnect" would work.
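
A minimal sketch of the projection suggested above, in plain user-space C. The names `project_local_id()` and `ID_MODULUS` are made up for illustration and this is not CM code. Note that 1023 = 3 x 11 x 31 is not prime; 1021 is the nearest prime below it, so the sketch reduces modulo 1021 and yields IDs in [0, 1020].

```c
#include <stdint.h>

/* Hypothetical modulus: 1021 is the largest prime below 1023. */
#define ID_MODULUS 1021u

/*
 * Fold an arbitrary 32-bit random value into a small starting ID so the
 * ID table never has to grow to cover a huge first value.
 */
static uint32_t project_local_id(uint32_t random_value)
{
    return random_value % ID_MODULUS;
}
```

The point of the small range is only to bound how far the ID table has to grow for the first allocation; whether the modulus is prime matters little when the input is already random.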

Or.





Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-17 Thread Sean Hefty
Or Gerlitz wrote:
> If you don't mind (also related to the patch you have sent Eric of 
> randomizing the initial local cm id) to get into this deeper, can we do 

There's an issue trying to randomize the initial local CM ID.  The way the IDR 
works, if you start at a high value, then the IDR size grows up to the size of 
the first value, which can result in memory allocation failures.  In my tests, 
using a random value would frequently result in connection failures because of 
low memory.

My conclusion is that the local ID assignment in the IB CM needs to be
reworked, or we will run into a condition that after X number of connections
have been established, we will be unable to create any new connections, even
if the previous connections have all been destroyed.

>> static struct cm_id_private * cm_match_req(struct cm_work *work,
>> +  struct cm_id_private 
>> *cm_id_priv)
>> +{
>> +   struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
>> +   struct cm_timewait_info *timewait_info;
>> +   struct cm_req_msg *req_msg;
>> +   unsigned long flags;
>> +
>> +   req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
>> +
>> +   /* Check for duplicate REQ and stale connections. */
>> +   spin_lock_irqsave(&cm.lock, flags);
>> +   timewait_info = cm_insert_remote_id(cm_id_priv->timewait_info);
>> +   if (!timewait_info)
>> +   timewait_info = 
>> cm_insert_remote_qpn(cm_id_priv->timewait_info);
> 
> 
> This if() holds when  entry is present in 
> remote_id_table OR  entry is present in 
> remote_qpn_table

correct

> 
>> +   if (timewait_info) {
>> +   cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
>> +  
>> timewait_info->work.remote_id);
> 
>  > +   spin_unlock_irqrestore(&cm.lock, flags);
> 
>> +   if (cur_cm_id_priv) {
>> +   cm_dup_req_handler(work, cur_cm_id_priv);
>> +   cm_deref_id(cur_cm_id_priv);
> 
> 
>  entry exists in local_id_table, looking on 
> dup_req_handler() i see it sends REP when the id is in "MRA sent" and 
> sends a STALE_CONN REJ when the id is in timewait state, else it does 
> nothing.

It sends an MRA if in the MRA sent state, or a reject as indicated.

>> +   } else
>> +   cm_issue_rej(work->port, work->mad_recv_wc,
>> +IB_CM_REJ_STALE_CONN, 
>> CM_MSG_RESPONSE_REQ,
>> +NULL, 0);
> 
> 
> what is this case? there is no  entry but there is 
> remote  or  entries???

If we get here, this means that the REQ was a new REQ and not a duplicate, but 
the remote_id or remote_qpn is already in use.  We need to reject the new REQ 
as containing stale data.
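
To summarize the three outcomes discussed in this exchange, here is a toy classification in user-space C. The enum and function names are invented for illustration; only the branch structure mirrors the quoted patch.

```c
#include <stdbool.h>

/* Hypothetical labels for the outcomes of the REQ check. */
enum req_outcome {
    REQ_NEW,        /* no conflicting remote_id/remote_qpn entry */
    REQ_DUPLICATE,  /* conflict, and the owning cm_id was found */
    REQ_STALE       /* conflict, but no matching cm_id: STALE_CONN REJ */
};

/*
 * conflict:    inserting into the remote_id or remote_qpn table hit an
 *              existing entry (timewait_info != NULL in the patch).
 * owner_found: cm_get_id() returned a cm_id for that existing entry.
 */
static enum req_outcome classify_req(bool conflict, bool owner_found)
{
    if (!conflict)
        return REQ_NEW;
    return owner_found ? REQ_DUPLICATE : REQ_STALE;
}
```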

- Sean




Re: [openib-general] Question about QP's in timewait state and CM stale conn rejects

2006-08-16 Thread Sean Hefty
Arlin Davis wrote:
> How can a consumer know for sure that the new QP will not be in a 
> timewait state according to the CM?

Given that the QP may have been in use by another process, I don't think that 
there's any way for the new owner to know.

> Does it make sense to push the timewait functionality down into verbs?

This may be a clean way of handling the issue, but... see below.

> If not, is there a way for the 
> CM to hold a reference to the QP until the timewait expires?

For userspace QPs, the CM doesn't have access to the QP, so some sort of 
special call into verbs would be needed.

Even if we pushed timewait handling under verbs, a user could always get a QP 
that the remote side thinks is connected.  The original connection could fail 
to disconnect because of lost DREQs.  So, locally, the QP could have exited 
timewait, while the remote side still thinks that it's connected.

- Sean




Re: [openib-general] question: ib_umem page_size

2006-08-15 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: question: ib_umem page_size
> 
> Michael> Roland, could you please clarify what the page_size
> Michael> field in struct ib_umem does?
> 
> It gives the page size for the user memory described by the struct.
> The idea was that if/when someone tries to optimize for huge pages,
> then the low-level driver can know that a region is using huge pages
> without having to walk through the page list and search for the
> minimum physically contiguous size.

Thought so. Cool, that's exactly what I'm trying to do.

-- 
MST




Re: [openib-general] question: ib_umem page_size

2006-08-15 Thread Roland Dreier
Michael> Roland, could you please clarify what the page_size
Michael> field in struct ib_umem does?

It gives the page size for the user memory described by the struct.
The idea was that if/when someone tries to optimize for huge pages,
then the low-level driver can know that a region is using huge pages
without having to walk through the page list and search for the
minimum physically contiguous size.
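
To make the cost being avoided concrete, here is a rough user-space sketch of such a walk. It is not the ib_umem code: `min_contig_bytes()` and the flat array of page addresses are assumptions made for illustration. Given the physical address of each base page in a region, it returns the length of the shortest physically contiguous run.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * addrs[i] is the (hypothetical) physical address of the i-th base page
 * of a region; base_size is the base page size in bytes.
 * Returns the smallest contiguous run length in bytes, or 0 if n == 0.
 */
static uint64_t min_contig_bytes(const uint64_t *addrs, size_t n,
                                 uint64_t base_size)
{
    uint64_t run, min_run = 0;
    size_t i;

    if (n == 0)
        return 0;
    run = base_size;
    for (i = 1; i < n; i++) {
        if (addrs[i] == addrs[i - 1] + base_size) {
            run += base_size;           /* still contiguous */
        } else {
            if (min_run == 0 || run < min_run)
                min_run = run;          /* close the finished run */
            run = base_size;
        }
    }
    if (min_run == 0 || run < min_run)
        min_run = run;                  /* account for the last run */
    return min_run;
}
```

A page_size value stored with the region lets a driver skip this linear pass over the page list.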

 - R.




Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Parks Fields

At 12:36 PM 6/5/2006, Talpey, Thomas wrote:
> Thanks Parks, this is a very interesting perspective.
> I will avoid going into my rant about edge devices for
> now, however. :-)

Cool, you can send it direct if you want.

> I am not sure what you mean about using SDP "end to end".
> I assume you would perhaps use SDP to these edge nodes,
> but this would require terminating the SDP connection and
> re-issuing the stream over TCP to the Panasas box, wouldn't it?

Yes, it would probably have to work that way. Another problem is that
SDP is not routable.

> Would this bridging be done in-kernel, like your IPoIB/Ethernet
> solution today, or would you implement a daemon? It will be
> a difficult challenge, I predict.

We are just starting to think about things like this, and trying to
keep an open mind to all possibilities.  We have no solutions to do
this yet. There might be better ways.
So you are correct: we haven't thought it all the way through and
have no alternative plan other than IPoIB at the moment.

My next step will be testing 4x-ddr IPoIB before doing anything else.
parks



   * Correspondence *

This email contains no programmatic content that requires independent 
ADC review  







Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Talpey, Thomas
Thanks Parks, this is a very interesting perspective.
I will avoid going into my rant about edge devices for
now, however. :-)

I am not sure what you mean about using SDP "end to end".
I assume you would perhaps use SDP to these edge nodes,
but this would require terminating the SDP connection and
re-issuing the stream over TCP to the Panasas box, wouldn't it?

Would this bridging be done in-kernel, like your IPoIB/Ethernet
solution today, or would you implement a daemon? It will be
a difficult challenge, I predict.

Tom.

At 02:16 PM 6/5/2006, Parks Fields wrote:
>
>>
>>I consider IPoIB to be Ethernet emulation.
>>
>>As for apples and oranges, my point exactly.
>
>
>It is not really about comparisons. Here at LANL we have an 
>environment where all our new Clusters have to mount our global 
>parallel file system Panasas. It is ethernet and will be for a while.
>
>Cluster interconnect is IB and the compute nodes do NOT have 
>ethernet, so we created i-o nodes to "bridge" IB to ethernet.
>
>Compute node---IB---i/o node---10gig---ethernet switch---panasas
>
>We like to match / balance the network bandwidth to storage 
>bandwidth plus try to achieve 1GB/sec per TF of the machine.  EX: 
>50TF machine = 50 GB/sec of storage bandwidth needed.
>
>So if IPoIB would give us ~700 MB/sec and came out the other side 
>with 10gigE at ~800 that would be nice.
>Hope this helps.   We are now trying to find out if SDP will work end-to-end.
>
>thanks
>parks
>





Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Parks Fields

> I consider IPoIB to be Ethernet emulation.
>
> As for apples and oranges, my point exactly.

It is not really about comparisons. Here at LANL we have an 
environment where all our new Clusters have to mount our global 
parallel file system Panasas. It is ethernet and will be for a while.

Cluster interconnect is IB and the compute nodes do NOT have 
ethernet, so we created i-o nodes to "bridge" IB to ethernet.

Compute node---IB---i/o node---10gig---ethernet switch---panasas

We like to match / balance the network bandwidth to storage 
bandwidth plus try to achieve 1GB/sec per TF of the machine.  EX: 
50TF machine = 50 GB/sec of storage bandwidth needed.

So if IPoIB would give us ~700 MB/sec and came out the other side 
with 10gigE at ~800 that would be nice.
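
As a rough illustration of that sizing rule: at 1 GB/sec per TF, and if each I/O node bridged about 700 MB/sec (the IPoIB figure hoped for above), the number of I/O nodes follows from a ceiling division. This is only a sketch; `io_nodes_needed()` is a made-up helper and 1 GB is taken as 1000 MB.

```c
#include <stdint.h>

/*
 * I/O nodes needed so that node_mb_per_sec per node covers the target of
 * 1 GB/sec (taken as 1000 MB/sec) of storage bandwidth per TF.
 * Returns 0 if node_mb_per_sec is 0.
 */
static uint32_t io_nodes_needed(uint32_t teraflops, uint32_t node_mb_per_sec)
{
    uint32_t target_mb = teraflops * 1000u;

    if (node_mb_per_sec == 0)
        return 0;
    return (target_mb + node_mb_per_sec - 1) / node_mb_per_sec; /* ceiling */
}
```

For the 50TF example this works out to 72 nodes at 700 MB/sec each.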

Hope this helps.   We are now trying to find out if SDP will work end-to-end.

thanks
parks










Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Talpey, Thomas
>----- Message from "hbchen" <[EMAIL PROTECTED]> on Mon, 05 Jun 2006 09:38:24 -0600 -----
>To: "Hal Rosenstock" <[EMAIL PROTECTED]>
>cc: "OPENIB"
>Subject: Re: [openib-general] Question about the IPoIB bandwidth performance ?
>
>Hal Rosenstock wrote:
>  On Mon, 2006-06-05 at 11:12, hbchen wrote:
>
>Hi,
>I have a question about the IPoIB bandwidth performance.
>I did netperf testing using Single GiGE, Myrinet D card,
>Myrinet 10G
>ethernet card,
>and Voltaire Infiniband 4X HCA400Ex (PCI-Express interface).
>
>
>NIC (jumbo enabled)       Line bandwidth (LB)    IP over NIC     Utilization (IPoNIC/LB)
>------------------------  ---------------------  --------------  -----------------------
>Single Gigabit NIC        1Gb/sec = 125MB/sec    120MB/sec       96% (PCI-X interface)
>Myrinet D card            250MB/sec              240-245MB/sec   96%-98% (PCI-X interface)
>Myrinet 10G Ethernet      10Gb/sec = 1280MB/sec  980MB/sec       76.6% (my testing, Linux 2.6.14.6)
>  (PCI-Express)                                  1225MB/sec      95.7% (data from Myrinet website)
>IB HCA 4X (PCI-Express)   10Gb/sec = 1280MB/sec  420MB/sec       32.8% (my testing, Linux 2.6.14.6)
>                                                 474MB/sec       37% (best from OpenIB mailing list, 2.6.12-rc5 patch 1)
>
>Why the bandwidth utilization of IPoIB is so low compared to
>the others
>NICs?
>
>
>  One thing to note is that the max utilization of 10G IB (4x) is 8G
>  due
>  to the signalling being included in this rate (unlike ethernet whose
>  rate represents the data rate and does not include the signalling
>  overhead).
>
>Hal,
>Even with this IB-4X = 8Gb/sec = 1024 MB/sec the IPoIB bandwidth
>utilization is still very low.
>>> IPoIB=420MB/sec
>>> bandwidth utilization= 420/1024 = 41.01%
>
>
>HB
>
>
>
>
>  -- Hal
>
>
>There must be a lot of room to improve the IPoIB software to
>reach 75%+
>bandwidth utilization.
>
>
>HB Chen
>Los Alamos National Lab
>[EMAIL PROTECTED]
>

Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread hycsw
Tom,

We are in the process of measuring the CPU utilization on our NFS/RDMA
experiments in contrast with regular NFS. We also intend to include 
netperf numbers and will keep you posted with our results as soon as 
possible.

Helen

- original Message -
>From [EMAIL PROTECTED] Mon Jun  5 09:03:56 2006


Helen, have you measured the CPU utilizations during these runs?
Perhaps you are out of CPU.

Outrageous opinion follows.

Frankly, an IB HCA running Ethernet emulation is approximately the
world's worst 10GbE adapter (not to put too fine of a point on it :-) )
There is no hardware checksumming, nor large-send offloading, both
of which force overhead onto software. And, as you just discovered
it isn't even 10Gb!

In general, network emulation layers are always going to perform more
poorly than native implementations. But this is only a generality learned
from years of experience with them.

Tom.  




Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Bernard King-Smith
> Thomas Talpey said:
> At 11:38 AM 6/5/2006, hbchen wrote:
> >Even with this IB-4X = 8Gb/sec = 1024 MB/sec the IPoIB bandwidth
> >utilization is still very low.
> >>> IPoIB=420MB/sec
> >>> bandwidth utilization= 420/1024 = 41.01%
>
>
> Helen, have you measured the CPU utilizations during these runs?
> Perhaps you are out of CPU.
>
> Outrageous opinion follows.
>
> Frankly, an IB HCA running Ethernet emulation is approximately the
> world's worst 10GbE adapter (not to put too fine of a point on it :-) )
> There is no hardware checksumming, nor large-send offloading, both
> of which force overhead onto software. And, as you just discovered
> it isn't even 10Gb!
>
> In general, network emulation layers are always going to perform more
> poorly than native implementations. But this is only a generality learned
> from years of experience with them
>
> Tom.

Hold on here.

Who said anything about Ethernet emulation? Hal said he is running
straight Netperf over IB, not Ethernet emulation. I don't think that any IB
HCAs today support offloaded checksum and large send. You are comparing
apples and oranges. The only appropriate comparison is to use the IBM HCA
compared to the mthca adapters. I think Hal's point is actually comparing
"any" IB adapter against GigE and Myrinet. Both the mthca and IBM HCAs
should get similar IPoIB performance using identical OpenIB stacks.


Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
[EMAIL PROTECTED](845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world
we leave when we die.
So we have to accept what has gone before us and work to change the only
thing we can,
-- The Future." William Shatner


   

RE: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Felix Marti

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of hbchen
Sent: Monday, June 05, 2006 9:12 AM
To: Talpey, Thomas
Cc: openib-general@openib.org
Subject: Re: [openib-general] Question about the IPoIB bandwidth performance ?

Talpey, Thomas wrote:
> At 11:38 AM 6/5/2006, hbchen wrote:
>> Even with this IB-4X = 8Gb/sec = 1024 MB/sec the IPoIB bandwidth
>> utilization is still very low.
>>
>>     IPoIB=420MB/sec
>>     bandwidth utilization= 420/1024 = 41.01%
>
> Helen, have you measured the CPU utilizations during these runs?
> Perhaps you are out of CPU.

Tom,
I am HB Chen from LANL not the Helen Chen from SNL.
I didn't run out of CPU.  It is about 70-80 % of CPU utilization.

> Outrageous opinion follows.
>
> Frankly, an IB HCA running Ethernet emulation is approximately the
> world's worst 10GbE adapter (not to put too fine of a point on it :-) )

The IP over Myrinet ( Ethernet emulation) can reach
upto 96%-98%  bandwidth utilization why not the IPoIB ?

[Felix:] As pointed out earlier: it is the message rate. If
you change the mtu to 1500B (instead of the non-standard 9000B Jumbo frames)
performance will drop into the same range as what you see with IPoIB (limited
by the receiver).

HB Chen
[EMAIL PROTECTED]

> There is no hardware checksumming, nor large-send offloading, both
> of which force overhead onto software. And, as you just discovered
> it isn't even 10Gb!
>
> In general, network emulation layers are always going to perform more
> poorly than native implementations. But this is only a generality learned
> from years of experience with them.
>
> Tom.


Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Talpey, Thomas
At 12:11 PM 6/5/2006, hbchen wrote:
>>Perhaps you are out of CPU.
>>
>>  
>Tom,
>I am HB Chen from LANL not the Helen Chen from SNL.

Oops, sorry! I have too many email messages going by. :-)
HB, then.


>I didn't run out of CPU.  It is about 70-80 % of CPU utilization.

But, is one CPU at 100%? Interrupt processing, for example.

>  
>>
>>Outrageous opinion follows.
>>
>>Frankly, an IB HCA running Ethernet emulation is approximately the
>>world's worst 10GbE adapter (not to put too fine of a point on it :-) )
>>  
>The IP over Myrinet ( Ethernet emulation) can reach upto 96%-98%  bandwidth 
>utilization why not the IPoIB ?

I am not familiar with the implementation Myrinet uses. In any
case, I am not saying that an emulation can't reach certain goals,
just that they will pretty much always be inferior to native approaches.
Sometimes far inferior.

Tom. 




Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread hbchen

Talpey, Thomas wrote:
> At 11:38 AM 6/5/2006, hbchen wrote:
>> Even with this IB-4X = 8Gb/sec = 1024 MB/sec the IPoIB bandwidth
>> utilization is still very low.
>>
>>     IPoIB=420MB/sec
>>     bandwidth utilization= 420/1024 = 41.01%
>
> Helen, have you measured the CPU utilizations during these runs?
> Perhaps you are out of CPU.

Tom,
I am HB Chen from LANL, not Helen Chen from SNL.
I didn't run out of CPU.  CPU utilization is about 70-80%.

> Outrageous opinion follows.
>
> Frankly, an IB HCA running Ethernet emulation is approximately the
> world's worst 10GbE adapter (not to put too fine of a point on it :-) )

IP over Myrinet (Ethernet emulation) can reach up to 96%-98%
bandwidth utilization, so why not IPoIB?

HB Chen
[EMAIL PROTECTED]

> There is no hardware checksumming, nor large-send offloading, both
> of which force overhead onto software. And, as you just discovered
> it isn't even 10Gb!
>
> In general, network emulation layers are always going to perform more
> poorly than native implementations. But this is only a generality learned
> from years of experience with them.
>
> Tom.


Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Talpey, Thomas
At 11:38 AM 6/5/2006, hbchen wrote:
>Even with this IB-4X = 8Gb/sec = 1024 MB/sec the IPoIB bandwidth utilization 
>is still very low.
>>> IPoIB=420MB/sec  
>>> bandwidth utilization= 420/1024 = 41.01%


Helen, have you measured the CPU utilizations during these runs?
Perhaps you are out of CPU.

Outrageous opinion follows.

Frankly, an IB HCA running Ethernet emulation is approximately the
world's worst 10GbE adapter (not to put too fine of a point on it :-) )
There is no hardware checksumming, nor large-send offloading, both
of which force overhead onto software. And, as you just discovered
it isn't even 10Gb!

In general, network emulation layers are always going to perform more
poorly than native implementations. But this is only a generality learned
from years of experience with them.

Tom.  




Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Bernard King-Smith
Hal Rosenstock wrote:

> On Mon, 2006-06-05 at 11:12, hbchen wrote:
> > Hi,
> > I have a question about the IPoIB bandwidth performance.
> > I did netperf testing using Single GiGE, Myrinet D card, Myrinet 10G
> > ethernet card,
> > and Voltaire Infiniband 4X HCA400Ex (PCI-Express interface).
> >
> >
> > NIC (jumbo enabled)       Line bandwidth (LB)    IP over NIC     Utilization (IPoNIC/LB)
> > ------------------------  ---------------------  --------------  -----------------------
> > Single Gigabit NIC        1Gb/sec = 125MB/sec    120MB/sec       96% (PCI-X interface)
> > Myrinet D card            250MB/sec              240-245MB/sec   96%-98% (PCI-X interface)
> > Myrinet 10G Ethernet      10Gb/sec = 1280MB/sec  980MB/sec       76.6% (my testing, Linux 2.6.14.6)
> >   (PCI-Express)                                  1225MB/sec      95.7% (data from Myrinet website)
> > IB HCA 4X (PCI-Express)   10Gb/sec = 1280MB/sec  420MB/sec       32.8% (my testing, Linux 2.6.14.6)
> >                                                  474MB/sec       37% (best from OpenIB mailing list, 2.6.12-rc5 patch 1)
> >
> > Why the bandwidth utilization of IPoIB is so low compared to the others
> > NICs?
>
> One thing to note is that the max utilization of 10G IB (4x) is 8G due
> to the signalling being included in this rate (unlike ethernet whose
> rate represents the data rate and does not include the signalling
> overhead).
>
> -- Hal
>

You also have larger IP packets when you use GigE (especially with large
send/offload) and Myrinet. I think Myrinet uses a 60K MTU, and for GigE
without large send you get a 9000 MTU. With large send you get a 64K buffer
to the adapter, so fragmentation to 1500/9000 IP packets is offloaded in the
adapter.

Currently with IPoIB using UD mode, you have to generate lots of 2K
packets. With serialized IPoIB drivers you end up bottlenecking on a single
CPU. There is an IPoIB-CM IETF spec out which should significantly improve
IPoIB performance if implemented.
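
A back-of-the-envelope sketch of why packet size matters here. Assuming, purely for illustration, that per-packet CPU cost dominates, the packet rate needed for a given throughput scales inversely with payload size. `packets_per_sec()` is a made-up helper, headers are ignored, and 1 MB is taken as 10^6 bytes.

```c
#include <stdint.h>

/*
 * Packets per second needed to carry bytes_per_sec of payload when each
 * packet holds payload_bytes (headers ignored; integer division rounds
 * down).  Returns 0 for a zero payload size.
 */
static uint64_t packets_per_sec(uint64_t bytes_per_sec, uint64_t payload_bytes)
{
    if (payload_bytes == 0)
        return 0;
    return bytes_per_sec / payload_bytes;
}
```

At the 420 MB/sec measured for IPoIB, 2048-byte packets mean roughly 205,000 packets per second, while the same throughput with 9000-byte frames would need about 47,000.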

> > There must be a lot of room to improve the IPoIB software to reach 75%+
> > bandwidth utilization.
> >
> >
> > HB Chen
> > Los Alamos National Lab
> > [EMAIL PROTECTED]
> >



Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
[EMAIL PROTECTED](845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world
we leave when we die.
So we have to accept what has gone before us and work to change the only
thing we can,
-- The Future." William Shatner





Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread hbchen

Hal Rosenstock wrote:
> On Mon, 2006-06-05 at 11:12, hbchen wrote:
>> Hi,
>> I have a question about the IPoIB bandwidth performance.
>> I did netperf testing using Single GiGE, Myrinet D card, Myrinet 10G
>> ethernet card,
>> and Voltaire Infiniband 4X HCA400Ex (PCI-Express interface).
>>
>> NIC (jumbo enabled)       Line bandwidth (LB)    IP over NIC     Utilization (IPoNIC/LB)
>> ------------------------  ---------------------  --------------  -----------------------
>> Single Gigabit NIC        1Gb/sec = 125MB/sec    120MB/sec       96% (PCI-X interface)
>> Myrinet D card            250MB/sec              240-245MB/sec   96%-98% (PCI-X interface)
>> Myrinet 10G Ethernet      10Gb/sec = 1280MB/sec  980MB/sec       76.6% (my testing, Linux 2.6.14.6)
>>   (PCI-Express)                                  1225MB/sec      95.7% (data from Myrinet website)
>> IB HCA 4X (PCI-Express)   10Gb/sec = 1280MB/sec  420MB/sec       32.8% (my testing, Linux 2.6.14.6)
>>                                                  474MB/sec       37% (best from OpenIB mailing list, 2.6.12-rc5 patch 1)
>>
>> Why is the bandwidth utilization of IPoIB so low compared to the other
>> NICs?
>
> One thing to note is that the max utilization of 10G IB (4x) is 8G due
> to the signalling being included in this rate (unlike ethernet whose
> rate represents the data rate and does not include the signalling
> overhead).

Hal,
Even with this IB-4X = 8Gb/sec = 1024 MB/sec the IPoIB bandwidth
utilization is still very low.

    IPoIB = 420MB/sec
    bandwidth utilization = 420/1024 = 41.01%

HB

> -- Hal
>
>> There must be a lot of room to improve the IPoIB software to reach 75%+
>> bandwidth utilization.
>>
>> HB Chen
>> Los Alamos National Lab
>> [EMAIL PROTECTED]

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] Question about the IPoIB bandwidth performance ?

2006-06-05 Thread Hal Rosenstock
On Mon, 2006-06-05 at 11:12, hbchen wrote:
> Hi,
> I have a question about the IPoIB bandwidth performance.
> I did netperf testing using Single GigE, Myrinet D card, Myrinet 10G
> ethernet card,
> and Voltaire Infiniband 4X HCA400Ex (PCI-Express interface).
> 
> NIC (Jumbo enabled)         Line bandwidth (LB)  IPoverNIC bandwidth  Utilization (IPoNIC/LB)
> --------------------------  -------------------  -------------------  -----------------------
> Single Gigabit NIC          1Gb/sec=125MB/sec    120MB/sec            96% (PCI-X interface)
> Myrinet D card              250MB/sec            240~245MB/sec        96% ~ 98% (PCI-X interface)
> Myrinet 10G Ethernet        10Gb/sec=1280MB/sec  980MB/sec            76.6% (my testing using Linux 2.6.14.6)
> (PCI-Express)                                    1225MB/sec           95.7% (data from Myrinet website)
> IB HCA4X (PCI-Express)      10Gb/sec=1280MB/sec  420MB/sec            32.8% (my testing using Linux 2.6.14.6)
>                                                  474MB/sec            37% (the best from OpenIB mailing list, 2.6.12-rc5 patch 1)
> 
> Why is the bandwidth utilization of IPoIB so low compared to the other
> NICs?

One thing to note is that the max utilization of 10G IB (4x) is 8G due
to the signalling being included in this rate (unlike ethernet whose
rate represents the data rate and does not include the signalling
overhead).
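
As a rough worked example of the point above (a sketch, not a
measurement): 4x IB signals at 10 Gb/s, and assuming 8b/10b line
encoding only 8 of every 10 signalled bits carry data. The function
names are made up for illustration, and the 128 MB/s per Gb/s
conversion simply follows the quoted table.

```python
# Sketch: effective data rate of a 4x IB link and IPoIB utilization.
# Assumes 8b/10b encoding (8 data bits per 10 signalled bits).

SIGNAL_RATE_GBPS = 10.0        # 4x IB signalling rate
ENCODING_EFFICIENCY = 8 / 10   # assumed 8b/10b line encoding
MB_PER_GBIT = 128              # conversion used in the quoted table


def data_rate_gbps(signal_rate_gbps: float) -> float:
    """Data rate that remains after the encoding overhead."""
    return signal_rate_gbps * ENCODING_EFFICIENCY


def utilization_percent(measured_mb_s: float, line_rate_gbps: float) -> float:
    """Measured throughput as a percentage of the given line rate."""
    return 100.0 * measured_mb_s / (line_rate_gbps * MB_PER_GBIT)


if __name__ == "__main__":
    rate = data_rate_gbps(SIGNAL_RATE_GBPS)
    print(f"data rate: {rate:.0f} Gb/s")
    print(f"420 MB/s vs signalling rate: {utilization_percent(420, SIGNAL_RATE_GBPS):.1f}%")
    print(f"420 MB/s vs data rate:       {utilization_percent(420, rate):.1f}%")
```

Measured against the data rate rather than the signalling rate, the
quoted 420 MB/s figure moves from roughly 33% to roughly 41%.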

-- Hal

> There must be a lot of room to improve the IPoIB software to reach 75%+
> bandwidth utilization.
> 
> 
> HB Chen
> Los Alamos National Lab
> [EMAIL PROTECTED]
> 




Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-14 Thread Hal Rosenstock
On Sun, 2006-05-14 at 15:30, Jason Gunthorpe wrote:
> On Sun, May 14, 2006 at 07:40:25AM -0400, Hal Rosenstock wrote:
> > > > Not always true in terms of local subnet (multicast and management MAD
> > > > response exceptions).
> > > 
> > > Yes, but these are well specified. Multicast must always have a GRH.
> > > MAD requests are covered under my scenario above and MAD responses
> > > to MAD requests with GRH's are specified to use the GRH and set the
> > > HopLimit = 0xFF.
> > 
> > Where does the spec say HopLmt needs to be 0xFF for multicast ?
> 
> I meant that the spec says a MAD response with a GRH should have 0xFF
> for HopLmt. (13.5.4.4)

Right; from the MAD response rules.

> I'd expect the Multicast HopLmt to come from the SA, just like in the
> unicast case.

OK; that's what I thought.

> > Off subnet is either determined by the prefix comparison or HopLimit >=2
> > in the response from the SA. The latter is implied by C8-16 on p. 229.
> 
> The only possible downside of using HopLimit, that I can see, is
> compatibility with existing SA's. Do all existing SA's set HopLmt to 0
> or 1 in path record responses? (Since no SA's support routers,
> that would be correct..)

I would argue that the implementations would not be conformant if that
were not the case currently.

> Scope should not be a problem because the SA can follow whatever
> scope based rules might exist and then set HopLimit properly.

Sure, the SA would certainly use the scope to know whether it needs to
go beyond the local subnet for path resolution (both unicast and
multicast).

> FWIW, my vote would be to use HopLimit, since that lets the SA
> tell the client if it should use a GRH. With prefix comparison GRH
> usage is not under the control of the SA - so it is less flexible.

Makes sense to me (now)...

-- Hal

> Jason



Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-14 Thread Jason Gunthorpe
On Sun, May 14, 2006 at 07:40:25AM -0400, Hal Rosenstock wrote:
> > > Not always true in terms of local subnet (multicast and management MAD
> > > response exceptions).
> > 
> > Yes, but these are well specified. Multicast must always have a GRH.
> > MAD requests are covered under my scenario above and MAD responses
> > to MAD requests with GRH's are specified to use the GRH and set the
> > HopLimit = 0xFF.
> 
> Where does the spec say HopLmt needs to be 0xFF for multicast ?

I meant that the spec says a MAD response with a GRH should have 0xFF
for HopLmt. (13.5.4.4)

I'd expect the Multicast HopLmt to come from the SA, just like in the
unicast case.

> Off subnet is either determined by the prefix comparison or HopLimit >=2
> in the response from the SA. The latter is implied by C8-16 on p. 229.

The only possible downside of using HopLimit, that I can see, is
compatibility with existing SA's. Do all existing SA's set HopLmt to 0
or 1 in path record responses? (Since no SA's support routers,
that would be correct..)

Scope should not be a problem because the SA can follow whatever
scope based rules might exist and then set HopLimit properly.

FWIW, my vote would be to use HopLimit, since that lets the SA
tell the client if it should use a GRH. With prefix comparison GRH
usage is not under the control of the SA - so it is less flexible.

Jason


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-14 Thread Hal Rosenstock
On Fri, 2006-05-12 at 13:55, Sean Hefty wrote:
> Jason Gunthorpe wrote:
> > How about this, how do you see this scenario:
> > 
> > 1) Client gets a DGID from 'someplace'
> > 2) Client sends a SA query to resolve the DGID to a Path Record
> > 3) Client configures a QP based on the Path Record
> > 
> > Now, the question I'm interested in is this:
> >   During step #3 what test should the client apply to determine if a 
> >   GRH should be used with the QP.
> 
> This is the scenario that I need to resolve.
> 
> What would happen if the GRH flag were always set?

That would work but there would be additional overhead (especially for
small packets this would be more noticeable) in the local subnet case.
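
To put a rough number on that overhead (a sketch under stated
assumptions): the GRH is 40 bytes, and the base overhead below assumes
only LRH, BTH, ICRC and VCRC with no extended transport headers. The
function name is made up for illustration.

```python
# Sketch: extra wire bytes caused by always adding a GRH.
GRH_BYTES = 40                    # Global Route Header size
BASE_OVERHEAD = 8 + 12 + 4 + 2    # LRH + BTH + ICRC + VCRC (assumed, no extended headers)


def grh_overhead_percent(payload_bytes: int) -> float:
    """Bytes added by a GRH, relative to the same packet without one."""
    without_grh = payload_bytes + BASE_OVERHEAD
    return 100.0 * GRH_BYTES / without_grh


if __name__ == "__main__":
    for payload in (16, 64, 256, 2048):
        print(f"{payload:5d} byte payload: +{grh_overhead_percent(payload):.1f}%")
```

The relative cost shrinks quickly as the payload grows, which is why
the effect is mostly visible for small packets.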

> Set only if the GID prefixes of the SGID/DGID were different?

That's one way, although it is more complex than what Jason has been
proposing for this (SA response with HopLimit>=2). I'm not yet sure that
the latter is sufficient, as I think there may be other factors in
whether a packet is forwarded off subnet. One is the prefix scope. But I
would think link-local scopes should be limited in HopLimit, except for
multicasts (Jason cited that multicasts were required to have HopLimit
0xFF), and those require GRHs anyhow. So maybe I'm wrong about this and
HopLimit>=2 is sufficient.

-- Hal

> - Sean



Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-14 Thread Hal Rosenstock
On Fri, 2006-05-12 at 13:10, Jason Gunthorpe wrote:
> On Fri, May 12, 2006 at 08:11:17AM -0400, Hal Rosenstock wrote:
> 
> > > To allow what Roland is talking about you need an unambiguous
> > > mechanism where the SA can signal to the client that the path
> > > needs a GRH.
> > 
> > Ah, you are referring to the SA path record response not the request.
> 
> Yes.. Though I think we are still talking about different things in a
> few places ;>
> 
> How about this, how do you see this scenario:
> 
> 1) Client gets a DGID from 'someplace'
> 2) Client sends a SA query to resolve the DGID to a Path Record
> 3) Client configures a QP based on the Path Record
> 
> Now, the question I'm interested in is this:
>   During step #3 what test should the client apply to determine if a 
>   GRH should be used with the QP.
> 
> Other issues around the GRH like management MAD responses use and
> multicast I feel are well specified and don't need more consideration.

Thanks for clarifying.
 
> > > Think of it the other way, HopLimit < 2 means it _can't_ be forwarded
> > > off subnet, so that result from the SA should _always_ cause the
> > > requesting client to not use a GRH for that path.
> > 
> > Not always true in terms of local subnet (multicast and management MAD
> > response exceptions).
> 
> Yes, but these are well specified. Multicast must always have a GRH.
> MAD requests are covered under my scenario above and MAD responses
> to MAD requests with GRH's are specified to use the GRH and set the
> HopLimit = 0xFF.

Where does the spec say HopLmt needs to be 0xFF for multicast ?

> Also, I would assume when building a router that multicast packets
> with a hop limit of 0 are non-forwardable based on the rules in IBA.

0 or 1 hop limit for both unicast and multicast

> > Are you saying HopLimit is supplied to the SA in the request ? It could
> > be but it's optional in general. In the router case, an off subnet DGID
> > should be sufficient. I would think the HopLimit (as well as the other
> > GRH fields) would need to be returned by the SA to the client.
> 
> Talking about a request for a Path to the SA from a client now:
> I would suggest that if the client wishes to restrict itself to paths
> that are only on-link then it could send a SA request with the
> path record HopLimit=0.

Yes (or HopLimit=1).

>  A SA request with HopLimit=* (masked out
> of component mask) should let the SA return routed paths.

Yes.

> I also think that the SA response should have a HopLimit of 0 for
> local paths

1 would also be valid here too.

>  and a HopLimit >= 2 for routed paths.

Yes.

> However, I can't find any wording in IBA that would require this
> behavior.

In terms of the SA responses to Path/MultiPathRecord requests, the
HopLimit is required to be filled in in the response. Is that what you
are asking ? It's up to the SA to determine this and for the client to
use the values returned subsequently just as it does for DLIDs, SLs,
etc.

> > Not sure exactly what you mean by full control over the routing header
> > (GRH). The SA supplies the info for the headers to the client and the
> > client is responsible for putting the correct info in the headers. Do
> > you mean supplies sufficient info for the client to do this correctly ?
> > If so, I agree.
> 
> As far as I can see IBA includes all header information for the GRH
> and LRH in the PathRecord response. It does not define how to
> determine if the path described by a PathRecord response requires
> a GRH or not.

I think the rules are there:
Multicasts always have GRH.
Unicasts off subnet have GRH and on subnet they are optional.

Off subnet is either determined by the prefix comparison or HopLimit >=2
in the response from the SA. The latter is implied by C8-16 on p. 229.
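
Written out as code, those rules might look like the sketch below. The
function names are made up for illustration, the prefixes are the upper
64 bits of the GIDs, and the two off-subnet tests are the alternatives
being discussed here, not something the spec mandates.

```python
# Sketch of the GRH rules listed above; all names are illustrative.

def off_subnet_by_prefix(sgid_prefix: int, dgid_prefix: int) -> bool:
    """Alternative 1: compare the 64-bit subnet prefixes of SGID and DGID."""
    return sgid_prefix != dgid_prefix


def off_subnet_by_hop_limit(hop_limit: int) -> bool:
    """Alternative 2: trust the HopLimit in the SA's PathRecord response."""
    return hop_limit >= 2


def needs_grh(is_multicast: bool, off_subnet: bool) -> bool:
    """Multicast always carries a GRH; unicast needs one when off subnet."""
    return is_multicast or off_subnet


if __name__ == "__main__":
    local = 0xFE80000000000000
    remote = 0xFEC0000000000001
    print(needs_grh(False, off_subnet_by_prefix(local, remote)))   # routed unicast
    print(needs_grh(False, off_subnet_by_hop_limit(1)))            # local unicast
    print(needs_grh(True, False))                                  # multicast
```

On-subnet unicast is the optional case: the sketch returns False there,
which is only the minimum the rules require.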

-- Hal

> Thanks,
> Jason



Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-12 Thread Sean Hefty

Jason Gunthorpe wrote:

> How about this, how do you see this scenario:
> 
> 1) Client gets a DGID from 'someplace'
> 2) Client sends a SA query to resolve the DGID to a Path Record
> 3) Client configures a QP based on the Path Record
> 
> Now, the question I'm interested in is this:
>   During step #3 what test should the client apply to determine if a 
>   GRH should be used with the QP.


This is the scenario that I need to resolve.

What would happen if the GRH flag were always set?  Set only if the GID prefixes 
of the SGID/DGID were different?


- Sean


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-12 Thread Jason Gunthorpe
On Fri, May 12, 2006 at 08:11:17AM -0400, Hal Rosenstock wrote:

> > To allow what Roland is talking about you need an unambiguous
> > mechanism where the SA can signal to the client that the path
> > needs a GRH.
> 
> Ah, you are referring to the SA path record response not the request.

Yes.. Though I think we are still talking about different things in a
few places ;>

How about this, how do you see this scenario:

1) Client gets a DGID from 'someplace'
2) Client sends a SA query to resolve the DGID to a Path Record
3) Client configures a QP based on the Path Record

Now, the question I'm interested in is this:
  During step #3 what test should the client apply to determine if a 
  GRH should be used with the QP.

Other issues around the GRH like management MAD responses use and
multicast I feel are well specified and don't need more consideration.
 
> > Think of it the other way, HopLimit < 2 means it _can't_ be forwarded
> > off subnet, so that result from the SA should _always_ cause the
> > requesting client to not use a GRH for that path.
> 
> Not always true in terms of local subnet (multicast and management MAD
> response exceptions).

Yes, but these are well specified. Multicast must always have a GRH.
MAD requests are covered under my scenario above and MAD responses
to MAD requests with GRH's are specified to use the GRH and set the
HopLimit = 0xFF.

Also, I would assume when building a router that multicast packets
with a hop limit of 0 are non-forwardable based on the rules in IBA.

> Are you saying HopLimit is supplied to the SA in the request ? It could
> be but it's optional in general. In the router case, an off subnet DGID
> should be sufficient. I would think the HopLimit (as well as the other
> GRH fields) would need to be returned by the SA to the client.

Talking about a request for a Path to the SA from a client now:
I would suggest that if the client wishes to restrict itself to paths
that are only on-link then it could send a SA request with the
path record HopLimit=0. A SA request with HopLimit=* (masked out
of component mask) should let the SA return routed paths.
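
A toy model of that request behaviour, to make the masking concrete.
Everything here is hypothetical: the bit positions are NOT the
IBA-defined component mask bits, and the matching function is only a
stand-in for what an SA would do.

```python
# Sketch: HopLimit either pinned to 0 or masked out of the component mask.
from enum import IntFlag


class PathRecField(IntFlag):
    # Illustrative bit positions only, not the real component mask layout.
    DGID = 1 << 0
    SGID = 1 << 1
    HOP_LIMIT = 1 << 2


FIELD_NAMES = {
    PathRecField.DGID: "dgid",
    PathRecField.SGID: "sgid",
    PathRecField.HOP_LIMIT: "hop_limit",
}


def build_query(dgid: str, sgid: str, on_link_only: bool):
    """Return (component_mask, fields) for a hypothetical PathRecord query."""
    mask = PathRecField.DGID | PathRecField.SGID
    fields = {"dgid": dgid, "sgid": sgid}
    if on_link_only:
        mask |= PathRecField.HOP_LIMIT   # HopLimit becomes part of the match
        fields["hop_limit"] = 0
    return mask, fields


def sa_matches(path: dict, mask: PathRecField, fields: dict) -> bool:
    """Toy SA: compare only the fields selected by the component mask."""
    return all(path[name] == fields[name]
               for flag, name in FIELD_NAMES.items() if mask & flag)
```

With the HopLimit bit clear, both local and routed records match; with
it set and HopLimit=0, only the on-link record does. An exact match on
0 would also exclude local paths reported with HopLimit=1, so a real SA
would presumably need something less literal.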

I also think that the SA response should have a HopLimit of 0 for
local paths and a HopLimit >= 2 for routed paths.

However, I can't find any wording in IBA that would require this
behavior.

> Not sure exactly what you mean by full control over the routing header
> (GRH). The SA supplies the info for the headers to the client and the
> client is responsible for putting the correct info in the headers. Do
> you mean supplies sufficient info for the client to do this correctly ?
> If so, I agree.

As far as I can see IBA includes all header information for the GRH
and LRH in the PathRecord response. It does not define how to
determine if the path described by a PathRecord response requires
a GRH or not.

Thanks,
Jason


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-12 Thread Hal Rosenstock
On Thu, 2006-05-11 at 13:12, Jason Gunthorpe wrote:
> On Thu, May 11, 2006 at 07:20:19AM -0400, Hal Rosenstock wrote:
> 
> > That would be a simpler check but HopLimit is not a required component
> > of PathRecord but I think this may not be sufficient as just because a
> > HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet.
> 
> I was thinking of the other direction: How does the requestor/client
> know if a Path requires a GRH?

The requester/client needs to request a path for a DGID which is off
(the local) subnet.

> To allow what Roland is talking about you need an unambiguous
> mechanism where the SA can signal to the client that the path
> needs a GRH.

Ah, you are referring to the SA path record response not the request.

> The only field I can see that could be used for that is HopLimit..

That's one. The ugly prefix comparison would be another.

> Think of it the other way, HopLimit < 2 means it _can't_ be forwarded
> off subnet, so that result from the SA should _always_ cause the
> requesting client to not use a GRH for that path.

Not always true in terms of local subnet (multicast and management MAD
response exceptions).

> Any test beyond HopLimit could be done in the SA prior to returning
> the path records to the client.

Are you saying HopLimit is supplied to the SA in the request ? It could
be but it's optional in general. In the router case, an off subnet DGID
should be sufficient. I would think the HopLimit (as well as the other
GRH fields) would need to be returned by the SA to the client.

> If further tests are put in the client
> they only limit the routing configurations that are possible.

Not sure what further tests you are referring to here. I agree with the
goal not to add any unnecessary constraints on routing configurations.

> Note:
> Although 8.3.6 specifies that 0 and 1 don't let the packet off
> the subnet table 60 says that CA's should set the HopLimit
> to 0 and the 'first' router should fill it in. Hmm..

Interesting. The description in table 60 also says "Alternately set
according to application."

> > Why is a request with just a non link local prefix (with HopLimit
> > wildcarded) not sufficient ?
> 
> I think it would be best if the SA had full control over what headers
> the CA's put on their packets on a path by path basis. That allows for
> the most flexibility down the road.

Not sure exactly what you mean by full control over the routing header
(GRH). The SA supplies the info for the headers to the client and the
client is responsible for putting the correct info in the headers. Do
you mean supplies sufficient info for the client to do this correctly ?
If so, I agree.

-- Hal

> Jason



Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Jason Gunthorpe
On Thu, May 11, 2006 at 10:21:08AM -0700, Sean Hefty wrote:
> Hal Rosenstock wrote:
> >Anytime the send is off the local subnet (as well as multicast), a GRH
> >is required. Also, there is a management response rule for responding
> >when the request contained a GRH that requires a GRH (13.5.4.4 p. 769).
 
> Reading through the responses, I think my problems are worse.  Now I'm not 
> even sure how I determine which remote node I'm trying to talk to short of 
> hard-coding the DGID...

> We currently use ARP to resolve an IP address to a DGID, which I don't 
> believe will work across a router.  Does an app even know enough to be able 
> to get a path record?

The only wrinkle I could see you having is how to choose between
multiple DGID's when generating the ARP response. I don't think that
is a serious issue though since any GID to any GID should be routable
on the subnet.

I haven't looked at the ARP code, but based on the RFCs the IPv4 ARP
process would be more or less:

1) Send ARP datagram to the broadcast multicast group LID w/ GRH. The
   ARP packet includes the IPv4 address of the sender and the
   GID/QPN (hardware address) of the sender, asking for the hardware
   address of the target IPv4.

   A router must support multicast routing so that the ARP request is
   forwarded to the remote subnet. It has a GRH of course so this is
   OK. The SM and router work together to make this happen.
2) ARP responder matches the target IP address, gets the IP of the
   requestor, and the GID/QPN from the ARP packet's sender fields

   We are still OK since the GID in the ARP packet's sender fields is
   global.
3) ARP responder produces a unicast packet to the IPv4 requestor
   address:
   - The sender's GID/QPN is converted into a path either from a local cache
     or via a SA query. The sender's GID combined with any of the target's
     GID's should be sufficient to ask the SA for a path.
     [Note that you must use the _hardware_ address here and you
      cannot just lookup the IPv4 sender address in the neighbor
      cache. This is needed to support ARP tricks like zeroconf that
      use null source IPs]
   - This query results in a path record for communication with the
     sender. [Some implementations will learn based on ARP requests
     and will update the neighbor cache here]
   - The path record is used to generate the unicast headers, GRH and
     all - if necessary.
   - The same SGID that was used in the path record query above is returned
     in the ARP response as the target's address.
  
   Since the SA specifies the path to get back to the requestor based
   only on the GID in the ARP request it can produce a path that
   crosses the router.

4) The ARP requestor now gets the responder's GID/QPN from the unicast
   ARP response and does the same path lookup that the ARP responder
   did to get the 'reverse' path.

   Again, since the SA is now involved the resulting path can cross
   the router.
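
The responder side of steps 2 and 3 could be sketched roughly as below.
This is a toy model, not the IPoIB code: the SA is a plain dict keyed
by (SGID, DGID), the QPN is left out, all names and addresses are made
up, and the HopLimit >= 2 test for the GRH is just the rule proposed
elsewhere in this thread.

```python
# Sketch: ARP responder resolving the return path from the hardware address.
from typing import NamedTuple, Optional


class ArpRequest(NamedTuple):
    sender_ip: str    # may be a null address (zeroconf-style probes)
    sender_gid: str   # hardware address of the requestor (QPN omitted)
    target_ip: str


def answer_arp(req: ArpRequest, my_ip: str, my_gid: str,
               sa_paths: dict) -> Optional[dict]:
    """Match the target IP, then resolve a path back to the requestor."""
    if req.target_ip != my_ip:
        return None
    # Look up by the requestor's hardware address (GID), never by sender_ip.
    path = sa_paths[(my_gid, req.sender_gid)]
    return {
        "target_gid": my_gid,                 # same SGID used in the path query
        "dlid": path["dlid"],
        "use_grh": path["hop_limit"] >= 2,    # assumed test, per this thread
    }


# Toy stand-in for the SA: (SGID, DGID) -> path record fields.
SA_PATHS = {
    ("fec0:0:0:2::b", "fec0:0:0:1::a"): {"dlid": 0x10, "hop_limit": 2},
}

reply = answer_arp(
    ArpRequest(sender_ip="0.0.0.0", sender_gid="fec0:0:0:1::a",
               target_ip="10.0.0.2"),
    my_ip="10.0.0.2", my_gid="fec0:0:0:2::b", sa_paths=SA_PATHS,
)
```

The request in the example has a null sender IP, so a lookup by IP
would have nothing to go on, while the GID still resolves to a path.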

IPv6 is similar, but the packet format is different and the 'ARP'
(NS packet) request is sent to a multicast address chosen by 'hashing'
the IPv6 address.
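
Assuming the 'hashing' refers to the IPv6 solicited-node multicast
address (the low 24 bits of the target address appended to
ff02::1:ff00:0/104), a minimal sketch looks like this. The function
name is made up, and the further mapping of that address onto an IB
multicast GID is not shown.

```python
# Sketch: solicited-node multicast address for a neighbor solicitation.
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_address(addr: str) -> ipaddress.IPv6Address:
    """Combine the fixed /104 prefix with the low 24 bits of addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


if __name__ == "__main__":
    print(solicited_node_address("fe80::202:c903:12:3456"))
```

Addresses that share their low 24 bits land in the same group, so the
receiver still has to check the target address in the packet itself.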

Hope this helps,
Jason


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Hal Rosenstock
On Thu, 2006-05-11 at 13:29, Roland Dreier wrote:
> Sean> We currently use ARP to resolve an IP address to a DGID,
> Sean> which I don't believe will work across a router.  Does an
> Sean> app even know enough to be able to get a path record?
> 
> I think you're fine.  The IB router just has to handle forwarding
> multicasts

Specifically IPoIB broadcast

>  between two IB subnets for ARP to work.

Yes, because an IPoIB subnet can span multiple IB subnets.

> If there's also an IP router in between the two hosts

when the hosts are on different IP(oIB) subnets.

> then there's a problem, but I don't think it's that reasonable 
> to expect to make a direct RDMA connection in that case.

That's a different case; you don't ARP off your IPoIB subnet; you get
the next hop router towards that IPoIB subnet.

-- Hal

>  - R.



Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Roland Dreier
Sean> We currently use ARP to resolve an IP address to a DGID,
Sean> which I don't believe will work across a router.  Does an
Sean> app even know enough to be able to get a path record?

I think you're fine.  The IB router just has to handle forwarding
multicasts between two IB subnets for ARP to work.

If there's also an IP router in between the two hosts then there's a
problem, but I don't think it's that reasonable to expect to make a
direct RDMA connection in that case.

 - R.


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Sean Hefty

Hal Rosenstock wrote:

> Anytime the send is off the local subnet (as well as multicast), a GRH
> is required. Also, there is a management response rule for responding
> when the request contained a GRH that requires a GRH (13.5.4.4 p. 769).


Reading through the responses, I think my problems are worse.  Now I'm not even 
sure how I determine which remote node I'm trying to talk to short of 
hard-coding the DGID...


We currently use ARP to resolve an IP address to a DGID, which I don't believe 
will work across a router.  Does an app even know enough to be able to get a 
path record?


- Sean


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Jason Gunthorpe
On Thu, May 11, 2006 at 07:20:19AM -0400, Hal Rosenstock wrote:

> That would be a simpler check but HopLimit is not a required component
> of PathRecord but I think this may not be sufficient as just because a
> HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet.

I was thinking of the other direction: How does the requestor/client
know if a Path requires a GRH?

To allow what Roland is talking about you need an unambiguous
mechanism where the SA can signal to the client that the path
needs a GRH.

The only field I can see that could be used for that is HopLimit..

Think of it the other way, HopLimit < 2 means it _can't_ be forwarded
off subnet, so that result from the SA should _always_ cause the
requesting client to not use a GRH for that path.

Any test beyond HopLimit could be done in the SA prior to returning
the path records to the client. If further tests are put in the client
they only limit the routing configurations that are possible.

Note:
Although 8.3.6 specifies that 0 and 1 don't let the packet off
the subnet table 60 says that CA's should set the HopLimit
to 0 and the 'first' router should fill it in. Hmm..
 
> Why is a request with just a non link local prefix (with HopLimit
> wildcarded) not sufficient ?

I think it would be best if the SA had full control over what headers
the CA's put on their packets on a path by path basis. That allows for
the most flexibility down the road.

Jason


RE: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Eitan Zahavi
I agree with Hal. If you look for Path Record to ANOTHER subnet you
should provide the GRH in the sent packet address ...

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Hal Rosenstock
> Sent: Thursday, May 11, 2006 2:20 PM
> To: Jason Gunthorpe
> Cc: Roland Dreier; openib-general@openib.org
> Subject: Re: [openib-general] question regarding GRH flag in ib_ah_attr
> 
> On Thu, 2006-05-11 at 01:48, Jason Gunthorpe wrote:
> > On Wed, May 10, 2006 at 09:56:58PM -0700, Roland Dreier wrote:
> > > Hal> Huh ? In this case, aren't the subnet prefixes required
> > > Hal> to be different ?
> > >
> > > It's kind of a crazy thing to do but I don't see anything in the IB
> > > spec that forbids two subnets with the same subnet prefix, or any
> > > reason why a router couldn't route between them.  The SMs would just
> > > have to be smart enough to return the LID of the router for paths to
> > > ports on the other subnet, and the routers would have to have explicit
> > > routes rather than forwarding based on just GID prefix.
> >
> > Hmm, this is an interesting point, you can do this in IP land using
> > host routes.
> >
> > How about this - the Path record (and related) SA responses include
> > the Hop Limit fields and the spec says:
> >
> > 8.3.6 Hop Limit: [..] Setting this value to 0 or 1 will ensure that
> > the packet will not be forwarded beyond the local subnet.
> >
> > So, it is within the spec to use HopLmt >= 2 as the GRH required flag.
> 
> That would be a simpler check but HopLimit is not a required component
> of PathRecord but I think this may not be sufficient as just because a
> HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet.
> 
> > I'd propose that the combination of a non-link-local prefix and a >= 2
> > Hop Limit should force a GRH. SM's that do not support routers should
> > always fill in 0 for HopLmt.
> 
> Why is a request with just a non link local prefix (with HopLimit
> wildcarded) not sufficient ?
> 
> -- Hal
> 
> > Jason


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Hal Rosenstock
On Thu, 2006-05-11 at 01:48, Jason Gunthorpe wrote:
> On Wed, May 10, 2006 at 09:56:58PM -0700, Roland Dreier wrote:
> > Hal> Huh ? In this case, aren't the subnet prefixes required
> > Hal> to be different ?
> > 
> > It's kind of a crazy thing to do but I don't see anything in the IB
> > spec that forbids two subnets with the same subnet prefix, or any
> > reason why a router couldn't route between them.  The SMs would just
> > have to be smart enough to return the LID of the router for paths to
> > ports on the other subnet, and the routers would have to have explicit
> > routes rather than forwarding based on just GID prefix.
> 
> Hmm, this is an interesting point, you can do this in IP land using
> host routes.
> 
> How about this - the Path record (and related) SA responses include
> the Hop Limit fields and the spec says:
> 
> 8.3.6 Hop Limit: [..] Setting this value to 0 or 1 will ensure that
> the packet will not be forwarded beyond the local subnet.
> 
> So, it is within the spec to use HopLmt >= 2 as the GRH required flag.

That would be a simpler check but HopLimit is not a required component
of PathRecord but I think this may not be sufficient as just because a
HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet.

> I'd propose that the combination of a non-link-local prefix and a >= 2
> Hop Limit should force a GRH. SM's that do not support routers should
> always fill in 0 for HopLmt.

Why is a request with just a non link local prefix (with HopLimit
wildcarded) not sufficient ?

-- Hal

> Jason



Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-11 Thread Hal Rosenstock
On Thu, 2006-05-11 at 00:56, Roland Dreier wrote:
> Hal> Huh ? In this case, aren't the subnet prefixes required
> Hal> to be different ?
> 
> It's kind of a crazy thing to do but I don't see anything in the IB
> spec that forbids two subnets with the same subnet prefix,

There's errata against the current confusion in the IBA spec in terms of
GID v. subnet prefix.

The bottom line on this is:

Each subnet is uniquely identified with a subnet ID known as the Subnet
Prefix.

>  or any reason why a router couldn't route between them.  The SMs would just
> have to be smart enough to return the LID of the router for paths to
> ports on the other subnet, and the routers would have to have explicit
> routes rather than forwarding based on just GID prefix.

Assuming the above is ignored (and the subnet prefixes are not unique),
the routers along any particular path would just have explicit routes
for one of these duplicate subnets, right ?

-- Hal

>  - R.



Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Jason Gunthorpe
On Wed, May 10, 2006 at 09:56:58PM -0700, Roland Dreier wrote:
> Hal> Huh? In this case, aren't the subnet prefixes required
> Hal> to be different?
> 
> It's kind of a crazy thing to do but I don't see anything in the IB
> spec that forbids two subnets with the same subnet prefix, or any
> reason why a router couldn't route between them.  The SMs would just
> have to be smart enough to return the LID of the router for paths to
> ports on the other subnet, and the routers would have to have explicit
> routes rather than forwarding based on just GID prefix.

Hmm, this is an interesting point, you can do this in IP land using
host routes.

How about this - the Path record (and related) SA responses include
the Hop Limit fields and the spec says:

8.3.6 Hop Limit: [..] Setting this value to 0 or 1 will ensure that
the packet will not be forwarded beyond the local subnet.

So, it is within the spec to use HopLmt >= 2 as the GRH required flag.

I'd propose that the combination of a non-link-local prefix and a >= 2
Hop Limit should force a GRH. SM's that do not support routers should
always fill in 0 for HopLmt.
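
A rough sketch of the rule proposed above, for illustration only. It
assumes a GID is held as 16 raw bytes with the subnet prefix in the
upper 8 bytes; gid_prefix_is_link_local() and
grh_required_by_hop_limit() are hypothetical helper names, not part of
the verbs API, and multicast (which always needs a GRH) is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* FE80::/64, the IBA default (link-local) subnet prefix. */
static const uint8_t LINK_LOCAL_PREFIX[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

/* Hypothetical helper: true if the upper 64 bits of the GID are FE80::. */
static bool gid_prefix_is_link_local(const uint8_t gid[16])
{
    return memcmp(gid, LINK_LOCAL_PREFIX, sizeof(LINK_LOCAL_PREFIX)) == 0;
}

/*
 * Proposed rule: force a GRH only when the DGID prefix is not link-local
 * AND the HopLmt from the path record allows the packet to leave the
 * local subnet (0 or 1 keeps it on the subnet).
 */
static bool grh_required_by_hop_limit(const uint8_t dgid[16], uint8_t hop_limit)
{
    return !gid_prefix_is_link_local(dgid) && hop_limit >= 2;
}
```

An SM that does not support routers would report HopLmt 0, so this
check would never ask for a GRH on such a fabric.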

Jason


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Roland Dreier
Hal> What you are describing is similar to a NAT function for IB
Hal> which would need to be supported in the IB edge router to
Hal> that private network.

Why does there have to be any NAT?  The router would just have to
replace the DLID the same as it usually does.  I don't see why the GID
prefix makes any difference really.

 - R.


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Roland Dreier
Hal> Huh? In this case, aren't the subnet prefixes required
Hal> to be different?

It's kind of a crazy thing to do but I don't see anything in the IB
spec that forbids two subnets with the same subnet prefix, or any
reason why a router couldn't route between them.  The SMs would just
have to be smart enough to return the LID of the router for paths to
ports on the other subnet, and the routers would have to have explicit
routes rather than forwarding based on just GID prefix.

 - R.


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Hal Rosenstock
On Wed, 2006-05-10 at 21:26, Hal Rosenstock wrote:
> On Wed, 2006-05-10 at 19:44, Roland Dreier wrote:
> > Sean> Does anyone know how the user determines if the grh flag
> > Sean> should be set in the ib_ah_attr when allocating an ib_ah?
> > Sean> Do they do this by examining the GIDs in a path record?
> > 
> > Good question.  It's always needed for multicast, of course.  For
> > unicast, I guess one could look at whether the subnet prefixes of the
> > SGID and DGID are the same, but I'm not sure that's sufficient -- a
> > router could conceivably sit between two subnets with the same subnet
> > prefix.
> 
> Huh? In this case, aren't the subnet prefixes required to be
> different?

Not just different but globally unique, right ? 

What you are describing is similar to a NAT function for IB which would
need to be supported in the IB edge router to that private network.

-- Hal

> 
> -- Hal
> 
> > Perhaps some of the Obsidian guys could comment?
> > 
> >  - R.


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Hal Rosenstock
On Wed, 2006-05-10 at 19:35, Sean Hefty wrote:
> For context, I'm trying to work backwards from sending a message on a UD QP to
> determine what information is needed and how it is obtained.
> 
> Does anyone know how the user determines if the grh flag should be set in the
> ib_ah_attr when allocating an ib_ah?  Do they do this by examining the GIDs
> in a path record?

Any time the send goes off the local subnet (and for multicast), a GRH
is required. Also, there is a management response rule that requires a
GRH in the response when the request contained a GRH (13.5.4.4 p. 769).

-- Hal

> 
> - Sean


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Hal Rosenstock
On Wed, 2006-05-10 at 19:44, Roland Dreier wrote:
> Sean> Does anyone know how the user determines if the grh flag
> Sean> should be set in the ib_ah_attr when allocating an ib_ah?
> Sean> Do they do this by examining the GIDs in a path record?
> 
> Good question.  It's always needed for multicast, of course.  For
> unicast, I guess one could look at whether the subnet prefixes of the
> SGID and DGID are the same, but I'm not sure that's sufficient -- a
> router could conceivably sit between two subnets with the same subnet
> prefix.

Huh? In this case, aren't the subnet prefixes required to be
different?

-- Hal

> Perhaps some of the Obsidian guys could comment?
> 
>  - R.


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Jason Gunthorpe
On Wed, May 10, 2006 at 04:44:42PM -0700, Roland Dreier wrote:
> Sean> Does anyone know how the user determines if the grh flag
> Sean> should be set in the ib_ah_attr when allocating an ib_ah?
> Sean> Do they do this by examining the GIDs in a path record?
> 
> Good question.  It's always needed for multicast, of course.  For
> unicast, I guess one could look at whether the subnet prefixes of the
> SGID and DGID are the same, but I'm not sure that's sufficient -- a
> router could conceivably sit between two subnets with the same subnet
> prefix.
 
> Perhaps some of the Obsidian guys could comment?

Our intention in the absence of standardization is to leverage common
practice in IPv6 for numbering - which means that global prefixes need
to be globally unique (or at least site unique). A generic N port
router cannot connect subnets with the same prefix because it
is ambiguous where to send the packets.

Logically I think the GRH usage should be selected after the output
port is determined based on matching the port's PortInfo.GIDPrefix and
the IBA default prefix (the link local prefix FE80:: which is always
on-link) against the DGID. If there is a match it is on link,
otherwise it is off link, through a router, and a GRH is necessary.

Right now IBA only allows two prefixes, FE80:: and PortInfo.GIDPrefix
so the check described above can be reduced to comparing the SGID and
DGID prefixes, if they are different and the DGID prefix is not FE80::
then it is off link and needs a GRH.
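
The reduced check can be sketched roughly as follows. This is only an
illustration under stated assumptions: struct gid_model and
grh_required() are hypothetical names (not the kernel's union ib_gid or
any verbs call), a GID is 16 raw bytes with the subnet prefix in the
upper 8, and multicast DGIDs are recognized by a leading 0xFF byte.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GID_PREFIX_LEN 8

/* Hypothetical stand-in for a 128-bit GID: prefix in raw[0..7]. */
struct gid_model {
    uint8_t raw[16];
};

/* FE80::/64, the IBA default prefix, which is always on-link. */
static const uint8_t LINK_LOCAL_PREFIX[GID_PREFIX_LEN] = {
    0xfe, 0x80, 0, 0, 0, 0, 0, 0
};

static bool grh_required(const struct gid_model *sgid,
                         const struct gid_model *dgid)
{
    /* Multicast always carries a GRH. */
    if (dgid->raw[0] == 0xff)
        return true;

    /* A link-local DGID is on-link by definition. */
    if (memcmp(dgid->raw, LINK_LOCAL_PREFIX, GID_PREFIX_LEN) == 0)
        return false;

    /* Otherwise: off-link if and only if the subnet prefixes differ. */
    return memcmp(sgid->raw, dgid->raw, GID_PREFIX_LEN) != 0;
}
```

Note that this relies on prefixes being unique per subnet; with two
subnets sharing a prefix, the comparison would wrongly report on-link.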

Regards,
Jason


Re: [openib-general] question regarding GRH flag in ib_ah_attr

2006-05-10 Thread Roland Dreier
Sean> Does anyone know how the user determines if the grh flag
Sean> should be set in the ib_ah_attr when allocating an ib_ah?
Sean> Do they do this by examining the GIDs in a path record?

Good question.  It's always needed for multicast, of course.  For
unicast, I guess one could look at whether the subnet prefixes of the
SGID and DGID are the same, but I'm not sure that's sufficient -- a
router could conceivably sit between two subnets with the same subnet
prefix.

Perhaps some of the Obsidian guys could comment?

 - R.


Re: [openib-general] Question on : ib_reg_phys_mr()

2006-04-10 Thread James Lentini


On Sat, 8 Apr 2006, Devesh Sharma wrote:

> In your nfs-rdma context what this function is supposed to do?

It should create a memory region for the specified address range. For 
the exact semantics, see the IBTA spec's description of the REGISTER 
PHYSICAL MEMORY REGION verb (section 11.2.8.3 of the 1.2 spec).

> I know that this function returns memory region, but what is the 
> difference from other mr returning functions? why get_dma_mr can't 
> be used?

get_dma_mr() will return a memory region which covers all of physical 
memory. For security reasons, it is not always desirable to expose all 
of physical memory. ib_reg_phys_mr() allows for more fine grained 
access control.


Re: [openib-general] Question on : ib_reg_phys_mr()

2006-04-07 Thread Devesh Sharma
Thanks James for the quick reply.

In your nfs-rdma context, what is this function supposed to do?
I know that this function returns a memory region, but how does it differ from the other functions that return an MR? Why can't get_dma_mr be used?

Devesh

On 4/7/06, James Lentini <[EMAIL PROTECTED]> wrote:
> On Fri, 7 Apr 2006, Devesh Sharma wrote:
> > Hello list,
> > In IB kernel verbs there is a function ib_reg_phys_mr().
> > I am not able to trace the call of this verb by any ULP or uverb.
> > Who calls this function?
>
> NFS-RDMA uses this function:
> http://sourceforge.net/projects/nfs-rdma
>
> > Is this function mandatory to be supported by the HCA driver provider?
>
> As a ULP implementer, I expect it to be supported. It is a standard
> IBTA verb.

Re: [openib-general] Question on : ib_reg_phys_mr()

2006-04-07 Thread James Lentini


On Fri, 7 Apr 2006, Devesh Sharma wrote:

> Hello list,
> In Ib kernel verbs there is a function ib_reg_phys_mr().
> I am not able to trace the call of this verb by any ulp or uverb.
> Who calls this function?

NFS-RDMA uses this function:

http://sourceforge.net/projects/nfs-rdma

> Is this function mandatory to be supported by the HCA driver provider?

As a ULP implementer, I expect it to be supported. It is a standard 
IBTA verb.


Re: [openib-general] Question on get_dma_mr()

2006-04-04 Thread Devesh Sharma
Hi list and Roland,
Is this verb (ib_get_dma_mr) equivalent to the verb explained in section 11.2.8.1, Allocate L_key?

On 3/30/06, Steve Wise <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-03-29 at 20:35 -0800, Roland Dreier wrote:
> > Devesh> Here I am saying that assigning Key is sufficient Or there
> > Devesh> are some other specific steps to be taken?
> >
> > It would depend on the device.  You can look at the mthca, ipath and ehca
> > drivers' implementation of get_dma_mr() for examples.
>
> As well as the iwarp devices in the iwarp branch.  amso1100 and cxgb3.

Re: [openib-general] Question on get_dma_mr()

2006-03-30 Thread Steve Wise
On Wed, 2006-03-29 at 20:35 -0800, Roland Dreier wrote:
> Devesh> Here I am saying that assigning Key is sufficient Or there
> Devesh> are some other specific steps to be taken?
> 
> It would depend on the device.  You can look at the mthca, ipath and ehca
> drivers' implementation of get_dma_mr() for examples.
> 

As well as the iwarp devices in the iwarp branch.  amso1100 and cxgb3.




Re: [openib-general] Question on get_dma_mr()

2006-03-29 Thread Devesh Sharma
Yes, OK. Thanks for replying once again.

Devesh

On 3/30/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
> Devesh> Here I am saying that assigning Key is sufficient Or there
> Devesh> are some other specific steps to be taken?
>
> It would depend on the device.  You can look at the mthca, ipath and ehca
> drivers' implementation of get_dma_mr() for examples.
>
>  - R.

Re: [openib-general] Question on get_dma_mr()

2006-03-29 Thread Roland Dreier
Devesh> Here I am saying that assigning Key is sufficient Or there
Devesh> are some other specific steps to be taken?

It would depend on the device.  You can look at the mthca, ipath and ehca
drivers' implementation of get_dma_mr() for examples.

 - R.


Re: [openib-general] Question on get_dma_mr()

2006-03-29 Thread Devesh Sharma
On 3/29/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
> Devesh> S/G entry ?
>
> scatter gather entry
>
> Devesh> What is the size of this region ? is there any limitation
> Devesh> in providing this size?
>
> It must be large enough to cover all DMA (bus) addresses for the device.
>
> Devesh> Finally you mean to say in the implementation of this
> Devesh> function providing a unique L_Key and R_Key is
> Devesh> sufficient. Is it?
>
> I can't really understand this question.  Of course keys must be
> unique -- if two regions had the same key, then there would be no way
> for the HCA to know which one to use.

Here I am saying that assigning Key is sufficient Or there are some other specific steps to be taken?

>  - R.

Re: [openib-general] Question on get_dma_mr()

2006-03-29 Thread Roland Dreier
Devesh> S/G entry ?

scatter gather entry

Devesh> What is the size of this region ? is there any limitation
Devesh> in providing this size?

It must be large enough to cover all DMA (bus) addresses for the device.

Devesh> Finally you mean to say in the implementation of this
Devesh> function providing a unique L_Key and R_Key is
Devesh> sufficient. Is it?

I can't really understand this question.  Of course keys must be
unique -- if two regions had the same key, then there would be no way
for the HCA to know which one to use.

 - R.


Re: [openib-general] Question on get_dma_mr()

2006-03-28 Thread Devesh Sharma
Thanks to all of you.

On 3/27/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
> Devesh> Hello all, Please any body explain me about the
> Devesh> functionality of verbs ib_get_dma_mr()?
>
> Actually, the responses you've gotten are not quite right.
> ib_get_dma_mr() returns a memory region that can be used for any _bus_
> addresses.  In other words, if an S/G entry is passed to the driver

S/G entry ?

> that uses the L_Key from ib_get_dma_mr() and an address of, say,
> 0xdeadbeef, then the RDMA device should use a bus address of
> 0xdeadbeef to access that memory.

What is the size of this region ? is there any limitation in providing this size?

> The difference between bus addresses and physical addresses is
> significant when IOMMUs are present.
>
> This is somewhat similar to the verbs extensions notion of "reserved
> L_Key," except that it also provides an R_Key and the ability to
> specify the access permissions of the region.

Finally you mean to say in the implementation of this
function providing a unique L_Key and R_Key is sufficient. Is it?

>  - R.

Re: [openib-general] question related to rdma_bind_addr

2006-03-27 Thread James Lentini


On Sun, 26 Mar 2006, Or Gerlitz wrote:

> > I would find calling it rdma_bind_device() confusing. 
> 
> why? I find it very much unconfusing

I associate the word bind with bind(2). For that reason, 
rdma_bind_addr() is a good name because it is the CMA's analog for 
bind(2). Since it isn't related to bind(2), I find the name 
rdma_bind_device(dst_addr) confusing.

> > In any event, I don't find the functionality very interesting.
> 
> Hey, as i mentioned earlier in this thread, the interest came from a 
> ***possible*** enhancement to the open iscsi initiator design, now 
> being discussed, with which a transport (TCP/iSER/iSCSI offload 
> HW/etc) is asked to create its connection resources synchronously, , 
> not sure what is your interest in that.

I was speaking from my experience with NFS/RDMA. If this functionality 
is necessary for implementing iSER, I would definitely support adding 
it.


Re: [openib-general] Question on get_dma_mr()

2006-03-26 Thread Roland Dreier
Devesh> Hello all, Please any body explain me about the
Devesh> functionality of verbs ib_get_dma_mr()?

Actually, the responses you've gotten are not quite right.
ib_get_dma_mr() returns a memory region that can be used for any _bus_
addresses.  In other words, if an S/G entry is passed to the driver
that uses the L_Key from ib_get_dma_mr() and an address of, say,
0xdeadbeef, then the RDMA device should use a bus address of
0xdeadbeef to access that memory.

The difference between bus addresses and physical addresses is
significant when IOMMUs are present.

This is somewhat similar to the verbs extensions notion of "reserved
L_Key," except that it also provides an R_Key and the ability to
specify the access permissions of the region.

 - R.


Re: [openib-general] question related to rdma_bind_addr

2006-03-26 Thread Or Gerlitz

James Lentini wrote:

> On Thu, 23 Mar 2006, Sean Hefty wrote:
> > I think that Or is just exploring the idea of synchronously binding
> > to a local *device* based on a remote address.
> > This would allow an application to bind, then allocate PDs, CQs,
> > QPs, etc. up front, rather than deferring resource allocation until
> > address resolution completes.

exactly.

> > Yes - this is what rdma_bind_addr(src_addr) does.  But I can
> > envision adding a new call, rdma_bind_device(dst_addr), provided
> > some use for it can be found.

Indeed, but hold your horses, I told you I was just trying to find out
whether an implementation is possible, no real need yet...

> I would find calling it rdma_bind_device() confusing.

why? I find it very much unconfusing

> In any event, I don't find the functionality very interesting.

Hey, as I mentioned earlier in this thread, the interest came from a
***possible*** enhancement to the open iscsi initiator design, now being
discussed, with which a transport (TCP/iSER/iSCSI offload HW/etc) is
asked to create its connection resources synchronously; not sure what
your interest in that is.


Or.



Re: [openib-general] Question on get_dma_mr()

2006-03-24 Thread Sean Hefty

Devesh Sharma wrote:
> Please any body explain me about the functionality of verbs
> ib_get_dma_mr()?
>
> What is the need of this function?
> what a driver implementer is supposed to implement in this function?


This function returns a memory region for all of system memory.  See 
mthca_provider.c for an implementation.


- Sean


RE: [openib-general] question related to rdma_bind_addr

2006-03-24 Thread James Lentini


On Thu, 23 Mar 2006, Sean Hefty wrote:

> >What does it mean to bind to a remote address? What functionality
> >would that enable? Spoofing?
> 
> I think that Or is just exploring the idea of synchronously binding 
> to a local *device* based on a remote address.  
>
> This would allow an application to bind, then allocate PDs, CQs, 
> QPs, etc. up front, rather than deferring resource allocation until 
> address resolution completes.  A ULP may be able to take advantage 
> of this, but I can't personally say that I know what benefit it 
> would provide.  (Maybe avoid the need to keep track of everything 
> that must be allocated once address resolution completes?)
>
> >When I think of bind(2), I only think of binding to local 
> >addresses.
> 
> Yes - this is what rdma_bind_addr(src_addr) does.  But I can 
> envision adding a new call, rdma_bind_device(dst_addr), provided 
> some use for it can be found.

I would find calling it rdma_bind_device() confusing. 

Could you modify the behavior of rdma_resolve_addr() to set the cma 
id's device field before returning? If so, that would be better than 
adding a new function.

In any event, I don't find the functionality very interesting. If the 
address resolved properly, it would speed up setup time (of course 
setup is not generally a bottleneck). In the error case (which is when 
address resolution would take a long time) things aren't any faster. 
Also consumers would still need to handle an asynchronous event.


Re: [openib-general] Question on get_dma_mr()

2006-03-24 Thread Steve Wise
On Fri, 2006-03-24 at 12:51 +0530, Devesh Sharma wrote:
> Hello all,
> 
> Please any body explain me about the functionality of verbs
> ib_get_dma_mr()? 
> What is the need of this function?
> what a driver implementer is supposed to implement in this function?

It returns a MR that maps all of physical memory.

Steve.



RE: [openib-general] question related to rdma_bind_addr

2006-03-23 Thread Sean Hefty
>What does it mean to bind to a remote address? What functionality
>would that enable? Spoofing?

I think that Or is just exploring the idea of synchronously binding to a local
*device* based on a remote address.  This would allow an application to bind,
then allocate PDs, CQs, QPs, etc. up front, rather than deferring resource
allocation until address resolution completes.  A ULP may be able to take
advantage of this, but I can't personally say that I know what benefit it would
provide.  (Maybe avoid the need to keep track of everything that must be
allocated once address resolution completes?)

>When I think of bind(2), I only think of binding to local addresses.

Yes - this is what rdma_bind_addr(src_addr) does.  But I can envision adding a
new call, rdma_bind_device(dst_addr), provided some use for it can be found.

- Sean


RE: [openib-general] question related to rdma_bind_addr

2006-03-23 Thread James Lentini

On Thu, 23 Mar 2006, Sean Hefty wrote:

> >I could not approve my assumptions from looking on the cma/addr 
> >code, but if i am correct this opens the door for future 
> >enhancement of rdma_bind_addr() to work on non local addresses.
> 
> I believe that could be the case.

What does it mean to bind to a remote address? What functionality 
would that enable? Spoofing?

When I think of bind(2), I only think of binding to local addresses. 


Re: [openib-general] question related to rdma_bind_addr

2006-03-23 Thread Or Gerlitz

Sean Hefty wrote:

>> If my understanding is correct, the current code of rdma_bind_addr
>> assumes you supply it one of three (all are **src** address)
>>
>> 1. ANY (0.0.0.0) addr
>> 2. local loopback addr
>> 3. other local addr
>
> Correct. - Note that currently a valid port number needs to be provided, but
> this is a temporary restriction.

I am not sure I understand your comment on the port number; do you mean
the ((struct sockaddr_in *)addr)->sin_port field of addr ?

>> So it is not possible to synchronously create and bind the cma id to
>> ib device based on the destination address (which is the typical info
>> the active side has).
>
> It is not possible to synchronously bind based on the destination address.
> Rdma_bind_addr() binds synchronously to a local device based on a local address
> only.  To bind based on a destination address, you use rdma_resolve_addr().
> However, the lookup may involve issuing an ARP request in order to determine the
> remote hardware address, which is needed in resolving the route.

rdma_resolve_addr resolves two things based on the dest address

1. the local IB device to use (plus its port number, pkey etc)
2. the remote (dest) IB GID (or iWARP MAC)

Now, i was thinking that the first step of getting the local device
based on the dest address is done by ip_route_output_key() and friends,
so you synchronously get a network device (on which you later issue the
ARP) whose private/rdma pointer is ipoib_device who has ib device.

I could not approve my assumptions from looking on the cma/addr code,
but if i am correct this opens the door for future enhancement of
rdma_bind_addr() to work on non local addresses.

> I'm not sure that binding to a local device synchronously based on a remote
> address is exactly impossible.  But it doesn't remove the need to resolve the
> remote address to a hardware address, which is asynchronous.

sure, i see that you kind of approve my assumptions that its possible.

thanks,

Or.




Re: [openib-general] question related to rdma_bind_addr

2006-03-22 Thread Or Gerlitz

Or Gerlitz wrote:


> At this point i see an actual need, it just related to some change we
> discuss in the open scsi model for iser integration, and i wanted to make
> sure that currently creating the IB resources in synchronous manner is
> impossible.

Sorry, my fingers are broken today...

I meant to say "i still ***dont*** see an actual need".

Or.



Re: [openib-general] Question On mad.c

2006-01-20 Thread Sean Hefty

Devesh Sharma wrote:

> In mad.c while calling ib_post_receive() operation
>
>     spin_lock_irqsave(&recv_queue->lock, flags);
>     post = (++recv_queue->count < recv_queue->max_active);
>     list_add_tail(&mad_priv->header.mad_list.list, &recv_queue->list);
>     spin_unlock_irqrestore(&recv_queue->lock, flags);
>     ret = ib_post_recv(qp_info->qp, &recv_wr, &bad_recv_wr);
>
> This is in a while loop that runs as long as the "post" variable remains
> true; the value of max_active is 512, so the loop will go 512 times.
>
> If the QP on which this posting is going on does not support posting
> 512 receive descriptors, then what will happen?
> Although the supported max_recv is returned during QP creation, the
> loop is independent of this.


The QP is created with a size of IB_MAD_QP_RECV_SIZE (512).  If the hardware 
cannot support this size of a QP, then the create QP call will fail.  I.e. the 
hardware can provide a QP that is larger, but not smaller.  The code cannot 
adjust to using a larger size without resizing the corresponding CQ, which is 
not yet supported.
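
To make the loop count concrete, here is a small user-space model of
the posting loop. It is a sketch only: struct recv_queue_model and
post_until_full() are hypothetical names, the loop is assumed to be a
do/while around the quoted test, and the increment of "posted" stands in
for the real ib_post_recv() call.

```c
#include <stdbool.h>

#define RECV_QUEUE_SIZE 512  /* mirrors IB_MAD_QP_RECV_SIZE from the thread */

/* Hypothetical model of the receive queue bookkeeping. */
struct recv_queue_model {
    int count;       /* receives currently posted */
    int max_active;  /* upper bound on posted receives */
};

/* Returns how many receive work requests one pass of the loop posts. */
static int post_until_full(struct recv_queue_model *q)
{
    int posted = 0;
    bool post;

    do {
        /* Same test as in the quoted snippet. */
        post = (++q->count < q->max_active);
        posted++;  /* stands in for ib_post_recv() */
    } while (post);

    return posted;
}
```

Starting from an empty queue with max_active set to RECV_QUEUE_SIZE,
the model posts exactly 512 receives, which matches the QP size
requested at creation time.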


- Sean


Re: [openib-general] Question about QP access flags (struct ib_qp_attr.qp_access_flags)

2005-11-08 Thread Sean Hefty

Ralph Campbell wrote:

Ralph> When ib_modify_qp() is called with the IB_QP_ACCESS_FLAGS
Ralph> set in the mask, what values should be used in struct
Ralph> ib_qp_attr.qp_access_flags?  The IB spec. seems to indicate
Ralph> that RDMA and atomic operations are all enabled or disabled
Ralph> as a group but all I see in ib_verbs.h is the enum
Ralph> ib_access_flags which is used for memory region access.
Ralph> These are more fine grained than the IB spec. implies for
Ralph> QPs.  So I can see qp_access_flags being either a boolean
Ralph> or perhaps a new enum defined for the values for
Ralph> qp_access_flags.

Roland> I think the IB spec is at best ambiguous as to whether RDMA
Roland> and atomics are enabled as a group or not.
Roland> The values are IB_ACCESS_REMOTE_ATOMIC, IB_ACCESS_REMOTE_WRITE,
Roland> and IB_ACCESS_REMOTE_READ or-ed together I think.


Roland's response is correct.  Atomics and RDMA reads are enabled separately. 
(See page 573 of release 1.2 of the spec.  I interpreted the separate bullets to 
mean that they are set separately.)  I think it makes sense to keep this 
distinction, since atomics are also an optional feature of an HCA.


If you look in cm.c for init_qp_attr, you can see how the IB CM sets the mask 
and QP attributes.
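
As a small illustration of enabling each capability separately, the
sketch below builds a qp_access_flags value by OR-ing individual bits.
The enum is a local stand-in for the IB_ACCESS_REMOTE_* flags named
above, and its numeric values are assumptions for the example; real
code should use the definitions from ib_verbs.h.
build_qp_access_flags() is a hypothetical helper.

```c
#include <stdbool.h>

/* Local stand-ins for the flags named in the thread (values assumed). */
enum access_flags_model {
    ACCESS_REMOTE_WRITE  = 1 << 1,
    ACCESS_REMOTE_READ   = 1 << 2,
    ACCESS_REMOTE_ATOMIC = 1 << 3,
};

/* Each remote capability is switched on independently of the others. */
static int build_qp_access_flags(bool rdma_write, bool rdma_read, bool atomic)
{
    int flags = 0;

    if (rdma_write)
        flags |= ACCESS_REMOTE_WRITE;
    if (rdma_read)
        flags |= ACCESS_REMOTE_READ;
    if (atomic)
        flags |= ACCESS_REMOTE_ATOMIC;

    return flags;
}
```

A QP on an HCA without atomic support would then simply leave the
atomic bit clear while still allowing RDMA reads and writes.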


- Sean


Re: [openib-general] Question about locked pages

2005-10-31 Thread Roland Dreier
Jeff> Ditto (I thought those were shmem values / didn't think they
Jeff> had any effect on Open IB).  The information that I got was
Jeff> third-hand, which is why I posted here to ask about it.  :-)

Jeff> I'll remove them from the FAQ entry -- any other comments?

Well, a normal user can't use "ulimit -l" to increase their limit on
locked memory.  However I've never really looked into what the
cleanest way to increase the limit is.  /etc/security/limits.conf is
part of the answer, but ssh+privilege separation can cause that to
break as well.
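
For checking what a process actually ended up with, something like the
following can help. It is a minimal sketch using the standard
getrlimit()/setrlimit() calls with RLIMIT_MEMLOCK (available on Linux);
the two function names are made up for the example. An unprivileged
process can raise its soft limit only up to the hard limit, which is
why the hard limit is what has to be configured elsewhere.

```c
#include <sys/resource.h>

/* Read the current soft and hard locked-memory limits. */
static int get_memlock_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Raise the soft limit to the hard limit; this needs no privilege. */
static int raise_memlock_soft_to_hard(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

Running this from inside an ssh session is one way to see whether the
limits.conf setting survived the login path.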

 - R.

