On Mon, Apr 13, 2009 at 3:01 PM, Hal Rosenstock <[email protected]> wrote:
> On Mon, Apr 13, 2009 at 4:09 PM, Chris Worley <[email protected]> wrote:
>> On Mon, Apr 13, 2009 at 12:52 PM, Hal Rosenstock <[email protected]> wrote:
>>> On Mon, Apr 13, 2009 at 2:26 PM, Chris Worley <[email protected]> wrote:
>>>> On Mon, Apr 13, 2009 at 11:53 AM, Hal Rosenstock <[email protected]> wrote:
>>>>> On Mon, Apr 13, 2009 at 12:02 PM, Chris Worley <[email protected]> wrote:
>>>>>> On Mon, Apr 13, 2009 at 7:43 AM, Hal Rosenstock <[email protected]> wrote:
>>>>>>> On Mon, Apr 13, 2009 at 9:37 AM, Chris Worley <[email protected]> wrote:
>>>>>>>> On Mon, Apr 13, 2009 at 5:39 AM, Hal Rosenstock <[email protected]> wrote:
>>>>>>>>> On Sun, Apr 12, 2009 at 11:01 PM, Chris Worley <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> So I need to tell the SM to route specific ports on the server/target
>>>>>>>>>> to specific clients/initiators.
>>>>>>>>>>
>>>>>>>>>> Is there any way to do this?
>>>>>>>>>
>>>>>>>>> Do you mean restrict access between certain clients/servers ?
>>>>>>>>
>>>>>>>> One server with 4 QDR boards, 16 clients with one QDR board each.  I want
>>>>>>>> each port on the server routed/zoned to two clients.
>>>>>>>>
>>>>>>>>> If so, you can do this with partitioning.
>>>>>>>>
>>>>>>>> What is partitioning?
>>>>>>>
>>>>>>> A partition is a collection of ports which are allowed to communicate
>>>>>>> together.  There are two forms of members: full members, which can talk
>>>>>>> to any other member (useful for servers), and limited members, which can
>>>>>>> only talk to full members (useful for clients).  See the opensm man
>>>>>>> page or partition-config.txt on setting this up for OpenSM.
>>>>>>>
>>>>>>
>>>>>> Let me see if I understand this with a simple example... my port GUIDs
>>>>>> (as reported by ibstat) for one server (4 QDR ports) and four
>>>>>> clients (one QDR port each) are:
>>>>>>
>>>>>> Server A: Port GUID: 0x0024717124000029
>>>>>> Server B: Port GUID: 0x002471712400002a
>>>>>> Server C: Port GUID: 0x0024717127000035
>>>>>> Server D: Port GUID: 0x0024717127000036
>>>>>>
>>>>>> Client 1: Port GUID: 0x0002c90300028c01
>>>>>> Client 2: Port GUID: 0x0002c90300026047
>>>>>> Client 3: Port GUID: 0x0002c90300026053
>>>>>> Client 4: Port GUID: 0x0002c9030002603b
>
> Is there a switch in between or just back-to-back HCA ports ?
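(An aside, to make the full/limited membership semantics above concrete: a minimal partitions.conf sketch in the syntax of opensm's partition-config.txt might look like the following, where the partition name, P_Key, and both GUIDs are placeholders rather than values from this fabric:

  Default=0x7fff: ALL, SELF=full;
  storage1=0x2, ipoib: 0x0011223344556677=full, 0x8899aabbccddeeff=limited;

In the Default line, ALL adds every port as a limited member and SELF=full makes the SM's own port a full member, so every node can reach the SM but limited members cannot reach each other.  In storage1, the first GUID (a server port) is a full member and the second (a client port) is a limited member, so that client can talk to that server port but not to other limited members of the same partition.)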
Yes, there's a switch; it's not directly connected port to port.  In the
end, there will be 2 or 4 clients per server port (this simple
configuration is just to get me going), so a switch is needed.

>
>>>>>>
>>>>>> Assuming I want a 1:1 (one server port to one client) partitioning, I
>>>>>> would put the following in /etc/ofed/partitions.conf:
>>>>>>
>>>>>> part1=0x1, ipoib, defmember=full : 0x0024717124000029, 0x0002c90300028c01;
>>>>>> part2=0x2, ipoib, defmember=full : 0x002471712400002a, 0x0002c90300026047;
>>>>>> part3=0x3, ipoib, defmember=full : 0x0024717127000035, 0x0002c90300026053;
>>>>>> part4=0x4, ipoib, defmember=full : 0x0024717127000036, 0x0002c9030002603b;
>>>>>
>>>>> So you want IPoIB.
>>>>
>>>> I'm doing SRP, so I need IPoIB working.
>>>
>>> SRP needs to query PathRecord with the correct PKey and use the
>>> correct PKey index for that partition.  I'm not sure how that is done
>>> in SRP, but first IPoIB needs to be made to work (again).
>>>
>>
>> Okay... I'll set up IPoIB as ipoib.txt suggests, i.e.:
>>
>> echo 0x1 > /sys/class/net/ib0/create_child
>>
>> ... but for now, I'm still not seeing the state go to "up"... I think
>> that's the first problem.
>
> Yes, port state needs to be LinkUp/Active first.  I see LinkUp/Armed from
> below.
>
>>>>>
>>>>>> ... and run with:
>>>>>>
>>>>>> opensm -r -B -P/etc/ofed/partitions.conf
>>>
>>> Also, do you need to use -r ?  It's better not to (reassign LIDs).
>>
>> I'm using it to make sure it doesn't just hang on to the old state,
>> especially since I'm not getting the SM working...
>
> OK.
>
>> I don't want it to assume anything is right about the previous state.
>>
>> I have tried with and without it and don't see a difference.
>>
>> The plan is, once I get it working, to remove the "-r".
>
> That's fine.
>
>> Or, are you suggesting I not use it?
>>
>>>
>>>>>> Does that sound correct?  It doesn't work.
>>>>>
>>>>> What application(s) aren't working ?
>>>>
>>>> ping over IPoIB, for example.
>>>>
>>>> I am seeing the test node in an "Initializing" state right now... I
>>>> thought it was "up" before.
>>>
>>> Yes, this has gone "backwards" (not as far along yet...)
>>>
>>
>> I think getting to an "up" state is the first step.
>
> Were the ports getting to LinkUp/Active before partitions were configured ?

Yes, before I started trying to partition, all the nodes could
communicate... except they'd all use just one port on the server and I
couldn't get the throughput I needed.

>
>>>>> Any SM error messages ?
>>>>
>>>> The server has one klogd error coming out continuously:
>>>>
>>>> ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
>>>
>>> The IPoIB broadcast group (on the default partition) can't be joined.  I'm
>>> presuming that's due to the current partition setup (e.g., it worked prior
>>> to this, right ?).
>>>
>>> You need to do some IPoIB configuration relative to partitions as well.
>>> See kernel Documentation/infiniband/ipoib.txt for help with this.
>>>
>>
>> Will do.  As you say, the trick will be getting SRP to use the right
>> P_Keys... but I need to get the IB ports into an "up" state first.
>>
>> <snip sm output>
>>>>> Which server ?
>>>>
>>>> There's only one server... it has many ports which I'm trying to
>>>> partition to different clients.  So, in the above, when I say "Server
>>>> A", I mean server port "A".
>>>
>>> I meant which server port is running OpenSM (which GUID is being
>>> used).  I see above it is 0x24717124000029.
>>
>> That was it.
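(Coming back to the SRP/P_Key question above for a moment: with the in-kernel ib_srp initiator, the partition can be passed as a pkey= field in the add_target string, and the value needs to be a P_Key actually present in the initiator port's P_Key table.  This is only a sketch; everything in angle brackets is a placeholder, and srp-mlx4_0-1 just stands for whichever HCA/port the target is reached through:

  echo "id_ext=<target id_ext>,ioc_guid=<target ioc_guid>,dgid=<target port GID>,pkey=8002,service_id=<target service id>" \
    > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target

Here 8002 would be the full-membership form of partition 0x2; a limited member would use 0002 instead.)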
>> I've switched to a client as the SM now, as you suggested a stand-alone SM.
>
> So it's no longer a client in the ULP sense, right ?

It is just being used for the SM now.

>
>>>
>>>>> You still need the default partition with the SM node being full and
>>>>> the others being limited there (so it's also best to run the SM on a
>>>>> separate node if possible; otherwise you have the potential of any
>>>>> client connecting to it on the default partition).
>>>>
>>>> Are you saying to change the partitions.conf file to:
>>>>
>>>> part1=0x1, ipoib: 0x0024717124000029=full, 0x0002c90300028c01;
>>>> part2=0x2, ipoib: 0x002471712400002a=full, 0x0002c90300026047;
>>>> part3=0x3, ipoib: 0x0024717127000035=full, 0x0002c90300026053;
>>>> part4=0x4, ipoib: 0x0024717127000036=full, 0x0002c9030002603b;
>>>
>>> That's part of it.
>>>
>>>> ... (which still doesn't work), in which case I set all the server's
>>>> ports to "full"?  Or should just one be "full" (which didn't work
>>>> either)?
>>>
>>> You also need:
>>> Default=0x7fff: ALL, SELF=FULL;
>>> I would put that first.
>>
>> So, now my /etc/ofed/partitions.conf file looks like:
>>
>> Default=0x7fff: ALL, SELF=FULL;
>> part1=0x1, ipoib: 0x0002c903000292af=full, 0x0002c90300028c01;
>> part2=0x2, ipoib: 0x0002c903000292b0=full, 0x0002c90300026047;
>> part4=0x4, ipoib: 0x0024717124000029=full, 0x0002c9030002603b;
>
>> ... I pulled out the node on partition 3 to use as an SM-exclusive node,
>> and I also changed the server ports to some of the other IB ports on
>> that machine (port GUIDs as shown by ibstat).  I set the server port
>> GUIDs to "full", as I want the client GUIDs to talk to them, but not
>> necessarily to each other (as there is only one client GUID on each
>> partition now, it's a moot point).
>>
>> Note that I made up the partition P_Keys of 1, 2, and 4.
>
> This all looks/sounds fine to me.

:(

>
>> Note that it still doesn't work.  On the stand-alone SM, ibstat looks like:
>>
>> # ibstat
>> CA 'mlx4_0'
>>   CA type: MT26428
>>   Number of ports: 2
>>   Firmware version: 2.6.0
>>   Hardware version: a0
>>   Node GUID: 0x0002c90300026052
>>   System image GUID: 0x0002c90300026055
>>   Port 1:
>>     State: Armed
>>     Physical state: LinkUp
>>     Rate: 10
>>     Base lid: 1
>>     LMC: 0
>>     SM lid: 1
>>     Capability mask: 0x0251086a
>>     Port GUID: 0x0002c90300026053
>>   Port 2:
>>     State: Down
>>     Physical state: Polling
>>     Rate: 10
>>     Base lid: 0
>>     LMC: 0
>>     SM lid: 0
>>     Capability mask: 0x02510868
>>     Port GUID: 0x0002c90300026054
>
> What's at the other end of port 1 ?  Would you do smpquery portinfo for
> this HCA port and its peer port ?
>
>> ...
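(For reference, the queries Hal is asking for are the kind issued with infiniband-diags' smpquery; a sketch, where the direct-route paths and the LID/port arguments are illustrative rather than taken from this fabric:

  smpquery -D portinfo 0 1        # PortInfo of the local HCA, port 1 (zero-length direct route)
  smpquery -D nodeinfo 0,1        # NodeInfo of whatever sits one hop out of local port 1
  smpquery portinfo <lid> <port>  # PortInfo by LID, once the SM has assigned LIDs

The second form is useful here because it exercises exactly the one-hop directed-route query that the SM log further down shows timing out.)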
>> On the server, the devices mentioned in the partitions file look like:
>>
>> CA 'mlx4_0'
>>   CA type: MT25418
>>   Number of ports: 2
>>   Firmware version: 2.6.0
>>   Hardware version: a0
>>   Node GUID: 0x0024717124000028
>>   System image GUID: 0x002471712400002b
>>   Port 1:
>>     State: Initializing
>>     Physical state: LinkUp
>>     Rate: 10
>>     Base lid: 0
>>     LMC: 0
>>     SM lid: 0
>>     Capability mask: 0x02510868
>>     Port GUID: 0x0024717124000029
>>   Port 2:
>>     State: Initializing
>>     Physical state: LinkUp
>>     Rate: 10
>>     Base lid: 0
>>     LMC: 0
>>     SM lid: 0
>>     Capability mask: 0x02510868
>>     Port GUID: 0x002471712400002a
>> CA 'mlx4_1'
>>   CA type: MT26428
>>   Number of ports: 2
>>   Firmware version: 2.6.0
>>   Hardware version: a0
>>   Node GUID: 0x0002c903000292ae
>>   System image GUID: 0x0002c903000292b1
>>   Port 1:
>>     State: Initializing
>>     Physical state: LinkUp
>>     Rate: 10
>>     Base lid: 0
>>     LMC: 0
>>     SM lid: 0
>>     Capability mask: 0x02510868
>>     Port GUID: 0x0002c903000292af
>>   Port 2:
>>     State: Initializing
>>     Physical state: LinkUp
>>     Rate: 10
>>     Base lid: 0
>>     LMC: 0
>>     SM lid: 0
>>     Capability mask: 0x02510868
>>     Port GUID: 0x0002c903000292b0
>
> So no SM initialization is occurring there since they are still just in Init.

Correct.  But the SM is running.

>
>> On one of the clients:
>>
>> # ibstat
>> CA 'mlx4_0'
>>   CA type: MT26428
>>   Number of ports: 2
>>   Firmware version: 2.6.0
>>   Hardware version: a0
>>   Node GUID: 0x0002c90300026046
>>   System image GUID: 0x0002c90300026049
>>   Port 1:
>>     State: Initializing
>>     Physical state: LinkUp
>>     Rate: 10
>>     Base lid: 7
>>     LMC: 0
>>     SM lid: 1
>>     Capability mask: 0x02510868
>>     Port GUID: 0x0002c90300026047
>>   Port 2:
>>     State: Down
>>     Physical state: Polling
>>     Rate: 10
>>     Base lid: 0
>>     LMC: 0
>>     SM lid: 0
>>     Capability mask: 0x02510868
>>     Port GUID: 0x0002c90300026048
>
> Ditto.  Down means it's likely a port that is not connected.
>
>> Partition "part2" with P_Key=2 should connect this client's port 0 to
>> the server on port 1 of mlx4_1.
>
> Do you really mean port 0 ?

Nope... in this case I have 0x0002c903000292b0 in part2 in my partitions
file, which is port 1 counting from zero (the second port of that adapter).
I'm hoping to use both ports of all adapters on the server.

>
>>>
>>>> I did have a difficult time understanding the difference between
>>>> "full" and "limited" in the man page.
>>>
>>> On a given partition, a full member can talk with all other members,
>>> whereas a limited member can only talk with full members (not with other
>>> limited members).
>>>
>>
>> I think I've got that correctly specified in the above partitions file.
>>
>>>> I've got a captive network, so I don't want any paths I've not
>>>> specified to be allowed, if that makes any sense.  So, I didn't want
>>>> to put in a statement like:
>>>>
>>>> Default=0x7fff,ipoib:ALL=full;
>>>>
>>>> ... that would let a rogue node slip through the cracks.
>>>
>>> The only one they can talk with is the SM (the way I'm proposing), so
>>> it's best if the SM node could be separate.
>>
>> It's separate now.
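(One way to see whether the SM has actually pushed these partitions out is to look at a port's P_Key table, either locally through sysfs or over the fabric with smpquery; a sketch, where the HCA name, LID, and port number are illustrative:

  grep . /sys/class/infiniband/mlx4_0/ports/1/pkeys/*
  smpquery pkeys <lid> <port>

A full member shows the P_Key with the high bit set (e.g. 0x8002 for partition 0x2), a limited member shows it with the high bit clear (0x0002), and each port should also carry its default-partition entry, 0x7fff or 0xffff.)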
>> The log looks like this (in its entirety at startup):
>>
>> Apr 13 13:41:56 182699 [1D71CA30] 0x03 -> OpenSM 3.2.5_20081207
>> Apr 13 13:41:56 182764 [1D71CA30] 0x80 -> OpenSM 3.2.5_20081207
>> Apr 13 13:41:56 183020 [1D71CA30] 0x02 -> osm_vendor_init: 1000 pending umads specified
>> Apr 13 13:41:56 183104 [1D71CA30] 0x80 -> Entering DISCOVERING state
>> Apr 13 13:41:56 193181 [1D71CA30] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300026053
>> Apr 13 13:41:56 217349 [1D71CA30] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300026053
>> Apr 13 13:41:57 018570 [47FCE940] 0x01 -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x110000123b) -- dropping
>> Apr 13 13:41:57 018586 [47FCE940] 0x01 -> umad_receiver: ERR 5411: DR SMP Hop Ptr: 0x0
>> Apr 13 13:41:57 018603 [47FCE940] 0x01 -> Received SMP on a 1 hop path:
>>                 Initial path = 0,0
>>                 Return path = 0,0
>> Apr 13 13:41:57 018608 [47FCE940] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT)
>> Apr 13 13:41:57 018626 [47FCE940] 0x01 -> SMP dump:
>>                 base_ver................0x1
>>                 mgmt_class..............0x81
>>                 class_ver...............0x1
>>                 method..................0x1 (SubnGet)
>>                 D bit...................0x0
>>                 status..................0x0
>>                 hop_ptr.................0x0
>>                 hop_count...............0x1
>>                 trans_id................0x123b
>>                 attr_id.................0x11 (NodeInfo)
>>                 resv....................0x0
>>                 attr_mod................0x0
>>                 m_key...................0x0000000000000000
>>                 dr_slid.................65535
>>                 dr_dlid.................65535
>>
>>                 Initial path: 0,1
>>                 Return path: 0,0
>>                 Reserved: [0][0][0][0][0][0][0]
>>
>>                 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>                 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>                 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>                 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> This is the first level problem.  Some SMA is not responding to a
> NodeInfo query from the SM.  Whatever is the next hop from the SM port
> appears not to be responding.  You may need to reboot that device or
> otherwise reset it to see if this clears the issue.

After power-cycling the switch, the ports went "active"!  Note that I
didn't restart the SM... I just left it running.

So, on one client (the one corresponding to "part2" in the partitions
file), I put the P_Key into "create_child":

echo 0x2 > /sys/class/net/ib0/create_child

... and did likewise on the host, for ib3 (the second port on the second
adapter):

echo 0x2 > /sys/class/net/ib3/create_child

Still, no ping (the interfaces are set up correctly).

Thanks,

Chris

<snip>
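(On the create_child step: the example in Documentation/infiniband/ipoib.txt creates the child interface with the full-membership form of the P_Key, i.e. with the high bit set, and names the interface after that value.  A sketch under that assumption; the interface names and the 10.x addresses below are illustrative, not the ones from this setup:

  # client side, parent ib0, partition 0x2
  echo 0x8002 > /sys/class/net/ib0/create_child
  ifconfig ib0.8002 10.2.0.1 netmask 255.255.255.0 up

  # server side, parent ib3 (second port of the second adapter), same partition
  echo 0x8002 > /sys/class/net/ib3/create_child
  ifconfig ib3.8002 10.2.0.2 netmask 255.255.255.0 up

  # a child interface can be removed again with
  echo 0x8002 > /sys/class/net/ib0/delete_child

With both ends configured this way, a ping between 10.2.0.1 and 10.2.0.2 exercises IPoIB on partition 0x2 only.)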
