Hi Randy -

I'm not familiar with any of the pushes towards HA, so I'm not sure
where you stand at this point or how we plan on implementing it in
the future, but I wouldn't mind contributing once there is a game plan
in place.

But, looking at Vlad's request to be able to pass a hint or option to
pvfs to define which IB port to use, I'm not really sure where to add
this...
Where's the best place to put the config param? In the config file?
On the command line? And then how do we push that all the way down to
the bmi_ib code?
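
One option, purely as a sketch: take the port as a string (from a config
file value or an environment variable) and validate it before it ever
reaches bmi_ib. The helper name parse_ib_port_option and the idea of a
string-valued option are made up for illustration and are not existing
pvfs code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not existing pvfs code): parse a port option string.
 * Returns the port number (>= 1) on success, or -1 if the string is
 * missing, cannot be parsed as a whole decimal number, or is out of range. */
static int parse_ib_port_option(const char *value)
{
    char *end = NULL;
    long port;

    if (value == NULL || *value == '\0')
        return -1;

    errno = 0;
    port = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;              /* overflow or trailing garbage */
    if (port < 1 || port > 255)
        return -1;              /* IB port numbers are 1-based and fit in 8 bits */
    return (int)port;
}
```

A caller could then do something like
parse_ib_port_option(getenv("PVFS2_IB_PORT")), where the variable name is
also hypothetical, and treat -1 as "no preference".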


Vlad -

I threw this together today. It will pick port number 2 instead of
port number 1 if it exists, but it isn't configurable yet; that is
still pending the comments above.

It compiles, but I don't have the hardware anymore to test it. This
approach completely disregards the fact that an HCA may have more than
2 ports, but once we figure out how we want to pass the parameter down
to bmi we can fix that with 1 line.
Can you test it out?
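
For reference, the 1-line fix I have in mind would look roughly like the
helper below once a requested port is available. This is just a sketch:
choose_ib_port and DEFAULT_IB_PORT are made-up names (DEFAULT_IB_PORT
stands in for IBV_PORT), and "requested" is assumed to come from whatever
config mechanism we settle on, with 0 meaning no preference.

```c
/* Stand-in for IBV_PORT in openib.c; assumed to be port 1. */
#define DEFAULT_IB_PORT 1

/* Hypothetical helper: return the requested port if the HCA actually has
 * it (ports are numbered 1..phys_port_cnt), otherwise the default port. */
static int choose_ib_port(int requested, int phys_port_cnt)
{
    if (requested >= 1 && requested <= phys_port_cnt)
        return requested;
    return DEFAULT_IB_PORT;
}
```

In openib.c the hardcoded assignment would then become something like
od->nic_port = choose_ib_port(requested, hca_cap.phys_port_cnt);
placed after the ibv_query_device call, which also covers HCAs with more
than 2 ports.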

Thanks
~Kyle


test@test:~/pvfs/orangefs$ cat port_number.patch
Index: src/io/bmi/bmi_ib/openib.c
===================================================================
--- src/io/bmi/bmi_ib/openib.c  (revision 9182)
+++ src/io/bmi/bmi_ib/openib.c  (working copy)
@@ -899,8 +899,22 @@
     ib_device->func.check_async_events = openib_check_async_events;

     od->ctx = ctx;
-    od->nic_port = IBV_PORT;  /* maybe let this be configurable */

+   /* Query the device for the max_ requests and such */
+    ret = ibv_query_device(od->ctx, &hca_cap);
+    if (ret)
+       error_xerrno(ret, "%s: ibv_query_device", __func__);
+    VALGRIND_MAKE_MEM_DEFINED(&hca_cap, sizeof(hca_cap));
+
+    /* Use a second port when the HCA has one, instead of hardcoding IBV_PORT */
+    if((int)hca_cap.phys_port_cnt > 1)
+    {
+       /* parse in the port number request here */
+       od->nic_port=2;
+    }
+    else
+       od->nic_port = IBV_PORT;  /* default to port 1 */
+
     /* get the lid and verify port state */
     ret = ibv_query_port(od->ctx, od->nic_port, &hca_port);
     if (ret)
@@ -913,12 +927,8 @@
        error("%s: port state is %s but should be ACTIVE; check subnet manager",
              __func__, openib_port_state_string(hca_port.state));

-    /* Query the device for the max_ requests and such */
-    ret = ibv_query_device(od->ctx, &hca_cap);
-    if (ret)
-       error_xerrno(ret, "%s: ibv_query_device", __func__);
-    VALGRIND_MAKE_MEM_DEFINED(&hca_cap, sizeof(hca_cap));
-
+    /* used to query device here, but we queried these up above to define
+       port number  */
     debug(1, "%s: max %d completion queue entries", __func__, hca_cap.max_cq);
     cqe_num = IBV_NUM_CQ_ENTRIES;
     od->nic_max_sge = hca_cap.max_sge;



Kyle Schochenmaier



On Tue, Jan 31, 2012 at 1:33 PM, Randall Martin <w...@clemson.edu> wrote:
> Kyle,
>
> Yes, I think we need some form of fail-over capability with multi-port NICs
> in orangefs for HA.  As the number of I/O servers grows, the odds of some
> kind of hardware failure increase.  Network errors in this brave new
> world should be expected and tolerated as much as possible.  This might be
> a good step in that direction.
>
> -Randy
>
> On 1/31/12 10:27 AM, "Kyle Schochenmaier" <kscho...@gmail.com> wrote:
>
>>Hi Vlad, All -
>>
>>A couple comments..
>>You can probably just hardcode to port 2 to force things onto port 2.
>>Feel free to test it out, just be sure to rebuild and push out all of
>>the client and server binaries so they all play nicely with each other.
>>
>>Also, I don't think we can implement port bonding at this level. It
>>would require quite a bit of work and synchronization, which could add
>>enough overhead that it would not perform significantly faster. I
>>guess the only way to tell would be to try it, but I suspect
>>that it might not be beneficial.
>>
>>Now,
>>The other thing you mentioned had to do with port fail-over, I
>>believe. This is why I'm bringing in the dev list here.  Currently I
>>believe the standard practice across all interconnects using bmi is to
>>have a hard fail whenever a particular port configuration fails to
>>come up initially.
>>
>>But I know we're going to be making a push into HA with orangefs soon,
>>so I am wondering what people's thoughts are here?
>>Is this something that would need to be implemented anyways, does it
>>fit the HA scheme that is being examined for orangefs?
>>Thoughts?
>>
>>
>>Kyle Schochenmaier
>>
>>
>>
>>On Tue, Jan 31, 2012 at 2:25 AM, vlad <v...@cosy.sbg.ac.at> wrote:
>>> Dear Kyle,
>>>
>>>
>>>> I don't think we ever got around to testing this with multiple ports
>>>> active on each HCA when we wrote it, so I believe I hard coded it to
>>>> just default to the first port... iirc we tried to bring up the 2nd
>>>> port at one point and found that there were some memory exhaustion
>>>> issues when using more than one port AND the default HCA
>>>> buffers/MTUs/etc on the cards that this was primarily tested on, so it
>>>> went back to 1.
>>>>
>>>> I wouldn't recommend changing this via a hard code for obvious
>>>> reasons, but at the same time it probably wouldn't take more than
>>>> 20-30 lines of code to fix this up to take more than one port.  I'll
>>>> try to take a look at it.
>>>
>>>
>>> Thanks, I never thought about bonding the 2 infiniband ports.
>>>
>>> It is absolutely sufficient for me to swap port 1 for port 2. I never had
>>> the intention of using both ports simultaneously.
>>> Could this be achieved by changing IBV_PORT to "2" instead of "1"?
>>>
>>> For the new code wishlist (if I may ask for it ..):
>>>
>>> It would be nice to be able to define the infiniband port in the config
>>> file and to have a defined fallback to the other infiniband port, if the
>>> 1st one does not work.
>>>
>>> (example connection should go over ib0,ib1://host:port/service)
>>>
>>> But that is not very urgent to have .
>>>
>>> Thanks for all the help,
>>>
>>> Greetings,
>>>
>>> Vlad
>>
>>_______________________________________________
>>Pvfs2-developers mailing list
>>Pvfs2-developers@beowulf-underground.org
>>http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>

