Re: [OMPI devel] Device failover on ob1

Brian W. Barrett Mon, 3 Aug 2009 11:23:29 -0400

On Sun, 2 Aug 2009, Ralph Castain wrote:

Perhaps a bigger question needs to be addressed - namely, does the ob1 code need to be refactored?
Having been involved a little in the early discussion with bull when we debated over where to put this, I know the primary concern was that the code not suffer the same fate as the dr module. We have since run into a similar issue with the checksum module, so I know where they are coming from.
The problem is that the code base is adjusted to support changes in ob1, which is still being debugged. On the order of 95% of the code in ob1 is required to be common across all the pml modules, so the rest of us have to (a) watch carefully all the commits to see if someone touches ob1, and then (b) manually mirror the change in our modules.
This is not a supportable model over the long-term, which is why dr has died, and checksum is considering integrating into ob1 using configure #if's to avoid impacting non-checksum users. Likewise, device failover has been treated similarly here - i.e., configure out the added code unless someone wants it.
This -does- lead to messier source code with these #if's in it. If we can refactor the ob1 code so the common functionality resides in the base, then perhaps we can avoid this problem.
Is it possible?

I think Ralph raises a good point - we need to think about how to allow better use of OB1's code base between consumers like checksum and failover. The current situation is problematic to me, for the reasons Ralph cited. However, since the ob1 structures and code have little use for PMLs such as CM, I'd rather not push the code into the base - in the end, it's very specific to a particular PML implementation and the code pushed into the base already made things much more interesting in implementing CM than I would have liked. DR is different in this conversation, as it was almost entirely a separate implementation from ob1 by the end, due to the removal of many features and the addition of many others.

However, I think there's middle ground here which could greatly improve the current situation. With the proper refactoring, there's no technical reason why we couldn't move the checksum functionality into ob1 and add the failover to ob1, with no impact on performance when the functionality isn't used and little impact on code readability.
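
As a rough illustration of the kind of refactoring described above (this is not actual ob1 code, and every name in it is hypothetical), one way to keep an optional feature out of the common path without scattering #if's is to pick the send routine once at initialization. The sketch below assumes a simplified fragment structure and a toy additive checksum; the per-message path then contains no feature test, only the indirect call through the selected function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the real ob1 API. */

typedef struct {
    const uint8_t *data;
    size_t len;
    uint32_t csum;   /* filled in only by the checksum path */
    int has_csum;    /* 1 if csum is valid, 0 otherwise */
} frag_t;

typedef int (*send_fn_t)(frag_t *frag);

/* Default path: does no checksum work at all. */
static int send_plain(frag_t *frag)
{
    frag->has_csum = 0;
    return 0;
}

/* Optional path: toy additive checksum, for illustration only. */
static int send_with_csum(frag_t *frag)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < frag->len; i++) {
        sum += frag->data[i];
    }
    frag->csum = sum;
    frag->has_csum = 1;
    return 0;
}

/* Selected once at init; defaults to the plain path. */
static send_fn_t active_send = send_plain;

void pml_select_send(int enable_csum)
{
    active_send = enable_csum ? send_with_csum : send_plain;
}

/* Per-message entry point: no feature test, just the selected routine. */
int pml_send(frag_t *frag)
{
    assert(frag != NULL);
    return active_send(frag);
}
```

Whether one indirect call per message counts as "no impact" would have to be measured; the point of the sketch is only that the optional code lives in its own function rather than behind #if's inside the common one. Failover could presumably be attached the same way, for example as an alternative error-handling routine chosen at init.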

So, in summary, refactor OB1 to support checksum / failover good, pushing ob1 code into base bad.


Brian
