Lily Li writes:
> Joep Vesseur wrote:
> > On 03/11/09 00:26, Scott Rotondo wrote:
> >   
> >> Nicolas Williams wrote:
> >>     
> >>> But the i-team here might not have the resources to fix shmux to do the
> >>> right thing.  If we agree that not allowing normal users to use fping
> >>> was a mistake, then shmux could be integrated as is with a bug opened in
> >>> the upstream community to have the -p option deprecated/replaced
> >>> with a design that does async connect() + timeouts.  That would be the
> >>> pragmatic solution.
> >>>       
> It's right!

I agree that simply redefining the "-p" and "-P" options to do the
Right (and obvious) Thing with async connect() is likely the best
answer.

The stumbling block for me is proceeding with integration even though
we _know_ that some of the documented features are broken-as-designed
on OpenSolaris, and that by shipping this we're just letting customers
walk into the problem at their convenience.

Doing this puts OpenSolaris at a competitive disadvantage compared
to Linux.  Users will discover that we ship shmux, but that it's not
as good as on Linux, because some features that work there don't
work on OpenSolaris.

> IMO, it should be acceptable to integrate shmux as is for now, as it 
> is useful to us after all (my daily work is mainly NFS testing, and I 
> always need to check/run something on many hosts, so this tool will be 
> a great help for me :) ), and we can update the package once the new 
> version is available.

Partially functional is great for open source projects.  I'm not
convinced that it's right for OpenSolaris.

> I'm not sure whether he would like to do that. But anyway, I will send 
> a mail to him to confirm. BTW, do you want the maintainer to submit a 
> shmux modified to meet our requirements instead of the current one?

If the upstream maintainer won't do it (or, in the worst case, doesn't
agree with the change), and if we really want this utility to be part
of OpenSolaris, then we'll need to do it.

As we've come to a stumbling block here, and I haven't been able to
convince myself that shipping something with known architectural
defects (ones that break the project's documented features, even if
it still works for some use cases) is "good enough," I think I can
offer a solution of sorts.

That solution would be to derail this fast-track, offer the following
TCR, and then vote to approve (I hope we can address this in ARC
business today):

        TCR 1.  The utility must be modified to remove its dependence
                on fping and should (to preserve the user interface)
                re-use the existing "-p" and "-P" options to control
                the behavior of connect(), using non-blocking
                sockets, poll/select for write, and short timers.
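
For concreteness, the connect() behavior described in the TCR might
look roughly like the sketch below.  It's in Python purely for
brevity, and it checks a single host; the function name probe() and
the one-second default timeout are placeholders of mine, not anything
taken from shmux, and POSIX-style socket semantics are assumed.

```python
import errno
import select
import socket


def probe(host, port, timeout=1.0):
    """Return True if a TCP connect to (host, port) completes within `timeout` seconds.

    Hypothetical helper for illustration only; not part of shmux.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # name did not resolve
    family, socktype, proto, _, addr = infos[0]

    sock = socket.socket(family, socktype, proto)
    try:
        # Non-blocking socket: connect_ex() returns immediately with an
        # errno value instead of waiting for the handshake.
        sock.setblocking(False)
        err = sock.connect_ex(addr)
        if err == 0:
            return True
        if err not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            return False  # immediate failure, e.g. connection refused

        # Short timer: wait until the socket is writable or we give up.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return False  # timed out

        # Writable only means the attempt finished; SO_ERROR says how.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0
    finally:
        sock.close()
```

Something like probe("somehost", 22, timeout=0.5) would then do the
job of the ping test without needing raw-socket privileges.  A real
implementation would presumably start many connects at once and
poll them together rather than one host at a time.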

That way, we finish the back-and-forth discussion, we get a clear
decision from the other ARC members (both on the imposition of the TCR
and on the approval of the project) so that it's not just me arguing
this point, and the project team can then decide how to go forward; it
can either seek to have the ARC overridden or find some means to
comply.

Since I think the discussion has gone on long enough, I'll derail.  I
offer the TCR above, and now's the point for other ARC members to chip
in: do you agree with the TCR?  (Should it be a TCA instead, or
perhaps reworded?)  If you do agree with the TCR, would you be ready
to vote on the project?

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
