On Thu, 28 Jun 2007 12:06:25 -0400
James Carlson <[EMAIL PROTECTED]> wrote:

> Michael Hunter writes:
> > What you suggest here is that we move the solution into the
> > application.  That seems heavy-handed to me.  The first part
> > (non-blocking name resolution) might be something that a small number
> > of name-resolution users want to use, but pushing that kind of
> > asynchronous API onto everybody doesn't seem reasonable to me.
> 
> If you're going to be agile in the face of name service change, I
> don't see a good alternative.
> 
> However, I wasn't necessarily suggesting pushing it into "the
> application."  In this case, we're talking about inetd services, so I
> was talking about pushing it into inetd itself -- a single place where
> the problem can be dealt with, rather than the dozens or hundreds of
> clients that inetd may have.

If it were just inetd that we needed to make more resilient, then I'm
with you.  But of your 3 points, the last one was to have name-service
clients react to change without having to blow the world away and
restart it.  That one seems to me like it would push the burden onto
many of the clients.

> 
> > Instead, you could have a local name server that understands the
> > policies used to choose which type of name service to use, etc.  All
> > the local applications see this local name server as the only thing
> > that exists, and they continue to survive as name services are
> > changed.  I think something the user could sit poll()-blocked on
> > would be my choice, as it gets rid of the need for a special
> > non-blocking API.  But anything we could wrap an API around would work.
> 
> I'm not sure I understand the specifics of what you're suggesting.
> 
> If you're talking about running a local caching NIS or DNS server
> (perhaps alongside or instead of nscd) in order to help guarantee
> availability, then I think I'm with you.  We should have done
> something like that long ago.

Kind of.  A caching, translating server that hides the underlying
mechanism it uses.  But a local caching server for each name service
also works.

> 
> Otherwise, I don't understand what you're suggesting or how it fixes
> the underlying problem: trying to translate a "service name" into a
> port number may take a very long time and may not succeed at all, and
> yet we've got a single service that's dependent on doing this for
> dozens or hundreds of separate entries.  That doesn't sound like the
> right design.

For inetd, I agree that the inability to translate one service name
shouldn't hang up other services that might already be up or that might
be specified numerically.  But this bug is about inetd-upgrade.

> 
> > > I think we need a refactoring.
> > 
> > This is wider than just name service, although that is a big part of
> > it.  Unix code tends to squirrel away all kinds of information about
> > the node it's on, interfaces, etc., that is _usually_ stable.  The
> > socket API wasn't exactly designed to discourage this.  But then when
> > users try to build highly reliable systems on top of it, they are
> > surprised when things break?
> > 
> > But I don't think that type of refactoring is demanded by this bug.
> > This is inetd-upgrade.  It should run no more often than the number
> > of software installations that contain it.  It's a conversion step
> > from the old world into the new.  Solving the bigger name-service
> > problem seems like a big hammer to demand to make this work a little
> > better.
> 
> It breaks frequently enough to make Solaris look bad, and there are a
> lot of vaguish bugs in the database that likely have this same root
> cause.  I think it needs to be a lot more reliable than it is today.

Agreed.

> 
> Without regard to the asynchronous nature of name service changes, I
> think the idea of expressing a set of individual dependencies with a
> single rolled-up state is wrong.  It means that administrators are
> *UNABLE* to configure the system reasonably for high-availability of
> particular services.

Yes.

> 
> In other words, it's common practice to identify particular services
> as being "important," and then to place static entries in
> /etc/services (and other files) and set the nsswitch.conf entry to
> point to "files nis."  This means that the service is guaranteed to
> resolve correctly, no matter what else is going on, and the system
> isn't brought to its knees by a temporary network outage.

Yes.
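
Concretely, that practice looks something like this (illustrative
entries, not taken from any particular system):

```
# /etc/services -- static entry so the "important" service always
# resolves from local files, even during a network outage
telnet          23/tcp

# /etc/nsswitch.conf -- consult local files first, then fall back to NIS
services:       files nis
```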

> 
> The way SMF was used here breaks this.  It breaks it by making all
> services fall apart when _any_ of them cannot be resolved.  That seems
> like a mistake.
> 
> Whether you get there with this fix or not is another matter.  "Out of
> scope for now" doesn't seem like a wrong answer to me.

That's what I think is the right answer for inetd-upgrade (inetconv).  I
think it's a P2 bug against inetd if it ever hangs on name resolution.
It's probably a P2-ish RFE to clean up the whole "name service can
traffic-jam a bunch of valid services" issue.

                        mph

> 
> -- 
> James Carlson, Solaris Networking              <[EMAIL PROTECTED]>
> Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
> MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
_______________________________________________
networking-discuss mailing list
[email protected]
