On Fri, Feb 03, 2017 at 03:40:32PM +1030, Ron wrote:
> On Thu, Feb 02, 2017 at 02:22:47PM +0100, Uwe Kleine-König wrote:
> > On Thu, Feb 02, 2017 at 04:08:49PM +1030, Ron via Syslinux wrote:
> >
> > > The use of AI_PASSIVE here is a placebo.  That flag has no effect unless
> > > address was NULL, and if that was true, neither of the hunks here would
> > > actually be executed in the first place.
> > 
> > Right. Today it only has an effect if the first argument to getaddrinfo
> > is NULL. The intension (IIUC) is that it should be used when you plan to
> > feed the result to bind (opposed to connect).
> 
> What do you mean by "Today"?  Both SuSv4 and the Linux man page are
> unequivocal about the _only_ use of that flag being to special case
> a NULL address (meaning 'this machine') to return either the wildcard
> address or the LOOPBACK address.

I had in mind that at some point in the future (say with ipv8 or
802.11t-2042) the flag might mean more. I'd say the intension is to use
AI_PASSIVE if you plan to listen on this address, so it seemed right to
use it.  But I'm willing to restrict the discussion to the removing of
AI_ADDRCONFIG.

> > > Using AI_CANONNAME here should be harmless at worst.  So the only actual
> > > change is to drop AI_ADDRCONFIG - the flag which limits getaddrinfo to
> > > returning only the address families that are actually supported by the
> > > configured interfaces on the system.  And ordinarily that would seem to
> > > be a fairly uncontroversially Good Thing to do, for both connecting and
> > > listening sockets.
> > 
> > The downside of using AI_ADDRCONFIG is that it makes binding to 0.0.0.0
> > (or ::) fail when no interface is up yet.
> 
> That seems to be where we disagree.  I don't see it as a 'downside' that
> if you explicitly say "I want to bind to IPv4 addresses" (or IPv6), and
> you don't actually have any, that this should fail early and loudly to
> warn you about either misconfiguration, or some other more serious
> failure, occurring.

If I want to bind to 0.0.0.0 and no interface (but lo which might be
good enough for me) is up this might be intensionally. This way I might
be able to speed up system boot because I don't have to wait until all
interfaces are up before I start the daemon. Of course this doesn't help
if I configure tftp to listen on an explicit address, but that's a
different problem that's out of scope of my patch.

> If you passed a name instead of a numeric address, you'd only get the
> address families the machine actually supported.  If you pass a numeric
> address in a particular family, you get a sanity check that it's valid.
> 
> If you (personally) don't care about that, just don't pass an explicit
> address.
> 
> The downsides of not using AI_ADDRCONFIG can't be remedied so easily.

IMHO AI_ADDRCONFIG is at best an optimisation for programs that use a
socket to connect(2). It's not that sensible for sockets to listen.

Consider I do:

        tftpd -a tftpd.mycompany.com:tftpd

and tftpd.mycompany.com resolves to both an ipv4 and an ipv6 address. If
the server has an ipv4 problem it just starts to listen on the ipv6
address with AI_ADDRCONFIG. Does this sound wrong only for me?

> > If we can agree in principle I can rework the patch to make one change
> > per patch:
> 
> I'm far less concerned about the format of the patch, than the details
> of what it's actually hoping to achieve.
> 
> If all of the reasons for doing this are just different ways of saying
> "if we do less sanity checking, then misconfiguration and broken tools
> won't annoy me as often" - that's not very compelling.

I tried hard to show good reasons that this is not the motivation for
this change. I don't say "drop all sanity checking", I'm only saying
"don't refuse to work as good as you can".

> Doubly not so when there already is a way you can configure this for
> your own use which does already bypass them.  Simply disabling that
> for everyone and every configuration isn't a good answer.
> 
> > > So unless upstream sees this differently, I still think we'd need to see
> > > some stronger rationale for why that isn't a Good Thing in this particular
> > > case than just "Dropping that flag hides a real bug in NetworkManager".
> > 
> > This is not the (only) reason for me. This is mostly only how it showed
> > up for some people, but still there are more IMHO good reasons to fix
> > it:
> > 
> >  - inconsistent behaviour when no interface is up: -a 0.0.0.0 fails,
> >    -a :: fails, not passing -a doesn't fail and makes tftpd bind to all
> >    interfaces.
> 
> I don't see how that is 'inconsistent'?  If you ask for an explicit
> address (family) you get a sanity check that (support for it) is
> available.  If you say you don't care, then tftpd doesn't either.

If I don't pass -a that's not "I don't care" but "bind to 0.0.0.0 and
::". To make it more explicit:

 - If the admin requests 0.0.0.0 this is denied because there is no ipv4
   address on any interface.
 - If the admin requests :: this is denied because there is no ipv6
   address on any interface.
 - If the admin requests :: and 0.0.0.0 he gets what he wants even if
   all interfaces have neither an ipv4 nor an ipv6 address.

> >  - The "no interface is up" also happens with ifupdown with no auto
> >    interface is used (only hotplug)
> 
> Not defaulting to auto (and systemd not respecting it for a while) were
> both bugs that broke lots of services on lots of people's machine.

Still there are valid use cases of using hotplug. I don't see why tftpd
shouldn't try to cooperate in these cases without restricting users that
have different setups.

> And I already explained to you in #d-d that the established 'solution'
> for that, for services which don't monitor netlink events, is to add a
> hook which restarts them when interfaces appear or disappear, if you
> really want them to bind to interfaces which might genuinely be expected
> to be hotplugged.

This is a more complicated solution for a more complicated problem.
I want to discuss: "tftpd fails to bind to 0.0.0.0 in some situations
even though it could do as requested."

Also listening on 0.0.0.0 includes listening on lo. For listening
sockets it's ridiculous to special case lo. IMHO it's even wrong that in
the case where there is no address on any interface but lo

        >>> import socket
        >>> socket.getaddrinfo("127.0.0.1", "tftp", socket.AF_INET, 
flags=socket.AI_ADDRCONFIG)

fails. So IMHO AI_ADDRCONFIG is just band aid that might be used for
clients(!) that fail to use all return values from getaddrinfo.

> Because if you're not actually using wildcard addresses, then this will
> _still_ fail, even with your patch, if that interface isn't already up.

Right, this is entirely ok and everything else would be wrong. (I didn't
test, but I think also in this case the error message is improved from
"Cannot resolve 8.8.8.8" to "Cannot bind to 8.8.8.8" which IMHO makes
more sense.)

> >  - The "no interface is up" also happens if your laptop has no network
> >    connection during boot
> 
> This doesn't need a link up state to work, it just needs a local
> interface to be brought up with the needed address (family) assigned
> to it.

This is not usual for the common laptop. If it's booted in a train, then
suspend and resumed in the office tftpd isn't running.

> If your only address is assigned by DHCP or something similar, and you
> aren't blocking waiting on that - then as above lots of services are
> going to fail to start if your boot sequence blindly tries to start
> them anyway.

On my machine tftp is the only service having this problem. Pointing to
others that don't behave cooperative isn't an excuse to not cooperate.

> That isn't the fault of, or a bug in, those services.  Your system is
> just misconfigured for what you actually want it to do.

For me it's the other way round. If I request an application to bind to
0.0.0.0 it should try to do this and not be smart with me. Even if the
application thinks I did something wrong, the application should only
complain if I request something impossible.

> >  - It's more robust to try what the admin requested. It is possible even
> >    if no interface is up to bind to 0.0.0.0. So I suggest to do that and
> >    not try to know it better than the admin.
> 
> See above.  Your patch is *taking away* the ability of the admin to
> have the choice to specify exactly what behaviour they want ...

I fail to follow. Can you please remember me, what the failure is I
introduce?

> The behaviour that you want is already supported.  You just need to
> explicitly "request" that, rather than redefine the historical
> default to now mean that as well, taking away any other option that
> some other admin might want to request.

I'm not talking about the default configuration. I just want to make
tftpd able to bind to 0.0.0.0 when requested to do so.

> >  - The error message
> >     cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not 
> > known
> >    is misleading.
> 
> Well, you can file that bug against glibc :)  We're just reporting
> what getaddrinfo and gai_strerror returned ...

getaddrinfo and gai_strerror are fine. ftpd uses them wrongly and so the
error doesn't fit to what tftpd should actually do.

> > > Because it could hide or introduce real problems in other cases too, and
> > > if the bug in NM is fixed, then the only reason I'm so far aware of for
> > > you proposing this patch (based on the discussion on #d-d) also goes away
> > > too ...
> > 
> > See above. Which problems do you see introduced by my patch?
> 
> You're assuming that the only configuration anyone would want is to bind
> to any address, without any checking of what address (families) are
> available, even if the user explicitly specifies them - and that silently
> ignoring any error with being able to do that is a Good Thing.

No, you're talking about that other problem ("What should be the default
binding address of tftpd?"). 
I'm talking about "tftpd fails to bind to 0.0.0.0 in some situations even
though it could do as requested."

> If you're running a toy service on your laptop, that might be ok.

Great, so we can agree on a use case where my patch makes sense. Great.
For me this is good enough to apply the patch given there are no
disadvantages to other use cases.

> If you're really using your laptop as a 'portable' server for some
> special use case, you probably don't want it binding to random wifi
> hotspots wherever you may go with it.

Right. In this case I have to make something different. I can do so even
if tftpd behaves fine when requested to bind to 0.0.0.0. Binding to
0.0.0.0 might not be part of the solution for the portable server
scenario, but being able to do so doesn't restrict me here. Great!

> If you're running a Real Server, then silently ignoring real problems
> is pretty much the opposite of what you'd want happening in most cases.

What is the problem here? I have a Real Server and tftpd is supposed to
be started on 0.0.0.0. Currently it fails to do so if the networking
setup is broken. So after repairing the network configuration I have to
restart tftpd. This is cheap and ok. With my suggested patch I have to
repair the network only and after that tftpd starts serving requests as
its configured to do. That's even better, isn't it? Even if not, it
doesn't make the situation worse here.

So lets agree that there are some situations where the patch is good,
and in all other situations it doesn't hurt. That's a good enough
justification to apply the patch if you ask me.

> The lessons of:
> https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00
> aren't entirely inapplicable here.

I admit I didn't read that completely. The abstract says that the
statement "Be liberal in what you accept, and conservative in what you
send" might have negative consequences to long term maintenance. This
might be true if you expand your application to handle all sort of
broken requests. I don't see this fit my patch, as it is not about
better guessing what the admin requested if he articulates something
incomprehensial. It's just about doing what the admin clearly requested.

> > IMHO "we don't do it right because it might paper over other problems"
> > is a poor reason for not patching. ("I don't need seatbelt or a helmet
> > because if my head gets hurt there is a different problem.")
> 
> I don't think defining "do it right" as "I'd rather you disable checking
> in the software for everyone than configure it locally to do what I want"
> is very helpful here.  Especially not if the only reason you want that
> is because NM has a bug, or you don't want to configure it to properly
> support using this service with hotplugged interfaces.

I think you're talking about the default configuration thing again. If
not I cannot follow.

> By your analogy, what you're saying with this patch is "I want you to
> remove the seatbelt completely, even though I already have the option
> to choose not to wear it myself".

That's wrong. You currently don't have the option to not wear the
seatbelt because -a 0.0.0.0 fails.

> > > Assuming that at some point the NM bug will be fixed, why would we still
> > > want to make this change in this code?
> > 
> > See above. FTR: The NM bug is already reported upstream and a first
> > (though not fixing) patch was suggested by them.
> 
> Cool, though it would be nice if there was a (possibly RC?) debian bug
> also tracking that.
> 
> I think if you want to change the behaviour of this program, you're also
> going to have to get your proposed patch past upstream before I apply it.
> When I first adopted this package, the first thing I did was get all of
> the patches we were carrying upstreamed, and I'm not keen to diverge
> from them again over something like this.

Agreed. That's why I posted the patch to the upstream mailing list.

Assuming we're still not in agreement about this patch, it would be
great to get a third opinion.

Best regards
Uwe

Attachment: signature.asc
Description: PGP signature

Reply via email to