Hm, I'm still struggling to understand why this is a problem as an
option for operators, sorry if I'm being dense!

> The server state is a perfect example of huge breakage that would be
> caused by such a change. In short, you start with the names from the
> configuration file, which are looked up in the state file, then the
> names are changed, and once dumped, they are different. So upon next
> reload, nothing matches at all.

As long as the configuration file is updated before updating the name
this should be fine, shouldn't it? I was under the impression that
best practice for using dynamic HAProxy server features was always to
write out the modified configuration file when sending socket
commands. That way reloads are consistent, and if someone really needs
to know what exactly is happening right now in the running process
that's what show stat or show servers state is for? I'm also
struggling to see why the state file is an issue, my understanding is
that you save the server state file before a reload, so why would you
send a name update after you save the server sate?

Fwiw I've tested the patch and "show server state" correctly displays
the updated name [1], but I don't think that's what you are concerned
about...

> For me it's a more general problem of inconsistency between the configuration
> and what is running. For example if you have this :\

I very much agree, I just thought that we were already going down this
path with dynamically updating addrs and using server templates (which
introduce config-runtime drift as well). As long as whatever is doing
the socket manipulation (in our case synapse), does file configuration
changes first shouldn't this be reasonably ok?

>
>     backend foo
>          use-server srv1 if rule1
>          use-server srv2 if rule2
>          server srv1 1.1.1.1:80
>          server srv2 2.2.2.2:80
>
> Do you know what happens once you'll switch the names over the CLI ? (hint:
> I know).

My understanding is that we'd continue sending to the original
servers. I do see this would be bad, but isn't the requirement on the
user to not do this, specifically if you have use-server rules then
you have to reload the config to pick up changes in backends?
Furthermore, rule1 or rule2 could refer to hard coded IP addresses
which would also be wrong if we updated the addr of the server with
"set server addr" as it is. Also thinking about this more, if
use-server rules dynamically searched for the server instead of
storing that might be a nifty way to do like primary replica failover,
something like:

backend foo
    use-server leader if write_operation
    server leader 1.1.1.1:1234
    server follower1 2.2.2.2:1234
    server follower2 3.3.3.3:1234

Then to do a leader failover you would just set the leader to a
follower (during which time you 503 write requests), and promote a
follower to a leader. I think named backends with a use_backend ACL is
probably the _better_ way to do this, but it's an interesting thought.

> And it will even be inconsistent with what use_backend does because
> use_backend *can* resolve at runtime instead of config time. The logs will
> return the current name depending on the last change, possibly conflicting
> with what is in the configuration. Another example :
>
>     backend foo1
>         server srv1 1.1.1.1:80 track foo2/srv1
>         server srv2 2.2.2.2:80 track foo2/srv2
>
>     backend foo2
>         server srv1 1.1.1.1:80 check
>         server srv2 2.2.2.2:80 check
>
> Here after a renaming of the tracked servers, it becomes a mess, you'll
> constantly have to know what's taken at startup time and what's taken at
> runtime.

I agree, but if an administrator is using features like track or
use-server I think they should pick up a server change with a reload
rather than with the dynamic socket. Similar to if they are using ip
or port based ACLs they should probably reload instead of setting the
address dynamically?

> On the opposite, there's zero backwards compatibility concerns to have
> with adding a description field because we don't have it for servers for
> now, only for backends and frontends.

I see, ok I will see if this can work.

>> and I'm still concerned about how does
>> automation go from address to name going forward.
>
> Just a question, wouldn't your requirement in fact be to lookup a server
> by its address ? I mean, wouldn't it be easier for your use case ? If so
> I'm pretty sure that's something we could easily apply. Either we apply
> this to the first server that matches the address, or we apply it to all
> those which match, depending on the difficulty.
>
> For example wouldn't it be easier for you to be able to do :
>
>    set server addr bk/@1.1.1.1:80 2.2.2.2:80
>
> or :
>
>    disable server bk/@1.1.1.1:80
>
> ?

Interesting syntax, I think if we added that to all the set server
commands it could work (we'd have to port from using enable/disable
server to set server state, but that could work) for all but the
strangest edge case in synapse. For context Synapse generates HAProxy
server names from host:port + a user provided "name" (which is
typically the fqdn, but some users use it to double register the same
physical server, which this syntax would no longer allow). It does
seem to me though like we're doing a lot more work then we need (but
that's probably because I don't understand why updating the name is
bad yet heh).

> I'm pretty sure you have a very valid use case that we probably need to
> discuss more to find the largest intersection with what can reasonably be
> done here without breaking existing setups. We know that the management
> socket needs to be significantly improved for some maintenance operations,
> so it's even more important to know what your ideal usage scenario would
> be to see how it could be mapped here.

The usage scenario is microservices, namely we have a bunch of service
instances (containers) which are constantly churning around physical
machines as applications autoscale, deploy, fail, get preempted, get
bin-packed onto other machines etc ... This is pretty common in any
Mesos or Kubernetes like service deployment. Right now we manage the
churn with synapse, which gets dynamic push updates from a service
registry (zookeeper, dns, etc), and automatically reconfigures (within
seconds) and reloads (within minutes) HAProxy across a fleet of a few
thousand machines. To give you an idea of the kind of churn, synapse
handles about 10-20 thousand socket updates (mostly servers going in
and out of maintenance) and about 200 reloads per day per machine. We
mostly reload to pick up new servers, and although synapse tries to
group server changes into 1 minute groups (so we aggregate all the
changes within 1 minute) we still get about 200 restarts per day and
for something simple like scaling up a backend by one server we have
to wait 1 minute for HAProxies everywhere to pick that up. Ideally we
can add servers just as fast as we can down/up them, which I thought
might be possible with the new dynamic "set server addr" directive,
but if we can't change the name then synapse can't map it's notion of
a server to HAProxy's notion of a server and I don't know how to make
it work. Synapse doesn't support fancy per server features like
set-server and server tracking since they're really not needed in the
microservice world. You just need a backend pool that dynamically
updates as fast as possible as containers migrate around machines.
>
> It's unrelated, I meant the "char *old_name" in the middle of the code.
Ah, right c. I've attached a modified patch which is a little safer
per your suggestions, but it sounds like you're against updating the
name so will probably throw that out. Also you can find an example of
what show servers state looks like before and after dynamically
updating here [1].

I'm still not quite understanding why allowing changing the name is
going to cause breakage for users that are choosing to change the name
dynamically, but I do really appreciate all the context you're giving
me :-)

-Joey

[1] https://gist.github.com/jolynch/43fea470b9b5ab86bfac4969ee035fa6

Attachment: 0001-MINOR-cli-ability-to-change-a-server-s-name.patch
Description: Binary data

Reply via email to