Steve Litt writes:
> I'd like to discuss this. Now, after a year of thought, I still see no
> benefit to "starting servers in parallel" except for boot time.

Because you're thinking of the happy path.

Suppose you have a few dozen servers on three continents, providing a user-facing service, and using something like ZooKeeper (zk) or etcd to coordinate them.

Suppose further that something on the servers does five DNS lookups at startup. On the happy path that takes 5 * 0.008 = 0.04 seconds and who cares, but the worst case is measured in minutes: say, five 90-second timeouts, which is 7.5 minutes. If things start up serially, zk or etcd will begin to initialise about eight minutes after the server started booting. The cluster can be without a quorum for those eight minutes, and if you're lucky the result is just a horrible backlog of failed or blocked transactions. If you're unlucky, the node has been declared unhealthy and the cluster has started copying terabytes of data in order to restore redundancy.
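To put numbers on that, here is a rough sketch of the arithmetic. The names are made up for illustration, and the parallel figure assumes the coordination service does not itself depend on those lookups:

```python
# Figures from the scenario above; all names here are illustrative.
LOOKUPS = 5
FAST_LOOKUP_S = 0.008   # happy path: one quick DNS answer
TIMEOUT_S = 90          # worst case: one lookup that times out


def serial_wait(per_lookup_s, lookups=LOOKUPS):
    """Delay before the next service starts when the lookups block the boot."""
    return per_lookup_s * lookups


# Assumption: started in parallel, an independent coordination service
# does not wait for the lookups at all.
PARALLEL_WAIT_S = 0.0

happy_path = serial_wait(FAST_LOOKUP_S)
worst_serial = serial_wait(TIMEOUT_S)

print(f"happy path, serial: {happy_path:.2f} s")
print(f"worst case, serial: {worst_serial} s ({worst_serial / 60:.1f} min)")
print(f"worst case, parallel: {PARALLEL_WAIT_S} s")
```

The happy path hides the difference entirely. Only the timeout case shows what serial startup costs.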

For want of an X, Y. In real life ;)

BTW, systemd's approach to parallelism isn't particularly good for this sort of service. Parallelism is good, but not just any kind. systemd assumes it can start services in the order given by a dependency DAG, but in reality that DAG is not knowable on any single host. For example: Service X on nodes 1-A8 needs service Y, which runs on nodes 3-5 and 12-15 today. The only sensible approach is to start everything and require that all services behave robustly when a dependency isn't ready.
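One way a service can "behave robustly" is to keep probing its dependency with a backoff instead of failing at the first refusal. A minimal sketch, assuming a caller-supplied probe() that returns True once the dependency answers; the function name, parameters and defaults are hypothetical, not taken from any particular init system or library:

```python
import time


def wait_for_dependency(probe, attempts=5, base_delay=0.5, max_delay=30.0,
                        sleep=time.sleep):
    """Call probe() until it returns True, doubling the pause between tries.

    Returns True as soon as the dependency is ready, or False once the
    attempts are used up. The caller decides what to do then (keep serving
    in degraded mode, try again later, and so on).
    """
    delay = base_delay
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            # Connection refused, DNS failure, timeout: not ready yet.
            pass
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False


if __name__ == "__main__":
    # Stand-in for "is service Y reachable?": fails twice, then succeeds.
    state = {"calls": 0}

    def fake_probe():
        state["calls"] += 1
        return state["calls"] >= 3

    print(wait_for_dependency(fake_probe, base_delay=0.01))
```

The point of the design is that the start order stops mattering: X can come up before Y, notice that Y is missing, and carry on once Y appears on whichever nodes it runs on today.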

Arnt

_______________________________________________
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
