On 29/8/2023 8:35 am, Steve Thompson wrote:
What happens if a WINTEL server running MQ buys the farm? Those
inflight transactions going through that server may time out and have
to be re-driven. Is this considered an outage? Not if you have a
second one handling the load and it takes over. But that one or 10(?)
users may see an error message. Does that count as an outage if the
user only loses a few seconds in getting an answer? Or a Pharmacy
getting info? Or an OR getting info on drug interactions?
Distributed systems are very different beasts. Dealing with network
partitions led to the CAP theorem. Apache Kafka is a popular
message broker on distributed systems and it's highly available if you
run it on at least 3 nodes, which can tolerate the loss of 1 broker; 5
nodes can tolerate the loss of 2. Orchestration platforms such as
Kubernetes and OpenShift make it quite easy to deploy clusters, even
using availability zones to replicate to a remote data center. All
brokers replicate with each other and are coordinated on the control
plane using a consensus algorithm like Raft. As a mainframe guy I was
blown away that anybody would find eventual consistency acceptable, but
they do.
https://en.wikipedia.org/wiki/CAP_theorem
https://en.wikipedia.org/wiki/Raft_(algorithm)
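
The 3-node and 5-node figures above fall out of majority-quorum
arithmetic. Here is a rough sketch in Python, assuming a Raft-style
majority quorum; the function names are mine for illustration and are
not part of any Kafka API. How many broker losses the data itself
survives also depends on the topic replication settings, which this
does not model.

```python
def quorum_size(nodes: int) -> int:
    """Smallest majority of a cluster with the given node count."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen to match the 3- and 5-node cases.
    for n in (3, 4, 5):
        print(f"{n} nodes: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that going from 3 to 4 nodes buys nothing under this assumption,
which is why odd cluster sizes are the usual recommendation.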
z is the king of CA systems as it's not susceptible to network
partitions. That's why I find it odd that people would want to run
systems like Kafka on z/OS when Kafka's architecture is designed to run
on unreliable commodity hardware.
Need some perspective.
Steve Thompson
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN