On Thu, Sep 18, 2014 at 8:54 AM, Devananda van der Veen
<devananda....@gmail.com> wrote:
> On Thu, Sep 18, 2014 at 7:45 AM, Flavio Percoco <fla...@redhat.com> wrote:
>> On 09/18/2014 04:09 PM, Gordon Sim wrote:
>>> On 09/18/2014 12:31 PM, Flavio Percoco wrote:
>>>> Zaqar guarantees FIFO. To be more precise, it does so by relying on
>>>> the storage backend's ability to do the same. Depending on the
>>>> storage used, guaranteeing FIFO may carry some performance penalties.
>>>
>>> Would it be accurate to say that at present Zaqar does not use
>>> distributed queues, but holds all queue data in a storage mechanism
>>> of some form, which may internally distribute that data among servers
>>> but presents Zaqar with a consistent data model?
>>
>> I think this is accurate. The queues' distribution depends on the
>> storage's ability to distribute them, and deployers will be able to
>> choose whichever storage works best for them on that basis. I'm not
>> sure how useful this separation is from a user perspective, but I do
>> see its relevance when it comes to implementation details and
>> deployments.
>
> Guaranteeing FIFO, and not using a distributed queue architecture
> *above* the storage backend, are both scale-limiting design choices.
> That Zaqar's scalability depends on the storage back end is not, in my
> opinion, a desirable property for a cloud-scale messaging system,
> because it will prevent use at scales that cannot be accommodated by a
> single storage back end.
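Before qualifying that, it may help to make the architectural distinction concrete. Below is a toy sketch in Python (not Zaqar code; every class and method name here is invented for illustration) of a queue-routing layer sitting *above* several independent storage backends: each queue is pinned to exactly one backend, so per-queue FIFO remains the backend's job, while aggregate capacity grows with the number of backends.

    import hashlib

    class StoragePool:
        """Stand-in for one storage backend (e.g. a single Redis or
        MongoDB deployment). The backend serializes appends per queue,
        which is what actually preserves FIFO."""
        def __init__(self, name):
            self.name = name
            self._queues = {}  # queue name -> ordered list of messages

        def post(self, queue, message):
            self._queues.setdefault(queue, []).append(message)

        def claim(self, queue):
            msgs = self._queues.get(queue)
            return msgs.pop(0) if msgs else None

    class PooledQueues:
        """A thin routing layer above the pools: each queue is hashed
        to exactly one pool, so per-queue ordering is untouched while
        aggregate capacity grows with the number of pools."""
        def __init__(self, pools):
            self._pools = pools

        def _pool_for(self, queue):
            digest = hashlib.sha1(queue.encode()).hexdigest()
            return self._pools[int(digest, 16) % len(self._pools)]

        def post(self, queue, message):
            self._pool_for(queue).post(queue, message)

        def claim(self, queue):
            return self._pool_for(queue).claim(queue)

    q = PooledQueues([StoragePool("a"), StoragePool("b"), StoragePool("c")])
    q.post("alerts", "disk full on node-7")
    print(q.claim("alerts"))  # -> "disk full on node-7"

This is only a sketch of the idea, of course; a real implementation would need a persistent catalog, rebalancing, and failure handling.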
It may be worth qualifying that claim a bit more. While no single instance of any storage back end is infinitely scalable, some of them are really darn fast, and that may be enough for the majority of use cases. It's not outside the realm of possibility that the inflection point [0] where these design choices become performance limitations sits at the very high end of scale-out, e.g. public cloud providers who have the resources to invest further in improving Zaqar.

As an example of what I mean, consider the 99th-percentile response time graphs in Kurt's benchmarks [1]: increasing the number of clients with write-heavy workloads was enough to drive latency from <10 ms to >200 ms with a single service. That latency improved significantly as storage and application instances were added, which is good, and what I would expect. These benchmarks do not (and were not intended to) show the maximal performance of a public-cloud-scale deployment -- but they do show that performance under different workloads improves as additional services are started.

I have no basis for comparing the configuration of the deployment used in those tests to what a public cloud operator might choose to deploy, and presumably such an operator would put significant work into tuning storage and running more instances of each service, shifting that inflection point "to the right". My point, though, is that by depending on a single storage instance, Zaqar has pushed the *ability* to scale out down into the storage implementation. Given my experience scaling SQL and NoSQL data stores (in my past life, before working on OpenStack), I have a knee-jerk skepticism that this approach will result in a public-cloud-scale messaging system.

-Devananda

[0] http://en.wikipedia.org/wiki/Inflection_point -- in this context, I mean the point on the latency-vs-throughput curve where the slope goes from near-zero (latency roughly flat as load grows) to sharply positive (latency climbing rapidly with each additional client)

[1] https://wiki.openstack.org/wiki/Zaqar/Performance/PubSub/Redis
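As an aside on footnote [0]: the knee I'm describing is easy to eyeball on a graph, but it can also be located programmatically. The Python sketch below does that for a series of (clients, 99th-percentile latency) samples; both the data points and the slope threshold are invented for illustration and are NOT taken from Kurt's benchmarks.

    # Hypothetical (clients, p99 latency in ms) samples from a load test.
    samples = [
        (10, 4.0), (20, 5.0), (40, 7.0), (80, 12.0),
        (160, 45.0), (320, 210.0),
    ]

    def find_knee(points, slope_ratio=3.0):
        """Return the last load level before the latency slope jumps by
        at least slope_ratio x, i.e. the 'knee' of the curve."""
        prev_slope = None
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            slope = (y1 - y0) / (x1 - x0)
            if prev_slope and slope / prev_slope >= slope_ratio:
                return x0
            prev_slope = slope
        return None

    print(find_knee(samples))  # -> 80 with the data above

With the sample data, latency is roughly flat up to 80 clients and then climbs steeply, so 80 is where I would say these design choices have started to bite.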