Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
In response to Gil Yehuda's comments on MongoDB and the AGPL (here http://lists.openstack.org/pipermail/openstack-dev/2014-March/030510.html), I understand the concern about the AGPL. But in this case it's completely, absolutely unfounded. As mentioned earlier, MongoDB Inc. wants people to use MongoDB, the project. That's why we wrapped the server code (AGPL) in an Apache license (drivers). Basically, for 99.999% of the world's population, you can use MongoDB under the cover of the Apache license. If you'd like more assurance, we're happy to provide it. We want people using the world's most popular NoSQL database with the world's most popular open source cloud (OpenStack). I think our track record on this is 100% in the affirmative.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Excerpts from Ozgur Akan's message of 2014-03-20 14:18:27 -0700: > Hi, > > Marconi manages its own sharding (doesn't rely on MongoDB's own sharding) > in order to have more control over where data is stored. Sharding is done > based on project_id + queue_id and stored in a catalog. Since Marconi > manages its own shards, it can use the same logic with any storage. If it > were Redis, scaling wouldn't be any different than having MongoDB as the > backend. > Cool. Said catalog is duplicated globally then? > Marconi (with some work) can also offer different backends at the same time > to provide different performance / durability options to its users. And > users here are not operators but actual customers/users that are using the > queuing service. > Right, sort of like with AMQP how you can ask for reliable delivery or not? > MongoDB seems to be a good choice as a storage backend as it doesn't need > VRRP during failover, which makes it much easier to deploy on top of > OpenStack compute at times when moving a VIP can't be done via VRRP. MySQL, > for example, would require a VIP in order to survive a failed master. MongoDB > is relatively easier to manage (scale) when you have to migrate whole data > from one cluster to another. > Using Galera, MySQL doesn't require a VIP approach either. > I don't think an RDBMS is a bad idea but it might not be practical. MySQL without > the SQL interface can be fast; > https://blogs.oracle.com/mysqlinnodb/entry/mysql_5_7_3_deep > The SQL isn't the only problem, and speed isn't the same as scalability (Fast: Ferrari, Scalable: Bullet Train). You also have MVCC. In InnoDB, just inserting, updating, and deleting millions of tiny rows in a concurrent fashion will tie up threads and mutexes, and bog down InnoDB with millions of tiny transactions. The linked blog is dealing entirely with scaling excessive tiny reads, which is important, but not really the problem Marconi faces.
There's no point in discussing how to try and make MySQL or any other MVCC database work well as a queue backend. IMO, look at how Qpid and RabbitMQ do durable messaging... that is the model to copy. But that is why the original email asked "why isn't Marconi just provisioning brokers?": those brokers have already implemented this, and it seems wasteful to try and do it again.
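[Editor's sketch] The catalog-based sharding Ozgur describes above can be sketched roughly as follows. This is a toy illustration, not Marconi's actual code; the class and names are hypothetical. The point is that every data-plane operation consults a (project_id, queue) -> shard catalog first, so the routing logic is the same whether the shard behind it is MongoDB or Redis:

```python
# Illustrative shard catalog keyed on (project_id, queue_name).
# Hypothetical names; not Marconi's real implementation.

class ShardCatalog:
    def __init__(self, shards):
        self.shards = shards   # shard name -> connection URI
        self.catalog = {}      # (project_id, queue) -> shard name

    def register(self, project_id, queue, shard):
        # A queue is pinned to a shard when it is created.
        if shard not in self.shards:
            raise KeyError(shard)
        self.catalog[(project_id, queue)] = shard

    def lookup(self, project_id, queue):
        # Data-plane operations resolve the shard through the catalog,
        # independent of which storage backend the shard runs.
        return self.shards[self.catalog[(project_id, queue)]]

catalog = ShardCatalog({"shard-a": "mongodb://db1", "shard-b": "redis://db2"})
catalog.register("proj-1", "billing-events", "shard-a")
catalog.register("proj-2", "billing-events", "shard-b")
print(catalog.lookup("proj-1", "billing-events"))  # mongodb://db1
```

Note that Clint's question ("Said catalog is duplicated globally then?") is exactly about where this mapping itself lives and how it is replicated.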
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Hi, Marconi manages its own sharding (doesn't rely on MongoDB's own sharding) in order to have more control over where data is stored. Sharding is done based on project_id + queue_id and stored in a catalog. Since Marconi manages its own shards, it can use the same logic with any storage. If it were Redis, scaling wouldn't be any different than having MongoDB as the backend. Marconi (with some work) can also offer different backends at the same time to provide different performance / durability options to its users. And users here are not operators but actual customers/users that are using the queuing service. MongoDB seems to be a good choice as a storage backend as it doesn't need VRRP during failover, which makes it much easier to deploy on top of OpenStack compute at times when moving a VIP can't be done via VRRP. MySQL, for example, would require a VIP in order to survive a failed master. MongoDB is relatively easier to manage (scale) when you have to migrate whole data from one cluster to another. I don't think an RDBMS is a bad idea but it might not be practical. MySQL without the SQL interface can be fast; https://blogs.oracle.com/mysqlinnodb/entry/mysql_5_7_3_deep best wishes, Oz On Thu, Mar 20, 2014 at 2:56 PM, Clint Byrum wrote: > Excerpts from Flavio Percoco's message of 2014-03-19 03:01:19 -0700: > > FWIW, I think there's value in having an sqlalchemy driver. It's > > helpful for newcomers, it integrates perfectly with the gate, and I > > don't want to impose on other folks what they should or shouldn't use in > > production. Marconi may be providing a data API but it's still > > non-opinionated and it wants to support other drivers - or at least > provide > > a nice way to implement them. Working on sqlalchemy instead of amqp (or > > redis) was decided in the incubation meeting. > > > > But again, it's an optional driver that we're talking about here. 
As > > of now, our recommended driver is mongodb's and as I already mentioned > > in this email, we'll start working on an amqp one, which will likely > > become the recommended one. There's also support for redis. > > > > As already mentioned, we have plans to complete the redis driver and > > write an amqp based one and let them both live in the code base. > > Having support for different storage drivers makes marconi's sharding > > feature more valuable. > > > > > > Just to steer this back to technical development discussions a bit: > > I suggest the sqla driver be removed. It will never be useful as a queue > backend. It will confuse newcomers because they'll see the schema and > think that it will work and then use it, and then they find out that SQL > is just not suitable for queueing about the time that they're taking a > fire extinguisher to their rack. > > "Just use Redis" is pretty interesting as a counter to the concerns about > MongoDB's license situation. Redis, AFAIK, does not have many of the > features that make MongoDB attractive for backing a queue. The primary > one that I would cite is sharding. While MongoDB will manage sharding > for you, Redis works more like Memcached when you want to partition[1]. > This is particularly problematic for an operational _storage_ product > as that means if you want to offline a node, you are going to have to > consider what kind of partitioning Marconi has used, and how it will > affect the availability and durability of the data. > > All of this to say, if Marconi is going to be high scale, I agree that > SQL can't be used, and even that MongoDB, on technical abilities alone, > makes some sense. But I think what might be simpler is if Marconi just > shifted focus to make the API more like AMQP, and used AMQP on its > backend. This allows cloud operators to deploy what they're used to for > OpenStack, and would still give users something they're comfortable with > (an HTTP API) to consume it. 
> > [1] http://redis.io/topics/partitioning
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Excerpts from Flavio Percoco's message of 2014-03-19 03:01:19 -0700: > FWIW, I think there's value in having an sqlalchemy driver. It's > helpful for newcomers, it integrates perfectly with the gate, and I > don't want to impose on other folks what they should or shouldn't use in > production. Marconi may be providing a data API but it's still > non-opinionated and it wants to support other drivers - or at least provide > a nice way to implement them. Working on sqlalchemy instead of amqp (or > redis) was decided in the incubation meeting. > > But again, it's an optional driver that we're talking about here. As > of now, our recommended driver is mongodb's and as I already mentioned > in this email, we'll start working on an amqp one, which will likely > become the recommended one. There's also support for redis. > > As already mentioned, we have plans to complete the redis driver and > write an amqp based one and let them both live in the code base. > Having support for different storage drivers makes marconi's sharding > feature more valuable. > > Just to steer this back to technical development discussions a bit: I suggest the sqla driver be removed. It will never be useful as a queue backend. It will confuse newcomers because they'll see the schema and think that it will work and then use it, and then they find out that SQL is just not suitable for queueing about the time that they're taking a fire extinguisher to their rack. "Just use Redis" is pretty interesting as a counter to the concerns about MongoDB's license situation. Redis, AFAIK, does not have many of the features that make MongoDB attractive for backing a queue. The primary one that I would cite is sharding. While MongoDB will manage sharding for you, Redis works more like Memcached when you want to partition[1]. 
This is particularly problematic for an operational _storage_ product as that means if you want to offline a node, you are going to have to consider what kind of partitioning Marconi has used, and how it will affect the availability and durability of the data. All of this to say, if Marconi is going to be high scale, I agree that SQL can't be used, and even that MongoDB, on technical abilities alone, makes some sense. But I think what might be simpler is if Marconi just shifted focus to make the API more like AMQP, and used AMQP on its backend. This allows cloud operators to deploy what they're used to for OpenStack, and would still give users something they're comfortable with (an HTTP API) to consume it. [1] http://redis.io/topics/partitioning
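[Editor's sketch] The operational point above, that memcached-style client-side partitioning makes offlining a node painful, can be demonstrated with a toy example (my own construction, not how Marconi or Redis actually place data): with naive hash-modulo placement, removing one node remaps a large fraction of existing keys, which is exactly the availability and durability concern Clint raises.

```python
# Naive hash % N placement, memcached-style. Removing a node changes N,
# so most keys land on a different node afterwards.
import hashlib

def pick_node(key, nodes):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = ["redis-1", "redis-2", "redis-3"]
keys = ["queue-%d" % i for i in range(1000)]

before = {k: pick_node(k, nodes) for k in keys}
after = {k: pick_node(k, nodes[:-1]) for k in keys}  # offline redis-3

moved = sum(1 for k in keys if before[k] != after[k])
print("%d of %d keys remapped" % (moved, len(keys)))
```

With uniform hashing, roughly two thirds of the keys move when going from three nodes to two. Consistent hashing reduces this, but the client (here, Marconi) still owns the rebalancing problem, whereas server-managed sharding (MongoDB's approach) hides it from the application.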
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Let me start by saying that I want there to be a constructive discussion around all this. I've done my best to keep my tone as non-snarky as I could while still clearly stating my concerns. I've also spent a few hours reviewing the current code and docs. Hopefully this contribution will be beneficial in helping the discussion along. For what it's worth, I don't have a clear understanding of why the Marconi developer community chose to create a new queue rather than an abstraction layer on top of existing queues. While my lack of understanding there isn't a technical objection to the project, I hope they can address this in the aforementioned FAQ. The reference storage implementation is MongoDB. AFAIK, no integrated projects require an AGPL package to be installed, and from the discussions I've been part of, that would be a show-stopper if Marconi required MongoDB. As I understand it, this is why sqlalchemy support was required when Marconi was incubated. Saying "Marconi also supports SQLA" is disingenuous because it is a second-class citizen, with incomplete API support, is clearly not the recommended storage driver, and is going to be unusable at scale (I'll come back to this point in a bit). Let me ask this. Which back-end is tested in Marconi's CI? That is the back-end that matters right now. If that's Mongo, I think there's a problem. If it's SQLA, then I think Marconi should declare any features which SQLA doesn't support to be optional extensions, make SQLA the default, and clearly document how to deploy Marconi at scale with a SQLA back-end.

[drivers]
storage = mongodb

[drivers:storage:mongodb]
uri = mongodb://localhost:27017/marconi

http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/etc/marconi/marconi.conf.txt.gz

On a related note, I see that marconi has no gating integration tests. 
https://review.openstack.org/#/c/81094/2 But then again that is documented in https://wiki.openstack.org/wiki/Marconi/Incubation/Graduation#Legal_requirements We have a devstack-gate job running and will be making it voting this week. Of the non-gating integration test job, I only see one marconi test being run: tempest.api.queuing.test_queues.TestQueues.test_create_queue http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/testr_results.html.gz I have a separate thread started on the graduation gating requirements w.r.t Tempest. The single test we have on Tempest was a result of the one-liner requirement 'Project must have a basic devstack-gate job set up'. The subsequent discussion in the openstack qa meeting led me to believe that the 'basic' job we have is good enough. Please refer to the email 'Graduation Requirements + Scope of Tempest' for more details regarding this. But that does not mean that 'the single tempest test' is all we have to verify the Marconi functionality. We have had a robust test suite (unit & functional tests – with lots of positive & negative test scenarios) for a very long time in Marconi. See http://logs.openstack.org/33/81033/2/check/gate-marconi-python27/35822df/testr_results.html.gz These tests are run against a sqlite backend. The gating tests have been using the sqlalchemy driver ever since we have had it. Hope that clarifies! - Malini Then there's the db-as-a-queue antipattern, and the problems that I have seen result from this in the past... I'm not the only one in the OpenStack community with some experience scaling MySQL databases. Surely others have their own experiences and opinions on whether a database (whether MySQL or Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall over from resource contention. I would hope that those members of the community would chime into this discussion at some point. Perhaps they'll even disagree with me! 
A quick look at the code around claim (which, it seems, will be the most commonly requested action) shows why this is an antipattern. The MongoDB storage driver for claims requires _four_ queries just to get a message, with a serious race condition (but at least it's documented in the code) if multiple clients are claiming messages in the same queue at the same time. For reference: https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119 The SQLAlchemy storage driver is no better. It's issuing _five_ queries just to claim a message (including a query to purge all expired claims every time a new claim is created). The performance of this transaction under high load is probably going to be bad... https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83 Lastly, it looks like the Marconi storage drivers assume the storage back-end to be infinitely scalable. AFAICT, the mongo storage driver supports mongo's native sharding -- which I'm happy to see -- but the SQLA driver does not appear to support anything equivalent for other back-ends, e.g. MySQL.
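[Editor's sketch] To make the multi-statement claim pattern concrete, here is a toy version in plain SQL (an illustrative schema of my own, deliberately simplified; it is not Marconi's actual SQLAlchemy code). Even wrapped in one transaction it is several round trips per claim, and without row locking two concurrent clients can SELECT the same unclaimed rows between statements 3 and 4, which is the race being described:

```python
# Toy db-as-a-queue claim: purge expired claims, create the claim,
# find unclaimed messages, mark them claimed. Four statements per claim.
import sqlite3
import time
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE claims (id TEXT PRIMARY KEY, expires REAL)")
db.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, claim_id TEXT)")
db.executemany("INSERT INTO messages (body) VALUES (?)",
               [("msg-%d" % i,) for i in range(5)])

def claim_messages(db, ttl=60, limit=2):
    claim_id = uuid.uuid4().hex
    now = time.time()
    with db:  # one transaction, but still four statements
        db.execute("DELETE FROM claims WHERE expires < ?", (now,))            # 1: purge
        db.execute("INSERT INTO claims VALUES (?, ?)", (claim_id, now + ttl)) # 2: create
        rows = db.execute("SELECT id FROM messages WHERE claim_id IS NULL "
                          "LIMIT ?", (limit,)).fetchall()                     # 3: find
        ids = [r[0] for r in rows]
        db.executemany("UPDATE messages SET claim_id = ? WHERE id = ?",       # 4: mark
                       [(claim_id, i) for i in ids])
    return claim_id, ids

cid, ids = claim_messages(db)
print(ids)
```

Contrast this with a broker like RabbitMQ or Qpid, where delivering and acknowledging a message is a single protocol operation against an in-memory structure rather than a multi-statement MVCC transaction.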
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On 20/03/14 09:09, Mark McLoughlin wrote: On Wed, 2014-03-19 at 12:37 -0700, Devananda van der Veen wrote: Let me start by saying that I want there to be a constructive discussion around all this. I've done my best to keep my tone as non-snarky as I could while still clearly stating my concerns. I've also spent a few hours reviewing the current code and docs. Hopefully this contribution will be beneficial in helping the discussion along. Thanks, I think it does. Very helpful, Thanks! For what it's worth, I don't have a clear understanding of why the Marconi developer community chose to create a new queue rather than an abstraction layer on top of existing queues. While my lack of understanding there isn't a technical objection to the project, I hope they can address this in the aforementioned FAQ. The reference storage implementation is MongoDB. AFAIK, no integrated projects require an AGPL package to be installed, and from the discussions I've been part of, that would be a show-stopper if Marconi required MongoDB. As I understand it, this is why sqlalchemy support was required when Marconi was incubated. Saying "Marconi also supports SQLA" is disingenuous because it is a second-class citizen, with incomplete API support, is clearly not the recommended storage driver, and is going to be unusable at scale (I'll come back to this point in a bit). Let me ask this. Which back-end is tested in Marconi's CI? That is the back-end that matters right now. If that's Mongo, I think there's a problem. If it's SQLA, then I think Marconi should declare any features which SQLA doesn't support to be optional extensions, make SQLA the default, and clearly document how to deploy Marconi at scale with a SQLA back-end. Then there's the db-as-a-queue antipattern, and the problems that I have seen result from this in the past... I'm not the only one in the OpenStack community with some experience scaling MySQL databases. 
Surely others have their own experiences and opinions on whether a database (whether MySQL or Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall over from resource contention. I would hope that those members of the community would chime into this discussion at some point. Perhaps they'll even disagree with me! A quick look at the code around claim (which, it seems, will be the most commonly requested action) shows why this is an antipattern. The MongoDB storage driver for claims requires _four_ queries just to get a message, with a serious race condition (but at least it's documented in the code) if multiple clients are claiming messages in the same queue at the same time. For reference: https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119 The SQLAlchemy storage driver is no better. It's issuing _five_ queries just to claim a message (including a query to purge all expired claims every time a new claim is created). The performance of this transaction under high load is probably going to be bad... https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83 Lastly, it looks like the Marconi storage drivers assume the storage back-end to be infinitely scalable. AFAICT, the mongo storage driver supports mongo's native sharding -- which I'm happy to see -- but the SQLA driver does not appear to support anything equivalent for other back-ends, eg. MySQL. This relegates any deployment using the SQLA backend to the scale of "only what one database instance can handle". It's unsuitable for any large-scale deployment. Folks who don't want to use Mongo are likely to use MySQL and will be promptly bitten by Marconi's lack of scalability with this back end. 
While there is a lot of room to improve the messaging around what/how/why, and I think a FAQ will be very helpful, I don't think that Marconi should graduate this cycle because: (1) support for a non-AGPL-backend is a legal requirement [*] for Marconi's graduation; (2) deploying Marconi with sqla+mysql will result in an incomplete and unscalable service. It's possible that I'm wrong about the scalability of Marconi with sqla + mysql. If anyone feels that this is going to perform blazingly fast on a single mysql db backend, please publish a benchmark and I'll be very happy to be proved wrong. To be meaningful, it must have a high concurrency of clients creating and claiming messages with (num queues) << (num clients) << (num messages), and all clients polling on a reasonably short interval, based on whatever the recommended client-rate-limit is. I'd like the test to be repeated with both Mongo and SQLA back-ends on the same hardware for comparison. My guess (and it's just a guess) is that the Marconi developers almost wish their SQLA driver didn't exist after reading your email because of the confusion it's causing. My understanding is that the SQLA driver is not intended for production usage. Yeah, pretty much the feeling now! :D
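[Editor's sketch] The benchmark shape Devananda asks for can be sketched as a load-generator skeleton (my own construction, not an actual Marconi benchmark): a small set of queues, many more concurrent clients than queues, and far more messages than clients, with every client polling in a tight loop. Here the storage backend is stubbed with in-process queues purely to show the workload shape; a real run would point the clients at Marconi's HTTP API:

```python
# Skeleton of the (num queues) << (num clients) << (num messages)
# workload. The queue.Queue objects stand in for the system under test.
import queue
import threading

NUM_QUEUES, NUM_CLIENTS, NUM_MESSAGES = 4, 64, 10000
assert NUM_QUEUES < NUM_CLIENTS < NUM_MESSAGES

queues = [queue.Queue() for _ in range(NUM_QUEUES)]
for i in range(NUM_MESSAGES):
    queues[i % NUM_QUEUES].put(i)  # pre-load the messages

results = [0] * NUM_CLIENTS

def client(idx):
    # Poll every queue; exit once a full pass claims nothing.
    # Queues only drain, so an empty pass means the run is over.
    claimed = 0
    while True:
        got = False
        for q in queues:
            try:
                q.get_nowait()
                claimed += 1
                got = True
            except queue.Empty:
                pass
        if not got:
            results[idx] = claimed
            return

threads = [threading.Thread(target=client, args=(i,)) for i in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))  # every message claimed exactly once
```

Timing this loop against a real sqla+mysql backend versus a Mongo backend on identical hardware, as proposed above, is what would settle the scalability question.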
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On Wed, 2014-03-19 at 12:37 -0700, Devananda van der Veen wrote: > Let me start by saying that I want there to be a constructive discussion > around all this. I've done my best to keep my tone as non-snarky as I could > while still clearly stating my concerns. I've also spent a few hours > reviewing the current code and docs. Hopefully this contribution will be > beneficial in helping the discussion along. Thanks, I think it does. > For what it's worth, I don't have a clear understanding of why the Marconi > developer community chose to create a new queue rather than an abstraction > layer on top of existing queues. While my lack of understanding there isn't > a technical objection to the project, I hope they can address this in the > aforementioned FAQ. > > The reference storage implementation is MongoDB. AFAIK, no integrated > projects require an AGPL package to be installed, and from the discussions > I've been part of, that would be a show-stopper if Marconi required > MongoDB. As I understand it, this is why sqlalchemy support was required > when Marconi was incubated. Saying "Marconi also supports SQLA" is > disingenuous because it is a second-class citizen, with incomplete API > support, is clearly not the recommended storage driver, and is going to be > unusable at scale (I'll come back to this point in a bit). > > Let me ask this. Which back-end is tested in Marconi's CI? That is the > back-end that matters right now. If that's Mongo, I think there's a > problem. If it's SQLA, then I think Marconi should declare any features > which SQLA doesn't support to be optional extensions, make SQLA the > default, and clearly document how to deploy Marconi at scale with a SQLA > back-end. > > > Then there's the db-as-a-queue antipattern, and the problems that I have > seen result from this in the past... I'm not the only one in the OpenStack > community with some experience scaling MySQL databases. 
Surely others have > their own experiences and opinions on whether a database (whether MySQL or > Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall > over from resource contention. I would hope that those members of the > community would chime into this discussion at some point. Perhaps they'll > even disagree with me! > > A quick look at the code around claim (which, it seems, will be the most > commonly requested action) shows why this is an antipattern. > > The MongoDB storage driver for claims requires _four_ queries just to get a > message, with a serious race condition (but at least it's documented in the > code) if multiple clients are claiming messages in the same queue at the > same time. For reference: > > https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119 > > The SQLAlchemy storage driver is no better. It's issuing _five_ queries > just to claim a message (including a query to purge all expired claims > every time a new claim is created). The performance of this transaction > under high load is probably going to be bad... > > https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83 > > Lastly, it looks like the Marconi storage drivers assume the storage > back-end to be infinitely scalable. AFAICT, the mongo storage driver > supports mongo's native sharding -- which I'm happy to see -- but the SQLA > driver does not appear to support anything equivalent for other back-ends, > eg. MySQL. This relegates any deployment using the SQLA backend to the > scale of "only what one database instance can handle". It's unsuitable for > any large-scale deployment. Folks who don't want to use Mongo are likely to > use MySQL and will be promptly bitten by Marconi's lack of scalability with > this back end. 
> > While there is a lot of room to improve the messaging around what/how/why, > and I think a FAQ will be very helpful, I don't think that Marconi should > graduate this cycle because: > (1) support for a non-AGPL-backend is a legal requirement [*] for Marconi's > graduation; > (2) deploying Marconi with sqla+mysql will result in an incomplete and > unscalable service. > > It's possible that I'm wrong about the scalability of Marconi with sqla + > mysql. If anyone feels that this is going to perform blazingly fast on a > single mysql db backend, please publish a benchmark and I'll be very happy > to be proved wrong. To be meaningful, it must have a high concurrency of > clients creating and claiming messages with (num queues) << (num clients) > << (num messages), and all clients polling on a reasonably short interval, > based on whatever the recommended client-rate-limit is. I'd like the test > to be repeated with both Mongo and SQLA back-ends on the same hardware for > comparison. My guess (and it's just a guess) is that the Marconi developers almost wish their SQLA driver didn't exist after reading your email because of the confusion it's causing. My understanding is that the SQLA driver is not intended for production usage.
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On Thu, 2014-03-20 at 01:28 +, Joshua Harlow wrote: > Proxying from yahoo's open source director (since he wasn't initially > subscribed to this list, afaik he now is) on his behalf. > > From Gil Yehuda (Yahoo’s Open Source director). > > I would urge you to avoid creating a dependency between Openstack code > and any AGPL project, including MongoDB. MongoDB is licensed in a very > strange manner that is prone to creating unintended licensing mistakes > (a lawyer’s dream). Indeed, MongoDB itself presents Apache licensed > drivers – and thus technically, users of those drivers are not > impacted by the AGPL terms. MongoDB Inc. is in the unique position to > license their drivers this way (although they appear to violate the > AGPL license) since MongoDB is not going to sue themselves for their > own violation. However, others in the community who create MongoDB drivers > are licensing those drivers under the Apache and MIT licenses – which > does pose a problem. > > Why? The AGPL considers 'Corresponding Source' to be defined as “the > source code for shared libraries and dynamically linked subprograms > that the work is specifically designed to require, such as by intimate > data communication or control flow between those subprograms and other > parts of the work." Database drivers *are* work that is designed to > require by intimate data communication or control flow between those > subprograms and other parts of the work. So anyone using MongoDB with > any other driver now invites an unknown -- that one court case, one > judge, can read the license under its plain meaning and decide that > AGPL terms apply as stated. We have no way to know how far they apply > since this license has not been tested in court yet. > Despite all the FAQs MongoDB puts on their site indicating they don't > really mean to assert the license terms, normally when you provide a > license, you mean those terms. If they did not mean those terms, they > would not use this license. 
I hope they intended to do something good > (to get contributions back without impacting applications using their > database) but, even good intentions have unintended consequences. > Companies with deep enough pockets to be lawsuit targets, and > companies who want to be good open source citizens face the problem > that using MongoDB anywhere invites the future risk of legal > catastrophe. A simple development change in an open source project can > change the economics drastically. This is simply unsafe and unwise. > > OpenStack's ecosystem is fueled by the interests of many commercial > ventures who wish to cooperate in the open source manner, but then > leverage commercial opportunities they hope to create. I suggest that > using MongoDB anywhere in this project will result in a loss of > opportunity -- real or perceived, that would outweigh the benefits > MongoDB itself provides. > > tl;dr version: If you want to use MongoDB in your company, that's your > call. Please don't turn anyone who uses OpenStack components into > unsuspecting MongoDB users. Instead, decouple the database from the > project. It's not worth the legal risk, nor the impact on the > "Apache-ness" of this project. Thanks for that, Josh and Gil. Rather than cross-posting, I think this MongoDB/AGPLv3 discussion should continue on the legal-discuss mailing list: http://lists.openstack.org/pipermail/legal-discuss/2014-March/thread.html#174 Bear in mind that we (OpenStack, as a project and community) need to judge whether this is a credible concern or not. If some users said they were only willing to deploy Apache licensed code in their organization, we would dismiss that notion pretty quickly. Is this AGPLv3 concern sufficiently credible that OpenStack needs to take it into account when making important decisions? That's what I'm hoping to get to in the legal-discuss thread. Mark. 
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Proxying from yahoo's open source director (since he wasn't initially subscribed to this list, afaik he now is) on his behalf. From Gil Yehuda (Yahoo’s Open Source director). I would urge you to avoid creating a dependency between Openstack code and any AGPL project, including MongoDB. MongoDB is licensed in a very strange manner that is prone to creating unintended licensing mistakes (a lawyer’s dream). Indeed, MongoDB itself presents Apache licensed drivers – and thus technically, users of those drivers are not impacted by the AGPL terms. MongoDB Inc. is in the unique position to license their drivers this way (although they appear to violate the AGPL license) since MongoDB is not going to sue themselves for their own violation. However, others in the community who create MongoDB drivers are licensing those drivers under the Apache and MIT licenses – which does pose a problem. Why? The AGPL considers 'Corresponding Source' to be defined as “the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work." Database drivers *are* work that is designed to require by intimate data communication or control flow between those subprograms and other parts of the work. So anyone using MongoDB with any other driver now invites an unknown -- that one court case, one judge, can read the license under its plain meaning and decide that AGPL terms apply as stated. We have no way to know how far they apply since this license has not been tested in court yet. Despite all the FAQs MongoDB puts on their site indicating they don't really mean to assert the license terms, normally when you provide a license, you mean those terms. If they did not mean those terms, they would not use this license. 
I hope they intended to do something good (to get contributions back without impacting applications using their database) but, even good intentions have unintended consequences. Companies with deep enough pockets to be lawsuit targets, and companies who want to be good open source citizens, face the problem that using MongoDB anywhere invites the future risk of legal catastrophe. A simple development change in an open source project can change the economics drastically. This is simply unsafe and unwise. OpenStack's ecosystem is fueled by the interests of many commercial ventures who wish to cooperate in the open source manner, but then leverage commercial opportunities they hope to create. I suggest that using MongoDB anywhere in this project will result in a loss of opportunity -- real or perceived, that would outweigh the benefits MongoDB itself provides. tl;dr version: If you want to use MongoDB in your company, that's your call. Please don't turn anyone who uses OpenStack components into unsuspecting MongoDB users. Instead, decouple the database from the project. It's not worth the legal risk, nor the impact on the "Apache-ness" of this project. Gil Yehuda Sr. Director Of Open Source, Open Standards, Yahoo! Inc. gyeh...@yahoo-inc.com

From: Fox, Kevin M <kevin@pnnl.gov>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
Date: Wednesday, March 19, 2014 at 2:38 PM
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
Cc: legal-disc...@lists.openstack.org
Subject: Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?

It's my understanding that the only case the A in the AGPL would kick in is if the cloud provider made a change to MongoDB and exposed the MongoDB instance to users. 
Then the users would have to be able to download the changed code. Since Marconi's in front, the user is Marconi, and wouldn't ever want to download the source. As far as I can tell, in this use case, the AGPL'ed MongoDB is not really any different than the GPL'ed MySQL in footprint here. MySQL is acceptable, so why isn't MongoDB? It would be good to get legal's official take on this. It would be a shame to make major architectural decisions based on license assumptions that turn out not to be true. I'm cc-ing them. Thanks, Kevin From: Chris Friesen [chris.frie...@windriver.com] Sent: Wednesday, March 19, 2014 2:24 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API? On 03/19/2014 02:24 PM, Fox, Kevin M wrote: Can someone please give more detail into why MongoDB being AGPL is a problem?
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On Wed, Mar 19, 2014 at 12:37 PM, Devananda van der Veen <devananda@gmail.com> wrote: > Let me start by saying that I want there to be a constructive discussion > around all this. I've done my best to keep my tone as non-snarky as I could > while still clearly stating my concerns. I've also spent a few hours > reviewing the current code and docs. Hopefully this contribution will be > beneficial in helping the discussion along. > > For what it's worth, I don't have a clear understanding of why the Marconi > developer community chose to create a new queue rather than an abstraction > layer on top of existing queues. While my lack of understanding there isn't > a technical objection to the project, I hope they can address this in the > aforementioned FAQ. > > The reference storage implementation is MongoDB. AFAIK, no integrated > projects require an AGPL package to be installed, and from the discussions > I've been part of, that would be a show-stopper if Marconi required > MongoDB. As I understand it, this is why sqlalchemy support was required > when Marconi was incubated. Saying "Marconi also supports SQLA" is > disingenuous because it is a second-class citizen, with incomplete API > support, is clearly not the recommended storage driver, and is going to be > unusable at scale (I'll come back to this point in a bit). > > Let me ask this. Which back-end is tested in Marconi's CI? That is the > back-end that matters right now. If that's Mongo, I think there's a > problem. If it's SQLA, then I think Marconi should declare any features > which SQLA doesn't support to be optional extensions, make SQLA the > default, and clearly document how to deploy Marconi at scale with a SQLA > back-end.
>
The gate currently runs against MongoDB:

[drivers]
storage = mongodb

[drivers:storage:mongodb]
uri = mongodb://localhost:27017/marconi

http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/etc/marconi/marconi.conf.txt.gz

On a related note, I see that marconi has no gating integration tests. 
https://review.openstack.org/#/c/81094/2 But then again that is documented in https://wiki.openstack.org/wiki/Marconi/Incubation/Graduation#Legal_requirements We have a devstack-gate job running and will be making it voting this week. Of the non-gating integration test job, I only see one marconi test being run: tempest.api.queuing.test_queues.TestQueues.test_create_queue http://logs.openstack.org/94/81094/2/check/check-tempest-dsvm-marconi/c006285/logs/testr_results.html.gz > > Then there's the db-as-a-queue antipattern, and the problems that I have > seen result from this in the past... I'm not the only one in the OpenStack > community with some experience scaling MySQL databases. Surely others have > their own experiences and opinions on whether a database (whether MySQL or > Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall > over from resource contention. I would hope that those members of the > community would chime into this discussion at some point. Perhaps they'll > even disagree with me! > > A quick look at the code around claim (which, it seems, will be the most > commonly requested action) shows why this is an antipattern. > > The MongoDB storage driver for claims requires _four_ queries just to get > a message, with a serious race condition (but at least it's documented in > the code) if multiple clients are claiming messages in the same queue at > the same time. For reference: > > https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119 > > The SQLAlchemy storage driver is no better. It's issuing _five_ queries > just to claim a message (including a query to purge all expired claims > every time a new claim is created). The performance of this transaction > under high load is probably going to be bad... 
> > https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83 > > Lastly, it looks like the Marconi storage drivers assume the storage > back-end to be infinitely scalable. AFAICT, the mongo storage driver > supports mongo's native sharding -- which I'm happy to see -- but the SQLA > driver does not appear to support anything equivalent for other back-ends, > eg. MySQL. This relegates any deployment using the SQLA backend to the > scale of "only what one database instance can handle". It's unsuitable for > any large-scale deployment. Folks who don't want to use Mongo are likely to > use MySQL and will be promptly bitten by Marconi's lack of scalability with > this back end. > > While there is a lot of room to improve the messaging around what/how/why, > and I think a FAQ will be very helpful, I don't think that Marconi should > graduate this cycle because: > (1) support for a non-AGPL-backend is a legal requirement [*] for > Marconi's graduation; > (2) deploying Marconi with sqla+mysql will result in an incomplete and > unscalable service. > ++ > > It's possible that I'm wrong about the scalability of Marconi
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
2014-03-19 22:38 GMT+01:00 Fox, Kevin M : > It's my understanding that the only case the A in the AGPL would kick in is > if the cloud provider made a change to MongoDB and exposed the MongoDB > instance to users. Then the users would have to be able to download the > changed code. Since Marconi's in front, the user is Marconi, and wouldn't > ever want to download the source. As far as I can tell, in this use case, > the AGPL'ed MongoDB is not really any different than the GPL'ed MySQL in > footprint here. MySQL is acceptable, so why isn't MongoDB? > > MongoDB is AGPL but MongoDB drivers are Apache licensed [1]. AGPL contamination should not happen if we consider integrating only the drivers in the code. [1] http://www.mongodb.org/about/licensing/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On 03/19/2014 02:24 PM, Fox, Kevin M wrote: Can someone please give more detail into why MongoDB being AGPL is a problem? The drivers that Marconi uses are Apache2 licensed, MongoDB is separated by the network stack and MongoDB is not exposed to the Marconi users so I don't think the 'A' part of the GPL really kicks in at all since the MongoDB "user" is the cloud provider, not the cloud end user? Even if MongoDB was exposed to end-users, would that be a problem? Obviously the source to MongoDB would need to be made available (presumably it already is) but does the AGPL licence "contaminate" the Marconi stuff? I would have thought that would fall under "mere aggregation". Chris
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
It's my understanding that the only case the A in the AGPL would kick in is if the cloud provider made a change to MongoDB and exposed the MongoDB instance to users. Then the users would have to be able to download the changed code. Since Marconi's in front, the user is Marconi, and wouldn't ever want to download the source. As far as I can tell, in this use case, the AGPL'ed MongoDB is not really any different than the GPL'ed MySQL in footprint here. MySQL is acceptable, so why isn't MongoDB? It would be good to get legal's official take on this. It would be a shame to make major architectural decisions based on license assumptions that turn out not to be true. I'm cc-ing them. Thanks, Kevin From: Chris Friesen [chris.frie...@windriver.com] Sent: Wednesday, March 19, 2014 2:24 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API? On 03/19/2014 02:24 PM, Fox, Kevin M wrote: > Can someone please give more detail into why MongoDB being AGPL is a > problem? The drivers that Marconi uses are Apache2 licensed, MongoDB is > separated by the network stack and MongoDB is not exposed to the Marconi > users so I don't think the 'A' part of the GPL really kicks in at all > since the MongoDB "user" is the cloud provider, not the cloud end user? Even if MongoDB was exposed to end-users, would that be a problem? Obviously the source to MongoDB would need to be made available (presumably it already is) but does the AGPL licence "contaminate" the Marconi stuff? I would have thought that would fall under "mere aggregation". Chris
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Can someone please give more detail into why MongoDB being AGPL is a problem? The drivers that Marconi uses are Apache2 licensed, MongoDB is separated by the network stack and MongoDB is not exposed to the Marconi users so I don't think the 'A' part of the GPL really kicks in at all since the MongoDB "user" is the cloud provider, not the cloud end user? Thanks, Kevin From: Devananda van der Veen [devananda@gmail.com] Sent: Wednesday, March 19, 2014 12:37 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API? Let me start by saying that I want there to be a constructive discussion around all this. I've done my best to keep my tone as non-snarky as I could while still clearly stating my concerns. I've also spent a few hours reviewing the current code and docs. Hopefully this contribution will be beneficial in helping the discussion along. For what it's worth, I don't have a clear understanding of why the Marconi developer community chose to create a new queue rather than an abstraction layer on top of existing queues. While my lack of understanding there isn't a technical objection to the project, I hope they can address this in the aforementioned FAQ. The reference storage implementation is MongoDB. AFAIK, no integrated projects require an AGPL package to be installed, and from the discussions I've been part of, that would be a show-stopper if Marconi required MongoDB. As I understand it, this is why sqlalchemy support was required when Marconi was incubated. Saying "Marconi also supports SQLA" is disingenuous because it is a second-class citizen, with incomplete API support, is clearly not the recommended storage driver, and is going to be unusable at scale (I'll come back to this point in a bit). Let me ask this. Which back-end is tested in Marconi's CI? That is the back-end that matters right now. If that's Mongo, I think there's a problem. 
If it's SQLA, then I think Marconi should declare any features which SQLA doesn't support to be optional extensions, make SQLA the default, and clearly document how to deploy Marconi at scale with a SQLA back-end. Then there's the db-as-a-queue antipattern, and the problems that I have seen result from this in the past... I'm not the only one in the OpenStack community with some experience scaling MySQL databases. Surely others have their own experiences and opinions on whether a database (whether MySQL or Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall over from resource contention. I would hope that those members of the community would chime into this discussion at some point. Perhaps they'll even disagree with me! A quick look at the code around claim (which, it seems, will be the most commonly requested action) shows why this is an antipattern. The MongoDB storage driver for claims requires _four_ queries just to get a message, with a serious race condition (but at least it's documented in the code) if multiple clients are claiming messages in the same queue at the same time. For reference: https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119 The SQLAlchemy storage driver is no better. It's issuing _five_ queries just to claim a message (including a query to purge all expired claims every time a new claim is created). The performance of this transaction under high load is probably going to be bad... https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83 Lastly, it looks like the Marconi storage drivers assume the storage back-end to be infinitely scalable. AFAICT, the mongo storage driver supports mongo's native sharding -- which I'm happy to see -- but the SQLA driver does not appear to support anything equivalent for other back-ends, eg. MySQL. 
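The claim race discussed in this review can be illustrated with a toy in-memory model (hypothetical code, not Marconi's actual driver): a claim must atomically mark up to `limit` unclaimed messages with a claim ID and expiry, and it is exactly this step that becomes racy when spread across several separate database queries.

```python
import time
import uuid

class ToyQueue:
    """In-memory sketch of claim-based semantics (not Marconi code)."""

    def __init__(self):
        # each message: {"id", "body", "claim_id", "claim_expires"}
        self.messages = []

    def post(self, body):
        msg = {"id": uuid.uuid4().hex, "body": body,
               "claim_id": None, "claim_expires": 0}
        self.messages.append(msg)
        return msg["id"]

    def claim(self, limit=10, ttl=300):
        """Atomically claim up to `limit` unclaimed (or expired-claim) messages.

        Doing this as one critical section is the crux: splitting it over
        several queries, as the drivers under review do, is where two clients
        claiming from the same queue can collide.
        """
        now = time.time()
        claim_id = uuid.uuid4().hex
        claimed = []
        for msg in self.messages:
            if len(claimed) >= limit:
                break
            if msg["claim_id"] is None or msg["claim_expires"] <= now:
                msg["claim_id"] = claim_id
                msg["claim_expires"] = now + ttl
                claimed.append(msg)
        return claim_id, claimed

q = ToyQueue()
for i in range(5):
    q.post({"job": i})
cid, msgs = q.claim(limit=2)
print(len(msgs))   # 2 messages claimed by the first client
cid2, more = q.claim(limit=10)
print(len(more))   # 3 remaining messages claimed by a second client
```

A real backend has to get the same all-or-nothing behavior from its storage engine (e.g. an atomic find-and-modify), which is the property the multi-query implementations criticized above lack.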
This relegates any deployment using the SQLA backend to the scale of "only what one database instance can handle". It's unsuitable for any large-scale deployment. Folks who don't want to use Mongo are likely to use MySQL and will be promptly bitten by Marconi's lack of scalability with this back end. While there is a lot of room to improve the messaging around what/how/why, and I think a FAQ will be very helpful, I don't think that Marconi should graduate this cycle because: (1) support for a non-AGPL-backend is a legal requirement [*] for Marconi's graduation; (2) deploying Marconi with sqla+mysql will result in an incomplete and unscalable service. It's possible that I'm wrong about the scalability of Marconi with sqla + mysql. If anyone feels that this is going to perform blazingly fast on a single mysql db backend, please publish a benchmark and I'll be very happy to be proved wrong.
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Let me start by saying that I want there to be a constructive discussion around all this. I've done my best to keep my tone as non-snarky as I could while still clearly stating my concerns. I've also spent a few hours reviewing the current code and docs. Hopefully this contribution will be beneficial in helping the discussion along. For what it's worth, I don't have a clear understanding of why the Marconi developer community chose to create a new queue rather than an abstraction layer on top of existing queues. While my lack of understanding there isn't a technical objection to the project, I hope they can address this in the aforementioned FAQ. The reference storage implementation is MongoDB. AFAIK, no integrated projects require an AGPL package to be installed, and from the discussions I've been part of, that would be a show-stopper if Marconi required MongoDB. As I understand it, this is why sqlalchemy support was required when Marconi was incubated. Saying "Marconi also supports SQLA" is disingenuous because it is a second-class citizen, with incomplete API support, is clearly not the recommended storage driver, and is going to be unusable at scale (I'll come back to this point in a bit). Let me ask this. Which back-end is tested in Marconi's CI? That is the back-end that matters right now. If that's Mongo, I think there's a problem. If it's SQLA, then I think Marconi should declare any features which SQLA doesn't support to be optional extensions, make SQLA the default, and clearly document how to deploy Marconi at scale with a SQLA back-end. Then there's the db-as-a-queue antipattern, and the problems that I have seen result from this in the past... I'm not the only one in the OpenStack community with some experience scaling MySQL databases. Surely others have their own experiences and opinions on whether a database (whether MySQL or Mongo or Postgres or ...) can be used in such a way _at_scale_ and not fall over from resource contention. 
I would hope that those members of the community would chime into this discussion at some point. Perhaps they'll even disagree with me! A quick look at the code around claim (which, it seems, will be the most commonly requested action) shows why this is an antipattern. The MongoDB storage driver for claims requires _four_ queries just to get a message, with a serious race condition (but at least it's documented in the code) if multiple clients are claiming messages in the same queue at the same time. For reference: https://github.com/openstack/marconi/blob/master/marconi/queues/storage/mongodb/claims.py#L119 The SQLAlchemy storage driver is no better. It's issuing _five_ queries just to claim a message (including a query to purge all expired claims every time a new claim is created). The performance of this transaction under high load is probably going to be bad... https://github.com/openstack/marconi/blob/master/marconi/queues/storage/sqlalchemy/claims.py#L83 Lastly, it looks like the Marconi storage drivers assume the storage back-end to be infinitely scalable. AFAICT, the mongo storage driver supports mongo's native sharding -- which I'm happy to see -- but the SQLA driver does not appear to support anything equivalent for other back-ends, eg. MySQL. This relegates any deployment using the SQLA backend to the scale of "only what one database instance can handle". It's unsuitable for any large-scale deployment. Folks who don't want to use Mongo are likely to use MySQL and will be promptly bitten by Marconi's lack of scalability with this back end. While there is a lot of room to improve the messaging around what/how/why, and I think a FAQ will be very helpful, I don't think that Marconi should graduate this cycle because: (1) support for a non-AGPL-backend is a legal requirement [*] for Marconi's graduation; (2) deploying Marconi with sqla+mysql will result in an incomplete and unscalable service. 
It's possible that I'm wrong about the scalability of Marconi with sqla + mysql. If anyone feels that this is going to perform blazingly fast on a single mysql db backend, please publish a benchmark and I'll be very happy to be proved wrong. To be meaningful, it must have a high concurrency of clients creating and claiming messages with (num queues) << (num clients) << (num messages), and all clients polling on a reasonably short interval, based on whatever the recommended client-rate-limit is. I'd like the test to be repeated with both Mongo and SQLA back-ends on the same hardware for comparison. Regards, Devananda [*] https://wiki.openstack.org/wiki/Marconi/Incubation/Graduation#Legal_requirements
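The benchmark criteria above can be turned into a harness skeleton. This is a hypothetical sketch against an in-memory stand-in (a locked dict) rather than a real Marconi endpoint; the constants and the `post`/`claim` stubs are illustrative and would be replaced by real API calls. It only demonstrates the requested shape: (num queues) << (num clients) << (num messages), with all clients polling on a short interval.

```python
import threading
import time
from collections import defaultdict

# Benchmark shape per the review: num_queues << num_clients << num_messages.
NUM_QUEUES, NUM_CLIENTS, NUM_MESSAGES = 4, 32, 2048
POLL_INTERVAL = 0.001  # stand-in for the recommended client rate limit

store = defaultdict(list)   # queue name -> pending messages (the backend stub)
lock = threading.Lock()     # stand-in for the backend's concurrency control
claimed = []                # every message successfully claimed, across clients

def post(qname, body):
    """Stub for POSTing a message; swap in a real API call here."""
    with lock:
        store[qname].append(body)

def claim(qname, limit=10):
    """Stub for claiming a batch; removal under the lock models atomic claims."""
    with lock:
        batch, store[qname] = store[qname][:limit], store[qname][limit:]
    return batch

def client(idx):
    qname = "q%d" % (idx % NUM_QUEUES)
    # Each client both posts and claim-polls, like a mixed producer/worker pool.
    for i in range(NUM_MESSAGES // NUM_CLIENTS):
        post(qname, (idx, i))
    while len(claimed) < NUM_MESSAGES:
        batch = claim(qname)
        if batch:
            claimed.extend(batch)  # list.extend is atomic under CPython's GIL
        else:
            time.sleep(POLL_INTERVAL)

threads = [threading.Thread(target=client, args=(i,)) for i in range(NUM_CLIENTS)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(claimed), "messages claimed in %.2fs" % (time.time() - start))
```

A meaningful run of the real thing would drive each client against the HTTP API, record latency percentiles alongside throughput, and be repeated with the Mongo and SQLA back-ends on identical hardware, as the message requests.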
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On 20 March 2014 01:06, Mark McLoughlin wrote: > I think we need a slight reset on this discussion. The way this email > was phrased gives a strong sense of "Marconi is a dumb idea, it's going > to take a lot to persuade me otherwise". Thanks Mark, that's a great point to make. I don't think Marconi is dumb, but I sure don't understand why. Thank you! -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On 03/19/2014 07:49 AM, Thierry Carrez wrote: > Flavio Percoco wrote: >> On 19/03/14 10:17 +1300, Robert Collins wrote: >>> My desires around Marconi are: - to make sure the queue we have >>> is suitable for use by OpenStack itself: we have a very strong >>> culture around consolidating technology choices, and it would >>> be extremely odd to have Marconi be something that isn't >>> suitable to replace rabbitmq etc as the queue abstraction in >>> the fullness of time. >> >> Although this could be done in the future, I've heard from many >> folks in the community that replacing OpenStack's rabbitmq / qpid >> / etc layer with Marconi is a no-go. I don't recall the exact >> reasons now but I think I can grab them from logs or something >> (Unless those folks are reading this email and want to chime in). >> FWIW, I'd be more than happy to *experiment* with this in the >> future. Marconi is definitely not ready as-is. > > That's the root of this thread. Marconi is not really designed to > cover Robert's use case, which would be to be consumed internally > by OpenStack as a message queue. > > I classify Marconi as an "application building block" (IaaS+), a > convenient, SQS-like way for cloud application builders to pass > data around without having to spin up their own message queue in a > VM. I think that's a relevant use case, as long as performance is > not an order of magnitude worse than the "spin up your own in a VM" > alternative. Personally I don't consider "serving the internal > needs of OpenStack" as a feature blocker. It would be nice if it > could, but the IaaS+ use case is IMHO compelling enough. This is my view, as well. I never considered replacing OpenStack's current use of messaging within the scope of Marconi. It's possible we could have yet another project that is a queue provisioning project in the style of Trove. I'm not sure that actually makes sense (an application template you can deploy may suffice here). 
In any case, I view OpenStack's use case and anyone wanting to use qpid/rabbit/whatever directly as separate and out of scope for Marconi. -- Russell Bryant
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
On Wed, 2014-03-19 at 10:17 +1300, Robert Collins wrote: > So this came up briefly at the tripleo sprint, and since I can't seem > to find a /why/ document > (https://wiki.openstack.org/wiki/Marconi/Incubation#Raised_Questions_.2B_Answers > and https://wiki.openstack.org/wiki/Marconi#Design don't supply this) I think we need a slight reset on this discussion. The way this email was phrased gives a strong sense of "Marconi is a dumb idea, it's going to take a lot to persuade me otherwise". That's not a great way to start a conversation, but it's easy to understand - a TC member sees a project on the cusp of graduating and, when they finally get a chance to look closely at it, a number of things don't make much sense. "Wait! Stop! WTF!" is a natural reaction if you think a bad decision is about to be made. We've all got to understand how pressurized a situation these graduation and incubation discussions are. Projects put an immense amount of work into proving themselves worthy of being an integrated project, they get fairly short bursts of interaction with the TC, TC members aren't necessarily able to do a huge amount of due diligence in advance and yet TC members are really, really keen to avoid either undermining a healthy project around some cool new technology or undermining OpenStack by including an unhealthy project or sub-par technology. And then there's the time pressure where a decision has to be made by a certain date and if that decision is "not this time", the six months delay until the next chance for a positive decision can be really draining on motivation and momentum when everybody had been so focused on getting a positive decision this time around. We really need cool heads here and, above all, to try our best to assume good faith, intentions and ability on both sides. Some of the questions Robert asked are common questions and I know they were discussed during the incubation review. 
However, the questions persist and it's really important that TC members (and the community at large) feel they can stand behind the answers to those questions. If I'm chatting to someone and they ask me "why does OpenStack need to implement its own messaging broker?", I need to have a good answer. How about we do our best to put the implications for the graduation decision aside for a bit and focus on collaboratively pulling together a FAQ that everyone can buy into? The "raised questions and answers" section of the incubation review linked above is a good start, but I think we can take this email as feedback that those questions and answers need much improvement. This could be a good pattern for all new projects - if the TC and the new project can't work together to draft a solid FAQ like this, then it's not a good sign for the project. See below for my attempt to summarize the questions and how we might go about answering them. Is this a reasonable start? Mark. Why isn't Marconi simply an API for provisioning and managing AMQP, Kestrel, ZeroMQ, etc. brokers and queues? Why is a new broker implementation needed? => I'm not sure I can summarize the answer here - the need for an HTTP data plane API, the need for multi-tenancy, etc.? Maybe a table listing the required features and whether they're provided by these existing solutions. Maybe there's also an element of "we think we can do a better job". If so, the point probably worth addressing is "OpenStack shouldn't attempt to write a new database, or a new hypervisor, or a new SDN controller, or a new block storage implementation ... so why should we implement a new message broker?" If this is just a bad analogy, explain why. Implementing a message queue using an SQL DB seems like a bad idea, why is Marconi doing that? => Perhaps explain why MongoDB is a good storage technology for this use case and the SQLAlchemy driver is just a toy. 
Marconi's default driver depends on MongoDB, which is licensed under the AGPL. This license is currently a no-go for some organizations, so what plans does Marconi have to implement another production-ready storage driver that supports all API features? => Discuss the Redis driver plans? Is Marconi designed to be suitable for use by OpenStack itself? => Discuss that it's not currently in scope and why not. In what way does the OpenStack use case differ from the applications Marconi's current API is focused on? How should a client subscribe to a queue? => Discuss that it's not by GET /messages but instead POST /claims?limit=N
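A sketch of what that claim call looks like from a client's side, assuming the v1 HTTP API shape described in this thread (POST to a claims resource with a limit parameter). The host, queue name, header values, and body fields here are illustrative assumptions, not verified against Marconi's documentation; only the request is constructed, nothing is sent.

```python
import json
import urllib.parse
import urllib.request

# Clients consume by claiming a batch of messages, not by GETting them:
# POST /v1/queues/{queue}/claims?limit=N with a claim lifetime in the body.
base = "http://marconi.example.com:8888/v1"   # assumed endpoint
queue = "transcode-jobs"                      # assumed queue name
params = urllib.parse.urlencode({"limit": 5})
url = "%s/queues/%s/claims?%s" % (base, queue, params)

# ttl: how long the claim holds the messages; grace: extra message lifetime.
body = json.dumps({"ttl": 300, "grace": 60}).encode()
req = urllib.request.Request(url, data=body, method="POST",
                             headers={"Content-Type": "application/json",
                                      "Client-ID": "worker-1"})
print(req.get_method(), req.full_url)
# POST http://marconi.example.com:8888/v1/queues/transcode-jobs/claims?limit=5
# urllib.request.urlopen(req) would send it against a live endpoint.
```

The point of the shape is that a claim is a single server-side operation that both fetches and reserves up to N messages, which is what distinguishes it from the enumerate-then-fetch reading of the API criticized earlier in the thread.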
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Flavio Percoco wrote: > On 19/03/14 10:17 +1300, Robert Collins wrote: >> My desires around Marconi are: >> - to make sure the queue we have is suitable for use by OpenStack >> itself: we have a very strong culture around consolidating technology >> choices, and it would be extremely odd to have Marconi be something >> that isn't suitable to replace rabbitmq etc as the queue abstraction >> in the fullness of time. > > Although this could be done in the future, I've heard from many folks > in the community that replacing OpenStack's rabbitmq / qpid / etc layer > with Marconi is a no-go. I don't recall the exact reasons now but I > think I can grab them from logs or something (Unless those folks are > reading this email and want to chime in). FWIW, I'd be more than happy > to *experiment* with this in the future. Marconi is definitely not ready > as-is. That's the root of this thread. Marconi is not really designed to cover Robert's use case, which would be to be consumed internally by OpenStack as a message queue. I classify Marconi as an "application building block" (IaaS+), a convenient, SQS-like way for cloud application builders to pass data around without having to spin up their own message queue in a VM. I think that's a relevant use case, as long as performance is not an order of magnitude worse than the "spin up your own in a VM" alternative. Personally I don't consider "serving the internal needs of OpenStack" as a feature blocker. It would be nice if it could, but the IaaS+ use case is IMHO compelling enough. -- Thierry Carrez (ttx)
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Kurt already gave a quite detailed explanation of why Marconi, what you can do with it and where it stands. I'll reply in-line: On 19/03/14 10:17 +1300, Robert Collins wrote: So this came up briefly at the tripleo sprint, and since I can't seem to find a /why/ document (https://wiki.openstack.org/wiki/Marconi/Incubation#Raised_Questions_.2B_Answers and https://wiki.openstack.org/wiki/Marconi#Design don't supply this) we decided at the TC meeting that I should raise it here. Firstly, let me check my facts :) - Marconi is backed by a modular 'storage' layer which places some conceptual design constraints on the storage backends that are possible (e.g. I rather expect a 0mq implementation to be very tricky, at best (vs the RPC style front end https://wiki.openstack.org/wiki/Marconi/specs/zmq/api/v1 )), and has a hybrid control/data plane API implementation where one can call into it to make queues etc, and to consume them. Those docs refer to a transport driver, not a storage driver. In Marconi, it's possible to have different protocols on top of the API. The current one is based on HTTP but there'll likely be others in the future. We've changed some things in the API to support AMQP-based storage drivers. We had a session during the HKG summit about this and since then, we've always kept AMQP drivers in mind when doing changes on the API. I'm not saying it's perfect, though. The API for the queues is very odd from a queueing perspective - https://wiki.openstack.org/wiki/Marconi/specs/api/v1#Get_a_Specific_Message - you don't subscribe to the queue, you enumerate and ask for a single message. The current way to subscribe to queues is by using polling. Subscribing is not just tied to the "API" but also the transport itself. As mentioned above, we currently just have support for HTTP. Also, enumerating is not necessary. For instance, claiming with limit 1 will consume one message. 
(Side note: At the incubation meeting, it was recommended not to put effort into writing new transports but to stabilize the API and work on a storage backend with a license != AGPL) And the implementations in tree are mongodb (which is at best contentious, due to the AGPL and many folks' reasonable concerns about it), and mysql. Just to avoid misleading folks that are not familiar with marconi, I just want to point out that the driver is based on sqlalchemy. My desires around Marconi are: - to make sure the queue we have is suitable for use by OpenStack itself: we have a very strong culture around consolidating technology choices, and it would be extremely odd to have Marconi be something that isn't suitable to replace rabbitmq etc as the queue abstraction in the fullness of time. Although this could be done in the future, I've heard from many folks in the community that replacing OpenStack's rabbitmq / qpid / etc layer with Marconi is a no-go. I don't recall the exact reasons now but I think I can grab them from logs or something (Unless those folks are reading this email and want to chime in). FWIW, I'd be more than happy to *experiment* with this in the future. Marconi is definitely not ready as-is. - to make sure that deployers with scale / performance needs can have that met by Marconi - to make my life easy as a deployer ;) This has been part of our daily reviews, work and designs. I'm sure there's room for improvement, though. So my questions are: - why isn't the API a queue friendly API (e.g. like kestrel, https://github.com/twitter/kestrel, which uses the memcache API: puts put into the queue, gets get from the queue)? Define *queue friendly*. I don't know kestrel, but how is this different from what Marconi does? The current API looks like pretty much the worst case scenario there - CRUD rather than submit/retrieve with blocking requests (e.g. longpoll vs poll). 
I agree there are some limitations from using HTTP for this job, hence the support for different transports. Just saying *the API is CRUD* is again misleading and it doesn't highlight the value of having an HTTP based transport. It's just wrong to think about marconi as *just another queuing system* instead of considering the use-cases it's trying to solve. There's rough support for websocket in an external project but: 1. It's not official... yet. 2. It was written as a proof of concept for the transport layer. 3. It likely needs to be updated. https://github.com/FlaPer87/marconi-websocket - wouldn't it be better to expose other existing implementations of HTTP message queues like nova does with hypervisors, rather than creating our own one? E.g. HTTPSQS, RestMQ, Kestrel, queues.io. We've discussed adding support for API extensions in order to allow some deployments to expose features from a queuing technology that we don't necessarily consider part of the core API. - or even do what Trove does and expose the actual implementation directly? - what's the plan to fix the API? Fix the API? For starters, moving away fr
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
Kurt Griffiths, Thanks for the detailed explanation. Is there a comparison between Marconi and existing message brokers anywhere that you can point me to? I can see how your examples can be implemented using other brokers like RabbitMQ. So why is there a need for another broker? And what is wrong with the currently deployed RabbitMQ that most OpenStack services are using (typically via oslo.messaging RPC)? On Wed, Mar 19, 2014 at 4:00 AM, Kurt Griffiths < kurt.griffi...@rackspace.com> wrote: > I think we can agree that a data-plane API only makes sense if it is > useful to a large number of web and mobile developers deploying their apps > on OpenStack. Also, it only makes sense if it is cost-effective and > scalable for operators who wish to deploy such a service. > > Marconi was born of practical experience and direct interaction with > prospective users. When Marconi was kicked off a few summits ago, the > community was looking for a multi-tenant messaging service to round out > the OpenStack portfolio. Users were asking operators for something easier > to work with and more web-friendly than established options such as AMQP. > > To that end, we started drafting an HTTP-based API specification that > would afford several different messaging patterns, in order to support the > use cases that users were bringing to the table. We did this completely in > the open, and received lots of input from prospective users familiar with > a variety of message broker solutions, including more "cloudy" ones like > SQS and Iron.io. > > The resulting design was a hybrid that supported what you might call > "claim-based" semantics ala SQS and feed-based semantics ala RSS. > Application developers liked the idea of being able to use one or the > other, or combine them to come up with new patterns according to their > needs. For example: > > 1. A video app can use Marconi to feed a worker pool of transcoders. 
When > a video is uploaded, it is stored in Swift and a job message is posted to > Marconi. Then, a worker claims the job and begins work on it. If the > worker crashes, the claim expires and the message becomes available to be > claimed by a different worker. Once the worker is finished with the job, > it deletes the message so that another worker will not process it, and > claims another message. Note that workers never "list" messages in this > use case; those endpoints in the API are simply ignored. > > 2. A backup service can use Marconi to communicate with hundreds of > thousands of backup agents running on customers' machines. Since Marconi > queues are extremely light-weight, the service can create a different > queue for each agent, and additional queues to broadcast messages to all > the agents associated with a single customer. In this last scenario, the > service would post a message to a single queue and the agents would simply > list the messages on that queue, and everyone would get the same message. > This messaging pattern is emergent, and requires no special routing setup > in advance from one queue to another. > > 3. A metering service for an Internet application can use Marconi to > aggregate usage data from a number of web heads. Each web head collects > several minutes of data, then posts it to Marconi. A worker periodically > claims the messages off the queue, performs the final aggregation and > processing, and stores the results in a DB. So far, this messaging pattern > is very much like example #1, above. However, since Marconi's API also > affords the observer pattern via listing semantics, the metering service > could run an auditor that logs the messages as they go through the queue > in order to provide extremely valuable data for diagnosing problems in the > aggregated data. > > Users are excited about what Marconi offers today, and we are continuing > to evolve the API based on their feedback. 
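[Editor's note] The broadcast pattern in example #2 above hinges on feed-style listing semantics: listing returns messages without claiming or removing them, so every agent that lists the queue sees the same messages. A toy sketch of that behavior (illustrative only, not Marconi code; names are invented):

```python
class FeedQueue:
    """Toy model of feed/listing semantics a la RSS (illustrative only)."""

    def __init__(self):
        self.messages = []

    def post(self, body):
        self.messages.append(body)

    def list(self, marker=0):
        """Return (messages after marker, new marker); nothing is removed."""
        return self.messages[marker:], len(self.messages)


broadcast = FeedQueue()
broadcast.post("update-config")
broadcast.post("start-backup")

# Two independent agents each keep their own marker; because listing does
# not consume anything, both see every message posted to the queue.
agent1_msgs, marker1 = broadcast.list()
agent2_msgs, marker2 = broadcast.list()
print(agent1_msgs == agent2_msgs == ["update-config", "start-backup"])  # True
```

The same queue could also be consumed with claims (example #1) or observed by an auditor (example #3) without any routing setup, which is the hybrid Kurt describes.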
> > Of course, app developers aren't the only audience Marconi needs to serve. > Operators want something that is cost-effective, scales, and is > customizable for the unique needs of their target market. > > While Marconi has plenty of room to improve (who doesn't?), here is where > the project currently stands in these areas: > > 1. Customizable. Marconi transport and storage drivers can be swapped out, > and messages can be manipulated in-flight with custom filter drivers. > Currently we have MongoDB and SQLAlchemy drivers, and are exploring Redis > and AMQP brokers. Now, the v1.0 API does impose some constraints on the > backend in order to support the use cases mentioned earlier. For example, > an AMQP backend would only be able to support a subset of the current API. > Operators occasionally ask about AMQP broker support, in particular, and > we are exploring ways to evolve the API in order to support that. > > 2. Scalable. Operators can use Marconi's HTTP transport to leverage their > existing infrastructure and expertise in scaling out web heads. When it > comes to the backend, for small deplo
Re: [openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
I think we can agree that a data-plane API only makes sense if it is useful to a large number of web and mobile developers deploying their apps on OpenStack. Also, it only makes sense if it is cost-effective and scalable for operators who wish to deploy such a service. Marconi was born of practical experience and direct interaction with prospective users. When Marconi was kicked off a few summits ago, the community was looking for a multi-tenant messaging service to round out the OpenStack portfolio. Users were asking operators for something easier to work with and more web-friendly than established options such as AMQP. To that end, we started drafting an HTTP-based API specification that would afford several different messaging patterns, in order to support the use cases that users were bringing to the table. We did this completely in the open, and received lots of input from prospective users familiar with a variety of message broker solutions, including more “cloudy” ones like SQS and Iron.io. The resulting design was a hybrid that supported what you might call “claim-based” semantics ala SQS and feed-based semantics ala RSS. Application developers liked the idea of being able to use one or the other, or combine them to come up with new patterns according to their needs. For example: 1. A video app can use Marconi to feed a worker pool of transcoders. When a video is uploaded, it is stored in Swift and a job message is posted to Marconi. Then, a worker claims the job and begins work on it. If the worker crashes, the claim expires and the message becomes available to be claimed by a different worker. Once the worker is finished with the job, it deletes the message so that another worker will not process it, and claims another message. Note that workers never “list” messages in this use case; those endpoints in the API are simply ignored. 2. A backup service can use Marconi to communicate with hundreds of thousands of backup agents running on customers' machines. 
Since Marconi queues are extremely light-weight, the service can create a different queue for each agent, and additional queues to broadcast messages to all the agents associated with a single customer. In this last scenario, the service would post a message to a single queue and the agents would simply list the messages on that queue, and everyone would get the same message. This messaging pattern is emergent, and requires no special routing setup in advance from one queue to another. 3. A metering service for an Internet application can use Marconi to aggregate usage data from a number of web heads. Each web head collects several minutes of data, then posts it to Marconi. A worker periodically claims the messages off the queue, performs the final aggregation and processing, and stores the results in a DB. So far, this messaging pattern is very much like example #1, above. However, since Marconi’s API also affords the observer pattern via listing semantics, the metering service could run an auditor that logs the messages as they go through the queue in order to provide extremely valuable data for diagnosing problems in the aggregated data. Users are excited about what Marconi offers today, and we are continuing to evolve the API based on their feedback. Of course, app developers aren’t the only audience Marconi needs to serve. Operators want something that is cost-effective, scales, and is customizable for the unique needs of their target market. While Marconi has plenty of room to improve (who doesn’t?), here is where the project currently stands in these areas: 1. Customizable. Marconi transport and storage drivers can be swapped out, and messages can be manipulated in-flight with custom filter drivers. Currently we have MongoDB and SQLAlchemy drivers, and are exploring Redis and AMQP brokers. Now, the v1.0 API does impose some constraints on the backend in order to support the use cases mentioned earlier. 
For example, an AMQP backend would only be able to support a subset of the current API. Operators occasionally ask about AMQP broker support, in particular, and we are exploring ways to evolve the API in order to support that. 2. Scalable. Operators can use Marconi’s HTTP transport to leverage their existing infrastructure and expertise in scaling out web heads. When it comes to the backend, for small deployments with minimal throughput needs, we are providing a SQLAlchemy driver as a non-AGPL alternative to MongoDB. For large-scale production deployments, we currently provide the MongoDB driver and will likely add Redis as another option (there is already a POC driver). And, of course, operators can provide drivers for NewSQL databases, such as VelocityDB, that are very fast and scale extremely well. In Marconi, every queue can be associated with a different backend cluster. This allows operators to scale both up and out, according to what is most cost-effective for them. Marconi's app-level sharding is currently done using a lo
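[Editor's note] The app-level sharding mentioned here (and described earlier in the thread as a catalog keyed on project_id + queue) amounts to a lookup table that pins each queue to a backend cluster. A minimal sketch under those assumptions (the structure and names below are illustrative, not Marconi's actual schema):

```python
# Illustrative shard catalog: each (project, queue) pair is pinned to one
# storage cluster, so different queues can live on different backends.
catalog = {}  # (project_id, queue_name) -> shard name
shards = {
    "mongo-cluster-1": {"driver": "mongodb", "uri": "mongodb://replset-1"},
    "redis-cluster-1": {"driver": "redis", "uri": "redis://node-1"},
}


def register_queue(project_id, queue_name, shard):
    """Operator (or placement logic) assigns a queue to a cluster."""
    catalog[(project_id, queue_name)] = shard


def lookup(project_id, queue_name):
    """Resolve which backend cluster holds a given queue."""
    return shards[catalog[(project_id, queue_name)]]


register_queue("tenant-a", "video-jobs", "mongo-cluster-1")
register_queue("tenant-a", "metrics", "redis-cluster-1")

print(lookup("tenant-a", "video-jobs")["driver"])  # mongodb
print(lookup("tenant-a", "metrics")["driver"])     # redis
```

Because the mapping is per queue rather than per deployment, an operator can scale out by adding clusters and placing new queues on them, which is the "scale both up and out" point above.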
[openstack-dev] [Marconi] Why is marconi a queue implementation vs a provisioning API?
So this came up briefly at the tripleo sprint, and since I can't seem to find a /why/ document (https://wiki.openstack.org/wiki/Marconi/Incubation#Raised_Questions_.2B_Answers and https://wiki.openstack.org/wiki/Marconi#Design don't supply this) we decided at the TC meeting that I should raise it here. Firstly, let me check my facts :) - Marconi is backed by a modular 'storage' layer which places some conceptual design constraints on the storage backends that are possible (e.g. I rather expect a 0mq implementation to be very tricky, at best (vs the RPC style front end https://wiki.openstack.org/wiki/Marconi/specs/zmq/api/v1 )), and has a hybrid control/data plane API implementation where one can call into it to make queues etc, and to consume them. The API for the queues is very odd from a queueing perspective - https://wiki.openstack.org/wiki/Marconi/specs/api/v1#Get_a_Specific_Message - you don't subscribe to the queue, you enumerate and ask for a single message. And the implementations in tree are mongodb (which is at best contentious, due to the AGPL and many folks' reasonable concerns about it), and mysql. My desires around Marconi are: - to make sure the queue we have is suitable for use by OpenStack itself: we have a very strong culture around consolidating technology choices, and it would be extremely odd to have Marconi be something that isn't suitable to replace rabbitmq etc as the queue abstraction in the fullness of time. - to make sure that deployers with scale / performance needs can have that met by Marconi - to make my life easy as a deployer ;) So my questions are: - why isn't the API a queue friendly API (e.g. like https://github.com/twitter/kestrel - kestrel which uses the memcache API, puts put into the queue, gets get from the queue). The current API looks like pretty much the worst case scenario there - CRUD rather than submit/retrieve with blocking requests (e.g. longpoll vs poll). 
- wouldn't it be better to expose other existing implementations of HTTP message queues like nova does with hypervisors, rather than creating our own one? E.g. HTTPSQS, RestMQ, Kestrel, queues.io. - or even do what Trove does and expose the actual implementation directly? - what's the plan to fix the API? - is there a plan / desire to back onto actual queue services (e.g. AMQP, any of the http ones above, etc) - what is the current performance - how many usecs does it take to put a message, and get one back, in real world use? How many concurrent clients can a single Marconi API server with one backing server deliver today? As background, 'implement a message queue in a SQL DB' is such a horrid antipattern it's been a standing joke in many organisations I've been in - and yet we're preparing to graduate *exactly that* which is frankly perplexing. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev