Excerpts from Mike Bayer's message of 2015-03-09 17:26:36 -0700:
> Clint Byrum <cl...@fewbar.com> wrote:
> >
> > So I think I didn't speak clearly enough here. The benchmarks are of
> > course needed, but there's a tipping point when write activity gets to
> > a certain level where it's cheaper to let it get a little skewed and
> > correct asynchronously. This is not unique to SQL; this is true of all
> > large-scale distributed systems. There's probably a super cool formula
> > for it too, but roughly it is
> >
> >     (num_trans_per_s * cost_of_fk_check_per_trans)
> >
> > versus
> >
> >     (error_cost * error_rate) + (cost_find_all_errors / seconds_to_find_all_errors)
>
> Well, the error cost here would be a database that would be “corrupted”,
> meaning it has rows which no longer refer to things that exist, and the
> database is now in a state where it may very well be unusable by the
> application without being rolled back to some known state.
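Clint's rough formula can be put into runnable form. Every number below is hypothetical, chosen only to show how the comparison works, not drawn from any real Keystone workload:

```python
# Back-of-the-envelope comparison of the two costs in the formula above.
# All workload numbers are made up, purely for illustration.

def fk_cost_per_second(num_trans_per_s, cost_of_fk_check_per_trans):
    """Steady-state cost of doing the FK check synchronously on every write."""
    return num_trans_per_s * cost_of_fk_check_per_trans

def async_repair_cost_per_second(error_cost, error_rate,
                                 cost_find_all_errors,
                                 seconds_to_find_all_errors):
    """Cost of skipping the check and correcting skew asynchronously."""
    return (error_cost * error_rate
            + cost_find_all_errors / seconds_to_find_all_errors)

# Hypothetical workload: 5000 writes/s, 0.2 ms of FK checking per write.
sync_cost = fk_cost_per_second(5000, 0.0002)

# Hypothetical error profile: each dangling row costs 10 units to handle,
# 0.01 errors/s slip in, and a full consistency sweep costing 600 units
# is amortized over 3600 seconds.
repair_cost = async_repair_cost_per_second(10, 0.01, 600, 3600)

# Past the tipping point Clint describes, asynchronous repair wins.
print(sync_cost > repair_cost)  # True for these made-up numbers
```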
That's not a cost, that's a situation. What's the actual cost to the user?
"May very well be unusable" implies uncertainty, which is certainly a risk,
but the cost is unknown. Typically one must estimate the cost as each error
is found.

> If Keystone truly doesn’t care about ACID it might want to consider MyISAM
> tables, which are faster for read-heavy workloads, though these aren’t
> compatible with Galera.

Please refrain from using a false equivalence. ACID stands for Atomicity,
Consistency, Isolation, and Durability. Nowhere in there does it stand for
"referential integrity". If Keystone uses transactions properly, ACID is
preserved. Also, I don't think it is productive to bring up MyISAM in any
serious conversation about databases.

> > So it's not really something I think one can blindly accept as "better",
> > but rather something that one needs to calculate for themselves. You say
> > cost_of_fk_check_per_trans is negligible, but that has been measured as
> > not true in the past:
> >
> > http://www.percona.com/blog/2010/09/20/instrumentation-and-the-cost-of-foreign-keys/
>
> That’s not a surprising case, because the “parent” row being modified is
> being referred to by the “child” row that’s still in transaction. This is
> an implementation detail of the ACID guarantees one gets when using a
> relational database. If Keystone’s relational backend in fact has a
> performance bottleneck due to an operation like this, it should be visited
> individually. But I think it’s extremely unlikely this is actually the case.

Lock contention is a real thing that will inevitably slow down transaction
throughput if not carefully avoided. One less query (which is what FK checks
end up being) means one less read lock taken and one less place to have to
think through. In practical terms, the fact that identity and assignment are
not allowed to use FKs between them already shuts down most of the real
possibilities for this type of contention.
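The "one less query" point can be seen directly: with FK enforcement on, every child insert carries an implicit parent lookup. A minimal sketch, using SQLite for portability rather than MySQL/Galera, with hypothetical table names:

```python
import sqlite3

# Illustration only: an enforced FK turns every child insert into an
# implicit parent lookup; turning enforcement off removes that extra read
# but lets rows dangle.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
                  id INTEGER PRIMARY KEY,
                  parent_id INTEGER REFERENCES parent(id))""")

# With enforcement on, the engine checks parent existence on every insert;
# a missing parent turns that lookup into a constraint error.
blocked = False
try:
    conn.execute("INSERT INTO child (parent_id) VALUES (42)")
except sqlite3.IntegrityError:
    blocked = True

# With enforcement off, the same insert succeeds and the row dangles;
# consistency now has to be repaired out-of-band.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("INSERT INTO child (parent_id) VALUES (42)")
dangling = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(blocked, dangling)
```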
> > That article demonstrates that the FK adds lock contention in
> > InnoDB. There's more. With NDB (MySQL Cluster) it's an 18% performance
> > hit on raw throughput:
> >
> > http://johanandersson.blogspot.com/2013/06/benchmarking-performance-impact-of.html
>
> For NDB Cluster, foreign key support was only added to that system two
> years ago, in version 5.6.10 in 2013. This is clearly not a system designed
> to support foreign keys in the first place; the feature is entirely
> bleeding edge for that specific system, and performance like that is
> entirely atypical of database systems outside of NDB Cluster. Specifically
> with OpenStack, the clustering solution usually used is Galera, which has
> no such performance issue.
>
> So sure, if you’re using NDB Cluster, FOREIGN KEY support is bleeding edge
> and you may very well want to disable constraints, as you’re using a system
> that wasn’t designed with this use case in mind. But because using a
> relational database is somewhat pointless if you don’t need ACID, I’d
> probably use Galera instead.

NDB is probably overkill for Keystone until we get up into the
millions-of-users scale. One day, maybe :). The point is that this is a
high-performance DB with high performance demands, and it is 18% slower for
some types of operations when FKs are added.

> > Now, where that point is with Keystone I don't know. The point is, if you
> > write the code relying on the constraints' existence, Keystone becomes a
> > vertically scaling app that cannot ever scale out beyond whatever that
> > limit is.
>
> There seems to be some misunderstanding that using foreign keys to enforce
> referential integrity implies that the application is now dependent on
> these constraints being in place.
> I notice that the conversation was originally talking a bit about allowing
> rows to be deleted using CASCADE, and my original question referred to the
> notion of foreign key use *overall*, not specifically as a means to offer
> automatic deletion of related rows with CASCADE. The use of foreign key
> constraints in OpenStack applications does not imply an unbreakable
> reliance upon them at all, for two reasons.
>
> For the first, foreign keys first and foremost offer nothing more than an
> integrity guarantee that prevents a particular row from being deleted, or
> having its primary key modified, such that other rows which refer to that
> primary key would be left with an invalid reference. At this level, you can
> have an application that is working perfectly, you can then shut off the
> foreign key constraints entirely, and the application will continue to work
> perfectly with no change; the difference is only that if the application at
> some point failed, due to bugs either present or newly introduced, or if an
> end user tried to tinker with the database directly, operations which would
> leave referencing rows hanging will not be blocked. So the use of foreign
> keys at this level has nothing to do with the application relying upon
> their existence; they are nothing more than integrity guarantees which can
> be silently removed.

I'm not entirely sure what you've said above actually prevents coders from
relying on the constraints. Being careful to delete all of the child rows
before the parent is good practice. I have seen code like this in the past,
though:

    try:
        parent.delete()
    except ForeignKeyFailure:
        parent.children.delete()
        parent.delete()

This means that if you don't have the FKs, you may never delete the
children. Is this a bug? Yes. Is it super obvious that it is the wrong thing
to do? No.
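The non-fragile ordering described as good practice above can be sketched directly: delete the children unconditionally before the parent, instead of probing for an FK error. A minimal illustration with hypothetical tables, using SQLite for portability:

```python
import sqlite3

# Sketch of the safe ordering: remove child rows first, then the parent.
# This is correct whether or not FK constraints are actually enforced.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.execute("INSERT INTO parent VALUES (1)")
conn.executemany("INSERT INTO child VALUES (?, 1)", [(10,), (11,)])

def delete_parent(conn, parent_id):
    # Children first, unconditionally; in production this pair of
    # statements would run inside one transaction.
    conn.execute("DELETE FROM child WHERE parent_id = ?", (parent_id,))
    conn.execute("DELETE FROM parent WHERE id = ?", (parent_id,))

delete_parent(conn, 1)
orphans = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
parents = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
print(orphans, parents)
```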
> For the second, within the realm of ON DELETE CASCADE specifically,
> Keystone, like all other OpenStack applications, uses SQLAlchemy for
> relational persistence. SQLAlchemy transparently supports foreign key
> constraints with or without ON DELETE CASCADE; meaning, whether or not a
> foreign key can be relied upon to delete dependent rows is nothing more
> than a configuration option in the mappings. SQLAlchemy will continue to
> emit a DELETE statement for rows that are locally mirrored in memory, but
> if it knows that the foreign key has ON DELETE CASCADE set up, it will let
> the rows that are unloaded be deleted automatically. With the flag off, it
> will attempt to locate those rows as well. So in this situation, the use of
> ON DELETE CASCADE *improves* performance, by reducing database round trips.

Right, which is why, as usual, "it depends". If Keystone did a lot of
child-row deletes, then it is a net gain in latency. But if it does a lot of
inserts, it's a huge loss, because every time you insert a row that has an
FK constraint, you need to do an index lookup, which is effectively an extra
query that your severely limited central database has to handle all by
itself, without the help of slaves.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
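The round-trip saving that the thread attributes to ON DELETE CASCADE can be seen in miniature: one DELETE on the parent removes the children inside the database, with no extra statements from the application. A sketch using SQLite's FK support rather than MySQL/Galera, with hypothetical table names:

```python
import sqlite3

# One parent DELETE; the children are removed by the database itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # cascades only fire when FKs are on
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
                  id INTEGER PRIMARY KEY,
                  parent_id INTEGER
                      REFERENCES parent(id) ON DELETE CASCADE)""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.executemany("INSERT INTO child VALUES (?, 1)", [(10,), (11,)])

conn.execute("DELETE FROM parent WHERE id = 1")  # single round trip
remaining = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(remaining)
```

On the SQLAlchemy side, the mapping switch the thread refers to is, as far as I recall, `passive_deletes=True` on `relationship()`, which tells the ORM to leave unloaded child rows to the database's ON DELETE CASCADE instead of loading and deleting them itself.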