Hello all,

To those who don't know me, I'm a freshman computer science student at Rensselaer Polytechnic Institute in Troy, New York. I'm on the mailing lists quite a bit, so you may have seen me around.

A Multiple Database API For Django
==================================

Django currently has the low level hooks necessary for multiple database support, but it has neither the high level API for using them nor any supporting infrastructure, documentation, or tests. The purpose of this project would be to implement the high level API necessary for the use of multiple databases in Django, along with the requisite documentation and tests.

There have been several previous proposals and implementations of multiple-database support in Django, none of which has been complete or gained sufficient traction in the community to be included in Django itself. As such, this proposal will specifically address some of the reasons for past failures, and their remedies.

The API
-------

First, there is the API for defining multiple connections. A new setting, ``DATABASES`` (or something similar), will be created. It is a dictionary mapping a database alias (an internal name) to a dictionary containing the current ``DATABASE_*`` settings:

.. sourcecode:: python

    DATABASES = {
        'default': {
            'DATABASE_ENGINE': 'postgresql_psycopg2',
            'DATABASE_NAME': 'my_data_base',
            'DATABASE_USER': 'django',
            'DATABASE_PASSWORD': 'super_secret',
        },
        'user': {
            'DATABASE_ENGINE': 'sqlite3',
            'DATABASE_NAME': '/home/django_projects/universal/users.db',
        },
    }

A database with the alias ``default`` will be the default connection (it will be used if no other one is specified for a query) and will be the direct replacement for the ``DATABASE_*`` settings. In compliance with Django's deprecation policy, the ``DATABASE_*`` settings will automatically be handled as if they were defined in the ``DATABASES`` dict for at least 2 releases.

Next, a ``connections`` object will be implemented in ``django.db``, analogous to the ``django.db.connection`` object. It will be a dictionary-like object that is subscripted by database alias and lazily returns a connection to the database.
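
As a rough illustration only (this is not the proposed implementation), a lazy, dictionary-like ``connections`` object could be sketched as below. The class name ``ConnectionHandler``, the ``connect`` callable, and the in-memory ``sqlite3`` databases are assumptions made purely for this sketch; the real object would construct Django backend connections instead.

```python
import sqlite3


class ConnectionHandler(object):
    """Dictionary-like object that opens each connection on first access.

    Hypothetical sketch: the name and constructor arguments are not part
    of the proposal.
    """

    def __init__(self, databases, connect):
        # databases: alias -> settings dict, shaped like the DATABASES setting
        # connect: callable turning one settings dict into a connection
        self._databases = databases
        self._connect = connect
        self._connections = {}

    def __getitem__(self, alias):
        # Open the connection only the first time the alias is requested.
        if alias not in self._connections:
            if alias not in self._databases:
                raise KeyError("Unknown database alias: %r" % (alias,))
            self._connections[alias] = self._connect(self._databases[alias])
        return self._connections[alias]

    def __iter__(self):
        # Iterating yields the configured aliases, opened or not.
        return iter(self._databases)


# Stand-in settings: two in-memory SQLite databases (assumed for the sketch).
DATABASES = {
    'default': {'DATABASE_ENGINE': 'sqlite3', 'DATABASE_NAME': ':memory:'},
    'user': {'DATABASE_ENGINE': 'sqlite3', 'DATABASE_NAME': ':memory:'},
}


def connect(db_settings):
    # Only sqlite3 is handled here; a real version would dispatch on
    # DATABASE_ENGINE.
    return sqlite3.connect(db_settings['DATABASE_NAME'])


connections = ConnectionHandler(DATABASES, connect)

cursor = connections['user'].cursor()
cursor.execute("SELECT 1")
rows = cursor.fetchall()
```

Nothing is opened until an alias is first subscripted, and repeated lookups of the same alias return the same connection object.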
``django.db.connection`` will remain (at least for the present; its ultimate state will be decided by community consensus) and merely proxy to ``django.db.connections['default']``. Using the previously defined database setting, this might be used as:

.. sourcecode:: python

    from django.db import connections

    conn = connections['user']
    c = conn.cursor()
    # execute() has no portable return value under the DB-API,
    # so fetch the rows from the cursor itself.
    c.execute("""SELECT 1""")
    results = c.fetchall()

Now that there is the necessary infrastructure to accompany the very low level plumbing, we need our actual API. The high level API will have 2 components.

First, there will be a ``using()`` method on ``QuerySet`` and ``Manager`` objects. This method simply takes an alias to a connection (and possibly a connection object itself, to allow for dynamic database usage) and makes that the connection that will be used for that query.

Secondly, a new option will be created in the inner ``Meta`` class of models. This option will be named ``using`` and will specify the default connection to use for all queries against this model, overriding the default specified in the settings:

.. sourcecode:: python

    from django.db import models

    class MyUser(models.Model):
        ...

        class Meta:
            using = 'user'

    # this queries the 'user' database
    MyUser.objects.all()

    # this queries the 'default' database
    MyUser.objects.using('default')

Lastly, various plumbing will need to be updated to reflect the new multidb API, such as transactions, breakpoints, management commands, etc.

More Advanced Usage
-------------------

While the above two methods are, strictly speaking, sufficient, they require the user to write lots of boilerplate code in order to implement advanced multi database strategies such as replication and sharding. Therefore we also introduce the concept of ``DatabaseManagers``, not to be confused with Django's current managers. ``DatabaseManagers`` are classes that define which connection should be used for a given query. There are 2 levels at which to specify which ``DatabaseManager`` to use: as a setting, and at the class level.
For example, in one's settings.py one might have:

.. sourcecode:: python

    DEFAULT_DB_MANAGER = 'django.db.multidb.round_robin.Random'

This tells Django that for each query it should use the ``DatabaseManager`` specified at that location, unless it is overridden by the ``using`` Meta option or the ``using()`` method.

The more granular way to use ``DatabaseManagers`` is to provide them, in place of a string, as the ``using`` Meta option. Here we pass an instance of the class we want to use:

.. sourcecode:: python

    class MyModel(models.Model):
        class Meta:
            using = Random(['my_db1', 'my_db2', 'my_db3'])

At this level it can still be overridden by the explicit usage of the ``using()`` method.

But how exactly do ``DatabaseManagers`` work? Let's start with an example:

.. sourcecode:: python

    import random

    from django.conf import settings

    class Random(DatabaseManager):
        def __init__(self, dbs=None):
            # Fall back to every configured alias when none are given.
            self.dbs = list(dbs) if dbs is not None else list(settings.DATABASES)

        def select(self, cls, **params):
            # Pick one alias at random for each read.
            return random.choice(self.dbs)

        def create(self, cls, **params):
            raise TypeError("Random database manager is intended only for reads")

        def update(self, cls, **params):
            raise TypeError("Random database manager is intended only for reads")

Basically we have 3 methods on a ``DatabaseManager``, plus the ``__init__`` method. ``__init__`` should be callable with no parameters if you want to make the class the default for your project. ``select()``, ``create()``, and ``update()`` each take the class of the model that the query is for, plus ``**params``. It has yet to be determined what params should be passed. Ideas include:

* The ``Query`` object for the ``QuerySet`` in question.
* The ``WhereNode`` for the ``Query`` object.
* others...

Plan of Action
--------------

1) Implement the ``connections`` object. -- 1 day
2) Alter the relevant management commands and anything else to use all connections or ``django.db.connections['default']``, depending on which is appropriate. -- 1 week
3) Implement the method tracking (command pattern). -- 1 week
4) Implement the ``using()`` method and the ``using`` inner ``Meta`` option. -- 1 week
5) Write initial tests and docs. The rest will be written as features are implemented, but a large initial set needs to be written. -- 3 weeks
6) Fix up transaction support, the close database signal, and anything else in transactions.py. -- 2 weeks
7) Add support for the ``DatabaseManager`` for more complex support. -- 2 weeks
8) Time permitting, implement a few common replication patterns.

All of these times are fairly aggressive, and there are about 2 weeks to spare, so those can be used as necessary, or for part #8.

Hurdles
-------

The following is a list of possible technical issues:

* In ``django.db.models.sql.query.Query``, are any tests done on what the connection is before the actual SQL construction phase? If so, these need to be changed not to do this, since the connection might change at some point after that test. If this can't be done, then ``using()`` needs to be the first method called on a ``QuerySet``, or at a minimum called before any methods that do such testing. Further, if these tests can't be put off, then the only option is a callback that's called right when the first ``Query`` object is constructed. This means Django won't know what type of query it would be, rendering the ``DatabaseManager`` impossible.
* Will models need to know which database they came from so that they can be saved back correctly?
* Does ``Model.save()`` need to take a ``using`` parameter so new objects can be created on a specific database or saved to a new database?
* For backends that use custom query classes, will we need a ``from_query`` classmethod to transform them? This would require all backends to store and use information that is basically less than or equal to what the ``Query`` object stores. Also, there needs to be the reverse: a way to go from a custom ``Query`` object back to either the Django default or some other custom ``Query`` object.
* Foreign keys will basically be handled en passant because of how they are implemented, but many to many fields will require more thought, especially since that SQL isn't in the ``Query`` class.

Solutions
---------

The greatest hurdle is changing the connection after we already have our ``Query`` partly created. The issues here include that we might have done tests against ``connection.features`` already, and that we might need to switch either to or from a custom ``Query`` object. One possible solution that is very powerful (though quite inelegant) is to have the ``QuerySet`` keep track of all public API method calls against it and what parameters they took. Then, when the ``connection`` is changed, it will recreate the ``Query`` object by creating a "blank" one with the new connection and reapplying all the methods it has stored. This is basically a simple implementation of the command pattern.

I'm here soliciting feedback on both the API and any potential hurdles I may have missed.

Thanks,
Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law." --Cicero