Re: [GSOC] Multiple Database API proposal
Alex Gaynor wrote:
> 8) Time permitting implement a few common replication patterns.

I'm not very excited about this point. To me replication is a major use case. I suspect most people who move beyond a single-server setup and beyond 10'000 - 20'000 visitors realize that replication should simply be in place, ensuring performance and redundancy. In my experience other multi-DB patterns (those that are covered by `using()` and Meta attributes on models) are just *less* common in practice. So I consider leaving replication to "time permitting" a mistake.

On the other hand, maybe all this work won't break mysql_replicated and I'll just have to update it to the new db backend interface. There may be non-trivial things to work out, though, such as having separate master-slave pairs for each data shard.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---
Re: [GSOC] Multiple Database API proposal
On Sat, Mar 21, 2009 at 1:25 AM, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:

> On Sat, 2009-03-21 at 00:41 -0400, Alex Gaynor wrote:
> > > > One suggestion Eric Florenzano had was that we go above and beyond just storing the methods and parameters, we don't even execute them at all until absolutely necessary.
> > >
> > > Excuse me for a moment whilst I add Eric to a special list I've been keeping. He's trying to make trouble.
> > >
> > > Ok, back now... There are at least two problems with this.
> > >
> > > (a) Backwards incompatible in that some querysets would return noticeably different results before and after that change. It would be subtle, quiet and very difficult to detect without auditing every line of code that contributes to a queryset. The worst kind of change for us to make from the perspective of the users.
> >
> > In what scenario does it return different results? The one place I can think of is:
> >
> >     query = queryset.order_by('I AM NOT A REAL FIELD, HAHA')
> >     render_to_response('template.html', {'q': query})
> >
> > which would raise an exception in the template instead of in the view.
>
> It's related to eager/deferred argument evaluation (which is done for the same reasons): any "smart" object like Q objects would require changing to handle deferring things correctly. They can currently be designed to evaluate only once and will work correctly.

I don't see this as an issue, simply because whatever happens in the instantiation of these objects would be the same for whatever connection was in use.

> > > (b) Intentionally not done right now and not because I'm whimsical and arbitrary (although I am). The problem is it requires storing all sorts of arbitrarily complex Python objects. Which breaks pickling, which breaks caching. People tend to complain, a lot, about that last bit.
> > >
> > > That's why the Where.add() converts things to more basic types when they are added (via a filter() command). If somebody really needs lazily evaluated parameters, it's easy enough via a custom Q-like object, but so far nobody has asked for that if they've gotten stuck doing it. It's even something we could consider adding to Django, although it's not a no-brainer given the potential to break caching.
> >
> > I vaguely recall there being a ticket about this that you wontfixed, although that may have been about deferring calling callables :). In any event the caching issue was one I hadn't considered, although one solution would be not to pickle it with the ability to switch to a different query type, it's a bit of a strange restriction, but I don't think it's one that would practically affect people, and it's less restrictive.
>
> You wrote a really long sentence there that didn't make a lot of sense (too many prepositions and commas, not enough nouns and full stops). Unclear which restriction you're arguing against, but the picklability of querysets is pretty much a requirement. It's something people really use.
>
> However, before we go too far down this path: this is a very minor thing. It's unlikely to be required. Adding it "because we can" is an argument Eric can propose at some much later date if it's not absolutely *required* for multi-db stuff. I think we won't need to worry about this at all.

Just to clear that up, what I was saying was: when you pickle a QuerySet we build up the entire Query as we would right before SQL execution and then just pickle that. Then the restriction is that you can't change the database type to be used on an unpickled query.

> > > [...]
> > > > Thanks for all the review Malcolm.
> > >
> > > No problems.
> > >
> > > > One question that I didn't really ask in the initial post is what parameters should a "DatabaseManager" receive on its methods, one suggestion is the Query object, since that gives the user the maximal amount of information, however my concerns there are that it's not a public API, and having a private API as a part of the public API feels klunky.
> > >
> > > At first glance, I believe the word you're looking for is "wrong". :-)
> >
> > Yes, that's the one.
> >
> > > Definitely a valid concern.
> > >
> > > > OTOH there isn't really another data structure that carries around the information someone writing their sharding logic (or
Re: [GSOC] Multiple Database API proposal
On Sat, 2009-03-21 at 00:41 -0400, Alex Gaynor wrote:
> > > One suggestion Eric Florenzano had was that we go above and beyond just storing the methods and parameters, we don't even execute them at all until absolutely necessary.
> >
> > Excuse me for a moment whilst I add Eric to a special list I've been keeping. He's trying to make trouble.
> >
> > Ok, back now... There are at least two problems with this.
> >
> > (a) Backwards incompatible in that some querysets would return noticeably different results before and after that change. It would be subtle, quiet and very difficult to detect without auditing every line of code that contributes to a queryset. The worst kind of change for us to make from the perspective of the users.
>
> In what scenario does it return different results? The one place I can think of is:
>
>     query = queryset.order_by('I AM NOT A REAL FIELD, HAHA')
>     render_to_response('template.html', {'q': query})
>
> which would raise an exception in the template instead of in the view.

It's related to eager/deferred argument evaluation (which is done for the same reasons): any "smart" object like Q objects would require changing to handle deferring things correctly. They can currently be designed to evaluate only once and will work correctly.

> > (b) Intentionally not done right now and not because I'm whimsical and arbitrary (although I am). The problem is it requires storing all sorts of arbitrarily complex Python objects. Which breaks pickling, which breaks caching. People tend to complain, a lot, about that last bit.
> >
> > That's why the Where.add() converts things to more basic types when they are added (via a filter() command). If somebody really needs lazily evaluated parameters, it's easy enough via a custom Q-like object, but so far nobody has asked for that if they've gotten stuck doing it. It's even something we could consider adding to Django, although it's not a no-brainer given the potential to break caching.
>
> I vaguely recall there being a ticket about this that you wontfixed, although that may have been about deferring calling callables :). In any event the caching issue was one I hadn't considered, although one solution would be not to pickle it with the ability to switch to a different query type, it's a bit of a strange restriction, but I don't think it's one that would practically affect people, and it's less restrictive.

You wrote a really long sentence there that didn't make a lot of sense (too many prepositions and commas, not enough nouns and full stops). Unclear which restriction you're arguing against, but the picklability of querysets is pretty much a requirement. It's something people really use.

However, before we go too far down this path: this is a very minor thing. It's unlikely to be required. Adding it "because we can" is an argument Eric can propose at some much later date if it's not absolutely *required* for multi-db stuff. I think we won't need to worry about this at all.

> > [...]
> > > Thanks for all the review Malcolm.
> >
> > No problems.
> >
> > > One question that I didn't really ask in the initial post is what parameters should a "DatabaseManager" receive on its methods, one suggestion is the Query object, since that gives the user the maximal amount of information, however my concerns there are that it's not a public API, and having a private API as a part of the public API feels klunky.
> >
> > At first glance, I believe the word you're looking for is "wrong". :-)
>
> Yes, that's the one.
>
> > Definitely a valid concern.
> >
> > > OTOH there isn't really another data structure that carries around the information someone writing their sharding logic (or whatever other scheme they want to implement) who inevitably want to have.
> >
> > Two solutions spring to mind, although I haven't thought this through a lot: it's not particularly germane to the proposal since it's something we can work out a bit later on. I've got limited time today (something about a beta release coming up), so I wanted to just get out responses to the two people who posted items for discussion. I suspect there's a lot of thinking n
Re: [GSOC] Multiple Database API proposal
> > One suggestion Eric Florenzano had was that we go above and beyond just storing the methods and parameters, we don't even execute them at all until absolutely necessary.
>
> Excuse me for a moment whilst I add Eric to a special list I've been keeping. He's trying to make trouble.
>
> Ok, back now... There are at least two problems with this.
>
> (a) Backwards incompatible in that some querysets would return noticeably different results before and after that change. It would be subtle, quiet and very difficult to detect without auditing every line of code that contributes to a queryset. The worst kind of change for us to make from the perspective of the users.

In what scenario does it return different results? The one place I can think of is:

    query = queryset.order_by('I AM NOT A REAL FIELD, HAHA')
    render_to_response('template.html', {'q': query})

which would raise an exception in the template instead of in the view.

> (b) Intentionally not done right now and not because I'm whimsical and arbitrary (although I am). The problem is it requires storing all sorts of arbitrarily complex Python objects. Which breaks pickling, which breaks caching. People tend to complain, a lot, about that last bit.
>
> That's why the Where.add() converts things to more basic types when they are added (via a filter() command). If somebody really needs lazily evaluated parameters, it's easy enough via a custom Q-like object, but so far nobody has asked for that if they've gotten stuck doing it. It's even something we could consider adding to Django, although it's not a no-brainer given the potential to break caching.

I vaguely recall there being a ticket about this that you wontfixed, although that may have been about deferring calling callables :). In any event the caching issue was one I hadn't considered, although one solution would be not to pickle it with the ability to switch to a different query type, it's a bit of a strange restriction, but I don't think it's one that would practically affect people, and it's less restrictive.

> [...]
> > Thanks for all the review Malcolm.
>
> No problems.
>
> > One question that I didn't really ask in the initial post is what parameters should a "DatabaseManager" receive on its methods, one suggestion is the Query object, since that gives the user the maximal amount of information, however my concerns there are that it's not a public API, and having a private API as a part of the public API feels klunky.
>
> At first glance, I believe the word you're looking for is "wrong". :-)

Yes, that's the one.

> Definitely a valid concern.
>
> > OTOH there isn't really another data structure that carries around the information someone writing their sharding logic (or whatever other scheme they want to implement) who inevitably want to have.
>
> Two solutions spring to mind, although I haven't thought this through a lot: it's not particularly germane to the proposal since it's something we can work out a bit later on. I've got limited time today (something about a beta release coming up), so I wanted to just get out responses to the two people who posted items for discussion. I suspect there's a lot of thinking needed here about the concept as a whole and I want to do that. Anyway...
>
> One option is to use the piece of public API that is available which will always be carrying around a Query object: the QuerySet. Query objects don't exist in isolation. However, this sounds problematic because the implementation is going to be working at a very low level -- database managers are only really interesting to Query.as_sql() and its dependencies. But that leads to the next idea, ...
>
> The other is to work out a better place for this database manager in the hierarchy. It might be something that lives as an attribute on a QuerySet. Something like the user provides a function that picks the database based on "some information" that is available to it and the base method selects the right database to use. Since it lives in the QuerySet namespace, it can happily access the "query" attribute there without any encapsulation violations. The database manager then becomes two pieces, an algorithm on QuerySet (that might just dispatch to the real algorithm on Query), plus some user-supplied code to make the right selections. That latter thing could be a callable object if you need the full class structure. But the stuff QuerySet/Query needs to know about is probably a much smaller interface than *requiring* a full class. (Did any of that make sense?)
>
> I think this -- the database manager concept -- is the part of your proposal that is most up in the air with respect to what the API looks like. Which is fine. The fact that it's something to consider is good enough to know. Certainly put some thought into the problem, but don't sweat the details too much just yet (in
Re: [GSOC] Multiple Database API proposal
Trimming unused portions of the response to make it readable (which I should have done the first time around, too)...

On Fri, 2009-03-20 at 23:41 -0400, Alex Gaynor wrote:
> On Fri, Mar 20, 2009 at 11:21 PM, Malcolm Tredinnick wrote:
> > On Fri, 2009-03-20 at 09:45 -0400, Alex Gaynor wrote:
> > > Hello all,
> > > [...]
> > > The greatest hurdle is changing the connection after we already have our ``Query`` partly created. The issues here are that: we might have done tests against ``connection.features`` already, we might need to switch either to or from a custom ``Query`` object, amongst other issues.
> > > [...]
> > > One possible solution that is very powerful (though quite inelegant) is to have the ``QuerySet`` keep track of all public API method calls against it and what parameters they took, then when the ``connection`` is changed it will recreate the ``Query`` object by creating a "blank" one with the new connection and reapplying all the methods it has stored. This is basically a simple implementation of the command pattern.
> >
> > It's pretty yukky. There's a lot of Python level junk that we intentionally avoid storing in querysets so that they behave properly as persistent data structures (clones are independent copies) and can be pickled without trouble, etc. It would be really bad for performance to reintroduce those (I did a lot of profiling when developing that stuff and tried to throw out as much as possible). I think this fortunately isn't going to be a real issue. I was pretty careful originally to keep the leakage from django.db.connection into the Query class to as few places as possible and mostly when we're creating the SQL.
> >
> > Some cases that might be unavoidable could be replaced with delayed evaluation objects (essentially encapsulating the command pattern just for that fragment), which is a bit cleaner.
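To make the quoted "command pattern" idea concrete, here is a minimal, Django-free sketch. ``FakeQuery`` and ``RecordingQuerySet`` are hypothetical names invented for this illustration and are not Django classes. The sketch only shows how recorded public method calls could be replayed against a fresh query object when the connection alias changes.

```python
class FakeQuery:
    """Hypothetical stand-in for a backend-specific Query object."""

    def __init__(self, alias):
        self.alias = alias
        self.operations = []

    def apply(self, method, args, kwargs):
        # A real Query would build SQL state here; we only log the call.
        self.operations.append((method, args, tuple(sorted(kwargs.items()))))


class RecordingQuerySet:
    """Hypothetical queryset that remembers every public call made on it."""

    def __init__(self, alias='default', calls=()):
        self.alias = alias
        self._calls = tuple(calls)
        # Rebuild the query from scratch by replaying the recorded calls.
        self.query = FakeQuery(alias)
        for method, args, kwargs in self._calls:
            self.query.apply(method, args, kwargs)

    def _record(self, method, *args, **kwargs):
        # Each call returns a new, independent queryset (a clone).
        new_calls = self._calls + ((method, args, kwargs),)
        return RecordingQuerySet(self.alias, new_calls)

    def filter(self, **kwargs):
        return self._record('filter', **kwargs)

    def order_by(self, *fields):
        return self._record('order_by', *fields)

    def using(self, alias):
        # Switching connection: start from a "blank" query, replay everything.
        return RecordingQuerySet(alias, self._calls)
```

The reply quoted above explains why this approach was considered unattractive: the stored call records are exactly the kind of Python-level data that querysets were designed not to carry around.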
> One suggestion Eric Florenzano had was that we go above and beyond just storing the methods and parameters, we don't even execute them at all until absolutely necessary.

Excuse me for a moment whilst I add Eric to a special list I've been keeping. He's trying to make trouble.

Ok, back now... There are at least two problems with this.

(a) Backwards incompatible in that some querysets would return noticeably different results before and after that change. It would be subtle, quiet and very difficult to detect without auditing every line of code that contributes to a queryset. The worst kind of change for us to make from the perspective of the users.

(b) Intentionally not done right now and not because I'm whimsical and arbitrary (although I am). The problem is it requires storing all sorts of arbitrarily complex Python objects. Which breaks pickling, which breaks caching. People tend to complain, a lot, about that last bit.

That's why the Where.add() converts things to more basic types when they are added (via a filter() command). If somebody really needs lazily evaluated parameters, it's easy enough via a custom Q-like object, but so far nobody has asked for that if they've gotten stuck doing it. It's even something we could consider adding to Django, although it's not a no-brainer given the potential to break caching.

[...]

> Thanks for all the review Malcolm.

No problems.

> One question that I didn't really ask in the initial post is what parameters should a "DatabaseManager" receive on its methods, one suggestion is the Query object, since that gives the user the maximal amount of information, however my concerns there are that it's not a public API, and having a private API as a part of the public API feels klunky.

At first glance, I believe the word you're looking for is "wrong". :-)

Definitely a valid concern.
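The pickling concern in (b) can be seen with plain Python, independent of Django. In this sketch ``can_pickle`` is a helper defined here purely for illustration, and the parameter dictionaries are made-up examples. It shows that basic types pickle cleanly, while a lambda (standing in for an arbitrary lazily evaluated object) does not.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Parameters already reduced to basic types pickle without trouble.
basic_params = {'name__exact': 'alice', 'age__gt': 30}

# A deferred parameter, modelled here as a lambda, cannot be pickled.
deferred_params = {'age__gt': lambda: 30}
```

Anything that caches querysets by pickling them would fail on the second kind of object, which is the caching breakage described above.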
> OTOH there isn't really another data structure that carries around the information someone writing their sharding logic (or whatever other scheme they want to implement) who inevitably want to have.

Two solutions spring to mind, although I haven't thought this through a lot: it's not particularly germane to the proposal since it's something we can work out a bit later on. I've got limited time today (something about a beta release coming up), so I wanted to just get out responses to the two people who posted items for discussion. I suspect there's a lot of thinking needed here about the concept as a whole and I want to do that. Anyway...
Re: [GSOC] Multiple Database API proposal
On Fri, Mar 20, 2009 at 11:21 PM, Malcolm Tredinnick <malc...@pointy-stick.com> wrote:
> On Fri, 2009-03-20 at 09:45 -0400, Alex Gaynor wrote:
> > Hello all,
> > [...]
Re: [GSOC] Multiple Database API proposal
On Fri, 2009-03-20 at 09:45 -0400, Alex Gaynor wrote:
> Hello all,
> [...]
Re: [GSOC] Multiple Database API proposal
> I'm here soliciting feedback on both the API, and any potential hurdles I
> may have missed.

While my vote may mean little, Alex has certainly been active and had quality code on the mailing list. MultiDB has also been a frequent issue on the mailing list, so Alex gets my +1.

I'd hope to see "multiple databases" defined a little more clearly, as discussed in this thread[1]. Whether the SoC project addresses *all* of the facets (wow, lots of work!) or just selects certain issues, I'd like to see them addressed in the proposal ("addressing federation and load-balancing, but not sharding") to show that they're being considered during the implementation. From what I gather in the description, Alex is only proposing load-balancing.

Depending on which definitions of multidb you plan to address, it also impacts areas such as aggregation (performing count/summation over shards requires extra consideration) and cross-database joining. In the above thread, Malcolm also raises the issue of read/write consistency when doing load-balancing.

-tim

[1] http://groups.google.com/group/django-users/browse_thread/thread/663046559fd0f9c1/
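The point about aggregation over shards can be illustrated without any database. In this sketch the per-shard rows are made-up data and the helper functions are hypothetical. The idea is that counts and sums combine by simple addition, whereas an average has to be rebuilt from the per-shard sums and counts rather than by averaging the per-shard averages.

```python
# Made-up data: the values of one numeric column, split across two shards.
shard_rows = {
    'shard_a': [10, 20, 30],
    'shard_b': [100],
}


def combined_count(shards):
    """COUNT over all shards is the sum of the per-shard counts."""
    return sum(len(rows) for rows in shards.values())


def combined_sum(shards):
    """SUM over all shards is the sum of the per-shard sums."""
    return sum(sum(rows) for rows in shards.values())


def combined_average(shards):
    """AVG must be derived from the global sum and the global count."""
    return combined_sum(shards) / combined_count(shards)


def naive_average(shards):
    """Averaging the per-shard averages: wrong when shard sizes differ."""
    per_shard = [sum(rows) / len(rows) for rows in shards.values()]
    return sum(per_shard) / len(per_shard)
```

With the data above the two averages disagree, which is the "extra consideration" an aggregation API would need once data is sharded.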
[GSOC] Multiple Database API proposal
Hello all,

To those who don't know me, I'm a freshman computer science student at Rensselaer Polytechnic Institute in Troy, New York. I'm on the mailing lists quite a bit so you may have seen me around.

A Multiple Database API For Django
==================================

Django currently has the low-level hooks necessary for multiple database support, but it doesn't have the high-level API for using them, nor any supporting infrastructure, documentation, or tests. The purpose of this project would be to implement the high-level API necessary for the use of multiple databases in Django, along with the requisite documentation and tests.

There have been several previous proposals and implementations of multiple-database support in Django, none of which has been complete or gained sufficient traction in the community to be included in Django itself. As such this proposal will specifically address some of the reasons for past failures, and their remedies.

The API
-------

First there is the API for defining multiple connections. A new setting will be created, ``DATABASES`` (or something similar), which is a dictionary mapping a database alias (internal name) to a dictionary containing the current ``DATABASE_*`` settings:

.. sourcecode:: python

    DATABASES = {
        'default': {
            'DATABASE_ENGINE': 'postgresql_psycopg2',
            'DATABASE_NAME': 'my_data_base',
            'DATABASE_USER': 'django',
            'DATABASE_PASSWORD': 'super_secret',
        },
        'user': {
            'DATABASE_ENGINE': 'sqlite3',
            'DATABASE_NAME': '/home/django_projects/universal/users.db',
        },
    }

A database with the alias ``default`` will be the default connection (it will be used if no other one is specified for a query) and will be the direct replacement for the ``DATABASE_*`` settings. In compliance with Django's deprecation policy the ``DATABASE_*`` settings will automatically be handled as if they were defined in the ``DATABASES`` dict for at least 2 releases.
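One way the backwards-compatibility rule above could work is sketched below. ``databases_from_legacy`` is a hypothetical helper, not part of Django, and the list of ``DATABASE_*`` names it looks for is an assumption based only on the keys shown in the example. It operates on a plain dictionary of settings so that it can be run on its own.

```python
# Assumed set of legacy setting names, taken from the example above.
LEGACY_KEYS = (
    'DATABASE_ENGINE',
    'DATABASE_NAME',
    'DATABASE_USER',
    'DATABASE_PASSWORD',
)


def databases_from_legacy(settings):
    """Build a DATABASES-style dict from a plain mapping of settings.

    If no 'default' alias is configured, any old-style DATABASE_*
    values found are treated as the 'default' connection.
    """
    databases = dict(settings.get('DATABASES', {}))
    if 'default' not in databases:
        legacy = {key: settings[key] for key in LEGACY_KEYS if key in settings}
        if legacy:
            databases['default'] = legacy
    return databases
```

A project that still uses only ``DATABASE_ENGINE`` and ``DATABASE_NAME`` would then see them under the ``default`` alias, while a project that already defines ``DATABASES['default']`` would be left untouched.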
Next a ``connections`` object will be implemented in ``django.db``, analogous to the ``django.db.connection`` object. The ``connections`` object will be a dictionary-like object that is subscripted by database alias and lazily returns a connection to the database. ``django.db.connection`` will remain (at least for the present; its ultimate state will be decided by community consensus) and merely proxy to ``django.db.connections['default']``. Using the previously defined database setting this might be used as:

.. sourcecode:: python

    from django.db import connections

    conn = connections['user']
    c = conn.cursor()
    # The return value of execute() is backend-dependent, so fetch
    # the rows from the cursor itself.
    c.execute("""SELECT 1""")
    results = c.fetchall()

Now that there is the necessary infrastructure to accompany the very low-level plumbing we need our actual API. The high-level API will have 2 components. First there will be a ``using()`` method on ``QuerySet`` and ``Manager`` objects. This method simply takes an alias to a connection (and possibly a connection object itself to allow for dynamic database usage) and makes that the connection that will be used for that query. Secondly, a new option will be created in the inner Meta class of models. This option will be named ``using`` and specify the default connection to use for all queries against this model, overriding the default specified in the settings:

.. sourcecode:: python

    class MyUser(models.Model):
        ...
        class Meta:
            using = 'user'

    # this queries the 'user' database
    MyUser.objects.all()
    # this queries the 'default' database
    MyUser.objects.using('default')

Lastly, various plumbing will need to be updated to reflect the new multidb API, such as transactions, breakpoints, management commands, etc.

More Advanced Usage
-------------------

While the above two methods are strictly speaking sufficient, they require the user to write lots of boilerplate code in order to implement advanced multi-database strategies such as replication and sharding.
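As an illustration of the boilerplate in question, a project doing simple replication with only ``using()`` might end up hand-writing an alias picker like the one below and calling it at every query site. The ``ReplicaPicker`` class and the alias names are hypothetical; this is a sketch of user code, not a proposed Django API.

```python
import itertools


class ReplicaPicker:
    """Hypothetical helper: writes go to the primary, reads rotate."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # itertools.cycle over an empty list would never yield anything,
        # so fall back to the primary when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def alias_for(self, is_write):
        if is_write or self._replicas is None:
            return self.primary
        return next(self._replicas)


picker = ReplicaPicker('default', ['replica1', 'replica2'])
# At each call site one would then have to write something like:
#   MyUser.objects.using(picker.alias_for(is_write=False))
```

Repeating that call at every query is the kind of repetition the next part of the proposal aims to remove.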
Therefore we also introduce the concept of ``DatabaseManagers``, not to be confused with Django's current managers. ``DatabaseManagers`` are classes that define which connection should be used for a given query. There are 2 levels at which to specify what ``DatabaseManager`` to use: as a setting, and at the class level. For example in one's settings.py one might have:

.. sourcecode:: python

    DEFAULT_DB_MANAGER = 'django.db.multidb.round_robin.Random'

This tells Django that for each query it should use the ``DatabaseManager`` specified at that location, unless it is overridden by the ``using`` Meta option, or the ``using()`` method. The more granular way to use ``DatabaseManagers`` is to provide them, in place of a string, as the ``using`` Meta option. Here we pass an instance of the class we want to use:

.. sourcecode:: python

    class MyModel(models.Model):
        class Meta:
            using = Random(['my_db1', 'my_db2', 'my_db2'])

At this level it