As hinted at earlier on the ML, I've started doing some work on refactoring the actual db backend; ticket 5106 holds the current version (http://code.djangoproject.com/ticket/5106).
Best to start with the perceived cons of the current backend design:

1) Redundancy of code. Each backend implementation has to implement the exact same functions repeatedly; _(commit|rollback|close) are simple examples, and better examples are the debug cursor, the dictfetch* assignments in each base module, and the repeated get_* function definitions in each base module. By the looks of it, each backend was roughly developed by copying an existing one over and modifying it for the target backend. This obviously isn't grand (why fix a bug once when you can fix it in 7 duplicate spots? ;)

2) Due to the lack of any real base class/interface, devs are basically stuck grepping each backend to identify what functionality is available. Track the usage of get_autoinc_sql in core/management, for example: some spots protect themselves against the function missing, some spots assume it always exists (it always exists, best I can figure). The lack of real OOP in the backend code also means django is slightly screwed in terms of making changes to the backend: instead of adding a compatibility hack in one spot, you have to add it to each and every backend. Not fun.

3) Reliance on globals. This one requires some explanation, and a minor backstory. mod_python spawns a new interpreter per vhost; if you have lots of vhosts in a worker/prefork setup, this means you bleed memory like a sieve. Not fun. The solution (at least my approach) is to mangle the misc globals django relies on so that they can swap their settings on the fly per request (literally swapping $DJANGO_SETTINGS_MODULE / django.conf.settings._target), and to force mod_python to reuse the same interpreter. The upshot, for our usage at curse-gaming: growing to >400 MB/process limited to 100 requests becomes ~40 MB/process with unlimited requests (we have a veritable buttload of vhosts). Assume a minimum of ~20 idle workers, and you get an idea of why globals are more than a wee bit anti-scaling for a setup with a large number of vhosts.
Getting back to the db refactoring: reliance on globals throughout django's code means that tricks like that are far harder, and it adds more work for multidb code/attempts; that codebase requires a reduction of global reliance. quote_name is a simple example: the quoting rules for mysql aren't the same as oracle/pgsql/sqlite, so you need to get the quoter for the specific backend. The old mantra about globals sucking basically is true; access to misc backend functionality really needs to be grabbed via the actual backend object itself if there's ever an intention of supporting N backends within a single interpreter.

4) Minor, but annoying: the forced module layout means writing generated/new backends is tricky, and further, you have to shove the backend into django.db.backends (the hardcoded location is addressable without the refactoring, although the layout issue would remain).

What I'm implementing/proposing:

1) Shift access to the introspection/creation/client module functionality to:

   connection.introspection  # literal attr based namespace
   connection.creation       # literal attr based namespace; realistically
                             # could shift DATA_TYPES to connection.DATA_TYPES
                             # and drop creation
   connection.runshell       # func to execute the shell

2) Shift access to the misc base.* bits into 5 attrs:

   connection.DatabaseError   # should realistically be there anyways, and
                              # potentially accessible on the cursor object
   connection.IntegrityError  # same
   connection.orm_map         # base.OPERATOR_MAPPING
   connection.ops             # basically the misc get_*, quote_name,
                              # *_transaction, dict* methods floating in base
   connection.capabilities    # the misc bools django relies on to discern
                              # what sql to generate for the backend;
                              # allows_group_by_ordinals, allows_unique_and_pk,
                              # autoindexes_primary_key, etc.

3) Convert code over to accessing connection, instead of backend.
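As a rough sketch of what that layout buys you (only the connection.ops / connection.capabilities attr names come from the proposal; the classes here are hypothetical), SQL-generation code asks the connection for its quoter instead of importing a module-level global, so mysql-style and pgsql-style quoting can coexist in one interpreter:

```python
class DatabaseOperations:
    """Misc get_*/quote_name/transaction helpers; one subclass per backend."""

    def quote_name(self, name):
        raise NotImplementedError


class MysqlOperations(DatabaseOperations):
    def quote_name(self, name):
        # mysql quotes identifiers with backticks
        return name if name.startswith('`') else '`%s`' % name


class PostgresOperations(DatabaseOperations):
    def quote_name(self, name):
        # pgsql (and oracle/sqlite) use double quotes
        return name if name.startswith('"') else '"%s"' % name


class DatabaseCapabilities:
    # the misc bools django consults when generating sql
    allows_group_by_ordinals = True
    autoindexes_primary_key = True


class Connection:
    def __init__(self, ops, capabilities):
        self.ops = ops
        self.capabilities = capabilities


def quoted(connection, name):
    # sql generation asks the connection, not a module-level global
    return connection.ops.quote_name(name)
```

Two connections with different backends can then be live at the same time, each quoting its own way; that's the "N backends within a single interpreter" case the globals currently rule out.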
Kind of a given this breaks the hell out of current consumers doing sql generation (moving quote_name, for example), but the api breakage can be limited by adding a temporary __getattr__ to the base connection class that searches the new compartmentalized locations and returns from there. Not a good long-term solution, but it should be an effective intermediate band-aid.

Basically, the pros of the approach are:

1) Fix, or enable the next step in fixing, the cons listed above in mainline django (instead of folks just forking django with their needed changes).

2) An actual base interface for the backend bits, making things less of a crapshoot when trying to write backend-agnostic code.

3) Connection reuse/pooling can be inlined (or wrapped; the implementation will be similar enough) for backends that lack it. A fun example that came to mind was writing a simple wrapper to collect the backtraces for where queries are getting forced to evaluate. It's not a complex example, but it ought to give y'all an idea of some of the stuff that's doable with the cleanup. Right now you'd have to inline it every time; tweaking the settings is far saner, and promotes reuse (plus it would be a useful tool :)

So... thoughts? The second round of refactoring posted on 5106 currently lacks some of the new features mentioned above (need to port them over, mainly), but v3 will address that, and add in at least a SteadyDB persistent connection (pooling would be based off that). The current backend implementations in v2 are also still a fair bit ugly; they currently just map the old layout into the new api (no point converting till folks agree with the new api, mainly). Comments would be appreciated; personally, I'm not after a mass gutting of what's there, just after refactoring it so that it's in a semi-sane state for extension/heavier refactoring.

~harring
