Thanks to all the help from the group, I am now facing an issue much
earlier than anticipated: how to manage Alchemy in -- apologies for
using a dreadfully overused phrase -- an 'Enterprise Setting'. This
really boils down to matters of scale: tens of thousands of tables in
hundreds of databases across several dozen dataservers.  I am
interested in how to structure and organize the Python code-artifacts
containing the metadata collections, POPO class declarations and
mapper() invocations that associate the classes with the tables. I am
also interested in configuration and credential management, i.e. how to
control which dataservers are queried and which credentials are used to
connect to them. Various use-cases include: use a replica reporting
dataserver for queries; use a development system for the
foo.bar.bletch class/class-hierarchy but use 'production' for
everything else; use SQLite for high-performance querying of stable
(i.e. changes rarely, if ever) reference/lookup data
but use production systems for live, trading-related data.
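
To make those use-cases concrete, what I have in mind is
deployment-time configuration along these lines: a small routing table
mapping logical groups of classes to dataserver URIs. Everything in
this sketch (group names, class paths, URIs) is invented purely for
illustration:

# Purely illustrative: group names, class paths and URIs are made up.
# Deployment-time configuration, not the mapped code, decides which
# engine a given group of classes talks to.
ENGINE_ROUTING = {
    'default':        'uri-for-production-BANANA/pear',
    'reporting':      'uri-for-replica-BANANA/pear',       # read-only replica
    'foo.bar.bletch': 'uri-for-DEV_BANANA/pear',           # dev override
    'reference':      'sqlite:////var/cache/refdata.db',   # stable lookup data
}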

Now how does one manage all of this at the kinds of scale described
above? How does one stitch the various mapped classes together with the
appropriate database engines at runtime? What kind of namespaces -- if
any -- would you use to manage large numbers of metadata collections?

If your eyes have glazed over at this point, there are more details
below!

I am looking forward to hearing if anyone has used Alchemy 'in the
large' and what their experiences have been, either positive or
negative.

thanks,

pjjH


Starting at the lowest levels, we have a centralized time-series of
physical meta-data for a number of our dataservers (production,
development and QA) with a reasonably unified representation independent
of the underlying dataserver technology (thanks to the ODBC catalog
calls). It is reasonably easy to add in new dataserver platforms and
the system is linearly scalable. In conjunction with the Cheetah
templating system and the wonderful cog code-generation tool, we can
code-generate Alchemy meta-data collections for arbitrary subsets of
tables: this fragment will generate the Python code to populate a
MetaData collection with all tables from the 'pear' database on the
'BANANA' dataserver:

from sqlalchemy import MetaData
metadata = MetaData()

#[[[cog
#    m = Mongo(dataserver='BANANA', database='pear')
#    cog.outl("# auto-generated SQLAlchemy stuff here")
#    tables = m.lookup()
#    for table in tables:
#       cog.outl("%s" % m.apply_template('sa.tmpl', table))
#]]]
#[[[end]]]
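
The generated code that cog drops in between the ]]] and [[[end]]]
markers is just ordinary Table declarations bound to that metadata
collection. The exact shape depends on sa.tmpl, but roughly (column
names and types invented for illustration):

# Illustrative only -- the real output is whatever sa.tmpl emits.
from sqlalchemy import Table, Column, Integer, String, ForeignKey

org_type = Table('org_type', metadata,
    Column('org_type_id', Integer, primary_key=True),
    Column('description', String(64)),
)

organization = Table('organization', metadata,
    Column('org_id', Integer, primary_key=True),
    Column('org_type_id', Integer, ForeignKey('org_type.org_type_id')),
    Column('name', String(128)),
)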

Similarly, later on in the same file or in a completely different
file, we can have a cog fragment like this one that generates stub
POPO class declarations and mapper invocations that map the POPO class
to the given table.

#[[[cog
#    def camelize(s):
#        return ''.join(word[0].upper() + word[1:] for word in s.split('_'))
#
#    for table in tables:
#        cog.outl("%s" % m.apply_template('sa_mappers.tmpl', table,
#                                         {'camelize': camelize}))
#]]]
#[[[end]]]
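
What comes out the other end is a pile of minimal class declarations
plus the classical mapper() calls tying them to the Tables above --
again, this is the shape rather than the literal template output:

# Illustrative only -- the real output is whatever sa_mappers.tmpl emits.
from sqlalchemy.orm import mapper

class OrgType(object):
    pass

class Organization(object):
    pass

mapper(OrgType, org_type)
mapper(Organization, organization)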


We also have hand-written mapper code that adds what I call 'cooked
accessors' to the POPO classes: this one adds a property called 'Type'
which returns a single OrgType object (OrgType being the wrapper class
around the foreign key reference/key/lookup table 'org_type'):

from sqlalchemy.orm import class_mapper, relation

class_mapper(Organization).add_properties({
    'Type': relation(OrgType, uselist=False),
})
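
A consumer of the mapped classes then never touches the raw
org_type_id column; assuming a configured session, the access reads
naturally (the 'description' attribute and the query filter are, again,
invented):

org = session.query(Organization).filter_by(name='Acme Corp').one()
org_type_desc = org.Type.description   # follows the FK to the org_type row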

This basic mechanism can be used to build up a collection of 'boring'
classes (which I sometimes hear referred to as Data Transfer Objects
or DTOs). cog allows us to mix hand-written and auto-generated code in
the same file, so we get a reasonably loose, 'build-time' coupling
between Python and the database schema, along with change management
and auditability (the interesting bits of the database schema are now
serialized as Python code and get checked in, tagged etc. just like
any other file). We also get documentation, thanks to Mike's
suggestion to use attributes.instrumentation_finders.

It seems obvious that related groups of objects on the same dataserver
should be grouped together in the same metadata collection as part of
the 'build'/code-generation process. It also seems obvious that each
metadata collection should have some form of default association with
a dataserver URI. However, we also want to be able to configure
metadata subsets to talk to development dataservers while everything
else talks to 'production' or, perhaps less controversially, have
reads go against a read-only replica while writes go to a master.
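
The most promising runtime hook I have found so far for that last part
is the binds argument to sessionmaker()/Session, which routes
individual classes (or Tables) to specific engines while everything
else falls back to a default bind. Roughly like this, with SQLite URLs
standing in for the real dataserver URIs:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Stand-ins for the real production / replica / development URIs.
prod_engine    = create_engine('sqlite:///prod.db')
replica_engine = create_engine('sqlite:///replica.db')
dev_engine     = create_engine('sqlite:///dev.db')

Session = sessionmaker(
    bind=prod_engine,                  # default: everything -> 'production'
    binds={
        Organization: dev_engine,      # this class hierarchy -> development
        OrgType:      replica_engine,  # stable lookup data -> read-only replica
    },
)

Whether that scales to hundreds of metadata collections and dozens of
dataservers is exactly the kind of experience I am hoping to hear
about.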
