Thanks to all the help from the group, I am now facing an issue much earlier than anticipated: how to manage Alchemy in -- apologies for using a dreadfully overused phrase -- an 'Enterprise Setting'. This really boils down to matters of scale: tens of thousands of tables in hundreds of databases across several dozen dataservers. I am interested in how to structure and organize the Python code-artifacts containing the metadata collections, the POPO class declarations and the mapper() invocations that associate the classes with the tables. I am also interested in configuration and credential management, i.e. how to control which dataservers are queried and the credentials used to connect to them. Various use-cases include (see the sketch below):

- use a replica reporting dataserver for queries;
- use a development system for the foo.bar.bletch class/class-hierarchy but use 'production' for everything else;
- use SQLite for high-performance querying of stable (i.e. rarely, if ever, changing) reference/lookup data, but use the production systems for live, trading-related data.
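Just to make those use-cases concrete, here is a minimal sketch of the kind of routing table I have in mind. The URIs, the dotted-prefix keys and the resolve_engine() helper are all hypothetical, invented purely for illustration, and real credentials would of course come from a locked-down configuration service rather than literals in source:

from sqlalchemy import create_engine

# Hypothetical routing table: the most specific dotted prefix wins.
ENGINE_URIS = {
    '':               'sybase://app_ro:XXX@PROD_REPLICA/pear',  # default: reporting replica
    'foo.bar.bletch': 'sybase://dev:XXX@DEV/pear',              # this hierarchy -> development
    'refdata':        'sqlite:////var/cache/refdata.db',        # stable lookup data -> SQLite
}

_engines = {}

def resolve_engine(class_path):
    """Return the engine for the longest configured prefix of class_path."""
    parts = class_path.split('.')
    for i in range(len(parts), -1, -1):
        key = '.'.join(parts[:i])
        if key in ENGINE_URIS:
            if key not in _engines:
                _engines[key] = create_engine(ENGINE_URIS[key])
            return _engines[key]

resolve_engine('foo.bar.bletch.Quux')   # -> the development engine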
Now how does one manage all of this at the kinds of scale described above? How does one stitch the various mapped classes together with the appropriate database engines at runtime? What kind of namespaces -- if any -- would you use to manage large numbers of metadata collections? If your eyes have glazed over at this point, there are more details below! I am looking forward to hearing if anyone has used Alchemy 'in the large' and what their experiences have been, either positive or negative.

thanks,

pjjH

Starting at the lowest levels, we have a centralized time-series of physical meta-data for a number of our dataservers (production, development and QA), with a reasonably unified representation independent of the underlying dataserver technology (thanks to the ODBC catalog calls). It is reasonably easy to add new dataserver platforms and the system is linearly scalable.

In conjunction with the Cheetah templating system and the wonderful cog code-generation tool, we can code-generate Alchemy meta-data collections for arbitrary subsets of tables. This fragment will generate the Python code to populate a MetaData collection with all tables from the 'pear' database on the 'BANANA' dataserver:

metadata = MetaData()
#[[[cog
# m = Mongo(dataserver='BANANA', database='pear')
# cog.outl("# auto-generated SQLAlchemy stuff here")
# tables = m.lookup()
# for table in tables:
#     cog.outl("%s" % m.apply_template('sa.tmpl', table))
#]]]

Similarly, later on in the same file or in a completely different file, we can have a cog fragment like this one that generates stub POPO class declarations and mapper invocations mapping each POPO class to its table:

#[[[cog
# def camelize(s):
#     return ''.join(word[0].upper() + word[1:] for word in s.split('_'))
#
# for table in tables:
#     cog.outl("%s" % m.apply_template('sa_mappers.tmpl', table, {'camelize': camelize}))
#]]]

We also have hand-written mapper code that adds what I call 'cooked accessors' to the POPO classes. This one adds a property called 'Type' which returns a single OrgType object (OrgType being the wrapper class around the foreign-key reference/lookup table 'org_type'):

class_mapper(Organization).add_properties({
    'Type': relation(OrgType, uselist=False),
})

This basic mechanism can be used to build up a collection of 'boring' classes (which I sometimes hear referred to as Data Transfer Objects or DTOs). cog allows us to mix hand-written and auto-generated code in the same file, so we have a reasonably loose, 'build-time' coupling between Python and the database schema, and we get change management and auditability (because the interesting bits of the database schema are now serialized as Python code and get checked in, tagged etc. just like any other file). We also get documentation, thanks to Mike's suggestion to use attributes.instrumentation_finders.

It seems obvious that related groups of objects on the same dataserver should be grouped together in the same metadata collection as part of the 'build'/code-generation process. It also seems obvious that each metadata collection should have some form of default association with a dataserver URI. However, we also want to be able to configure metadata subsets to talk to development dataservers while everything else talks to 'production' or, perhaps less controversially, have reads go against a read-only replica while writes go to a master.
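For what it's worth, here is a rough sketch of how I picture the runtime stitching: each code-generated module would expose its MetaData plus a default URI, and a small binder would apply per-environment overrides. metadata.bind and create_engine are the standard SQLAlchemy pieces; the bind_collections() helper and all the names and URIs are my invention:

from sqlalchemy import create_engine, MetaData

def bind_collections(collections, overrides=None):
    """Bind each named MetaData to its default engine URI, unless overridden."""
    overrides = overrides or {}
    for name, (metadata, default_uri) in collections.items():
        metadata.bind = create_engine(overrides.get(name, default_uri))

banana_pear = MetaData()   # in reality, populated by the cog-generated code above

bind_collections(
    {'BANANA.pear': (banana_pear, 'sybase://app:XXX@BANANA/pear')},
    overrides={'BANANA.pear': 'sybase://dev:XXX@BANANA_DEV/pear'},   # point this subset at dev
)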
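And for the replica/master split specifically, I imagine a Session subclass overriding get_bind(), along the lines of the vertical-partitioning examples in the SQLAlchemy docs. The engines here are hypothetical, and _flushing is a private Session attribute, so treat this as a sketch rather than a recipe:

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

master  = create_engine('sybase://app:XXX@BANANA/pear')        # writes
replica = create_engine('sybase://app_ro:XXX@BANANA_RO/pear')  # reads

class RoutingSession(Session):
    def get_bind(self, mapper=None, clause=None, **kw):
        # Anything issued during a flush (INSERT/UPDATE/DELETE) goes to
        # the master; plain reads are sent to the read-only replica.
        if self._flushing:
            return master
        return replica

SessionFactory = sessionmaker(class_=RoutingSession)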