Re: [sqlalchemy] Multiple databases or multiple schemas?

2010-06-14 Thread Henry Precheur
On Mon, Jun 14, 2010 at 01:06:04PM -0400, Michael Bayer wrote:
 you can do the multi-schema thing using either a PoolListener, or a
 Table subclass that implements a @property for the schema attribute
 - in both cases these would be linked to a thread local variable that
 you would set per request.   If the load per site is low, you can also
 forego the usage of pooling to eliminate the issue of dormant
 connections (possibly look into something like pgpool, perhaps they
 have some feature that optimizes this use case).

Both approach would work. The PoolListener looks a bit better, the Table
subclass looks a bit hackish. BTW I am not planning to use threads, I'll
just use a dozen python processes to make sure the site is responsive
under heavy load. This should simplify pooling quite a bit, I'll just
use a StaticPool.

 As far as approach, consider the load each per-user site will have,
 and how you might scale out to include multiple database servers.
 I'd grep around the PG docs and wiki carefully to get a full grasp of
 the implications of each approach.

Yes, scaling was one of the motivation for this approach, I definitely
need to learn more about how PG would handle this.

Simplifying the architecture was the main motivation. I don't want to
have a column `site_id` in every table. Also I think that putting
everything in its own schema would improve performance dramatically:
instead of big indexes for all sites, there'll be small indexes for each
site.


Thank-you so much, your reply helped a great deal.

-- 
  Henry PrĂȘcheur

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Re: Multiple databases or multiple schemas?

2010-06-14 Thread Henry Precheur
On Sun, Jun 13, 2010 at 09:23:22PM +, Faheem Mitha wrote:
 Sqlalchemy's table can take the qschema as argument, eg.
 
 pheno_table = Table(
 'pheno', metadata,
 Column('patientid', String(60), primary_key=True),
 Column('famid', String(60), nullable=True),
 Column('sex_id',  None, ForeignKey(schemaname+'.sex.val', 
 onupdate='CASCADE', ondelete='CASCADE'), index=True),
 Column('race_id',  None, ForeignKey(schemaname+'.race.val', 
 onupdate='CASCADE', ondelete='CASCADE'), index=True),
 Column('phenotype', SmallInteger),
 schema = schemaname,
 )
 
 So I don't think you do have to do that.

The thing is that each table will be present in every schema. So I can't
use the schema parameter (or I can subclass Table like Mike suggested).


Cheers,

-- 
  Henry PrĂȘcheur

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



[sqlalchemy] Multiple databases or multiple schemas?

2010-06-13 Thread Henry Precheur
I'm starting a new project which is roughly hosting a bunch of sites. I
want the sites to be isolated, they'll all have the same schema (data
definition), but will store their data on different 'name-spaces'.

The front end will be a python web app. And I'd like to have this
application talk to all the 'name-spaces' at the same time:

  With a request on example.com/site1, the web app will use the
  'name-space' site1, with example.com/site2 it will use site2.

I am using Postgres. There are 2 options for the 'name-space': Database
or Schema [1]:
  1. One database per site
  2. One database for all the sites and 1 schema per site


Solution #1 would require to maintain 1 connection per site  per python
process. That means: lots of connections,  lots of memory needed. One
the other hand, this solution is supported by SQLAlchemy out-of-the-box.
I'll have a dictionary like that:

{'site1': Engine('postgres://.../site1',
 'site2': Engine('postgres://.../site2', ...}

And whenever a request comes in I get the right engine via this
dictionary.


Solution #2 is not supported natively by SQLAlchemy. Each time a request
comes-in I'll have to issue an additional query SET search_path TO
MY_SITE where MY_SITE is the schema associated with the site.


Solution #2 seems much more lightweight to me. The only problem is the
small overhead that might be created by the additional query.

What do you guys think? Will I get into trouble with solution #2?

If you have alternative suggestions I'd like to hear them :)


Regards,

[1]: http://www.postgresql.org/docs/8.4/interactive/ddl-schemas.html

-- 
  Henry PrĂȘcheur

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.