Tim,
Thanks for the helpful answers.
As for specific details about my app, right now I'm still in the
design phase. It will start small, but hopefully it will get popular
quickly, so I don't know how big the DB will be or how many users
there will be.
What I'm trying to do is make sure there is a graceful scale-out
path should such a need arise.
Some answers to your questions:
> are you bringing back huge datasets or just small sub-slices of
your data?
Small
> are you updating large swaths of data at a time, or are you
just updating single records most of the time?
Mostly single records
> are just a few select users doing the updating, and all the
rest of your users are doing piles of reads?
Most users will do both reads and writes.
The usage pattern is not unlike Facebook status updates, where most
users both update their status and read others' statuses frequently.
> can you partition by things that totally do not relate, such as
by customer, so each customer can have their own instance that
then gets put wherever your admins define letting DNS balance the
load? (a'la BaseCamp's customername.basecamp.com)
The app is for the general public; anyone can sign up for free, so I
doubt this would work.
> can you tolerate replication delays? what time-frame?
(sub-second? async taking up to 30 minutes? a whole day?)
Probably not. If a user updates her own status, she should be able to
see that update immediately. Otherwise she'd think the site is not
working.
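One workaround I've seen discussed for exactly this is to pin a user's
reads to the master for a short window after she writes, so she always
sees her own update even if the replicas lag. A minimal in-process
sketch of the idea (the "master"/"replica" names and the 5-second lag
window are just assumptions, not anything Django provides today):

```python
import time

REPLICATION_WINDOW = 5.0  # seconds; a guess at worst-case replica lag
_last_write = {}  # user_id -> timestamp of that user's last write

def note_write(user_id):
    """Call this whenever a user writes, e.g. updates her status."""
    _last_write[user_id] = time.time()

def connection_for_read(user_id):
    """Route a user who just wrote to the master; everyone else to a replica."""
    if time.time() - _last_write.get(user_id, 0) < REPLICATION_WINDOW:
        return "master"
    return "replica"
```

In a real deployment the last-write timestamp would have to live in the
session or a shared cache rather than a module-level dict, but the
routing decision itself is that small.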
Is there any plan to support multiple connections? Sharding is a very
common technique used in many places. I know that in theory a big
server with a huge SAN disk array could let a single database support
many users, but it's a lot cheaper to use multiple commodity servers
instead. And even with a huge budget, sooner or later the scale-up
approach will stop working.
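To make the sharding idea concrete, this is roughly what I'd want to
do in application code: derive a shard index deterministically from the
username, then talk to that shard's server by hand, since the ORM only
knows about one database. The shard count and the MD5-modulo scheme
here are just placeholders for illustration:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical; would match the number of DB servers

def shard_for(username):
    """Map a username to a shard index.

    Uses a stable hash so the same user always lands on the same
    shard, regardless of which web server handles the request.
    """
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```

Every query for that user's data would then go through the connection
for `shard_for(username)`, which is exactly the routing step that's
awkward without multi-DB support in the framework.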
On Apr 13, 6:51 am, Tim Chase wrote:
> > Recently I found out Django doesn't support multiple databases. That's
> > quite surprising.
>
> > Given that limitation, how do you scale out a Django app?
>
> Depends on where your bottleneck(s) is/are. It also depends
> heavily on your read/write usage pattern. If you're truly
> experiencing a stenosis of the database connection, you have
> several options, but most of them reside in domain specific tuning.
>
> > Without multi-DB support, most of the usual techniques for scaling out
> > such as:
> > - DB sharding
> > - functional partitioning - eg. separate DB servers for user
> > profiles, orders, and products
> > would be infeasible with django.
>
> Sharding and functional partitioning don't yet exist in stock
> Django. There's a GSoC project that may make some headway on
> "multiple database support", but I've not heard anything further
> on the Django Developers regarding that.
>
> > I know replication is still available. But that still means all data
> > must fit in 1 server.
>
> Well, with bountiful storage using things like AoE, SAS, SAN, FC,
> etc, having "all the data fit in one server" isn't a horrible
> issue. And with 1TB drives on the market, fitting multiple TB in
> a single machine isn't a disastrous idea. If you have more data
> than will fit in a single machine, you have a lot of other issues
> and will likely have to get very specific (and likely expensive
> ;-) help.
>
> > Also replication isn't going to help update performance.
>
> This goes back to my "read/write usage pattern" quip...if you
> have a high volume of reads, and a low volume of writes,
> replication is one of the first tools you reach for. However,
> with a high volume of writes, you've entered the realm of "hard
> problems".  Usually if your app reaches this volume of DB traffic,
> you need a solution specialized to your domain, so stock Django
> may not be much help. Given that you've not detailed the problem
> you're actually having (this is where profiling comes in), it's
> hard to point much beyond the generic here. So answers to some
> questions might help:
>
> - are you bringing back huge datasets or just small sub-slices of
> your data?
>
> - are you updating large swaths of data at a time, or are you
> just updating single records most of the time?
>
> - are just a few select users doing the updating, and all the
> rest of your users are doing piles of reads?
>
> - how big is this hypothetical DB of yours?
>
> - can you partition by things that totally do not relate, such as
> by customer, so each customer can have their own instance that
> then gets put wherever your admins define letting DNS balance the
> load? (a'la BaseCamp's customername.basecamp.com)
>
> - can you tolerate replication delays? what time-frame?
> (sub-second? async taking up to 30 minutes? a whole day?)
>
> - how readily can you cache things to prevent touching the
> database to begin with? Can you cache with an HTTP proxy
> front-end for repeated pages? Can you cache datasets or other
> fragments with memcached? If your web-app follows