Re: How to scale out django apps?

2009-04-13 Thread Andy

Tim,

Thanks for the helpful answers.

As for specific details about my app, right now I'm still in the
design phase. It will start small, but hopefully it will get popular
quickly. So I don't know how big the DB will be or how many users
there will be.

What I'm trying to do is make sure there is a graceful scale-out
path should such a need arise.

Some answers to your questions:

> are you bringing back huge datasets or just small sub-slices of
your data?

Small

> are you updating large swaths of data at a time, or are you
just updating single records most of the time?

Mostly single records

> are just a few select users doing the updating, and all the
rest of your users are doing piles of reads?

Most users will do both reads and writes.
The usage pattern is not unlike Facebook status updates, where most
users will both update and read statuses frequently.

> can you partition by things that totally do not relate, such as
by customer, so each customer can have their own instance that
then gets put wherever your admins define letting DNS balance the
load? (a'la BaseCamp's customername.basecamp.com)

The app is for general users; anyone can sign up for free. So I doubt
this would work.

> can you tolerate replication delays?  what time-frame?
(sub-second?  async taking up to 30 minutes?  a whole day?)

Probably not. If a user updates her own status, she should be able to
see that update immediately. Otherwise she'd think the site isn't
working.
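One common way to reconcile replication lag with that requirement is
"read your own writes": pin a user's reads to the master for a short
window after they write, and let everyone else read stale replicas. A
minimal sketch in plain Python (the "master"/"replica" names are
placeholders, not any real Django API):

```python
import time

# Hypothetical read-your-own-writes routing: after a user writes,
# send that user's reads to the master for PIN_SECONDS so they
# always see their own update, even if replicas lag behind.
PIN_SECONDS = 30
_last_write = {}  # user_id -> timestamp of that user's last write

def record_write(user_id, now=None):
    _last_write[user_id] = time.time() if now is None else now

def pick_db(user_id, now=None):
    """Return 'master' if the user wrote recently, else 'replica'."""
    now = time.time() if now is None else now
    if now - _last_write.get(user_id, 0) < PIN_SECONDS:
        return "master"
    return "replica"
```

The pin window just needs to exceed your worst-case replication delay.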


Is there any plan to support multiple connections? Sharding is a very
common technique used in many places. I know in theory, with a big
server and a huge SAN disk array, a single database could support many
users, but it's a lot cheaper to use multiple commodity servers
instead. And even with a huge budget, sooner or later the scale-up
approach will stop working.
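In the meantime, sharding can be hand-rolled outside the ORM by
hashing each user id to one of N databases. A sketch in plain Python
(the DSNs are made-up placeholders; the key constraint is that the
hash must stay stable once data has been written, or rows become
unreachable):

```python
import zlib

# Hypothetical hand-rolled sharding: map each user to one of N
# database DSNs by hashing the user id.  These DSNs are placeholder
# strings, not real servers.
SHARDS = [
    "postgresql://db0.internal/app",
    "postgresql://db1.internal/app",
    "postgresql://db2.internal/app",
]

def shard_for(user_id):
    """Pick a shard deterministically from the user id."""
    h = zlib.crc32(str(user_id).encode("utf-8")) & 0xFFFFFFFF
    return SHARDS[h % len(SHARDS)]
```

Adding shards later means rehashing (or consistent hashing), which is
exactly the kind of migration you want to plan for up front.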



Re: How to scale out django apps?

2009-04-13 Thread Tim Chase

> Recently I found out Django doesn't support multiple databases. That's
> quite surprising.
> 
> Given that limitation, how do you scale out a Django app?

Depends on where your bottleneck(s) is/are.  It also depends 
heavily on your read/write usage pattern.  If you're truly 
experiencing a stenosis of the database connection, you have 
several options, but most of them reside in domain specific tuning.

> Without multi-DB support, most of the usual techniques for scaling out
> such as:
>- DB sharding
>- functional partitioning - eg. separate DB servers for user
> profiles, orders, and products
> would be infeasible with django.

Sharding and functional partitioning don't yet exist in stock 
Django.  There's a GSoC project that may make some headway on 
"multiple database support", but I've not heard anything further 
on the Django Developers regarding that.

> I know replication is still available. But that still means all data
> must fit in 1 server.

Well, with bountiful storage using things like AoE, SAS, SAN, FC, 
etc, having "all the data fit in one server" isn't a horrible 
issue.  And with 1TB drives on the market, fitting multiple TB in 
a single machine isn't a disastrous idea.  If you have more data 
than will fit in a single machine, you have a lot of other issues 
and will likely have to get very specific (and likely expensive 
;-) help.

> Also replication isn't going to help update performance.

This goes back to my "read/write usage pattern" quip...if you 
have a high volume of reads, and a low volume of writes, 
replication is one of the first tools you reach for.  However, 
with a high volume of writes, you've entered the realm of "hard 
problems".  Usually if your app reaches this volume of DB traffic, 
you need a solution specialized to your domain, so stock Django 
may not be much help.  Given that you've not detailed the problem 
you're actually having (this is where profiling comes in), it's 
hard to point much beyond the generic here.  So answers to some 
questions might help:

- are you bringing back huge datasets or just small sub-slices of 
your data?

- are you updating large swaths of data at a time, or are you 
just updating single records most of the time?

- are just a few select users doing the updating, and all the 
rest of your users are doing piles of reads?

- how big is this hypothetical DB of yours?

- can you partition by things that totally do not relate, such as 
by customer, so each customer can have their own instance that 
then gets put wherever your admins define letting DNS balance the 
load? (à la BaseCamp's customername.basecamp.com)

- can you tolerate replication delays?  what time-frame? 
(sub-second?  async taking up to 30 minutes?  a whole day?)

- how readily can you cache things to prevent touching the 
database to begin with?  Can you cache with an HTTP proxy 
front-end for repeated pages?  Can you cache datasets or other 
fragments with memcached?  If your web-app follows good design, 
any GET can be cached based on a subset of its headers.
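The memcached point boils down to the cache-aside pattern: check the
cache first and only touch the database on a miss. A sketch with a
plain dict standing in for a memcached client (a real client has the
same get/set shape; the loader callback is hypothetical):

```python
# Cache-aside sketch: consult the cache before hitting the database.
# A dict stands in for memcached so the example is self-contained.
cache = {}

def get_status(user_id, load_from_db):
    """Return a user's status, consulting the cache first."""
    key = "status:%s" % user_id
    if key in cache:
        return cache[key]
    value = load_from_db(user_id)  # only runs on a cache miss
    cache[key] = value
    return value
```

The write path then has to invalidate (or overwrite) the key, which is
where most cache-aside bugs live.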

Lastly, read over David Cramer's blog[1] as he's done some nice 
work scaling Django to big deployments and has some helpful tips.

-tim

[1]
http://www.davidcramer.net/category/code/django






--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



How to scale out django apps?

2009-04-13 Thread Continuation

Recently I found out Django doesn't support multiple databases. That's
quite surprising.

Given that limitation, how do you scale out a Django app?

Without multi-DB support, most of the usual techniques for scaling out
such as:
   - DB sharding
   - functional partitioning - e.g. separate DB servers for user
profiles, orders, and products
would be infeasible with django.

I know replication is still available. But that still means all data
must fit in 1 server. Also replication isn't going to help update
performance.

Is scalability of django really limited to a single DB? Or are there
workarounds?
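For what it's worth, functional partitioning can be approximated today
by managing connections outside the ORM and routing each functional
area to its own database. A sketch using in-memory SQLite to stand in
for separate servers (the table-to-connection map is made up):

```python
import sqlite3

# Functional-partitioning sketch: each functional area owns its own
# database connection, chosen by table name.  In-memory SQLite
# databases stand in for separate physical servers.
profiles_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")
ROUTES = {"profiles": profiles_db, "orders": orders_db}

def connection_for(table):
    """Return the connection that owns the given table."""
    return ROUTES[table]

connection_for("profiles").execute(
    "CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT)")
connection_for("orders").execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
```

The catch is that you give up cross-database JOINs and the ORM's
conveniences for anything that spans partitions.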