Django scaling and Database replication

2007-04-08 Thread Merric Mercer

The Django book's chapter on deployment mentions the use of database 
replication as a means to scale using MySQL.

My understanding of database replication is that it uses a MASTER DB and a 
number of SLAVES.
The master updates the slaves asynchronously. This means that the 
slaves are used only for reading data, and only the master is used for 
writing data.
As the number of reads typically exceeds the number of writes, replication 
is supposed to work well.

However, I can't figure out how Django handles it.  I can't see anything 
in the documentation or the settings that would allow writes to be 
handled by a different host to the reads.

Is there a way to do this? 

MerMer


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Django scaling and Database replication

2007-04-09 Thread James Bennett

On 4/8/07, Merric Mercer <[EMAIL PROTECTED]> wrote:
> However, I can't figure out how Django handles it.  I can't see anything
> in the documentation or the settings that would allow writes to be
> handled by a different host to the reads.

The idea with both load balancing and DB replication is that Django
doesn't need to know about them -- if Django had to know how many load
balancers were in front of it or how many databases were in the
cluster behind it, configuration would become much more complex for no
good reason.

So, for example, with databases you generally have a pooling
connection manager between Django and the DB cluster, and Django talks
to that instead of having to figure out how many databases to talk to
and when to talk to each one. This is how we've done it a couple
times, albeit with PostgreSQL; we've used pgpool to maintain the pool
of database connections, and had Django talk to it.
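
As a rough sketch of what that looks like from Django's side: the
settings name a single host and port, which happen to be the pooler
rather than a database server. The host name, database name and
credentials below are placeholders, and 9999 is only assumed to be the
port pgpool listens on (check your pgpool configuration).

```python
# settings.py (sketch): Django sees one "database", which is really pgpool.
# All values are placeholders; the port assumes pgpool listens on 9999.
DATABASE_ENGINE = "postgresql_psycopg2"
DATABASE_NAME = "mysite"
DATABASE_USER = "mysite"
DATABASE_PASSWORD = "secret"
DATABASE_HOST = "pgpool.internal.example"  # the pooler, not a DB node
DATABASE_PORT = "9999"
```

Adding or removing database nodes then only touches the pooler's own
configuration, not these settings.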


-- 
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."




Re: Django scaling and Database replication

2007-04-09 Thread Julio Nobrega

  Yes, there's a way, but you do this on the database, not in Django.
You don't need to configure Django (or any application
accessing the database) to "talk" to slave hosts. It's the job of the
database server software to abstract this step for you.

  You're going to set up slaves for a master host, and in most cases
the slaves will be used for SELECT and the master for INSERT/UPDATE/DELETE.

  What happens is something like this: master M and slaves S1 and S2
are configured. The application asks M for row id 15. M sends the query
to S1. S1 answers M, which answers the application. The application asks
for row id 20. M sends it to the next slave, S2, and so on. As new
queries keep coming, M keeps rotating between its slaves.



-- 
Julio Nobrega - http://www.inerciasensorial.com.br




Re: Django scaling and Database replication

2007-04-09 Thread kemuri

hey Merric,

On Apr 9, 4:10 am, Merric Mercer <[EMAIL PROTECTED]> wrote:
> The django book's chapter on deployment mentions  the use of Database
> replication as a means to scale using MySQL.
>

If you want to try something cool, try MySQL Cluster, preferably 5.1,
since it supports disk-based storage instead of only in-memory. You will
need to make some modifications to the CREATE TABLE statements if you
want to go that way, though.

Basically it comes down to this: you have 2 data nodes which store the
data, and on the MySQL servers you have tables using the NDB engine. This
means that you can add something like 40 MySQL servers, all using the
same data at the same time.

Anyway, this is fairly advanced MySQL, and it's not really a Django
matter.

Best regards,

Geert





Re: Django scaling and Database replication

2007-04-09 Thread Merric Mercer

Thanks Kemuri,

MySQL Cluster seems very cool, but I'm not sure it is the best solution 
if the DB is split over different networks. Latency might be an issue 
with the synchronous setup that MySQL Cluster provides.

Having looked at it a bit further since my post, I am considering 
"circular replication" with a master on each network and slaves under 
each master (based on what Julio mentioned). Using this type of setup, 
I am hoping to test out Amazon's EC2 as a way to scale Django apps.

There's an interesting article on setting up circular replication at 
http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html

MerMer






Re: Django scaling and Database replication

2007-04-09 Thread Merric Mercer

Julio,

In this scenario, am I right in thinking that in MySQL the master 
automatically acts as a load balancer, so that I don't need 
any other software to delegate reads between the slaves, and that MySQL 
knows to send all writes to the master?

I'm slightly confused. I've been reading an article on scaling MySQL 
with replication at http://docs.hp.com/en/5991-7432/ar01s05.html - to 
quote:

"There should be two data sources configured in any application that is 
accessing the replicated database. One is the LVS, which acts as a load 
balancer for any read queries to the slave servers, and the other is the 
master server for any write queries."

Why would you need LVS to balance the load between the slaves if this is 
already handled by the master?

MerMer






Re: Django scaling and Database replication

2007-04-09 Thread Merric Mercer

The official MySQL 5.1 documentation, "Using Replication for 
ScaleOut", is explicit: it is the application (Django) 
that needs to send the writes to the master and the reads to the 
slaves. Unless I'm wrong, this would rule out using replication with 
Django.

The quote from the MySQL document is below. According to the quote, changing 
Django to handle replication should be relatively trivial (but beyond my 
skill set at present). Does anybody know whether any work has been 
done on this?

MerMer

"If the part of your code that is responsible for database access has 
been properly abstracted/modularized, converting it to run with a 
replicated setup should be very smooth and easy. Change the 
implementation of your database access to send all writes to the master, 
and to send reads to either the master or a slave. If your code does not 
have this level of abstraction, setting up a replicated system gives you 
the opportunity and motivation to clean it up. Start by creating a 
wrapper library or module that implements the following functions:

* safe_writer_connect()
* safe_reader_connect()
* safe_reader_statement()
* safe_writer_statement()

safe_ in each function name means that the function takes care of 
handling all error conditions. You can use different names for the 
functions. The important thing is to have a unified interface for 
connecting for reads, connecting for writes, doing a read, and doing a 
write."
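
A minimal sketch of such a wrapper module, to show the shape the MySQL 
docs are describing. This is an illustration only: sqlite3 stands in for 
a real MySQL driver so the example runs anywhere, and the file names for 
the master and the replicas are made up.

```python
import random
import sqlite3

# Assumed layout: one writable master and any number of read-only replicas.
# sqlite3 paths stand in for real MySQL connection parameters here.
MASTER = "master.sqlite3"
REPLICAS = ["replica1.sqlite3", "replica2.sqlite3"]


def safe_writer_connect(master=MASTER):
    """Open a connection for writes; this is always the master."""
    return sqlite3.connect(master)


def safe_reader_connect(replicas=REPLICAS, master=MASTER):
    """Open a connection for reads: a random replica, else the master."""
    candidates = list(replicas)
    random.shuffle(candidates)
    for target in candidates + [master]:
        try:
            return sqlite3.connect(target)
        except sqlite3.Error:
            continue  # this host is unavailable, try the next one
    raise RuntimeError("no database reachable for reads")


def safe_writer_statement(conn, sql, params=()):
    """Run a write and commit it, rolling back if it fails."""
    try:
        cur = conn.execute(sql, params)
        conn.commit()
        return cur.rowcount
    except sqlite3.Error:
        conn.rollback()
        raise


def safe_reader_statement(conn, sql, params=()):
    """Run a read and return all rows."""
    return conn.execute(sql, params).fetchall()
```

Application code would then call only these four functions, so the 
decision about which host serves a query lives in one place.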








Re: Django scaling and Database replication

2007-04-09 Thread James Bennett

On 4/9/07, Merric Mercer <[EMAIL PROTECTED]> wrote:
> The official documentation on MySQL 5.1  "Using Replication for
> ScaleOut" is explicit and states that it is the application (Django)
> that needs to send the writes to the Master and the Reads to the
> Slaves. Unless I'm wrong this would rule out using replication with
> Django.

If you're using MySQL, you want MySQL Cluster, which distributes the
data over "data nodes" in a cluster and controls access through a "SQL
node".

Failing that, you want a dedicated connection-pooling utility between
the application and database layers which can route queries
appropriately; this is not logic that belongs in the application
layer, because the application should not need to know or care how
many databases are actually behind it.

-- 
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."




Re: Django scaling and Database replication

2007-04-09 Thread Julio Nobrega

  Hummm... looks like I owe you an apology, Merric. I was wrong about
how to access the replicated data on the slave hosts. I was either
thinking about clusters or I was flat out wrong.

  But what James and kemuri said corrects my mistake. I hope it helps you :)


-- 
Julio Nobrega - http://www.inerciasensorial.com.br




Re: Django scaling and Database replication

2007-04-09 Thread Merric Mercer

James,

The issue with Cluster is that it is designed to work synchronously. 
This is fine when the whole DB is on a fast, local network, but not 
when the DB needs to be replicated to geographically different networks, 
where latency becomes a major issue. 

Django already knows and cares about the DB it uses, so I'm not sure I 
agree that this can be abstracted out of Django, and I can see a bunch 
of reasons why Django needs to know more about the DB for large-scale 
projects.

1. Replication with MySQL
2. Using multiple databases
3. Partitioning tables across multiple servers.
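
To make item 1 concrete, here is a minimal sketch of the read/write 
split living in the application layer. It is written in the shape of the 
database-router hook that later Django releases provide (`db_for_read` 
and `db_for_write`, registered through the `DATABASE_ROUTERS` setting). 
The aliases `default`, `replica1` and `replica2` are assumptions and 
would have to match the project's own database settings.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the master and spread reads over the replicas."""

    # Assumed aliases; they must match the keys of the DATABASES setting.
    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Any replica will do for a read (it may lag behind the master).
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # Only the master accepts writes.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations are always fine.
        return True
```

The class itself has no Django imports, which is the point: the routing 
rule is a few lines of policy, wherever it ends up living.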






Re: Django scaling and Database replication

2007-04-09 Thread James Bennett

On 4/9/07, Merric Mercer <[EMAIL PROTECTED]> wrote:
> The issue with Cluster is that it is designed to work synchronously.
> This is fine when the whole DB is on a fast, local network but not
> when the DB needs to be replicated to geographically different networks,
> where latency becomes a major issue.

Wide distribution of the database seems to me to be a fairly unusual
case (and if the DB is distributed in that fashion, why can't the web
nodes also be distributed, with round-robin DNS or some similar setup,
so that they can remain in proximity to a particular database/DB
cluster and make use of it?), so I'm not sure it's within scope to
have to build this into a general-purpose application framework;
again, it seems like some sort of dedicated connection management
sitting between the layers is the ideal solution.

> Django already cares and knows about the DB it uses - so I'm not sure I
> agree that this can be abstracted out of Django and  I can see a bunch
> of reasons why Django needs to know more about the DB for large scale
> projects.

We'll have to agree to disagree, I guess; personally, I think that the
less the application layer has to know about the other layers, the
better, and that the ideal is a single point of connection between
them. For a database to punt on what is essentially a database-layer
issue, and demand that client-access logic be rewritten to suit the
database developers' unwillingness to deal with it, is a bit
disappointing.

-- 
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."




Re: Django scaling and Database replication

2007-04-10 Thread kemuri


On Apr 10, 3:38 am, Merric Mercer <[EMAIL PROTECTED]> wrote:
> James,
>
> The issue with Cluster is that it is designed to work synchronously.
> This is fine when the whole DB is on a fast, local network but not
> when the DB needs to be replicated to geographically different networks,
> where latency becomes a major issue.

True.
With MySQL 5.1 you could replicate between Clusters, but usually
one site is active while the other is ready to take over. Master-master
replication is not going to work well with Cluster at the moment.

> Django already cares and knows about the DB it uses - so I'm not sure I
> agree that this can be abstracted out of Django and  I can see a bunch
> of reasons why Django needs to know more about the DB for large scale
> projects.
>
> 1. Replication with MySQL

But replicating to a machine on the other side of the planet is going
to give you problems with latency as well. The slave will fall further
behind the master, so they are not going to see the same data all the
time.
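
One common way to live with that lag, sketched below purely as an
illustration, is to keep reading from the master for a short while after
a write and to use a slave otherwise. The class name, the 5-second
window and the "master"/"slave" labels are all made up for the example.

```python
import time


class ReadTargetChooser:
    """Pick where to read from, given how recently we wrote."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed upper bound on slave lag
        self._clock = clock
        self._last_write = None

    def record_write(self):
        # Call this after every INSERT/UPDATE/DELETE sent to the master.
        self._last_write = self._clock()

    def read_target(self):
        # Right after a write the slave may be stale, so stay on the master.
        if self._last_write is not None:
            if self._clock() - self._last_write < self.pin_seconds:
                return "master"
        return "slave"
```

The window only helps if it is longer than the real replication delay,
which is exactly what gets hard to guarantee across continents.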

Anyway, MySQL Cluster was just an idea; I still have to try it myself
with Django. The good part is that there is no need to change Django to
scale it out. But I would like to write a backend for MySQL Cluster that
doesn't use SQL. That would kick ass :)

Cheers,

Geert

