Re: building distributed systems with django?

2008-06-08 Thread Ed McCaffrey
Twitter isn't a good candidate for simply creating a cache and replicating
your database.  I'll just post a link, since it goes along with my own
thoughts.

http://ayende.com/Blog/archive/2008/06/03/Architecting-Twitter.aspx



--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: building distributed systems with django?

2008-06-08 Thread lgr888999

Pat: I'm pretty sure you can build a Twitter clone with Django and make
it scale pretty well with replication, but you still have a single
point of failure if the master DB gets too much load.  I'm thinking
about SimpleDB from Amazon.  I wonder if that would be a good platform
to build something Twitter-like on.




Re: building distributed systems with django?

2008-06-08 Thread Pat

I'm not certain I understand what you're asking, because high
availability isn't exactly related to distributed systems.  I think
you're asking about sharding (partitioning of data across multiple
databases).  If that's your question, then the answer is yes, it can be
done.  It's just writing a custom backend, tweaking managers, etc.   The
deep truth is, however, that all you would be doing is building the
Google App Engine.  The question is why not start there.
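
To make "partitioning of data across multiple databases" concrete, here
is a minimal sketch of key-based shard selection. It is not Django API:
the shard names and the shard_for helper are made up for illustration,
and a custom backend or manager would still have to open the right
connection for the chosen shard.

```python
import hashlib

# Hypothetical shard names; in a real deployment each one would map to
# a separate database connection.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(user_id, shards=SHARDS):
    """Pick a shard by hashing the key, so a given user always maps
    to the same database."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

Note that plain modulo hashing moves most keys to a different shard when
the number of shards changes, so this is only a starting point.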

As it turns out, Twitter is not an example of a decentralized
distributed system though ("We currently use one database for writes
with multiple slaves for read queries" ~
http://blog.twitter.com/2008/05/its-not-rocket-science-but-its-our-work.html).
Their problems aren't really related to a lack of sharding as much as
they are to querying on a many-to-many model, I think.  There's a good
blog post out there on the problem but I can't find the link.




Re: building distributed systems with django?

2008-06-08 Thread John Dohn
On Sun, Jun 8, 2008 at 7:57 AM, lgr888999 <[EMAIL PROTECTED]> wrote:

>
> Replication isn't exactly distributing... With only one master DB which
> handles all the writes you have a single point of failure...


In MySQL and very likely in other DBs you can have a multi-master setup with
more than one write-host.

JDJ




Re: building distributed systems with django?

2008-06-07 Thread lgr888999

Replication isn't exactly distributing... With only one master DB which
handles all the writes you have a single point of failure...




Re: building distributed systems with django?

2008-06-07 Thread Jeff Anderson

lgr888999 wrote:
> Great write-up! Thanks! I'm fully aware that the problem is in the
> data storage, but Django doesn't support, for example, sharding or
> multiple databases out of the box. On the contrary, it's more about
> keeping things DRY and normalized, which makes it harder to build
> something decentralized. Hence this post, to see if someone has
> succeeded with such a setup.
  
Many databases support replication, and you can use an SQL proxy that is 
smart enough for load balancing.
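
As a rough illustration of what such a proxy does, the sketch below
sends writes to the master and spreads reads round-robin over the
replicas. The class name and backend names are invented for this
example; a real proxy also has to deal with transactions, replication
lag and statements such as SELECT ... FOR UPDATE, which this naive
classification ignores.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master for reads if no replicas are given.
        self._read_cycle = itertools.cycle(replicas or [master])

    def backend_for(self, sql):
        # Naive rule: anything that does not start with SELECT is
        # treated as a write and goes to the master.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.master
```

For example, ReadWriteRouter("master", ["replica1", "replica2"]) hands
out replica1, replica2, replica1, ... for successive SELECTs and
"master" for everything else.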


Chapter 20 in the Django Book details this. The authors have implemented 
a distributed Django system. http://www.djangobook.com/en/1.0/chapter20/ 
About halfway down there is a section headed "scaling".


Hope this helps!


Jeff Anderson





Re: building distributed systems with django?

2008-06-07 Thread lgr888999

Great write-up! Thanks! I'm fully aware that the problem is in the
data storage, but Django doesn't support, for example, sharding or
multiple databases out of the box. On the contrary, it's more about
keeping things DRY and normalized, which makes it harder to build
something decentralized. Hence this post, to see if someone has
succeeded with such a setup.




Re: building distributed systems with django?

2008-06-07 Thread John Dohn
On Sun, Jun 8, 2008 at 2:11 AM, lgr888999 <[EMAIL PROTECTED]> wrote:

> how you would build a huge decentralized system. Now of course it
> would depend on what the purpose of the system is so let's just take
> Twitter as an example. :)
>

It's easy. All you have to do is to avoid all single points of failure and
all possible bottlenecks. Just that ;-)

Now, in practice this is *very* complicated. Take the example of a pretty
simple website with the 3 classic tiers: webservers, app logic and
database backend.

The first bottleneck and point of failure is the path to reach your
internet-facing servers. This is relatively easy to avoid by acquiring
your own "portable" block of IP addresses (PI, provider-independent) and
having multiple paths to the wider net through independent ISPs. Provided
you have your datacenters in multiple locations, you'll get pretty
reliable access to your service for most of the internet.

Another major bottleneck is indeed the database. Unless you aim as high as
Google or Yahoo with their custom replicated/redundant DB solutions,
you'll probably end up with some sort of SQL backend. You shouldn't aim
for every DB update being visible to all connected clients immediately.
It helps a lot if you can identify "clouds" of objects that must appear
to work synchronously, and the rest that may get updated when their time
comes. For instance, a Twitter user who posts a message must be able to
see it immediately on his page. Otherwise he'll get confused. On the other
hand, whether his friends can see it in 1 second or 1 minute is not that
important in most cases.
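
A toy sketch of that split, with invented names and everything held in
memory: the author's own timeline is updated synchronously inside
post(), while followers are only updated when a worker later drains
the pending queue. In a real system the queue would be a message broker
or a replication stream rather than a deque.

```python
from collections import defaultdict, deque


class Timelines:
    def __init__(self, followers):
        self.followers = followers         # author -> list of followers
        self.timeline = defaultdict(list)  # user -> messages visible to user
        self.pending = deque()             # fan-out work not yet delivered

    def post(self, author, text):
        # Synchronous part: the author sees the message right away.
        self.timeline[author].append(text)
        # Asynchronous part: followers are updated later by a worker.
        for follower in self.followers.get(author, []):
            self.pending.append((follower, text))

    def deliver_pending(self):
        # Stand-in for a background worker draining the queue.
        while self.pending:
            follower, text = self.pending.popleft()
            self.timeline[follower].append(text)
```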

Objects directly related to one user's session are obviously in the
"synchronous cloud", others are in the "async cloud", and it's not that
critical that one session has immediate access to other sessions' clouds.
The importance of this separation comes up once you have to deal with
multiple geographically distant datacenters (DCs). You can have a DB
cluster in each of them (Oracle RAC, MySQL NDB, or something similar) and
then you'll have to design replication strategies between the datacenters.

This is probably one of the most difficult parts of application design. You
must ensure the replication is resilient against things like conflicting
updates (since transactions won't work over multiple DCs) leading e.g. to
duplicate keys. And there's much more. Some things will require a "global
ack" from all DCs worldwide before they can be committed, e.g.
registration of a new user must ensure that the same one is not being
registered at the same time somewhere else.
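
One common way to sidestep duplicate surrogate keys, sketched here
under the assumption of a fixed, known number of datacenters, is to let
each DC hand out IDs from its own interleaved sequence. This is the same
idea as MySQL's auto_increment_increment and auto_increment_offset
settings; the class below is only an illustration. It does nothing for
natural keys such as usernames, which still need the "global ack".

```python
class DatacenterIdAllocator:
    """Hand out IDs that cannot collide with IDs issued by other DCs."""

    def __init__(self, dc_index, dc_count):
        if not 0 <= dc_index < dc_count:
            raise ValueError("dc_index must be in range(dc_count)")
        self.step = dc_count
        # With 3 DCs: DC 0 issues 1, 4, 7, ...; DC 1 issues 2, 5, 8, ...
        self.next_id = dc_index + 1

    def allocate(self):
        value = self.next_id
        self.next_id += self.step
        return value
```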

OTOH, things like currently logged-in users and their session information
may not need to be replicated elsewhere at all. These tend to be
high-volume and are often better treated differently from "real content".
Luckily for you, most user sessions will send all requests to just one DC
because routing paths on the internet are quite stable. However, it may
happen that a user starts a session talking to DC1 and after a while
transfers to DC2. In that case you can require a re-login or, better and
more user-friendly, request his session data from his "home" DC.
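
A minimal sketch of the "home DC" lookup, assuming the session ID itself
carries the name of the datacenter that created it. All names here
(LOCAL_DC, new_session_id, load_session, fetch_remote) are hypothetical;
fetch_remote stands for whatever internal call you use to ask another
DC for a session.

```python
import secrets

LOCAL_DC = "dc1"  # hypothetical name of the DC running this code


def new_session_id(home_dc=LOCAL_DC):
    # Prefix the random token with the DC that owns the session data.
    return f"{home_dc}.{secrets.token_hex(16)}"


def load_session(session_id, local_store, fetch_remote):
    """Return session data, pulling it from the home DC if needed."""
    home_dc, _, _token = session_id.partition(".")
    if session_id in local_store:
        return local_store[session_id]
    if home_dc != LOCAL_DC:
        data = fetch_remote(home_dc, session_id)
        if data is not None:
            # Keep a local copy so later requests stay in this DC.
            local_store[session_id] = data
        return data
    return None  # unknown local session: the caller should ask for a re-login
```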

As you can see, building a distributed, scalable website is not that much
about Django or the web application. The core part is the datastore
management.

All the above comes from my experience with operational management of a
major news site with three distinct datacenters on two continents and
millions of page views a day. Indeed, our setup is much more complex, with
different subsystems having their specific requirements, but for sharing
some hints the above simplification is sufficient.

Hope that helps ;-)

JDJ




building distributed systems with django?

2008-06-07 Thread lgr888999

I'm curious to hear if there's any Django developer who has built a
decentralized distributed system with the help of Django. I know building
a website doesn't involve any scaling problems in the beginning; I'm
asking more out of personal interest in high availability. If you
don't have any experience with it, feel free to post your thoughts on
how you would build a huge decentralized system. Now of course it
would depend on what the purpose of the system is, so let's just take
Twitter as an example. :)