Re: how to scale (was: how to do something at startup)

Mark Green Sun, 30 Sep 2007 18:00:48 -0700

On Sun, 2007-09-30 at 16:16 -0500, James Bennett wrote:
> On 9/30/07, Mark Green <[EMAIL PROTECTED]> wrote:
> > Hm, this raises some serious scalability questions for me.
> > From your description it sounds like there is no template
> > fragment caching, not even db connection pooling possible
> > with django?
> 
> You can cache anything you want to cache; read the caching
> documentation (the whole thing) before jumping to conclusions about
> that. At work we use a custom template Node class which caches its
> output, for example.


Sorry, I was indeed jumping to conclusions on the caching issue, or
rather wording my concerns poorly.

I'm not sure what drove me to call it "fragment caching".
What I really meant to point at are the little things (such as
form_for_model()) that would likely benefit from some object
caching instead of burning cycles for each request.

I do admit, though, that this may be bordering on
micro-optimization and I realize I shouldn't have brought it up
without at least measuring it first. Let's just skip this point for now
(my bad, sorry again) and instead focus on the (imho) more glaring
issue of "no persistent connections", see below.

> As for database clustering, there's a philosophical issue here: Django
> shouldn't need to know whether there's one database server behind it,
> or five, or a hundred. We've had success using pgpool, for example,
> which -- from Django's point of view -- looks the same as any
> PostgreSQL database, but in reality is pooling connections and
> supports multiple actual databases running behind it.
> 
> Think of it the same way you'd do load-balancing in front of your
> application: just as users shouldn't need to know that you have, say,
> ten web nodes running Django, and just as they shouldn't have to stop
> and ask, "which one of the site's web nodes do I want to request a
> page from?", Django shouldn't need to know how many database nodes you
> have, or which one it should talk to on each query. The less the
> various layers of your stack have to know about each other, the easier
> it'll be to make changes.
> 
> I'd suggest reading the deployment chapter of the Django book for more 
> details:
> 
> http://www.djangobook.com/en/beta/chapter21/
> 
> > And what about integration with a messaging framework
> > (spread or somesuch) for efficient cluster communications?
> 
> So long as there's an interface you can talk to from Python, or over
> standard networking protocols, what's the holdup? Django does not have
> "out of the box" support for interoperating with every single
> component someone might want to use, but then neither would an
> "enterprise" Java framework; that's why you have programmers ;)

First off, thanks for all the insight. Unfortunately I think
you misread my "db connection pooling" as "db clustering".

My question was really only about the former, a much simpler problem:
how do I keep a TCP connection open and reuse it across requests?

Creating and discarding TCP connections at a high rate imposes a
measurable overhead on both the initiator (django) and the
receiving end (e.g. an RDBMS or even a pgpool on localhost).
While this overhead may be constant in most (not all!) scenarios,
it's still a waste of resources that doesn't sit well with me.
In particular, if and when the receiving end slows down under load,
the last thing you want is incoming connection attempts to pile up.
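
For illustration, here is a minimal Python sketch of the kind of reuse
described above. It is not Django code: ConnectionPool, its connect
factory and the size parameter are hypothetical names. The sketch only
caps how many idle connections are kept, not how many are open at once.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical pool: hands back an already-open connection when one is idle."""

    def __init__(self, connect, size=4):
        self._connect = connect                 # zero-argument factory (assumption)
        self._idle = queue.LifoQueue(maxsize=size)
        self._lock = threading.Lock()
        self.created = 0                        # how many connections were opened

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()      # reuse an idle connection if any
        except queue.Empty:
            conn = self._connect()              # otherwise pay the connect cost once
            with self._lock:
                self.created += 1
        try:
            yield conn
        except Exception:
            conn.close()                        # do not reuse a possibly broken connection
            raise
        else:
            try:
                self._idle.put_nowait(conn)     # keep it open for the next request
            except queue.Full:
                conn.close()                    # pool already holds `size` idle connections
```

A request handler would wrap its work in "with pool.connection() as conn:".
The factory could be something like
lambda: socket.create_connection(("localhost", 5432)), where the host and
port are placeholders.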

I do understand (and endorse very much) that django is a shared nothing
architecture but imho that doesn't imply "zero internal persistence
across requests".

Further problems arise when you need to integrate with a remote peer
that simply depends on persistent connections. My current candidate is
the spread toolkit (http://www.spread.org) but it's certainly not the
only piece of "environmental software" working that way.

I'm currently approaching the problem by spawning a custom thread on
first request (thus my inquiry about "how to do something at startup"),
but I think django would benefit from providing standard infrastructure
for that - which comes for free when proper connection pooling for the
ORM is implemented.


-mark


PS: Sorry if django actually *does* proper pooling already and I'm 
    beating a dead horse here. My assumption that it doesn't do it
    comes from the fact that it doesn't seem to pull up a
    persistent thread and because my grep for "pool" over the
    svn sources didn't hit anything. If murder is the case
    you can just ignore my whole ranting...



