Re: Intensive Computations

2008-10-07 Thread bruno desthuilliers



puzzler wrote:
> > I don't know much about your background so please forgive me if I give
> > you obvious advice, but perhaps you'd be better off *first* coding the
> > solution to your problem - possibly looking for help on
> > comp.lang.python. Then you'll have real facts, not estimations.
> > Especially if your computation happens to be memory-hungry, since the
> > best way to bring a server to its knees is to make it swap. So
> > in your case, you may want to look for space optimisation instead of
> > time optimisation.
>
> Well, yes, but I'm trying to decide what language to use to program
> this computation.  Since my web framework is written entirely in
> Django, there's a bit of an advantage to using Python, just in terms
> of seamlessly calling the function from Django (but do I use separate
> threads, separate process, or do I not even need to worry about
> that?).  But I can probably get a hundred-fold speedup by using a
> statically-typed, compiled language.  But then I am forced to use a
> separate os process to call the program (more heavyweight than
> threads?), and need to find a language with an API to manipulate the
> database, worry about privileges, etc.

First point: if your computation happens to be that "intensive", you
don't want to call it in-process. It just won't scale. So you do
have to use separate processes - possibly on different machines.

Second point: if your computation only depends on its inputs (which
you said was the case), then you just don't have to worry about
database access - this will still be handled by your Django app. All
you need is a way to invoke the computation process with appropriate
params, and a way to know when it's done and get the result back. All
this - inter-process communication - is far from rocket science, and
has long been solved in quite a lot of ways.
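
As a rough illustration of that second point, here is a minimal sketch in
present-day Python using the standard subprocess module. The child program
is a hypothetical stand-in (it only upper-cases its input), and
run_computation and its timeout are names invented for this example, not
anything from the thread.

```python
import subprocess
import sys

# Hypothetical stand-in for the real computation program: it reads the
# problem from stdin and writes the answer to stdout. In practice this
# would be a separate script or a compiled executable.
CHILD_PROGRAM = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def run_computation(problem, timeout_seconds=600.0):
    """Run the computation in a separate OS process and return its output."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_PROGRAM],
        input=problem,          # sent to the child's stdin
        capture_output=True,    # collect stdout and stderr
        text=True,              # work with str instead of bytes
        timeout=timeout_seconds,
        check=True,             # raise CalledProcessError on non-zero exit
    )
    return completed.stdout


if __name__ == "__main__":
    print(run_computation("some problem description"))
```

Note that subprocess.run() blocks until the child exits, so a web view
would more likely start the process and return at once; the sketch only
shows the hand-off of parameters and result.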

Third point: all this is based on a "wet finger" estimate, so it's
worth nothing. Experience has shown that we developers can be very bad
at this kind of guessing game. IOW, until you have facts, you just
don't know.

So, to make a long story short, don't bother about this for now, and
start writing code. Start in Python, benchmark, profile, and then
you'll at least have a clue as to what the real constraints are and what
the appropriate strategy might be.
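
A minimal sketch of what "benchmark, profile" could look like with the
standard library only. The solve() function here is a hypothetical
placeholder for the real computation, and benchmark() and profile() are
names invented for this example.

```python
import cProfile
import io
import pstats
import time


def solve(problem):
    """Hypothetical placeholder for the real computation."""
    return "".join(sorted(problem * 1000))


def benchmark(problem, repeats=3):
    """Return the best wall-clock time, in seconds, over a few runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        solve(problem)
        timings.append(time.perf_counter() - start)
    return min(timings)


def profile(problem):
    """Return a cProfile report of one run, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    solve(problem)
    profiler.disable()
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 entries
    return report.getvalue()


if __name__ == "__main__":
    print("best of 3: %.6f s" % benchmark("some problem"))
    print(profile("some problem"))
```

The timing says how long a run takes; the profile says where that time
goes, which is what tells you whether another language would help at all.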

> That's why I'm asking now, ahead of programming, for more info about
> how hard it is to incorporate an intensive computation into the Django
> framework.

Stop worrying about Django, it has nothing to do with it. And don't
worry about how Python can communicate with either another process
(whatever language has been used to implement the program) or a C-coded
library or whatever.

>  It could very well affect my choice of language.

Python is good for prototyping, and it's the main language used in
your app. So start with Python. Once you have something up and
running, you'll have hard facts about the time/space problem, and it will
*then* be time to consider alternative solutions. IOW: don't worry
about Django, don't worry about Python,
and *write this damn code*. I repeat: until you do have working code,
everything else is just blah blah, a waste of time and premature
optimization.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Intensive Computations

2008-10-06 Thread Brian Neal

On Oct 6, 5:53 pm, puzzler <[EMAIL PROTECTED]> wrote:
> That's why I'm asking now, ahead of programming, for more info about
> how hard it is to incorporate an intensive computation into the Django
> framework.  It could very well affect my choice of language.

It is just Python, really. So yes, you could do lots of things. You
could have a server running in C++ that your Python app hands the
calculations off to. The C++ app could even send the results back to
your Python code to store in the database. You have lots of options.



Re: Intensive Computations

2008-10-06 Thread puzzler


> I don't know much about your background so please forgive me if I give
> you obvious advice, but perhaps you'd be better off *first* coding the
> solution to your problem - possibly looking for help on
> comp.lang.python. Then you'll have real facts, not estimations.
> Especially if your computation happens to be memory-hungry, since the
> best way to bring a server to its knees is to make it swap. So
> in your case, you may want to look for space optimisation instead of
> time optimisation.

Well, yes, but I'm trying to decide what language to use to program
this computation.  Since my web framework is written entirely in
Django, there's a bit of an advantage to using Python, just in terms
of seamlessly calling the function from Django (but do I use separate
threads, separate process, or do I not even need to worry about
that?).  But I can probably get a hundred-fold speedup by using a
statically-typed, compiled language.  But then I am forced to use a
separate os process to call the program (more heavyweight than
threads?), and need to find a language with an API to manipulate the
database, worry about privileges, etc.

That's why I'm asking now, ahead of programming, for more info about
how hard it is to incorporate an intensive computation into the Django
framework.  It could very well affect my choice of language.



Re: Intensive Computations

2008-10-06 Thread bruno desthuilliers



On 6 oct, 23:18, puzzler <[EMAIL PROTECTED]> wrote:
> > I'm afraid you failed to give enough information about your
> > "intensive computation" (ie: what kind of computation, what input data
> > does it require, what does it output, is it mostly IO bound or CPU
> > bound or something else, etc), nor about the expected load. Depending on
> > this information, answers can range from "just call the function directly"
> > to "you'll obviously need a cluster" - which of course makes a big
> > difference.
>
> The computationally intensive function takes a string as an input, and
> generates a string of XML as the output.

You may want to consider json or yaml instead - a less verbose format
eats less memory (and/or requires less IO if you have to stream to
save on memory).

>  It does not rely on any
> additional data.

Ok, so you don't need access to the database to do your computation,
and you can easily split this computation from the server process.

>  The input string represents a problem space to be
> searched, and the output is the solution, or an indication that no
> solution is possible.  It's not really a "number-crunching" problem,
> but more like a breadth-first search of a large tree.

For which definition of "large"? And where does this tree live?

>  So no I/O.
> Just CPU and memory intensive.

Memory intensive can easily become 'IO bound' - when your server
starts to swap. Believe me, it's the worst possible thing for a web
server.

> I understand how picking the right data structures and algorithms
> impacts performance.  I am predicting it will still take about 4-5
> minutes to generate the solution.

I don't know much about your background so please forgive me if I give
you obvious advice, but perhaps you'd be better off *first* coding the
solution to your problem - possibly looking for help on
comp.lang.python. Then you'll have real facts, not estimations.
Especially if your computation happens to be memory-hungry, since the
best way to bring a server to its knees is to make it swap. So
in your case, you may want to look for space optimisation instead of
time optimisation.

>  Once the solution is generated, it
> will be permanently stored in the database.  Future queries to solve
> the same problem will first consult the database to see if it has
> already been solved.

You may want to store only metadata in the database, and keep the XML
(or JSON or YAML or whatever) on the file system. This would let you
serve the XML (etc) as a 'static' file.
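
A minimal sketch of that idea, assuming results are keyed by a hash of the
input string. The file layout, the ".xml" suffix and the function names are
all made up for this example; the database part (metadata) is left out.

```python
import hashlib
import tempfile
from pathlib import Path


def result_path(results_dir, problem):
    """Map a problem string to a stable file name via its SHA-256 digest."""
    digest = hashlib.sha256(problem.encode("utf-8")).hexdigest()
    return Path(results_dir) / (digest + ".xml")


def get_or_compute(results_dir, problem, solve):
    """Return the stored result if present, else compute and store it."""
    path = result_path(results_dir, problem)
    if path.exists():
        return path.read_text(encoding="utf-8")
    result = solve(problem)  # the expensive call, made only on a miss
    path.write_text(result, encoding="utf-8")
    return result


if __name__ == "__main__":
    # Throwaway directory and a dummy solver, just to exercise the functions.
    with tempfile.TemporaryDirectory() as directory:
        print(get_or_compute(directory, "abc", lambda p: "<solution/>"))
```

Since the file name depends only on the input, a second request for the
same problem finds the file and never reaches the computation.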

>  I'd like to allow for the possibility of, say,
> 100 users using this service at a time.

On shared hosting? 100 concurrent executions of a long (compared to an
average HTTP request processing time) CPU- and memory-hungry computation?
Well, I'm not a sysadmin, but I'm a bit skeptical.

> I'd be using a Django webhosting service, specifically webfaction, so
> I'm somewhat restricted in terms of resources, and what can be
> installed.  I have no idea if they have psyco as an option.

Psyco only works on x86 - if they use some other processor
architecture, then no luck. But anyway, psyco trades space for speed,
which might not actually solve your problem.

Seriously: first implement a module doing the computation, and only
the computation. Write it with a main entry point so you can either
use it as a module or as a script. Then start benchmarking, so you
really know how many resources (CPU, RAM, disk space etc) you'll need for a
single instance of the whole computation. Until this is done,
everything else is just blah.
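
A minimal sketch of such a module, assuming the problem is passed as the
first command-line argument. solve() is a hypothetical placeholder, and
measure() and the "demo" default are inventions of this example. Note that
tracemalloc only sees memory allocated through Python, so the figure is a
lower bound on the real footprint.

```python
import sys
import time
import tracemalloc


def solve(problem):
    """Hypothetical placeholder: the real search would go here."""
    return "<solution>%s</solution>" % problem[::-1]


def measure(problem):
    """Run solve() once; return (result, seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = solve(problem)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak_bytes


def main(argv):
    """Script entry point: the problem is the first command-line argument."""
    problem = argv[1] if len(argv) > 1 else "demo"  # arbitrary default
    result, elapsed, peak_bytes = measure(problem)
    print(result)
    print("time: %.3f s, peak traced memory: %d bytes" % (elapsed, peak_bytes),
          file=sys.stderr)


if __name__ == "__main__":
    # Runs only when executed as a script, not when imported as a module.
    main(sys.argv)
```

Imported from the web app, only solve() is used; run from a shell, the
same file reports time and memory for a single instance.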

My 2 cents...



Re: Intensive Computations

2008-10-06 Thread puzzler

> I'm afraid you failed to give enough information about your
> "intensive computation" (ie: what kind of computation, what input data
> does it require, what does it output, is it mostly IO bound or CPU
> bound or something else, etc), nor about the expected load. Depending on
> this information, answers can range from "just call the function directly"
> to "you'll obviously need a cluster" - which of course makes a big
> difference.

The computationally intensive function takes a string as an input, and
generates a string of XML as the output.  It does not rely on any
additional data.  The input string represents a problem space to be
searched, and the output is the solution, or an indication that no
solution is possible.  It's not really a "number-crunching" problem,
but more like a breadth-first search of a large tree.  So no I/O.
Just CPU and memory intensive.

I understand how picking the right data structures and algorithms
impacts performance.  I am predicting it will still take about 4-5
minutes to generate the solution.  Once the solution is generated, it
will be permanently stored in the database.  Future queries to solve
the same problem will first consult the database to see if it has
already been solved.  I'd like to allow for the possibility of, say,
100 users using this service at a time.

I'd be using a Django webhosting service, specifically webfaction, so
I'm somewhat restricted in terms of resources, and what can be
installed.  I have no idea if they have psyco as an option.




Re: Intensive Computations

2008-10-06 Thread bruno desthuilliers

On 6 oct, 18:46, puzzler <[EMAIL PROTECTED]> wrote:
> Let's say I want to provide a web form, where the user enters in some
> data.  Then, when the user hits the submit button, it triggers a very
> computationally intensive program to process that data that will take,
> say, 4 or 5 minutes.

Is that a 'wet-finger' estimate, or is it based on a serious
benchmark?

> The user is taken to a page that shows a little
> "waiting" animation; and when the data-processing is complete, the
> result is shown.  I'm imagining that the best way to do this is to
> have the intensive computation put the result in a database when done,
> and the "waiting" page keeps polling the database to see whether the
> result is there.

Not necessarily - I mean, 'polling the database' - but you'll indeed
need to have a way to know when the computation is done !-)
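
One alternative to polling the database, assumed here purely for
illustration, is to have the worker publish a result file and let the
"waiting" page check for it. publish_result() and poll_result() are
hypothetical names; the write-then-rename step is there so that a reader
never sees a half-written file.

```python
import os
import tempfile
from pathlib import Path


def publish_result(path, result):
    """Worker side: write to a temporary file, then rename it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent))
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(result)
    os.replace(tmp_name, path)  # atomic when both are on one file system


def poll_result(path):
    """Web side: return the result if it is there yet, else None."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = Path(directory) / "job-1.xml"  # hypothetical job name
        print(poll_result(target))              # not finished yet
        publish_result(target, "<solution/>")
        print(poll_result(target))              # finished
```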

> So here are my questions:
>
> If I write the computationally intensive function as a standard python
> function, and call it normally, will my whole Django app essentially
> "freeze" and stop serving other users while it processes this
> computation?

Mostly depends on how you deploy Django, but "production" deployment
usually implies either multithreading or forking server processes - else,
even serving static pages would rapidly freeze. But:

>  If so, do I instead use threads or separate os processes
> to handle the computation?  If the intensive computation consumes too
> much memory, or otherwise encounters some sort of error, will my whole
> Django server die?  Which solution best isolates the server from this
> kind of error?

if your computation is that heavy, your best bet is to totally isolate
it from your front-end server. IOW: design the whole thing to
delegate computation to a distinct "computation server". How you
handle communication between the two (or more...) servers is up to you
(from raw sockets with a custom protocol to HTTP).
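
A minimal sketch of the HTTP end of that range, using only the standard
library. Everything named here (solve, ComputeHandler, request_solution,
demo) is hypothetical. For the sake of a self-contained example the server
runs in a thread of the same process; the point of the design is that it
would really be a separate process, possibly on another machine.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def solve(problem):
    """Hypothetical placeholder for the real computation."""
    return "<solution>%s</solution>" % problem.upper()


class ComputeHandler(BaseHTTPRequestHandler):
    """Computation-server side: POST body in, XML answer out."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        problem = self.rfile.read(length).decode("utf-8")
        body = solve(problem).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def request_solution(host, port, problem):
    """Front-end side: POST the problem and read the answer back."""
    connection = http.client.HTTPConnection(host, port, timeout=30)
    try:
        connection.request("POST", "/", body=problem.encode("utf-8"))
        return connection.getresponse().read().decode("utf-8")
    finally:
        connection.close()


def demo():
    """Start the server on a free local port, send one request, shut down."""
    server = HTTPServer(("127.0.0.1", 0), ComputeHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return request_solution("127.0.0.1", server.server_port, "abc")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```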

> Because this is computationally intensive, maybe Python isn't the best
> way to write this program.

Depends on the kind of computation... Pure Python, while not
ridiculous for a high-level dynamic language, doesn't shine when it
comes to number-crunching, for sure - but then, there are dedicated
C-coded libs like numpy, and possibly JIT compilers like psyco.

But before even starting to look at any of these solutions, you may
want to make sure you correctly designed and implemented your
computation. FWIW, some very simple design/coding changes can make a
huge difference. You may want to read this thread on
comp.lang.python:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/21aef9dec3fb5403/fbece20fb4b18f9e

where a simple rewrite using the appropriate data structure and
algorithm, plus a couple of simple optimizations, ended up running 143
times faster (and, not posted but I did keep an eye on it too, eating
a lot less memory, which can make a *very* big difference when
you have several instances running concurrently - the difference
between using real RAM or the swap...)
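
To make "appropriate data structure" concrete for a breadth-first search,
here is a minimal sketch. It is not the code from the linked thread; the
function names and the toy neighbour rule are invented for this example.
The two choices that matter are a set for the visited nodes (constant-time
membership tests) and a deque for the queue (constant-time pops from the
left), where plain lists would make both operations linear.

```python
from collections import deque


def bfs_depth(start, goal, neighbours):
    """Return the number of steps from start to goal, or None if unreachable."""
    visited = {start}              # set: O(1) "have I seen this?" checks
    queue = deque([(start, 0)])    # deque: O(1) popleft()
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return depth
        for candidate in neighbours(node):
            if candidate not in visited:
                visited.add(candidate)
                queue.append((candidate, depth + 1))
    return None


def demo_neighbours(n, limit=1000):
    """Toy search space: from n you may go to n + 1 or n * 2, up to a limit."""
    return [m for m in (n + 1, n * 2) if m <= limit]


if __name__ == "__main__":
    print(bfs_depth(1, 10, demo_neighbours))  # 1 -> 2 -> 4 -> 5 -> 10
```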

>  If I want to write this program in another
> language, does Django offer any way to invoke such a program,

Django is just a Python framework, so the real question is "does
Python offer any way to invoke such a program". And the answer is that
Python offers many ways to skin this cat, from a braindead
os.system("mycommand here") to C-coded library bindings, including
mmap or any networking solution.

> and
> would such a program have access to the database which is controlled
> by Django and its database API?

The database is not "controlled" by Django, it's "accessed" by Django.
Any program in any language having bindings for your RDBMS can access
it - granted appropriate rights.

> Thanks!  Any comments, or pointers to appropriate documentation, are
> much appreciated.

I'm afraid you failed to give enough information about your
"intensive computation" (ie: what kind of computation, what input data
does it require, what does it output, is it mostly IO bound or CPU
bound or something else, etc), nor about the expected load. Depending on
this information, answers can range from "just call the function directly"
to "you'll obviously need a cluster" - which of course makes a big
difference.





Intensive Computations

2008-10-06 Thread puzzler

Let's say I want to provide a web form, where the user enters in some
data.  Then, when the user hits the submit button, it triggers a very
computationally intensive program to process that data that will take,
say, 4 or 5 minutes.  The user is taken to a page that shows a little
"waiting" animation; and when the data-processing is complete, the
result is shown.  I'm imagining that the best way to do this is to
have the intensive computation put the result in a database when done,
and the "waiting" page keeps polling the database to see whether the
result is there.

So here are my questions:

If I write the computationally intensive function as a standard python
function, and call it normally, will my whole Django app essentially
"freeze" and stop serving other users while it processes this
computation?  If so, do I instead use threads or separate os processes
to handle the computation?  If the intensive computation consumes too
much memory, or otherwise encounters some sort of error, will my whole
Django server die?  Which solution best isolates the server from this
kind of error?

Because this is computationally intensive, maybe Python isn't the best
way to write this program.  If I want to write this program in another
language, does Django offer any way to invoke such a program, and
would such a program have access to the database which is controlled
by Django and its database API?

Thanks!  Any comments, or pointers to appropriate documentation, are
much appreciated.