Re: Intensive Computations
puzzler wrote:
> > I don't know much about your background so please forgive me if I give
> > you obvious advice, but perhaps you'd be better off *first* coding the
> > solution to your problem - possibly looking for help on
> > comp.lang.python. Then you'll have real facts, not estimates.
> > Especially if your computation happens to be memory-hungry, since the
> > best way to bring a server to its knees is to make it start swapping. So
> > in your case, you may want to look for space optimisation instead of
> > time optimisation.
>
> Well, yes, but I'm trying to decide what language to use to program
> this computation. Since my web app is written entirely in
> Django, there's a bit of an advantage to using Python, just in terms
> of seamlessly calling the function from Django (but do I use separate
> threads, separate processes, or do I not even need to worry about
> that?). But I can probably get a hundred-fold speedup by using a
> statically-typed, compiled language. But then I am forced to use a
> separate OS process to call the program (more heavyweight than
> threads?), and need to find a language with an API to manipulate the
> database, worry about privileges, etc.

First point: if your computation really is that "intensive", you don't want to call it in-process. It just won't scale. So you do have to use separate processes - possibly on different machines.

Second point: if your computation depends only on its inputs (which you said was the case), then you don't have to worry about database access at all - that will still be handled by your Django app. All you need is a way to invoke the computation process with the appropriate parameters, and a way to know when it's done and get the result back. All this - inter-process communication - is far from rocket science, and has long been solved in quite a lot of ways.

Third point: all this is based on a "wet finger" estimate, so it's worth nothing. Experience has shown that we developers can be very bad at this kind of guessing game. IOW, until you have facts, you just don't know. So, to make a long story short, don't worry about this for now, and start writing code. Start in Python, benchmark, profile, and then you'll at least have a clue as to what the real constraints are and what the appropriate strategy might be.

> That's why I'm asking now, ahead of programming, for more info about
> how hard it is to incorporate an intensive computation into the Django
> framework.

Stop worrying about Django; it has nothing to do with it. And don't worry about how Python can communicate with another process (whatever language has been used to implement the program), a C-coded library, or whatever.

> It could very well affect my choice of language.

Python is good for prototyping, and it's the main language used in your app. So start with Python. Once you have something up and running, you'll have hard facts regarding the time/space problem, and it will *then* be time to consider alternative solutions.

IOW: don't worry about Django, don't worry about Python, and *write this damn code*. I repeat: until you have working code, everything else is just blah blah, a waste of time and premature optimization.

--~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-users@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-users?hl=en -~--~~~~--~~--~--~---
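For the "invoke the computation process, know when it's done, get the result back" part, here is a minimal sketch using only the standard library's concurrent.futures. `solve` is a hypothetical placeholder for the real search, and in a real deployment the pool would more likely live in a separate worker service than inside the Django process itself:

```python
"""Hand a long computation to a separate worker process (sketch)."""
import time
from concurrent.futures import ProcessPoolExecutor


def solve(problem: str) -> str:
    # Hypothetical stand-in for the real CPU-bound search.
    total = sum(ord(c) for c in problem)
    return f"<solution input-sum='{total}'/>"


def main() -> str:
    # A small, bounded pool of worker processes: the caller never runs the
    # search itself; an exception in a worker is reported back, not raised here.
    with ProcessPoolExecutor(max_workers=2) as pool:
        future = pool.submit(solve, "some problem")
        # "A way to know when it's done": poll instead of blocking on result().
        while not future.done():
            time.sleep(0.1)
        # Returns the worker's result, or re-raises the worker's exception.
        return future.result()


if __name__ == "__main__":
    print(main())
```

A worker process that dies outright surfaces as a `BrokenProcessPool` error on the future rather than taking the calling process down with it.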
Re: Intensive Computations
On Oct 6, 5:53 pm, puzzler <[EMAIL PROTECTED]> wrote:
> That's why I'm asking now, ahead of programming, for more info about
> how hard it is to incorporate an intensive computation into the Django
> framework. It could very well affect my choice of language.

It is just Python, really. So yes, you could do lots of things. You could have a server written in C++ that your Python app hands the calculations off to. The C++ app could even send the results back to your Python code to store in the database. You have lots of options.
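To make the hand-off idea concrete, here is a minimal sketch of the Python side talking to a computation server over TCP. The newline-delimited protocol, `ComputeHandler` and `request_solution` are hypothetical, and a small Python server stands in for the C++ one so the example runs on its own:

```python
import socket
import socketserver
import threading


class ComputeHandler(socketserver.StreamRequestHandler):
    """Stand-in computation server: one request line in, one result line out."""

    def handle(self):
        problem = self.rfile.readline().decode("utf-8").rstrip("\n")
        result = problem[::-1]  # placeholder for the expensive search
        self.wfile.write((result + "\n").encode("utf-8"))


def request_solution(host: str, port: int, problem: str) -> str:
    """Client side, as it might run in the Django app or a worker."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall((problem + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")


def demo() -> str:
    # Port 0 lets the OS pick a free port for the stand-in server.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), ComputeHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        host, port = server.server_address
        try:
            return request_solution(host, port, "abc")
        finally:
            server.shutdown()


if __name__ == "__main__":
    print(demo())  # prints: cba
```

Any language that can open a socket could replace the stand-in server, as long as it speaks the same (here invented) protocol.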
Re: Intensive Computations
> I don't know much about your background so please forgive me if I give
> you obvious advice, but perhaps you'd be better off *first* coding the
> solution to your problem - possibly looking for help on
> comp.lang.python. Then you'll have real facts, not estimates.
> Especially if your computation happens to be memory-hungry, since the
> best way to bring a server to its knees is to make it start swapping. So
> in your case, you may want to look for space optimisation instead of
> time optimisation.

Well, yes, but I'm trying to decide what language to use to program this computation. Since my web app is written entirely in Django, there's a bit of an advantage to using Python, just in terms of seamlessly calling the function from Django (but do I use separate threads, separate processes, or do I not even need to worry about that?). But I can probably get a hundred-fold speedup by using a statically-typed, compiled language. But then I am forced to use a separate OS process to call the program (more heavyweight than threads?), and need to find a language with an API to manipulate the database, worry about privileges, etc.

That's why I'm asking now, ahead of programming, for more info about how hard it is to incorporate an intensive computation into the Django framework. It could very well affect my choice of language.
Re: Intensive Computations
On 6 oct, 23:18, puzzler <[EMAIL PROTECTED]> wrote:
> > I'm afraid you failed to give enough information about your
> > "intensive computation" (ie: what kind of computation, what input data
> > does it require, what does it output, is it mostly IO bound or CPU
> > bound or else, etc), nor about the expected load. Depending on this
> > information, answers can range from "just call the function directly"
> > to "you'll obviously need a cluster" - which of course makes a big
> > difference.
>
> The computationally intensive function takes a string as an input, and
> generates a string of XML as the output.

You may want to consider JSON or YAML instead - a less verbose format eats less memory (and/or requires less IO if you have to stream to save on memory).

> It does not rely on any
> additional data.

Ok, so you don't need access to the database to do your computation, and you can easily split this computation from the server process.

> The input string represents a problem space to be
> searched, and the output is the solution, or an indication that no
> solution is possible. It's not really a "number-crunching" problem,
> but more like a breadth-first search of a large tree.

For which definition of "large"? And where does this tree live?

> So no I/O.
> Just CPU and memory intensive.

Memory intensive can easily become IO bound - as soon as your server starts to swap. Believe me, it's the worst possible thing for a web server.

> I understand how picking the right data structures and algorithms
> impacts performance. I am predicting it will still take about 4-5
> minutes to generate the solution.

I don't know much about your background so please forgive me if I give you obvious advice, but perhaps you'd be better off *first* coding the solution to your problem - possibly looking for help on comp.lang.python. Then you'll have real facts, not estimates. Especially if your computation happens to be memory-hungry, since the best way to bring a server to its knees is to make it start swapping. So in your case, you may want to look for space optimisation instead of time optimisation.

> Once the solution is generated, it
> will be permanently stored in the database. Future queries to solve
> the same problem will first consult the database to see if it has
> already been solved.

You may want to store only metadata in the database, and keep the XML (or JSON or YAML or whatever) on the file system. This would let you serve the XML (etc.) as a static file.

> I'd like to allow for the possibility of, say,
> 100 users using this service at a time.

On shared hosting? 100 concurrent executions of a long (relative to the average HTTP request processing time) CPU- and memory-hungry computation? Well, I'm not a sysadmin, but I'm a bit skeptical.

> I'd be using a Django webhosting service, specifically webfaction, so
> I'm somewhat restricted in terms of resources, and what can be
> installed. I have no idea if they have psyco as an option.

Psyco only works on x86 - if they use some other processor architecture, then no luck. But anyway, psyco trades space for speed, which might not actually solve your problem.

Seriously: first implement a module doing the computation, and only the computation. Write it with a main entry point so you can use it either as a module or as a script. Then start benchmarking, so you really know how many resources (CPU, RAM, disk space, etc.) you'll need for a single instance of the whole computation. Until this is done, everything else is just blah.

My 2 cents...
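A minimal sketch of such a module, assuming a toy breadth-first search in place of the real one (the "START TARGET" problem format is invented purely for illustration). It reports wall-clock time and peak traced memory, since memory is what makes a server swap:

```python
"""solver.py - the computation, and only the computation (hypothetical sketch).

Usable as a module (``from solver import solve``) or as a script
(``python solver.py "2 1000"``) for quick benchmarking.
"""
import sys
import time
import tracemalloc
from collections import deque
from typing import Optional


def solve(problem: str) -> Optional[str]:
    """Toy breadth-first search standing in for the real one.

    The problem "START TARGET" asks for the shortest sequence of moves
    n -> n + 1 or n -> n * 2 leading from START to TARGET.
    Returns the path, or None when no solution is possible.
    """
    start, target = (int(tok) for tok in problem.split())
    parent: dict[int, Optional[int]] = {start: None}  # doubles as "visited"
    frontier = deque([start])
    while frontier:
        n = frontier.popleft()
        if n == target:
            path = []
            node: Optional[int] = n
            while node is not None:
                path.append(node)
                node = parent[node]
            return " -> ".join(str(v) for v in reversed(path))
        for nxt in (n + 1, n * 2):
            # Stay inside [start, target] so the search space is finite.
            if start <= nxt <= target and nxt not in parent:
                parent[nxt] = n
                frontier.append(nxt)
    return None


def benchmark(problem: str) -> tuple[Optional[str], float, int]:
    """Return (result, wall-clock seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        t0 = time.perf_counter()
        result = solve(problem)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def main(args: list[str]) -> None:
    problem = args[0] if args else "2 1000"
    result, elapsed, peak = benchmark(problem)
    print(f"result={result!r}")
    print(f"time={elapsed:.3f}s peak_mem={peak / 1024:.1f} KiB")


if __name__ == "__main__":
    main(sys.argv[1:])
```

tracemalloc only counts memory allocated through Python's allocator after `start()`, so it leaves out the interpreter's baseline; it is still a useful way to compare one version of the search against another.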
Re: Intensive Computations
> I'm afraid you failed to give enough information about your
> "intensive computation" (ie: what kind of computation, what input data
> does it require, what does it output, is it mostly IO bound or CPU
> bound or else, etc), nor about the expected load. Depending on this
> information, answers can range from "just call the function directly"
> to "you'll obviously need a cluster" - which of course makes a big
> difference.

The computationally intensive function takes a string as input and generates a string of XML as output. It does not rely on any additional data. The input string represents a problem space to be searched, and the output is the solution, or an indication that no solution is possible. It's not really a "number-crunching" problem, but more like a breadth-first search of a large tree. So no I/O. Just CPU and memory intensive.

I understand how picking the right data structures and algorithms impacts performance. I am predicting it will still take about 4-5 minutes to generate the solution. Once the solution is generated, it will be permanently stored in the database. Future queries to solve the same problem will first consult the database to see if it has already been solved. I'd like to allow for the possibility of, say, 100 users using this service at a time.

I'd be using a Django webhosting service, specifically webfaction, so I'm somewhat restricted in terms of resources, and what can be installed. I have no idea if they have psyco as an option.
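The "consult the database first" step, combined with the suggestion of keeping only metadata in the database and the XML on disk, could look roughly like this. It is a sketch under assumptions: sqlite3 stands in for whatever database the Django app uses, and the `solutions` table, its columns and `cached_solve` are hypothetical names:

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Callable


def cached_solve(problem: str, solve: Callable[[str], str],
                 db: sqlite3.Connection, out_dir: Path) -> str:
    """Return a stored solution if one exists, otherwise compute and store it."""
    # Identical inputs hash to the same key, so each problem is solved once.
    key = hashlib.sha256(problem.encode("utf-8")).hexdigest()
    row = db.execute("SELECT path FROM solutions WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return Path(row[0]).read_text(encoding="utf-8")
    result = solve(problem)
    path = out_dir / f"{key}.xml"
    path.write_text(result, encoding="utf-8")
    # Only metadata (key -> file path) goes into the database; the XML file
    # itself could then be served as a static file.
    with db:
        db.execute("INSERT OR IGNORE INTO solutions (key, path) VALUES (?, ?)",
                   (key, str(path)))
    return result


def demo() -> tuple[str, str, list[str]]:
    calls: list[str] = []

    def fake_solve(problem: str) -> str:
        calls.append(problem)  # record every real computation
        return f"<solution>{problem}</solution>"

    with tempfile.TemporaryDirectory() as tmp:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE solutions (key TEXT PRIMARY KEY, path TEXT NOT NULL)")
        first = cached_solve("abc", fake_solve, db, Path(tmp))
        second = cached_solve("abc", fake_solve, db, Path(tmp))  # from the cache
        db.close()
    return first, second, calls
```

Two simultaneous requests for the same new problem would still both run the computation; `INSERT OR IGNORE` (SQLite syntax) only keeps the second insert from failing.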
Re: Intensive Computations
On 6 oct, 18:46, puzzler <[EMAIL PROTECTED]> wrote:
> Let's say I want to provide a web form, where the user enters some
> data. Then, when the user hits the submit button, it triggers a very
> computationally intensive program to process that data that will take,
> say, 4 or 5 minutes.

Is that a "wet finger" estimate, or is it based on a serious benchmark?

> The user is taken to a page that shows a little
> "waiting" animation; and when the data-processing is complete, the
> result is shown. I'm imagining that the best way to do this is to
> have the intensive computation put the result in a database when done,
> and the "waiting" page keeps polling the database to see whether the
> result is there.

Not necessarily by polling the database - but you'll indeed need a way to know when the computation is done !-)

> So here are my questions:
>
> If I write the computationally intensive function as a standard Python
> function, and call it normally, will my whole Django app essentially
> "freeze" and stop serving other users while it processes this
> computation?

That mostly depends on how you deploy Django, but a "production" deployment usually implies either multithreading or forking server processes - otherwise even serving static pages would rapidly freeze. But:

> If so, do I instead use threads or separate OS processes
> to handle the computation? If the intensive computation consumes too
> much memory, or otherwise encounters some sort of error, will my whole
> Django server die? Which solution best isolates the server from this
> kind of error?

If your computation is that heavy, your best bet is to isolate it completely from your front-end server. IOW: design the whole thing to delegate the computation to a distinct "computation server". How you handle communication between the two (or more...) servers is up to you (anything from a raw socket with a custom protocol to HTTP).

> Because this is computationally intensive, maybe Python isn't the best
> way to write this program.

It depends on the kind of computation... Pure Python, while not ridiculous for a high-level dynamic language, certainly doesn't shine when it comes to number-crunching - but then, there are dedicated C-coded libs like numpy, and JIT compilers like psyco. But before even starting to look at any of these solutions, you may want to make sure you correctly designed and implemented your computation. FWIW, some very simple design or coding changes can make a huge difference. You may want to read this thread on comp.lang.python:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/21aef9dec3fb5403/fbece20fb4b18f9e

where a simple rewrite using the appropriate data structure and algorithm, plus a couple of simple optimizations, ended up running 143 times faster (and, though it wasn't posted, I did keep an eye on it too: it was eating quite a lot less memory, which can make a *very* huge difference when you have several instances running concurrently - the difference between using real RAM and the swap...).

> If I want to write this program in another
> language, does Django offer any way to invoke such a program,

Django is just a Python framework, so the real question is "does Python offer any way to invoke such a program". And the answer is that Python offers many ways to skin this cat, from a braindead os.system("mycommand here") to C-coded library bindings, including mmap or any networking solution.

> and
> would such a program have access to the database which is controlled
> by Django and its database API?

The database is not "controlled" by Django, it's "accessed" by Django. Any program in any language with bindings for your RDBMS can access it - given appropriate rights.

> Thanks! Any comments, or pointers to appropriate documentation, are
> much appreciated.

I'm afraid you failed to give enough information about your "intensive computation" (ie: what kind of computation, what input data does it require, what does it output, is it mostly IO bound or CPU bound or else, etc), nor about the expected load. Depending on this information, answers can range from "just call the function directly" to "you'll obviously need a cluster" - which of course makes a big difference.
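Isolating the computation in its own OS process, as suggested above, can be as simple as the standard library's subprocess module. In this sketch, `WORKER` and `CRASHING_WORKER` are hypothetical stand-ins for a real program (which could be written in any language), and the 600-second timeout is an arbitrary assumption based on the 4-5 minute estimate:

```python
import subprocess
import sys
from typing import Optional

# Hypothetical worker commands. In practice this might be a compiled program
# (e.g. ["./solver"]); Python one-liners stand in so the sketch runs anywhere.
WORKER = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]
CRASHING_WORKER = [sys.executable, "-c", "raise MemoryError('simulated failure')"]


def run_worker(cmd: list[str], problem: str,
               timeout: float = 600.0) -> Optional[str]:
    """Run one computation in its own OS process; None means it failed.

    A crash, an out-of-memory kill or a timeout in the child only affects
    this one job; the calling process keeps running.
    """
    try:
        proc = subprocess.run(cmd, input=problem, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # run() has already killed and reaped the child
    if proc.returncode != 0:
        return None  # proc.stderr holds the child's error output
    return proc.stdout


if __name__ == "__main__":
    print(run_worker(WORKER, "abc"))           # the reversed input
    print(run_worker(CRASHING_WORKER, "abc"))  # None, and we are still alive
```

Because `run_worker` blocks until the child finishes, it belongs in a background worker, not in the view that handles the HTTP request.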
Intensive Computations
Let's say I want to provide a web form, where the user enters some data. Then, when the user hits the submit button, it triggers a very computationally intensive program to process that data that will take, say, 4 or 5 minutes. The user is taken to a page that shows a little "waiting" animation; and when the data-processing is complete, the result is shown. I'm imagining that the best way to do this is to have the intensive computation put the result in a database when done, and the "waiting" page keeps polling the database to see whether the result is there.

So here are my questions:

If I write the computationally intensive function as a standard Python function, and call it normally, will my whole Django app essentially "freeze" and stop serving other users while it processes this computation? If so, do I instead use threads or separate OS processes to handle the computation? If the intensive computation consumes too much memory, or otherwise encounters some sort of error, will my whole Django server die? Which solution best isolates the server from this kind of error?

Because this is computationally intensive, maybe Python isn't the best way to write this program. If I want to write this program in another language, does Django offer any way to invoke such a program, and would such a program have access to the database which is controlled by Django and its database API?

Thanks! Any comments, or pointers to appropriate documentation, are much appreciated.
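The submit/poll flow described here could be sketched with a small job table. This is an illustration under assumptions, not a Django API: sqlite3 stands in for the real database, and `jobs`, `submit_job`, `run_job` and `poll_job` are hypothetical names. The web process only inserts and reads rows; the expensive `solve` call happens in `run_job`, which would run in a separate worker process:

```python
import sqlite3
import uuid
from typing import Callable, Optional

# Hypothetical job table; status is 'pending', 'done' or 'failed'.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      TEXT PRIMARY KEY,
    problem TEXT NOT NULL,
    status  TEXT NOT NULL,
    result  TEXT
)
"""


def submit_job(db: sqlite3.Connection, problem: str) -> str:
    """Form-handling view: record the job and return its id right away."""
    job_id = uuid.uuid4().hex
    with db:
        db.execute("INSERT INTO jobs (id, problem, status) VALUES (?, ?, 'pending')",
                   (job_id, problem))
    return job_id


def run_job(db: sqlite3.Connection, job_id: str,
            solve: Callable[[str], str]) -> None:
    """Worker side (a separate process in practice): compute and store."""
    (problem,) = db.execute("SELECT problem FROM jobs WHERE id = ?",
                            (job_id,)).fetchone()
    try:
        result, status = solve(problem), "done"
    except Exception:
        result, status = None, "failed"
    with db:
        db.execute("UPDATE jobs SET status = ?, result = ? WHERE id = ?",
                   (status, result, job_id))


def poll_job(db: sqlite3.Connection, job_id: str) -> tuple[str, Optional[str]]:
    """What the "waiting" page would request every few seconds."""
    status, result = db.execute("SELECT status, result FROM jobs WHERE id = ?",
                                (job_id,)).fetchone()
    return status, result
```

In a real setup the web process and the worker would each open their own connection to the same database; an in-memory database is only convenient for trying the functions out in a single process.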