Re: how to start thread by group?
In message [EMAIL PROTECTED], Gabriel Genellina wrote:

> On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy [EMAIL PROTECTED] wrote:
>
> > Lawrence D'Oliveiro wrote:
> >
> > > In message [EMAIL PROTECTED], Gabriel Genellina wrote:
> > >
> > > > Usually it's more efficient to create all the MAX_THREADS at once,
> > > > and continuously feed them with tasks to be done.
> > >
> > > Given that the bottleneck is most likely to be the internet
> > > connection, I'd say the "premature optimization is the root of all
> > > evil" adage applies here.
> >
> > Feeding a fixed pool of worker threads with a Queue() is a standard
> > design that is easy to understand and one the OP should learn. Re-using
> > tested code is certainly efficient of programmer time.
>
> I'd like to add that debugging a program that continuously creates and
> destroys threads is a real PITA.

That's God trying to tell you to avoid threads altogether.

-- http://mail.python.org/mailman/listinfo/python-list
Re: how to start thread by group?
On Oct 13, 6:54 am, Lawrence D'Oliveiro [EMAIL PROTECTED] central.gen.new_zealand wrote:

> In message [EMAIL PROTECTED], Gabriel Genellina wrote:
>
> > On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy [EMAIL PROTECTED] wrote:
> >
> > > Lawrence D'Oliveiro wrote:
> > >
> > > > In message [EMAIL PROTECTED], Gabriel Genellina wrote:
> > > >
> > > > > Usually it's more efficient to create all the MAX_THREADS at
> > > > > once, and continuously feed them with tasks to be done.
> > > >
> > > > Given that the bottleneck is most likely to be the internet
> > > > connection, I'd say the "premature optimization is the root of all
> > > > evil" adage applies here.
> > >
> > > Feeding a fixed pool of worker threads with a Queue() is a standard
> > > design that is easy to understand and one the OP should learn.
> > > Re-using tested code is certainly efficient of programmer time.
> >
> > I'd like to add that debugging a program that continuously creates and
> > destroys threads is a real PITA.
>
> That's God trying to tell you to avoid threads altogether.

Especially in a case like this, which is tailor-made for a trivial state-machine solution if you really want multiple connections.
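For readers wondering what a thread-free version might look like, here is a minimal sketch. It uses Python 3's asyncio event loop, which is one modern reading of the "state-machine" suggestion and not necessarily what the poster had in mind. The names fake_fetch, fetch_all and MAX_CONNECTIONS are made up for illustration, and fake_fetch only sleeps instead of touching the network.

```python
import asyncio

MAX_CONNECTIONS = 5  # assumed limit, matching the "5 at a time" in the question


async def fake_fetch(url):
    # Hypothetical stand-in for a real non-blocking download; it only sleeps.
    await asyncio.sleep(0.01)
    return "contents of " + url


async def fetch_all(urls, limit=MAX_CONNECTIONS):
    sem = asyncio.Semaphore(limit)       # caps the number of fetches in flight
    counts = {"active": 0, "peak": 0}    # bookkeeping, only to show the cap holds

    async def fetch_one(url):
        async with sem:
            counts["active"] += 1
            counts["peak"] = max(counts["peak"], counts["active"])
            try:
                return url, await fake_fetch(url)
            finally:
                counts["active"] -= 1

    # All coroutines run in a single thread; the event loop switches between them.
    pairs = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(pairs), counts["peak"]


if __name__ == "__main__":
    urls = ["http://example.com/%d" % i for i in range(12)]
    results, peak = asyncio.run(fetch_all(urls))
    print(len(results), "fetched; peak concurrency", peak)
```

Everything runs in one thread, so there is nothing to join and no shared state to lock; the semaphore is the only thing limiting how many connections are open at once.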
Re: how to start thread by group?
On 7 Oct, 06:37, Gabriel Genellina [EMAIL PROTECTED] wrote:

> On Mon, 06 Oct 2008 11:24:51 -0300, [EMAIL PROTECTED] wrote:
>
> > On 6 Oct, 15:24, oyster [EMAIL PROTECTED] wrote:
> >
> > > My code is not right; can somebody give me a hand? Thanks.
> > > For example, I have 1000 URLs to download, but only 5 threads at
> > > one time.
> >
> > I would restructure my code with something like this ( WARNING: the
> > following code is ABSOLUTELY UNTESTED and shall be considered only as
> > pseudo-code to express my idea of the algorithm (which, also, could be
> > wrong :-) ):
>
> Your code creates one thread per url (but never more than MAX_THREADS
> alive at the same time). Usually it's more efficient to create all the
> MAX_THREADS at once, and continuously feed them with tasks to be done. A
> Queue object is the way to synchronize them; from the documentation:
>
>     from Queue import Queue
>     from threading import Thread
>
>     num_worker_threads = 3
>     list_of_urls = ["http://foo.com", "http://bar.com",
>                     "http://baz.com", "http://spam.com",
>                     "http://egg.com",
>                    ]
>
>     def do_work(url):
>         from time import sleep
>         from random import randrange
>         from threading import currentThread
>         print "%s downloading %s" % (currentThread().getName(), url)
>         sleep(randrange(5))
>         print "%s done" % currentThread().getName()
>
>     # from this point on, copied almost verbatim from the Queue example
>     # at the end of http://docs.python.org/library/queue.html
>
>     def worker():
>         while True:
>             item = q.get()
>             do_work(item)
>             q.task_done()
>
>     q = Queue()
>     for i in range(num_worker_threads):
>         t = Thread(target=worker)
>         t.setDaemon(True)
>         t.start()
>
>     for item in list_of_urls:
>         q.put(item)
>
>     q.join()       # block until all tasks are done
>     print "Finished"
>
> --
> Gabriel Genellina

Agreed. I was trying to do what the OP was trying to do, but in a way that works. But keeping the threads alive and feeding them the URLs is a better design, definitely. And no, I don't think it's 'premature optimization': it is just cleaner.

Ciao
--
FB
Re: how to start thread by group?
In message [EMAIL PROTECTED], Gabriel Genellina wrote:

> Usually it's more efficient to create all the MAX_THREADS at once, and
> continuously feed them with tasks to be done.

Given that the bottleneck is most likely to be the internet connection, I'd say the "premature optimization is the root of all evil" adage applies here.
Re: how to start thread by group?
Lawrence D'Oliveiro wrote:

> In message [EMAIL PROTECTED], Gabriel Genellina wrote:
>
> > Usually it's more efficient to create all the MAX_THREADS at once, and
> > continuously feed them with tasks to be done.
>
> Given that the bottleneck is most likely to be the internet connection,
> I'd say the "premature optimization is the root of all evil" adage
> applies here.

There is also the bottleneck of programmer time to understand, write, and maintain. In this case, 'more efficient' is simpler, and to me, more efficient of programmer time.

Feeding a fixed pool of worker threads with a Queue() is a standard design that is easy to understand and one the OP should learn. Re-using tested code is certainly efficient of programmer time. Managing a variable pool of workers that die and need to be replaced is more complex (two loops nested within a loop) and error prone (though learning that alternative is probably not a bad idea also).

tjr
Re: how to start thread by group?
On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy [EMAIL PROTECTED] wrote:

> Lawrence D'Oliveiro wrote:
>
> > In message [EMAIL PROTECTED], Gabriel Genellina wrote:
> >
> > > Usually it's more efficient to create all the MAX_THREADS at once,
> > > and continuously feed them with tasks to be done.
> >
> > Given that the bottleneck is most likely to be the internet connection,
> > I'd say the "premature optimization is the root of all evil" adage
> > applies here.
>
> There is also the bottleneck of programmer time to understand, write, and
> maintain. In this case, 'more efficient' is simpler, and to me, more
> efficient of programmer time.
>
> Feeding a fixed pool of worker threads with a Queue() is a standard
> design that is easy to understand and one the OP should learn. Re-using
> tested code is certainly efficient of programmer time. Managing a
> variable pool of workers that die and need to be replaced is more complex
> (two loops nested within a loop) and error prone (though learning that
> alternative is probably not a bad idea also).

I'd like to add that debugging a program that continuously creates and destroys threads is a real PITA.

--
Gabriel Genellina
how to start thread by group?
My code is not right; can somebody give me a hand? Thanks.

For example, I have 1000 URLs to download, but only 5 threads at one time:

    def threadTask(url):
        download(url)

    threadsAll=[]
    for url in all_url:
        task=threading.Thread(target=threadTask, args=[url])
        threadsAll.append(task)

    for every5task in groupcount(threadsAll,5):
        for everytask in every5task:
            everytask.start()

        for everytask in every5task:
            everytask.join()

        for everytask in every5task:  #this does not run ok
            while everytask.isAlive():
                pass
Re: how to start thread by group?
On 6 Oct, 15:24, oyster [EMAIL PROTECTED] wrote:

> My code is not right; can somebody give me a hand? Thanks.
>
> For example, I have 1000 URLs to download, but only 5 threads at one
> time:
>
>     def threadTask(url):
>         download(url)
>
>     threadsAll=[]
>     for url in all_url:
>         task=threading.Thread(target=threadTask, args=[url])
>         threadsAll.append(task)
>
>     for every5task in groupcount(threadsAll,5):
>         for everytask in every5task:
>             everytask.start()
>
>         for everytask in every5task:
>             everytask.join()
>
>         for everytask in every5task:  #this does not run ok
>             while everytask.isAlive():
>                 pass

Thread.join() blocks until the thread has finished. You are assuming that the threads terminate in exactly the order in which they were started. Moreover, before starting the next 5 threads you wait until all of the previous 5 have completed, while I believe your intention was to always have the full load of 5 threads downloading.

I would restructure my code with something like this ( WARNING: the following code is ABSOLUTELY UNTESTED and shall be considered only as pseudo-code to express my idea of the algorithm (which, also, could be wrong :-) ):

    import threading, time

    MAX_THREADS = 5
    DELAY = 0.01 # or whatever

    def task_function( url ):
        download( url )

    def start_thread( url ):
        task = threading.Thread(target=task_function, args=[url])
        task.start()  # the thread has to be started before it is returned
        return task

    def main():
        all_urls = load_urls()
        all_threads = []
        # keep looping until every url is taken and every thread is done
        while all_urls or all_threads:
            # top up to MAX_THREADS running threads
            while all_urls and len(all_threads) < MAX_THREADS:
                url = all_urls.pop(0)
                t = start_thread( url )
                all_threads.append(t)
            # reap finished threads; iterate over a copy, since we remove items
            for t in all_threads[:]:
                if not t.isAlive():
                    t.join()
                    all_threads.remove(t)
            time.sleep( DELAY )

HTH

Ciao
-
FB
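The "one thread per URL, but never more than 5 alive" idea from this message can also be written without a polling loop. What follows is only a sketch in Python 3, not the poster's code: it swaps the sleep-and-check loop for a threading.BoundedSemaphore, and fake_download, run_all and the counts dictionary are hypothetical names introduced for illustration (fake_download just sleeps).

```python
import threading
import time

MAX_THREADS = 5


def fake_download(url):
    # Hypothetical stand-in for the thread's download(url); it only sleeps.
    time.sleep(0.01)
    return "contents of " + url


def run_all(urls, max_threads=MAX_THREADS):
    slots = threading.BoundedSemaphore(max_threads)
    lock = threading.Lock()
    results = {}
    counts = {"active": 0, "peak": 0}  # bookkeeping, only to show the cap holds

    def task(url):
        with lock:
            counts["active"] += 1
            counts["peak"] = max(counts["peak"], counts["active"])
        try:
            data = fake_download(url)
            with lock:
                results[url] = data
        finally:
            with lock:
                counts["active"] -= 1
            slots.release()  # free the slot so the main loop can start another

    threads = []
    for url in urls:
        slots.acquire()  # blocks while max_threads downloads are in flight
        t = threading.Thread(target=task, args=(url,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results, counts["peak"]


if __name__ == "__main__":
    urls = ["http://example.com/%d" % i for i in range(20)]
    results, peak = run_all(urls)
    print(len(results), "downloads finished; peak concurrency", peak)
```

A new thread starts as soon as any running one finishes, whatever order they finish in, so the load stays at up to 5 downloads instead of proceeding in batches of 5.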
Re: how to start thread by group?
On Mon, 06 Oct 2008 11:24:51 -0300, [EMAIL PROTECTED] wrote:

> On 6 Oct, 15:24, oyster [EMAIL PROTECTED] wrote:
>
> > My code is not right; can somebody give me a hand? Thanks.
> > For example, I have 1000 URLs to download, but only 5 threads at one
> > time.
>
> I would restructure my code with something like this ( WARNING: the
> following code is ABSOLUTELY UNTESTED and shall be considered only as
> pseudo-code to express my idea of the algorithm (which, also, could be
> wrong :-) ):

Your code creates one thread per url (but never more than MAX_THREADS alive at the same time). Usually it's more efficient to create all the MAX_THREADS at once, and continuously feed them with tasks to be done. A Queue object is the way to synchronize them; from the documentation:

    from Queue import Queue
    from threading import Thread

    num_worker_threads = 3
    list_of_urls = ["http://foo.com", "http://bar.com",
                    "http://baz.com", "http://spam.com",
                    "http://egg.com",
                   ]

    def do_work(url):
        from time import sleep
        from random import randrange
        from threading import currentThread
        print "%s downloading %s" % (currentThread().getName(), url)
        sleep(randrange(5))
        print "%s done" % currentThread().getName()

    # from this point on, copied almost verbatim from the Queue example
    # at the end of http://docs.python.org/library/queue.html

    def worker():
        while True:
            item = q.get()
            do_work(item)
            q.task_done()

    q = Queue()
    for i in range(num_worker_threads):
        t = Thread(target=worker)
        t.setDaemon(True)
        t.start()

    for item in list_of_urls:
        q.put(item)

    q.join()       # block until all tasks are done
    print "Finished"

--
Gabriel Genellina
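For anyone reading this thread on Python 3, where the module is named queue and print is a function, the same fixed-pool design might look roughly like the sketch below. It is an adaptation, not the code from the documentation: fake_download, download_all and the None shutdown sentinel are assumptions added for illustration, and fake_download only sleeps.

```python
import queue
import threading
import time

NUM_WORKER_THREADS = 3


def fake_download(url):
    # Hypothetical stand-in for a real download; it only sleeps.
    time.sleep(0.01)
    return len(url)


def download_all(urls, num_workers=NUM_WORKER_THREADS):
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = q.get()
            try:
                if url is None:  # sentinel: no more work for this worker
                    return
                size = fake_download(url)
                with lock:
                    results[url] = (threading.current_thread().name, size)
            finally:
                q.task_done()  # runs for real items and for the sentinel

    workers = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in workers:
        t.start()
    for url in urls:
        q.put(url)
    q.join()             # block until every queued url has been processed
    for _ in workers:
        q.put(None)      # one sentinel per worker so each one exits
    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    urls = ["http://foo.com", "http://bar.com", "http://baz.com",
            "http://spam.com", "http://egg.com"]
    for url, (name, size) in sorted(download_all(urls).items()):
        print(name, "handled", url, size)
```

The workers are created once and reused for every URL. The sentinel is one possible way to let them exit cleanly instead of leaving daemon threads blocked on q.get().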