Re: how to start thread by group?

2008-10-13 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Gabriel Genellina wrote:

> On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy [EMAIL PROTECTED]
> wrote:
>
>> Lawrence D'Oliveiro wrote:
>>
>>> In message [EMAIL PROTECTED],
>>> Gabriel Genellina wrote:
>>>
>>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>>> continuously feed them with tasks to be done.
>>>
>>> Given that the bottleneck is most likely to be the internet
>>> connection, I'd say the "premature optimization is the root of all
>>> evil" adage applies here.
>>
>> Feeding a fixed pool of worker threads with a Queue() is a standard
>> design that is easy to understand and one the OP should learn.  Re-using
>> tested code is certainly efficient of programmer time.
>
> I'd like to add that debugging a program that continuously creates and
> destroys threads is a real PITA.

That's God trying to tell you to avoid threads altogether.
--
http://mail.python.org/mailman/listinfo/python-list


Re: how to start thread by group?

2008-10-13 Thread [EMAIL PROTECTED]
On Oct 13, 6:54 am, Lawrence D'Oliveiro [EMAIL PROTECTED]
central.gen.new_zealand wrote:
> In message [EMAIL PROTECTED], Gabriel Genellina wrote:
>
>> On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy [EMAIL PROTECTED]
>> wrote:
>>
>>> Lawrence D'Oliveiro wrote:
>>>
>>>> In message [EMAIL PROTECTED],
>>>> Gabriel Genellina wrote:
>>>>
>>>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>>>> continuously feed them with tasks to be done.
>>>>
>>>> Given that the bottleneck is most likely to be the internet
>>>> connection, I'd say the "premature optimization is the root of all
>>>> evil" adage applies here.
>>>
>>> Feeding a fixed pool of worker threads with a Queue() is a standard
>>> design that is easy to understand and one the OP should learn.  Re-using
>>> tested code is certainly efficient of programmer time.
>>
>> I'd like to add that debugging a program that continuously creates and
>> destroys threads is a real PITA.
>
> That's God trying to tell you to avoid threads altogether.

Especially in a case like this that's tailor made for a trivial state-
machine solution if you really want multiple connections.
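
One thread-free way to hold several connections open at once is an event loop. The following is only a minimal sketch, assuming Python 3.7 or later and using asyncio with a semaphore rather than a hand-written state machine. The names fetch, fetch_all, run and MAX_CONNECTIONS are made up for illustration, and the download itself is a placeholder that performs no network I/O.

```python
import asyncio

MAX_CONNECTIONS = 5  # assumed limit, matching the "5 at one time" in the question


async def fetch(url, limiter, results):
    # Placeholder for a real non-blocking download: asyncio.sleep(0) only
    # yields control to the event loop, it performs no network I/O.
    async with limiter:
        await asyncio.sleep(0)
        results.append(url)


async def fetch_all(urls):
    # The semaphore lets at most MAX_CONNECTIONS fetches run at a time.
    limiter = asyncio.Semaphore(MAX_CONNECTIONS)
    results = []
    await asyncio.gather(*(fetch(u, limiter, results) for u in urls))
    return results


def run(urls):
    # Start an event loop, run all fetches in a single thread, return the urls seen.
    return asyncio.run(fetch_all(urls))
```

Everything runs in one thread, so there are no threads to create, destroy or debug; the semaphore plays the role that MAX_THREADS plays in the threaded versions.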


Re: how to start thread by group?

2008-10-08 Thread bieffe62
On 7 Oct, 06:37, Gabriel Genellina [EMAIL PROTECTED] wrote:
> On Mon, 06 Oct 2008 11:24:51 -0300, [EMAIL PROTECTED] wrote:
>
>> On 6 Oct, 15:24, oyster [EMAIL PROTECTED] wrote:
>>> my code is not right, can sb give me a hand? thanx
>>>
>>> for example, I have 1000 urls to be downloaded, but only 5 threads at
>>> one time
>>
>> I would restructure the code with something like this (WARNING: the
>> following code is ABSOLUTELY UNTESTED and should be considered only as
>> pseudo-code to express my idea of the algorithm, which could also be
>> wrong :-) ):
>
> Your code creates one thread per url (but never more than MAX_THREADS
> alive at the same time). Usually it's more efficient to create all the
> MAX_THREADS at once, and continuously feed them with tasks to be done. A
> Queue object is the way to synchronize them; from the documentation:

> <code>
> from Queue import Queue
> from threading import Thread
>
> num_worker_threads = 3
> list_of_urls = ["http://foo.com", "http://bar.com",
>                 "http://baz.com", "http://spam.com",
>                 "http://egg.com",
>                ]
>
> def do_work(url):
>     from time import sleep
>     from random import randrange
>     from threading import currentThread
>     print "%s downloading %s" % (currentThread().getName(), url)
>     sleep(randrange(5))
>     print "%s done" % currentThread().getName()
>
> # from this point on, copied almost verbatim from the Queue example
> # at the end of http://docs.python.org/library/queue.html
>
> def worker():
>     while True:
>         item = q.get()
>         do_work(item)
>         q.task_done()
>
> q = Queue()
> for i in range(num_worker_threads):
>     t = Thread(target=worker)
>     t.setDaemon(True)
>     t.start()
>
> for item in list_of_urls:
>     q.put(item)
>
> q.join()       # block until all tasks are done
> print "Finished"
> </code>
>
> --
> Gabriel Genellina


Agreed.
I was trying to do what the OP was trying to do, but in a way that
works.
But keeping the threads alive and feeding them the URLs is a better
design, definitely.
And no, I don't think it's 'premature optimization': it is just
cleaner.

Ciao
--
FB


Re: how to start thread by group?

2008-10-07 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Gabriel Genellina wrote:

> Usually it's more efficient to create all the MAX_THREADS at once, and
> continuously feed them with tasks to be done.

Given that the bottleneck is most likely to be the internet connection, I'd
say the "premature optimization is the root of all evil" adage applies
here.


Re: how to start thread by group?

2008-10-07 Thread Terry Reedy

Lawrence D'Oliveiro wrote:

> In message [EMAIL PROTECTED], Gabriel Genellina wrote:
>
>> Usually it's more efficient to create all the MAX_THREADS at once, and
>> continuously feed them with tasks to be done.
>
> Given that the bottleneck is most likely to be the internet connection, I'd
> say the "premature optimization is the root of all evil" adage applies
> here.


There is also the bottleneck of programmer time to understand, write, 
and maintain.  In this case, 'more efficient' is simpler, and to me, 
more efficient of programmer time.  Feeding a fixed pool of worker 
threads with a Queue() is a standard design that is easy to understand 
and one the OP should learn.  Re-using tested code is certainly 
efficient of programmer time.  Managing a variable pool of workers that 
die and need to be replaced is more complex (two loops nested within a 
loop) and error prone (though learning that alternative is probably not 
a bad idea also).


tjr
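
The fixed-pool design described above can also be had ready-made. This is a minimal sketch, assuming Python 3.2 or later, where the standard library's concurrent.futures module provides ThreadPoolExecutor (it did not exist when this thread was written). The download function here is a hypothetical stand-in that does no network I/O.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

MAX_THREADS = 5


def download(url):
    # Hypothetical stand-in for the real download: returns the url together
    # with the name of the worker thread that handled it.
    return url, threading.current_thread().name


def download_all(urls):
    # The executor owns a pool of at most MAX_THREADS workers plus an
    # internal work queue; map() yields results in the order of `urls`.
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        return list(pool.map(download, urls))
```

Leaving the with block waits for all submitted work to finish, so no explicit join() calls are needed.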



Re: how to start thread by group?

2008-10-07 Thread Gabriel Genellina
On Tue, 07 Oct 2008 13:25:01 -0300, Terry Reedy [EMAIL PROTECTED]
wrote:

> Lawrence D'Oliveiro wrote:
>> In message [EMAIL PROTECTED],
>> Gabriel Genellina wrote:
>>
>>> Usually it's more efficient to create all the MAX_THREADS at once, and
>>> continuously feed them with tasks to be done.
>>
>> Given that the bottleneck is most likely to be the internet
>> connection, I'd say the "premature optimization is the root of all
>> evil" adage applies here.
>
> There is also the bottleneck of programmer time to understand, write,
> and maintain.  In this case, 'more efficient' is simpler, and to me,
> more efficient of programmer time.  Feeding a fixed pool of worker
> threads with a Queue() is a standard design that is easy to understand
> and one the OP should learn.  Re-using tested code is certainly
> efficient of programmer time.  Managing a variable pool of workers that
> die and need to be replaced is more complex (two loops nested within a
> loop) and error prone (though learning that alternative is probably not
> a bad idea also).


I'd like to add that debugging a program that continuously creates and  
destroys threads is a real PITA.


--
Gabriel Genellina



how to start thread by group?

2008-10-06 Thread oyster
my code is not right, can sb give me a hand? thanx

for example, I have 1000 urls to be downloaded, but only 5 threads at one time
def threadTask(url):
    download(url)

threadsAll=[]
for url in all_url:
    task=threading.Thread(target=threadTask, args=[url])
    threadsAll.append(task)

for every5task in groupcount(threadsAll,5):
    for everytask in every5task:
        everytask.start()

    for everytask in every5task:
        everytask.join()

    for everytask in every5task:        #this does not run ok
        while everytask.isAlive():
            pass
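
The post does not show groupcount. A minimal sketch of what such a helper might look like, assuming it is meant to split a list into consecutive groups of at most the given size (this is a guess at its intent, not the poster's actual function):

```python
def groupcount(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 1000 threads and a size of 5 this would yield 200 groups of 5, and the last group is shorter whenever the list length is not a multiple of the size.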


Re: how to start thread by group?

2008-10-06 Thread bieffe62
On 6 Oct, 15:24, oyster [EMAIL PROTECTED] wrote:
> my code is not right, can sb give me a hand? thanx
>
> for example, I have 1000 urls to be downloaded, but only 5 threads at one time
> def threadTask(url):
>     download(url)
>
> threadsAll=[]
> for url in all_url:
>     task=threading.Thread(target=threadTask, args=[url])
>     threadsAll.append(task)
>
> for every5task in groupcount(threadsAll,5):
>     for everytask in every5task:
>         everytask.start()
>
>     for everytask in every5task:
>         everytask.join()
>
>     for everytask in every5task:        #this does not run ok
>         while everytask.isAlive():
>             pass

Thread.join() blocks until the thread has finished. You are assuming
that the threads terminate in exactly the order in which they were
started. Moreover, before starting the next 5 threads you wait until
all of the previous 5 have completed, while I believe your intention
was to always have the full load of 5 threads downloading.
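
The cost of waiting for a whole batch can be worked out with plain arithmetic, no threads needed. The sketch below uses hypothetical helper names (batch_time, pool_time) and made-up task durations that are purely illustrative, not measurements. It compares the total time of "start n, wait for all n" against "keep n busy at all times".

```python
import heapq


def batch_time(durations, n):
    # Each batch of n starts only after the slowest task of the previous batch.
    return sum(max(durations[i:i + n]) for i in range(0, len(durations), n))


def pool_time(durations, n):
    # n workers; each task goes to whichever worker becomes free first.
    free_at = [0] * n
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)
```

For example, with illustrative durations [5, 1, 1, 1, 1, 1] and n = 2, batch_time gives 5 + 1 + 1 = 7, while pool_time gives 5: one worker handles the slow task while the other works through the five short ones.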

I would restructure the code with something like this (WARNING: the
following code is ABSOLUTELY UNTESTED and should be considered only as
pseudo-code to express my idea of the algorithm, which could also be
wrong :-) ):


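import threading, time

MAX_THREADS = 5
DELAY = 0.01 # or whatever

def task_function( url ):
    download( url )          # download() is assumed to be defined elsewhere

def start_thread( url ):
    task = threading.Thread(target=task_function, args=[url])
    task.start()             # the thread must actually be started
    return task

def main():
    all_urls = load_urls()   # load_urls() is assumed to be defined elsewhere
    all_threads = []
    # keep going until every url is handed out and every thread is done
    while all_urls or all_threads:
        # top up the pool while there are urls left
        while all_urls and len(all_threads) < MAX_THREADS:
            url = all_urls.pop(0)
            t = start_thread( url )
            all_threads.append(t)
        # iterate over a copy, since finished threads are removed
        for t in all_threads[:]:
            if not t.is_alive():     # spelled isAlive() before Python 2.6
                t.join()
                all_threads.remove(t)
        time.sleep( DELAY )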


HTH

Ciao
-
FB


Re: how to start thread by group?

2008-10-06 Thread Gabriel Genellina

On Mon, 06 Oct 2008 11:24:51 -0300, [EMAIL PROTECTED] wrote:

> On 6 Oct, 15:24, oyster [EMAIL PROTECTED] wrote:
>
>> my code is not right, can sb give me a hand? thanx
>>
>> for example, I have 1000 urls to be downloaded, but only 5 threads at
>> one time
>
> I would restructure the code with something like this (WARNING: the
> following code is ABSOLUTELY UNTESTED and should be considered only as
> pseudo-code to express my idea of the algorithm, which could also be
> wrong :-) ):


Your code creates one thread per url (but never more than MAX_THREADS  
alive at the same time). Usually it's more efficient to create all the  
MAX_THREADS at once, and continuously feed them with tasks to be done. A  
Queue object is the way to synchronize them; from the documentation:


<code>
from Queue import Queue
from threading import Thread

num_worker_threads = 3
list_of_urls = ["http://foo.com", "http://bar.com",
                "http://baz.com", "http://spam.com",
                "http://egg.com",
               ]

def do_work(url):
    from time import sleep
    from random import randrange
    from threading import currentThread
    print "%s downloading %s" % (currentThread().getName(), url)
    sleep(randrange(5))
    print "%s done" % currentThread().getName()

# from this point on, copied almost verbatim from the Queue example
# at the end of http://docs.python.org/library/queue.html

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.setDaemon(True)
    t.start()

for item in list_of_urls:
    q.put(item)

q.join()   # block until all tasks are done
print "Finished"
</code>


--
Gabriel Genellina
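
The code above is written for Python 2. A minimal sketch of the same worker-pool pattern for Python 3, where the module is named queue and print is a function, might look like the following. The run_pool wrapper, the results list and the error handling are additions for illustration and are not part of the original post.

```python
import queue
import threading

NUM_WORKER_THREADS = 3


def run_pool(urls, do_work):
    """Process every url with do_work using a fixed pool of worker threads."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            url = q.get()
            try:
                outcome = do_work(url)
            except Exception as exc:  # keep the worker alive on errors
                outcome = exc
            with lock:
                results.append((url, outcome))
            q.task_done()

    # Daemon threads do not keep the interpreter alive once the work is done.
    for _ in range(NUM_WORKER_THREADS):
        threading.Thread(target=worker, daemon=True).start()

    for url in urls:
        q.put(url)

    q.join()  # block until every queued url has been processed
    return results
```

Catching exceptions inside the worker matters here: if a worker died on an error before calling task_done(), q.join() would never return.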
