Not sure if this one got out: one more for this morning. I thought Michael had suggested a while back that we could thread the retriever so that more than one page can be retrieved at a time. Has any work been done on this? Since I'm fetching the pages at hotsync time, this would be a great speed-up.

I think it would only require modifying spider::parse. Instead of cycling through until the queue is empty, we'd cycle through until all threads are finished and the queue is empty. There would be another thread running that checks the queue for links to fetch and spawns a new thread to fetch each link. When each fetch thread finishes, it would dump the content on the main queue for parsing. Any ideas? Can Python even spawn threads?
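(To answer the last question: yes, Python can spawn threads via the standard `threading` module, and the `queue` module gives you a thread-safe queue for free. Here's a rough sketch of the scheme described above, assuming nothing about the actual spider code — `fetch_page` is a hypothetical stand-in for whatever the retriever does, and the function and queue names are made up for illustration.)

```python
import threading
import queue

def fetch_page(url):
    # Hypothetical placeholder: the real spider would do an HTTP GET here.
    return "<html>contents of %s</html>" % url

def worker(links, results):
    # Pull links off the queue, fetch each one, and dump the content
    # on the results queue for the main thread to parse.
    while True:
        url = links.get()
        if url is None:          # sentinel: no more links coming
            links.task_done()
            break
        results.put((url, fetch_page(url)))
        links.task_done()

def fetch_all(urls, num_threads=4):
    links, results = queue.Queue(), queue.Queue()
    for u in urls:
        links.put(u)
    threads = [threading.Thread(target=worker, args=(links, results))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for _ in threads:            # one sentinel per worker thread
        links.put(None)
    for t in threads:
        t.join()
    # All fetches are done; main thread can now parse everything.
    pages = {}
    while not results.empty():
        url, content = results.get()
        pages[url] = content
    return pages

pages = fetch_all(["http://a.example/", "http://b.example/"], num_threads=2)
```

A fixed pool of workers (rather than one thread per link) avoids spawning hundreds of threads on a big queue, but either way the fetches overlap instead of running one at a time.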
