I'm trying to optimize a database-driven web crawler, and I was wondering if
anyone could offer any recommendations for interprocess communication.

Currently, the driver process periodically queries a database to get a
list of URLs to crawl. It then stores these URLs in a complex in-memory
data structure and pipes them to separate processes that do the actual
downloading. The problem is that the database queries are slow, and while
they run they block the driver process.

I'd like to re-architect the system so that the database queries run in
the background. However, I'm unsure what mechanism to use. In theory,
threads would be ideally suited to this case, but I'm not sure how stable
they are (I'm running Perl 5.14.1 with 200+ CPAN modules).
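One thread-free option, sketched here under assumptions rather than as a finished design, is to fork a short-lived background process that runs the slow query and writes one URL per line to a pipe, while the driver polls that pipe with a timeout (core IO::Select plus sysread) instead of blocking. `fetch_urls_from_db` is a hypothetical stand-in for the real DBI query: it just sleeps and returns made-up URLs.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Hypothetical stand-in for the slow database query (a DBI call in the real
# crawler). Sleeps to simulate latency and returns made-up URLs.
sub fetch_urls_from_db {
    sleep 1;
    return map { "http://example.com/page$_" } 1 .. 3;
}

# Fork a child that runs the query and writes one URL per line to a pipe.
# Returns the child's PID and the read end of the pipe to the driver.
sub start_background_query {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {    # child: run the query, send results, exit
        close $reader;
        $writer->autoflush(1);
        print {$writer} "$_\n" for fetch_urls_from_db();
        close $writer;
        exit 0;
    }
    close $writer;      # parent keeps only the read end
    return ($pid, $reader);
}

my ($pid, $reader) = start_background_query();
my $select  = IO::Select->new($reader);
my $buf     = '';
my @pending;            # URLs received so far, ready to hand to downloaders
my $polls   = 0;

while (1) {
    # Wait at most 0.1 s for data, so the driver never blocks on the query.
    if ($select->can_read(0.1)) {
        my $n = sysread($reader, $buf, 4096, length $buf);
        die "sysread: $!" unless defined $n;
        # Move every complete line out of the buffer onto the queue.
        while ($buf =~ s/\A([^\n]*)\n//) { push @pending, $1 }
        last if $n == 0;    # EOF: the child has finished the query
    }
    $polls++;   # placeholder for the driver's other work (feeding downloaders)
}
close $reader;
waitpid($pid, 0);

print "driver polled $polls times while the query ran\n";
print "$_\n" for @pending;
```

One design note: the child should open its own database connection rather than reuse the driver's, since a DBI handle generally cannot be shared safely across `fork()`. Starting one such child per query cycle also avoids loading thread-safety concerns onto the 200+ CPAN modules in the process.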

Does anyone have any recommendations?



Boston-pm mailing list