[EMAIL PROTECTED] wrote:
> 1) I/O issues: my biggest resource constraint will be the bandwidth
> bottleneck.
> 2) Efficiency issues: The crawlers have to be fast, robust and as
> "memory efficient" as possible. I am running all of my crawlers on
> cheap PCs with about 500 MB RAM and P3 to P4 processors.
> 3) Compatibility issues: Most of these crawlers will run on Unix
> (FreeBSD), so there should exist a pretty good compiler that can
> optimize my code under these environments.

You should rethink your requirements. You expect to be I/O bound, so why do
you require a good "compiler"? Especially when you are asking about two
interpreted languages...

Consider using lxml (with Python); it has pretty much everything you need for
a web crawler, supports threaded parsing directly from HTTP URLs, and it's
plenty fast and pretty memory efficient.

http://codespeak.net/lxml/
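To give a rough idea, here is a minimal sketch of the parsing/link-extraction
side of a crawler with lxml.html (assuming lxml is installed; a real crawler
would fetch pages over HTTP, but this parses a static snippet so it is
self-contained):

```python
# Sketch: extract absolute links from a page with lxml.html.
# The HTML snippet and base URL here are made up for illustration.
from lxml import html

page = """
<html><body>
  <a href="/about">About</a>
  <a href="http://example.com/news">News</a>
</body></html>
"""

doc = html.fromstring(page)
# Resolve relative links against the page's base URL
# before queueing them for the crawler.
doc.make_links_absolute("http://example.com/")

# iterlinks() yields (element, attribute, link, pos) tuples.
links = [link for _, _, link, _ in doc.iterlinks()]
print(links)
```

For crawling live pages you can hand `lxml.html.parse()` a URL directly and it
will fetch and parse it in one step.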

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
