[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I am currently planning to write my own web crawler. I know Python but
> not Perl, and I am interested in knowing which of the two is the
> better choice given the following scenario:
>
> 1) I/O issues: my biggest resource constraint will be the bandwidth
> bottleneck.
> 2) Efficiency issues: the crawlers have to be fast, robust, and as
> memory-efficient as possible. I am running all of my crawlers on
> cheap PCs with about 500 MB of RAM and P3 to P4 processors.
> 3) Compatibility issues: most of these crawlers will run on Unix
> (FreeBSD), so there should be a good compiler/interpreter that can
> optimize my code in those environments.
>
> What are your opinions?
Use Python with Twisted.

A friend and I wrote a crawler. Our first attempt was standard Python; our second attempt was with Twisted. Twisted absolutely blew the socks off our first attempt, mainly because you can fetch hundreds or thousands of pages simultaneously, without threads.

Python with Twisted will satisfy requirements 1-3. You'll have to get your head around its asynchronous nature, but once you do you'll be writing a killer crawler ;-)

As for Perl: once upon a time I would have done this with Perl, but I wouldn't go back now!

--
Nick Craig-Wood <[EMAIL PROTECTED]> -- http://www.craig-wood.com/nick
--
http://mail.python.org/mailman/listinfo/python-list
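[Editor's note: the key claim above is that a single-threaded event loop can keep hundreds of fetches in flight at once. The reply used Twisted; since Twisted may not be installed everywhere, the same idea can be sketched with the standard library's asyncio instead. The fetch function below simulates the network wait rather than doing real HTTP, and the URLs, delay, and concurrency cap are illustrative assumptions, not from the original post.]

```python
import asyncio
import time

async def fetch(url):
    # Stand-in for a real HTTP request: an I/O wait that the event
    # loop can interleave with hundreds of others in one thread.
    await asyncio.sleep(0.05)
    return url, 1024  # pretend we got 1 KB of page data back

async def crawl(urls, max_concurrent=100):
    # Cap how many "requests" are in flight at once, the way a
    # polite crawler would.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    # Schedule every fetch and wait for all of them to finish.
    return await asyncio.gather(*(bounded(u) for u in urls))

if __name__ == "__main__":
    urls = ["http://example.com/page%d" % i for i in range(200)]
    start = time.monotonic()
    results = asyncio.run(crawl(urls))
    elapsed = time.monotonic() - start
    # 200 sequential 50 ms waits would take ~10 s; overlapped in a
    # single thread they finish in a small fraction of that.
    print("%d pages in %.2f s" % (len(results), elapsed))
```

The Twisted equivalent uses Deferreds and the reactor instead of coroutines, but the structure is the same: one thread, many pending I/O operations, no per-request thread cost.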