Rod Taylor wrote:
Add a few more extensions which I commonly see and cannot be parsed (that I am aware of). ZIP, mso, jar, bz2, XLS, pps, PPS, dot, etc.
[ ... ]
# skip image and other suffixes we can't yet parse --\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$ +-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|ps|wmf|zip|ZIP|ppt|mpg|xls|XLS||bz2|gz|mso|jar|rpm|tgz|Z|mov|MOV|exe|dot|pps|PPS)$
Is the '||' intentional, or a typo? Do you mean to prohibit files ending with just '.'?
Doug ------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Do you grep through log files for problems? Stop! Download the new AJAX search engine that makes searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click _______________________________________________ Nutch-developers mailing list Nutch-developers@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nutch-developers