>You may find this interesting as well:
>http://maven.apache.org/using/multiproject.html

I'd rather suggest supporting Apache HttpClient; a huge amount of unnecessary
code could easily be removed from Nutch. We don't need to calculate the
"actual URL" after a redirect ourselves; GetMethod does it all for us.
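
To make concrete what GetMethod would save us: a fetcher that follows
redirects by hand has to resolve each (possibly relative) Location header
against the current URL and guard against redirect loops. Below is a minimal
sketch using only java.net.URI. The class name, the map standing in for real
HTTP responses, and the hop limit are hypothetical; this is neither Nutch's
code nor HttpClient's.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration of the "actual URL" bookkeeping a fetcher does
 * when it follows redirects itself (not Nutch or HttpClient code).
 */
public class RedirectResolver {

    /** Stand-in for the network: URL -> Location header it answers with, if any. */
    private final Map<String, String> locationHeaders;
    private final int maxHops;

    public RedirectResolver(Map<String, String> locationHeaders, int maxHops) {
        this.locationHeaders = locationHeaders;
        this.maxHops = maxHops;
    }

    /** Follows redirects from startUrl and returns the final ("actual") URL. */
    public String resolve(String startUrl) {
        URI current = URI.create(startUrl);
        for (int hop = 0; hop < maxHops; hop++) {
            String location = locationHeaders.get(current.toString());
            if (location == null) {
                return current.toString(); // no redirect: this is the actual URL
            }
            // Location may be relative, so resolve it against the current URL.
            current = current.resolve(location);
        }
        throw new IllegalStateException("Too many redirects from " + startUrl);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("http://example.com/a", "/b");
        headers.put("http://example.com/b", "http://www.example.com/c?x=1");
        RedirectResolver r = new RedirectResolver(headers, 5);
        System.out.println(r.resolve("http://example.com/a"));
    }
}
```

With Commons HttpClient 3.x this bookkeeping roughly reduces to enabling
redirect following on the GetMethod and reading getURI() after
executeMethod().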

Using HTTP HEAD can improve performance, and there is much more besides.
Google uses the HEAD method; I noticed it in our logs.
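
A HEAD request returns the status line and headers (e.g. Content-Type,
Last-Modified) without a body, so a fetcher could decide whether a full GET
is worthwhile. A minimal sketch of the mechanism only: it uses the JDK's
HttpURLConnection rather than Apache HttpClient, plus a throwaway local
com.sun.net.httpserver server so it runs without network access. The class
name and the headers it reads are hypothetical choices.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

/** Sketch: fetch only the status line and headers of a URL with HTTP HEAD. */
public class HeadProbe {

    /** The few response fields this sketch reads (a hypothetical selection). */
    public static final class Result {
        public final int status;
        public final String contentType;
        public final String lastModified;

        Result(int status, String contentType, String lastModified) {
            this.status = status;
            this.contentType = contentType;
            this.lastModified = lastModified;
        }
    }

    /** Sends a HEAD request: the server returns headers but no body. */
    public static Result head(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(url).toURL().openConnection(Proxy.NO_PROXY);
        try {
            conn.setRequestMethod("HEAD");
            return new Result(conn.getResponseCode(),
                    conn.getContentType(),
                    conn.getHeaderField("Last-Modified"));
        } finally {
            conn.disconnect();
        }
    }

    /** Runs head() against a throwaway local server, so no network access is needed. */
    public static Result demo() {
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().set("Content-Type", "text/html");
                exchange.getResponseHeaders().set("Last-Modified", "Sat, 01 Jan 2005 00:00:00 GMT");
                exchange.sendResponseHeaders(200, -1); // -1 = no response body
                exchange.close();
            });
            server.start();
            int port = server.getAddress().getPort();
            return head("http://127.0.0.1:" + port + "/page.html");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        Result r = demo();
        System.out.println(r.status + " " + r.contentType + " " + r.lastModified);
    }
}
```

In practice some servers reject or mishandle HEAD, so a fetcher would likely
still need to fall back to GET.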

What about the NekoHTML parser? The getTextHelper method seems very strange,
since Java 5 does it all (DOM Level 3). A new parser plugin could be based on
http://htmlparser.sourceforge.net, and again we could remove the buggy
getOutlinks().
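
DOM Level 3 (included in Java 5) adds Node.getTextContent(), which returns the
text of a node and all its descendants, and outlinks can be collected
straight from the DOM. A minimal sketch, assuming a DOM is already available:
here it comes from the JDK's XML parser on a well-formed XHTML snippet so the
example is self-contained, whereas in Nutch it would come from NekoHTML or
another HTML parser. The class and method names are hypothetical; this is
neither Nutch's getTextHelper/getOutlinks nor the htmlparser API.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: page text and outlinks using plain DOM Level 3 calls. */
public class DomTextAndLinks {

    /** Parses well-formed XHTML with the JDK's XML parser (stand-in for NekoHTML). */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }
    }

    /** DOM Level 3: concatenated text of the element and all its descendants. */
    public static String text(Element e) {
        return e.getTextContent();
    }

    /** Collects href values of <a> elements, resolved against the page URL. */
    public static List<String> outlinks(Document doc, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            String href = a.getAttribute("href").trim(); // "" when absent
            if (href.isEmpty()) {
                continue;
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException badHref) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Demo</title></head><body>"
                + "<p>Hello <b>world</b></p>"
                + "<a href=\"/about\">About</a>"
                + "<a href=\"http://example.org/x\">X</a>"
                + "</body></html>";
        Document doc = parse(page);
        Element p = (Element) doc.getElementsByTagName("p").item(0);
        System.out.println(text(p));
        System.out.println(outlinks(doc, "http://example.com/index.html"));
    }
}
```

The sketch ignores <base href> and does not insert whitespace between block
elements (getTextContent() simply concatenates text nodes), both of which a
real parser plugin would have to handle.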

I have experience with Maven and CruiseControl. All of Maven's stuff
(checkstyle, javadoc, xdoc, developer activity reports, etc.) could be run
via Ant. Not a first priority...



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
