Hi, All,

I have since modified Nutch so that it can fetch FTP sites.
This lets me build an intranet search engine for files
on both HTTP and FTP servers. The fetcher has been running stably for weeks
over a few million URLs.

In my modification, an FTP response mimics an HTTP one, so that code changes
are kept to a minimum (for the purpose of fetching). Specifically:
(1) Response.java is made an interface instead of a class.
(2) HostQueueKey is tweaked to include the URL scheme (protocol) and port.
(3) Of course, there are light changes to HttpResponse.java, Http.java and others.
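
To make points (1) and (2) concrete, here is a rough sketch. It is not
the actual patch: only the names Response and HostQueueKey come from the
description above. FtpResponse, every method signature, the default-port
table and the FTP-reply-to-HTTP-status mapping are assumptions made up
for illustration.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Hypothetical sketch: method names and signatures are illustrative,
// not copied from the Nutch source.

/** Protocol-neutral view of a fetch result, so callers need not care about the scheme. */
interface Response {
    int getCode();                  // HTTP-style status code
    String getHeader(String name);  // null when the header is absent
    byte[] getContent();
}

/** An FTP result presented through the same interface as an HTTP response. */
final class FtpResponse implements Response {
    private final int code;
    private final byte[] content;

    private FtpResponse(int code, byte[] content) {
        this.code = code;
        this.content = content.clone();
    }

    /** Assumed mapping: FTP 2xx -> 200, FTP 550 (file unavailable) -> 404, anything else -> 500. */
    static FtpResponse fromFtpReply(int ftpReply, byte[] data) {
        int httpLike;
        if (ftpReply >= 200 && ftpReply < 300) {
            httpLike = 200;
        } else if (ftpReply == 550) {
            httpLike = 404;
        } else {
            httpLike = 500;
        }
        return new FtpResponse(httpLike, data);
    }

    public int getCode() { return code; }

    public String getHeader(String name) {
        // Only a synthetic Content-Length is provided in this sketch.
        return "Content-Length".equalsIgnoreCase(name) ? String.valueOf(content.length) : null;
    }

    public byte[] getContent() { return content.clone(); }
}

/** Queue key that distinguishes servers by scheme and port as well as host name. */
final class HostQueueKey {
    private final String scheme;
    private final String host;
    private final int port;

    HostQueueKey(String scheme, String host, int port) {
        this.scheme = scheme.toLowerCase(Locale.ROOT);
        this.host = host.toLowerCase(Locale.ROOT);
        this.port = port;
    }

    /** Builds a key from a URL, filling in the scheme's default port when none is given. */
    static HostQueueKey fromUrl(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        int port = uri.getPort() != -1 ? uri.getPort() : defaultPort(scheme);
        return new HostQueueKey(scheme, uri.getHost(), port);
    }

    static int defaultPort(String scheme) {
        switch (scheme) {
            case "http": return 80;
            case "ftp":  return 21;
            default:     return -1; // unknown scheme in this sketch
        }
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof HostQueueKey)) return false;
        HostQueueKey k = (HostQueueKey) o;
        return port == k.port && scheme.equals(k.scheme) && host.equals(k.host);
    }

    @Override public int hashCode() { return Objects.hash(scheme, host, port); }

    @Override public String toString() { return scheme + "://" + host + ":" + port; }
}
```

With a key like this, http://example.com/ and ftp://example.com/ land in
different queues, while http://example.com/ and http://example.com:80/
share one, so per-server politeness applies to each service separately.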

I would like to submit a patch if the core developers think this approach is
sensible. My code is based on nutch-2003-11-17.

Thanks,

John

On Sun, Dec 21, 2003 at 04:34:00PM -0800, [EMAIL PROTECTED] wrote:
> Hi, All,
>      
> Has anyone made nutch capable of fetching ftp sites besides http ones?
> Nutch uses its own http class when dealing with http fetches.
> Is there an ftp class in the works too?
> 
> Thanks,
> 
> John


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
