On Wed, 2005-04-20 at 09:55 -0700, Doug Cutting wrote:
> Jason Tang wrote:
> > Is anyone working on this issue [hiding file URLs when doing a remote 
> > search]? If not, I will go ahead.
> > I suppose it is not hard to support "indexing locally and searching 
> > remotely".
> 
> A simple way to implement this would be to change the protocol-file 
> plugin to handle http urls (add protocol-name="http" in plugin.xml), 
> then modify FileResponse.java to optionally accept http urls and convert 
> them to pathnames relative to some root directory.  Does that make sense?

Modifying the JSP sounds simpler for any particular installation. For
more general use, though, there's probably a need for
Nutch-visible-URL-to-externally-visible-URL translation at display time
too.  For example, at one time we ran Nutch against an internal web
server with a mirror of a bunch of content that lived at some
externally-accessible URL; we wanted the search results to display the
externally-accessible URLs.
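
A rough sketch of the kind of display-time translation I mean, as a
prefix-rewriting table. This is not existing Nutch code: the class name
DisplayUrlTranslator, its methods, and the example hosts and paths are
all made up for illustration, and where it would hook into the search
JSP is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: rewrites indexed URL prefixes to externally visible ones. */
final class DisplayUrlTranslator {
    // Insertion-ordered, so the first matching prefix wins.
    private final Map<String, String> prefixMap = new LinkedHashMap<>();

    /** Registers a rewrite from an indexed prefix to a display prefix. */
    DisplayUrlTranslator addMapping(String indexedPrefix, String displayPrefix) {
        prefixMap.put(indexedPrefix, displayPrefix);
        return this;
    }

    /** Returns the URL to show in results; unmapped URLs pass through unchanged. */
    String toDisplayUrl(String indexedUrl) {
        for (Map.Entry<String, String> e : prefixMap.entrySet()) {
            String from = e.getKey();
            if (indexedUrl.startsWith(from)) {
                return e.getValue() + indexedUrl.substring(from.length());
            }
        }
        return indexedUrl;
    }

    public static void main(String[] args) {
        // Example prefixes are assumptions, not taken from any real installation.
        DisplayUrlTranslator t = new DisplayUrlTranslator()
            .addMapping("http://mirror.internal.example/", "http://www.example.com/")
            .addMapping("file:///srv/docs/", "http://docs.example.com/");
        System.out.println(t.toDisplayUrl("http://mirror.internal.example/a/b.html"));
        System.out.println(t.toDisplayUrl("file:///srv/docs/manual/index.html"));
    }
}
```

The same table would cover the file-URL-hiding case and the internal
mirror case, since both come down to swapping one URL prefix for another
just before the hit is rendered.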

Last time I was doing filesystem indexing (with Nutch 0.5), I ran into a
bunch of minor problems:
- copying the entire filesystem into my segment directories was
undesirable, but mandatory.
- limits on file size and number of outgoing links per "page" weren't
helpful.
- if a directory name ended up in Nutch without a trailing slash
(file:///home/kragen rather than file:///home/kragen/), the relative
links from it were wrong.
- directories had links to "..", so three passes of crawling
from /home/kragen/a/b/c would index everything three levels down from
there, but also /home/kragen, /home/kragen/a/*,
and /home/kragen/a/b/*/*, which wasn't what I wanted.
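
The last two items can be illustrated with plain java.net.URI, outside
of Nutch. The sketch below is only an illustration under assumptions:
FileUrlScope and its methods are hypothetical names, not anything in the
Nutch source, and it assumes the caller already knows the URL names a
directory. It adds the trailing slash before resolving relative links,
and rejects links (such as "..") that land outside the crawl root.

```java
import java.net.URI;

/** Hypothetical helper for directory URLs and crawl scope; not Nutch code. */
final class FileUrlScope {
    private FileUrlScope() {}

    /** Ensures a directory URL ends with "/" so relative links resolve inside it. */
    static URI asDirectory(URI dir) {
        String s = dir.toString();
        return s.endsWith("/") ? dir : URI.create(s + "/");
    }

    /** True if candidate is the root directory itself or lies below it. */
    static boolean isWithin(URI root, URI candidate) {
        String rootPath = asDirectory(root).normalize().getPath();
        String path = candidate.normalize().getPath();
        if (rootPath == null || path == null) {
            return false; // opaque URIs (e.g. mailto:) have no path
        }
        return path.startsWith(rootPath) || (path + "/").equals(rootPath);
    }

    public static void main(String[] args) {
        URI noSlash = URI.create("file:///home/kragen");
        // Without the trailing slash, "kragen" is treated as a file name
        // and dropped when the relative link is resolved.
        System.out.println(noSlash.resolve("notes.txt").getPath());
        System.out.println(asDirectory(noSlash).resolve("notes.txt").getPath());

        URI root = URI.create("file:///home/kragen/a/b/c/");
        // The ".." link points above the crawl root, so it is out of scope.
        System.out.println(isWithin(root, root.resolve("..")));
        System.out.println(isWithin(root, root.resolve("d/e.txt")));
    }
}
```

A check like isWithin would have to run on every outgoing link before it
is queued for the next pass; whether that belongs in a URL filter or in
the file protocol plugin itself is a separate question.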

Also, Nutch was noticeably slower than Lucene, for whatever reason, and
the gap was more apparent when the data was coming from a
300-megabit-per-second hard disk than from a 1-megabit-per-second
network link.




_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
