I don't use the Nutch web application myself, but you don't have to
start Nutch in the searcher directory.  You can set the location of
the searcher directory in the nutch-site.xml config file.

Add this property and point it at the root of your crawl, i.e. the
directory that holds your index:

<property>
  <name>searcher.dir</name>
  <value>/your/path/to/your/index</value>
  <description>
  Path to root of crawl.  This directory is searched (in
  order) for either the file search-servers.txt, containing a list of
  distributed search servers, or the directory "index" containing
  merged indexes, or the directory "segments" containing segment
  indexes.
  </description>
</property>
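
To make the lookup order in that description concrete, here is a rough sketch in plain Java. It is not Nutch's actual code: the class name SearcherDirProbe, the Source enum and the probe() method are made up for illustration, and it only mirrors the order the description states (search-servers.txt first, then "index", then "segments").

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch: reports which of the three
// entries named in the searcher.dir description would be picked up first.
public class SearcherDirProbe {

    enum Source { SEARCH_SERVERS, MERGED_INDEX, SEGMENTS, NONE }

    static Source probe(Path searcherDir) {
        // 1. A search-servers.txt file listing distributed search servers.
        if (Files.isRegularFile(searcherDir.resolve("search-servers.txt"))) {
            return Source.SEARCH_SERVERS;
        }
        // 2. An "index" directory containing merged indexes.
        if (Files.isDirectory(searcherDir.resolve("index"))) {
            return Source.MERGED_INDEX;
        }
        // 3. A "segments" directory containing segment indexes.
        if (Files.isDirectory(searcherDir.resolve("segments"))) {
            return Source.SEGMENTS;
        }
        return Source.NONE;
    }

    public static void main(String[] args) {
        // Pass the value you put in searcher.dir; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir.toAbsolutePath() + " -> " + probe(dir));
    }
}
```

Running it against the path you set in searcher.dir is a quick way to check that the directory really is the crawl root and not, say, the "index" directory itself.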

On 7/19/07, Robert Young <[EMAIL PROTECTED]> wrote:
> Tomcat only comes into it because we have to start Tomcat in the
> searcher directory, I'm guessing it's the same however you choose to
> use Nutch. It would still have to do a rename across physical volumes
> if searcher.dir is set to something different would it not?
>
> How does this sound as a solution? Allow the user to set a
> configuration option setting the linkdb working dir, or allow the user
> to set a configuration flag to use another particular configuration
> option to set the base dir. Otherwise fall back to the default which
> is the current working directory.
>
> Cheers
> Rob
>
> On 7/19/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> > Robert Young wrote:
> > > In org.apache.nutch.crawl.LinkDb on line 261 it creates a working
> > > directory (newLinkDb) based on the current working directory. This
> > > should be configurable rather than being based on where Tomcat was
> > > started. I am planning on writing a patch to pull the hadoop.tmp.dir
> > > setting if it is available, falling back to the current directory.
> > >
> > > Can anyone see any obvious problems with doing this?
> >
> > I'm not sure what Tomcat has to do with this. LinkDb does it this way in
> > order to avoid rename() operation across physical volumes - if you
> > invoke rename() on a local FS it may trigger a costly copy operation.
> >
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >   ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
>


-- 
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers