How does NutchBean know where to query?  After a merge,
can I point it at the merged result instead?

I have two crawls (under Nutch 0.9 with Cygwin) that I
merged as follows:

bin/mergecrawl.sh c:/n9b/merged/merge1 c:/n9a/z/sf911truth c:/n9b/z/wtc7

I can query them individually as follows:

[EMAIL PROTECTED] /cygdrive/c/n9a
$ bin/nutch org.apache.nutch.searcher.NutchBean mission
Total hits: 2
 0 20070730135339/http://www.sf911truth.org/about.html
 ... California 9-11 Truth Alliance Mission: "Our
mission is to seek ...
 1 20070730135315/http://www.sf911truth.org/
 ... 1997 to June 2003 Mission Statement and Meetings
...

and querying the second one:

[EMAIL PROTECTED] /cygdrive/c/n9b
$ bin/nutch org.apache.nutch.searcher.NutchBean mission
Total hits: 3
 0 20070730135403/http://www.wtc7.net/lcache/wtc7.htm
 ... at the United States Mission to the United ...
new sense of mission to the agency ...
 1 20070730135403/http://www.wtc7.net/cache/awg_enews_2002_29.txt
AWG E-MAIL NEWS 2002-29
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
CONTENTS
1)  AGI GOVERNMENT AFFAIRS MONTHLY REVIEW: OCTOBER
2002
2)  AGI GOVERNMENT AFFAIRS  ...
 2 20070730135557/http://www.wtc7.net/cache/phillyblast_benthere.htm
Phillyblast Was Here: Phillyblast Was Here: (To see
the area implosions we missed, click here. )  ...


Those query results are what I get from running
NutchBean in c:/n9a and c:/n9b respectively.  

Now I'd like to run NutchBean against the merged
result and see whether all of those hits show up.  That
leads me to the question: how do you tell NutchBean
where to search?  How do I tell it to go against
c:/n9b/merged/merge1
rather than
c:/n9b/z/wtc7
which is somehow the default location when I run
NutchBean from c:/n9b?
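
For the comparison itself, something like the following
might do once I have output from the merged location.  It
is only a rough sketch: it assumes each hit appears as a
14-digit segment name, a slash, and the URL (as in the
output above), and it drops the segment name because I
expect the merged segments to be named differently.  The
function names extract_urls and missing_from_merged are
made up for this example.

```shell
# Print the unique URLs found in a saved NutchBean output file.
# Assumption: every hit is written as <14-digit segment>/<url>.
extract_urls() {
    grep -oE '[0-9]{14}/[^[:space:]]+' "$1" \
        | sed -E 's|^[0-9]{14}/||' \
        | sort -u
}

# Print URLs that appear in any individual output but not in the merged one.
# usage: missing_from_merged merged.txt crawl1.txt [crawl2.txt ...]
missing_from_merged() {
    merged=$1; shift
    tmp_merged=$(mktemp) && tmp_all=$(mktemp) || return 1
    extract_urls "$merged" > "$tmp_merged"
    # Pool the URLs from every individual crawl into one sorted set.
    for f in "$@"; do extract_urls "$f"; done | sort -u > "$tmp_all"
    # comm -23 keeps lines that occur only in the first (pooled) file.
    comm -23 "$tmp_all" "$tmp_merged"
    rm -f "$tmp_merged" "$tmp_all"
}
```

The idea would be to redirect each NutchBean run to a
file (for example n9a.txt, n9b.txt and merged.txt, names
chosen here only for illustration) and then run
missing_from_merged merged.txt n9a.txt n9b.txt
Empty output would mean every hit from the two separate
crawls also shows up in the merged result.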

Another question:  how does NutchBean know about that
latter location anyway?  Sure, it's where I originally
stored the c:/n9b crawl.  Does that mean there is a
config file somewhere that stores the location of the
most recent crawl?

These questions on NutchBean are quite general because
they apply any time a new crawl is created somewhere:
you need to know how to point NutchBean at a different
location to query.


       
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
