How does NutchBean know where to query? Can I point it differently after a merge to query against the merged result?
I have two crawls (under Nutch 0.9 with Cygwin) that I merged as follows:

    bin/mergecrawl.sh c:/n9b/merged/merge1 c:/n9a/z/sf911truth c:/n9b/z/wtc7

I can query them individually. The first one:

    [EMAIL PROTECTED] /cygdrive/c/n9a
    $ bin/nutch org.apache.nutch.searcher.NutchBean mission
    Total hits: 2
     0 20070730135339/http://www.sf911truth.org/about.html
     ... California 9-11 Truth Alliance Mission: "Our mission is to seek ...
     1 20070730135315/http://www.sf911truth.org/
     ... 1997 to June 2003 Mission Statement and Meetings ...

and the second one:

    [EMAIL PROTECTED] /cygdrive/c/n9b
    $ bin/nutch org.apache.nutch.searcher.NutchBean mission
    Total hits: 3
     0 20070730135403/http://www.wtc7.net/lcache/wtc7.htm
     ... at the United States Mission to the United ... new sense of mission to the agency ...
     1 20070730135403/http://www.wtc7.net/cache/awg_enews_2002_29.txt
     AWG E-MAIL NEWS 2002-29 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ CONTENTS 1) AGI GOVERNMENT AFFAIRS MONTHLY REVIEW: OCTOBER 2002 2) AGI GOVERNMENT AFFAIRS ...
     2 20070730135557/http://www.wtc7.net/cache/phillyblast_benthere.htm
     Phillyblast Was Here: Phillyblast Was Here: (To see the area implosions we missed, click here. ) ...

Those query results are what I get from running NutchBean in c:/n9a and c:/n9b respectively. Now I'd like to run NutchBean against the merged result and see whether all of those rows show up.

That leads me to the question: how do you tell NutchBean where to search? How do I tell it to go against c:/n9b/merged/merge1 rather than c:/n9b/z/wtc7, which is somehow the default location for NutchBean?

Another question: how does NutchBean know about that latter location anyway? Sure, it's where I stored the c:/n9b crawl originally. Does that mean there is a config file somewhere that stores the location of the most recent crawl?

These questions about NutchBean are quite general, because they apply any time a new crawl is created somewhere.
You want to know how to point NutchBean to different locations to query.
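One possible answer, offered as a sketch rather than a confirmed fix: in Nutch 0.9, NutchBean appears to take its search location from the `searcher.dir` configuration property, which nutch-default.xml sets to `crawl` (resolved relative to the directory you run from) and which conf/nutch-site.xml can override. If that holds for this install, the c:/n9b/z/wtc7 location would be coming from an existing `searcher.dir` override in that install's conf/nutch-site.xml, not from a record of the most recent crawl. The fragment below assumes the merge output at c:/n9b/merged/merge1 has the layout NutchBean expects (an index directory plus segments); that has not been verified here, and the path form may need adjusting under Cygwin.

```xml
<!-- conf/nutch-site.xml: values here override nutch-default.xml -->
<configuration>
  <property>
    <name>searcher.dir</name>
    <!-- Assumption: the merged crawl from the mergecrawl.sh command above;
         replace with whichever crawl directory should be searched -->
    <value>c:/n9b/merged/merge1</value>
  </property>
</configuration>
```

With that in place, rerunning `bin/nutch org.apache.nutch.searcher.NutchBean mission` from the same install should query the merged crawl, and switching `searcher.dir` back should restore the earlier results.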
