The most common cause is that no agent name is set in the configuration, 
so nothing is fetched during the crawl.  Another possibility is that the 
searcher.dir configuration directive is not set correctly.

Dennis Kubes
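
For reference, a minimal sketch of the two settings mentioned above. The
agent name and the path are placeholders, not values from this thread, and
the note about the directory layout is an assumption about how Nutch 0.9
resolves searcher.dir, so check it against your own installation.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Crawler side (conf/nutch-site.xml): must be non-empty before crawling.
       "example-crawler" is a placeholder value. -->
  <property>
    <name>http.agent.name</name>
    <value>example-crawler</value>
  </property>

  <!-- Search webapp side (the nutch-site.xml under WEB-INF/classes).
       "/path/to/crawl" is a placeholder; it should be the crawl output
       directory itself. Assumption: that directory is expected to hold
       the index (or indexes) and segments subdirectories. -->
  <property>
    <name>searcher.dir</name>
    <value>/path/to/crawl</value>
  </property>
</configuration>
```

The two properties are shown in one file only for brevity. The crawler reads
its copy of nutch-site.xml from conf/, while the search webapp reads the copy
deployed with it, and the webapp usually has to be restarted to pick up a
change.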
----------------------------------------------------------
Thanks, Dennis.
During crawling, Nutch seems to crawl successfully, and a lot of data is
added to the crawl directory.
Here is my /webapps/root/web-inf/classes/nutch-site.xml, which sets the
search directory:
////////////// /webapps/root/web-inf/classes/nutch-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>searcher.dir</name>
<value>/mnt/hdb7/search/nutch-0.9/nutch-0.9/bin/crawl</value>
</property>
</configuration>
//////////////////////////////////////////////////////////
Below is my /nutch-0.9/nutch-0.9/conf/nutch-site.xml, which sets the agent
properties:
////////// /nutch-0.9/nutch-0.9/conf/nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>http.agent.name</name>
  <value>nutch</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
  please set this to a single word uniquely related to your organization.
  NOTE: You should also check other related properties:
        http.robots.agents
        http.agent.description
        http.agent.url
        http.agent.email
        http.agent.version
  and set their values appropriately.
  </description>
</property>
<property>
  <name>http.agent.description</name>
  <value>hello</value>
  <description>Further description of our bot- this text is used in
  the User-Agent header.  It appears in parenthesis after the agent name.
  </description>
</property>

<property>
  <name>http.agent.url</name>
  <value>http://hello.com</value>
  <description>A URL to advertise in the User-Agent header.  This will 
   appear in parenthesis after the agent name. Custom dictates that this
   should be a URL of a page explaining the purpose and behavior of this
   crawler.
  </description>
</property>

<property>
  <name>http.agent.email</name>
  <value>[EMAIL PROTECTED]</value>
  <description>An email address to advertise in the HTTP 'From' request
   header and User-Agent header. A good practice is to mangle this
   address (e.g. 'info at example dot com') to avoid spamming.
  </description>
</property>
</configuration>
//////////////////////////////////////////////////////////
The search still returns 0 results.