I have created two plugins, title and author, which extract the metadata
content of HTML pages. My /root/WEB-INF/classes/nutch-site.xml is as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
 <name>http.agent.name</name>
 <value>ACHLA</value>
 <description>HTTP 'User-Agent' request header. MUST NOT be empty -
 please set this to a single word uniquely related to your organization.

 NOTE: You should also check other related properties:

 http.robots.agents
 http.agent.description
 http.agent.url
 http.agent.email
 http.agent.version

 and set their values appropriately.
 </description>
</property>

<property>
 <name>http.agent.description</name>
 <value>ncsi123</value>
 <description>Further description of our bot- this text is used in
 the User-Agent header.  It appears in parenthesis after the agent name.
 </description>
</property>

<property>
 <name>http.agent.url</name>
 <value>www.google.com</value>
 <description>A URL to advertise in the User-Agent header.  This will
  appear in parenthesis after the agent name. Custom dictates that this
  should be a URL of a page explaining the purpose and behavior of this
  crawler.
 </description>
</property>

<property>
 <name>http.agent.email</name>
 <value>[EMAIL PROTECTED]</value>
 <description>An email address to advertise in the HTTP 'From' request
  header and User-Agent header. A good practice is to mangle this
  address (e.g. 'info at example dot com') to avoid spamming.
 </description>
</property>

<property>
<name>searcher.dir</name>
<value>/usr/nutch-0.9/karan11</value>
</property>

<property>
<name>plugin.includes</name>
<value>author|title|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
</property>

</configuration>
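
For anyone who wants to check which plugins the plugin.includes value above selects, here is a minimal Python sketch. It assumes the value is treated as a regular expression that must match a whole plugin id; that is an assumption about how the property is interpreted, not something verified against the Nutch source. The candidate ids in the list are hypothetical examples and should be replaced with the directory names under your own plugins/ folder.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the plugin.includes property from the nutch-site.xml above.
SITE_XML = """<configuration>
<property>
<name>plugin.includes</name>
<value>author|title|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
</property>
</configuration>"""


def get_property(xml_text, name):
    """Return the <value> text of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def included(pattern, plugin_ids):
    """Keep the ids matched in full by the pattern (assumed matching rule)."""
    return [p for p in plugin_ids if re.fullmatch(pattern, p)]


pattern = get_property(SITE_XML, "plugin.includes")

# Hypothetical candidate ids, for illustration only.
candidates = ["author", "title", "parse-html", "parse-pdf",
              "query-basic", "index-more"]

print(included(pattern, candidates))
```

Under that assumption, any id missing from the printed list would not be selected by this plugin.includes value. The check only covers the regular expression itself; it says nothing about whether the plugins load correctly in the web application.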

I have included the plugins, but whenever I run a search, the
search.jsp page is not displayed.
Please help.