I think I have nutch-site.xml set up properly. It looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>http.agent.name</name>
  <value>testing</value>
  <description></description>
</property>

<property>
  <name>http.agent.description</name>
  <value>testing the nutch bot</value>
  <description></description>
</property>

<property>
  <name>http.agent.url</name>
  <value></value>
  <description>none</description>
</property>

<property>
  <name>http.agent.email</name>
  <value>none</value>
  <description></description>
</property>

<property>
  <name>plugin.includes</name>
  <value>protocol-file|protocol-http|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
</property>

<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>
</configuration>
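
A quick way to sanity-check a file like the one above is to parse it and test which plugin ids the plugin.includes value would let through. The following is only a minimal sketch in Python, not part of Nutch: the helper names read_nutch_properties and plugin_enabled and the trimmed SAMPLE string are made up for illustration, and it assumes Nutch treats plugin.includes as a regular expression that must match the whole plugin id.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the config above, used only for illustration.
SAMPLE = """<configuration>
<property>
  <name>plugin.includes</name>
  <value>protocol-file|protocol-http|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
</property>
<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>
</configuration>"""


def read_nutch_properties(xml_text):
    """Return {name: value} for every <property> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value") or ""
        if name:
            props[name.strip()] = value.strip()
    return props


def plugin_enabled(props, plugin_id):
    """True if plugin_id fully matches the plugin.includes pattern.

    Assumption: the whole plugin id must match the regular expression.
    """
    pattern = props.get("plugin.includes", "")
    return re.fullmatch(pattern, plugin_id) is not None


props = read_nutch_properties(SAMPLE)
for plugin_id in ("protocol-file", "parse-html", "parse-pdf"):
    print(plugin_id, plugin_enabled(props, plugin_id))
```

Under that assumption, protocol-file and parse-html are let through by the value above, while parse-pdf is not, so a file-system crawl with this setting would presumably skip PDF content unless the pattern is extended.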


Yesterday I finally managed to get Nutch to index files, but I took an 
entirely different approach than all the documentation I've read suggested. 
I put a <a href="file://c:/xxxx
link into a page and things seemed to work. I am pretty sure that is a hack, 
but what the heck, it worked. I'll watch the mailing lists and perhaps 
someone will post how to properly index a file system. I can easily imagine 
more users wishing to index file systems than some web site. I expect this 
topic will be a hot one as Nutch gains popularity (which I think it will, 
since it is a very cool add-on to Lucene).

jim s


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
