If I search for *.doc, it returns about 7 files. If I search for
something that should be in a Word doc, it doesn't return anything.
 
Reading the wiki, I haven't got anything in nutch-site.xml; all the
parse entries are in parse-plugins.xml.
 
Should I have things in nutch-site.xml and, if so, what is the XML
syntax for crawling Word docs etc.?
 
Thanks
 
Steve


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
