Re: File system

2008-12-16 Thread Dennis Kubes
If you are talking about Nutch Contents, which are stored in the segments while pages are fetched, then you would need to write a MapReduce job to read in the Contents objects and do whatever processing you desire. Dennis. oSilvio wrote: Very useful information, thanks! But in order to extract

Re: File system

2008-12-16 Thread oSilvio
I've seen it now, thanks for the attention. oSilvio wrote: Very useful information, thanks! But in order to extract the data inside those files (like HTML pages) I can find no algorithm provided by Nutch, nor the process used to store the data. Do you know if it is possible to extract

Re: File system

2008-12-16 Thread oSilvio
Very useful information, thanks! But in order to extract the data inside those files (like HTML pages) I can find no algorithm provided by Nutch, nor the process used to store the data. Do you know if it is possible to extract it using Lucene? Dennis Kubes-2 wrote: The Nutch databases are

Build failed in Hudson: Nutch-trunk #663

2008-12-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/663/changes -- [...truncated 2223 lines...] A src/plugin/protocol-http/src/test/org/apache/nutch A src/plugin/protocol-http/src/test/org/apache/nutch/protocol A

Issue with searching keywords

2008-12-16 Thread Rinesh1
Hi, I have used the keywords plugin to crawl the meta keyword details, which works fine. Here is the issue when there is a keyword: 1. Say the keyword is RedHat (with R and H capitalized, as crawled from the site). On searching using keywords:RedHat, Nutch returns 0 results ..