inverted index - A sequence of (key, pointer) pairs where each pointer
points to a record in a database which contains the key value in some
particular field. The index is sorted on the key values to allow rapid
searching for a particular key value, using e.g. binary search. The
index is "inverted" in the sense that the key value is used to find
the record rather than the other way round.
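
To make the definition concrete, here is a minimal Python sketch of a sorted (key, pointer) index searched with binary search. The records, the field name "title" and the helper names build_index and lookup are made up for illustration and are not part of Nutch or Lucene; the "pointer" here is just a position in a list.

```python
from bisect import bisect_left

# Hypothetical records; the list position plays the role of the "pointer".
records = [
    {"url": "http://example.com/a", "title": "nutch"},
    {"url": "http://example.com/b", "title": "lucene"},
    {"url": "http://example.com/c", "title": "hadoop"},
    {"url": "http://example.com/d", "title": "nutch"},
]


def build_index(records, field):
    """Return (key, pointer) pairs sorted on the key value."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, key):
    """Binary-search the sorted index and return every pointer for key."""
    # (key,) sorts before any (key, pointer) pair, so bisect_left lands
    # on the first entry for this key, if there is one.
    i = bisect_left(index, (key,))
    pointers = []
    while i < len(index) and index[i][0] == key:
        pointers.append(index[i][1])
        i += 1
    return pointers


title_index = build_index(records, "title")
matches = [records[p]["url"] for p in lookup(title_index, "nutch")]
```

Here the key ("nutch") is used to find the records, rather than scanning each record to see which keys it contains.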

In Nutch, indexes are created on:

<url, ParseData> from parse, for title, metadata, etc.
<url, ParseText> from parse, for text
<url, Inlinks> from invert, for anchors
<url, CrawlDatum> from fetch, for fetch date
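
As a rough illustration of how per-URL records like the ones listed above could be combined into one document per URL and then inverted on terms, here is a simplified Python sketch. It does not use Nutch's or Lucene's actual classes or on-disk format; the dictionaries, field names and the functions build_document and invert are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical per-URL records standing in for the four inputs.
parse_data = {
    "http://example.com/a": {"title": "Nutch home"},
    "http://example.com/b": {"title": "Lucene home"},
}
parse_text = {
    "http://example.com/a": "open source web search",
    "http://example.com/b": "open source search library",
}
inlinks = {"http://example.com/a": ["nutch project"]}
crawl_datum = {
    "http://example.com/a": {"fetch_date": "2009-03-21"},
    "http://example.com/b": {"fetch_date": "2009-03-23"},
}


def build_document(url):
    """Join the per-URL records into a single document for that URL."""
    return {
        "url": url,
        "title": parse_data[url]["title"],
        "content": parse_text[url],
        "anchor": " ".join(inlinks.get(url, [])),
        "fetch_date": crawl_datum[url]["fetch_date"],
    }


def invert(documents, fields=("title", "content", "anchor")):
    """Map each (field, term) pair to the set of URLs containing it."""
    postings = defaultdict(set)
    for doc in documents:
        for field in fields:
            for term in doc[field].lower().split():
                postings[(field, term)].add(doc["url"])
    return postings


documents = [build_document(url) for url in sorted(parse_data)]
postings = invert(documents)
search_hits = sorted(postings[("content", "search")])
```

The forward direction is url -> fields; the inverted direction is (field, term) -> urls, which is what a query needs.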


Check out the indexes folder after crawling.


On Mon, Mar 23, 2009 at 7:56 PM, Rodrigo Reyes C. <rre...@corbitecso.com> wrote:

> Ninad
>
> I've been reading your blog, specifically the article named "Nutch
> Architecture". I posted a comment there but I am not sure you have noticed
> it so I will post it here too.
>
> What do you mean by:
>
> "The index is the inverted index of all of the pages the system has
> retrieved, and is created by merging all of the individual segment indexes."
>
> Can you give us an example of what the original segment index looks like and
> how it is inverted? Thanks
>
> Rodrigo
>
> 2009/3/21 Ninad Raut <ninad.evera...@gmail.com>
>
>> Check out my blog:
>>
>> http://j2eewebsearch.blogspot.com/
>>
>> Check out the third point...
>>
>> Let me know if you get it all right. Your comments will be
>> appreciated.
>>
>> Regards,
>> Ninad
>>
>>
>> On Sat, Mar 21, 2009 at 6:32 AM, Rodrigo Reyes C.
>> <rre...@corbitecso.com> wrote:
>>
>>> Hi
>>>
>>> I have configured my Eclipse project as stated here:
>>>
>>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>
>>> Still, I am getting the following errors:
>>>
>>>    - The return type is incompatible with Parser.getParse(Content)
>>>    RTFParseFactory.java
>>>    nutch/src/plugin/parse-rtf/src/java/org/apache/nutch/parse/rtf    line 52
>>>    Java Problem
>>>    - Type mismatch: cannot convert from ParseResult to Parse
>>>    TestRTFParser.java
>>>    nutch/src/plugin/parse-rtf/src/test/org/apache/nutch/parse/rtf    line 78
>>>    Java Problem
>>>
>>> Any ideas on what could be wrong? I already included the jars from both
>>> http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-mp3/lib/ and
>>> http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-rtf/lib/
>>>
>>> Thanks in advance
>>>
>>> --
>>> Rodrigo Reyes C.
>>>
>>>
>>
>
>
