Re: Lucene Data Structures

Erick Erickson Tue, 16 Dec 2008 06:05:29 -0800

I question whether you *can* make this decision based upon the data
structure being used. I can write code such that *any* data structure you
care to name will perform badly under some conditions <G>.

Not to mention the other characteristics of a search engine that get in
the way of even the most efficient structure.

The only way I have ever gained confidence is to create a representative
dataset and measure. Which also has its perils, but....
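
A minimal sketch of what such a measurement could look like, assuming you
have a list of representative queries and some function that runs one of
them. Everything named here (QueryTimingSketch, measure, the stand-in
search function) is made up for illustration and is not part of Lucene; in
a real test the function would run each query against an index built from
your representative data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

public class QueryTimingSketch {

    /** Latency summary, all values in nanoseconds. */
    public static final class Summary {
        public final long median;
        public final long p95;
        public final long max;

        Summary(long median, long p95, long max) {
            this.median = median;
            this.p95 = p95;
            this.max = max;
        }
    }

    /** Nearest-rank percentile over an array sorted in ascending order. */
    public static long percentile(long[] sorted, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /**
     * Times each query once after some warm-up rounds.
     * The search function is a hypothetical stand-in that returns a hit count.
     */
    public static Summary measure(List<String> queries,
                                  ToIntFunction<String> search,
                                  int warmupRounds) {
        if (queries.isEmpty()) {
            throw new IllegalArgumentException("need at least one query");
        }
        long sink = 0; // accumulate results so the calls cannot be optimized away
        for (int r = 0; r < warmupRounds; r++) {
            for (String q : queries) {
                sink += search.applyAsInt(q);
            }
        }
        long[] nanos = new long[queries.size()];
        for (int i = 0; i < nanos.length; i++) {
            long start = System.nanoTime();
            sink += search.applyAsInt(queries.get(i));
            nanos[i] = System.nanoTime() - start;
        }
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // practically never taken; keeps sink live
        }
        Arrays.sort(nanos);
        return new Summary(percentile(nanos, 50),
                           percentile(nanos, 95),
                           nanos[nanos.length - 1]);
    }

    public static void main(String[] args) {
        // Stand-in workload: replace the lambda with a real search call.
        List<String> queries = Arrays.asList("lucene", "skip list", "postings", "index");
        Summary s = measure(queries, q -> q.length(), 3);
        System.out.println("median ns=" + s.median + " p95 ns=" + s.p95 + " max ns=" + s.max);
    }
}
```

The perils still apply: the numbers are only as good as the dataset and
the queries, and a warm JVM and warm disk cache behave very differently
from cold ones.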

Although I'll gladly admit that no amount of clever programming can
make up for a fundamentally flawed architecture.

But there are some pretty bright people coding all this up. You might get
more comfortable by looking at some of the success stories on the website.

But in the end, it's a "best guess" kind of thing. Perhaps you could explain
what you plan to do and folks with more experience than me might be able
to offer insights...

Best
Erick

On Tue, Dec 16, 2008 at 12:12 AM, Prafulla Kiran
<prafu...@tachyontech.net> wrote:

> Well, I have seen this link many times before. It doesn't really explain
> the data structures part of it. Perhaps I should have asked my question this
> way:
> "What data structures are being used by Lucene to read the posting lists
> from the index?"
> My guess is that a hash table is being used for reading the postings of
> each term, with the key being the term and the value being a multi-level
> skip list.
> Please correct me if I am wrong.
>
> Regards,
> Prafulla
>
>
> Grant Ingersoll wrote:
>
>> http://lucene.apache.org/java/2_4_0/fileformats.html
>>
>> On Dec 15, 2008, at 12:15 AM, Prafulla Kiran wrote:
>>
>>> Hi Everybody,
>>>
>>> Could someone please explain the actual data structures being used by
>>> Lucene for storing the postings list in the index? I see files called
>>> MultiLevelSkipListReader and MultiLevelSkipListWriter. Is Lucene using
>>> multi-level skip lists behind the scenes for maintaining the index? I want
>>> to understand clearly the actual data structure being used by Lucene for
>>> storing the index and postings list, so that I can deduce the complexity of
>>> reading from that data structure and decide whether my application would
>>> scale as per my requirements while using Lucene. So, could someone please
>>> give me some pointers to the data structures being used by Lucene?
>>>
>>> TIA,
>>> Prafulla
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>> --------------------------
>> Grant Ingersoll
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>
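
On the data structure guess quoted above: as I read the file formats page
Grant linked, the terms are kept in sorted order with a sparse index over
them rather than in a hash table, and skip data is stored alongside each
term's postings. The sketch below illustrates that general idea only. It
is not Lucene's code: the class and method names are invented, it uses a
single skip level where MultiLevelSkipListReader uses several, and it
keeps everything in plain arrays.

```java
import java.util.Arrays;

public class PostingSkipSketch {
    private final int[] docIds;     // ascending document ids for one term
    private final int skipInterval; // one skip entry every skipInterval postings
    private final int[] skipDocs;   // skipDocs[k] == docIds[k * skipInterval]

    public PostingSkipSketch(int[] sortedDocIds, int skipInterval) {
        if (skipInterval < 1) {
            throw new IllegalArgumentException("skipInterval must be >= 1");
        }
        this.docIds = sortedDocIds.clone();
        this.skipInterval = skipInterval;
        int blocks = (docIds.length + skipInterval - 1) / skipInterval;
        this.skipDocs = new int[blocks];
        for (int k = 0; k < blocks; k++) {
            skipDocs[k] = docIds[k * skipInterval];
        }
    }

    /** Returns the first doc id >= target, or -1 if there is none. */
    public int advance(int target) {
        // Walk the skip entries to the last block whose first doc id is <= target.
        int block = 0;
        while (block + 1 < skipDocs.length && skipDocs[block + 1] <= target) {
            block++;
        }
        // Scan linearly from the start of that block.
        for (int i = block * skipInterval; i < docIds.length; i++) {
            if (docIds[i] >= target) {
                return docIds[i];
            }
        }
        return -1;
    }

    /** Looks a term up by binary search over sorted terms; -1 if absent. */
    public static int findTerm(String[] sortedTerms, String term) {
        int pos = Arrays.binarySearch(sortedTerms, term);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        String[] terms = {"index", "lucene", "postings"};
        System.out.println(findTerm(terms, "lucene"));

        PostingSkipSketch postings =
                new PostingSkipSketch(new int[] {2, 5, 9, 12, 20, 33, 41, 57}, 3);
        System.out.println(postings.advance(13));
    }
}
```

With a skip entry every few postings, advance touches a handful of skip
entries plus at most one block of postings instead of the whole list.
Whether that translates into acceptable performance for a given
application is still something only measurement will show.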
