Re: RAMDirectory vs MemoryIndex

Wolfgang Hoschek Wed, 22 Nov 2006 09:19:57 -0800

I've never tried it, but I guess you could write an Analyzer and TokenFilter that not only feeds into IndexWriter on IndexWriter.addDocument(), but as a sneaky side effect also saves its tokens into a list, so that you could later turn that list into another TokenStream to be added to MemoryIndex. How much this might help depends on how expensive your analyzer chain is. For some examples on how to set up analyzers for chains of token streams, see MemoryIndex.keywordTokenStream and class AnalyzerUtil in the same package.
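
A minimal sketch of that record-and-replay idea, untested against any Lucene release. To keep it runnable without Lucene on the classpath, a token stream is modelled as a plain Iterator<String>; TokenTee, RecordingFilter, analyze and consume are hypothetical names made up for this illustration, not Lucene classes. In real code the recording role would presumably fall to a TokenFilter subclass wrapped around the analyzer's TokenStream, and since that API differs between Lucene versions, check the one you are on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins, NOT Lucene classes: a "token stream" is just an
// Iterator<String> here so the sketch compiles and runs on its own.
class TokenTee {

    /** Passes tokens through unchanged while recording each one in a list. */
    static final class RecordingFilter implements Iterator<String> {
        private final Iterator<String> input;
        private final List<String> recorded = new ArrayList<>();

        RecordingFilter(Iterator<String> input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            String token = input.next();
            recorded.add(token); // the side effect: remember what went by
            return token;
        }

        /** Replays the tokens recorded so far as a fresh stream, with no re-analysis. */
        Iterator<String> replay() {
            return new ArrayList<>(recorded).iterator();
        }
    }

    /** Toy analyzer (assumption): lower-case the text and split on non-word characters. */
    static Iterator<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens.iterator();
    }

    /** Drains a stream into a list, standing in for an index consuming the tokens. */
    static List<String> consume(Iterator<String> stream) {
        List<String> out = new ArrayList<>();
        while (stream.hasNext()) {
            out.add(stream.next());
        }
        return out;
    }

    public static void main(String[] args) {
        RecordingFilter filter = new RecordingFilter(analyze("Hello, MemoryIndex world"));
        // First consumer: plays the role of IndexWriter.addDocument().
        List<String> forMainIndex = consume(filter);
        // Second consumer: plays the role of MemoryIndex, fed from the recording.
        List<String> forMemoryIndex = consume(filter.replay());
        System.out.println(forMainIndex.equals(forMemoryIndex));
    }
}
```

The recording is only complete once the first consumer has drained the stream, so replay() should be called after the document has been added to the main index. Whether this pays off still depends on how expensive the analyzer chain is compared with holding the token list in memory.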


Wolfgang.


On Nov 22, 2006, at 4:15 AM, jm wrote:

checking one last thing, just in case...

as I mentioned, I have previously indexed the same document in another
index (for another purpose), as I am going to use the same analyzer,
would it be possible to avoid analyzing the doc again?

I see IndexWriter.addDocument() returns void, so there does not seem to
be an easy way to do that, no?

thanks

On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:


On Nov 21, 2006, at 12:38 PM, jm wrote:

> Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough
> I will explore the other options then.

To get started you can use something like this:

for each document D:
     MemoryIndex index = createMemoryIndex(D, ...)
     for each query Q:
         float score = index.search(Q)
         if (score > 0.0) System.out.println("it's a match");




   private MemoryIndex createMemoryIndex(Document doc, Analyzer analyzer) {
     MemoryIndex index = new MemoryIndex();
     Enumeration iter = doc.fields();
     while (iter.hasMoreElements()) {
       Field field = (Field) iter.nextElement();
       index.addField(field.name(), field.stringValue(), analyzer);
     }
     return index;
   }



>
>
> On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>> On Nov 21, 2006, at 7:43 AM, jm wrote:
>>
>> > Hi,
>> >

>> > I have to decide between using a RAMDirectory and MemoryIndex, but
>> > not sure what approach will work better...
>> >
>> > I have to run many items (tens of thousands) against some
>> queries (100
>> > at most), but I have to do it one item at a time. And I already
>> have
>> > the lucene Document associated with each item, from a previous
>> > operation I perform.
>> >
>> > From what I read MemoryIndex should be faster, but apparently I
>> cannot
>> > reuse the document I already have, and I have to create a new
>> > MemoryIndex per item.
>>
>> A MemoryIndex object holds one document.
>>
>> > Using the RAMDirectory I can use only one of
>> > them, also one IndexWriter, and create a IndexSearcher and
>> IndexReader
>> > per item, for searching and removing the item each time.
>> >
>> > Any thoughts?
>>

>> The MemoryIndex impl is optimized to work efficiently without reusing
>> the MemoryIndex object for a subsequent document. See the source
>> code. Reusing the object would not further improve performance.
>>
>> Wolfgang.
>>

>>---------------------------------------------------------------------

>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
