I've never tried it, but I guess you could write an Analyzer and
TokenFilter that no only feeds into IndexWriter on
IndexWriter.addDocument(), but as a sneaky side effect also
simultaneously saves its tokens into a list so that you could later
turn that list into another TokenStream to be added to MemoryIndex.
How much this might help depends on how expensive your analyzer chain
is. For some examples on how to set up analyzers for chains of token
streams, see MemoryIndex.keywordTokenStream and class AnalzyerUtil in
the same package.
Wolfgang.
On Nov 22, 2006, at 4:15 AM, jm wrote:
checking one last thing, just in case...
as I mentioned, I have previously indexed the same document in another
index (for another purpose), as I am going to use the same analyzer,
would it be possible to avoid analyzing the doc again?
I see IndexWriter.addDocument() returns void, so it does not seem to
be an easy way to do that no?
thanks
On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
On Nov 21, 2006, at 12:38 PM, jm wrote:
> Ok, thanks, I'll give MemoryIndex a go, and if that is not good
enoguh
> I will explore the other options then.
To get started you can use something like this:
for each document D:
MemoryIndex index = createMemoryIndex(D, ...)
for each query Q:
float score = index.search(Q)
if (score > 0.0) System.out.println("it's a match");
private MemoryIndex createMemoryIndex(Document doc, Analyzer
analyzer) {
MemoryIndex index = new MemoryIndex();
Enumeration iter = doc.fields();
while (iter.hasMoreElements()) {
Field field = (Field) iter.nextElement();
index.addField(field.name(), field.stringValue(), analyzer);
}
return index;
}
>
>
> On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote:
>> On Nov 21, 2006, at 7:43 AM, jm wrote:
>>
>> > Hi,
>> >
>> > I have to decide between using a RAMDirectory and
MemoryIndex, but
>> > not sure what approach will work better...
>> >
>> > I have to run many items (tens of thousands) against some
>> queries (100
>> > at most), but I have to do it one item at a time. And I already
>> have
>> > the lucene Document associated with each item, from a previous
>> > operation I perform.
>> >
>> > From what I read MemoryIndex should be faster, but apparently I
>> cannot
>> > reuse the document I already have, and I have to create a new
>> > MemoryIndex per item.
>>
>> A MemoryIndex object holds one document.
>>
>> > Using the RAMDirectory I can use only one of
>> > them, also one IndexWriter, and create a IndexSearcher and
>> IndexReader
>> > per item, for searching and removing the item each time.
>> >
>> > Any thoughts?
>>
>> The MemoryIndex impl is optimized to work efficiently without
reusing
>> the MemoryIndex object for a subsequent document. See the source
>> code. Reusing the object would not further improve performance.
>>
>> Wolfgang.
>>
>>
---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
>
---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]