On Feb 11, 2009, at 1:05 PM, Mandy Günther wrote:

Hi all,
I want to use Lucene in my project but I have the following problem:
My goal is to store metadata in the index.

Her is an example of the stuff I need to index

1; My; determiner;                      
2; project; noun; 1
3; uses; verb; 1
4; Lucene; noun; 1
6; I; noun;             
7; need; verb; 6
8; help; noun; 6

It's a file containing rows with 4 columns.
The first column is a sequential number. The second column is the text
that need to be indexed. In my original file it's more than one word.
The third column
classifies the second column and the fourth column tells me where the
sentence of the word in the second column starts.
I will have large datafiles(up to 30GB) to index. Indexing and
searching performance is very important.
I tried to store the meta data(column no. 3 and 4) over the field names, e.g.

doc.add(new Field("1|determiner", My, ...));
doc.add(new Field("1|noun", project, ...));
...
doc.add(new Field("2|noun", I, ...));

But that's not a classy way!

I was thinking about using Payloads. But the API says the Payload
Status is experimantal and the performance is getting bad using
payloads.

I am not even sure if Payloads are meant to be used to store that kind of data.
Is it possible to store that kind of metadata as payload?

I think it makes sense to store these as payloads. You might even look at the 2.9-dev trunk version for the new AttributeSource capabilities that allow you to, essentially, strongly type payloads and provide custom serialization capabilities.

As for the "experimental" phase of payloads, I'd say they are here to stay, it's just that the exact methods may change.

How do you plan to search using the info? Note, that the new AttributeSource stuff is not yet searchable, whereas the Payload stuff is. Performance, I would think, should be roughly equivalent to using Spans, but don't quote me on it.

HTH,
Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to