Multiple Keywords/Keyphrases fields

Owen Densmore Sat, 12 Feb 2005 12:08:51 -0800

I'm getting a bit more serious about the final form of our Lucene index. Each document has DocNumber, Authors, Title, Abstract, and Keywords. By Keywords, I mean a comma-separated list in which each entry may be a multi-term phrase, for example: temporal infomax, finite state automata, Markov chains, conditional entropy, neural information processing

I presume I should be using a field "Keywords" which has many "entries" or "instances" per document (one per comma-separated phrase), but I'm not sure of the right way to handle all this. My assumption is that I should analyze each phrase individually, just as we do for free text (the Abstract, for example). The example above would then produce 5 entries of the form doc.add(Field.Text("Keywords", "finite state automata")); and so on. I'd analyze them because these are author-supplied strings with no canonical form.
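To make the per-phrase idea concrete, here is a minimal sketch of the splitting step only. It uses nothing but the Java standard library so it runs without Lucene; the class name KeywordSplitter and the helper splitKeywords are hypothetical names made up for illustration, not part of Lucene. The comment in main marks where the doc.add(Field.Text("Keywords", ...)) call from above would go.

```java
import java.util.ArrayList;
import java.util.List;

class KeywordSplitter {

    // Hypothetical helper (not a Lucene API): split a comma-separated
    // keyword string into trimmed, non-empty phrases.
    public static List<String> splitKeywords(String raw) {
        List<String> phrases = new ArrayList<>();
        if (raw == null) {
            return phrases;
        }
        for (String part : raw.split(",")) {
            String phrase = part.trim();
            if (!phrase.isEmpty()) {
                phrases.add(phrase);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String raw = "temporal infomax, finite state automata, Markov chains, "
                   + "conditional entropy, neural information processing";
        for (String phrase : splitKeywords(raw)) {
            // With Lucene on the classpath, each phrase would be added here
            // as its own instance of the field, as described above:
            //   doc.add(Field.Text("Keywords", phrase));
            System.out.println("Keywords: " + phrase);
        }
    }
}
```

Trimming and dropping empty entries keeps stray spaces or a doubled comma in the author-supplied list from producing blank field instances.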

For guidance, I looked in the archive and found the attached email, but I didn't see an answer. (I'm not concerned about the duplicates; I presume they are equivalent to a boost of some sort.) Does this seem right?

Thanks once again.

Owen

From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Subject: Multiple equal Fields?
Date: Tue, 17 Feb 2004 12:47:58 +0100

Hi!
What happens if I do this:

doc.add(Field.Text("foo", "bar"));
doc.add(Field.Text("foo", "blah"));

Is there a field "foo" with value "blah" or are there two "foo"s (actually not possible) or is there one "foo" with the values "bar" and "blah"?

And what happens in this case:

doc.add(Field.Text("foo", "bar"));
doc.add(Field.Text("foo", "bar"));
doc.add(Field.Text("foo", "bar"));

Does Lucene store this only once?

Timo

