I actually use Field.Text(String,String) to add documents to my index. Maybe
I do not understand how an analyzer works, but I thought that all German
articles (der, die, das, etc.) would be filtered out. However, when I use Luke
to view my index, the original text is stored in full in the field. What I
need is a term vector that I can create from an indexed document field, so
this field should contain terms only.
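To make the expectation concrete, here is a minimal plain-Java sketch of the
kind of term vector the analysis step should yield: the text is lowercased,
split into tokens, stripped of stopwords, and the remaining terms are counted.
This is only a stand-in and not GermanAnalyzer itself: the stopword list is a
tiny made-up sample, there is no stemming, and the class and method names are
hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for an analyzer chain; NOT Lucene's GermanAnalyzer.
final class TermVectorSketch {

    // Tiny illustrative stopword sample; a real German list is much longer.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "der", "die", "das", "den", "dem", "des", "ein", "eine", "und"));

    // Lowercase, split on non-letters, drop stopwords, count what remains.
    // No stemming is done here, so inflected forms stay separate terms.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOPWORDS.contains(token)) {
                continue;
            }
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        String original = "Der Hund und die Katze jagen den Hund";
        // The stored value would be the untouched original text ...
        System.out.println("stored:  " + original);
        // ... while the indexed side only sees the analyzed terms.
        System.out.println("indexed: " + termVector(original));
        // indexed: {hund=2, jagen=1, katze=1}
    }
}
```

The point of the sketch is that the original string and the analyzed terms
are two separate things: seeing the full text in a stored field does not by
itself mean the stopwords were indexed.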
Whether or not the text is stored in the index is a different concern
than how it is analyzed. If you want the text to be indexed, and not
stored, then use the Field.UnStored(String, String) method
(Field.Text(String, String) stores the text as well) or the
appropriate constructor when adding a field to the Document. You'll
also need to store a reference to the actual file (URL, path, etc.) in
the document so it can be retrieved from the doc returned in the Hits
object.
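A rough sketch of the distinction, in plain Java rather than against the
Lucene API: the toy index below makes the body searchable without keeping it,
and stores only a path that can be read back from a hit. All class, method,
and field names are hypothetical, and the path in the usage note is made up.
In Lucene 1.4 terms this should correspond roughly to Field.UnStored for the
body and Field.Keyword for the path, but treat that mapping as an assumption
to check against the javadoc.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index; the names are illustrative, not Lucene's API.
final class StoredVsIndexedSketch {

    // term -> ids of the documents containing it (the "indexed" side)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // stored field values per document (the "stored" side)
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    // Index the body for searching without keeping it; store only the path.
    int add(String path, String body) {
        int docId = storedFields.size();
        Map<String, String> stored = new HashMap<>();
        stored.put("path", path);
        storedFields.add(stored);
        for (String term : body.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
        return docId;
    }

    // Ids of the documents whose body contained the term.
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }

    // Stored value of a field, or null if that field was never stored.
    String stored(int docId, String field) {
        return storedFields.get(docId).get(field);
    }
}
```

With this, add("/docs/a.txt", "Der Hund jagt die Katze") followed by
search("katze") finds the document, stored(id, "path") gives back the path,
and stored(id, "body") is null because the body was indexed but never stored.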
Or did I completely misunderstand the question?
-Mike
On Wed, 22 Dec 2004 17:23:24 +0100, DES <[EMAIL PROTECTED]> wrote:
Hi,
I need to index my text so that the index contains only tokenized, stemmed
words without stopwords etc. The text is German, so I tried to use
GermanAnalyzer, but it stores the whole text, not terms. Please give me a tip
on how to index terms only. Thanks!
DES
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]