R 2.4
I am rather new to Lucene, and it is entirely likely that the problem is
somehow mine -- but I have no clue how.
I have an index (of files), all fine; I post keyword queries and get lists of
documents, all fine.
Now I want to query against something other than keywords, so (here I am in
part trying to reuse someone else's code, since I hardly understand enough
about Lucene to write even a simple search) I try adding a Field to my
Document's, under some conditions, like this:
<pseudo-code>
Document document = new Document();
if (someCondition()) {
TokenStream stream = makeMyTokenStream(); // has correct reset() method
Field field = new Field("_MARKUP_TAGS_contents", stream,
Field.TermVector.WITH_OFFSETS);
document.add(field);
}
// verify here that the token-stream is behaving correctly
// verify here that the "_MARKUP_TAGS_contents" Field *IS* (YES!) added
// now add the "typical" fields; nothing surprising here:
document.add(makeField(LENGTH_FIELD, Integer.toString(_docLength)));
document.add(makeField(NAME_FIELD, _documentReference));
document.add(makeField(DATE_FIELD, new Date().toString()));
...
protected Field makeField(final String name, final String value) {
return new Field(name, value, Field.Store.YES,
Field.Index.NOT_ANALYZED);
}
</pseudo-code>
I don't *think* there's any problem there, and the token-stream that I create
seems to behave as expected.
The query that I post for a test is one that is simple and is satisfied by most
of the documents in my small corpus (of ten documents).
BUT .. when I post that query, the documents that are returned NEVER contain
the additional field added above under "some condition". Let me repeat: I have
verified that the Field *IS* added in the pseudo-code above.
Here is the code I use to post a query and retrieve Documents:
<pseudo-code>
IndexReader _anIndexReader =
IndexReader.open(getFileRepresentingIndexDirectory());
IndexSearcher _anIndexSearcher = new IndexSearcher(_anIndexReader );
TopFieldDocs tfd = _anIndexSearcher.search(someQuery, null, 1000, new Sort());
int length = tfd.totalHits;
for (int i = 0; i < length; ++i) {
int docIndex = tfd.scoreDocs[i].doc;
Document doc = _anIndexReader.document(docIndex);
String name = doc.get(NAME_FIELD); // same as in first block of
pseudo-code
System.err.println("The document " + name + " has these Fields:");
for (final Object item : doc.getFields()) {
Field field = (Field) item;
System.err.println("\t" + field.name());
}
// only ever has the "typical" Fields from first block of pseudo-code
}
</pseudo-code>
Thus it is clear that either the Lucene-index strips the extra Field (why?) or
that the Documents that are returned are somehow newly-created and lack the
extra field, but contain all the "typical" Fields (why?)
Any ideas?
thanks,
Paul
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]