why would a Field vanish from a Document?

rolarenfan Fri, 23 Jan 2009 13:01:47 -0800

R 2.4

I am rather new to Lucene, and it is entirely likely that the problem is 
somehow mine -- but I have no clue how.


I have an index (of files), all fine; I post keyword queries and get lists of 
documents, all fine. 

Now I want to query against something other than keywords, so (here I am in 
part trying to reuse someone else's code, since I hardly understand enough 
about Lucene to write even a simple search) I try adding a Field to my 
Document's, under some conditions, like this: 

<pseudo-code>
Document document = new Document();
if (someCondition()) { 
        TokenStream stream = makeMyTokenStream(); // has correct reset() method 
        Field field = new Field("_MARKUP_TAGS_contents", stream, 
Field.TermVector.WITH_OFFSETS);
        document.add(field);
}
// verify here that the token-stream is behaving correctly 
// verify here that the "_MARKUP_TAGS_contents" Field *IS* (YES!) added 

// now add the "typical" fields; nothing surprising here: 
document.add(makeField(LENGTH_FIELD, Integer.toString(_docLength)));
document.add(makeField(NAME_FIELD, _documentReference));
document.add(makeField(DATE_FIELD, new Date().toString()));

... 
        protected Field makeField(final String name, final String value) {
                return new Field(name, value, Field.Store.YES, 
Field.Index.NOT_ANALYZED);
        }

</pseudo-code>

I don't *think* there's any problem there, and the token-stream that I create 
seems to behave as expected. 

The query that I post for a test is one that is simple and is satisfied by most 
of the documents in my small corpus (of ten documents). 

BUT .. when I post that query, the documents that are returned NEVER contain 
the additional field added above under "some condition". Let me repeat: I have 
verified that the Field *IS* added in the pseudo-code above. 

Here is the code I use to post a query and retrieve Documents: 
<pseudo-code>
IndexReader _anIndexReader = 
IndexReader.open(getFileRepresentingIndexDirectory());
IndexSearcher _anIndexSearcher = new IndexSearcher(_anIndexReader );

TopFieldDocs tfd = _anIndexSearcher.search(someQuery, null, 1000, new Sort()); 
int length = tfd.totalHits; 

for (int i = 0; i < length; ++i) {
        int docIndex = tfd.scoreDocs[i].doc;
        Document doc = _anIndexReader.document(docIndex); 
        String name = doc.get(NAME_FIELD); // same as in first block of 
pseudo-code 

        System.err.println("The document " + name + " has these Fields:");
        for (final Object item : doc.getFields()) {
                Field field = (Field) item; 
                System.err.println("\t" + field.name());
        }
        // only ever has the "typical" Fields from first block of pseudo-code
}
</pseudo-code>


Thus it is clear that either the Lucene-index strips the extra Field (why?) or 
that the Documents that are returned are somehow newly-created and lack the 
extra field, but contain all the "typical" Fields (why?) 

Any ideas? 

thanks,
Paul 





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

why would a Field *vanish* from a Document?

Reply via email to

why would a Field vanish from a Document?