RE: Error: there are more terms than documents...

Bill.Chesky Thu, 23 Apr 2009 12:41:49 -0700

Sorry for that terrible formatting.  Let me try again.
==========================================================
Hello,


I'm getting a strange error when I run a Lucene (2.2.0) query:

java.lang.RuntimeException: there are more terms than documents in field
"objectId", but it's impossible to sort on tokenized fields

The strange thing is that I've read the javadoc for the Sort object
where it says:

====
The fields used to determine sort order must be carefully chosen.
Documents must contain a single term in such a field, and the value of
the term should indicate the document's relative position in a given
sort order. The field must be indexed, but should not be tokenized, and
does not need to be stored (unless you happen to want it back with the
rest of your document data). In other words: 

document.add (new Field ("byNumber", Integer.toString(x),
Field.Store.NO, Field.Index.UN_TOKENIZED));
====

Therefore when I create my "objectId" field in my document I use the
call:

doc.add(new Field("objectId", s.getObjectId(), Field.Store.NO,
Field.Index.UN_TOKENIZED));

Note: s.getObjectId() returns a String.

After the index is created and I print out a typical document (using the
Document.toString() method) I get this:

Document<stored/uncompressed,indexed<id:1146513>
stored/uncompressed,indexed<_hibernate_class:com.mycompany.metadb.orm.Series>
indexed<RestrictionLevel:1>
indexed,tokenized<keywords:com.mycompany.metadbsync.index.seriestokenstr...@134ab4e>
indexed,tokenized<characteristics:com.mycompany.metadbsync.index.characteristictokenstr...@daa825>
indexed<objectId:DF.SES.AA.derek.Public_01>
indexed<Name:Public 01>
indexed<UserID:derek>
indexed<Data Class:Defined Formulas>
indexed<Location:AA>
indexed<Client:SES>
indexed<DIM1:DF>
indexed<DIM2:SES>
indexed<DIM3:AA>
indexed<DIM4:derek>
indexed<DIM5:Public_01>
indexed<Type:Formula>>

So it looks like it got created correctly.
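
Read literally, the error message says the field holds more distinct
terms than there are documents. A rough plain-Java model of that
condition follows. It is only a sketch and not Lucene code: the class
and method names are made up for illustration, "Public_02" is a made-up
second ID, and the tokenized case assumes the value is split on '.'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Lucene code: each inner list holds the terms
// that one document contributes to a single field.
class SortFieldCheck {

    // Number of distinct terms across all documents for the field.
    static int distinctTerms(List<List<String>> termsPerDoc) {
        Set<String> terms = new HashSet<String>();
        for (List<String> docTerms : termsPerDoc) {
            terms.addAll(docTerms);
        }
        return terms.size();
    }

    // True when the field has more distinct terms than documents,
    // which is the situation the error message describes.
    static boolean moreTermsThanDocs(List<List<String>> termsPerDoc) {
        return distinctTerms(termsPerDoc) > termsPerDoc.size();
    }

    public static void main(String[] args) {
        // Untokenized: exactly one term per document.
        List<List<String>> untokenized = new ArrayList<List<String>>();
        untokenized.add(Arrays.asList("DF.SES.AA.derek.Public_01"));
        untokenized.add(Arrays.asList("DF.SES.AA.derek.Public_02"));

        // Tokenized (assumed split on '.'): several terms per document.
        List<List<String>> tokenized = new ArrayList<List<String>>();
        tokenized.add(Arrays.asList("DF", "SES", "AA", "derek", "Public_01"));
        tokenized.add(Arrays.asList("DF", "SES", "AA", "derek", "Public_02"));

        System.out.println(moreTermsThanDocs(untokenized));
        System.out.println(moreTermsThanDocs(tokenized));
    }
}
```

Against a real index the counts would have to come from the index itself
(for example by walking the field's terms with IndexReader.terms() and
comparing with the document count) rather than from in-memory lists.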

For what it's worth the query call looks like this:

Hits hits = seriesIndexSearcher.search(query, new Sort("objectId"));

The actual query is a BooleanQuery with lots of TermQuery clauses and
sub-clauses.  The term queries are against several of the other fields
shown above, including some of the tokenized fields.

Any help appreciated.

regards,

Bill Chesky

PS. Just as an aside, what does it mean for a field to be stored or not
stored?  Looking at the output above, the 'id' field is stored and the
'objectId' field is not, yet both of them get displayed by the
Document.toString() method.  So even the objectId field got "stored", at
least in the sense that I understand the term (otherwise how did it get
displayed?).  I'm obviously missing something about what "stored" means
in the Lucene context.

 



