I'm copying this reply from a topic with the same title from the defunct
'lucene-user' list. My comments follow it.
: I thought of putting empty strings instead of null values but I think
: empty strings are put first in the list while sorting which is the
: reverse of what anyone would want.
instead of adding a field with a null value, or value of an empty string,
why not just leave the field out for that/those doc(s)?
there's no requirement that every doc in your index has to have the exact
same set of fields.
If i remember correctly (you'll have to test this) sorting on a field
which doesn't exist for every doc does what you would want (docs with
values are listed before docs without)
-Hoss
The actual behavior differs from what is described above. I modified
TestSort.java:
    // test sorts where the type of field is specified
    public void testTypedSort() throws Exception {
        assertMatches(full, queryF, sort, "JIZ");
    }
The actual order of the results is "ZJI", not "JIZ". I believe this happens
because the field string cache 'order' array contains 0's for all the
documents that don't contain the field, so those documents sort first.
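
To make the suspected mechanism concrete, here is a minimal sketch in plain Java. It does not use Lucene's actual cache classes: the class name, the method names, and the field values ("b", "a") are made up for illustration, and it assumes that a doc without the field simply keeps the array's default ordinal of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of an ordinal-based string sort; not Lucene code.
class MissingFieldSortSketch {

    // Builds a per-document ordinal array. Ordinal 0 means "no value";
    // 1..n is the rank of the doc's value among the sorted distinct values.
    static int[] buildOrder(String[] values) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : values) {
            if (v != null) {
                distinct.add(v);
            }
        }
        List<String> lookup = new ArrayList<>(distinct);
        int[] order = new int[values.length]; // Java zero-fills new int arrays
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] != null) {
                order[doc] = lookup.indexOf(values[doc]) + 1;
            }
        }
        return order;
    }

    // Sorts doc ids ascending by ordinal and concatenates them.
    // Arrays.sort on objects is stable, so ties keep index order.
    static String sortedIds(String[] ids, String[] values) {
        int[] order = buildOrder(values);
        Integer[] docs = new Integer[ids.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, (a, b) -> Integer.compare(order[a], order[b]));
        StringBuilder sb = new StringBuilder();
        for (int doc : docs) {
            sb.append(ids[doc]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed data: doc Z has no value for the sort field.
        String[] ids = {"I", "J", "Z"};
        String[] values = {"b", "a", null};
        System.out.println(sortedIds(ids, values)); // prints "ZJI"
    }
}
```

With these assumed values the doc lacking the field gets ordinal 0 and comes out first, which matches the order I observed.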
Suppose I want to exclude documents from being collected if they don't
contain the sort field. One way to do this is to index a unique
'empty_value' value for those documents and add a MUST_NOT boolean clause to
the query, for example: "<query terms> -field:empty_value". But this seems
inefficient. Is there a better way?
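
For clarity, this is roughly the workaround I have in mind, sketched in plain Java with no Lucene classes. The class and method names are hypothetical, and the collect() method only models the effect of the prohibited clause on an in-memory list; it is not how a real search would run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the 'empty_value' workaround; not Lucene code.
class EmptyValueSketch {

    static final String EMPTY_VALUE = "empty_value";

    // Index time: substitute the sentinel when a doc has no value for the field.
    static String indexedValue(String rawValue) {
        return (rawValue == null || rawValue.isEmpty()) ? EMPTY_VALUE : rawValue;
    }

    // Query time: append a prohibited clause on the sentinel to the user's query.
    static String excludeEmpty(String queryTerms, String field) {
        return queryTerms + " -" + field + ":" + EMPTY_VALUE;
    }

    // Models what the prohibited clause does: drop docs indexed with the sentinel.
    static List<String> collect(String[] ids, String[] indexedValues) {
        List<String> hits = new ArrayList<>();
        for (int doc = 0; doc < ids.length; doc++) {
            if (!EMPTY_VALUE.equals(indexedValues[doc])) {
                hits.add(ids[doc]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] ids = {"I", "J", "Z"};
        String[] raw = {"b", "a", null}; // assumed data: doc Z has no value
        String[] indexed = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            indexed[i] = indexedValue(raw[i]);
        }
        System.out.println(excludeEmpty("<query terms>", "field"));
        System.out.println(collect(ids, indexed)); // prints "[I, J]"
    }
}
```

The cost I'm worried about is that the extra clause has to be evaluated on every query, and every doc without a real value needs the sentinel term indexed.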
Thanks,
Peter