Hello,
I'm witnessing a change in behavior between Lucene 4.9 and 5.4.1 that I
don't quite understand.
I'd like to track down what's happening under the hood. I'm working to
update the dependencies of an open source geospatial resolution tool,
CLAVIN (https://github.com/Berico-Technologies/CLAVIN), which uses Lucene.
I've indexed the geonames.org database using both Lucene 4.9 and 5.4.1. We
index the population of each city so that query results can be sorted on it.
When running a fuzzy query "bostn~" with Occur.MUST in 4.9, we get the
expected result of Boston, where 6793534 is a boosted population. Here is
the scoreDoc.toString():
    Boston: doc=19586055 score=NaN shardIndex=-1 fields=[2.971942, 6793534]
However, using 5.4.1, the same fuzzy query with Occur.MUST returns
"Basti Bosan" and "Boston Basin", both of which have a population of zero,
before returning Boston:
    Basti Bosan: doc=11707183 score=NaN shardIndex=0 fields=[1.5721874, 0]
    Boston Basin: doc=12728320 score=NaN shardIndex=0 fields=[1.5721874, 0]
    Boston: doc=17515475 score=NaN shardIndex=0 fields=[1.4374285, 6793534]
I'm wondering whether something in the FIELD_SCORE calculation changed
between 4.9 and 5.4.1, or whether I've done something incorrect in building
the index.
It's worth mentioning that for this test I built an index with both 4.9
and 5.4.1 from the same geonames database to ensure consistency. Also, the
sort is set up the same way in both versions:
    private static final Sort POPULATION_SORT = new Sort(new SortField[] {
        SortField.FIELD_SCORE,
        new SortedNumericSortField(SORT_POP.key(), SortField.Type.LONG, true)
    });
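For illustration, here is a minimal plain-Java sketch (no Lucene dependency; Hit and PopulationSortSketch are hypothetical names, not part of CLAVIN or Lucene) of how a score-first, population-second ordering arranges the three 5.4.1 hits quoted above. It assumes the score key sorts highest-first and that ties fall back to population, highest-first. Under that assumption Boston stays last, because its score field (1.4374285) is lower than the other two (1.5721874), so population never gets to act as a tie-breaker:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one search hit: name, score field, population field.
final class Hit {
    final String name;
    final float score;
    final long population;

    Hit(String name, float score, long population) {
        this.name = name;
        this.score = score;
        this.population = population;
    }
}

final class PopulationSortSketch {
    // Primary key: score, highest first (assumed to mirror SortField.FIELD_SCORE).
    // Secondary key: population, highest first (mirrors reverse = true above).
    static final Comparator<Hit> SCORE_THEN_POPULATION =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(
                            Comparator.comparingLong((Hit h) -> h.population).reversed());

    // Returns the hit names in sorted order without modifying the input list.
    static List<String> sortedNames(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(SCORE_THEN_POPULATION); // stable sort: full ties keep input order
        List<String> names = new ArrayList<>();
        for (Hit h : copy) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Score and population values copied from the 5.4.1 output above.
        List<Hit> hits = Arrays.asList(
                new Hit("Boston", 1.4374285f, 6793534L),
                new Hit("Basti Bosan", 1.5721874f, 0L),
                new Hit("Boston Basin", 1.5721874f, 0L));
        System.out.println(sortedNames(hits));
        // prints [Basti Bosan, Boston Basin, Boston]
    }
}
```

If that reading is right, the sort itself behaves the same in both versions, and the difference lies in the score computed for each hit.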
With regard to building the index, in 4.9, we added the population sort
field to the index like so:
    doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
        Field.Store.YES));
Because 5.4.1 no longer lets you sort on a field whose DocValuesType is
NONE, we now add it like this:
    doc.add(new LongField(SORT_POP.key(), geoName.getPopulation(),
        LONG_FIELD_TYPE_STORED_SORTED));
where LONG_FIELD_TYPE_STORED_SORTED is:
    private static final FieldType LONG_FIELD_TYPE_STORED_SORTED = new FieldType();
    static {
        LONG_FIELD_TYPE_STORED_SORTED.setTokenized(false);
        LONG_FIELD_TYPE_STORED_SORTED.setOmitNorms(true);
        LONG_FIELD_TYPE_STORED_SORTED.setIndexOptions(IndexOptions.DOCS);
        LONG_FIELD_TYPE_STORED_SORTED.setNumericType(FieldType.NumericType.LONG);
        LONG_FIELD_TYPE_STORED_SORTED.setStored(true);
        LONG_FIELD_TYPE_STORED_SORTED.setDocValuesType(DocValuesType.NUMERIC);
        LONG_FIELD_TYPE_STORED_SORTED.freeze();
    }
I would greatly appreciate any insights here, and I'm happy to answer
questions to unravel this a bit more. Thank you for your time!
V/r,
Jeremy