[
https://issues.apache.org/jira/browse/LUCENE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir resolved LUCENE-7851.
---------------------------------
Resolution: Not A Problem
lookupTerm has nothing to do with setDocument, because it looks for a term in
the "term dictionary" and returns the ordinal for that term. That term
dictionary is shared across the whole segment.
example docs and values:
||doc||value(s)||
|0|dog|
|1|cat|
|2|dog,cat,cat|
this is what the docvalues look like (ords per document):
||doc||ords||
|0|1|
|1|0|
|2|0,1*|
* as you see, this fieldtype loses both the original order (it was dog, cat)
and frequency (cat was there twice) because it's a SortedSet.
this is what the "term dictionary" looks like:
||ord||term||
|0|cat|
|1|dog|
lookupTerm(dog) is always 1, regardless of which document in the segment it's
in. It's 1 because "dog" sorts after "cat". The values are deduplicated across
all documents in the segment in this way.
> Lucene54DocValuesProducer#getSortedSetTable lookupTerm does not honor
> setDocument
> ---------------------------------------------------------------------------------
>
> Key: LUCENE-7851
> URL: https://issues.apache.org/jira/browse/LUCENE-7851
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Vesa Pirila
>
> I'm having a problem with the lookupTerm method of the anonymous
> RandomAccessOrds class returned by
> Lucene54DocValuesProducer#getSortedSetTable(). It does not seem to honor
> setDocument. It returns the same ord every time regardless of my calling
> setDocument with different arguments.
> To reproduce:
> I have two documents with a multi-valued string field "strfield". Both have a
> single value "a". I have a custom class that extends FieldCacheSource. This
> is obviously just a dummy, but it's the simplest way I know to reproduce the
> problem.
> {code:java}
> public class MyValueSource extends FieldCacheSource {
>   public MyValueSource(String field) {
>     super(field);
>   }
>   @Override
>   public FunctionValues getValues(Map map, LeafReaderContext readerContext)
>       throws IOException {
>     SortedSetDocValues dvs = DocValues.getSortedSet(readerContext.reader(),
>         FieldNames.PARENTS_DATES);
>     dvs.setDocument(0);
>     long zeroOrd = dvs.lookupTerm(new BytesRef("a"));
>     dvs.setDocument(1);
>     long oneOrd = dvs.lookupTerm(new BytesRef("a"));
>     assert(zeroOrd != oneOrd); // FAILS. The same ord is always returned.
>     return new LongDocValues(this) {
>       @Override
>       public long longVal(int doc) {
>         return 0;
>       }
>     };
>   }
> }
> {code}