[ 
https://issues.apache.org/jira/browse/LUCENE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-7851.
---------------------------------
    Resolution: Not A Problem

lookupTerm has nothing to do with setDocument, because it looks for a term in 
the "term dictionary" and returns the ordinal for that term. That term 
dictionary is shared across the whole segment.

example docs and values:

||doc||value(s)||
|0|dog|
|1|cat|
|2|dog,cat,cat|

this is what the docvalues look like:
||doc||ords||
|0|1|
|1|0|
|2|0,1*|

* as you see, this field type loses both the original order (it was dog, cat) 
and frequency (cat was there twice) because it's a SortedSet.

this is what the "term dictionary" looks like:
||ord||term||
|0|cat|
|1|dog|

lookupTerm(dog) is always 1, regardless of which document in the segment it's 
in. It's 1 because it sorts after "cat". The values are deduplicated across all 
documents in the segment in this way.
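
To make the distinction concrete, here is a small plain-Java model of the tables above. It is only a sketch and not Lucene code: the class SortedSetModel and the helper docHasTerm are hypothetical names invented for illustration. It shows that the term lookup is segment-wide, and that asking whether a particular document contains a term means comparing that ord against the document's own ords.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Toy model of one segment's sorted-set field (illustration only, not Lucene code).
public class SortedSetModel {
    private final List<String> termDict; // sorted, deduplicated terms; list index == ord
    private final long[][] docOrds;      // per document: sorted, deduplicated ords

    public SortedSetModel(String[][] docs) {
        // Build one term dictionary shared by every document in the segment.
        TreeSet<String> terms = new TreeSet<>();
        for (String[] values : docs) {
            terms.addAll(Arrays.asList(values));
        }
        termDict = new ArrayList<>(terms);

        // Replace each document's values by their ords; order and frequency are lost.
        docOrds = new long[docs.length][];
        for (int doc = 0; doc < docs.length; doc++) {
            docOrds[doc] = Arrays.stream(docs[doc])
                    .mapToLong(this::lookupTerm)
                    .distinct()
                    .sorted()
                    .toArray();
        }
    }

    // Segment-wide lookup: takes no document, so it cannot depend on one.
    // Returns a negative value (binary-search convention) when the term is absent.
    public long lookupTerm(String term) {
        return Collections.binarySearch(termDict, term);
    }

    // The ords stored for a single document.
    public long[] ords(int doc) {
        return docOrds[doc].clone();
    }

    // Per-document question: is the term's ord among this document's ords?
    public boolean docHasTerm(int doc, String term) {
        long ord = lookupTerm(term);
        return ord >= 0 && Arrays.binarySearch(docOrds[doc], ord) >= 0;
    }

    public static void main(String[] args) {
        SortedSetModel segment = new SortedSetModel(new String[][] {
            {"dog"}, {"cat"}, {"dog", "cat", "cat"}
        });
        System.out.println("lookupTerm(dog) = " + segment.lookupTerm("dog"));
        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc " + doc + " ords = "
                    + Arrays.toString(segment.ords(doc))
                    + ", has dog: " + segment.docHasTerm(doc, "dog"));
        }
    }
}
```

In this model, as in the tables above, "dog" resolves to ord 1 no matter which document is being examined. Only the per-document ord list differs from one document to the next.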

> Lucene54DocValuesProducer#getSortedSetTable lookupTerm does not honor 
> setDocument
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-7851
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7851
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Vesa Pirila
>
> I'm having a problem with the lookupTerm method of the anonymous 
> RandomAccessOrds class returned by 
> Lucene54DocValuesProducer#getSortedSetTable(). It does not seem to honor 
> setDocument. It returns the same ord every time regardless of my calling 
> setDocument with different arguments.
> To reproduce:
> I have two documents with a multi-valued string field "strfield". Both have a 
> single value "a". I have a custom class that extends FieldCacheSource. This 
> is obviously just a dummy, but it's the simplest way I know to reproduce the 
> problem.
> {code:java}
> public class MyValueSource extends FieldCacheSource {
>   public MyValueSource(String field) {
>     super(field);
>   }
>   @Override
>   public FunctionValues getValues(Map map, LeafReaderContext readerContext) 
> throws IOException {
>     SortedSetDocValues dvs = DocValues.getSortedSet(readerContext.reader(), 
> FieldNames.PARENTS_DATES);
>     dvs.setDocument(0);
>     long zeroOrd = dvs.lookupTerm(new BytesRef("a"));
>     dvs.setDocument(1);
>     long oneOrd = dvs.lookupTerm(new BytesRef("a"));
>     assert(zeroOrd != oneOrd); // FAILS. The same ord is always returned.
>     return new LongDocValues(this) {
>       @Override
>       public long longVal(int doc) {
>         return 0;
>       }
>     };
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

