[ https://issues.apache.org/jira/browse/LUCENE-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-7851.
---------------------------------

    Resolution: Not A Problem

lookupTerm has nothing to do with setDocument: it looks up a term in the "term dictionary" and returns the ordinal for that term. That term dictionary is shared across the whole segment.

Example docs and values:

||doc||value(s)||
|0|dog|
|1|cat|
|2|dog,cat,cat|

This is what the docvalues look like:

||doc||ords||comment||
|0|1|
|1|0|
|2|0,1*|

* As you can see, this field type loses both the original order (it was dog, cat) and the frequency (cat was there twice), because it's a SortedSet.

This is what the "term dictionary" looks like:

||ord||term||
|0|cat|
|1|dog|

lookupTerm(dog) is always 1, regardless of which document in the segment it's in. It's 1 because "dog" sorts after "cat". The values are deduplicated across all documents in the segment in this way.

> Lucene54DocValuesProducer#getSortedSetTable lookupTerm does not honor setDocument
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-7851
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7851
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Vesa Pirila
>
> I'm having a problem with the lookupTerm method of the anonymous RandomAccessOrds class returned by Lucene54DocValuesProducer#getSortedSetTable(). It does not seem to honor setDocument. It returns the same ord every time regardless of my calling setDocument with different arguments.
>
> To reproduce:
> I have two documents with a multi-valued string field "strfield". Both have a single value "a". I have a custom class that extends FieldCacheSource. This is obviously just a dummy, but it's the simplest way I know to reproduce the problem.
> {code:java}
> public class MyValueSource extends FieldCacheSource {
>     public MyValueSource(String field) {
>         super(field);
>     }
>
>     @Override
>     public FunctionValues getValues(Map map, LeafReaderContext readerContext) throws IOException {
>         SortedSetDocValues dvs = DocValues.getSortedSet(readerContext.reader(), FieldNames.PARENTS_DATES);
>         dvs.setDocument(0);
>         long zeroOrd = dvs.lookupTerm(new BytesRef("a"));
>         dvs.setDocument(1);
>         long oneOrd = dvs.lookupTerm(new BytesRef("a"));
>         assert(zeroOrd != oneOrd); // FAILS. The same ord is always returned.
>         return new LongDocValues(this) {
>             @Override
>             public long longVal(int doc) {
>                 return 0;
>             }
>         };
>     }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
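
The distinction drawn in the resolution (one sorted, deduplicated term dictionary per segment, plus a sorted set of ordinals per document) can be sketched in plain Java. This is only an illustrative model built on JDK collections, not Lucene's implementation: the class SortedSetModel and its methods ordsForDoc and docHasTerm are hypothetical names, and only lookupTerm mirrors the name of the real method. It uses the dog/cat example from the tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical plain-Java model of a SortedSet doc-values field; not Lucene code. */
public class SortedSetModel {
    // Segment-wide term dictionary: unique terms in sorted order, index == ordinal.
    private final List<String> termDict;
    // Per-document ordinals: sorted and deduplicated, so order and frequency are lost.
    private final List<long[]> docOrds = new ArrayList<>();

    public SortedSetModel(List<List<String>> docs) {
        TreeSet<String> unique = new TreeSet<>();
        for (List<String> values : docs) {
            unique.addAll(values);
        }
        termDict = new ArrayList<>(unique);
        for (List<String> values : docs) {
            // A TreeSet iterates in sorted term order, which is also ordinal order.
            docOrds.add(new TreeSet<>(values).stream().mapToLong(this::lookupTerm).toArray());
        }
    }

    /** Depends only on the dictionary, never on a "current document"; negative if absent. */
    public long lookupTerm(String term) {
        return Collections.binarySearch(termDict, term);
    }

    /** The ordinals stored for one document. */
    public long[] ordsForDoc(int doc) {
        return docOrds.get(doc).clone();
    }

    /** Per-document question: is the term's ordinal among this document's ordinals? */
    public boolean docHasTerm(int doc, String term) {
        long ord = lookupTerm(term);
        return ord >= 0 && Arrays.binarySearch(docOrds.get(doc), ord) >= 0;
    }

    public static void main(String[] args) {
        SortedSetModel model = new SortedSetModel(Arrays.asList(
                Arrays.asList("dog"),
                Arrays.asList("cat"),
                Arrays.asList("dog", "cat", "cat")));
        System.out.println(model.lookupTerm("dog"));                 // 1
        System.out.println(Arrays.toString(model.ordsForDoc(2)));    // [0, 1]
        System.out.println(model.docHasTerm(1, "dog"));              // false
    }
}
```

In this model, the reporter's assertion fails for the same reason it fails against Lucene: two lookups of the same term go to the same dictionary and return the same ordinal. The per-document question is answered by comparing that ordinal against the document's own ordinals, which is what docHasTerm does here.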