Re: get distinct values from indexreader for given field

vvsevel Tue, 05 Dec 2023 10:55:14 -0800

Thanks Michael

On Tue, Nov 28, 2023 at 11:45 PM Michael Froh <[email protected]> wrote:


> Oh -- of course if you're using IntPoint / LongPoint for your numeric
> fields, they won't be indexed as terms, so loading terms for them won't
> work.
>
> It's not the prettiest solution, but I think the following should let you
> collect the set of distinct point values for an IntPoint field:
>
>
>     final Set<Integer> collectedValues = new TreeSet<>();
>     for (LeafReaderContext lrc : reader.leaves()) {
>         LeafReader lr = lrc.reader();
>         PointValues.IntersectVisitor collectingVisitor =
>                 new PointValues.IntersectVisitor() {
>             @Override
>             public void visit(int docID) throws IOException {
>                 // Not called: compare() never returns CELL_INSIDE_QUERY.
>             }
>
>             @Override
>             public void visit(int docID, byte[] packedValue) {
>                 collectedValues.add(IntPoint.decodeDimension(packedValue, 0));
>             }
>
>             @Override
>             public PointValues.Relation compare(byte[] minPackedValue,
>                     byte[] maxPackedValue) {
>                 // Always "crosses" so every value is passed to visit().
>                 return PointValues.Relation.CELL_CROSSES_QUERY;
>             }
>         };
>
>         // getPointValues returns null when this segment has no points
>         // for the field, so guard before intersecting.
>         PointValues pointValues = lr.getPointValues(fieldname);
>         if (pointValues != null) {
>             pointValues.intersect(collectingVisitor);
>         }
>     }
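
For anyone wondering what IntPoint.decodeDimension(packedValue, 0) in the snippet above has to undo: as far as I understand it, each int is stored as four big-endian bytes with the sign bit flipped, so that comparing the bytes as unsigned values gives the same order as comparing the ints. The sketch below reproduces that layout with the JDK only. It is not Lucene's code, and the names SortableIntBytes, encode, decode and distinctValues are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the sortable-bytes layout; not a Lucene class.
final class SortableIntBytes {

    // Flip the sign bit, then write big-endian, so that unsigned byte
    // order matches signed numeric order.
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Read four big-endian bytes starting at offset and flip the sign bit back.
    static int decode(byte[] packed, int offset) {
        int flipped = ((packed[offset] & 0xFF) << 24)
                | ((packed[offset + 1] & 0xFF) << 16)
                | ((packed[offset + 2] & 0xFF) << 8)
                | (packed[offset + 3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    // Mirrors the collecting step of the visitor: decode every packed
    // value into a sorted set, which drops duplicates.
    static Set<Integer> distinctValues(List<byte[]> packedValues) {
        Set<Integer> distinct = new TreeSet<>();
        for (byte[] packed : packedValues) {
            distinct.add(decode(packed, 0));
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<byte[]> packedValues = new ArrayList<>();
        packedValues.add(encode(3));
        packedValues.add(encode(-1));
        packedValues.add(encode(3));
        System.out.println(distinctValues(packedValues));
    }
}
```

In real code the decoding should stay with IntPoint.decodeDimension; the sketch is only meant to show why a TreeSet of decoded values comes out in numeric order and without duplicates.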
>
>
>
> On Tue, Nov 28, 2023 at 1:42 PM Michael Froh <[email protected]> wrote:
>
> > Hello!
> >
> > Instead of MultiFields.getFields(), you can use
> > MultiTerms.getTerms(reader, fieldname) to get the Terms instance.
> >
> > To decode your long / int values, you should be able to use
> > LongPoint/IntPoint.unpack to write the values into an array:
> >
> > long[] val = new long[1]; // Assuming 1-D values
> > LongPoint.unpack(value, 0, val);
> > values.add(val[0]);
> >
> > Hope that helps,
> > Froh
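
On the "Assuming 1-D values" comment in the LongPoint.unpack snippet above: my understanding is that a packed point is just its dimensions laid out back to back, eight bytes per long, each in the same sign-flipped big-endian form. The JDK-only sketch below shows that layout. It is an illustration under that assumption, not Lucene's implementation, and SortableLongBytes with its methods is a hypothetical helper.

```java
// Hypothetical stand-in for a packed multi-dimensional long point; not a Lucene class.
final class SortableLongBytes {

    // Write one dimension: flip the sign bit, then store big-endian.
    static void encodeDimension(long value, byte[] dest, int offset) {
        long flipped = value ^ Long.MIN_VALUE;
        for (int i = 0; i < Long.BYTES; i++) {
            dest[offset + i] = (byte) (flipped >>> (56 - 8 * i));
        }
    }

    // Read one dimension back from eight bytes starting at offset.
    static long decodeDimension(byte[] packed, int offset) {
        long flipped = 0L;
        for (int i = 0; i < Long.BYTES; i++) {
            flipped = (flipped << 8) | (packed[offset + i] & 0xFFL);
        }
        return flipped ^ Long.MIN_VALUE;
    }

    // Lay the dimensions out back to back, Long.BYTES bytes each.
    static byte[] pack(long... point) {
        byte[] packed = new byte[point.length * Long.BYTES];
        for (int dim = 0; dim < point.length; dim++) {
            encodeDimension(point[dim], packed, dim * Long.BYTES);
        }
        return packed;
    }

    // Fill buf with as many dimensions as it has slots, starting at start.
    static void unpack(byte[] packed, int start, long[] buf) {
        for (int dim = 0; dim < buf.length; dim++) {
            buf[dim] = decodeDimension(packed, start + dim * Long.BYTES);
        }
    }

    public static void main(String[] args) {
        byte[] packed = pack(-42L, 1700000000000L);
        long[] val = new long[2]; // two dimensions in this example
        unpack(packed, 0, val);
        System.out.println(val[0] + ", " + val[1]);
    }
}
```

With a one-element buffer this reduces to the 1-D case from the quoted snippet: only the first eight bytes are read.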
> >
> >
> > On Wed, Nov 22, 2023 at 11:09 AM <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> In Lucene 6 I was doing this to get all values for a given field
> >> knowing its type:
> >>
> >> public List<Object> getDistinctValues(IndexReader reader,
> >>         String fieldname, Class<? extends Object> type)
> >>         throws IOException {
> >>
> >>     List<Object> values = new ArrayList<Object>();
> >>     Fields fields = MultiFields.getFields(reader);
> >>     if (fields == null) return values;
> >>
> >>     Terms terms = fields.terms(fieldname);
> >>     if (terms == null) return values;
> >>
> >>     TermsEnum iterator = terms.iterator();
> >>
> >>     BytesRef value = iterator.next();
> >>
> >>     while (value != null) {
> >>         if (type == Long.class) {
> >>             values.add(LegacyNumericUtils.prefixCodedToLong(value));
> >>         } else if (type == Integer.class) {
> >>             values.add(LegacyNumericUtils.prefixCodedToInt(value));
> >>         } else if (type == Boolean.class) {
> >>             values.add(LegacyNumericUtils.prefixCodedToInt(value) == 1
> >>                     ? TRUE : FALSE);
> >>         } else if (type == Date.class) {
> >>             values.add(new Date(
> >>                     LegacyNumericUtils.prefixCodedToLong(value)));
> >>         } else if (type == String.class) {
> >>             values.add(value.utf8ToString());
> >>         } else {
> >>             // ...
> >>         }
> >>
> >>         value = iterator.next();
> >>     }
> >>
> >>     return values;
> >> }
> >>
> >> I am trying to upgrade to Lucene 9.
> >> There were two changes over time:
> >> - LegacyNumericUtils has been removed in favor of PointBase
> >> - MultiFields.getFields() has been dropped, and I read that we are
> >> encouraged to avoid fields in general
> >>
> >> What is the proper way to get the distinct values for a specific
> >> field in a reader?
> >>
> >> Thanks for your help,
> >>
> >> vs
> >>
> >
>
