Re: get distinct values from indexreader for given field

Michael Froh Tue, 28 Nov 2023 14:45:29 -0800

Oh, of course: if you're using IntPoint / LongPoint for your numeric
fields, they won't be indexed as terms (points are stored in a separate
BKD tree structure), so loading terms for them won't work.


It's not the prettiest solution, but I think the following should let you
collect the set of distinct point values for an IntPoint field:


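                final Set<Integer> collectedValues = new TreeSet<>();
                for (LeafReaderContext lrc : reader.leaves()) {
                    LeafReader lr = lrc.reader();
                    PointValues pointValues = lr.getPointValues(fieldname);
                    if (pointValues == null) {
                        // This segment has no points for the field; skip it.
                        continue;
                    }
                    PointValues.IntersectVisitor collectingVisitor =
                        new PointValues.IntersectVisitor() {
                            @Override
                            public void visit(int docID) throws IOException {
                                // Doc-only callback: this visitor needs the
                                // packed values, so there is nothing to do here.
                            }

                            @Override
                            public void visit(int docID, byte[] packedValue) {
                                // 1-D IntPoint: dimension 0 starts at offset 0.
                                collectedValues.add(IntPoint.decodeDimension(packedValue, 0));
                            }

                            @Override
                            public PointValues.Relation compare(byte[] minPackedValue,
                                                                byte[] maxPackedValue) {
                                // Treat every cell as "crossing" so each point is visited.
                                return PointValues.Relation.CELL_CROSSES_QUERY;
                            }
                        };
                    pointValues.intersect(collectingVisitor);
                }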
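In case it helps to see what IntPoint.decodeDimension is undoing: IntPoint
stores each dimension as 4 big-endian bytes with the sign bit flipped (the
encoding in NumericUtils.intToSortableBytes), so unsigned byte order matches
int order. The plain-Java sketch below re-implements that encoding with
hypothetical helper names (not Lucene API, no Lucene dependency) and mimics
what the visitor above does: decode every visited packed value and let the
TreeSet drop duplicates.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SortableIntSketch {

    // Hypothetical stand-in for NumericUtils.intToSortableBytes: flip the sign bit,
    // then write big-endian so unsigned byte order matches signed int order.
    static void encode(int value, byte[] dest, int offset) {
        int sortable = value ^ 0x80000000;
        dest[offset] = (byte) (sortable >>> 24);
        dest[offset + 1] = (byte) (sortable >>> 16);
        dest[offset + 2] = (byte) (sortable >>> 8);
        dest[offset + 3] = (byte) sortable;
    }

    // Hypothetical stand-in for IntPoint.decodeDimension(packedValue, offset).
    static int decode(byte[] packed, int offset) {
        int sortable = ((packed[offset] & 0xFF) << 24)
                | ((packed[offset + 1] & 0xFF) << 16)
                | ((packed[offset + 2] & 0xFF) << 8)
                | (packed[offset + 3] & 0xFF);
        return sortable ^ 0x80000000;
    }

    // Packs a single 1-D value, as a 1-D IntPoint would.
    static byte[] pack(int value) {
        byte[] b = new byte[Integer.BYTES];
        encode(value, b, 0);
        return b;
    }

    // Mimics the visitor: each visited point is decoded and added to a sorted set,
    // so a value shared by several documents is kept only once.
    static Set<Integer> collectDistinct(List<byte[]> visitedPackedValues) {
        Set<Integer> collected = new TreeSet<>();
        for (byte[] packed : visitedPackedValues) {
            collected.add(decode(packed, 0));
        }
        return collected;
    }

    public static void main(String[] args) {
        // Four "documents", two of which share the value 7.
        List<byte[]> visited = List.of(pack(7), pack(-3), pack(7), pack(42));
        System.out.println(collectDistinct(visited)); // prints [-3, 7, 42]
    }
}
```

With Lucene itself you would of course call IntPoint.decodeDimension rather
than a hand-rolled decoder; the sketch only shows the byte layout.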



On Tue, Nov 28, 2023 at 1:42 PM Michael Froh <msf...@gmail.com> wrote:

> Hello!
>
> Instead of MultiFields.getFields(), you can use
> MultiTerms.getTerms(reader, fieldname) to get the Terms instance.
>
> To decode your long / int values, you should be able to use
> LongPoint/IntPoint.unpack to write the values into an array:
>
> long[] val = new long[1]; // Assuming 1-D values
> LongPoint.unpack(value, 0, val);
> values.add(val[0]);
>
> Hope that helps,
> Froh
>
>
> On Wed, Nov 22, 2023 at 11:09 AM <vvse...@gmail.com> wrote:
>
>> Hello,
>>
>> In Lucene 6 I was doing this to get all values for a given field
>> knowing its type:
>>
>> public List<Object> getDistinctValues(IndexReader reader, String fieldname,
>>         Class<? extends Object> type) throws IOException {
>>
>>     List<Object> values = new ArrayList<Object>();
>>     Fields fields = MultiFields.getFields(reader);
>>     if (fields == null) return values;
>>
>>     Terms terms = fields.terms(fieldname);
>>     if (terms == null) return values;
>>
>>     TermsEnum iterator = terms.iterator();
>>
>>     BytesRef value = iterator.next();
>>
>>     while (value != null) {
>>         if (type == Long.class) {
>>             values.add(LegacyNumericUtils.prefixCodedToLong(value));
>>         } else if (type == Integer.class) {
>>             values.add(LegacyNumericUtils.prefixCodedToInt(value));
>>         } else if (type == Boolean.class) {
>>             values.add(LegacyNumericUtils.prefixCodedToInt(value) == 1 ? TRUE : FALSE);
>>         } else if (type == Date.class) {
>>             values.add(new Date(LegacyNumericUtils.prefixCodedToLong(value)));
>>         } else if (type == String.class) {
>>             values.add(value.utf8ToString());
>>         } else {
>>             // ...
>>         }
>>
>>         value = iterator.next();
>>     }
>>
>>     return values;
>> }
>>
>> I am trying to upgrade to Lucene 9.
>> There were two changes over time:
>> - LegacyNumericUtils has been removed in favor of PointBase
>> - MultiFields.getFields() has been dropped, and I read that we are
>> encouraged to avoid Fields in general
>>
>> What is the proper way to implement getting distinct values for a
>> specific field in a reader?
>>
>> Thanks for your help,
>>
>> vs
>>
>
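On the MultiTerms.getTerms(reader, fieldname) suggestion quoted above: the
Terms it returns is a merged view over all segments, and its TermsEnum walks
the union of each segment's sorted terms, returning a term that occurs in
several segments only once. That merge can be sketched in plain Java as a
k-way merge with a priority queue. This illustrates the idea with
hypothetical names; it is not Lucene's implementation, and it orders Strings
by String.compareTo, whereas Lucene orders terms by their UTF-8 bytes (the
two can differ for characters outside the Basic Multilingual Plane).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedTermsSketch {

    // One cursor per segment: the segment's sorted term iterator plus its current term.
    private static final class Cursor {
        final Iterator<String> it;
        String current;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.current = it.next();
        }

        boolean advance() {
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            return false;
        }
    }

    // Merges per-segment sorted, duplicate-free term lists into one sorted list
    // in which a term present in several segments appears only once.
    static List<String> mergeDistinct(List<List<String>> segments) {
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            String term = top.current;
            // Equal terms leave the queue consecutively, so compare with the last one kept.
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(term)) {
                merged.add(term);
            }
            if (top.advance()) {
                queue.add(top);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> segments = List.of(
                List.of("apple", "kiwi"),
                List.of("banana", "kiwi", "pear"),
                List.of());
        System.out.println(mergeDistinct(segments)); // prints [apple, banana, kiwi, pear]
    }
}
```

For a field indexed as single-token strings (e.g. StringField), iterating
the merged TermsEnum and calling utf8ToString() on each BytesRef, as the
original Lucene 6 code does, should give the distinct values; IntPoint /
LongPoint fields still have no terms to iterate.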

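On the LongPoint.unpack snippet quoted above and its "Assuming 1-D values"
comment: in a packed N-dimensional LongPoint value, dimension d starts at
byte offset d * Long.BYTES, which is the offset you would pass to
LongPoint.decodeDimension(packedValue, offset) inside a visitor like the one
at the top of this message. The sketch below illustrates that layout in
plain Java, assuming the sign-flipped big-endian encoding from
NumericUtils.longToSortableBytes; the helper names are hypothetical, not
Lucene API.

```java
public class SortableLongSketch {

    // Hypothetical stand-in for NumericUtils.longToSortableBytes:
    // flip the sign bit, then write the 8 bytes big-endian.
    static void encodeDimension(long value, byte[] dest, int offset) {
        long sortable = value ^ 0x8000000000000000L;
        for (int i = 0; i < Long.BYTES; i++) {
            dest[offset + i] = (byte) (sortable >>> (8 * (Long.BYTES - 1 - i)));
        }
    }

    // Hypothetical stand-in for LongPoint.decodeDimension(packedValue, offset).
    static long decodeDimension(byte[] packed, int offset) {
        long sortable = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            sortable = (sortable << 8) | (packed[offset + i] & 0xFFL);
        }
        return sortable ^ 0x8000000000000000L;
    }

    // Packs an N-dimensional point; dimension d occupies bytes [d*8, d*8 + 8).
    static byte[] pack(long... dims) {
        byte[] packed = new byte[dims.length * Long.BYTES];
        for (int d = 0; d < dims.length; d++) {
            encodeDimension(dims[d], packed, d * Long.BYTES);
        }
        return packed;
    }

    // Decodes every dimension into a long[], like the val buffer in the quoted snippet.
    static long[] unpackAll(byte[] packed) {
        long[] out = new long[packed.length / Long.BYTES];
        for (int d = 0; d < out.length; d++) {
            out[d] = decodeDimension(packed, d * Long.BYTES);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack(-5L, 1_700_000_000_000L);
        System.out.println(java.util.Arrays.toString(unpackAll(packed)));
        // prints [-5, 1700000000000]
    }
}
```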