Oh, of course: if you're using IntPoint / LongPoint for your numeric
fields, they aren't indexed as terms, so loading terms for them won't
work.

It's not the prettiest solution, but I think the following should let you
collect the set of distinct point values for an IntPoint field:
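
    final Set<Integer> collectedValues = new TreeSet<>();
    for (LeafReaderContext lrc : reader.leaves()) {
      LeafReader lr = lrc.reader();
      PointValues pointValues = lr.getPointValues(fieldname);
      if (pointValues == null) {
        continue; // no points were indexed for this field in this segment
      }
      PointValues.IntersectVisitor collectingVisitor =
          new PointValues.IntersectVisitor() {
            @Override
            public void visit(int docID) throws IOException {
              // Only called for cells reported as CELL_INSIDE_QUERY, which
              // compare() below never returns, so there is nothing to do.
            }

            @Override
            public void visit(int docID, byte[] packedValue) {
              // 1-D IntPoint: the value starts at offset 0.
              collectedValues.add(IntPoint.decodeDimension(packedValue, 0));
            }

            @Override
            public PointValues.Relation compare(
                byte[] minPackedValue, byte[] maxPackedValue) {
              // Treat every cell as crossing so that each value is visited.
              return PointValues.Relation.CELL_CROSSES_QUERY;
            }
          };
      pointValues.intersect(collectingVisitor);
    }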
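
In case it helps to see what the visitor is decoding: as far as I know,
IntPoint stores each dimension as 4 big-endian bytes with the sign bit
flipped, so that byte-wise ordering matches int ordering. Below is a
plain-Java sketch of that round trip, without any Lucene dependency. The
class and method names (PackedIntSketch, encode, decode) are made up for
illustration; in real code you would just call IntPoint.decodeDimension.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene.
class PackedIntSketch {

    // Encode an int as 4 big-endian bytes with the sign bit flipped, so that
    // unsigned byte-wise comparison agrees with signed int ordering.
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Reverse of encode(): read 4 big-endian bytes starting at offset,
    // then flip the sign bit back.
    static int decode(byte[] packed, int offset) {
        int flipped = ((packed[offset] & 0xFF) << 24)
                | ((packed[offset + 1] & 0xFF) << 16)
                | ((packed[offset + 2] & 0xFF) << 8)
                | (packed[offset + 3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    public static void main(String[] args) {
        // Mimic the visitor: decode each packed value into a sorted set,
        // which also removes duplicates.
        Set<Integer> collected = new TreeSet<>();
        for (int v : new int[] {7, -3, 7, 42}) {
            collected.add(decode(encode(v), 0));
        }
        System.out.println(collected); // prints [-3, 7, 42]
    }
}
```

For a LongPoint field the same idea should apply with 8 bytes per
dimension and LongPoint.decodeDimension in the visitor.
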
On Tue, Nov 28, 2023 at 1:42 PM Michael Froh <[email protected]> wrote:
> Hello!
>
> Instead of MultiFields.getFields(), you can use
> MultiTerms.getTerms(reader, fieldname) to get the Terms instance.
>
> To decode your long / int values, you should be able to use
> LongPoint/IntPoint.unpack to write the values into an array:
>
> long[] val = new long[1]; // Assuming 1-D values
> LongPoint.unpack(value, 0, val);
> values.add(val[0]);
>
> Hope that helps,
> Froh
>
>
> On Wed, Nov 22, 2023 at 11:09 AM <[email protected]> wrote:
>
>> Hello,
>>
>> In Lucene 6 I was doing this to get all values for a given field
>> knowing its type:
>>
>> public List<Object> getDistinctValues(IndexReader reader, String
>> fieldname,
>> Class<? extends Object> type) throws IOException {
>>
>> List<Object> values = new ArrayList<Object>();
>> Fields fields = MultiFields.getFields(reader);
>> if (fields == null) return values;
>>
>> Terms terms = fields.terms(fieldname);
>> if (terms == null) return values;
>>
>> TermsEnum iterator = terms.iterator();
>>
>> BytesRef value = iterator.next();
>>
>> while (value != null) {
>> if (type == Long.class) {
>> values.add(LegacyNumericUtils.prefixCodedToLong(value));
>> } else if (type == Integer.class) {
>> values.add(LegacyNumericUtils.prefixCodedToInt(value));
>> } else if (type == Boolean.class) {
>> values.add(LegacyNumericUtils.prefixCodedToInt(value) == 1 ?
>> TRUE : FALSE);
>> } else if (type == Date.class) {
>> values.add(new
>> Date(LegacyNumericUtils.prefixCodedToLong(value)));
>> } else if (type == String.class) {
>> values.add(value.utf8ToString());
>> } else {
>> // ...
>> }
>>
>> value = iterator.next();
>> }
>>
>> return values;
>> }
>>
>> I am trying to upgrade to lucene 9.
>> there were 2 changes over time:
>> - LegacyNumericUtils has been removed in favor of PointBase
>> - MultiFields.getFields() has been dropped, and I read we were encouraged
>> to avoid fields in general
>>
>> what is proper way to implement getting distinct values for a specific
>> field in a reader?
>>
>> thanks for your help,
>>
>> vs
>>
>