Hi,

We added a custom field type for indexed binary data that supports
exact-match search, prefix search, and sorting by unsigned, byte-wise
lexicographic comparison. For sorting, BytesRef's
UTF8SortedAsUnicodeComparator does what we want: even though the name
mentions UTF8, it doesn't actually assume UTF-8 and just compares bytes,
so it's good. However, when we sort across different nodes (distributed
search), we run into an issue in QueryComponent.doFieldSortValues:

          // Must do the same conversion when sorting by a
          // String field in Lucene, which returns the terms
          // data as BytesRef:
          if (val instanceof BytesRef) {
            UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
            field.setStringValue(spare.toString());
            val = ft.toObject(field);
          }

UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually
UTF-8. As a hack, I made our own field comparator ByteBuffer-based to get
around that instanceof check, but then the field value is written as
BYTEARR by JavaBinCodec, and when it's unmarshalled it comes back as a
byte[]. Then, in QueryComponent.mergeIds, a ShardFieldSortedHitQueue is
constructed with ShardDoc.getCachedComparator, which falls into the else
branch (the one with the TODO for CUSTOM) and gives me comparatorNatural.
That barfs because byte[] is not Comparable...
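
To make the two problems concrete, here is a minimal plain-Java sketch
(no Solr or Lucene classes; the class and method names are made up for
illustration, and it assumes Java 9+ for Arrays.compareUnsigned). It shows
the ordering we want on the merged byte[] values, and why pushing
arbitrary bytes through a UTF-8 decode is lossy. The JDK decoder is only a
stand-in here; UnicodeUtil.UTF8toUTF16 may behave differently on malformed
input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; not part of Solr or Lucene.
class BinarySortSketch {

    // Unsigned, byte-wise lexicographic order: the ordering we would want
    // the merge step to apply to byte[] sort values.
    static final Comparator<byte[]> UNSIGNED_BYTES = Arrays::compareUnsigned;

    // Decodes the bytes as UTF-8 and re-encodes them, then reports whether
    // the original bytes came back unchanged.
    static boolean survivesUtf8RoundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] a = {0x01, (byte) 0xFF};
        byte[] b = {0x01, 0x7F};

        // 0xFF is 255 when unsigned, so a sorts after b.
        System.out.println(UNSIGNED_BYTES.compare(a, b) > 0);   // true

        // 0xFF is never valid in UTF-8, so the decoder substitutes U+FFFD
        // and the original bytes are lost.
        System.out.println(survivesUtf8RoundTrip(a));           // false
    }
}
```

A comparator like UNSIGNED_BYTES is roughly what I'd expect to need in
place of comparatorNatural for the byte[] case, though I don't know what
the right hook for it is.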

Any advice is appreciated!

Thanks,
Jessica
