In reviewing SOLR-14787 <https://issues.apache.org/jira/browse/SOLR-14787> ,
with the author verbally, he pointed out there's some duplicated code where
he needed to decode a payload into a float. We
have org.apache.lucene.analysis.payloads.PayloadHelper#decodeFloat(byte[],
int)  but lucene/queries doesn't have lucene/analysis-common on the
classpath and he found no better option than to essentially cut and paste
those methods into his code. Neither of us likes that solution very much
because code duplication is not cool, so I went fishing for other options
or examples in the code base (on 8.x) and ran into
PayloadDecoder.FLOAT_DECODER... which looks quite broken, or at least
wildly different:

PayloadDecoder FLOAT_DECODER = bytes -> bytes == null ? 1 :
bytes.bytes[bytes.offset];

this coerces a single byte to a float, but analysis common is encoding via
PayloadHelper like this:

  public static byte[] encodeFloat(float payload, byte[] data, int offset){
    return encodeInt(Float.floatToIntBits(payload), data, offset);
  }

Which can clearly write more than one byte... and decoding via

  public static final float decodeFloat(byte [] bytes, int offset){
    return Float.intBitsToFloat(decodeInt(bytes, offset));
  }

  public static final int decodeInt(byte [] bytes, int offset){
    return ((bytes[offset] & 0xFF) << 24) | ((bytes[offset + 1] & 0xFF) <<
16)
         | ((bytes[offset + 2] & 0xFF) <<  8) |  (bytes[offset + 3] & 0xFF);
  }

Which is clearly expecting 4 bytes... and nothing like FLOAT_DECODER

The scary thing is PayloadDecoder.FLOAT_DECODER seems to be used in
BoostingTermBuilder... and a whole bunch of unit tests, which I presume are
passing on coincidence of handling floats that happen to qualify as small
value integers but I haven't tested that theory yet.

PayloadDecoder.FLOAT_DECODER has been around for a long time, so I thought
I'd solicit opinions before attempting to clean it up. My concept of a
cleanup here would include trying to find a home for the float decoding
logic (from PayloadHelper) that can be seen by both areas and unifying
encode/decode for float payloads to a single location, and also add a unit
test that round trips the encode/decode cycle (which I'm not seeing).

-Gus


http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Reply via email to