[ https://issues.apache.org/jira/browse/AVRO-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
BELUGA BEHR updated AVRO-2048:
------------------------------
    Attachment: AVRO-2048.3.patch

> Avro Binary Decoding - Gracefully Handle Long Strings
> -----------------------------------------------------
>
>                 Key: AVRO-2048
>                 URL: https://issues.apache.org/jira/browse/AVRO-2048
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.7, 1.8.2
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: AVRO-2048.1.patch, AVRO-2048.2.patch, AVRO-2048.3.patch
>
>
> According to the [specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_primitive]:
> bq. a string is encoded as a *long* followed by that many bytes of UTF-8 encoded character data.
> However, that is currently not being adhered to:
> {code:title=org.apache.avro.io.BinaryDecoder}
>   @Override
>   public Utf8 readString(Utf8 old) throws IOException {
>     int length = readInt();
>     Utf8 result = (old != null ? old : new Utf8());
>     result.setByteLength(length);
>     if (0 != length) {
>       doReadBytes(result.getBytes(), 0, length);
>     }
>     return result;
>   }
> {code}
> The first thing the code does here is to load an *int* value, not a *long*. Because of the variable length nature of the size, this will mostly work. However, there may be edge-cases where the serializer is putting in large length values erroneously or nefariously. Let us gracefully detect such scenarios and more closely adhere to the spec.
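
For illustration only, here is a minimal, self-contained sketch of the kind of check described above: read the length prefix as a zig-zag, variable-length *long* (as the spec describes), then reject negative or oversized values before any buffer is sized. It does not reproduce the attached patch or the real BinaryDecoder. The class name StringLengthCheck, the helpers readLong, readStringLength and encodeLong, and the MAX_STRING_BYTES cap are all hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class StringLengthCheck {
    // Assumed upper bound for this sketch: stay a little under the largest Java array size.
    static final long MAX_STRING_BYTES = Integer.MAX_VALUE - 8L;

    /** Reads a variable-length, zig-zag encoded long (at most 10 bytes). */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("Truncated variable-length value");
            }
            raw |= (long) (b & 0x7F) << shift;   // low 7 bits carry payload
            if ((b & 0x80) == 0) {               // high bit clear: last byte
                return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
            }
        }
        throw new IOException("Invalid long encoding: more than 10 bytes");
    }

    /** Reads a string length prefix as a long and validates it before narrowing to int. */
    static int readStringLength(InputStream in) throws IOException {
        long length = readLong(in);
        if (length < 0) {
            throw new IOException("Malformed data: negative string length " + length);
        }
        if (length > MAX_STRING_BYTES) {
            throw new IOException("String length " + length
                + " exceeds the supported maximum of " + MAX_STRING_BYTES + " bytes");
        }
        return (int) length; // safe: 0 <= length <= MAX_STRING_BYTES
    }

    /** Test helper: zig-zag, variable-length encoding of a long. */
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A small, valid length is accepted.
        System.out.println(readStringLength(new ByteArrayInputStream(encodeLong(5L))));
        // A length that does not fit in an int is rejected with a descriptive error.
        try {
            readStringLength(new ByteArrayInputStream(encodeLong(1L << 40)));
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the range check happens on the full long value, so an erroneous or hostile length prefix produces a clear IOException instead of being truncated or used to size a buffer. The exact cap and exception type are design choices that the real decoder may make differently.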