[ https://issues.apache.org/jira/browse/AVRO-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104571#comment-16104571 ]
ASF subversion and git services commented on AVRO-2048:
-------------------------------------------------------

Commit 14488e35bc31f299de8cd88bd6d1ac07576eaa3e in avro's branch refs/heads/master from [~belugabehr]
[ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=14488e3 ]

AVRO-2048: Avro Binary Decoding - Gracefully Handle Long Strings


> Avro Binary Decoding - Gracefully Handle Long Strings
> -----------------------------------------------------
>
>                 Key: AVRO-2048
>                 URL: https://issues.apache.org/jira/browse/AVRO-2048
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.7, 1.8.2
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: AVRO-2048.1.patch, AVRO-2048.2.patch, AVRO-2048.3.patch
>
>
> According to the [specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_primitive]:
> bq. a string is encoded as a *long* followed by that many bytes of UTF-8 encoded character data.
> However, that is currently not being adhered to:
> {code:title=org.apache.avro.io.BinaryDecoder}
>   @Override
>   public Utf8 readString(Utf8 old) throws IOException {
>     int length = readInt();
>     Utf8 result = (old != null ? old : new Utf8());
>     result.setByteLength(length);
>     if (0 != length) {
>       doReadBytes(result.getBytes(), 0, length);
>     }
>     return result;
>   }
> {code}
> The first thing the code does here is to load an *int* value, not a *long*.
> Because of the variable length nature of the size, this will mostly work.
> However, there may be edge-cases where the serializer is putting in large
> length values erroneously or nefariously. Let us gracefully detect such
> scenarios and more closely adhere to the spec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
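
For illustration only, the behaviour requested in the issue description could look roughly like the sketch below: read the length prefix as a variable-length zig-zag *long*, reject negative or oversized values, and only then allocate a buffer. This is not the committed patch. The class `LongLengthStringSketch`, its methods, the `MAX_STRING_BYTES` cap, and the exception types chosen are all assumptions made for this example, and it reads from a plain `InputStream` rather than using Avro's `BinaryDecoder` or `Utf8`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; this class and all of its names are hypothetical,
// not part of Avro.
final class LongLengthStringSketch {

    // Assumed cap: a conservative upper bound on the size of a Java byte array.
    static final long MAX_STRING_BYTES = Integer.MAX_VALUE - 8L;

    // Reads one variable-length, zig-zag encoded long (at most 10 bytes).
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("Stream ended inside a varint");
            }
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
            }
        }
        throw new IOException("Invalid long encoding");
    }

    // Reads the length as a long, validates it, then reads that many UTF-8 bytes.
    static String readString(InputStream in) throws IOException {
        long length = readLong(in);
        if (length < 0) {
            throw new IOException("Malformed data: negative string length " + length);
        }
        if (length > MAX_STRING_BYTES) {
            throw new UnsupportedOperationException(
                "Cannot read a string of " + length + " bytes");
        }
        byte[] buf = new byte[(int) length]; // safe cast: range checked above
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("Stream ended inside string data");
            }
            off += n;
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Demo helper: writes a zig-zag varint length prefix followed by the payload.
    static byte[] frame(long declaredLength, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long z = (declaredLength << 1) ^ (declaredLength >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // Convenience wrapper that decodes from a byte array.
    static String decode(byte[] data) {
        try {
            return readString(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] hello = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(frame(hello.length, hello)));
        try {
            decode(frame(1L << 32, hello)); // declared length exceeds the cap
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is the order of operations: the declared length is range-checked as a 64-bit value before any narrowing cast or allocation, so an erroneous or hostile length prefix fails with a clear error instead of being misinterpreted.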