[
https://issues.apache.org/jira/browse/AVRO-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Mollitor updated AVRO-4067:
---------------------------------
Description:
Long values are used in many areas of the spec, and a 'zero' value in
particular is used often. For example:
{quote}a string is encoded as a long followed by that many bytes of UTF-8
encoded character data.
{quote}
{quote}Arrays are encoded as a series of blocks. Each block consists of a long
count value, followed by that many array items. A block with count zero
indicates the end of the array. Each item is encoded per the array’s item
schema.
{quote}
{quote}Maps are encoded as a series of blocks. Each block consists of a long
count value, followed by that many key/value pairs. A block with count zero
indicates the end of the map. Each item is encoded per the map’s value schema.
{quote}
Because of this, long values tend to be small on average, and so often fit
within the first byte of the variable-length encoding. Therefore, decoding of
the first byte should be prioritized.
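To illustrate which values fit in a single byte: the Avro spec encodes longs
with variable-length zig-zag coding, so each byte carries 7 payload bits and
values from -64 to 63 occupy exactly one byte. The sketch below is not Avro
code; the class and method names (VarLongSize, zigZag, encodedSize) are
hypothetical and chosen only for illustration.

```java
final class VarLongSize {
    /** Zig-zag maps signed longs to unsigned so that small magnitudes stay small. */
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    /** Number of bytes the variable-length encoding needs (7 payload bits per byte). */
    static int encodedSize(long n) {
        long z = zigZag(n);
        int size = 1;
        // While anything remains above the low 7 bits, another byte is needed.
        while ((z & ~0x7FL) != 0) {
            z >>>= 7;
            size++;
        }
        return size;
    }
}
```

Under this sketch, lengths and block counts below 64 (including the zero that
terminates arrays and maps) all have an encoded size of one byte.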
For the first byte, if the high-order bit is set, it means not only that more
bytes follow, but also that the signed value of the byte is negative.
Conversely, if the byte is non-negative (>= 0), no more bytes follow.
Check the first byte: if it is non-negative, exit early; if it is zero, return
zero.
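A minimal sketch of that fast path, assuming the input is already in a byte
array. This is not the actual Avro decoder; FirstByteLongDecoder and its
readLong(byte[], int) signature are hypothetical, and for brevity the method
does not report how many bytes it consumed.

```java
final class FirstByteLongDecoder {
    /** Decodes one zig-zag variable-length long starting at buf[pos]. */
    static long readLong(byte[] buf, int pos) {
        byte first = buf[pos];
        if (first >= 0) {
            // High-order bit clear: the whole value is in this byte.
            if (first == 0) {
                return 0L; // most common case: empty string, end-of-block marker
            }
            // Undo zig-zag on the single 7-bit payload.
            return (first >>> 1) ^ -(first & 1);
        }
        // Slow path: high-order bit set, so continuation bytes follow.
        long n = first & 0x7F;
        int shift = 7;
        int i = pos + 1;
        while (true) {
            if (shift > 63) {
                // More than 10 bytes cannot be a valid 64-bit value.
                throw new IllegalArgumentException("Invalid long encoding");
            }
            byte b = buf[i++];
            n |= (long) (b & 0x7F) << shift;
            if (b >= 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return (n >>> 1) ^ -(n & 1);
    }
}
```

The sign test on the raw byte replaces a separate mask-and-compare on the
continuation bit, so single-byte values return after one comparison.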
was:
Long values are used for many different areas of the spec, for example:
bq. a string is encoded as a long followed by that many bytes of UTF-8 encoded
character data.
Because of this, long values actually tend to be pretty small on average, and
so can often fit within the first byte of the variable-length array. Therefore,
the first byte should be prioritized.
For the first byte, if the high-order bit is set, then not only does it mean
there are more bytes to follow, but that the signed value of the byte will be
negative. Therefore, the inverse is that for a positive number (>=0), then
there are not more bytes to follow.
Check the first byte, and if it is positive, exit early, if it is zero, return
zero.
> Optimize First Byte of Long Decode
> ----------------------------------
>
> Key: AVRO-4067
> URL: https://issues.apache.org/jira/browse/AVRO-4067
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.12.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Fix For: 1.13.0