lidavidm commented on code in PR #13333: URL: https://github.com/apache/arrow/pull/13333#discussion_r897893686
##########
docs/source/format/Columnar.rst:
##########

@@ -765,6 +765,85 @@ application.
 We discuss dictionary encoding as it relates to serialization further below.
 
+.. _run-length-encoded-layout:
+
+Run-Length-encoded Layout
+-------------------------
+
+Run-Length is a data representation that represents data as sequences of the
+same value, called runs. Each run is represented as a value, and an integer
+describing how often this value is repeated.
+
+Any array can be run-length-encoded. A run-length encoded array has a single
+buffer holding as many 32-bit integers as there are runs. The actual values are
+held in a child array, which is just a regular array.
+
+The dictionary is stored as an optional
+property of an array. When a field is dictionary encoded, the values are
+represented by an array of non-negative integers representing the index of the
+value in the dictionary. The memory layout for a dictionary-encoded array is
+the same as that of a primitive integer layout. The dictionary is handled as a
+separate columnar array with its own respective layout.
+
+As an example, you could have the following data: ::
+
+    type: Float32
+
+    [1.0, 1.0, 1.0, 1.0, null, null, 2.0]
+
+In Run-length-encoded form, this could appear as:
+
+::
+
+    * Length: 3, Null count: 2
+    * Accumulated run lengths buffer:
+
+      | Bytes 0-3   | Bytes 4-7   | Bytes 8-11  | Bytes 12-63           |
+      |-------------|-------------|-------------|-----------------------|
+      | 4           | 6           | 7           | unspecified (padding) |

Review Comment:
Oops, right, you've mentioned that. Thanks!
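
For reference, the accumulated run lengths in the quoted example can be reproduced with a short Python sketch. This is only an illustration under stated assumptions, not the Arrow implementation: `run_end_encode` and `run_end_decode` are hypothetical helper names, nulls are modeled as `None`, and the values are kept in a plain list rather than a child array.

```python
from itertools import groupby


def run_end_encode(values):
    """Hypothetical helper: return (run_ends, run_values) for a list.

    run_ends[i] is the accumulated length at which run i ends,
    matching the "accumulated run lengths buffer" in the example.
    """
    run_ends, run_values = [], []
    position = 0
    for value, group in groupby(values):
        # Count the items in this run and accumulate the end position.
        position += sum(1 for _ in group)
        run_ends.append(position)
        run_values.append(value)
    return run_ends, run_values


def run_end_decode(run_ends, run_values):
    """Hypothetical helper: expand (run_ends, run_values) back to a list."""
    decoded = []
    start = 0
    for end, value in zip(run_ends, run_values):
        decoded.extend([value] * (end - start))
        start = end
    return decoded


data = [1.0, 1.0, 1.0, 1.0, None, None, 2.0]
run_ends, run_values = run_end_encode(data)
print(run_ends)    # accumulated run lengths
print(run_values)  # one value per run
```

Storing accumulated ends rather than per-run lengths means the run containing a given logical index can be found with a binary search over `run_ends`, without summing the lengths first.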