felipecrv commented on code in PR #37877:
URL: https://github.com/apache/arrow/pull/37877#discussion_r1990201006


##########
docs/source/format/Columnar.rst:
##########
@@ -487,6 +499,103 @@ will be represented as follows: ::
           |-------------------------------|-----------------------|
           | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | unspecified (padding) |
 
+ListView Layout
+~~~~~~~~~~~~~~~
+
+The ListView layout is defined by three buffers: a validity bitmap, an offsets
+buffer, and an additional sizes buffer. Sizes and offsets have the identical bit
+width and both 32-bit and 64-bit signed integer options are supported.
+
+As in the List layout, the offsets encode the start position of each slot in the
+child array. In contrast to the List layout, list lengths are stored explicitly
+in the sizes buffer instead of inferred. This allows offsets to be out of order.
+Elements of the child array do not have to be stored in the same order they
+logically appear in the list elements of the parent array.
+
+Every list-view value, including null values, has to guarantee the following
+invariants: ::
+
+    0 <= offsets[i] <= length of the child array
+    0 <= offsets[i] + size[i] <= length of the child array
+
+A list-view type is specified like ``ListView<T>``, where ``T`` is any type
+(primitive or nested). In these examples we use 32-bit offsets and sizes where
+the 64-bit version would be denoted by ``LargeListView<T>``.
+
+**Example Layout: ``ListView<Int8>`` Array**
+
+We illustrate an example of ``ListView<Int8>`` with length 4 having values::
+
+    [[12, -7, 25], null, [0, -127, 127, 50], []]

Review Comment:
   @adriangb anything can happen: they can be duplicated in the child data, or
   multiple entries can point to the same data.
   
   Compact representation:
   
   ```
   buffers:
     offsets: [0, _, 3, _, 0]
     sizes:   [3, _, 4, 0, 3]
   
   children:
     values: [12, -7, 25, 0, -127, 127, 12]
   ```
   
   Common representation:
   
   ```
   buffers:
     offsets: [0, _, 3, _, 7]
     sizes:   [3, _, 4, 0, 3]
   
   children:
     values: [12, -7, 25, 0, -127, 127, 12, 12, -7, 25]
   ```
   
   *using _ to indicate that the value doesn't matter*
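
To make the equivalence of the two representations concrete, here is a minimal pure-Python sketch that decodes both into the same logical lists. `decode_list_view` is a hypothetical helper, not an Arrow API. The validity bitmap is an assumption (only the second entry is treated as null), and the `_` slots are filled with `0`.

```python
def decode_list_view(offsets, sizes, validity, values):
    """Materialize the logical lists described by a list-view layout."""
    out = []
    for offset, size, valid in zip(offsets, sizes, validity):
        if not valid:
            out.append(None)  # null slot: offset and size are ignored
        else:
            out.append(values[offset:offset + size])
    return out


# Assumed validity: only the second entry is null.
validity = [True, False, True, True, True]

# "_" slots are filled with 0 here; any in-bounds value would do.
compact = decode_list_view(
    offsets=[0, 0, 3, 0, 0],
    sizes=[3, 0, 4, 0, 3],
    validity=validity,
    values=[12, -7, 25, 0, -127, 127, 12],
)
common = decode_list_view(
    offsets=[0, 0, 3, 0, 7],
    sizes=[3, 0, 4, 0, 3],
    validity=validity,
    values=[12, -7, 25, 0, -127, 127, 12, 12, -7, 25],
)

# Both layouts describe the same logical array.
assert compact == common
```

In the compact form the first and last entries share the same three child values; in the common form those values are stored twice.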
   
   De-duplication is an expensive operation, but you can imagine some kernel
   that, by construction, produces a compact list-view array. Imagine a
   function that generates an array of prefixes of another array given sizes:
   every offset would be `0` and only the sizes would vary.
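
A rough sketch of such a prefix kernel in plain Python, assuming a hypothetical function name `prefixes_as_list_view` (this is not an existing Arrow kernel). The child values are shared by every entry rather than copied.

```python
def prefixes_as_list_view(values, sizes):
    """Return (offsets, sizes, child) where entry i is values[:sizes[i]]."""
    for size in sizes:
        if not 0 <= size <= len(values):
            raise ValueError(f"prefix size {size} is out of bounds")
    offsets = [0] * len(sizes)  # every prefix starts at the beginning
    return offsets, list(sizes), values  # child array is shared, not copied


offsets, sizes, child = prefixes_as_list_view([1, 2, 3, 4], [0, 2, 4, 1])

# Decode the entries to check the logical result.
prefixes = [child[o:o + s] for o, s in zip(offsets, sizes)]
```

The child array keeps its original length no matter how many prefixes are requested. An equivalent `ListArray` would have to hold a copy of every prefix.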
   
   The main practical consequence of the `ListViewArray` layout is that lists
   can be written to the array in any order. If you need to set array[i] to the
   logical value [a, b, c], all you have to do is append [a, b, c] to the child
   array and set offsets[i] and sizes[i] to the appropriate values. This is not
   possible with `ListArray`, since writing a list at a random position i forces
   all the following values of the child array to move further.
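
The out-of-order write described above can be sketched as follows. `ListViewBuilder` is a hypothetical illustration in plain Python, not the Arrow builder API, and it assumes that slots never written remain null.

```python
class ListViewBuilder:
    """Toy list-view builder that accepts writes to slots in any order."""

    def __init__(self, length):
        self.offsets = [0] * length
        self.sizes = [0] * length
        self.validity = [False] * length  # unwritten slots stay null
        self.values = []  # child array, only ever appended to

    def set(self, i, items):
        # Append to the child array and point slot i at the new range.
        self.offsets[i] = len(self.values)
        self.sizes[i] = len(items)
        self.validity[i] = True
        self.values.extend(items)

    def to_pylist(self):
        return [
            self.values[o:o + s] if valid else None
            for o, s, valid in zip(self.offsets, self.sizes, self.validity)
        ]


builder = ListViewBuilder(3)
builder.set(2, ["x"])            # the last slot is written first
builder.set(0, ["a", "b", "c"])  # then the first slot
```

Each `set` call costs time proportional to the written list, since nothing already in the child array moves. Overwriting a slot in this sketch leaves the old values orphaned in the child array.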


