Hi,

(this question has no agenda; I am just genuinely curious)

I am wondering: what is the rationale for using "offsets" instead of
"lengths" to represent variable-sized arrays?

E.g. ["a", "", None, "ab"] is represented as

offsets: [0, 1, 1, 1, 3]
values: "aab"
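For concreteness, here is a minimal sketch in plain Python (no Arrow library involved) of how the offsets layout above can be built. `encode_with_offsets` is a hypothetical helper. It assumes nulls are stored as zero-length slots plus a separate validity flag, as in the example. It also counts offsets in UTF-8 bytes, which coincide with characters for this ASCII data.

```python
from typing import List, Optional, Tuple


def encode_with_offsets(
    items: List[Optional[str]],
) -> Tuple[List[int], bytes, List[bool]]:
    """Encode optional strings as (offsets, values, validity).

    Hypothetical helper: a null is stored as a zero-length slot plus a
    False validity flag, matching the example above.
    """
    offsets = [0]  # element i spans values[offsets[i]:offsets[i + 1]]
    chunks: List[bytes] = []
    validity: List[bool] = []
    for item in items:
        validity.append(item is not None)
        data = item.encode("utf-8") if item is not None else b""
        chunks.append(data)
        offsets.append(offsets[-1] + len(data))  # byte offsets, not characters
    return offsets, b"".join(chunks), validity


offsets, values, validity = encode_with_offsets(["a", "", None, "ab"])
print(offsets)   # [0, 1, 1, 1, 3]
print(values)    # b'aab'
print(validity)  # [True, True, False, True]
```

Note that the offsets list always has one more entry than there are elements.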

What is the reasoning for using this over

lengths: [1, 0, 0, 2]
values: "aab"
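The two layouts carry the same information. Offsets are the running (exclusive prefix) sum of the lengths, and lengths are the differences between adjacent offsets. Below is a small sketch of converting between them. The function names are hypothetical, and either direction is a single O(n) pass.

```python
from itertools import accumulate
from typing import List


def lengths_to_offsets(lengths: List[int]) -> List[int]:
    # Exclusive prefix sum: offsets[i] is where element i starts.
    return list(accumulate(lengths, initial=0))


def offsets_to_lengths(offsets: List[int]) -> List[int]:
    # Differences of adjacent offsets recover each element's length.
    return [end - start for start, end in zip(offsets, offsets[1:])]


print(lengths_to_offsets([1, 0, 0, 2]))     # [0, 1, 1, 1, 3]
print(offsets_to_lengths([0, 1, 1, 1, 3]))  # [1, 0, 0, 2]
```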

I am asking because I have seen people use the LargeUtf8 type, or
break record batches into chunks, to avoid hitting the i32 ceiling on
the offsets of large string arrays.
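With 32-bit signed offsets, the last offset cannot exceed 2**31 - 1, which is just under 2 GiB. That last offset is the total number of value bytes in one array. Below is a rough sketch of the chunking workaround mentioned above. It assumes a hypothetical helper that greedily splits strings so that each chunk's UTF-8 payload stays under the limit. The limit is a parameter so the example can use a tiny value.

```python
from typing import List

I32_MAX = 2**31 - 1  # largest offset a 32-bit signed integer can hold


def chunk_by_byte_limit(items: List[str], limit: int = I32_MAX) -> List[List[str]]:
    """Greedily split strings so each chunk's UTF-8 payload is <= limit bytes.

    Hypothetical helper: a single string longer than `limit` cannot be
    split this way, so it raises ValueError.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    total = 0  # bytes used so far in the current chunk
    for s in items:
        n = len(s.encode("utf-8"))
        if n > limit:
            raise ValueError("single value exceeds the offset limit")
        if total + n > limit:
            # Adding s would overflow the offset type: start a new chunk.
            chunks.append(current)
            current, total = [], 0
        current.append(s)
        total += n
    chunks.append(current)
    return chunks


print(chunk_by_byte_limit(["aa", "bbb", "c", "dd"], limit=4))
# [['aa'], ['bbb', 'c'], ['dd']]
```

LargeUtf8 avoids the split altogether by using 64-bit offsets.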

Is it to ensure O(1) random access (instead of having to sum all
deltas up to the index)?
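Here is a minimal sketch of the random-access difference in question, using hypothetical accessor names and the same example data. With offsets, element i is a slice between two stored positions. With lengths, the start position has to be recomputed by summing all earlier lengths.

```python
from typing import List


def get_with_offsets(offsets: List[int], values: bytes, i: int) -> bytes:
    # O(1): two index reads give the slice bounds directly.
    return values[offsets[i]:offsets[i + 1]]


def get_with_lengths(lengths: List[int], values: bytes, i: int) -> bytes:
    # O(i): the start must be rebuilt by summing every earlier length.
    start = sum(lengths[:i])
    return values[start:start + lengths[i]]


values = b"aab"
print(get_with_offsets([0, 1, 1, 1, 3], values, 3))  # b'ab'
print(get_with_lengths([1, 0, 0, 2], values, 3))     # b'ab'
```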

Best,
Jorge
