[Question] Rationale for offsets instead of deltas

Jorge Cardoso Leitão Thu, 17 Jun 2021 22:26:04 -0700

Hi,

(this has no direction; I am just genuinely curious)


I am wondering: what is the rationale for using "offsets" instead of
"lengths" to represent variable-sized arrays?

For example, ["a", "", None, "ab"] is represented as

offsets: [0, 1, 1, 1, 3]
values: "aab"

What is the reasoning for using this over

lengths: [1, 0, 0, 2]
values: "aab"
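
To make the two layouts concrete, here is a minimal sketch in plain Python (not an Arrow API; the helper names `lengths_to_offsets` and `offsets_to_lengths` are made up for illustration). It assumes the null slot simply occupies zero bytes and that validity is tracked elsewhere:

```python
from itertools import accumulate

def lengths_to_offsets(lengths):
    # Offsets are the running sum of lengths, with a leading 0,
    # so there is always one more offset than there are slots.
    return [0] + list(accumulate(lengths))

def offsets_to_lengths(offsets):
    # Each length is the difference between consecutive offsets.
    return [end - start for start, end in zip(offsets, offsets[1:])]

# ["a", "", None, "ab"]: the null slot is assumed to take zero bytes here.
lengths = [1, 0, 0, 2]
offsets = lengths_to_offsets(lengths)

print(offsets)                      # [0, 1, 1, 1, 3]
print(offsets_to_lengths(offsets))  # [1, 0, 0, 2]
```

Both layouts carry the same information; they differ in which operations are cheap and in how large the stored integers can get.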

I am asking because I have seen people switch to the LargeUtf8 type,
or split record batches into chunks, to avoid hitting the i32 ceiling
on offsets in large string arrays.
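
A rough sketch of where that ceiling comes from, in plain Python (the function names are hypothetical, and only lengths are used, so no large buffers are allocated). With offsets, the last offset equals the total number of bytes in the values buffer, so the total must fit in an i32. With lengths, only each individual string would need to fit:

```python
I32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

def fits_i32_offsets(lengths):
    # The final offset is the sum of all lengths, so the total byte
    # count of the array must stay within i32.
    return sum(lengths) <= I32_MAX

def fits_i32_lengths(lengths):
    # With a lengths layout, only each single string's length is stored.
    return all(n <= I32_MAX for n in lengths)

# Three strings of 1 GiB each: 3 GiB in total.
lengths = [2**30, 2**30, 2**30]

print(fits_i32_offsets(lengths))  # False
print(fits_i32_lengths(lengths))  # True
```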

Is it to ensure O(1) random access (instead of having to sum all
deltas up to the index)?
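
The access-cost difference can be sketched as follows, again in plain Python with hypothetical helper names rather than any Arrow API. With offsets, slot `i` is found with two lookups; with lengths, the start position has to be recomputed by summing every preceding length:

```python
def get_with_offsets(values, offsets, i):
    # O(1): the start and end of slot i are stored directly.
    return values[offsets[i]:offsets[i + 1]]

def get_with_lengths(values, lengths, i):
    # O(i): the start of slot i is the sum of all earlier lengths.
    start = sum(lengths[:i])
    return values[start:start + lengths[i]]

values = "aab"
offsets = [0, 1, 1, 1, 3]
lengths = [1, 0, 0, 2]

print(get_with_offsets(values, offsets, 3))  # ab
print(get_with_lengths(values, lengths, 3))  # ab
```

Sequential iteration costs about the same in both layouts, since a running start position can be carried along; the gap only shows up for random access and slicing.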

Best,
Jorge
