On 14/09/2022 at 20:18, Weston Pace wrote:
I will clarify the offset problem.  It essentially boils down to "if
you don't have constant access to elements then an array length offset
does not give you constant access to buffer offsets".

We start with an RLE<int64> array of length 200.  We slice it with
(start=10, length=100) to get an RLE<int64> array of length 100 and an
offset of 10.

Now we want to write an IPC file (or access the values for whatever
reason).  The values buffer has 400 bytes and the run ends buffer has
200 bytes (these numbers could be anything less than 1600/800 so I'm
picking these at random).  We need to copy a portion of the "run ends"
buffer into the file.  What bytes are these?  The only way to tell
would be to do a binary search on the 200 bytes run ends buffer.
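
To make the binary search concrete, here is a minimal sketch in plain Python. It does not use any Arrow API. The helper name `physical_range` and the plain-list representation of the run ends are assumptions made for illustration only. Given a logical (offset, length) slice, it finds which physical runs cover it:

```python
from bisect import bisect_right

def physical_range(run_ends, offset, length):
    """Return (first_run, run_count) covering logical [offset, offset + length).

    run_ends is a sorted list of cumulative run ends (hypothetical plain-list
    stand-in for the run ends buffer).
    """
    # Index of the first run whose end is strictly greater than `offset`,
    # i.e. the run that contains logical index `offset`.
    first = bisect_right(run_ends, offset)
    if length == 0:
        return first, 0
    # Run that contains the last logical index of the slice.
    last = bisect_right(run_ends, offset + length - 1)
    return first, last - first + 1
```

Each lookup costs O(log n) in the number of runs, which is the point being made: the logical offset alone does not give the buffer offsets in constant time.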

On the other hand, if there were two child arrays then an
implementation, when slicing, could choose to always keep the offset
of the parent array at 0 and instead put the offsets in the child
arrays.  Now you have a parent array with offset 0, a run ends (int32)
array with offset 74 and length 5 and a values (int64) array with
offset 74 and length 5.

Why would the run ends and the values have the same offset?
Also, how do you interpret the run ends if you have a physical offset into the values array?


Say you have the logical values: [5, 5, 5, 6, 6, 7, 7, 7]

Run ends: [3, 5, 8]
Values: [5, 6, 7]

Say you want to slice the RLE array from Logical Offset 4 (which doesn't fall on a run boundary). How do you represent that with Physical Offsets into Run ends and Values?

As soon as you set a Physical Offset on the Values, the Run ends don't match anymore.
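
A small sketch of the mismatch, in plain Python rather than any Arrow API (the `decode` helper is hypothetical and only expands the runs for comparison). It contrasts a logical offset of 4 applied to the untouched buffers with a naive physical offset of 1 applied to both children:

```python
def decode(run_ends, values, logical_offset=0, length=None):
    """Expand run ends and values to logical values, then apply a logical slice."""
    out = []
    start = 0
    for end, value in zip(run_ends, values):
        # Each run covers logical indices [start, end).
        out.extend([value] * (end - start))
        start = end
    stop = len(out) if length is None else logical_offset + length
    return out[logical_offset:stop]

run_ends = [3, 5, 8]
values = [5, 6, 7]

# Logical slice starting at offset 4, keeping the buffers untouched.
expected = decode(run_ends, values, logical_offset=4, length=4)

# Naive physical offset of 1 on both children: the remaining run ends
# [5, 8] still count from the start of the unsliced array.
naive = decode(run_ends[1:], values[1:])

print(expected)  # [6, 7, 7, 7]
print(naive)     # [6, 6, 6, 6, 6, 7, 7, 7]
```

With the physical offset, the first remaining run is read as five 6s instead of one, because the run ends are cumulative and nothing records where the slice begins inside that run.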

Regards

Antoine.
