Thank you everyone, I think I was pretty far off base in representing the work Tobias had done and both Tobias and Matt have clarified well.
* There are two child arrays not necessarily for slicing but more to help distinguish between the logical length (there are no buffers with the logical length but there is an array) and the physical length (all of the buffers are in these child arrays which are sized to the physical length). * There may be some caching tricks that could help with slicing but for the moment it is not a big deal. The extra log(N) is unlikely to be significant and, if it were, then an implementation or user library is free to use whatever caching mechanism needed to help. On Thu, Sep 15, 2022 at 8:16 AM Matt Topol <zotthewiz...@gmail.com> wrote: > > why would the run ends and values have the same offset? > > That's why I liked the idea of the children arrays and having the parent > offset being a "logical offset" and children being "physical offsets" > because it maintains the independence of the arrays. Slicing the RLE is > simply just setting the length and offset of the parent (at the cost of > O(Log(n)) to do a lookup without caching) and you can use run ends and > values with different offsets as the children. The only requirement is that > the "length" of the run ends and values arrays need to be the same. > > > > On Thu, Sep 15, 2022, 8:23 AM Tobias Zagorni <tob...@zagorni.eu.invalid> > wrote: > > > > { > > > length: 2 > > > offset: 6 > > > rle: { > > > length: 1 // actually physical length > > > offset: 2 > > > buffer: [3, 5,8] > > > } > > > values: { > > > length: 1 > > > offset: 2 > > > buffer: [5, 6, 7] > > > } > > > } > > > Does this make sense? > > > > I think this is a valid way of doing it. There are 2 problems I see > > with this variant: > > > > - The Slice function would need to be modified to to specially handle > > RLE. Slicing RLE based on a logical offset will now always be O(log(n)) > > . Slicing by just setting offset/length of the main array will no > > longer work. (which may not be that bad performance-wise, since in a > > lot of cases we can't get around resolving the phyiscal offset at some > > point). > > > > - How would we slice nested types, i.e a struct where some fields are > > run-length encoded? > > - We don't touch the rle fields since they are struct fields and > > these are not touched during slicing -> for anything where RLE is > > nested there is still only a logical offset. Now performance > > characteristics are also inconsistent between rle and struct<rle,...> > > - We change the RLE child's run ends/values arrays. Now we have a > > struct field that is no longer a valid array on its own, likely > > breaking a lot of things > > > > > Apologies Weston, if this is what you were getting at, but I think it > > > is > > > slightly different because you talked about 0 offsets in your > > > example. > > > > > > > > > > > So a slice at 4 would be: > > > > > Run ends: [5, 8] > > > > > Values: [6, 7] > > > > > > > > > How do you interpret that? > > > > Naively, that means the logical values [6, 6, 6, 6, 6, 7, 7, 7] > > > > It doesn't seem right... > > > > we could find the logical offset by looking into run_ends_array[-1] if > > run_ends_array.offset is not 0. That said the current code does not > > support that. > > > > > > > Best, > > Tobias > > > > >