+1

Thanks a lot for all this. Really exciting!!

On Mon, 19 Dec 2022, 17:56 Matt Topol, <zotthewiz...@gmail.com> wrote:

> That leaves us with a total vote of +1.5 so the vote carries with the
> caveat of changing the name to be Run End Encoded rather than Run Length
> Encoded (unless this means I need to do a new vote with the changed name?
> This is my first time doing one of these so please correct me if I need to
> do a new vote!)
>
> Thanks everyone for your feedback and comments!
>
> I'm going to go update the Go and Format specific PRs to make them regular
> PR's (instead of drafts) and get this all moving. Thanks in advance to
> anyone who reviews the upcoming PRs!
>
> --Matt
>
> On Fri, Dec 16, 2022 at 8:24 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> > +1
> >
> > I agree that run-end encoding makes more sense but also don't see it
> > as a deal breaker.
> >
> > The most compelling counter-argument I've seen for new types is to
> > avoid a schism where some implementations do not support the newer
> > types.  However, for the type proposed here I think the risk is low
> > because data can be losslessly converted to existing formats for
> > compatibility with any system that doesn't support the type.
> >
> > Another argument I've seen is that we should introduce a more formal
> > distinction between "layouts" and "types" (with dictionary and
> > run-end-encoding being layouts).  However, this seems like an
> > impractical change at this point.  In addition, given that we have
> > dictionary as an array type the cat is already out of the bag.
> > Furthermore, systems and implementations are still welcome to make
> > this distinction themselves.  The spec only needs to specify what the
> > buffer layouts should be.  If a particular library chooses to group
> > those layouts into two different categories I think that would still
> > be feasible.
> >
> > -Weston
> >
> > On Fri, Dec 16, 2022 at 1:42 PM Andrew Lamb <al...@influxdata.com>
> wrote:
> > >
> > > +1 on the proposal as written
> > >
> > > I think it makes sense and offers exciting opportunities for faster
> > > computation (especially for cases where parquet files can be decoded
> > > directly into such an array and avoid unpacking. RLE encoded dictionary
> > are
> > > quite compelling)
> > >
> > > I would prefer to use the term Run-End-Encoding (which would also
> follow
> > > the naming of the internal fields) but I don't view that as a deal
> > blocker.
> > >
> > > Thank you for all your work in this matter,
> > > Andrew
> > >
> > > On Wed, Dec 14, 2022 at 5:08 PM Matt Topol <zotthewiz...@gmail.com>
> > wrote:
> > >
> > > > I'm not at all opposed to renaming it as `Run-End-Encoding` if that
> > would
> > > > be preferable. Hopefully others will chime in with their feedback.
> > > >
> > > > --Matt
> > > >
> > > > On Wed, Dec 14, 2022 at 12:09 PM Ian Cook <i...@ursacomputing.com>
> > wrote:
> > > >
> > > > > Thank you Matt, Tobias, and others for the great work on this.
> > > > >
> > > > > I am -0.5 on this proposal in its current form because (pardon the
> > > > > pedantry) what we have implemented here is not run-length encoding;
> > it
> > > > > is run-end encoding. Based on community input, the choice was made
> to
> > > > > store run ends instead of run lengths because this enables
> O(log(N))
> > > > > random access as opposed to O(N). This is a sensible choice, but it
> > > > > comes with some trade-offs including limitations in array length
> > > > > (which maybe not really a problem in practice) and lack of
> > bit-for-bit
> > > > > equivalence with RLE encodings that use run lengths like Velox's
> > > > > SequenceVector encoding (which I think is a more serious problem in
> > > > > practice).
> > > > >
> > > > > I believe that we should either:
> > > > > (a) rename this to "run-end encoding"
> > > > > (b) change this to a parameterized type called "run encoding" that
> > > > > takes a Boolean parameter specifying whether run lengths or run
> ends
> > > > > are stored.
> > > > >
> > > > > Ian
> > > > >
> > > > > On Wed, Dec 14, 2022 at 11:27 AM Matt Topol <
> zotthewiz...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to propose adding the RLE type based on earlier
> > > > > discussions[1][2]
> > > > > > to the Arrow format:
> > > > > > - Columnar Format description:
> > > > > >
> > > > >
> > > >
> >
> https://github.com/apache/arrow/pull/13333/files#diff-8b68cf6859e881f2357f5df64bb073135d7ff6eeb51f116418660b3856564c60
> > > > > > - Flatbuffers changes:
> > > > > >
> > > > >
> > > >
> >
> https://github.com/apache/arrow/pull/14176/files#diff-e54b4f5d2d279acc5d1df5df9a7636f0142a8041fe02f07034e0d8be48444b07
> > > > > >
> > > > > > There is a proposed implementation available in both C++ (written
> > by
> > > > > Tobias
> > > > > > Zagorni) and Go[3][4]. Both implementations have mostly the same
> > tests
> > > > > > implemented and were tested to be compatible over IPC with an
> > archery
> > > > > test.
> > > > > > In both cases, the implementations are split out among several
> > Draft
> > > > PRs
> > > > > so
> > > > > > that they can be easily reviewed piecemeal if the vote is
> approved,
> > > > with
> > > > > > each Draft PR including the changes of the one before it. The
> links
> > > > > > provided are the Draft PRs with the entirety of the changes
> > included.
> > > > > >
> > > > > > The vote will be open for at least 72 hours.
> > > > > >
> > > > > > [ ] +1 add the proposed RLE type to the Apache Arrow format
> > > > > > [ ] -1 do not add the proposed RLE type to the Apache Arrow
> format
> > > > > > because...
> > > > > >
> > > > > > Thanks much, and please let me know if any more information or
> > links
> > > > are
> > > > > > needed (I've never proposed a vote before on here!)
> > > > > >
> > > > > > --Matt
> > > > > >
> > > > > > [1]
> > https://lists.apache.org/thread/bfz3m5nyf7flq7n6q9b1bx3jhcn4wq29
> > > > > > [2]
> > https://lists.apache.org/thread/xb7c723csrtwt0md3m4p56bt0193n7jq
> > > > > > [3] https://github.com/apache/arrow/pull/14179
> > > > > > [4] https://github.com/apache/arrow/pull/14223
> > > > >
> > > >
> >
>

Reply via email to