Hello,

Since existing C Data Interface integrations sometimes don't parse beyond
the first `l` (or `L`) I'm going to start a new [VOTE] thread with Dewey's
suggestion:

+vl and +vL

If anyone objects to that and has a different suggestion, reply here so I
don't have to spam the list with too many new threads.

--
Felipe

On Thu, Oct 5, 2023 at 6:49 PM Dewey Dunnington
<de...@voltrondata.com.invalid> wrote:

> I won't belabour the point any more, but the difference in layout
> between a list and a list view is consequential enough to deserve its
> own top-level character in my opinion. My vote would be +1 for +vl and
> +vL.
>
> On Thu, Oct 5, 2023 at 6:40 PM Felipe Oliveira Carvalho
> <felipe...@gmail.com> wrote:
> >
> > > Union format strings share enough properties that having them in the
> > > same switch case doesn't result in additional complexity...lists and
> > > list views are completely different types (for the purposes of parsing
> > > the format string).
> >
> > Dense and sparse union differ a bit more than list and list-view.
> >
> > Not starting with `+l` for list-views would be a deviation from this
> > pattern started by unions.
> >
> >
> +------------------------+---------------------------------------------------+------------+
> > | ``+ud:I,J,...``        | dense union with type ids I,J...
> >  |            |
> >
> +------------------------+---------------------------------------------------+------------+
> > | ``+us:I,J,...``        | sparse union with type ids I,J...
> >   |            |
> >
> +------------------------+---------------------------------------------------+------------+
> >
> > Is sharing prefixes an issue?
> >
> > To make this more concrete, these are the parser changes for supporting
> > `+lv` and `+Lv` as I proposed in the beginning:
> >
> > @@ -1097,9 +1101,9 @@ struct SchemaImporter {
> >      RETURN_NOT_OK(f_parser_.CheckHasNext());
> >      switch (f_parser_.Next()) {
> >        case 'l':
> > -        return ProcessListLike<ListType>();
> > +        return ProcessVarLengthList<false>();
> >        case 'L':
> > -        return ProcessListLike<LargeListType>();
> > +        return ProcessVarLengthList<true>();
> >        case 'w':
> >          return ProcessFixedSizeList();
> >        case 's':
> > @@ -1195,12 +1199,30 @@ struct SchemaImporter {
> >      return CheckNoChildren(type);
> >    }
> >
> > -  template <typename ListType>
> > -  Status ProcessListLike() {
> > -    RETURN_NOT_OK(f_parser_.CheckAtEnd());
> > -    RETURN_NOT_OK(CheckNumChildren(1));
> > -    ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
> > -    type_ = std::make_shared<ListType>(field);
> > +  template <bool is_large_variation>
> > +  Status ProcessVarLengthList() {
> > +    if (f_parser_.AtEnd()) {
> > +      RETURN_NOT_OK(CheckNumChildren(1));
> > +      ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
> > +      if constexpr (is_large_variation) {
> > +        type_ = large_list(field);
> > +      } else {
> > +        type_ = list(field);
> > +      }
> > +    } else {
> > +      if (f_parser_.Next() == 'v') {
> > +        RETURN_NOT_OK(CheckNumChildren(1));
> > +        ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
> > +        if constexpr (is_large_variation) {
> > +          type_ = large_list_view(field);
> > +        } else {
> > +          type_ = list_view(field);
> > +        }
> > +      } else {
> > +        return f_parser_.Invalid();
> > +      }
> > +    }
> > +
> >      return Status::OK();
> >    }
> >
> > --
> > Felipe
> >
> >
> > On Thu, Oct 5, 2023 at 5:26 PM Antoine Pitrou <anto...@python.org>
> wrote:
> >
> > >
> > > I don't think the parsing will be a problem even in C. It's not like
> you
> > > have to backtrack anyway.
> > >
> > > +1 from me on Felipe's proposal.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 05/10/2023 à 20:33, Felipe Oliveira Carvalho a écrit :
> > > > This mailing list thread is going to be the discussion.
> > > >
> > > > The union types also use two characters, so I didn’t think it would
> be a
> > > > problem.
> > > >
> > > > —
> > > > Felipe
> > > >
> > > > On Thu, 5 Oct 2023 at 15:26 Dewey Dunnington
> > > <de...@voltrondata.com.invalid>
> > > > wrote:
> > > >
> > > >> I'm sorry for missing earlier discussion on this or a PR into the
> > > >> format where this discussion may have occurred...is there a reason
> > > >> that +lv and +Lv were chosen over a single-character version (i.e.,
> > > >> maybe +v and +V)? A single-character version is (slightly) easier to
> > > >> parse in C.
> > > >>
> > > >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho
> > > >> <felipe...@gmail.com> wrote:
> > > >>>
> > > >>> Hello,
> > > >>>
> > > >>> I'm writing to propose "+lv" and "+Lv" as format strings for
> list-view
> > > >> and
> > > >>> large list-view arrays passing through the Arrow C data interface
> [1].
> > > >>>
> > > >>> The vote will be open for at least 72 hours.
> > > >>>
> > > >>> [ ] +1 - I'm in favor of this new C Data Format string
> > > >>> [ ] +0
> > > >>> [ ] -1 - I'm against adding this new format string because....
> > > >>>
> > > >>> Thanks everyone!
> > > >>>
> > > >>> --
> > > >>> Felipe
> > > >>>
> > > >>> [1] https://arrow.apache.org/docs/format/CDataInterface.html
> > > >>
> > > >
> > >
>

Reply via email to