I won't belabour the point any more, but the difference in layout
between a list and a list view is consequential enough to deserve its
own top-level character in my opinion. My vote would be +1 for +vl and
+vL.

On Thu, Oct 5, 2023 at 6:40 PM Felipe Oliveira Carvalho
<felipe...@gmail.com> wrote:
>
> > Union format strings share enough properties that having them in the
> > same switch case doesn't result in additional complexity...lists and
> > list views are completely different types (for the purposes of parsing
> > the format string).
>
> Dense and sparse union differ a bit more than list and list-view.
>
> Not starting with `+l` for list-views would be a deviation from this
> pattern started by unions.
>
> +------------------------+---------------------------------------------------+------------+
> | ``+ud:I,J,...``        | dense union with type ids I,J...
>  |            |
> +------------------------+---------------------------------------------------+------------+
> | ``+us:I,J,...``        | sparse union with type ids I,J...
>   |            |
> +------------------------+---------------------------------------------------+------------+
>
> Is sharing prefixes an issue?
>
> To make this more concrete, these are the parser changes for supporting
> `+lv` and `+Lv` as I proposed in the beginning:
>
> @@ -1097,9 +1101,9 @@ struct SchemaImporter {
>      RETURN_NOT_OK(f_parser_.CheckHasNext());
>      switch (f_parser_.Next()) {
>        case 'l':
> -        return ProcessListLike<ListType>();
> +        return ProcessVarLengthList<false>();
>        case 'L':
> -        return ProcessListLike<LargeListType>();
> +        return ProcessVarLengthList<true>();
>        case 'w':
>          return ProcessFixedSizeList();
>        case 's':
> @@ -1195,12 +1199,30 @@ struct SchemaImporter {
>      return CheckNoChildren(type);
>    }
>
> -  template <typename ListType>
> -  Status ProcessListLike() {
> -    RETURN_NOT_OK(f_parser_.CheckAtEnd());
> -    RETURN_NOT_OK(CheckNumChildren(1));
> -    ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
> -    type_ = std::make_shared<ListType>(field);
> +  template <bool is_large_variation>
> +  Status ProcessVarLengthList() {
> +    if (f_parser_.AtEnd()) {
> +      RETURN_NOT_OK(CheckNumChildren(1));
> +      ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
> +      if constexpr (is_large_variation) {
> +        type_ = large_list(field);
> +      } else {
> +        type_ = list(field);
> +      }
> +    } else {
> +      if (f_parser_.Next() == 'v') {
> +        RETURN_NOT_OK(CheckNumChildren(1));
> +        ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0));
> +        if constexpr (is_large_variation) {
> +          type_ = large_list_view(field);
> +        } else {
> +          type_ = list_view(field);
> +        }
> +      } else {
> +        return f_parser_.Invalid();
> +      }
> +    }
> +
>      return Status::OK();
>    }
>
> --
> Felipe
>
>
> On Thu, Oct 5, 2023 at 5:26 PM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > I don't think the parsing will be a problem even in C. It's not like you
> > have to backtrack anyway.
> >
> > +1 from me on Felipe's proposal.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 05/10/2023 à 20:33, Felipe Oliveira Carvalho a écrit :
> > > This mailing list thread is going to be the discussion.
> > >
> > > The union types also use two characters, so I didn’t think it would be a
> > > problem.
> > >
> > > —
> > > Felipe
> > >
> > > On Thu, 5 Oct 2023 at 15:26 Dewey Dunnington
> > <de...@voltrondata.com.invalid>
> > > wrote:
> > >
> > >> I'm sorry for missing earlier discussion on this or a PR into the
> > >> format where this discussion may have occurred...is there a reason
> > >> that +lv and +Lv were chosen over a single-character version (i.e.,
> > >> maybe +v and +V)? A single-character version is (slightly) easier to
> > >> parse in C.
> > >>
> > >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho
> > >> <felipe...@gmail.com> wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> I'm writing to propose "+lv" and "+Lv" as format strings for list-view
> > >> and
> > >>> large list-view arrays passing through the Arrow C data interface [1].
> > >>>
> > >>> The vote will be open for at least 72 hours.
> > >>>
> > >>> [ ] +1 - I'm in favor of this new C Data Format string
> > >>> [ ] +0
> > >>> [ ] -1 - I'm against adding this new format string because....
> > >>>
> > >>> Thanks everyone!
> > >>>
> > >>> --
> > >>> Felipe
> > >>>
> > >>> [1] https://arrow.apache.org/docs/format/CDataInterface.html
> > >>
> > >
> >

Reply via email to