I won't belabour the point any more, but the difference in layout between a list and a list view is consequential enough to deserve its own top-level character in my opinion. My vote would be +1 for +vl and +vL.
On Thu, Oct 5, 2023 at 6:40 PM Felipe Oliveira Carvalho <felipe...@gmail.com> wrote: > > > Union format strings share enough properties that having them in the > > same switch case doesn't result in additional complexity...lists and > > list views are completely different types (for the purposes of parsing > > the format string). > > Dense and sparse union differ a bit more than list and list-view. > > Not starting with `+l` for list-views would be a deviation from this > pattern started by unions. > > +------------------------+---------------------------------------------------+------------+ > | ``+ud:I,J,...`` | dense union with type ids I,J... > | | > +------------------------+---------------------------------------------------+------------+ > | ``+us:I,J,...`` | sparse union with type ids I,J... > | | > +------------------------+---------------------------------------------------+------------+ > > Is sharing prefixes an issue? > > To make this more concrete, these are the parser changes for supporting > `+lv` and `+Lv` as I proposed in the beginning: > > @@ -1097,9 +1101,9 @@ struct SchemaImporter { > RETURN_NOT_OK(f_parser_.CheckHasNext()); > switch (f_parser_.Next()) { > case 'l': > - return ProcessListLike<ListType>(); > + return ProcessVarLengthList<false>(); > case 'L': > - return ProcessListLike<LargeListType>(); > + return ProcessVarLengthList<true>(); > case 'w': > return ProcessFixedSizeList(); > case 's': > @@ -1195,12 +1199,30 @@ struct SchemaImporter { > return CheckNoChildren(type); > } > > - template <typename ListType> > - Status ProcessListLike() { > - RETURN_NOT_OK(f_parser_.CheckAtEnd()); > - RETURN_NOT_OK(CheckNumChildren(1)); > - ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0)); > - type_ = std::make_shared<ListType>(field); > + template <bool is_large_variation> > + Status ProcessVarLengthList() { > + if (f_parser_.AtEnd()) { > + RETURN_NOT_OK(CheckNumChildren(1)); > + ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0)); > + if constexpr (is_large_variation) { > + type_ = large_list(field); > + } else { > + type_ = list(field); > + } > + } else { > + if (f_parser_.Next() == 'v') { > + RETURN_NOT_OK(CheckNumChildren(1)); > + ARROW_ASSIGN_OR_RAISE(auto field, MakeChildField(0)); > + if constexpr (is_large_variation) { > + type_ = large_list_view(field); > + } else { > + type_ = list_view(field); > + } > + } else { > + return f_parser_.Invalid(); > + } > + } > + > return Status::OK(); > } > > -- > Felipe > > > On Thu, Oct 5, 2023 at 5:26 PM Antoine Pitrou <anto...@python.org> wrote: > > > > > I don't think the parsing will be a problem even in C. It's not like you > > have to backtrack anyway. > > > > +1 from me on Felipe's proposal. > > > > Regards > > > > Antoine. > > > > > > Le 05/10/2023 à 20:33, Felipe Oliveira Carvalho a écrit : > > > This mailing list thread is going to be the discussion. > > > > > > The union types also use two characters, so I didn’t think it would be a > > > problem. > > > > > > — > > > Felipe > > > > > > On Thu, 5 Oct 2023 at 15:26 Dewey Dunnington > > <de...@voltrondata.com.invalid> > > > wrote: > > > > > >> I'm sorry for missing earlier discussion on this or a PR into the > > >> format where this discussion may have occurred...is there a reason > > >> that +lv and +Lv were chosen over a single-character version (i.e., > > >> maybe +v and +V)? A single-character version is (slightly) easier to > > >> parse in C. > > >> > > >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho > > >> <felipe...@gmail.com> wrote: > > >>> > > >>> Hello, > > >>> > > >>> I'm writing to propose "+lv" and "+Lv" as format strings for list-view > > >> and > > >>> large list-view arrays passing through the Arrow C data interface [1]. > > >>> > > >>> The vote will be open for at least 72 hours. > > >>> > > >>> [ ] +1 - I'm in favor of this new C Data Format string > > >>> [ ] +0 > > >>> [ ] -1 - I'm against adding this new format string because.... > > >>> > > >>> Thanks everyone! > > >>> > > >>> -- > > >>> Felipe > > >>> > > >>> [1] https://arrow.apache.org/docs/format/CDataInterface.html > > >> > > > > >