A change in the length of an array is equivalent to a change in at least
one of its buffers (i.e. length is always physical).

* Primitive arrays (i32, i64, etc.): the array's length is equal to the
length of the values buffer in bytes divided by the size of the type (e.g.
buffer.len() = 8 and i32 <=> length = 8 / 4 = 2)
* Variable length (binary, list, utf8): the array's length is equal to the
length of the offsets buffer in bytes divided by the size of the offset
type, minus one (e.g. buffer.len() = 12 and i32 <=> length = 12 / 4 - 1 = 2)
* StructArray: the array's length is equal to the length of any of its
fields.
* ...
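
As a rough illustration of the rules above, here is a minimal Python sketch.
It is not an Arrow API: the function names are made up for this example, and
buffers are modelled as plain bytes objects whose len() is a size in bytes.

```python
# Hypothetical helpers, not part of any Arrow library.

def primitive_length(values_buffer: bytes, type_size: int) -> int:
    """Length of a primitive array: values bytes / size of the type."""
    return len(values_buffer) // type_size


def variable_length(offsets_buffer: bytes, offset_size: int) -> int:
    """Length of a binary/list/utf8 array: number of offsets minus one."""
    return len(offsets_buffer) // offset_size - 1


def struct_length(field_lengths: list) -> int:
    """Length of a struct array: the (shared) length of its fields."""
    lengths = set(field_lengths)
    if len(lengths) != 1:
        raise ValueError("all fields of a struct must have the same length")
    return lengths.pop()


I32_SIZE = 4  # bytes

n_primitive = primitive_length(bytes(8), I32_SIZE)   # 8 / 4
n_variable = variable_length(bytes(12), I32_SIZE)    # 12 / 4 - 1
n_struct = struct_length([n_primitive, n_variable])
```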

When appending a slot to a StructArray (null or not), we need to append one
item to each of its fields:
* In a primitive array field, the values buffer grows by the size of the
backing type (and, if it exists, its validity grows by 1 bit)
* In a variable length array field, the offsets buffer grows by the
size of the offset type (and, if it exists, its validity grows by 1
bit)
* ...
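
To make this concrete, here is a small Python sketch of appending a valid and
then a null slot to a struct with an i32 field and a utf8 field. The classes
are hypothetical stand-ins (not an Arrow implementation), validity is a list
of bools instead of a bitmap, and a null is pushed to every child, which is
only one of the allowed choices.

```python
import struct


class Int32Array:
    def __init__(self):
        self.values = bytearray()
        self.validity = []  # stand-in for the validity bitmap

    def push(self, value):
        # A null still occupies a 4-byte slot in the values buffer.
        self.values += struct.pack("<i", 0 if value is None else value)
        self.validity.append(value is not None)

    def __len__(self):
        return len(self.values) // 4


class Utf8Array:
    def __init__(self):
        self.offsets = bytearray(struct.pack("<i", 0))  # starts with offset 0
        self.values = bytearray()
        self.validity = []

    def push(self, value):
        if value is not None:
            self.values += value.encode("utf-8")
        # A null repeats the last offset: offsets grow, values do not.
        self.offsets += struct.pack("<i", len(self.values))
        self.validity.append(value is not None)

    def __len__(self):
        return len(self.offsets) // 4 - 1


class StructArray:
    def __init__(self, fields):
        self.fields = fields  # dict: field name -> child array
        self.validity = []

    def push(self, row):
        # Every child receives exactly one item, whether row is null or not.
        for name, child in self.fields.items():
            child.push(None if row is None else row[name])
        self.validity.append(row is not None)

    def __len__(self):
        lengths = {len(child) for child in self.fields.values()}
        if len(lengths) != 1:
            raise ValueError("children have different lengths")
        return lengths.pop()


ids, names = Int32Array(), Utf8Array()
people = StructArray({"id": ids, "name": names})
people.push({"id": 1, "name": "joe"})
people.push(None)  # null struct slot
```

In this sketch the null slot adds 4 bytes to the id values buffer and 4 bytes
to the name offsets buffer, while the name values buffer stays at 3 bytes.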

What we append to each of its fields is left unspecified. Most
implementations append a null item, but anything is ok. For example, if the
field is a primitive array and has no validity, it may make more sense to
append a slot with value 0 to avoid allocating a validity. But if the field
itself is deeply nested, a null may be cheaper (fewer pushes on its
children).
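
The trade-off for a primitive field can be sketched as follows. This is a
hypothetical Python model, not how any particular implementation does it:
validity is a list of bools that is only allocated when the first null is
pushed.

```python
import struct


class Int32Field:
    def __init__(self):
        self.values = bytearray()
        self.validity = None  # not allocated until a null is pushed

    def __len__(self):
        return len(self.values) // 4

    def push_zero(self):
        # Placeholder value 0: no validity needs to be allocated.
        self.values += struct.pack("<i", 0)
        if self.validity is not None:
            self.validity.append(True)

    def push_null(self):
        # First null: allocate validity for all existing slots.
        if self.validity is None:
            self.validity = [True] * len(self)
        self.values += struct.pack("<i", 0)
        self.validity.append(False)


cheap = Int32Field()
cheap.push_zero()   # validity stays unallocated

costly = Int32Field()
costly.push_zero()
costly.push_null()  # validity is now allocated for both slots
```

Either way the values buffer grows by 4 bytes per slot, so the parent struct
and the field keep the same length.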

Best,
Jorge



On Fri, Feb 18, 2022 at 8:02 PM Phillip Cloud <cpcl...@gmail.com> wrote:

> I think I'm confused by where this appended value lives. Is it only a
> logical value or does the value show up in memory?
> For example, appending another null to the name field is only going to
> change the validity map, offsets array and length and there will not be any
> changes to the values buffer.
>
> The value is logically there, but there's no additional values-buffer
> memory.
>
> Is that correct?
>
> On Fri, Feb 18, 2022 at 1:44 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > >
> > > It is definitely required according to my understanding, and to how the
> > > C++ implementation works.  The validation functions in the C++
> > > implementation also check for this (if a child buffer is too small for
> > > the number of values advertised by the parent, it is an error).
> >
> > +1.
> >
> > I think the wording is confusing.   "While a struct does not have physical
> > storage for each of its semantic slots" refers to the fact that all fields
> > in the struct are stored in separate child arrays and not as buffers on the
> > Struct array itself.  The actual value used in the child Array isn't
> > important if the struct is null but it must be appended so the length of the
> > struct is equal to the length of all of its children.
> >
> > -Micah
> >
> > On Fri, Feb 18, 2022 at 10:39 AM Antoine Pitrou <anto...@python.org>
> > wrote:
> >
> > >
> > > Le 18/02/2022 à 19:29, Phillip Cloud a écrit :
> > > >
> > > > The description underneath the example says:
> > > >
> > > >> While a struct does not have physical storage for each of its
> > > >> semantic slots (i.e. each scalar C-like struct), an entire struct
> > > >> slot can be set to null via the validity bitmap.
> > > >
> > > > To me this suggests that appending a sentinel value to the values
> > > > buffer for a field is allowed, but not required.
> > > >
> > > > Am I understanding this correctly?
> > >
> > > It is definitely required according to my understanding, and to how the
> > > C++ implementation works.  The validation functions in the C++
> > > implementation also check for this (if a child buffer is too small for
> > > the number of values advertised by the parent, it is an error).
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >
>
