On 18/02/2022 at 23:44, Micah Kornfield wrote:

Ok, then perhaps you might have some thoughts on the original question: is
the JavaScript implementation currently incorrect?

I think whether it is a bug or not depends on the contract of the builder.
If the contract is that the builder assumes users will ensure equal lengths
of all the children, then it is probably correct as is.  If it is more
consistent with the code that the struct builder should manage
appending a placeholder value to its children, then it is a reasonable
change.

Agreed with Micah. What stands is that in a struct array, the child arrays must have the same length as the parent. Then it's a matter of how the builder API is defined, and different implementations may choose different strategies.

Regards

Antoine.




I seem to recall at least in C++ that we actually changed the behavior of
builders in this regard at some point, but I might be misremembering (the
change might have been appending a placeholder value instead of
appending nulls, to lower the chances of needing validity buffers on children
arrays if all values in the struct are null).

Whatever the implementation is, the post-condition for the resulting struct
array is that its length is equal to the length of all of its children
arrays.
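
As a rough illustration of that post-condition, here is a minimal pure-Python
sketch. It is not the C++ or JavaScript builder API: ToyInt32Builder,
ToyStructBuilder, append_placeholder and lengths_consistent are hypothetical
names invented for this example, and validity is kept as a plain list of
booleans rather than a packed bitmap. It shows one possible strategy, where
the struct builder pushes a placeholder slot into every child whenever a null
struct is appended, so the lengths cannot drift apart.

```python
class ToyInt32Builder:
    """Hypothetical child builder: one value and one validity flag per slot."""

    def __init__(self):
        self.values = []
        self.validity = []

    def append(self, value):
        self.values.append(value)
        self.validity.append(True)

    def append_placeholder(self):
        # Arbitrary value marked valid, so this child never needs its own
        # validity buffer just because the parent struct slot is null.
        self.values.append(0)
        self.validity.append(True)

    def __len__(self):
        return len(self.values)


class ToyStructBuilder:
    """Hypothetical struct builder that keeps its children in step."""

    def __init__(self, children):
        self.children = children  # dict: field name -> child builder
        self.validity = []

    def append(self, row):
        for name, child in self.children.items():
            child.append(row[name])
        self.validity.append(True)

    def append_null(self):
        # The parent slot is null; every child still gets exactly one slot.
        for child in self.children.values():
            child.append_placeholder()
        self.validity.append(False)

    def __len__(self):
        return len(self.validity)

    def lengths_consistent(self):
        # The post-condition: parent length equals every child's length.
        return all(len(child) == len(self) for child in self.children.values())


builder = ToyStructBuilder({"a": ToyInt32Builder(), "b": ToyInt32Builder()})
builder.append({"a": 1, "b": 2})
builder.append_null()
builder.append({"a": 5, "b": 6})
```

Under the other contract discussed in this thread, append_null would leave the
children untouched and the caller would be responsible for appending to each
child, with the same length check applied when the array is finished.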

Cheers,
Micah



On Fri, Feb 18, 2022 at 1:12 PM Phillip Cloud <cpcl...@gmail.com> wrote:

On Fri, Feb 18, 2022 at 3:44 PM Antoine Pitrou <anto...@python.org> wrote:


On 18/02/2022 at 21:32, Phillip Cloud wrote:

I am really struggling to see how anything I've said is inconsistent with
the spec or what you are saying here.

To recap what I've said:

1. Appending a null sentinel to the values buffer isn't _required_ unless
the type requires it.
Ex: "joemark" in the spec example. No sentinels were appended for the two
null values in the parent struct array.

There is no notion of sentinel in the Arrow format, so I don't
understand what you're saying.


The word "sentinel" is a linguistic placeholder for "some set of bytes".
Hopefully that's clear from the context.



(a sentinel is a physical value having a specific meaning, for example a
data format that has no separate validity bitmap could use the integer
value 42 to indicate null values in an integer array; the Arrow format
has a separate validity bitmap and therefore doesn't make use of
sentinel values)
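
To make the distinction concrete, here is a small pure-Python sketch (not
Arrow library code). SENTINEL, decode_with_sentinel and decode_with_validity
are hypothetical names chosen for this example; the bitmap is read with
least-significant-bit numbering, as in the Arrow format.

```python
SENTINEL = 42  # hypothetical in-band marker meaning "null"


def decode_with_sentinel(values):
    """Sentinel scheme: the value 42 itself can no longer be stored."""
    return [None if v == SENTINEL else v for v in values]


def decode_with_validity(values, bitmap):
    """Validity-bitmap scheme: bit i (LSB first) says whether slot i is valid."""
    out = []
    for i, v in enumerate(values):
        is_valid = (bitmap[i // 8] >> (i % 8)) & 1
        out.append(v if is_valid else None)
    return out


values = [1, 42, 3]

# With a sentinel, the middle slot is forced to be null.
sentinel_view = decode_with_sentinel(values)

# With a bitmap, the same bytes can mean a real 42 (bits 111) ...
all_valid_view = decode_with_validity(values, bytes([0b00000111]))

# ... or a null whose underlying value happens to be 42 (bits 101).
one_null_view = decode_with_validity(values, bytes([0b00000101]))
```

In the bitmap scheme the bytes behind a null slot carry no meaning, which is
why any value can sit there.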


2. A null value sentinel is _allowed_ to be there even if the type
does not require it.
Ex: "joefoofoomark" extending the spec example, assuming the other
associated buffers (validity, offsets) are correctly constructed.

Is either of those statements incorrect?

To me, they simply don't make sense given that sentinels don't exist in
Arrow.


Do they make sense after substituting in "a null entry in a string array
with a non-zero number of bytes"?



That said, a null entry in a string array can be backed by a non-zero
number of bytes in the values buffer. That is unrelated to the question
about struct arrays. For example, "joefoofoomark" can very well be the
values buffer for a string array with the logical values ["joe", null,
"mark"]. In this case, the offsets will be [0, 3, 9, 13].

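The example above can be checked with a short pure-Python sketch (not Arrow
library code; decode_strings is a hypothetical helper, and validity is a plain
list of booleans rather than a packed bitmap). It decodes both the padded
values buffer "joefoofoomark" and the compact "joemark" with their respective
offsets, and both give the same logical values.

```python
def decode_strings(data, offsets, validity):
    """Decode a variable-length string layout: slot i spans
    data[offsets[i]:offsets[i + 1]] and is ignored when validity[i] is False."""
    out = []
    for i, is_valid in enumerate(validity):
        if is_valid:
            out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            # Whatever bytes back a null slot are never interpreted.
            out.append(None)
    return out


validity = [True, False, True]

# Null slot backed by six bytes ("foofoo") in the values buffer.
padded = decode_strings(b"joefoofoomark", [0, 3, 9, 13], validity)

# Null slot backed by zero bytes.
compact = decode_strings(b"joemark", [0, 3, 3, 7], validity)
```

Both calls produce ["joe", None, "mark"]; only the physical buffers differ.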

Ok, then perhaps you might have some thoughts on the original question: is
the JavaScript implementation currently incorrect?



Regards

Antoine.


