On 18/02/2022 at 21:32, Phillip Cloud wrote:

> I am really struggling to see how anything I've said is inconsistent with
> the spec or what you are saying here.
>
> To recap what I've said:
>
> 1. Appending a null sentinel to the values buffer isn't _required_ unless
> the type requires it.
> Ex: "joemark" in the spec example. No sentinels were appended for the two
> null values in the parent struct array.

There is no notion of a sentinel in the Arrow format, so I don't understand what you're saying.

(A sentinel is a physical value with a special meaning. For example, a data format without a separate validity bitmap could use the integer value 42 to indicate null values in an integer array. The Arrow format has a separate validity bitmap and therefore doesn't use sentinel values.)
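To make the distinction concrete, here is a minimal pure-Python sketch (not Arrow library code; the helpers `is_valid`, `decode_with_bitmap` and `decode_with_sentinel` are hypothetical) that reads the same physical integer values in two ways. It assumes Arrow's least-significant-bit-first validity bitmap, and the "42 means null" rule stands in for a hypothetical sentinel-based format.

```python
from typing import List, Optional


def is_valid(validity: bytes, i: int) -> bool:
    """Return True if slot i is non-null (LSB-first bit order, as in Arrow)."""
    return bool((validity[i // 8] >> (i % 8)) & 1)


def decode_with_bitmap(values: List[int], validity: bytes) -> List[Optional[int]]:
    # Nullness comes only from the bitmap; the value stored in a null slot is ignored.
    return [v if is_valid(validity, i) else None for i, v in enumerate(values)]


def decode_with_sentinel(values: List[int], sentinel: int = 42) -> List[Optional[int]]:
    # Hypothetical sentinel-based format: a reserved physical value marks null.
    return [None if v == sentinel else v for v in values]


values = [1, 42, 0, 4]
validity = bytes([0b00001011])  # slots 0, 1, 3 valid; slot 2 null

print(decode_with_bitmap(values, validity))  # [1, 42, None, 4]
print(decode_with_sentinel(values))          # [1, None, 0, 4]
```

With the bitmap, 42 is an ordinary value and whatever happens to sit in the null slot (here 0) is never interpreted. With a sentinel, the value itself carries the null meaning.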

> 2. Appending a null value sentinel is _allowed_ if the type
> does not require it.
> Ex: "joefoofoomark" extending the spec example, assuming the other
> associated buffers (validity, offsets) are correctly constructed.
>
> Is either of those statements incorrect?

To me, they simply don't make sense given that sentinels don't exist in Arrow.

That said, a null entry in a string array can be backed by a non-zero number of bytes in the values buffer. That is unrelated to the question about struct arrays. For example, "joefoofoomark" can very well be the values buffer for a string array with the logical values ["joe", null, "mark"]. In this case, the offsets would be [0, 3, 9, 13], so the null entry spans bytes 3 to 9 ("foofoo") of the values buffer, and only the validity bitmap marks it as null.
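A minimal pure-Python sketch of that string example follows. The `decode_strings` helper is hypothetical (not a pyarrow API), and it assumes Arrow's LSB-first validity bitmap with offsets held in a plain list. The "joemark" layout with offsets [0, 3, 3, 7] is an illustrative assumption for the same logical values with a zero-length null slot.

```python
from typing import List, Optional


def decode_strings(values: bytes, offsets: List[int], validity: bytes) -> List[Optional[str]]:
    """Decode a string array from its values buffer, offsets and validity bitmap."""
    out: List[Optional[str]] = []
    for i in range(len(offsets) - 1):
        if (validity[i // 8] >> (i % 8)) & 1:
            # Valid slot: its bytes are values[offsets[i]:offsets[i + 1]].
            out.append(values[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            # Null slot: whatever bytes lie between its offsets are never read.
            out.append(None)
    return out


validity = bytes([0b00000101])  # slots 0 and 2 valid, slot 1 null

# Null slot backed by six bytes ("foofoo").
print(decode_strings(b"joefoofoomark", [0, 3, 9, 13], validity))
# Null slot backed by zero bytes.
print(decode_strings(b"joemark", [0, 3, 3, 7], validity))
```

Both calls yield ['joe', None, 'mark']: the validity bitmap alone decides that slot 1 is null, whether zero or six bytes sit behind it.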

Regards

Antoine.
