[C++][Discuss] Representing null union scalars

Antoine Pitrou Thu, 29 Jul 2021 04:00:15 -0700


Hello,

The Scalar base class has a `bool is_valid` member that is used torepresent null scalars for all types (including the null type).

A UnionScalar, since it inherits from Scalar, has the following de factostructure:


{
  // whether the scalar is null
  bool is_valid;
  // the underlying union value (only set if `is_valid` is true)
  std::shared_ptr<Scalar> value;
}

However, union arrays don't have a top-level validity bitmap. A null inan union array is always encoded as a "valid" top-level entry with anull entry in the chosen child array.

Therefore, there are two possible conventions for representing nullunion scalars:


1) set `scalar.is_valid` to false and `scalar.value` to nullptr

2) set `scalar.is_valid` to true and `scalar.value` to a null scalar forone the union's child types


Advantages / drawbacks of convention 1
--------------------------------------

+ consistency: `MakeNullScalar(union type)` always returns a scalar with`is_valid` set to false

- `union_array.GetScalar(i)` can return a null scalar even when`union_array.IsNull(i)` would return false


Advantages / drawbacks of convention 2
--------------------------------------

+ more congruent with the union datatype definition

+ `union_array.GetScalar(i)` always returns a valid scalar, just like`union_array.IsNull(i)` always returns false

- `MakeNullScalar(union type)` returns a valid scalar (or should itreturn an error??)

- breaks compatibility, since the current behaviour follows convention 1(but UnionScalar is probably not widely used...)

This came in the context of https://github.com/apache/arrow/pull/10817,but is unrelated to the changes in this PR.


Thoughts? Opinions?

Regards

Antoine.

[C++][Discuss] Representing null union scalars

Reply via email to