Copilot commented on code in PR #50049:
URL: https://github.com/apache/arrow/pull/50049#discussion_r3315093613
##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -415,8 +422,6 @@ class ARROW_EXPORT BinaryBuilder : public
BaseBinaryBuilder<BinaryType> {
/// \endcond
Status Finish(std::shared_ptr<BinaryArray>* out) { return FinishTyped(out); }
Review Comment:
`BinaryBuilder::Finish(std::shared_ptr<BinaryArray>* ...)` still calls
`FinishTyped`, which goes through `ArrayBuilder::Finish()` and then
`std::static_pointer_cast<BinaryArray>`. Now that `BinaryBuilder::type()` may
return an extension type, `Finish()` can produce an `ExtensionArray`, making
this cast undefined behavior. Consider overriding this `Finish(BinaryArray*)`
overload to either (a) return a `TypeError` when `type()->id()` is not
`Type::BINARY`, or (b) explicitly finish to a storage `BinaryArray` when the
builder was constructed with an extension type.
##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -60,11 +60,15 @@ class BaseBinaryBuilder
explicit BaseBinaryBuilder(MemoryPool* pool = default_memory_pool(),
int64_t alignment = kDefaultBufferAlignment)
: ArrayBuilder(pool, alignment),
+ type_(std::make_shared<TypeClass>()),
offsets_builder_(pool, alignment),
value_data_builder_(pool, alignment) {}
BaseBinaryBuilder(const std::shared_ptr<DataType>& type, MemoryPool* pool)
- : BaseBinaryBuilder(pool) {}
+ : ArrayBuilder(pool),
+ type_(type),
+ offsets_builder_(pool),
+ value_data_builder_(pool) {}
Review Comment:
The `BaseBinaryBuilder(const std::shared_ptr<DataType>& type, ...)`
constructor now stores and later uses the passed-in `type` to build
`ArrayData`. Since this is a public constructor (inherited by `BinaryBuilder` /
`LargeBinaryBuilder`), passing an incompatible type (wrong id / wrong offset
width / wrong layout) would now yield an invalid array. It would be safer to
DCHECK/ARROW_CHECK that `type` is either the expected base binary type or an
extension type with a matching storage type, and also check `type` is non-null.
##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -445,8 +450,6 @@ class ARROW_EXPORT LargeBinaryBuilder : public
BaseBinaryBuilder<LargeBinaryType
/// \endcond
Status Finish(std::shared_ptr<LargeBinaryArray>* out) { return
FinishTyped(out); }
Review Comment:
`LargeBinaryBuilder::Finish(std::shared_ptr<LargeBinaryArray>* ...)` uses
`FinishTyped`, which performs a `std::static_pointer_cast` after calling
`ArrayBuilder::Finish()`. With this PR, `type()` can now be an extension type,
so `Finish()` may return an `ExtensionArray` and the static cast becomes
undefined behavior. Suggest guarding this overload (e.g., return `TypeError`
unless `type()->id() == Type::LARGE_BINARY`) or finishing explicitly to a
storage `LargeBinaryArray` when `type()` is an extension type.
##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -60,11 +60,15 @@ class BaseBinaryBuilder
explicit BaseBinaryBuilder(MemoryPool* pool = default_memory_pool(),
int64_t alignment = kDefaultBufferAlignment)
: ArrayBuilder(pool, alignment),
+ type_(std::make_shared<TypeClass>()),
Review Comment:
The default `BaseBinaryBuilder` constructor now does
`type_(std::make_shared<TypeClass>())`, which allocates a new type instance per
builder. For consistency with other builders and to avoid repeated allocations,
consider initializing with the appropriate singleton (e.g.,
`TypeTraits<TypeClass>::type_singleton()` / `binary()` / `large_binary()`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]