Copilot commented on code in PR #50049:
URL: https://github.com/apache/arrow/pull/50049#discussion_r3315093613


##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -415,8 +422,6 @@ class ARROW_EXPORT BinaryBuilder : public 
BaseBinaryBuilder<BinaryType> {
   /// \endcond
 
   Status Finish(std::shared_ptr<BinaryArray>* out) { return FinishTyped(out); }

Review Comment:
   `BinaryBuilder::Finish(std::shared_ptr<BinaryArray>* ...)` still calls 
`FinishTyped`, which goes through `ArrayBuilder::Finish()` and then 
`std::static_pointer_cast<BinaryArray>`. Now that `BinaryBuilder::type()` may 
return an extension type, `Finish()` can produce an `ExtensionArray`, making 
this cast undefined behavior. Consider overriding this `Finish(BinaryArray*)` 
overload to either (a) return a `TypeError` when `type()->id()` is not 
`Type::BINARY`, or (b) explicitly finish to a storage `BinaryArray` when the 
builder was constructed with an extension type.
   



##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -60,11 +60,15 @@ class BaseBinaryBuilder
   explicit BaseBinaryBuilder(MemoryPool* pool = default_memory_pool(),
                              int64_t alignment = kDefaultBufferAlignment)
       : ArrayBuilder(pool, alignment),
+        type_(std::make_shared<TypeClass>()),
         offsets_builder_(pool, alignment),
         value_data_builder_(pool, alignment) {}
 
   BaseBinaryBuilder(const std::shared_ptr<DataType>& type, MemoryPool* pool)
-      : BaseBinaryBuilder(pool) {}
+      : ArrayBuilder(pool),
+        type_(type),
+        offsets_builder_(pool),
+        value_data_builder_(pool) {}

Review Comment:
   The `BaseBinaryBuilder(const std::shared_ptr<DataType>& type, ...)` 
constructor now stores and later uses the passed-in `type` to build 
`ArrayData`. Since this is a public constructor (inherited by `BinaryBuilder` / 
`LargeBinaryBuilder`), passing an incompatible type (wrong id / wrong offset 
width / wrong layout) would now yield an invalid array. It would be safer to 
DCHECK/ARROW_CHECK that `type` is either the expected base binary type or an 
extension type with a matching storage type, and also check `type` is non-null.



##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -445,8 +450,6 @@ class ARROW_EXPORT LargeBinaryBuilder : public 
BaseBinaryBuilder<LargeBinaryType
   /// \endcond
 
   Status Finish(std::shared_ptr<LargeBinaryArray>* out) { return 
FinishTyped(out); }

Review Comment:
   `LargeBinaryBuilder::Finish(std::shared_ptr<LargeBinaryArray>* ...)` uses 
`FinishTyped`, which performs a `std::static_pointer_cast` after calling 
`ArrayBuilder::Finish()`. With this PR, `type()` can now be an extension type, 
so `Finish()` may return an `ExtensionArray` and the static cast becomes 
undefined behavior. Suggest guarding this overload (e.g., return `TypeError` 
unless `type()->id() == Type::LARGE_BINARY`) or finishing explicitly to a 
storage `LargeBinaryArray` when `type()` is an extension type.
   



##########
cpp/src/arrow/array/builder_binary.h:
##########
@@ -60,11 +60,15 @@ class BaseBinaryBuilder
   explicit BaseBinaryBuilder(MemoryPool* pool = default_memory_pool(),
                              int64_t alignment = kDefaultBufferAlignment)
       : ArrayBuilder(pool, alignment),
+        type_(std::make_shared<TypeClass>()),

Review Comment:
   The default `BaseBinaryBuilder` constructor now does 
`type_(std::make_shared<TypeClass>())`, which allocates a new type instance per 
builder. For consistency with other builders and to avoid repeated allocations, 
consider initializing with the appropriate singleton (e.g., 
`TypeTraits<TypeClass>::type_singleton()` / `binary()` / `large_binary()`).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to