github-actions[bot] commented on code in PR #63001:
URL: https://github.com/apache/doris/pull/63001#discussion_r3192769770
##########
be/src/core/data_type/data_type_map.cpp:
##########
@@ -145,8 +145,8 @@ const char* DataTypeMap::deserialize(const char* buf,
MutableColumnPtr* column,
memcpy(map_offsets.data(), buf, sizeof(ColumnArray::Offset64) *
real_have_saved_num);
buf += sizeof(ColumnArray::Offset64) * real_have_saved_num;
// key value
- auto nested_keys_column = map_column->get_keys_ptr()->assume_mutable();
- auto nested_values_column = map_column->get_values_ptr()->assume_mutable();
+ auto nested_keys_column = std::move(*map_column->get_keys_ptr()).mutate();
+ auto nested_values_column =
std::move(*map_column->get_values_ptr()).mutate();
Review Comment:
Same COW ownership issue here: `mutate()` may return cloned key/value
children when the map subcolumns are shared, but the deserialized
`nested_keys_column` and `nested_values_column` are dropped instead of being
written back to `map_column`. That leaves updated offsets with stale
keys/values. Please assign both mutated children back to `get_keys_ptr()` and
`get_values_ptr()` after deserializing them.
##########
be/src/core/data_type/data_type_array.cpp:
##########
@@ -126,7 +126,7 @@ const char* DataTypeArray::deserialize(const char* buf,
MutableColumnPtr* column
memcpy(offsets.data(), buf, sizeof(ColumnArray::Offset64) *
real_have_saved_num);
buf += sizeof(ColumnArray::Offset64) * real_have_saved_num;
// children
- auto nested_column = data_column->get_data_ptr()->assume_mutable();
+ auto nested_column = std::move(*data_column->get_data_ptr()).mutate();
buf = get_nested_type()->deserialize(buf, &nested_column, be_exec_version);
Review Comment:
This now mutates a local `nested_column`, but if `get_data_ptr()` is shared,
`mutate()` clones the child and deserialization fills only that clone. Since
the clone is never assigned back to the `ColumnArray`, the array keeps the old
child while offsets have already been resized/copied, producing an inconsistent
deserialized array. Please store `nested_column` back into
`data_column->get_data_ptr()` after `get_nested_type()->deserialize(...)`.
##########
be/src/core/data_type/data_type_struct.cpp:
##########
@@ -223,7 +223,7 @@ const char* DataTypeStruct::deserialize(const char* buf,
MutableColumnPtr* colum
auto* struct_column = assert_cast<ColumnStruct*>(origin_column);
DCHECK(elems.size() == struct_column->tuple_size());
for (size_t i = 0; i < elems.size(); ++i) {
- auto child_column = struct_column->get_column_ptr(i)->assume_mutable();
+ auto child_column =
std::move(*struct_column->get_column_ptr(i)).mutate();
buf = elems[i]->deserialize(buf, &child_column, be_exec_version);
Review Comment:
If a struct child is shared, this `mutate()` clones it;
`elems[i]->deserialize()` then writes into the clone, but the clone is
discarded at the end of the loop iteration. The parent `ColumnStruct` continues
to reference the old child. Please write `child_column` back to
`struct_column->get_column_ptr(i)` after deserialization.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]