github-actions[bot] commented on code in PR #64944:
URL: https://github.com/apache/doris/pull/64944#discussion_r3490446687
##########
be/src/core/column/column_nullable.cpp:
##########
@@ -182,18 +182,15 @@ void ColumnNullable::update_crcs_with_value(uint32_t*
__restrict hashes, doris::
void ColumnNullable::update_crc32c_batch(uint32_t* __restrict hashes,
const uint8_t* __restrict /* null_map
*/) const {
const auto* __restrict real_null_data =
get_null_map_column().get_data().data();
- if (_nested_column->support_replace_column_null_data()) {
- // nullmap process is slow, replace null data to default value to
avoid nullmap process
- // This is an intentional in-place mutation inside a logically-const
hash computation:
- // null positions are overwritten with defaults so the inner hash loop
needs no null checks.
- // The invariant is that a column instance is not hashed concurrently
through the same
- // owner while this per-block hash path runs. Shared aliases are
detached by mutate()
- // before this normalized nested column is written back.
- auto nested_mut = std::move(*static_cast<const
IColumn::Ptr&>(_nested_column)).mutate();
- nested_mut->replace_column_null_data(real_null_data);
-
static_cast<IColumn::Ptr&>(const_cast<IColumn::WrappedPtr&>(_nested_column)) =
- std::move(nested_mut);
+ if (!has_null()) {
_nested_column->update_crc32c_batch(hashes, nullptr);
+ return;
+ }
+
+ if (_nested_column->support_replace_column_null_data()) {
+ auto nested_for_hash =
_nested_column->clone_resized(_nested_column->size());
Review Comment:
This fixes the UAF by avoiding the const-path mutation, but it moves a
full-column allocation/copy into the CRC32C partition hashing hot path.
`update_crc32c_batch()` is called for each partition column from
`Crc32CHashPartitioner::_do_hash` in exchange/local exchange (and from the
Iceberg partition path). With this branch, any nullable fixed-width key that
has one NULL now does `clone_resized(size)` for the whole nested column, then
scans it again to replace NULL payloads, then scans it again to hash, for every
block. On the normal ~4k-row blocks that is an allocation/copy per nullable key
per block.
Can we keep the no-mutation property without materializing a cloned column,
e.g. by adding a type-aware "hash default for null rows" path for
`ColumnVector`/`ColumnDecimal`? Passing the null map to the existing fallback
is not equivalent for all fixed-width types because the previous optimized
branch hashed the nested type's default value width, while
`HashUtil::crc32c_null()` hashes a 4-byte zero.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]