yx-keith commented on PR #64152:
URL: https://github.com/apache/doris/pull/64152#issuecomment-4667434898

   The fix is correct: write/read_column_to_pb now serialize/deserialize the 
logical value via get_element()/serialize()/deserialize() + insert_value(), 
which is consistent with the existing write_column_to_arrow, 
write_column_to_orc, to_string paths in this file, and with the bitmap/hll 
SerDes. The added set_id(PGenericType::QUANTILE_STATE) and the switch to 
assert_cast are also correct. The same-class bug does not exist in the 
bitmap/hll SerDes — they were already correct — so QuantileState was the only 
outlier.
   
   Four points to address before merge:
   
   deserialize return value is ignored. QuantileState::deserialize returns 
bool. Since this PR fixes a data-corruption path, malformed input should 
surface as an error rather than be silently accepted:
   
   
   QuantileState value;
   if (!value.deserialize(Slice(arg.bytes_value(i)))) {
       return Status::InternalError("Failed to deserialize QuantileState from 
pb");
   }
   col.insert_value(std::move(value));
   The test's equality check is partial. Comparing only 
get_value_by_percentile(0.5/1.0) samples a few quantile points and can miss 
corruption that preserves those points. A round-trip test can assert exact 
equality by serializing both elements and comparing the byte strings 
(optionally also get_serialized_size()). Keep the percentile checks as a 
supplement if desired.
   
   This is a wire-format change. The protobuf payload for QuantileState changes 
from fixed-size raw memory to variable-length serialized bytes. In a rolling 
upgrade, old and new BEs are incompatible on the RPC UDF / RPC aggregate / 
fold-constant paths. Real-world risk is low because the old format was already 
broken (it transmitted pointer values, not data) and never produced correct 
cross-node results. This PR currently has no backport labels — if it is 
backported, please call out the format change so the merge can ensure affected 
nodes are upgraded together.
   
   No SQL-level regression test. The unit test covers the SerDe round-trip but 
not the actual RPC UDF / fold-constant path from the issue. If a reproducing 
query exists, adding a regression test (fold-constant on a constant 
QuantileState expression is the easiest to construct) would cover the 
end-to-end path. Optional


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to