luis4a0 commented on code in PR #12107:
URL: https://github.com/apache/gluten/pull/12107#discussion_r3260487761
##########
cpp/velox/shuffle/VeloxHashShuffleWriter.cc:
##########
@@ -1315,6 +1316,47 @@ uint64_t VeloxHashShuffleWriter::valueBufferSizeForFixedWidthArray(uint32_t fixe
   return valueBufferSize;
 }
+void VeloxHashShuffleWriter::accumulateInputEncodingCounts(const ColumnarBatch& cb) {
+  // Only velox-typed batches expose per-child encoding; foreign batches
+  // (e.g. arrow round-trips coming from non-velox sources) will be flattened
+  // by `VeloxColumnarBatch::from` later and we'd undercount, so just skip
+  // them here rather than reporting a misleading "all flat" mix.
+  if (cb.getType() != "velox") {
+    return;
+  }
+  const auto* veloxBatch = dynamic_cast<const VeloxColumnarBatch*>(&cb);
+  if (veloxBatch == nullptr) {
+    return;
+  }
+  const auto& rowVector = veloxBatch->getRowVector();
+  if (rowVector == nullptr) {
+    return;
+  }
+  for (const auto& child : rowVector->children()) {
+    if (child == nullptr) {
+      ++inputEncodingCounts_[kInputEncodingOther];
+      continue;
+    }
+    switch (child->encoding()) {
+      case facebook::velox::VectorEncoding::Simple::FLAT:
+        ++inputEncodingCounts_[kInputEncodingFlat];
+        break;
+      case facebook::velox::VectorEncoding::Simple::DICTIONARY:
+        ++inputEncodingCounts_[kInputEncodingDictionary];
+        break;
+      case facebook::velox::VectorEncoding::Simple::CONSTANT:
+        ++inputEncodingCounts_[kInputEncodingConstant];
+        break;
+      case facebook::velox::VectorEncoding::Simple::LAZY:
+        ++inputEncodingCounts_[kInputEncodingLazy];
+        break;
+      default:
+        ++inputEncodingCounts_[kInputEncodingOther];
Review Comment:
Good catch, fixed in a208a78. Added a new `kInputEncodingComplex` bucket so
ROW / MAP / FLAT_MAP / ARRAY no longer get conflated with the rare-encoding
catch-all. `kInputEncodingOther` now only covers BIASED / SEQUENCE / FUNCTION
(and any future additions to `VectorEncoding::Simple`). New `complex` gtest
case exercises ARRAY + MAP children landing in the new bucket; existing cases
also assert `kInputEncodingComplex == 0` so the boundary is checked.
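
For anyone reading along without the commit open, here is a rough, standalone
sketch of the bucketing described above. It is not the code from a208a78: the
`Encoding` enum is a stand-in for `facebook::velox::VectorEncoding::Simple`
built only from the names mentioned in this thread, and the bucket index
values, `EncodingCounts`, `bucketFor`, and `accumulate` are hypothetical names
made up for illustration. Null children (counted as "other" in the diff) are
not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for facebook::velox::VectorEncoding::Simple. Assumption: the
// enumerators are copied from the names in this thread, not from Velox.
enum class Encoding {
  BIASED, CONSTANT, DICTIONARY, FLAT, SEQUENCE,
  ROW, MAP, FLAT_MAP, ARRAY, LAZY, FUNCTION
};

// Bucket names follow the PR; the numeric values are hypothetical.
enum InputEncodingBucket : std::size_t {
  kInputEncodingFlat = 0,
  kInputEncodingDictionary,
  kInputEncodingConstant,
  kInputEncodingLazy,
  kInputEncodingComplex,
  kInputEncodingOther,
  kNumInputEncodingBuckets
};

using EncodingCounts = std::array<std::uint64_t, kNumInputEncodingBuckets>;

// Maps one child encoding to its bucket (hypothetical helper).
InputEncodingBucket bucketFor(Encoding encoding) {
  switch (encoding) {
    case Encoding::FLAT:
      return kInputEncodingFlat;
    case Encoding::DICTIONARY:
      return kInputEncodingDictionary;
    case Encoding::CONSTANT:
      return kInputEncodingConstant;
    case Encoding::LAZY:
      return kInputEncodingLazy;
    // Complex types get their own bucket instead of falling into "other".
    case Encoding::ROW:
    case Encoding::MAP:
    case Encoding::FLAT_MAP:
    case Encoding::ARRAY:
      return kInputEncodingComplex;
    // BIASED / SEQUENCE / FUNCTION and anything added later.
    default:
      return kInputEncodingOther;
  }
}

// Adds one count per child encoding to the matching bucket.
void accumulate(const std::vector<Encoding>& children, EncodingCounts& counts) {
  for (Encoding encoding : children) {
    const InputEncodingBucket bucket = bucketFor(encoding);
    assert(bucket < kNumInputEncodingBuckets);
    ++counts[bucket];
  }
}
```

With this shape, a batch whose children are ARRAY, MAP, FLAT and SEQUENCE
would add 2 to the complex bucket and 1 each to flat and other, which is the
boundary the new and existing gtest cases are meant to pin down.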