github-actions[bot] commented on code in PR #62054:
URL: https://github.com/apache/doris/pull/62054#discussion_r3049338139


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/distribute/worker/job/UnassignedScanBucketOlapTableJob.java:
##########
@@ -539,6 +539,27 @@ protected int degreeOfParallelism(int maxParallel, boolean useLocalShuffleToAddP
         }
 
         int maxParallelism = (int) Math.max(tabletNum, fragment.getParallelExecNum());
-        return Math.min(maxParallelism, colocateMaxParallelNum);
+        int result = Math.min(maxParallelism, colocateMaxParallelNum);
+
+        // When the fragment has a non-serial exchange, the BE won't insert local exchange
+        // for the exchange pipeline (distribution matches BUCKET_HASH_SHUFFLE). Instances
+        // beyond per-BE bucket count are "padding" with no bucket assignment — they create
+        // VDataStreamRecvrs that never receive data and hang. Cap at per-BE bucket count.
+        if (useLocalShuffleToAddParallel && hasNonSerialExchangeInFragment()) {

Review Comment:
   This guard looks too broad for the hang described in the PR. 
`hasNonSerialExchangeInFragment()` is also true for ordinary `HASH_PARTITIONED` 
exchanges, but those plans still include every receiver instance in the sender 
destination list, so the extra local-shuffle instances are not the same kind of 
untargeted "padding" instances as in BUCKET_SHUFFLE. On the BE side 
`Pipeline::need_to_local_exchange()` treats `BUCKET_HASH_SHUFFLE` and 
`HASH_SHUFFLE` as compatible hash distributions 
(`be/src/exec/pipeline/pipeline.cpp:57-72`), so a pooled bucket-scan fragment 
can still legitimately use more instances than its local bucket count when the 
downstream operator only requires hash shuffle. Capping to `maxParallel` here 
would silently reduce that parallelism.
   
   Can this be narrowed to the actual failing case 
(`BUCKET_SHFFULE_HASH_PARTITIONED` / bucket-index destinations), with a 
regression test that proves a pooled bucket-scan fragment with a non-serial 
`HASH_PARTITIONED` exchange still keeps its higher instance count?
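
   As a rough illustration of the narrowing suggested above, here is a
   minimal, self-contained sketch. It is not the Doris implementation:
   `NarrowedGuardSketch`, `ExchangeType`, `hasBucketShuffleExchange`, and the
   `perBeBucketCount` parameter are hypothetical stand-ins, and the real check
   would have to inspect the fragment's actual exchange nodes. It only shows
   the intended behavior: cap when a bucket-shuffle exchange is present, and
   leave the instance count alone for a plain `HASH_PARTITIONED` exchange.

```java
import java.util.List;

final class NarrowedGuardSketch {
    // Hypothetical stand-in for the exchange partition types named in the review.
    enum ExchangeType { UNPARTITIONED, HASH_PARTITIONED, BUCKET_SHFFULE_HASH_PARTITIONED }

    // Assumption: only bucket-shuffle exchanges address receivers by bucket index,
    // so only they can leave extra instances without any sender targeting them.
    static boolean hasBucketShuffleExchange(List<ExchangeType> exchanges) {
        return exchanges.contains(ExchangeType.BUCKET_SHFFULE_HASH_PARTITIONED);
    }

    // uncapped: the parallelism computed before the guard.
    // perBeBucketCount: hypothetical cap, the number of buckets local to one BE.
    static int degreeOfParallelism(int uncapped, int perBeBucketCount,
                                   boolean useLocalShuffleToAddParallel,
                                   List<ExchangeType> exchanges) {
        if (useLocalShuffleToAddParallel && hasBucketShuffleExchange(exchanges)) {
            // Narrowed case: extra instances would never receive data, so cap them.
            return Math.min(uncapped, perBeBucketCount);
        }
        // Ordinary hash exchanges target every receiver instance: keep the higher count.
        return uncapped;
    }
}
```

   A regression test along the lines requested would assert the second
   branch: with only a `HASH_PARTITIONED` exchange in the fragment, the
   result stays at the uncapped value.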



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

