oracleloyall opened a new pull request, #1435:
URL: https://github.com/apache/cloudberry/pull/1435

   In resource-constrained environments, 10TB-scale 8-parallel processing in 
cloudberry may encounter specific anomalies related to Motion layer UDP 
communication. Below are four key scenarios and how the code modifications 
address them.
   
   Four Anomaly Scenarios
   
   1. Capacity Mismatch:
      The receiving end’s buffer becomes full, but the sender is unaware. As a 
result, the sender’s unacknowledged packet queue continues transmitting, 
leading to unnecessary retransmissions and packet drops.
   
   2. False Deadlock Detection:
      The peer node processes heartbeat packets but fails to free up buffer 
capacity. This triggers a false deadlock judgment, incorrectly flagging network 
anomalies.
   
   3. Unprocessed Packets Require Main Thread Wakeup:
      When the receive queue is full, incoming data packets are discarded. 
However, the main thread still needs to be awakened to process backlogged 
packets in the queue, preventing permanent stalling.
   
   4. Execution Time Mismatch Across Nodes:
      Issues like data skew, computational performance gaps, or I/O bottlenecks 
cause significant differences in execution time between nodes. For example, in 
a hash join, if the inner table’s  is not ready, the node cannot process data 
from other nodes, leading to packet timeouts.
      *Example Plan*: Packets from  to  (via ) timeout because the  in  remains 
unready, blocking packet processing.
   ```
                                                           ->  Redistribute 
Motion 96:96  (slice10; segments: 96)  (cost=0.00..27670.24 rows=791688 wi
   dth=14)
                                                                  Hash Key: 
web_sales.ws_item_sk
                                                                  ->  Hash Semi 
Join  (cost=0.00..27635.55 rows=791688 width=14)
                                                                        Hash 
Cond: (web_sales.ws_bill_customer_sk = share2_ref3.c_customer_sk)
                                                                        ->  
Redistribute Motion 96:96  (slice11; segments: 96)  (cost=0.00..26675.88 ro
   ws=1994891 width=18)
                                                                              
Hash Key: web_sales.ws_bill_customer_sk
                                                                              
->  Hash Join  (cost=0.00..26563.48 rows=1994891 width=18)
                                                                                
    Hash Cond: (web_sales.ws_sold_date_sk = date_dim_3.d_date_sk)
                                                                                
    ->  Seq Scan on web_sales  (cost=0.00..8887.20 rows=74999595 width=
   22)
                                                                                
    ->  Hash  (cost=441.83..441.83 rows=49 width=4)
                                                                                
          ->  Seq Scan on date_dim date_dim_3  (cost=0.00..441.83 rows=
   49 width=4)
                                                                                
                Filter: ((d_year = 1999) AND (d_moy = 2))
                                                                        ->  
Hash  (cost=436.83..436.83 rows=605155 width=4)
                                                                              
->  Shared Scan (share slice:id 10:2)  (cost=0.00..436.83 rows=605155 wid
   th=4)
                                                            ->  Hash  
(cost=5289.15..5289.15 rows=3243 width=4)
                                                                  ->  
HashAggregate  (cost=0.00..5289.15 rows=3243 width=4)
                                                                        Group 
Key: share0_ref3.i_item_sk
                                                                        ->  
Shared Scan (share slice:id 7:0)  (cost=0.00..791.01 rows=37345054 width=4)
   ```
   Code Modifications and Their Impact
   
   The code changes target the above scenarios by enhancing UDP communication 
feedback, adjusting deadlock checks, and ensuring proper thread wakeup. Key 
modifications:
   
   1. Addressing Capacity Mismatch:
      - Added  (256) to flag when the receive buffer is full.
      - When the receive queue is full (), a response with  is sent to the 
sender (). This notifies the sender to pause or adjust transmission, preventing 
blind retransmissions.
   
   2. Fixing False Deadlock Detection:
      - Modified  to accept  as a parameter, enabling ACK polling during 
deadlock checks.
      - Extended the initial timeout for deadlock suspicion from  to 600 
seconds, reducing premature network error reports.
      - If no response is received after 600 seconds, the buffer capacity is 
incrementally increased () to alleviate false bottlenecks, with detailed 
logging before triggering an error.
   
   3. Ensuring Main Thread Wakeup on Full Queue:
      - In , even when packets are dropped due to a full queue, the main thread 
is awakened () if the packet matches the waiting query/node/route. This ensures 
backlogged packets in the queue are processed.
   
   4. Mitigating Node Execution Mismatches:
      - Added logging for retransmissions after  attempts, providing visibility 
into prolonged packet delays (e.g., due to unready ).
      - Reset  after successful ACK polling, preventing excessive retry counts 
from triggering false timeouts.
   <!-- Thank you for your contribution to Apache Cloudberry (Incubating)! -->
   
   Fixes #ISSUE_Number
   
   ### What does this PR do?
   <!-- Brief overview of the changes, including any major features or fixes -->
   
   ### Type of Change
   - [x] Bug fix (non-breaking change)
   - [ ] New feature (non-breaking change)
   - [ ] Breaking change (fix or feature with breaking changes)
   - [ ] Documentation update
   
   ### Breaking Changes
   <!-- Remove if not applicable. If yes, explain impact and migration path -->
   
   ### Test Plan
   <!-- How did you test these changes? -->
   - [ ] Unit tests added/updated
   - [ ] Integration tests added/updated
   - [x] Passed `make installcheck`
   - [x] Passed `make -C src/test installcheck-cbdb-parallel`
   
   ### Impact
   <!-- Remove sections that don't apply -->
   **Performance:**
   <!-- Any performance implications? -->
   
   **User-facing changes:**
   <!-- Any changes visible to users? -->
   
   **Dependencies:**
   <!-- New dependencies or version changes? -->
   
   ### Checklist
   - [ ] Followed [contribution 
guide](https://cloudberry.apache.org/contribute/code)
   - [ ] Added/updated documentation
   - [ ] Reviewed code for security implications
   - [ ] Requested review from [cloudberry 
committers](https://github.com/orgs/apache/teams/cloudberry-committers)
   
   ### Additional Context
   <!-- Any other information that would help reviewers? Remove if none -->
   
   ### CI Skip Instructions
   <!--
   To skip CI builds, add the appropriate CI skip identifier to your PR title.
   The identifier must:
   - Be in square brackets []
   - Include the word "ci" and either "skip" or "no"
   - Only use for documentation-only changes or when absolutely necessary
   -->
   
   ---
   <!-- Join our community:
   - Mailing list: 
[[email protected]](https://lists.apache.org/[email protected])
 (subscribe: [email protected])
   - Discussions: https://github.com/apache/cloudberry/discussions -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to