davidzollo commented on PR #10497:
URL: https://github.com/apache/seatunnel/pull/10497#issuecomment-3938818428

   > hi @davidzollo quick ping, is there is anything I need to do?
   
   
   Hi there! 👋 Thank you for contributing to Apache SeaTunnel.
   
   First of all, this PR adds real value:
   - Multi-table support for DynamoDB Sink.
   - Retry handling for DynamoDB `unprocessedItems`.
   
   Both directions are useful in production. Since this is a non-trivial area, 
I’m sharing detailed feedback to help align the implementation with SeaTunnel’s 
runtime behavior and improve maintainability.
   
   ---
   
   ### 1. Multi-Table Routing Semantics (Important Clarification)
   
   **Observation:**
   `AmazonDynamoDBWriter` routes records using `element.getTableId()` and 
`DynamoDbSinkClient` maintains per-table batches in a map.
   
   **Clarification:**
   In SeaTunnel multi-table pipelines, `row.tableId` is indeed used by the 
framework for routing. Also, in many practical flows, one writer instance 
effectively serves one table route. So using `tableId` is not automatically 
wrong.
   
   **Why this still needs care:**
   - `tableId` can be rewritten by transforms (e.g., rename/merge style 
transforms), so its meaning depends on the full pipeline.
   - Keeping a per-table map inside one writer may be unnecessary complexity if 
runtime assignment is effectively single-route per writer.
   
   **Suggested Improvement:**
   - Keep current behavior if you intentionally support mixed-table rows in one 
writer instance.
   - Otherwise, simplify to a single-table batch path and document assumptions 
clearly.
   - Add a short comment in writer/client to explain expected runtime routing 
semantics.
   
   ### 2. Synchronization Scope in `flush()` (High)
   
   **Observation:**
   `DynamoDbSinkClient.flush()` performs network I/O and retry sleep while 
holding `synchronized (lock)`.
   
   **Risk:**
   Locking during remote calls and `Thread.sleep` can block writers for long 
periods, causing throughput collapse and hard-to-debug contention under 
backpressure.
   
   **Suggested Improvement:**
   Only protect shared-memory operations inside the lock (copy + clear), then 
run `flushWithRetry(...)` outside the synchronized block.
   
   ### 3. Retry Policy Hardcoded (Medium)
   
   **Observation:**
   `maxRetries`, `baseDelayMs`, and `maxDelayMs` are hardcoded.
   
   **Risk:**
   Different DynamoDB environments need different retry windows. Hardcoded 
values can be too strict or too slow depending on workload.
   
   **Suggested Improvement:**
   Expose retry settings via connector options (with current values as 
defaults), parse them in config, and document them in connector docs.
   
   ### 4. Test Focus and Runtime Fidelity (Medium)
   
   **Observation:**
   `AmazonDynamoDBMultiTableSinkTest` validates multi-table behavior mainly 
through mocked writer/client interactions.
   
   **Risk:**
   Some test cases may overfit the current implementation details (reflection + 
internal state assertions), making refactoring harder.
   
   **Suggested Improvement:**
   - Keep interface-level checks (`SupportMultiTableSink`, 
`SupportMultiTableSinkWriter`)—good coverage.
   - Add/keep behavior tests for retry correctness (`unprocessedItems` 
eventually drained / retries exhausted).
   - Reduce dependence on internal private-field reflection where possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to