davidzollo commented on issue #10666:
URL: https://github.com/apache/seatunnel/issues/10666#issuecomment-4149603664

   Thanks, these two questions are very important.
   
   My current thinking is:
   
   1. Sink should not be in the initial scope.
   
   For this issue, I suggest we focus on the **source / collection side only** 
first.
   
   The reason is that a lightweight edge collector is mainly meant to solve 
**"data can only be accessed on remote hosts, but processing should stay 
centralized in Zeta"**.
   
   If we include Sink in the first design, the problem becomes much larger:
   
   * remote write-back into isolated networks
   * reverse traffic / reverse tunnel design
   * delivery semantics for sink acknowledgements
   * much more operational complexity
   
   So my suggestion is:
   
   * **V1:** edge collector for source-side ingestion only
   * **Future extension:** discuss an edge sink model separately if there is 
real demand
   
   2. Edge cluster mode should not be the MVP, but the design should leave room 
for it.
   
   I agree that this question matters a lot.
   
   If edge collection must support a real cluster mode in the first version, 
the complexity will increase significantly, because we would then need to handle:
   
   * edge-side coordination
   * partition assignment and rebalance
   * failover between edge nodes
   * edge-side state / checkpoint ownership
   * service discovery between edge nodes
   
   Because of that, my preference would be:
   
   * **V1:** single-agent model, or multiple independent agents without an 
edge-cluster control plane
   * **V2+:** if needed, add formal edge-cluster support
   
   In other words, for large sources, the first practical step could be to 
allow **multiple independent edge agents** to send different partitions / 
directories / topics / shards into Zeta, without introducing a separate 
edge-cluster scheduler in the first version.
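   
   As a rough sketch of how independent agents could split a large source 
   without any edge-side coordination: each agent is started with its own index 
   and the total agent count, and keeps only the partitions whose name hashes to 
   its index. All names below (`StaticEdgeAssignment`, `partitionsFor`, the 
   sample partition names) are hypothetical and only for illustration; this is 
   not an existing SeaTunnel API.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class StaticEdgeAssignment {

       // Returns the partitions that the agent with the given 0-based index
       // should collect. Every agent evaluates the same rule locally, so no
       // coordinator or rebalance protocol is involved.
       static List<String> partitionsFor(List<String> allPartitions,
                                         int agentIndex,
                                         int agentCount) {
           if (agentCount <= 0 || agentIndex < 0 || agentIndex >= agentCount) {
               throw new IllegalArgumentException("invalid agent index/count");
           }
           List<String> owned = new ArrayList<>();
           for (String partition : allPartitions) {
               // floorMod keeps the bucket non-negative for negative hash codes.
               if (Math.floorMod(partition.hashCode(), agentCount) == agentIndex) {
                   owned.add(partition);
               }
           }
           return owned;
       }

       public static void main(String[] args) {
           // Hypothetical partition names (topics, directories, shards, ...).
           List<String> partitions =
                   List.of("topic-a", "topic-b", "/logs/app1", "/logs/app2", "shard-7");
           int agentCount = 2;
           for (int i = 0; i < agentCount; i++) {
               System.out.println("agent " + i + " -> "
                       + partitionsFor(partitions, i, agentCount));
           }
       }
   }
   ```
   
   The trade-off is the one described above: the assignment is static, so if an 
   agent goes down its partitions are simply not collected until it comes back. 
   That is acceptable for a V1 without an edge-cluster control plane.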
   
   That keeps the first design much simpler, while still leaving room for 
future evolution.
   
   So overall, my current preference is:
   
   * **Scope of this issue:** source-side edge collection
   * **MVP:** lightweight agent + central Zeta ingress
   * **Not in MVP:** edge sink + full edge-cluster control plane
   
   If this scope sounds reasonable, I can also add a follow-up comment to 
outline a possible MVP boundary more concretely.
   

