Re: [I] [Feature][Zeta] Support lightweight edge collector clients for remote data collection [seatunnel]

via GitHub Sun, 29 Mar 2026 15:12:11 -0700


julianbradford19-png commented on issue #10666:
URL: https://github.com/apache/seatunnel/issues/10666#issuecomment-4151071653


   Agreed — for V1, I’d focus on source-side collection only, without Sink or
   edge-cluster mode. Multiple independent agents sending data to a central
   Zeta ingress should cover most MVP use cases and keep complexity
   manageable. Future versions can explore edge-cluster coordination and sink
   support.
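
To make "multiple independent agents" concrete, here is a minimal sketch of how agents could split a large source without any edge-side coordination. It is only an illustration under stated assumptions: `owned_partitions`, `agent_index`, and `agent_count` are hypothetical names, not part of SeaTunnel or Zeta, and the sketch assumes each agent is started with a fixed index and a fixed total agent count in its local config.

```python
import zlib


def owned_partitions(partitions, agent_index, agent_count):
    """Return the partitions (topics, directories, shards) this agent collects.

    Hypothetical helper, not a SeaTunnel API. Each agent evaluates the same
    stable hash locally, so the agents never need to talk to each other.
    """
    if agent_count <= 0 or not 0 <= agent_index < agent_count:
        raise ValueError("agent_index must be in [0, agent_count)")
    # crc32 is stable across processes and hosts, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    return [
        name
        for name in partitions
        if zlib.crc32(name.encode("utf-8")) % agent_count == agent_index
    ]


if __name__ == "__main__":
    topics = [f"topic-{i}" for i in range(8)]
    for index in range(3):
        print(index, owned_partitions(topics, index, 3))
```

Every partition lands on exactly one agent, but changing `agent_count` reshuffles ownership and there is no failover if an agent dies. Those are the rebalance and failover problems that formal edge-cluster support would have to solve later.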
   
   On Sun, Mar 29, 2026, 2:25 AM David Zollo ***@***.***> wrote:
   
   > *davidzollo* left a comment (apache/seatunnel#10666)
   > <https://github.com/apache/seatunnel/issues/10666#issuecomment-4149619897>
   >
   > I have a few questions:
   >
   >    1. Does Sink need to be supported?
   >    2. Does edge collection need to be planned to support a cluster mode?
   >    For example, if the data source for edge collection is very large and a
   >    single node cannot support synchronization, collection needs to be
   >    performed in a cluster mode on edge machines. Different approaches may
   >    lead to different design solutions and varying levels of complexity.
   >
   > Good question, these two questions are very important.
   >
   > My current thinking is:
   >
   >    1. Sink should not be in the first scope.
   >
   > For this issue, I suggest we focus on the *source / collection side only*
   > first.
   >
   > The reason is that a lightweight edge collector is mainly meant to solve
   > *"data can only be accessed on remote hosts, but processing should stay
   > centralized in Zeta"*.
   >
   > If we include Sink in the first design, the problem becomes much larger:
   >
   >    - remote write-back into isolated networks
   >    - reverse traffic / reverse tunnel design
   >    - delivery semantics for sink acknowledgements
   >    - much more operational complexity
   >
   > So my suggestion is:
   >
   >    - *V1:* edge collector for source-side ingestion only
   >    - *Future extension:* discuss an edge sink model separately if there
   >    is a real demand
   >
   >
   >    2. Edge cluster mode should not be the MVP, but the design should
   >    leave room for it.
   >
   > I agree that this question matters a lot.
   >
   > If edge collection must support a real cluster mode in the first version,
   > the complexity will increase significantly, because then we need to think
   > about:
   >
   >    - edge-side coordination
   >    - partition assignment and rebalance
   >    - failover between edge nodes
   >    - edge-side state / checkpoint ownership
   >    - service discovery between edge nodes
   >
   > Because of that, my preference would be:
   >
   >    - *V1:* single-agent model, or multiple independent agents without an
   >    edge-cluster control plane
   >    - *V2+:* if needed, add formal edge-cluster support
   >
   > In other words, for large sources, the first practical step could be to
   > allow *multiple independent edge agents* to send different partitions /
   > directories / topics / shards into Zeta, without introducing a separate
   > edge-cluster scheduler in the first version.
   >
   > That keeps the first design much simpler, while still leaving room for
   > future evolution.
   >
   > So overall, my current preference is:
   >
   >    - *Scope of this issue:* source-side edge collection
   >    - *MVP:* lightweight agent + central Zeta ingress
   >    - *Not in MVP:* edge sink + full edge-cluster control plane
   >
   > If this scope sounds reasonable, I can also add a follow-up comment to
   > outline a possible MVP boundary more concretely.
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
