Hi Sergio, re: 1.1 My thought is that a BlueGreen Mixin isn't Kubernetes specific and could be reused by other deployment control planes. However, I do agree that attaching it to the sink has other implications so I am happy to pivot if we can find an alternative solution.
re: 2 I'm happy to leave it out of the Phase 2 implementation, but I think it should be possible. For example we use Phase 1 with cross cluster migrations today. Phase 2 within a single cluster isn't particularly useful for us. re: GateInjectorExecutor This sounds like a neat idea. I need to read more about how it would work but from a high level, injecting an operator before your sinks sounds like a good idea. Better isolation, possible with SQL, no mixins, etc. I will mention that part of the reason I want it before the sinks is because nine out of ten people building pipelines struggle to understand where their state is and how Phase 2 would affect the correctness of their state depending on where they put the gate. I understand that if you have a remote lookup and want to save bandwidth, you could optimize your pipeline by moving the gate before the remote call; however, that seems like an optimization that can be made later. Thanks for driving this! Let me know how we can help. Ryan van Huuksloot Staff Engineer, Infrastructure | Streaming Platform [image: Shopify] <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> On Mon, Mar 16, 2026 at 2:21 AM Sergio Chong Loo <[email protected]> wrote: > Hi Ryan > > > Thanks a lot for these details. For sure some of these observations popped > up during our initial discussions, and that’s why our initial goal was to > introduce this as simple as possible and gradually enhance it to cover gaps. > > > Allow me to address your concerns: > > 1. I’m happy you stressed the point of “disruption to existing > pipelines”. However, there’s a few points about attempting to build this > functionality into the sinks (or sources) right off the bat (read further > below for my alternative): > 1. Kubernetes centric: as of now the Blue/Green Deployments support > is a Kubernetes specific solution, adding a mixin directly available to > sinks would “leak” this support outside of K8s > 2. A sink being aware of these deployment phases violates single > responsibility, but more importantly… > 3. Flink currently has many connectors, with the majority being > maintained outside of the Flink code base, by separate teams, separate > repos, separate release cycles. This would complicate things > significantly > as to try and add support for this for every potential flink connector > project out there would be a cumbersome. Blue/Green Phase 2 then only > would > works with "gate-aware" sinks. > 2. I’d leave the conversation about migrating jobs between K8s > clusters outside of this scope, even Phase 1 is meant to only work in a > single cluster… > 3. Watermarking, excellent point, it’s indeed a requirement so I’ll > make sure this is validated where applicable (by the concrete > implementation) > > > Having said what I said about point 1.1 above, I’m currently working on an > approach which uses a “GateInjectorPipelineExecutor” so to speak; in other > words a custom PipelineExecutor that would be shipped with the K8s > Operator, invoked by Flink Configuration (via “execution.target:”). This > custom piece would instantiate and inject the Gate at a fixed point in the > StreamGraph right before job submission. I still have to validate and > ensure a few things are correctly taken care of (like Type Information, > etc.) but the theory looks promising. > > > For the most part this works well with Flink SQL (same configuration), > here’s my estimation: > > > tEnv.executeSql("INSERT INTO my_sink ...") > > └─> SQL planner → ExecNodeGraph → Transformation[] > > └─> StreamGraph > > └─> GateInjectorExecutor injects GateProcessFunction > > └─> StreamGraph' (mutated) → JobGraph > > └─> Submit Job > > > I’m aiming to share some updates along these lines in the next few weeks > but hopefully this falls inline with your objectives/thoughts overall. > > > Sergio > > > > On Mar 6, 2026, at 3:36 PM, Ryan van Huuksloot via dev < > [email protected]> wrote: > > Hi Sergio, > Thanks for starting this conversation. > > A few thoughts regarding BlueGreen Phase 2: > 1. The Gate Operator is interesting but I don't like that we would have to > modify users' pipelines for them to use Phase 2. This gate function seems > like it could be a Mixin that connectors would implement. If you want to > use Phase 2, your sinks must implement this Mixin. I understand that a > unique GateFunction has pros, but it works less well with FlinkSQL - and > the trade-off doesn't seem worthwhile. > 2. Regarding the ConfigMap. We should consider a solution that supports > migrating Flink jobs between Kubernetes clusters. Otherwise Phase 2 is only > useful for in cluster operations. > 3. Watermarking is a requirement. Will the Flink Kubernetes Operator > validate that the pipeline is using watermarks? > > What happens when idleness is configured? Watermarks will get ignored from > > these “slow” subtasks and advance, could records from the ignored subtasks > eventually be lost? > Yes they would be lost, but that would happen irrespective of Phase 2. > > I'll have more thoughts after we discuss the Gate Operator, as that is > crucial to the FLIP right now. > > Ryan van Huuksloot > Staff Engineer, Infrastructure | Streaming Platform > [image: Shopify] > <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email> > > > On Mon, Mar 2, 2026 at 6:52 PM Sergio Chong Loo <[email protected]> > wrote: > > Bumping this (Advanced Blue/Green deployments - FLIP-504) thread after > making some code adjustments. > > FYI @drossos <https://github.com/drossos> @ryanvanhuuksloot < > https://github.com/ryanvanhuuksloot> I’d like to get your feedback since > I know you’re interested in this feature. > > Thanks, > - Sergio > > > On Dec 5, 2025, at 2:31 PM, Sergio Chong Loo <[email protected]> > > wrote: > > > Hi folks, > > FLIP-503 (already merged) introduced the Basic Blue/Green Deployment > > functionality to the Flink K8s Operator. It was very straightforward, > simply transitioning to the second deployment once it's considered stable. > > > FLIP-504 is an Advanced version added on top of 503 and brings about the > > notion of "record-level" coordination between the 2 deployments to have no > data duplication and exactly once semantics while preserving a smooth > transition. > > > The main goals are: > • For the community to take a quick look at the current > > functionality (previously mentioned at the Flink Forward 2025 Conference) > > • To get feedback and improvement suggestions > > Flip 504 details: > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677650 > > > Draft PR: https://github.com/apache/flink-kubernetes-operator/pull/1043 > > Thank you! > - Sergio > > > > > >
