Hi All!

Thank you all for reviewing the PR and already helping to make it better. I
have opened a bunch of jira tickets under
https://issues.apache.org/jira/browse/FLINK-25963 based on some comments
and incomplete features in general.

Given that there were no major objections about the prototype, I will merge
it now so we can start collaborating together.

Cheers,
Gyula

On Wed, Feb 16, 2022 at 3:52 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Thanks for the explanation.
> Given that it is unrelated to the Java version in Flink, starting with
> Java 11 for the flink-kubernetes-operator makes sense to me.
>
>
> Best,
> Yang
>
> Thomas Weise <t...@apache.org> wrote on Tue, Feb 15, 2022 at 23:57:
>
> > Hi,
> >
> > At this point I see no reason to support Java 8 for a new project.
> > Java 8 is being phased out, we should start with 11.
> >
> > Also, since the operator isn't a library but effectively just a docker
> > image, the ability to change the Java version isn't as critical as it
> > is for Flink core, which needs to run in many different environments.
> >
> > Cheers,
> > Thomas
> >
> > On Tue, Feb 15, 2022 at 4:50 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> > >
> > > Hi Devs,
> > >
> > > Yang Wang discovered that the current prototype is not compatible with
> > Java
> > > 8 but only 11 and upwards.
> > >
> > > The reason for this is that the java operator SDK itself is not java 8
> > > compatible unfortunately.
> > >
> > > Given that Java 8 is on the road to deprecation and that the operator
> > runs
> > > as a containerized deployment, are there any concerns regarding making
> > the
> > > target java version 11?
> > > This should not affect deployed flink clusters and jobs, those should
> > still
> > > work with Java 8, but only the kubernetes operator itself.
> > >
> > > Cheers,
> > > Gyula
> > >
> > >
> > > On Tue, Feb 15, 2022 at 1:06 PM Yang Wang <danrtsey...@gmail.com>
> wrote:
> > >
> > > > > I also lean toward not introducing the savepoint/checkpoint related
> > > > > fields to the job spec, especially at the very beginning of
> > > > > flink-kubernetes-operator.
> > > >
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > > Gyula Fóra <gyula.f...@gmail.com> wrote on Tue, Feb 15, 2022 at 19:02:
> > > >
> > > > > Hi Peng Yuan!
> > > > >
> > > > > While I do agree that savepoint path is a very important production
> > > > > configuration there are a lot of other things that come to my mind:
> > > > >  - savepoint dir
> > > > >  - checkpoint dir
> > > > >  - checkpoint interval/timeout
> > > > >  - high availability settings (provider/storagedir etc)
> > > > >
> > > > > just to name a few...
> > > > >
> > > > > > While these are all production critical, they have nice clean Flink
> > > > > > config settings to go with them. If we start introducing these to
> > > > > > jobspec we only get confusion about priority order etc. and it is going
> > > > > > to be hard to change or remove them in the future. In any case we should
> > > > > > validate that these configs exist in cases where users use a stateful
> > > > > > upgrade mode, for example. This is something we need to add for sure.
> > > > >
> > > > > As for the other options you mentioned like automatic savepoint
> > > > generation
> > > > > for instance, those deserve an independent discussion of their own
> I
> > > > > believe :)
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > On Tue, Feb 15, 2022 at 11:23 AM K Fred <yuanpengf...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Matyas!
> > > > > >
> > > > > > Thanks for your reply!
> > > > > > > For scenarios 1 and 3, I couldn't agree more with the podTemplate
> > > > > > > solution; I missed this part.
> > > > > > > For savepoint-related configuration, I think it is very important that
> > > > > > > it be specified in JobSpec, because a savepoint is a very common
> > > > > > > setting when upgrading a job; if it is placed in JobSpec, users can
> > > > > > > configure it in an obvious way. In addition, other advanced properties
> > > > > > > can be put into flinkConfiguration and customized by expert users.
> > > > > > > A set of savepoint configuration options could be as follows:
> > > > > >
> > > > > > >  - fromSavepoint: Job restart from
> > > > > > >  - autoSavepointSecond: Automatically take a savepoint to the
> > > > > > >    `savepointsDir` every n seconds.
> > > > > > >  - savepointsDir: Savepoints dir where to store automatically taken
> > > > > > >    savepoints
> > > > > > >  - savepointGeneration: Update savepoint generation of job status for a
> > > > > > >    running job (should be defined in JobStatus)
> > > > > >
> > > > > >
> > > > > > Best wishes,
> > > > > > Peng Yuan.
> > > > > >
> > > > > > On Tue, Feb 15, 2022 at 4:41 PM Őrhidi Mátyás <
> > matyas.orh...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Peng,
> > > > > > >
> > > > > > > > Thanks for your feedback. Regarding scenarios 1 and 3, the podTemplate
> > > > > > > > functionality in the operator could cover both. We also need to be
> > > > > > > > careful about introducing proxy parameters in the CRD spec. The
> > > > > > > > savepoint path, for example, is usually accompanied by a bunch of other
> > > > > > > > configurations, so users need to use configuration params anyway. What
> > > > > > > > do you think?
> > > > > > >
> > > > > > > Best,
> > > > > > > Matyas
> > > > > > >
> > > > > > > On Tue, Feb 15, 2022 at 8:58 AM K Fred <yuanpengf...@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Gyula!
> > > > > > > >
> > > > > > > > > I have reviewed the prototype design of flink-kubernetes-operator you
> > > > > > > > > submitted, and I have the following questions:
> > > > > > > > >
> > > > > > > > > 1. Can a Flink jar package that supports pulling from the sidecar be
> > > > > > > > > added to the JobSpec? Just like this:
> > > > > > > >
> > > > > > > > > > initContainers:
> > > > > > > > > >       - name: downloader
> > > > > > > > > >         image: curlimages/curl
> > > > > > > > > >         env:
> > > > > > > > > >           - name: JAR_URL
> > > > > > > > > >             value: https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.14.3/flink-examples-streaming_2.12-1.14.3-WordCount.jar
> > > > > > > > > >           - name: DEST_PATH
> > > > > > > > > >             value: /cache/flink-app.jar
> > > > > > > > > >         command: ['sh', '-c', 'curl -o ${DEST_PATH} ${JAR_URL}']
> > > > > > > >
> > > > > > > > > 2. Can we add a savepoint path property to the job specification?
> > > > > > > > > 3. Can we add an extra port to the JobManagerSpec and TaskManagerSpec
> > > > > > > > > to expose some service, such as Prometheus? The property could look
> > > > > > > > > like this:
> > > > > > > >
> > > > > > > > > extraPorts:
> > > > > > > > >       - name: prom
> > > > > > > > >         containerPort: 9249
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Best wishes,
> > > > > > > > Peng Yuan
> > > > > > > >
> > > > > > > > On Tue, Feb 15, 2022 at 12:23 AM Gyula Fóra <
> gyf...@apache.org
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Flink Devs!
> > > > > > > > >
> > > > > > > > > We would like to present to you the first prototype of the
> > > > > > > > > flink-kubernetes-operator that was built based on the FLIP
> > and
> > > > the
> > > > > > > > > discussion on this mail thread. We would also like to call
> > out
> > > > some
> > > > > > > > design
> > > > > > > > > decisions that we have made regarding architecture
> components
> > > > that
> > > > > > were
> > > > > > > > not
> > > > > > > > > explicitly mentioned in the FLIP document/thread and give
> > you the
> > > > > > > > > opportunity to raise any concerns here.
> > > > > > > > >
> > > > > > > > > You can find the initial prototype here:
> > > > > > > > > https://github.com/apache/flink-kubernetes-operator/pull/1
> > > > > > > > >
> > > > > > > > > We will leave the PR open for 1-2 days before merging to
> let
> > > > people
> > > > > > > > comment
> > > > > > > > > on it, but please be mindful that this is an initial
> > prototype
> > > > with
> > > > > > > many
> > > > > > > > > rough edges. It is not intended to be a complete
> > implementation
> > > > of
> > > > > > the
> > > > > > > > FLIP
> > > > > > > > > specs as that will take some more work from all of us :)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > *Prototype feature set:*
> > > > > > > > > > The prototype contains a basic working version of the
> > > > > > > > > > flink-kubernetes-operator that supports deployment and lifecycle
> > > > > > > > > > management of a stateful native flink application. We have basic
> > > > > > > > > > support for stateful and stateless upgrades, UI ingress, pod templates
> > > > > > > > > > etc. Error handling at this point is largely missing.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > *Features / design decisions that were not explicitly
> > discussed
> > > > in
> > > > > > this
> > > > > > > > > thread*
> > > > > > > > >
> > > > > > > > > > *Basic Admission control using a Webhook*
> > > > > > > > > > Standard resource admission control in Kubernetes to validate and
> > > > > > > > > > potentially reject resources is done through Webhooks.
> > > > > > > > > >
> > > > > > > > > > https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
> > > > > > > > > >
> > > > > > > > > > This is a necessary mechanism to give the user an upfront error when an
> > > > > > > > > > incorrect resource was submitted. In the Flink operator's case we need
> > > > > > > > > > to validate that the FlinkDeployment yaml actually makes sense and does
> > > > > > > > > > not contain erroneous config options that would inevitably lead to
> > > > > > > > > > deployment/job failures.
> > > > > > > > >
> > > > > > > > > We have implemented a simple webhook that we can use for
> this
> > > > type
> > > > > of
> > > > > > > > > validation, as a separate maven module
> > > > (flink-kubernetes-webhook).
> > > > > > The
> > > > > > > > > webhook is an optional component and can be enabled or
> > disabled
> > > > > > during
> > > > > > > > > deployment. To avoid pulling in new external dependencies
> we
> > have
> > > > > > used
> > > > > > > > the
> > > > > > > > > Flink Shaded Netty module to build the simple rest endpoint
> > > > > required.
> > > > > > > If
> > > > > > > > > the community feels that Netty adds unnecessary complexity
> > to the
> > > > > > > webhook
> > > > > > > > > > implementation we are open to alternative backends such as Spring Boot,
> > > > > > > > > > for instance, which would practically eliminate all the boilerplate.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > *Helm Chart for deployment*
> > > > > > > > > > Helm charts provide an industry standard way of managing kubernetes
> > > > > > > > > > deployments. We have created a helm chart prototype that can be used to
> > > > > > > > > > deploy the operator together with all required resources. The helm
> > > > > > > > > > chart allows easy configuration for things like images, namespaces etc
> > > > > > > > > > and flags to control specific parts of the deployment such as RBAC or
> > > > > > > > > > the webhook.
> > > > > > > > >
> > > > > > > > > The helm chart provided is intended to be a first version
> > that
> > > > > worked
> > > > > > > for
> > > > > > > > > us during development but we expect to have a lot of
> > iterations
> > > > on
> > > > > it
> > > > > > > > based
> > > > > > > > > on the feedback from the community.
> > > > > > > > >
> > > > > > > > > *Acknowledgment*
> > > > > > > > > We would like to thank everyone who has provided support
> and
> > > > > valuable
> > > > > > > > > feedback on this FLIP.
> > > > > > > > > We would also like to thank Yang Wang & Alexis
> Sarda-Espinosa
> > > > > > > > specifically
> > > > > > > > > for making their operators open source and available to us
> > which
> > > > > had
> > > > > > a
> > > > > > > > big
> > > > > > > > > impact on the FLIP and the prototype.
> > > > > > > > >
> > > > > > > > > We are looking forward to continuing development on the
> > operator
> > > > > > > together
> > > > > > > > > with the broader community.
> > > > > > > > > All work will be tracked using the ASF Jira from now on.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Gyula
> > > > > > > > >
> > > > > > > > > On Mon, Feb 14, 2022 at 9:21 AM K Fred <
> > yuanpengf...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Gyula,
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > It's great to see the project getting started and I can't
> > wait
> > > > to
> > > > > > see
> > > > > > > > the
> > > > > > > > > > PR and start contributing code.😄😄😄
> > > > > > > > > >
> > > > > > > > > > Best Wishes!
> > > > > > > > > > Peng Yuan
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 14, 2022 at 4:14 PM Gyula Fóra <
> > > > gyula.f...@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Peng Yuan!
> > > > > > > > > > >
> > > > > > > > > > > The repo is already created:
> > > > > > > > > > > https://github.com/apache/flink-kubernetes-operator
> > > > > > > > > > >
> > > > > > > > > > > We will open the PR with the initial prototype later
> > today,
> > > > > stay
> > > > > > > > tuned
> > > > > > > > > in
> > > > > > > > > > > this thread! :)
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Gyula
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 14, 2022 at 9:09 AM K Fred <
> > > > yuanpengf...@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > > Has the project of flink-kubernetes-operator been
> > created
> > > > in
> > > > > > > > github?
> > > > > > > > > > > >
> > > > > > > > > > > > Peng Yuan
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Feb 9, 2022 at 1:23 AM Gyula Fóra <
> > > > > > gyula.f...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I agree with flink-kubernetes-operator as the repo
> > name
> > > > :)
> > > > > > > > > > > > > Don't have any better idea
> > > > > > > > > > > > >
> > > > > > > > > > > > > Gyula
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Feb 5, 2022 at 2:41 AM Thomas Weise <
> > > > > t...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the continued feedback and discussion.
> > Looks
> > > > > > like
> > > > > > > we
> > > > > > > > > are
> > > > > > > > > > > > > > ready to start a VOTE, I will initiate it
> shortly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In parallel it would be good to find the
> repository
> > > > name.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > My suggestion would be: flink-kubernetes-operator
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I thought "flink-operator" could be a bit
> > misleading
> > > > > since
> > > > > > > the
> > > > > > > > > term
> > > > > > > > > > > > > > operator already has a meaning in Flink.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I also considered "flink-k8s-operator" but that
> > would
> > > > be
> > > > > > > almost
> > > > > > > > > > > > > > identical to existing operator implementations
> and
> > > > could
> > > > > > lead
> > > > > > > > to
> > > > > > > > > > > > > > confusion in the future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Thomas
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Feb 4, 2022 at 5:15 AM Gyula Fóra <
> > > > > > > > gyula.f...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Danny,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So far we have been focusing our dev efforts on
> > the
> > > > > > initial
> > > > > > > > > > native
> > > > > > > > > > > > > > > implementation with the team.
> > > > > > > > > > > > > > > If the discussion and vote goes well for this
> > FLIP we
> > > > > are
> > > > > > > > > looking
> > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > to contributing the initial version sometime
> next
> > > > week
> > > > > > > > (fingers
> > > > > > > > > > > > > crossed).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > At that point I think we can already start the
> > dev
> > > > work
> > > > > > to
> > > > > > > > > > support
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > standalone mode as well, especially if you can
> > > > dedicate
> > > > > > > some
> > > > > > > > > > effort
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > pushing that side.
> > > > > > > > > > > > > > > Working together on this sounds like a great
> > idea and
> > > > > we
> > > > > > > > should
> > > > > > > > > > > start
> > > > > > > > > > > > > as
> > > > > > > > > > > > > > > soon as possible! :)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Gyula
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Feb 4, 2022 at 2:07 PM Danny Cranmer <
> > > > > > > > > > > > dannycran...@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have been discussing this one with my team.
> > We
> > > > are
> > > > > > > > > interested
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > Standalone mode, and are willing to
> contribute
> > > > > towards
> > > > > > > the
> > > > > > > > > > > > > > implementation.
> > > > > > > > > > > > > > > > Potentially we can work together to support
> > both
> > > > > modes
> > > > > > in
> > > > > > > > > > > parallel?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Feb 2, 2022 at 4:02 PM Gyula Fóra <
> > > > > > > > > > gyula.f...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Danny!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the feedback :)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Versioning:
> > > > > > > > > > > > > > > > > Versioning will be independent from Flink
> > and the
> > > > > > > > operator
> > > > > > > > > > will
> > > > > > > > > > > > > > depend
> > > > > > > > > > > > > > > > on a
> > > > > > > > > > > > > > > > > fixed flink version (in every given
> operator
> > > > > > version).
> > > > > > > > > > > > > > > > > This should be the exact same setup as with
> > > > > Stateful
> > > > > > > > > > Functions
> > > > > > > > > > > (
> > > > > > > > > > > > > > > > > https://github.com/apache/flink-statefun).
> > So
> > > > > > > > independent
> > > > > > > > > > > > release
> > > > > > > > > > > > > > cycle
> > > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > still within the Flink umbrella.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Deployment error handling:
> > > > > > > > > > > > > > > > > I think that's a very good point, as
> general
> > > > > > exception
> > > > > > > > > > handling
> > > > > > > > > > > > for
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > different failure scenarios is a tricky
> > problem.
> > > > I
> > > > > > > think
> > > > > > > > > the
> > > > > > > > > > > > > > exception
> > > > > > > > > > > > > > > > > classifiers and retry strategies could
> avoid
> > a
> > > > lot
> > > > > of
> > > > > > > > > manual
> > > > > > > > > > > > > > intervention
> > > > > > > > > > > > > > > > > from the user. We will definitely need to
> add
> > > > > > something
> > > > > > > > > like
> > > > > > > > > > > > this.
> > > > > > > > > > > > > > Once
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > have the repo created with the initial
> > operator
> > > > > code
> > > > > > we
> > > > > > > > > > should
> > > > > > > > > > > > open
> > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > tickets for this and put it on the short
> term
> > > > > > roadmap!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > Gyula
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Feb 2, 2022 at 4:50 PM Danny
> Cranmer
> > <
> > > > > > > > > > > > > > dannycran...@apache.org>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hey team,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Great work on the FLIP, I am looking
> > forward to
> > > > > > this
> > > > > > > > > one. I
> > > > > > > > > > > > agree
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > can move forward to the voting stage.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I have general feedback around how we
> will
> > > > handle
> > > > > > job
> > > > > > > > > > > > submission
> > > > > > > > > > > > > > > > failure
> > > > > > > > > > > > > > > > > > and retry. As discussed in the Rejected
> > > > > > Alternatives
> > > > > > > > > > section,
> > > > > > > > > > > > we
> > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > Java to handle job submission failures
> > from the
> > > > > > Flink
> > > > > > > > > > client.
> > > > > > > > > > > > It
> > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > useful to have the ability to configure
> > > > exception
> > > > > > > > > > classifiers
> > > > > > > > > > > > and
> > > > > > > > > > > > > > retry
> > > > > > > > > > > > > > > > > > strategy as part of operator
> configuration.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Given this will be in a separate GitHub repository I am curious how
> > > > > > > > > > > > > > > > > > > the versioning strategy will work in relation to the Flink version.
> > > > > > > > > > > > > > > > > > > Do we have any other components with a similar setup I can look at?
> > > > > > > > > > > > > > > > > > > Will the operator version track Flink or will it use its own
> > > > > > > > > > > > > > > > > > > versioning strategy with a Flink version support matrix, or similar?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Feb 1, 2022 at 2:33 PM Márton
> > Balassi <
> > > > > > > > > > > > > > > > balassi.mar...@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi team,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thank you for the great feedback,
> Thomas
> > has
> > > > > > > updated
> > > > > > > > > the
> > > > > > > > > > > FLIP
> > > > > > > > > > > > > > page
> > > > > > > > > > > > > > > > > > > accordingly. If you are comfortable
> with
> > the
> > > > > > > > currently
> > > > > > > > > > > > existing
> > > > > > > > > > > > > > > > design
> > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > depth in the FLIP [1] I suggest moving
> > > > forward
> > > > > to
> > > > > > > the
> > > > > > > > > > > voting
> > > > > > > > > > > > > > stage -
> > > > > > > > > > > > > > > > > once
> > > > > > > > > > > > > > > > > > > that reaches a positive conclusion it
> > lets us
> > > > > > > create
> > > > > > > > > the
> > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > code
> > > > > > > > > > > > > > > > > > > repository under the flink project for
> > the
> > > > > > > operator.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I encourage everyone to keep improving
> > the
> > > > > > details
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > > meantime,
> > > > > > > > > > > > > > > > > > however
> > > > > > > > > > > > > > > > > > > I believe given the existing design and
> > the
> > > > > > general
> > > > > > > > > > > sentiment
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > thread that the most efficient path
> from
> > here
> > > > > is
> > > > > > > > > starting
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > implementation so that we can
> > collectively
> > > > > > iterate
> > > > > > > > over
> > > > > > > > > > it.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Jan 31, 2022 at 10:15 PM Thomas
> > > > Weise <
> > > > > > > > > > > > t...@apache.org>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Xintong,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks for the feedback and please
> see
> > > > > > responses
> > > > > > > > > below
> > > > > > > > > > > -->
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Fri, Jan 28, 2022 at 12:21 AM
> > Xintong
> > > > > Song <
> > > > > > > > > > > > > > > > tonysong...@gmail.com
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks Thomas for drafting this
> > FLIP, and
> > > > > > > > everyone
> > > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > discussion.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I also have a few questions and
> > comments.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ## Job Submission
> > > > > > > > > > > > > > > > > > > > > Deploying a Flink session cluster
> via
> > > > > > kubectl &
> > > > > > > > CR
> > > > > > > > > > and
> > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > > submitting
> > > > > > > > > > > > > > > > > > > > jobs
> > > > > > > > > > > > > > > > > > > > > to the cluster via Flink cli / REST
> > is
> > > > > > probably
> > > > > > > > the
> > > > > > > > > > > > > approach
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > requires
> > > > > > > > > > > > > > > > > > > > > the least effort. However, I'd like
> > to
> > > > > point
> > > > > > > out
> > > > > > > > 2
> > > > > > > > > > > > > > weaknesses.
> > > > > > > > > > > > > > > > > > > > > 1. A lot of users use Flink in
> > > > > > > perjob/application
> > > > > > > > > > > modes.
> > > > > > > > > > > > > For
> > > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > > > > > users,
> > > > > > > > > > > > > > > > > > > > > having to run the job in two steps
> > > > (deploy
> > > > > > the
> > > > > > > > > > cluster,
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > > submit
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > job)
> > > > > > > > > > > > > > > > > > > > > is not that convenient.
> > > > > > > > > > > > > > > > > > > > > 2. One of our motivations is being
> > able
> > > > to
> > > > > > > manage
> > > > > > > > > > Flink
> > > > > > > > > > > > > > > > > applications'
> > > > > > > > > > > > > > > > > > > > > lifecycles with kubectl. Submitting
> > jobs
> > > > > from
> > > > > > > cli
> > > > > > > > > > > sounds
> > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > aligned
> > > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > this motivation.
> > > > > > > > > > > > > > > > > > > > > I think it's probably worth it to
> > support
> > > > > > > > > submitting
> > > > > > > > > > > jobs
> > > > > > > > > > > > > via
> > > > > > > > > > > > > > > > > > kubectl &
> > > > > > > > > > > > > > > > > > > > CR
> > > > > > > > > > > > > > > > > > > > > in the first version, both together
> > with
> > > > > > > > deploying
> > > > > > > > > > the
> > > > > > > > > > > > > > cluster
> > > > > > > > > > > > > > > > like
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > perjob/application mode and after
> > > > deploying
> > > > > > the
> > > > > > > > > > cluster
> > > > > > > > > > > > > like
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > session
> > > > > > > > > > > > > > > > > > > > > mode.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The intention is to support
> application
> > > > > > > management
> > > > > > > > > > > through
> > > > > > > > > > > > > > operator
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > CR,
> > > > > > > > > > > > > > > > > > > > which means there won't be any 2 step
> > > > > > submission
> > > > > > > > > > process,
> > > > > > > > > > > > > > which as
> > > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > > allude to would defeat the purpose of
> > this
> > > > > > > project.
> > > > > > > > > The
> > > > > > > > > > > CR
> > > > > > > > > > > > > > example
> > > > > > > > > > > > > > > > > > shows
> > > > > > > > > > > > > > > > > > > > the application part. Please note
> that
> > the
> > > > > bare
> > > > > > > > > cluster
> > > > > > > > > > > > > > support is
> > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > *additional* feature for scenarios
> that
> > > > > require
> > > > > > > > > > external
> > > > > > > > > > > > job
> > > > > > > > > > > > > > > > > > management.
> > > > > > > > > > > > > > > > > > > Is
> > > > > > > > > > > > > > > > > > > > there anything on the FLIP page that
> > > > creates
> > > > > a
> > > > > > > > > > different
> > > > > > > > > > > > > > > > impression?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ## Versioning
> > > > > > > > > > > > > > > > > > > > > > Which Flink versions does the operator plan to support?
> > > > > > > > > > > > > > > > > > > > > > 1. Native K8s deployment was first introduced in Flink 1.10
> > > > > > > > > > > > > > > > > > > > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > > > > > > > > > > > > > > > > > > > 3. The Pod template support was introduced in Flink 1.13
> > > > > > > > > > > > > > > > > > > > > > 4. There were some changes to the Flink docker image entrypoint
> > > > > > > > > > > > > > > > > > > > > > script in, IIRC, Flink 1.13
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Great, thanks for providing this. It is
> > > > > > > > > > > > > > > > > > > > > > important for the compatibility going
> > > > > > > > > > > > > > > > > > > > > > forward also. We are targeting Flink
> > > > > > > > > > > > > > > > > > > > > > 1.14.x upwards. Before the operator is
> > > > > > > > > > > > > > > > > > > > > > ready there will be another Flink
> > > > > > > > > > > > > > > > > > > > > > release. Let's see if anyone is
> > > > > > > > > > > > > > > > > > > > > > interested in earlier versions?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > ## Compatibility
> > > > > > > > > > > > > > > > > > > > > > > What kind of API compatibility can we
> > > > > > > > > > > > > > > > > > > > > > > commit to? It's probably fine to have
> > > > > > > > > > > > > > > > > > > > > > > alpha / beta version APIs that allow
> > > > > > > > > > > > > > > > > > > > > > > incompatible future changes for the
> > > > > > > > > > > > > > > > > > > > > > > first version. But eventually we would
> > > > > > > > > > > > > > > > > > > > > > > need to guarantee backwards
> > > > > > > > > > > > > > > > > > > > > > > compatibility, so that an early version
> > > > > > > > > > > > > > > > > > > > > > > CR can work with a new version
> > > > > > > > > > > > > > > > > > > > > > > operator.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Another great point and please let me
> > > > > > > > > > > > > > > > > > > > > > include that on the FLIP page. ;-)
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I think we should allow incompatible
> > > > > > > > > > > > > > > > > > > > > > changes for the first one or two
> > > > > > > > > > > > > > > > > > > > > > versions, similar to how other major
> > > > > > > > > > > > > > > > > > > > > > features have evolved recently, such
> > > > > > > > > > > > > > > > > > > > > > as FLIP-27.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Would be great to get broader feedback
> > > > > > > > > > > > > > > > > > > > > > on this one.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > Thomas
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise
> > > > > > > > > > > > > > > > > > > > > > > <t...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks for the feedback!
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > # 1 Flink Native vs Standalone integration
> > > > > > > > > > > > > > > > > > > > > > > > > Maybe we should make this more clear in
> > > > > > > > > > > > > > > > > > > > > > > > > the FLIP but we agreed to do the first
> > > > > > > > > > > > > > > > > > > > > > > > > version of the operator based on the
> > > > > > > > > > > > > > > > > > > > > > > > > native integration.
> > > > > > > > > > > > > > > > > > > > > > > > > While this clearly does not cover all
> > > > > > > > > > > > > > > > > > > > > > > > > use-cases and requirements, it seems
> > > > > > > > > > > > > > > > > > > > > > > > > this would lead to a much smaller
> > > > > > > > > > > > > > > > > > > > > > > > > initial effort and a nicer first
> > > > > > > > > > > > > > > > > > > > > > > > > version.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I'm also leaning towards the native
> > > > > > > > > > > > > > > > > > > > > > > > integration, as long as it reduces the
> > > > > > > > > > > > > > > > > > > > > > > > MVP effort. Ultimately the operator will
> > > > > > > > > > > > > > > > > > > > > > > > need to also support the standalone
> > > > > > > > > > > > > > > > > > > > > > > > mode. I would like to gain more
> > > > > > > > > > > > > > > > > > > > > > > > confidence that native integration
> > > > > > > > > > > > > > > > > > > > > > > > reduces the effort. While it cuts the
> > > > > > > > > > > > > > > > > > > > > > > > effort to handle the TM pod creation,
> > > > > > > > > > > > > > > > > > > > > > > > some mapping code from the CR to the
> > > > > > > > > > > > > > > > > > > > > > > > native integration client and config
> > > > > > > > > > > > > > > > > > > > > > > > needs to be created. As mentioned in the
> > > > > > > > > > > > > > > > > > > > > > > > FLIP, native integration requires the
> > > > > > > > > > > > > > > > > > > > > > > > Flink job manager to have access to the
> > > > > > > > > > > > > > > > > > > > > > > > k8s API to create pods, which in some
> > > > > > > > > > > > > > > > > > > > > > > > scenarios may be seen as unfavorable.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > # Pod Template
> > > > > > > > > > > > > > > > > > > > > > > > > > > Is the pod template in the CR the same
> > > > > > > > > > > > > > > > > > > > > > > > > > > as what Flink already supports[4]?
> > > > > > > > > > > > > > > > > > > > > > > > > > > Then I am afraid that not every
> > > > > > > > > > > > > > > > > > > > > > > > > > > arbitrary field (e.g. cpu/memory
> > > > > > > > > > > > > > > > > > > > > > > > > > > resources) could take effect.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Yes, pod template would look almost
> > > > > > > > > > > > > > > > > > > > > > > > identical. There are a few settings
> > > > > > > > > > > > > > > > > > > > > > > > that the operator will control (and
> > > > > > > > > > > > > > > > > > > > > > > > that may need to be blacklisted), but
> > > > > > > > > > > > > > > > > > > > > > > > in general we would not want to place
> > > > > > > > > > > > > > > > > > > > > > > > restrictions. I think a mechanism where
> > > > > > > > > > > > > > > > > > > > > > > > a pod template is merged from multiple
> > > > > > > > > > > > > > > > > > > > > > > > layers would also be interesting to
> > > > > > > > > > > > > > > > > > > > > > > > make this more flexible.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > > > > > > > > Thomas
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
