Hi,

I will leave this under discussion until next monday then if no comments
are provided I will put it to vote.

Regards,

Nicolas

On Tue, Jun 24, 2025 at 8:54 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Thanks for your efforts!
> The FLIP looks good from my side.
>
> G
>
>
> On Tue, Jun 24, 2025 at 7:22 AM Nicolas Fraison <
> nicolas.frai...@datadoghq.com> wrote:
>
>> Hi,
>>
>> This remaining part of the doc has been updated
>>
>> On Thu, Jun 19, 2025 at 9:37 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> wrote:
>>
>>> I think in this SSLContextLoader class call() function is not correct
>>> and must be onFileOrDirectoryModified(...), right?
>>>
>>> [image: Screenshot 2025-06-19 at 9.35.14.png]
>>>
>>> On Thu, Jun 19, 2025 at 8:14 AM Nicolas Fraison <
>>> nicolas.frai...@datadoghq.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Which part of the doc are you referring to?
>>>>
>>>> On Wed, Jun 18, 2025 at 7:08 PM Gabor Somogyi <
>>>> gabor.g.somo...@gmail.com> wrote:
>>>>
>>>>> There is still a call function there which is left there from the
>>>>> previous
>>>>> design...
>>>>>
>>>>> On Tue, Jun 17, 2025 at 9:48 AM Nicolas Fraison
>>>>> <nicolas.frai...@datadoghq.com.invalid> wrote:
>>>>>
>>>>> > Done
>>>>> >
>>>>> > On Mon, Jun 16, 2025 at 2:46 PM Gabor Somogyi <
>>>>> gabor.g.somo...@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > > I still see 3 Callable occurrence in the FLIP...
>>>>> > >
>>>>> > > On Thu, Jun 12, 2025 at 2:16 PM Nicolas Fraison
>>>>> > > <nicolas.frai...@datadoghq.com.invalid> wrote:
>>>>> > >
>>>>> > > > Hi, FLIP has been updated with last comment
>>>>> > > >
>>>>> > > > On Thu, Jun 5, 2025 at 6:35 PM Gabor Somogyi <
>>>>> > gabor.g.somo...@gmail.com>
>>>>> > > > wrote:
>>>>> > > >
>>>>> > > > > The interface has been added. Do we intend to pass an
>>>>> > > > > instance registerWatchedPath function as we discussed?
>>>>> > > > > If yes then the FLIP needs further adjustments all places
>>>>> where now
>>>>> > > > > Callable provided.
>>>>> > > > >
>>>>> > > > > G
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > On Thu, Jun 5, 2025 at 2:27 PM Nicolas Fraison
>>>>> > > > > <nicolas.frai...@datadoghq.com.invalid> wrote:
>>>>> > > > >
>>>>> > > > > > Hi,
>>>>> > > > > >
>>>>> > > > > > It indeed make sense, FLIP has been updated
>>>>> > > > > >
>>>>> > > > > > Nicolas
>>>>> > > > > >
>>>>> > > > > > On Wed, Jun 4, 2025 at 12:10 PM Gabor Somogyi <
>>>>> > > > gabor.g.somo...@gmail.com
>>>>> > > > > >
>>>>> > > > > > wrote:
>>>>> > > > > >
>>>>> > > > > > > Hi,
>>>>> > > > > > >
>>>>> > > > > > > I've read it through. Basically looks good with one
>>>>> comment.
>>>>> > > > > > > *registerWatchedPath function* has a *Callable* as
>>>>> callback. For
>>>>> > > the
>>>>> > > > > > > current implementation that would be enough,
>>>>> > > > > > > but when somebody would like to use that for any other
>>>>> use-case
>>>>> > > then
>>>>> > > > it
>>>>> > > > > > > would be hard. Examples:
>>>>> > > > > > > * A single *call* function tells the user nothing what
>>>>> kind of
>>>>> > > event
>>>>> > > > is
>>>>> > > > > > > this
>>>>> > > > > > > * the watch service supports 3 events (create, modify,
>>>>> delete),
>>>>> > now
>>>>> > > > > when
>>>>> > > > > > I
>>>>> > > > > > > register I can get only updates (I presume)
>>>>> > > > > > > * 1+ dir watch with the same callback is not possible
>>>>> > > > > > >
>>>>> > > > > > > My suggestion would be to create a proper callback with
>>>>> all the
>>>>> > > event
>>>>> > > > > > type
>>>>> > > > > > > functions and no-op default behavior with the following
>>>>> names[1].
>>>>> > > > > > > This is ~30 lines addition but will increase the
>>>>> readability
>>>>> > > heavily
>>>>> > > > +
>>>>> > > > > no
>>>>> > > > > > > need to touch this code later.
>>>>> > > > > > >
>>>>> > > > > > > BR,
>>>>> > > > > > > G
>>>>> > > > > > >
>>>>> > > > > > > [1]
>>>>> > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://github.com/apache/flink-kubernetes-operator/commit/e5a325c48965a50d61d0aa29e61ba79e97f27082#diff-a30b3ed9b8c53e998b15d7da7ad2e54374c98ffc3c920f76a70bce3fb37a9b2eR87-R93
>>>>> > > > > > >
>>>>> > > > > > >
>>>>> > > > > > > On Tue, Jun 3, 2025 at 8:53 AM Nicolas Fraison
>>>>> > > > > > > <nicolas.frai...@datadoghq.com.invalid> wrote:
>>>>> > > > > > >
>>>>> > > > > > > > Hi,
>>>>> > > > > > > >
>>>>> > > > > > > > The FLIP has been updated.
>>>>> > > > > > > > Let me know if you have some other comments.
>>>>> > > > > > > >
>>>>> > > > > > > > Nicolas
>>>>> > > > > > > >
>>>>> > > > > > > > On Tue, May 20, 2025 at 12:01 PM Nicolas Fraison <
>>>>> > > > > > > > nicolas.frai...@datadoghq.com> wrote:
>>>>> > > > > > > >
>>>>> > > > > > > > > Hi Gabor,
>>>>> > > > > > > > >
>>>>> > > > > > > > > Thanks for the feedback.
>>>>> > > > > > > > > The overall proposal with atomic dirty flag being set
>>>>> by the
>>>>> > > > > callback
>>>>> > > > > > > > will
>>>>> > > > > > > > > indeed work with any kind of implementation.
>>>>> > > > > > > > >
>>>>> > > > > > > > >  Will see to update the FLIP in a week or 2 if there
>>>>> are no
>>>>> > > other
>>>>> > > > > > > > comments
>>>>> > > > > > > > > on it
>>>>> > > > > > > > >
>>>>> > > > > > > > > Nicolas
>>>>> > > > > > > > >
>>>>> > > > > > > > >
>>>>> > > > > > > > > On Tue, May 20, 2025 at 11:08 AM Gabor Somogyi <
>>>>> > > > > > > > gabor.g.somo...@gmail.com>
>>>>> > > > > > > > > wrote:
>>>>> > > > > > > > >
>>>>> > > > > > > > >> Hi Nicolas,
>>>>> > > > > > > > >>
>>>>> > > > > > > > >> Related SSLContext I've not gone through all the
>>>>> cases where
>>>>> > > we
>>>>> > > > > need
>>>>> > > > > > > to
>>>>> > > > > > > > >> reload so instead I'm sharing the concept.
>>>>> > > > > > > > >> The main intention is that we must control all
>>>>> execution
>>>>> > paths
>>>>> > > > > which
>>>>> > > > > > > > >> decide
>>>>> > > > > > > > >> which certificate used for authentication.
>>>>> > > > > > > > >> Creating an SSLContext decorator which checks reload
>>>>> first
>>>>> > and
>>>>> > > > > then
>>>>> > > > > > > > >> forwards all calls to the original (wrapped) context
>>>>> > > > > > > > >> is one way to achieve that. If there are different
>>>>> > > > implementations
>>>>> > > > > > > which
>>>>> > > > > > > > >> end up in similar behavior then it's fine.
>>>>> > > > > > > > >>
>>>>> > > > > > > > >> BR,
>>>>> > > > > > > > >> G
>>>>> > > > > > > > >>
>>>>> > > > > > > > >>
>>>>> > > > > > > > >> On Tue, May 20, 2025 at 10:15 AM Nicolas Fraison
>>>>> > > > > > > > >> <nicolas.frai...@datadoghq.com.invalid> wrote:
>>>>> > > > > > > > >>
>>>>> > > > > > > > >> > Hi,
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >> > Overall your proposal looks great.
>>>>> > > > > > > > >> > The event handling must indeed be super fast and we
>>>>> must
>>>>> > > also
>>>>> > > > > not
>>>>> > > > > > > > change
>>>>> > > > > > > > >> > original code path if reload not needed
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >> > I have some concerns around the ReloadableSSLContext
>>>>> > > > > implementing
>>>>> > > > > > > all
>>>>> > > > > > > > >> > SSLContext
>>>>> > > > > > > > >> > Do you really mean SSLContext (java SSLContext one)
>>>>> or do
>>>>> > > you
>>>>> > > > > > refer
>>>>> > > > > > > to
>>>>> > > > > > > > >> the
>>>>> > > > > > > > >> > SslContext from netty?
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >> > If java SSLContext
>>>>> > > > > > > > >> > - I'm not sure how this will manage reload from the
>>>>> > > BlobServer
>>>>> > > > > > > > >> > BlobServer relies on creation of an
>>>>> SSLServerSocketFactory
>>>>> > > > from
>>>>> > > > > > the
>>>>> > > > > > > > >> > SSLContext.
>>>>> > > > > > > > >> > But from my current understanding the
>>>>> > SSLServerSocketFactory
>>>>> > > > > does
>>>>> > > > > > > not
>>>>> > > > > > > > >> have
>>>>> > > > > > > > >> > any connection with the SSLContext.
>>>>> > > > > > > > >> > It only has some with an SSLContextImpl extends
>>>>> > > SSLContextSpi.
>>>>> > > > > > > > >> > I think we will need a callback here to enforce
>>>>> recreation
>>>>> > > of
>>>>> > > > > the
>>>>> > > > > > > > >> > BlobServer socket.
>>>>> > > > > > > > >> > - I'm also don't see how to attach this to the
>>>>> SslContext
>>>>> > > from
>>>>> > > > > > netty
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >> > If netty SslContext
>>>>> > > > > > > > >> > - we would still need to have the callback to
>>>>> recreate the
>>>>> > > > > > > BlobServer
>>>>> > > > > > > > >> > socket
>>>>> > > > > > > > >> > - for netty and pekko we should be able to rely on
>>>>> it
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >> > Nicolas
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >> > On Wed, May 7, 2025 at 12:40 PM Gabor Somogyi <
>>>>> > > > > > > > >> gabor.g.somo...@gmail.com>
>>>>> > > > > > > > >> > wrote:
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >> > > Hi All,
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > I've read through the concerns/proposals related
>>>>> the
>>>>> > watch
>>>>> > > > > > service
>>>>> > > > > > > > and
>>>>> > > > > > > > >> > here
>>>>> > > > > > > > >> > > are my conclusions:
>>>>> > > > > > > > >> > > * Watch service can watch local file systems
>>>>> only: It's
>>>>> > > fair
>>>>> > > > > to
>>>>> > > > > > > say
>>>>> > > > > > > > >> that
>>>>> > > > > > > > >> > > certificates must be copied to local FS in order
>>>>> to work
>>>>> > > > (init
>>>>> > > > > > > > >> container
>>>>> > > > > > > > >> > cp
>>>>> > > > > > > > >> > > command or something)
>>>>> > > > > > > > >> > > * Watch service can send multiple events even for
>>>>> a
>>>>> > single
>>>>> > > > > > > directory
>>>>> > > > > > > > >> > > change: this can be mitigated with a single
>>>>> atomic dirty
>>>>> > > > flag
>>>>> > > > > > (see
>>>>> > > > > > > > my
>>>>> > > > > > > > >> > > design suggestion)
>>>>> > > > > > > > >> > > * Polling file modification time: When I hear any
>>>>> kind
>>>>> > of
>>>>> > > > > > polling
>>>>> > > > > > > > >> > > implemented by us is just something I'm mostly
>>>>> opposing
>>>>> > > > > > > > >> > > * security.ssl.internal.keystore.reload.duration:
>>>>> > Security
>>>>> > > > > > > features
>>>>> > > > > > > > >> must
>>>>> > > > > > > > >> > be
>>>>> > > > > > > > >> > > executed nearly immediately no matter what
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > As a general remark, not sure how many users are
>>>>> using
>>>>> > the
>>>>> > > > > watch
>>>>> > > > > > > > >> service
>>>>> > > > > > > > >> > > based approach in the operator but until now I've
>>>>> not
>>>>> > seen
>>>>> > > > any
>>>>> > > > > > > issue
>>>>> > > > > > > > >> > > related to that.
>>>>> > > > > > > > >> > > If somebody is having some specifics then please
>>>>> share.
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > For now I would shoot for keystore not to have
>>>>> feature
>>>>> > > > creep.
>>>>> > > > > > When
>>>>> > > > > > > > >> there
>>>>> > > > > > > > >> > is
>>>>> > > > > > > > >> > > a valid use-case, community interest and the
>>>>> keystore
>>>>> > > story
>>>>> > > > is
>>>>> > > > > > > > already
>>>>> > > > > > > > >> > rock
>>>>> > > > > > > > >> > > stable
>>>>> > > > > > > > >> > > then we can consider to involve truststore later.
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > Having the mentioned assumption that the operator
>>>>> > approach
>>>>> > > > > > works,
>>>>> > > > > > > > >> here is
>>>>> > > > > > > > >> > > my high level proposal:
>>>>> > > > > > > > >> > > * Let's have enable flag for all such watch
>>>>> > functionality.
>>>>> > > > If
>>>>> > > > > > it's
>>>>> > > > > > > > >> false
>>>>> > > > > > > > >> > > then the currently existing functionality must
>>>>> remain
>>>>> > > as-is
>>>>> > > > > > > > >> > > * Let's have a LocalFSWatchService which is a
>>>>> singleton
>>>>> > > > which
>>>>> > > > > > has
>>>>> > > > > > > no
>>>>> > > > > > > > >> path
>>>>> > > > > > > > >> > > registrations by default
>>>>> > > > > > > > >> > > * Add path registration functionality which is
>>>>> > > synchronised
>>>>> > > > > > where
>>>>> > > > > > > a
>>>>> > > > > > > > >> > > callback can be registered
>>>>> > > > > > > > >> > > * Let's have a ReloadableSSLContext which
>>>>> implements the
>>>>> > > > > > mentioned
>>>>> > > > > > > > >> > callback
>>>>> > > > > > > > >> > > * Inside the callback set an atomic dirty flag
>>>>> only
>>>>> > (this
>>>>> > > > can
>>>>> > > > > > > handle
>>>>> > > > > > > > >> > > multiple events for the same directory change +
>>>>> event
>>>>> > > > handling
>>>>> > > > > > in
>>>>> > > > > > > > the
>>>>> > > > > > > > >> > watch
>>>>> > > > > > > > >> > > service must be extreme fast)
>>>>> > > > > > > > >> > > * Inside ReloadableSSLContext all SSLContext
>>>>> actions
>>>>> > must
>>>>> > > be
>>>>> > > > > > > > >> overridden.
>>>>> > > > > > > > >> > At
>>>>> > > > > > > > >> > > the beginning of each function dirty flag must be
>>>>> > checked
>>>>> > > > > > > > >> > > and if dirty then certificates must be reloaded,
>>>>> flag
>>>>> > can
>>>>> > > be
>>>>> > > > > set
>>>>> > > > > > > > back
>>>>> > > > > > > > >> to
>>>>> > > > > > > > >> > > false (then original functionality call). It's
>>>>> extremely
>>>>> > > > > > important
>>>>> > > > > > > > >> that
>>>>> > > > > > > > >> > > context reload must be synchronised.
>>>>> > > > > > > > >> > > A synchronised boolean check + possible context
>>>>> reload
>>>>> > can
>>>>> > > > > > consume
>>>>> > > > > > > > >> some
>>>>> > > > > > > > >> > > time but I wouldn't expect any significant
>>>>> performance
>>>>> > > drop.
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > My proposal main drivers:
>>>>> > > > > > > > >> > > * Original code path must run when no watch
>>>>> service
>>>>> > asked
>>>>> > > > > > > > >> > > * Super fast event handling because million
>>>>> events may
>>>>> > > come
>>>>> > > > in
>>>>> > > > > > > (not
>>>>> > > > > > > > >> > > expecting but we should be prepared)
>>>>> > > > > > > > >> > > * Clean separation between dir/file watch and
>>>>> file/dir
>>>>> > > usage
>>>>> > > > > > > > >> > > * Well considered synchronisation model
>>>>> > > > > > > > >> > > * Extensive unit testing because we're intended
>>>>> to touch
>>>>> > > the
>>>>> > > > > > heart
>>>>> > > > > > > > of
>>>>> > > > > > > > >> > Flink
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > Happy to hear other opinions.
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > BR,
>>>>> > > > > > > > >> > > G
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > On Mon, Apr 28, 2025 at 1:54 PM Nicolas Fraison
>>>>> > > > > > > > >> > > <nicolas.frai...@datadoghq.com.invalid> wrote:
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> > > > Thanks all for your feedback and sorry for the
>>>>> late
>>>>> > > answer
>>>>> > > > > (I
>>>>> > > > > > > was
>>>>> > > > > > > > on
>>>>> > > > > > > > >> > > > holiday).
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > 1. Indeed it would add 4 threads.
>>>>> > > > > > > > >> > > > For the non pekko component we can indeed have
>>>>> one
>>>>> > > watcher
>>>>> > > > > > > service
>>>>> > > > > > > > >> used
>>>>> > > > > > > > >> > > to
>>>>> > > > > > > > >> > > > reload SSLContext for those components
>>>>> > > > > > > > >> > > > For pekko this is a little more challenging as
>>>>> the
>>>>> > > > creation
>>>>> > > > > of
>>>>> > > > > > > the
>>>>> > > > > > > > >> > pekko
>>>>> > > > > > > > >> > > > ssl engine is managed by pekko himself.
>>>>> > > > > > > > >> > > > Flink only generates appropriate config with
>>>>> class to
>>>>> > > > > execute
>>>>> > > > > > to
>>>>> > > > > > > > >> > initiate
>>>>> > > > > > > > >> > > > the pekko ssl engine [1]. This means that I
>>>>> will not
>>>>> > be
>>>>> > > > able
>>>>> > > > > > to
>>>>> > > > > > > > >> provide
>>>>> > > > > > > > >> > > the
>>>>> > > > > > > > >> > > > watcher service to this ssl engine.
>>>>> > > > > > > > >> > > > One solution would be to rely on a singleton
>>>>> instead
>>>>> > of
>>>>> > > a
>>>>> > > > > > > service
>>>>> > > > > > > > >> > > injected
>>>>> > > > > > > > >> > > > in each component but I'm not sure this is fine
>>>>> to use
>>>>> > > > such
>>>>> > > > > in
>>>>> > > > > > > > >> flink.
>>>>> > > > > > > > >> > > > WDYT?
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > We can also add a specific flink configuration
>>>>> > > > > > > > >> > > > (security.ssl.internal.keystore.reload.enable)
>>>>> to only
>>>>> > > add
>>>>> > > > > > this
>>>>> > > > > > > > >> watcher
>>>>> > > > > > > > >> > > > mechanism if the config is enabled to avoid
>>>>> adding
>>>>> > those
>>>>> > > > > > threads
>>>>> > > > > > > > if
>>>>> > > > > > > > >> > this
>>>>> > > > > > > > >> > > is
>>>>> > > > > > > > >> > > > not needed.
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > 2. I'm fine with the LocalFSWatchService naming.
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > 3. Also agree that some e2e tests must be added.
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > @doguscan namal, Thanks for challenging the
>>>>> proposal.
>>>>> > > > > > > > >> > > > FYI, we are planning to rely on certificates
>>>>> with
>>>>> > really
>>>>> > > > > short
>>>>> > > > > > > > >> > validity.
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > D1. It seems that the proposal to rely on a
>>>>> reload
>>>>> > > period
>>>>> > > > > will
>>>>> > > > > > > > still
>>>>> > > > > > > > >> > face
>>>>> > > > > > > > >> > > > potential issues:
>>>>> > > > > > > > >> > > > - receiving events while file content updates
>>>>> are
>>>>> > still
>>>>> > > in
>>>>> > > > > > > > progress:
>>>>> > > > > > > > >> > > > There is no guarantee that we will not load the
>>>>> > > > certificate
>>>>> > > > > > > while
>>>>> > > > > > > > >> file
>>>>> > > > > > > > >> > > > content updates are still in progress
>>>>> > > > > > > > >> > > > - while it will not be affected by multiple
>>>>> > > notifications,
>>>>> > > > > we
>>>>> > > > > > > can
>>>>> > > > > > > > >> > reach a
>>>>> > > > > > > > >> > > > point where only the truststore is updated when
>>>>> the
>>>>> > > reload
>>>>> > > > > > > > happens.
>>>>> > > > > > > > >> > > > Which means that if the keystore is updated just
>>>>> > after,
>>>>> > > it
>>>>> > > > > > will
>>>>> > > > > > > > not
>>>>> > > > > > > > >> be
>>>>> > > > > > > > >> > > > taken in account before next run of the reload
>>>>> > mechanism
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > I think that with WatchService and appropriate
>>>>> reload
>>>>> > > > grace
>>>>> > > > > > > period
>>>>> > > > > > > > >> > > > mechanism we should be able to mitigate those 2
>>>>> issues
>>>>> > > > > > (ensuring
>>>>> > > > > > > > >> > minimum
>>>>> > > > > > > > >> > > > reload even with multiple notify)
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > From KIP-1119 [2] it looks like the same kind of
>>>>> > > > > requirements
>>>>> > > > > > is
>>>>> > > > > > > > >> under
>>>>> > > > > > > > >> > > > discussion for Kafka to also rely on the
>>>>> WatchService
>>>>> > > Java
>>>>> > > > > API
>>>>> > > > > > > > >> > > (SpringBoot
>>>>> > > > > > > > >> > > > seems to also rely on this API to manage ssl
>>>>> reload
>>>>> > > [3]).
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > D3. Do we have a real use case for this?
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > [1]
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://github.com/apache/flink/blob/b523264ab45d37cd9584a0e8c06f1ef6bd1aaed7/flink-rpc/flink-rpc-akka/src/main/java/org/apache/flink/runtime/rpc/pekko/PekkoUtils.java#L372
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > [2]
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1119%3A+Add+support+for+SSL+auto+reload
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > [3]
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://docs.spring.io/spring-boot/reference/features/ssl.html#features.ssl.reloading
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > Regards,
>>>>> > > > > > > > >> > > > Nicolas
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > On Wed, Apr 23, 2025 at 12:14 PM Doğuşcan Namal
>>>>> <
>>>>> > > > > > > > >> > > namal.dogus...@gmail.com>
>>>>> > > > > > > > >> > > > wrote:
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > > Hi Nicolas, thanks for the FLIP.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > I am fully supportive of the motivation and we
>>>>> > should
>>>>> > > be
>>>>> > > > > > > > >> supporting
>>>>> > > > > > > > >> > > this
>>>>> > > > > > > > >> > > > > feature. Here are couple of comments from my
>>>>> side:
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > D1)
>>>>> > > > > > > > >> > > > > Since you shared the implementation details
>>>>> on the
>>>>> > > FLIP
>>>>> > > > as
>>>>> > > > > > > > well, I
>>>>> > > > > > > > >> > > would
>>>>> > > > > > > > >> > > > > like to discuss whether using Java's
>>>>> WatchService is
>>>>> > > the
>>>>> > > > > > best
>>>>> > > > > > > > >> choice
>>>>> > > > > > > > >> > > > here.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > I believe that the certificate renewal is not
>>>>> a
>>>>> > > frequent
>>>>> > > > > > > > >> operation.
>>>>> > > > > > > > >> > > Even
>>>>> > > > > > > > >> > > > > once a day certificate renewals are not
>>>>> > > realistic(except
>>>>> > > > > for
>>>>> > > > > > > > test
>>>>> > > > > > > > >> > cases
>>>>> > > > > > > > >> > > > > maybe) but let's assume that this covers up
>>>>> the
>>>>> > p99.99
>>>>> > > > of
>>>>> > > > > > the
>>>>> > > > > > > > use
>>>>> > > > > > > > >> > > cases.
>>>>> > > > > > > > >> > > > I
>>>>> > > > > > > > >> > > > > am confident on this estimation since there
>>>>> hasn't
>>>>> > > been
>>>>> > > > a
>>>>> > > > > > > > request
>>>>> > > > > > > > >> > from
>>>>> > > > > > > > >> > > > the
>>>>> > > > > > > > >> > > > > community for this feature so far, which
>>>>> confirms
>>>>> > that
>>>>> > > > > > people
>>>>> > > > > > > > were
>>>>> > > > > > > > >> > okay
>>>>> > > > > > > > >> > > > > with infrequent cluster restarts. Following
>>>>> that it
>>>>> > is
>>>>> > > > > > > > >> infrequent, I
>>>>> > > > > > > > >> > > > > believe that spawning up a thread that
>>>>> watches the
>>>>> > > file
>>>>> > > > > > > > >> modification
>>>>> > > > > > > > >> > > > > operations is not the best use of the limited
>>>>> > > resources
>>>>> > > > > on a
>>>>> > > > > > > > >> cluster.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > There are some known limitations of the
>>>>> WatchService
>>>>> > > as
>>>>> > > > > well
>>>>> > > > > > > > such
>>>>> > > > > > > > >> as
>>>>> > > > > > > > >> > > > > receiving multiple modification events for
>>>>> the same
>>>>> > > > > > occurence
>>>>> > > > > > > > [1],
>>>>> > > > > > > > >> > > > > inotify's (WatchService's underlying
>>>>> mechanism in
>>>>> > > Linux
>>>>> > > > > > > > >> environments)
>>>>> > > > > > > > >> > > > > problems on containerized environments due to
>>>>> remote
>>>>> > > > file
>>>>> > > > > > > > systems
>>>>> > > > > > > > >> [2]
>>>>> > > > > > > > >> > > or
>>>>> > > > > > > > >> > > > > receiving events while file content updates
>>>>> are
>>>>> > still
>>>>> > > in
>>>>> > > > > > > > progress.
>>>>> > > > > > > > >> > > [3]. I
>>>>> > > > > > > > >> > > > > do not know if these limitations are
>>>>> addressed in
>>>>> > the
>>>>> > > > > newer
>>>>> > > > > > > > >> versions
>>>>> > > > > > > > >> > > but
>>>>> > > > > > > > >> > > > > regardless of that it is clear that we may
>>>>> face with
>>>>> > > > some
>>>>> > > > > > ugly
>>>>> > > > > > > > >> edge
>>>>> > > > > > > > >> > > cases
>>>>> > > > > > > > >> > > > > due to that.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > Given these complications, I would recommend
>>>>> just
>>>>> > > > > creating a
>>>>> > > > > > > new
>>>>> > > > > > > > >> > > > SSLContext
>>>>> > > > > > > > >> > > > > after a configured duration is expired. We
>>>>> could
>>>>> > > record
>>>>> > > > > the
>>>>> > > > > > > > >> timestamp
>>>>> > > > > > > > >> > > > when
>>>>> > > > > > > > >> > > > > the previous SSLContext is created and update
>>>>> it
>>>>> > > after a
>>>>> > > > > > > > >> configured
>>>>> > > > > > > > >> > > > > duration is passed. This will be much easier
>>>>> to test
>>>>> > > and
>>>>> > > > > > > reason
>>>>> > > > > > > > >> about
>>>>> > > > > > > > >> > > > when
>>>>> > > > > > > > >> > > > > it is running on production. This will
>>>>> eliminate the
>>>>> > > > > > necessity
>>>>> > > > > > > > to
>>>>> > > > > > > > >> > > reason
>>>>> > > > > > > > >> > > > > about the file modification operations as
>>>>> well.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > I briefly skimmed through the classes that
>>>>> need to
>>>>> > be
>>>>> > > > > > modified
>>>>> > > > > > > > >> and it
>>>>> > > > > > > > >> > > > > looked feasible for me. Let me know what are
>>>>> your
>>>>> > > > comments
>>>>> > > > > > on
>>>>> > > > > > > > >> these.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > Note that this is already used in the Kafka
>>>>> world
>>>>> > > where
>>>>> > > > a
>>>>> > > > > > new
>>>>> > > > > > > > >> > > SSLContext
>>>>> > > > > > > > >> > > > is
>>>>> > > > > > > > >> > > > > created after 12 hours.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > D2) We could provide a configuration to the
>>>>> user,
>>>>> > such
>>>>> > > > as
>>>>> > > > > > > > >> > > > >
>>>>> "security.ssl.internal.keystore.reload.duration" so
>>>>> > > they
>>>>> > > > > > could
>>>>> > > > > > > > >> decide
>>>>> > > > > > > > >> > > how
>>>>> > > > > > > > >> > > > > often the new certificates should be loaded.
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > D3)
>>>>> > > > > > > > >> > > > > On the other hand, I wonder whether we should
>>>>> also
>>>>> > > > handle
>>>>> > > > > > > > >> supporting
>>>>> > > > > > > > >> > > > > updating the file paths of the truststores and
>>>>> > > keystores
>>>>> > > > > > under
>>>>> > > > > > > > >> this
>>>>> > > > > > > > >> > > FLIP
>>>>> > > > > > > > >> > > > as
>>>>> > > > > > > > >> > > > > well. Since the name of the FLIP is "Handle
>>>>> TLS
>>>>> > > > > Certificate
>>>>> > > > > > > > >> Renewal"
>>>>> > > > > > > > >> > I
>>>>> > > > > > > > >> > > > > think we could bring that into scope too :)
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > Thanks
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > [1]
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://stackoverflow.com/questions/16777869/java-7-watchservice-ignoring-multiple-occurrences-of-the-same-event
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > [2]
>>>>> > > > > > > >
>>>>> https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > [3]
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >>
>>>>> > > > > >
>>>>> > >
>>>>> https://surajatreyac.github.io/2014-07-29/reactive_file_handling.html
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > [4] See also Platform Dependencies -
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://docs.oracle.com/javase/8/docs/api/index.html?java/nio/file/WatchService.html
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > On Fri, 18 Apr 2025 at 18:25, Gabor Somogyi <
>>>>> > > > > > > > >> > gabor.g.somo...@gmail.com
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > > > > wrote:
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > > > > Hi Robert,
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > Since I've added the same feature to the
>>>>> operator
>>>>> > > I'll
>>>>> > > > > > take
>>>>> > > > > > > a
>>>>> > > > > > > > >> look
>>>>> > > > > > > > >> > at
>>>>> > > > > > > > >> > > > it.
>>>>> > > > > > > > >> > > > > > Though it won't be lightning fast since I'm
>>>>> having
>>>>> > > > > several
>>>>> > > > > > > > weeks
>>>>> > > > > > > > >> > off.
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > Your questions are valid especially
>>>>> considering
>>>>> > the
>>>>> > > > fact
>>>>> > > > > > > that
>>>>> > > > > > > > >> this
>>>>> > > > > > > > >> > > > > feature
>>>>> > > > > > > > >> > > > > > touches the hearth of the authentication so
>>>>> this
>>>>> > > must
>>>>> > > > be
>>>>> > > > > > > rock
>>>>> > > > > > > > >> solid
>>>>> > > > > > > > >> > > > > > in order to avoid grey hair :)
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > (1) I would vote on a single service which
>>>>> is
>>>>> > > heavily
>>>>> > > > > unit
>>>>> > > > > > > > >> tested
>>>>> > > > > > > > >> > > with
>>>>> > > > > > > > >> > > > > > all the possible combinations including
>>>>> threading.
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > Some standalone app could be added to
>>>>> really play
>>>>> > > with
>>>>> > > > > it
>>>>> > > > > > > > (that
>>>>> > > > > > > > >> > would
>>>>> > > > > > > > >> > > > > help
>>>>> > > > > > > > >> > > > > > review).
>>>>> > > > > > > > >> > > > > > I mean, create X files, start Y threads,
>>>>> and make
>>>>> > > > > > > assertions.
>>>>> > > > > > > > >> > > > > > The reason why I'm suggesting it is the
>>>>> fact that
>>>>> > > > AFAIR
>>>>> > > > > > the
>>>>> > > > > > > > >> watch
>>>>> > > > > > > > >> > > > service
>>>>> > > > > > > > >> > > > > > is quite sensitive even in single thread.
>>>>> If we
>>>>> > > could
>>>>> > > > do
>>>>> > > > > > > this
>>>>> > > > > > > > >> in a
>>>>> > > > > > > > >> > > > finite
>>>>> > > > > > > > >> > > > > > time
>>>>> > > > > > > > >> > > > > > consuming unit test then it's even better.
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > (2) +1 on that name to avoid confusion
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > (3) I agree that some e2e is must, however
>>>>> this
>>>>> > can
>>>>> > > be
>>>>> > > > > > > easily
>>>>> > > > > > > > >> and
>>>>> > > > > > > > >> > > > deeply
>>>>> > > > > > > > >> > > > > > unit
>>>>> > > > > > > > >> > > > > > tested so that part is also essential. One
>>>>> key
>>>>> > test
>>>>> > > > here
>>>>> > > > > > is
>>>>> > > > > > > > when
>>>>> > > > > > > > >> > > > > > certificates
>>>>> > > > > > > > >> > > > > > are not changing then no action must be
>>>>> performed
>>>>> > > (not
>>>>> > > > > to
>>>>> > > > > > > > break
>>>>> > > > > > > > >> the
>>>>> > > > > > > > >> > > > whole
>>>>> > > > > > > > >> > > > > > system apart).
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > Purely personal opinion but such feature
>>>>> > > developments
>>>>> > > > > are
>>>>> > > > > > > slow
>>>>> > > > > > > > >> by
>>>>> > > > > > > > >> > > > nature
>>>>> > > > > > > > >> > > > > > because
>>>>> > > > > > > > >> > > > > > of edge case / stress testing.
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > BR,
>>>>> > > > > > > > >> > > > > > G
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > On Fri, Apr 18, 2025 at 4:53 PM Robert
>>>>> Metzger <
>>>>> > > > > > > > >> > rmetz...@apache.org>
>>>>> > > > > > > > >> > > > > > wrote:
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > > > > Hi Nicolas,
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > This looks like a nice improvement,
>>>>> thanks for
>>>>> > the
>>>>> > > > > write
>>>>> > > > > > > up.
>>>>> > > > > > > > >> > > > > > > Are you in touch with any committer who's
>>>>> > willing
>>>>> > > to
>>>>> > > > > > > review
>>>>> > > > > > > > /
>>>>> > > > > > > > >> > merge
>>>>> > > > > > > > >> > > > > this?
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > Some random questions on the FLIP:
>>>>> > > > > > > > >> > > > > > > (1)  "Each service that depends on TLS
>>>>> > > certificates
>>>>> > > > > will
>>>>> > > > > > > > >> > > initialize a
>>>>> > > > > > > > >> > > > > > > FileSytemWatchService"
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > It seems that there are 4 components
>>>>> using SSL,
>>>>> > > does
>>>>> > > > > > this
>>>>> > > > > > > > mean
>>>>> > > > > > > > >> > > there
>>>>> > > > > > > > >> > > > > will
>>>>> > > > > > > > >> > > > > > > be 4 additional threads running, watching
>>>>> the
>>>>> > same
>>>>> > > > set
>>>>> > > > > > of
>>>>> > > > > > > > >> files?
>>>>> > > > > > > > >> > > > > > > Wouldn't it be better to introduce a
>>>>> central
>>>>> > file
>>>>> > > > > > watching
>>>>> > > > > > > > >> > service,
>>>>> > > > > > > > >> > > > and
>>>>> > > > > > > > >> > > > > > SSL
>>>>> > > > > > > > >> > > > > > > users can subscribe to updates, to reduce
>>>>> the
>>>>> > > number
>>>>> > > > > of
>>>>> > > > > > > > >> threads?
>>>>> > > > > > > > >> > > > > > > If this makes the whole effort 4x more
>>>>> > > complicated,
>>>>> > > > I
>>>>> > > > > > > > wouldn't
>>>>> > > > > > > > >> > > > consider
>>>>> > > > > > > > >> > > > > > it,
>>>>> > > > > > > > >> > > > > > > but if its roughly the same effort, we
>>>>> should :)
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > (2) "FileSytemWatchService"
>>>>> > > > > > > > >> > > > > > > When I read this name, I was wondering,
>>>>> whether
>>>>> > > this
>>>>> > > > > is
>>>>> > > > > > > > >> somehow
>>>>> > > > > > > > >> > > > related
>>>>> > > > > > > > >> > > > > > to
>>>>> > > > > > > > >> > > > > > > the Flink "FileSystem" classes. Which I
>>>>> think
>>>>> > its'
>>>>> > > > > not.
>>>>> > > > > > > > >> > > > > > > Maybe a different name, that makes this
>>>>> > separation
>>>>> > > > > more
>>>>> > > > > > > > >> explicit,
>>>>> > > > > > > > >> > > > would
>>>>> > > > > > > > >> > > > > > > make sense. Maybe "LocalFSWatchService"?
>>>>> > > > > > > > >> > > > > > > (I'm sorry to bring up naming stuff --
>>>>> its very
>>>>> > > > > > > subjective,
>>>>> > > > > > > > >> and
>>>>> > > > > > > > >> > > > > > difficult)
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > (3) For the test plan: There seem to be
>>>>> some SSL
>>>>> > > > > related
>>>>> > > > > > > e2e
>>>>> > > > > > > > >> > tests:
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_ssl.sh
>>>>> > > > > > > > >> > > > > > > It would be nice to extend them to cover
>>>>> this
>>>>> > > > feature
>>>>> > > > > as
>>>>> > > > > > > > >> well. I
>>>>> > > > > > > > >> > > > would
>>>>> > > > > > > > >> > > > > > hate
>>>>> > > > > > > > >> > > > > > > for this feature to slowly break by future
>>>>> > > changes,
>>>>> > > > so
>>>>> > > > > > > good
>>>>> > > > > > > > >> e2e
>>>>> > > > > > > > >> > > test
>>>>> > > > > > > > >> > > > > > > coverage is key, in particular bc so many
>>>>> > > components
>>>>> > > > > are
>>>>> > > > > > > > >> > involved.
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > Best,
>>>>> > > > > > > > >> > > > > > > Robert
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > On Wed, Apr 16, 2025 at 11:55 AM Nicolas
>>>>> Fraison
>>>>> > > > > > > > >> > > > > > > <nicolas.frai...@datadoghq.com.invalid>
>>>>> wrote:
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > > > > Hi All,
>>>>> > > > > > > > >> > > > > > > >
>>>>> > > > > > > > >> > > > > > > > I'd like to start a discussion to
>>>>> Handle TLS
>>>>> > > > > > Certificate
>>>>> > > > > > > > >> > Renewal
>>>>> > > > > > > > >> > > > > > > > Please provide some feedback on this
>>>>> proposal:
>>>>> > > > > > > > >> > > > > > > >
>>>>> > > > > > > > >> > > > > > > >
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-523%3A+Handle+TLS+Certificate+Renewal
>>>>> > > > > > > > >> > > > > > > >
>>>>> > > > > > > > >> > > > > > > > Regards,
>>>>> > > > > > > > >> > > > > > > >
>>>>> > > > > > > > >> > > > > > > > Nicolas Fraison
>>>>> > > > > > > > >> > > > > > > >
>>>>> > > > > > > > >> > > > > > >
>>>>> > > > > > > > >> > > > > >
>>>>> > > > > > > > >> > > > >
>>>>> > > > > > > > >> > > >
>>>>> > > > > > > > >> > >
>>>>> > > > > > > > >> >
>>>>> > > > > > > > >>
>>>>> > > > > > > > >
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>> >
>>>>>
>>>>

Reply via email to