Hi, I will leave this under discussion until next monday then if no comments are provided I will put it to vote.
Regards, Nicolas On Tue, Jun 24, 2025 at 8:54 AM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote: > Thanks for your efforts! > The FLIP looks good from my side. > > G > > > On Tue, Jun 24, 2025 at 7:22 AM Nicolas Fraison < > nicolas.frai...@datadoghq.com> wrote: > >> Hi, >> >> This remaining part of the doc has been updated >> >> On Thu, Jun 19, 2025 at 9:37 AM Gabor Somogyi <gabor.g.somo...@gmail.com> >> wrote: >> >>> I think in this SSLContextLoader class call() function is not correct >>> and must be onFileOrDirectoryModified(...), right? >>> >>> [image: Screenshot 2025-06-19 at 9.35.14.png] >>> >>> On Thu, Jun 19, 2025 at 8:14 AM Nicolas Fraison < >>> nicolas.frai...@datadoghq.com> wrote: >>> >>>> Hi, >>>> >>>> Which part of the doc are you referring to? >>>> >>>> On Wed, Jun 18, 2025 at 7:08 PM Gabor Somogyi < >>>> gabor.g.somo...@gmail.com> wrote: >>>> >>>>> There is still a call function there which is left there from the >>>>> previous >>>>> design... >>>>> >>>>> On Tue, Jun 17, 2025 at 9:48 AM Nicolas Fraison >>>>> <nicolas.frai...@datadoghq.com.invalid> wrote: >>>>> >>>>> > Done >>>>> > >>>>> > On Mon, Jun 16, 2025 at 2:46 PM Gabor Somogyi < >>>>> gabor.g.somo...@gmail.com> >>>>> > wrote: >>>>> > >>>>> > > I still see 3 Callable occurrence in the FLIP... >>>>> > > >>>>> > > On Thu, Jun 12, 2025 at 2:16 PM Nicolas Fraison >>>>> > > <nicolas.frai...@datadoghq.com.invalid> wrote: >>>>> > > >>>>> > > > Hi, FLIP has been updated with last comment >>>>> > > > >>>>> > > > On Thu, Jun 5, 2025 at 6:35 PM Gabor Somogyi < >>>>> > gabor.g.somo...@gmail.com> >>>>> > > > wrote: >>>>> > > > >>>>> > > > > The interface has been added. Do we intend to pass an >>>>> > > > > instance registerWatchedPath function as we discussed? >>>>> > > > > If yes then the FLIP needs further adjustments all places >>>>> where now >>>>> > > > > Callable provided. >>>>> > > > > >>>>> > > > > G >>>>> > > > > >>>>> > > > > >>>>> > > > > On Thu, Jun 5, 2025 at 2:27 PM Nicolas Fraison >>>>> > > > > <nicolas.frai...@datadoghq.com.invalid> wrote: >>>>> > > > > >>>>> > > > > > Hi, >>>>> > > > > > >>>>> > > > > > It indeed make sense, FLIP has been updated >>>>> > > > > > >>>>> > > > > > Nicolas >>>>> > > > > > >>>>> > > > > > On Wed, Jun 4, 2025 at 12:10 PM Gabor Somogyi < >>>>> > > > gabor.g.somo...@gmail.com >>>>> > > > > > >>>>> > > > > > wrote: >>>>> > > > > > >>>>> > > > > > > Hi, >>>>> > > > > > > >>>>> > > > > > > I've read it through. Basically looks good with one >>>>> comment. >>>>> > > > > > > *registerWatchedPath function* has a *Callable* as >>>>> callback. For >>>>> > > the >>>>> > > > > > > current implementation that would be enough, >>>>> > > > > > > but when somebody would like to use that for any other >>>>> use-case >>>>> > > then >>>>> > > > it >>>>> > > > > > > would be hard. Examples: >>>>> > > > > > > * A single *call* function tells the user nothing what >>>>> kind of >>>>> > > event >>>>> > > > is >>>>> > > > > > > this >>>>> > > > > > > * the watch service supports 3 events (create, modify, >>>>> delete), >>>>> > now >>>>> > > > > when >>>>> > > > > > I >>>>> > > > > > > register I can get only updates (I presume) >>>>> > > > > > > * 1+ dir watch with the same callback is not possible >>>>> > > > > > > >>>>> > > > > > > My suggestion would be to create a proper callback with >>>>> all the >>>>> > > event >>>>> > > > > > type >>>>> > > > > > > functions and no-op default behavior with the following >>>>> names[1]. >>>>> > > > > > > This is ~30 lines addition but will increase the >>>>> readability >>>>> > > heavily >>>>> > > > + >>>>> > > > > no >>>>> > > > > > > need to touch this code later. >>>>> > > > > > > >>>>> > > > > > > BR, >>>>> > > > > > > G >>>>> > > > > > > >>>>> > > > > > > [1] >>>>> > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://github.com/apache/flink-kubernetes-operator/commit/e5a325c48965a50d61d0aa29e61ba79e97f27082#diff-a30b3ed9b8c53e998b15d7da7ad2e54374c98ffc3c920f76a70bce3fb37a9b2eR87-R93 >>>>> > > > > > > >>>>> > > > > > > >>>>> > > > > > > On Tue, Jun 3, 2025 at 8:53 AM Nicolas Fraison >>>>> > > > > > > <nicolas.frai...@datadoghq.com.invalid> wrote: >>>>> > > > > > > >>>>> > > > > > > > Hi, >>>>> > > > > > > > >>>>> > > > > > > > The FLIP has been updated. >>>>> > > > > > > > Let me know if you have some other comments. >>>>> > > > > > > > >>>>> > > > > > > > Nicolas >>>>> > > > > > > > >>>>> > > > > > > > On Tue, May 20, 2025 at 12:01 PM Nicolas Fraison < >>>>> > > > > > > > nicolas.frai...@datadoghq.com> wrote: >>>>> > > > > > > > >>>>> > > > > > > > > Hi Gabor, >>>>> > > > > > > > > >>>>> > > > > > > > > Thanks for the feedback. >>>>> > > > > > > > > The overall proposal with atomic dirty flag being set >>>>> by the >>>>> > > > > callback >>>>> > > > > > > > will >>>>> > > > > > > > > indeed work with any kind of implementation. >>>>> > > > > > > > > >>>>> > > > > > > > > Will see to update the FLIP in a week or 2 if there >>>>> are no >>>>> > > other >>>>> > > > > > > > comments >>>>> > > > > > > > > on it >>>>> > > > > > > > > >>>>> > > > > > > > > Nicolas >>>>> > > > > > > > > >>>>> > > > > > > > > >>>>> > > > > > > > > On Tue, May 20, 2025 at 11:08 AM Gabor Somogyi < >>>>> > > > > > > > gabor.g.somo...@gmail.com> >>>>> > > > > > > > > wrote: >>>>> > > > > > > > > >>>>> > > > > > > > >> Hi Nicolas, >>>>> > > > > > > > >> >>>>> > > > > > > > >> Related SSLContext I've not gone through all the >>>>> cases where >>>>> > > we >>>>> > > > > need >>>>> > > > > > > to >>>>> > > > > > > > >> reload so instead I'm sharing the concept. >>>>> > > > > > > > >> The main intention is that we must control all >>>>> execution >>>>> > paths >>>>> > > > > which >>>>> > > > > > > > >> decide >>>>> > > > > > > > >> which certificate used for authentication. >>>>> > > > > > > > >> Creating an SSLContext decorator which checks reload >>>>> first >>>>> > and >>>>> > > > > then >>>>> > > > > > > > >> forwards all calls to the original (wrapped) context >>>>> > > > > > > > >> is one way to achieve that. If there are different >>>>> > > > implementations >>>>> > > > > > > which >>>>> > > > > > > > >> end up in similar behavior then it's fine. >>>>> > > > > > > > >> >>>>> > > > > > > > >> BR, >>>>> > > > > > > > >> G >>>>> > > > > > > > >> >>>>> > > > > > > > >> >>>>> > > > > > > > >> On Tue, May 20, 2025 at 10:15 AM Nicolas Fraison >>>>> > > > > > > > >> <nicolas.frai...@datadoghq.com.invalid> wrote: >>>>> > > > > > > > >> >>>>> > > > > > > > >> > Hi, >>>>> > > > > > > > >> > >>>>> > > > > > > > >> > Overall your proposal looks great. >>>>> > > > > > > > >> > The event handling must indeed be super fast and we >>>>> must >>>>> > > also >>>>> > > > > not >>>>> > > > > > > > change >>>>> > > > > > > > >> > original code path if reload not needed >>>>> > > > > > > > >> > >>>>> > > > > > > > >> > I have some concerns around the ReloadableSSLContext >>>>> > > > > implementing >>>>> > > > > > > all >>>>> > > > > > > > >> > SSLContext >>>>> > > > > > > > >> > Do you really mean SSLContext (java SSLContext one) >>>>> or do >>>>> > > you >>>>> > > > > > refer >>>>> > > > > > > to >>>>> > > > > > > > >> the >>>>> > > > > > > > >> > SslContext from netty? >>>>> > > > > > > > >> > >>>>> > > > > > > > >> > If java SSLContext >>>>> > > > > > > > >> > - I'm not sure how this will manage reload from the >>>>> > > BlobServer >>>>> > > > > > > > >> > BlobServer relies on creation of an >>>>> SSLServerSocketFactory >>>>> > > > from >>>>> > > > > > the >>>>> > > > > > > > >> > SSLContext. >>>>> > > > > > > > >> > But from my current understanding the >>>>> > SSLServerSocketFactory >>>>> > > > > does >>>>> > > > > > > not >>>>> > > > > > > > >> have >>>>> > > > > > > > >> > any connection with the SSLContext. >>>>> > > > > > > > >> > It only has some with an SSLContextImpl extends >>>>> > > SSLContextSpi. >>>>> > > > > > > > >> > I think we will need a callback here to enforce >>>>> recreation >>>>> > > of >>>>> > > > > the >>>>> > > > > > > > >> > BlobServer socket. >>>>> > > > > > > > >> > - I'm also don't see how to attach this to the >>>>> SslContext >>>>> > > from >>>>> > > > > > netty >>>>> > > > > > > > >> > >>>>> > > > > > > > >> > If netty SslContext >>>>> > > > > > > > >> > - we would still need to have the callback to >>>>> recreate the >>>>> > > > > > > BlobServer >>>>> > > > > > > > >> > socket >>>>> > > > > > > > >> > - for netty and pekko we should be able to rely on >>>>> it >>>>> > > > > > > > >> > >>>>> > > > > > > > >> > Nicolas >>>>> > > > > > > > >> > >>>>> > > > > > > > >> > On Wed, May 7, 2025 at 12:40 PM Gabor Somogyi < >>>>> > > > > > > > >> gabor.g.somo...@gmail.com> >>>>> > > > > > > > >> > wrote: >>>>> > > > > > > > >> > >>>>> > > > > > > > >> > > Hi All, >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > I've read through the concerns/proposals related >>>>> the >>>>> > watch >>>>> > > > > > service >>>>> > > > > > > > and >>>>> > > > > > > > >> > here >>>>> > > > > > > > >> > > are my conclusions: >>>>> > > > > > > > >> > > * Watch service can watch local file systems >>>>> only: It's >>>>> > > fair >>>>> > > > > to >>>>> > > > > > > say >>>>> > > > > > > > >> that >>>>> > > > > > > > >> > > certificates must be copied to local FS in order >>>>> to work >>>>> > > > (init >>>>> > > > > > > > >> container >>>>> > > > > > > > >> > cp >>>>> > > > > > > > >> > > command or something) >>>>> > > > > > > > >> > > * Watch service can send multiple events even for >>>>> a >>>>> > single >>>>> > > > > > > directory >>>>> > > > > > > > >> > > change: this can be mitigated with a single >>>>> atomic dirty >>>>> > > > flag >>>>> > > > > > (see >>>>> > > > > > > > my >>>>> > > > > > > > >> > > design suggestion) >>>>> > > > > > > > >> > > * Polling file modification time: When I hear any >>>>> kind >>>>> > of >>>>> > > > > > polling >>>>> > > > > > > > >> > > implemented by us is just something I'm mostly >>>>> opposing >>>>> > > > > > > > >> > > * security.ssl.internal.keystore.reload.duration: >>>>> > Security >>>>> > > > > > > features >>>>> > > > > > > > >> must >>>>> > > > > > > > >> > be >>>>> > > > > > > > >> > > executed nearly immediately no matter what >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > As a general remark, not sure how many users are >>>>> using >>>>> > the >>>>> > > > > watch >>>>> > > > > > > > >> service >>>>> > > > > > > > >> > > based approach in the operator but until now I've >>>>> not >>>>> > seen >>>>> > > > any >>>>> > > > > > > issue >>>>> > > > > > > > >> > > related to that. >>>>> > > > > > > > >> > > If somebody is having some specifics then please >>>>> share. >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > For now I would shoot for keystore not to have >>>>> feature >>>>> > > > creep. >>>>> > > > > > When >>>>> > > > > > > > >> there >>>>> > > > > > > > >> > is >>>>> > > > > > > > >> > > a valid use-case, community interest and the >>>>> keystore >>>>> > > story >>>>> > > > is >>>>> > > > > > > > already >>>>> > > > > > > > >> > rock >>>>> > > > > > > > >> > > stable >>>>> > > > > > > > >> > > then we can consider to involve truststore later. >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > Having the mentioned assumption that the operator >>>>> > approach >>>>> > > > > > works, >>>>> > > > > > > > >> here is >>>>> > > > > > > > >> > > my high level proposal: >>>>> > > > > > > > >> > > * Let's have enable flag for all such watch >>>>> > functionality. >>>>> > > > If >>>>> > > > > > it's >>>>> > > > > > > > >> false >>>>> > > > > > > > >> > > then the currently existing functionality must >>>>> remain >>>>> > > as-is >>>>> > > > > > > > >> > > * Let's have a LocalFSWatchService which is a >>>>> singleton >>>>> > > > which >>>>> > > > > > has >>>>> > > > > > > no >>>>> > > > > > > > >> path >>>>> > > > > > > > >> > > registrations by default >>>>> > > > > > > > >> > > * Add path registration functionality which is >>>>> > > synchronised >>>>> > > > > > where >>>>> > > > > > > a >>>>> > > > > > > > >> > > callback can be registered >>>>> > > > > > > > >> > > * Let's have a ReloadableSSLContext which >>>>> implements the >>>>> > > > > > mentioned >>>>> > > > > > > > >> > callback >>>>> > > > > > > > >> > > * Inside the callback set an atomic dirty flag >>>>> only >>>>> > (this >>>>> > > > can >>>>> > > > > > > handle >>>>> > > > > > > > >> > > multiple events for the same directory change + >>>>> event >>>>> > > > handling >>>>> > > > > > in >>>>> > > > > > > > the >>>>> > > > > > > > >> > watch >>>>> > > > > > > > >> > > service must be extreme fast) >>>>> > > > > > > > >> > > * Inside ReloadableSSLContext all SSLContext >>>>> actions >>>>> > must >>>>> > > be >>>>> > > > > > > > >> overridden. >>>>> > > > > > > > >> > At >>>>> > > > > > > > >> > > the beginning of each function dirty flag must be >>>>> > checked >>>>> > > > > > > > >> > > and if dirty then certificates must be reloaded, >>>>> flag >>>>> > can >>>>> > > be >>>>> > > > > set >>>>> > > > > > > > back >>>>> > > > > > > > >> to >>>>> > > > > > > > >> > > false (then original functionality call). It's >>>>> extremely >>>>> > > > > > important >>>>> > > > > > > > >> that >>>>> > > > > > > > >> > > context reload must be synchronised. >>>>> > > > > > > > >> > > A synchronised boolean check + possible context >>>>> reload >>>>> > can >>>>> > > > > > consume >>>>> > > > > > > > >> some >>>>> > > > > > > > >> > > time but I wouldn't expect any significant >>>>> performance >>>>> > > drop. >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > My proposal main drivers: >>>>> > > > > > > > >> > > * Original code path must run when no watch >>>>> service >>>>> > asked >>>>> > > > > > > > >> > > * Super fast event handling because million >>>>> events may >>>>> > > come >>>>> > > > in >>>>> > > > > > > (not >>>>> > > > > > > > >> > > expecting but we should be prepared) >>>>> > > > > > > > >> > > * Clean separation between dir/file watch and >>>>> file/dir >>>>> > > usage >>>>> > > > > > > > >> > > * Well considered synchronisation model >>>>> > > > > > > > >> > > * Extensive unit testing because we're intended >>>>> to touch >>>>> > > the >>>>> > > > > > heart >>>>> > > > > > > > of >>>>> > > > > > > > >> > Flink >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > Happy to hear other opinions. >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > BR, >>>>> > > > > > > > >> > > G >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > On Mon, Apr 28, 2025 at 1:54 PM Nicolas Fraison >>>>> > > > > > > > >> > > <nicolas.frai...@datadoghq.com.invalid> wrote: >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > > > Thanks all for your feedback and sorry for the >>>>> late >>>>> > > answer >>>>> > > > > (I >>>>> > > > > > > was >>>>> > > > > > > > on >>>>> > > > > > > > >> > > > holiday). >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > 1. Indeed it would add 4 threads. >>>>> > > > > > > > >> > > > For the non pekko component we can indeed have >>>>> one >>>>> > > watcher >>>>> > > > > > > service >>>>> > > > > > > > >> used >>>>> > > > > > > > >> > > to >>>>> > > > > > > > >> > > > reload SSLContext for those components >>>>> > > > > > > > >> > > > For pekko this is a little more challenging as >>>>> the >>>>> > > > creation >>>>> > > > > of >>>>> > > > > > > the >>>>> > > > > > > > >> > pekko >>>>> > > > > > > > >> > > > ssl engine is managed by pekko himself. >>>>> > > > > > > > >> > > > Flink only generates appropriate config with >>>>> class to >>>>> > > > > execute >>>>> > > > > > to >>>>> > > > > > > > >> > initiate >>>>> > > > > > > > >> > > > the pekko ssl engine [1]. This means that I >>>>> will not >>>>> > be >>>>> > > > able >>>>> > > > > > to >>>>> > > > > > > > >> provide >>>>> > > > > > > > >> > > the >>>>> > > > > > > > >> > > > watcher service to this ssl engine. >>>>> > > > > > > > >> > > > One solution would be to rely on a singleton >>>>> instead >>>>> > of >>>>> > > a >>>>> > > > > > > service >>>>> > > > > > > > >> > > injected >>>>> > > > > > > > >> > > > in each component but I'm not sure this is fine >>>>> to use >>>>> > > > such >>>>> > > > > in >>>>> > > > > > > > >> flink. >>>>> > > > > > > > >> > > > WDYT? >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > We can also add a specific flink configuration >>>>> > > > > > > > >> > > > (security.ssl.internal.keystore.reload.enable) >>>>> to only >>>>> > > add >>>>> > > > > > this >>>>> > > > > > > > >> watcher >>>>> > > > > > > > >> > > > mechanism if the config is enabled to avoid >>>>> adding >>>>> > those >>>>> > > > > > threads >>>>> > > > > > > > if >>>>> > > > > > > > >> > this >>>>> > > > > > > > >> > > is >>>>> > > > > > > > >> > > > not needed. >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > 2. I'm fine with the LocalFSWatchService naming. >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > 3. Also agree that some e2e tests must be added. >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > @doguscan namal, Thanks for challenging the >>>>> proposal. >>>>> > > > > > > > >> > > > FYI, we are planning to rely on certificates >>>>> with >>>>> > really >>>>> > > > > short >>>>> > > > > > > > >> > validity. >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > D1. It seems that the proposal to rely on a >>>>> reload >>>>> > > period >>>>> > > > > will >>>>> > > > > > > > still >>>>> > > > > > > > >> > face >>>>> > > > > > > > >> > > > potential issues: >>>>> > > > > > > > >> > > > - receiving events while file content updates >>>>> are >>>>> > still >>>>> > > in >>>>> > > > > > > > progress: >>>>> > > > > > > > >> > > > There is no guarantee that we will not load the >>>>> > > > certificate >>>>> > > > > > > while >>>>> > > > > > > > >> file >>>>> > > > > > > > >> > > > content updates are still in progress >>>>> > > > > > > > >> > > > - while it will not be affected by multiple >>>>> > > notifications, >>>>> > > > > we >>>>> > > > > > > can >>>>> > > > > > > > >> > reach a >>>>> > > > > > > > >> > > > point where only the truststore is updated when >>>>> the >>>>> > > reload >>>>> > > > > > > > happens. >>>>> > > > > > > > >> > > > Which means that if the keystore is updated just >>>>> > after, >>>>> > > it >>>>> > > > > > will >>>>> > > > > > > > not >>>>> > > > > > > > >> be >>>>> > > > > > > > >> > > > taken in account before next run of the reload >>>>> > mechanism >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > I think that with WatchService and appropriate >>>>> reload >>>>> > > > grace >>>>> > > > > > > period >>>>> > > > > > > > >> > > > mechanism we should be able to mitigate those 2 >>>>> issues >>>>> > > > > > (ensuring >>>>> > > > > > > > >> > minimum >>>>> > > > > > > > >> > > > reload even with multiple notify) >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > From KIP-1119 [2] it looks like the same kind of >>>>> > > > > requirements >>>>> > > > > > is >>>>> > > > > > > > >> under >>>>> > > > > > > > >> > > > discussion for Kafka to also rely on the >>>>> WatchService >>>>> > > Java >>>>> > > > > API >>>>> > > > > > > > >> > > (SpringBoot >>>>> > > > > > > > >> > > > seems to also rely on this API to manage ssl >>>>> reload >>>>> > > [3]). >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > D3. Do we have a real use case for this? >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > [1] >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://github.com/apache/flink/blob/b523264ab45d37cd9584a0e8c06f1ef6bd1aaed7/flink-rpc/flink-rpc-akka/src/main/java/org/apache/flink/runtime/rpc/pekko/PekkoUtils.java#L372 >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > [2] >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1119%3A+Add+support+for+SSL+auto+reload >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > [3] >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://docs.spring.io/spring-boot/reference/features/ssl.html#features.ssl.reloading >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > Regards, >>>>> > > > > > > > >> > > > Nicolas >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > On Wed, Apr 23, 2025 at 12:14 PM Doğuşcan Namal >>>>> < >>>>> > > > > > > > >> > > namal.dogus...@gmail.com> >>>>> > > > > > > > >> > > > wrote: >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > > Hi Nicolas, thanks for the FLIP. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > I am fully supportive of the motivation and we >>>>> > should >>>>> > > be >>>>> > > > > > > > >> supporting >>>>> > > > > > > > >> > > this >>>>> > > > > > > > >> > > > > feature. Here are couple of comments from my >>>>> side: >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > D1) >>>>> > > > > > > > >> > > > > Since you shared the implementation details >>>>> on the >>>>> > > FLIP >>>>> > > > as >>>>> > > > > > > > well, I >>>>> > > > > > > > >> > > would >>>>> > > > > > > > >> > > > > like to discuss whether using Java's >>>>> WatchService is >>>>> > > the >>>>> > > > > > best >>>>> > > > > > > > >> choice >>>>> > > > > > > > >> > > > here. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > I believe that the certificate renewal is not >>>>> a >>>>> > > frequent >>>>> > > > > > > > >> operation. >>>>> > > > > > > > >> > > Even >>>>> > > > > > > > >> > > > > once a day certificate renewals are not >>>>> > > realistic(except >>>>> > > > > for >>>>> > > > > > > > test >>>>> > > > > > > > >> > cases >>>>> > > > > > > > >> > > > > maybe) but let's assume that this covers up >>>>> the >>>>> > p99.99 >>>>> > > > of >>>>> > > > > > the >>>>> > > > > > > > use >>>>> > > > > > > > >> > > cases. >>>>> > > > > > > > >> > > > I >>>>> > > > > > > > >> > > > > am confident on this estimation since there >>>>> hasn't >>>>> > > been >>>>> > > > a >>>>> > > > > > > > request >>>>> > > > > > > > >> > from >>>>> > > > > > > > >> > > > the >>>>> > > > > > > > >> > > > > community for this feature so far, which >>>>> confirms >>>>> > that >>>>> > > > > > people >>>>> > > > > > > > were >>>>> > > > > > > > >> > okay >>>>> > > > > > > > >> > > > > with infrequent cluster restarts. Following >>>>> that it >>>>> > is >>>>> > > > > > > > >> infrequent, I >>>>> > > > > > > > >> > > > > believe that spawning up a thread that >>>>> watches the >>>>> > > file >>>>> > > > > > > > >> modification >>>>> > > > > > > > >> > > > > operations is not the best use of the limited >>>>> > > resources >>>>> > > > > on a >>>>> > > > > > > > >> cluster. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > There are some known limitations of the >>>>> WatchService >>>>> > > as >>>>> > > > > well >>>>> > > > > > > > such >>>>> > > > > > > > >> as >>>>> > > > > > > > >> > > > > receiving multiple modification events for >>>>> the same >>>>> > > > > > occurence >>>>> > > > > > > > [1], >>>>> > > > > > > > >> > > > > inotify's (WatchService's underlying >>>>> mechanism in >>>>> > > Linux >>>>> > > > > > > > >> environments) >>>>> > > > > > > > >> > > > > problems on containerized environments due to >>>>> remote >>>>> > > > file >>>>> > > > > > > > systems >>>>> > > > > > > > >> [2] >>>>> > > > > > > > >> > > or >>>>> > > > > > > > >> > > > > receiving events while file content updates >>>>> are >>>>> > still >>>>> > > in >>>>> > > > > > > > progress. >>>>> > > > > > > > >> > > [3]. I >>>>> > > > > > > > >> > > > > do not know if these limitations are >>>>> addressed in >>>>> > the >>>>> > > > > newer >>>>> > > > > > > > >> versions >>>>> > > > > > > > >> > > but >>>>> > > > > > > > >> > > > > regardless of that it is clear that we may >>>>> face with >>>>> > > > some >>>>> > > > > > ugly >>>>> > > > > > > > >> edge >>>>> > > > > > > > >> > > cases >>>>> > > > > > > > >> > > > > due to that. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > Given these complications, I would recommend >>>>> just >>>>> > > > > creating a >>>>> > > > > > > new >>>>> > > > > > > > >> > > > SSLContext >>>>> > > > > > > > >> > > > > after a configured duration is expired. We >>>>> could >>>>> > > record >>>>> > > > > the >>>>> > > > > > > > >> timestamp >>>>> > > > > > > > >> > > > when >>>>> > > > > > > > >> > > > > the previous SSLContext is created and update >>>>> it >>>>> > > after a >>>>> > > > > > > > >> configured >>>>> > > > > > > > >> > > > > duration is passed. This will be much easier >>>>> to test >>>>> > > and >>>>> > > > > > > reason >>>>> > > > > > > > >> about >>>>> > > > > > > > >> > > > when >>>>> > > > > > > > >> > > > > it is running on production. This will >>>>> eliminate the >>>>> > > > > > necessity >>>>> > > > > > > > to >>>>> > > > > > > > >> > > reason >>>>> > > > > > > > >> > > > > about the file modification operations as >>>>> well. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > I briefly skimmed through the classes that >>>>> need to >>>>> > be >>>>> > > > > > modified >>>>> > > > > > > > >> and it >>>>> > > > > > > > >> > > > > looked feasible for me. Let me know what are >>>>> your >>>>> > > > comments >>>>> > > > > > on >>>>> > > > > > > > >> these. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > Note that this is already used in the Kafka >>>>> world >>>>> > > where >>>>> > > > a >>>>> > > > > > new >>>>> > > > > > > > >> > > SSLContext >>>>> > > > > > > > >> > > > is >>>>> > > > > > > > >> > > > > created after 12 hours. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > D2) We could provide a configuration to the >>>>> user, >>>>> > such >>>>> > > > as >>>>> > > > > > > > >> > > > > >>>>> "security.ssl.internal.keystore.reload.duration" so >>>>> > > they >>>>> > > > > > could >>>>> > > > > > > > >> decide >>>>> > > > > > > > >> > > how >>>>> > > > > > > > >> > > > > often the new certificates should be loaded. >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > D3) >>>>> > > > > > > > >> > > > > On the other hand, I wonder whether we should >>>>> also >>>>> > > > handle >>>>> > > > > > > > >> supporting >>>>> > > > > > > > >> > > > > updating the file paths of the truststores and >>>>> > > keystores >>>>> > > > > > under >>>>> > > > > > > > >> this >>>>> > > > > > > > >> > > FLIP >>>>> > > > > > > > >> > > > as >>>>> > > > > > > > >> > > > > well. Since the name of the FLIP is "Handle >>>>> TLS >>>>> > > > > Certificate >>>>> > > > > > > > >> Renewal" >>>>> > > > > > > > >> > I >>>>> > > > > > > > >> > > > > think we could bring that into scope too :) >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > Thanks >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > [1] >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://stackoverflow.com/questions/16777869/java-7-watchservice-ignoring-multiple-occurrences-of-the-same-event >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > [2] >>>>> > > > > > > > >>>>> https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/ >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > [3] >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> >>>>> > > > > > >>>>> > > >>>>> https://surajatreyac.github.io/2014-07-29/reactive_file_handling.html >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > [4] See also Platform Dependencies - >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://docs.oracle.com/javase/8/docs/api/index.html?java/nio/file/WatchService.html >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > On Fri, 18 Apr 2025 at 18:25, Gabor Somogyi < >>>>> > > > > > > > >> > gabor.g.somo...@gmail.com >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > > > wrote: >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > > > Hi Robert, >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > Since I've added the same feature to the >>>>> operator >>>>> > > I'll >>>>> > > > > > take >>>>> > > > > > > a >>>>> > > > > > > > >> look >>>>> > > > > > > > >> > at >>>>> > > > > > > > >> > > > it. >>>>> > > > > > > > >> > > > > > Though it won't be lightning fast since I'm >>>>> having >>>>> > > > > several >>>>> > > > > > > > weeks >>>>> > > > > > > > >> > off. >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > Your questions are valid especially >>>>> considering >>>>> > the >>>>> > > > fact >>>>> > > > > > > that >>>>> > > > > > > > >> this >>>>> > > > > > > > >> > > > > feature >>>>> > > > > > > > >> > > > > > touches the hearth of the authentication so >>>>> this >>>>> > > must >>>>> > > > be >>>>> > > > > > > rock >>>>> > > > > > > > >> solid >>>>> > > > > > > > >> > > > > > in order to avoid grey hair :) >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > (1) I would vote on a single service which >>>>> is >>>>> > > heavily >>>>> > > > > unit >>>>> > > > > > > > >> tested >>>>> > > > > > > > >> > > with >>>>> > > > > > > > >> > > > > > all the possible combinations including >>>>> threading. >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > Some standalone app could be added to >>>>> really play >>>>> > > with >>>>> > > > > it >>>>> > > > > > > > (that >>>>> > > > > > > > >> > would >>>>> > > > > > > > >> > > > > help >>>>> > > > > > > > >> > > > > > review). >>>>> > > > > > > > >> > > > > > I mean, create X files, start Y threads, >>>>> and make >>>>> > > > > > > assertions. >>>>> > > > > > > > >> > > > > > The reason why I'm suggesting it is the >>>>> fact that >>>>> > > > AFAIR >>>>> > > > > > the >>>>> > > > > > > > >> watch >>>>> > > > > > > > >> > > > service >>>>> > > > > > > > >> > > > > > is quite sensitive even in single thread. >>>>> If we >>>>> > > could >>>>> > > > do >>>>> > > > > > > this >>>>> > > > > > > > >> in a >>>>> > > > > > > > >> > > > finite >>>>> > > > > > > > >> > > > > > time >>>>> > > > > > > > >> > > > > > consuming unit test then it's even better. >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > (2) +1 on that name to avoid confusion >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > (3) I agree that some e2e is must, however >>>>> this >>>>> > can >>>>> > > be >>>>> > > > > > > easily >>>>> > > > > > > > >> and >>>>> > > > > > > > >> > > > deeply >>>>> > > > > > > > >> > > > > > unit >>>>> > > > > > > > >> > > > > > tested so that part is also essential. One >>>>> key >>>>> > test >>>>> > > > here >>>>> > > > > > is >>>>> > > > > > > > when >>>>> > > > > > > > >> > > > > > certificates >>>>> > > > > > > > >> > > > > > are not changing then no action must be >>>>> performed >>>>> > > (not >>>>> > > > > to >>>>> > > > > > > > break >>>>> > > > > > > > >> the >>>>> > > > > > > > >> > > > whole >>>>> > > > > > > > >> > > > > > system apart). >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > Purely personal opinion but such feature >>>>> > > developments >>>>> > > > > are >>>>> > > > > > > slow >>>>> > > > > > > > >> by >>>>> > > > > > > > >> > > > nature >>>>> > > > > > > > >> > > > > > because >>>>> > > > > > > > >> > > > > > of edge case / stress testing. >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > BR, >>>>> > > > > > > > >> > > > > > G >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > On Fri, Apr 18, 2025 at 4:53 PM Robert >>>>> Metzger < >>>>> > > > > > > > >> > rmetz...@apache.org> >>>>> > > > > > > > >> > > > > > wrote: >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > > > Hi Nicolas, >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > This looks like a nice improvement, >>>>> thanks for >>>>> > the >>>>> > > > > write >>>>> > > > > > > up. >>>>> > > > > > > > >> > > > > > > Are you in touch with any committer who's >>>>> > willing >>>>> > > to >>>>> > > > > > > review >>>>> > > > > > > > / >>>>> > > > > > > > >> > merge >>>>> > > > > > > > >> > > > > this? >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > Some random questions on the FLIP: >>>>> > > > > > > > >> > > > > > > (1) "Each service that depends on TLS >>>>> > > certificates >>>>> > > > > will >>>>> > > > > > > > >> > > initialize a >>>>> > > > > > > > >> > > > > > > FileSytemWatchService" >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > It seems that there are 4 components >>>>> using SSL, >>>>> > > does >>>>> > > > > > this >>>>> > > > > > > > mean >>>>> > > > > > > > >> > > there >>>>> > > > > > > > >> > > > > will >>>>> > > > > > > > >> > > > > > > be 4 additional threads running, watching >>>>> the >>>>> > same >>>>> > > > set >>>>> > > > > > of >>>>> > > > > > > > >> files? >>>>> > > > > > > > >> > > > > > > Wouldn't it be better to introduce a >>>>> central >>>>> > file >>>>> > > > > > watching >>>>> > > > > > > > >> > service, >>>>> > > > > > > > >> > > > and >>>>> > > > > > > > >> > > > > > SSL >>>>> > > > > > > > >> > > > > > > users can subscribe to updates, to reduce >>>>> the >>>>> > > number >>>>> > > > > of >>>>> > > > > > > > >> threads? >>>>> > > > > > > > >> > > > > > > If this makes the whole effort 4x more >>>>> > > complicated, >>>>> > > > I >>>>> > > > > > > > wouldn't >>>>> > > > > > > > >> > > > consider >>>>> > > > > > > > >> > > > > > it, >>>>> > > > > > > > >> > > > > > > but if its roughly the same effort, we >>>>> should :) >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > (2) "FileSytemWatchService" >>>>> > > > > > > > >> > > > > > > When I read this name, I was wondering, >>>>> whether >>>>> > > this >>>>> > > > > is >>>>> > > > > > > > >> somehow >>>>> > > > > > > > >> > > > related >>>>> > > > > > > > >> > > > > > to >>>>> > > > > > > > >> > > > > > > the Flink "FileSystem" classes. Which I >>>>> think >>>>> > its' >>>>> > > > > not. >>>>> > > > > > > > >> > > > > > > Maybe a different name, that makes this >>>>> > separation >>>>> > > > > more >>>>> > > > > > > > >> explicit, >>>>> > > > > > > > >> > > > would >>>>> > > > > > > > >> > > > > > > make sense. Maybe "LocalFSWatchService"? >>>>> > > > > > > > >> > > > > > > (I'm sorry to bring up naming stuff -- >>>>> its very >>>>> > > > > > > subjective, >>>>> > > > > > > > >> and >>>>> > > > > > > > >> > > > > > difficult) >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > (3) For the test plan: There seem to be >>>>> some SSL >>>>> > > > > related >>>>> > > > > > > e2e >>>>> > > > > > > > >> > tests: >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_ssl.sh >>>>> > > > > > > > >> > > > > > > It would be nice to extend them to cover >>>>> this >>>>> > > > feature >>>>> > > > > as >>>>> > > > > > > > >> well. I >>>>> > > > > > > > >> > > > would >>>>> > > > > > > > >> > > > > > hate >>>>> > > > > > > > >> > > > > > > for this feature to slowly break by future >>>>> > > changes, >>>>> > > > so >>>>> > > > > > > good >>>>> > > > > > > > >> e2e >>>>> > > > > > > > >> > > test >>>>> > > > > > > > >> > > > > > > coverage is key, in particular bc so many >>>>> > > components >>>>> > > > > are >>>>> > > > > > > > >> > involved. >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > Best, >>>>> > > > > > > > >> > > > > > > Robert >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > On Wed, Apr 16, 2025 at 11:55 AM Nicolas >>>>> Fraison >>>>> > > > > > > > >> > > > > > > <nicolas.frai...@datadoghq.com.invalid> >>>>> wrote: >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > > > Hi All, >>>>> > > > > > > > >> > > > > > > > >>>>> > > > > > > > >> > > > > > > > I'd like to start a discussion to >>>>> Handle TLS >>>>> > > > > > Certificate >>>>> > > > > > > > >> > Renewal >>>>> > > > > > > > >> > > > > > > > Please provide some feedback on this >>>>> proposal: >>>>> > > > > > > > >> > > > > > > > >>>>> > > > > > > > >> > > > > > > > >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-523%3A+Handle+TLS+Certificate+Renewal >>>>> > > > > > > > >> > > > > > > > >>>>> > > > > > > > >> > > > > > > > Regards, >>>>> > > > > > > > >> > > > > > > > >>>>> > > > > > > > >> > > > > > > > Nicolas Fraison >>>>> > > > > > > > >> > > > > > > > >>>>> > > > > > > > >> > > > > > > >>>>> > > > > > > > >> > > > > > >>>>> > > > > > > > >> > > > > >>>>> > > > > > > > >> > > > >>>>> > > > > > > > >> > > >>>>> > > > > > > > >> > >>>>> > > > > > > > >> >>>>> > > > > > > > > >>>>> > > > > > > > >>>>> > > > > > > >>>>> > > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > >>>>> >>>>