I think in this SSLContextLoader class call() function is not correct and must be onFileOrDirectoryModified(...), right?
[image: Screenshot 2025-06-19 at 9.35.14.png] On Thu, Jun 19, 2025 at 8:14 AM Nicolas Fraison < nicolas.frai...@datadoghq.com> wrote: > Hi, > > Which part of the doc are you referring to? > > On Wed, Jun 18, 2025 at 7:08 PM Gabor Somogyi <gabor.g.somo...@gmail.com> > wrote: > >> There is still a call function there which is left there from the previous >> design... >> >> On Tue, Jun 17, 2025 at 9:48 AM Nicolas Fraison >> <nicolas.frai...@datadoghq.com.invalid> wrote: >> >> > Done >> > >> > On Mon, Jun 16, 2025 at 2:46 PM Gabor Somogyi < >> gabor.g.somo...@gmail.com> >> > wrote: >> > >> > > I still see 3 Callable occurrence in the FLIP... >> > > >> > > On Thu, Jun 12, 2025 at 2:16 PM Nicolas Fraison >> > > <nicolas.frai...@datadoghq.com.invalid> wrote: >> > > >> > > > Hi, FLIP has been updated with last comment >> > > > >> > > > On Thu, Jun 5, 2025 at 6:35 PM Gabor Somogyi < >> > gabor.g.somo...@gmail.com> >> > > > wrote: >> > > > >> > > > > The interface has been added. Do we intend to pass an >> > > > > instance registerWatchedPath function as we discussed? >> > > > > If yes then the FLIP needs further adjustments all places where >> now >> > > > > Callable provided. >> > > > > >> > > > > G >> > > > > >> > > > > >> > > > > On Thu, Jun 5, 2025 at 2:27 PM Nicolas Fraison >> > > > > <nicolas.frai...@datadoghq.com.invalid> wrote: >> > > > > >> > > > > > Hi, >> > > > > > >> > > > > > It indeed make sense, FLIP has been updated >> > > > > > >> > > > > > Nicolas >> > > > > > >> > > > > > On Wed, Jun 4, 2025 at 12:10 PM Gabor Somogyi < >> > > > gabor.g.somo...@gmail.com >> > > > > > >> > > > > > wrote: >> > > > > > >> > > > > > > Hi, >> > > > > > > >> > > > > > > I've read it through. Basically looks good with one comment. >> > > > > > > *registerWatchedPath function* has a *Callable* as callback. >> For >> > > the >> > > > > > > current implementation that would be enough, >> > > > > > > but when somebody would like to use that for any other >> use-case >> > > then >> > > > it >> > > > > > > would be hard. Examples: >> > > > > > > * A single *call* function tells the user nothing what kind of >> > > event >> > > > is >> > > > > > > this >> > > > > > > * the watch service supports 3 events (create, modify, >> delete), >> > now >> > > > > when >> > > > > > I >> > > > > > > register I can get only updates (I presume) >> > > > > > > * 1+ dir watch with the same callback is not possible >> > > > > > > >> > > > > > > My suggestion would be to create a proper callback with all >> the >> > > event >> > > > > > type >> > > > > > > functions and no-op default behavior with the following >> names[1]. >> > > > > > > This is ~30 lines addition but will increase the readability >> > > heavily >> > > > + >> > > > > no >> > > > > > > need to touch this code later. >> > > > > > > >> > > > > > > BR, >> > > > > > > G >> > > > > > > >> > > > > > > [1] >> > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://github.com/apache/flink-kubernetes-operator/commit/e5a325c48965a50d61d0aa29e61ba79e97f27082#diff-a30b3ed9b8c53e998b15d7da7ad2e54374c98ffc3c920f76a70bce3fb37a9b2eR87-R93 >> > > > > > > >> > > > > > > >> > > > > > > On Tue, Jun 3, 2025 at 8:53 AM Nicolas Fraison >> > > > > > > <nicolas.frai...@datadoghq.com.invalid> wrote: >> > > > > > > >> > > > > > > > Hi, >> > > > > > > > >> > > > > > > > The FLIP has been updated. >> > > > > > > > Let me know if you have some other comments. >> > > > > > > > >> > > > > > > > Nicolas >> > > > > > > > >> > > > > > > > On Tue, May 20, 2025 at 12:01 PM Nicolas Fraison < >> > > > > > > > nicolas.frai...@datadoghq.com> wrote: >> > > > > > > > >> > > > > > > > > Hi Gabor, >> > > > > > > > > >> > > > > > > > > Thanks for the feedback. >> > > > > > > > > The overall proposal with atomic dirty flag being set by >> the >> > > > > callback >> > > > > > > > will >> > > > > > > > > indeed work with any kind of implementation. >> > > > > > > > > >> > > > > > > > > Will see to update the FLIP in a week or 2 if there are >> no >> > > other >> > > > > > > > comments >> > > > > > > > > on it >> > > > > > > > > >> > > > > > > > > Nicolas >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > On Tue, May 20, 2025 at 11:08 AM Gabor Somogyi < >> > > > > > > > gabor.g.somo...@gmail.com> >> > > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > >> Hi Nicolas, >> > > > > > > > >> >> > > > > > > > >> Related SSLContext I've not gone through all the cases >> where >> > > we >> > > > > need >> > > > > > > to >> > > > > > > > >> reload so instead I'm sharing the concept. >> > > > > > > > >> The main intention is that we must control all execution >> > paths >> > > > > which >> > > > > > > > >> decide >> > > > > > > > >> which certificate used for authentication. >> > > > > > > > >> Creating an SSLContext decorator which checks reload >> first >> > and >> > > > > then >> > > > > > > > >> forwards all calls to the original (wrapped) context >> > > > > > > > >> is one way to achieve that. If there are different >> > > > implementations >> > > > > > > which >> > > > > > > > >> end up in similar behavior then it's fine. >> > > > > > > > >> >> > > > > > > > >> BR, >> > > > > > > > >> G >> > > > > > > > >> >> > > > > > > > >> >> > > > > > > > >> On Tue, May 20, 2025 at 10:15 AM Nicolas Fraison >> > > > > > > > >> <nicolas.frai...@datadoghq.com.invalid> wrote: >> > > > > > > > >> >> > > > > > > > >> > Hi, >> > > > > > > > >> > >> > > > > > > > >> > Overall your proposal looks great. >> > > > > > > > >> > The event handling must indeed be super fast and we >> must >> > > also >> > > > > not >> > > > > > > > change >> > > > > > > > >> > original code path if reload not needed >> > > > > > > > >> > >> > > > > > > > >> > I have some concerns around the ReloadableSSLContext >> > > > > implementing >> > > > > > > all >> > > > > > > > >> > SSLContext >> > > > > > > > >> > Do you really mean SSLContext (java SSLContext one) or >> do >> > > you >> > > > > > refer >> > > > > > > to >> > > > > > > > >> the >> > > > > > > > >> > SslContext from netty? >> > > > > > > > >> > >> > > > > > > > >> > If java SSLContext >> > > > > > > > >> > - I'm not sure how this will manage reload from the >> > > BlobServer >> > > > > > > > >> > BlobServer relies on creation of an >> SSLServerSocketFactory >> > > > from >> > > > > > the >> > > > > > > > >> > SSLContext. >> > > > > > > > >> > But from my current understanding the >> > SSLServerSocketFactory >> > > > > does >> > > > > > > not >> > > > > > > > >> have >> > > > > > > > >> > any connection with the SSLContext. >> > > > > > > > >> > It only has some with an SSLContextImpl extends >> > > SSLContextSpi. >> > > > > > > > >> > I think we will need a callback here to enforce >> recreation >> > > of >> > > > > the >> > > > > > > > >> > BlobServer socket. >> > > > > > > > >> > - I'm also don't see how to attach this to the >> SslContext >> > > from >> > > > > > netty >> > > > > > > > >> > >> > > > > > > > >> > If netty SslContext >> > > > > > > > >> > - we would still need to have the callback to recreate >> the >> > > > > > > BlobServer >> > > > > > > > >> > socket >> > > > > > > > >> > - for netty and pekko we should be able to rely on it >> > > > > > > > >> > >> > > > > > > > >> > Nicolas >> > > > > > > > >> > >> > > > > > > > >> > On Wed, May 7, 2025 at 12:40 PM Gabor Somogyi < >> > > > > > > > >> gabor.g.somo...@gmail.com> >> > > > > > > > >> > wrote: >> > > > > > > > >> > >> > > > > > > > >> > > Hi All, >> > > > > > > > >> > > >> > > > > > > > >> > > I've read through the concerns/proposals related the >> > watch >> > > > > > service >> > > > > > > > and >> > > > > > > > >> > here >> > > > > > > > >> > > are my conclusions: >> > > > > > > > >> > > * Watch service can watch local file systems only: >> It's >> > > fair >> > > > > to >> > > > > > > say >> > > > > > > > >> that >> > > > > > > > >> > > certificates must be copied to local FS in order to >> work >> > > > (init >> > > > > > > > >> container >> > > > > > > > >> > cp >> > > > > > > > >> > > command or something) >> > > > > > > > >> > > * Watch service can send multiple events even for a >> > single >> > > > > > > directory >> > > > > > > > >> > > change: this can be mitigated with a single atomic >> dirty >> > > > flag >> > > > > > (see >> > > > > > > > my >> > > > > > > > >> > > design suggestion) >> > > > > > > > >> > > * Polling file modification time: When I hear any >> kind >> > of >> > > > > > polling >> > > > > > > > >> > > implemented by us is just something I'm mostly >> opposing >> > > > > > > > >> > > * security.ssl.internal.keystore.reload.duration: >> > Security >> > > > > > > features >> > > > > > > > >> must >> > > > > > > > >> > be >> > > > > > > > >> > > executed nearly immediately no matter what >> > > > > > > > >> > > >> > > > > > > > >> > > As a general remark, not sure how many users are >> using >> > the >> > > > > watch >> > > > > > > > >> service >> > > > > > > > >> > > based approach in the operator but until now I've not >> > seen >> > > > any >> > > > > > > issue >> > > > > > > > >> > > related to that. >> > > > > > > > >> > > If somebody is having some specifics then please >> share. >> > > > > > > > >> > > >> > > > > > > > >> > > For now I would shoot for keystore not to have >> feature >> > > > creep. >> > > > > > When >> > > > > > > > >> there >> > > > > > > > >> > is >> > > > > > > > >> > > a valid use-case, community interest and the keystore >> > > story >> > > > is >> > > > > > > > already >> > > > > > > > >> > rock >> > > > > > > > >> > > stable >> > > > > > > > >> > > then we can consider to involve truststore later. >> > > > > > > > >> > > >> > > > > > > > >> > > Having the mentioned assumption that the operator >> > approach >> > > > > > works, >> > > > > > > > >> here is >> > > > > > > > >> > > my high level proposal: >> > > > > > > > >> > > * Let's have enable flag for all such watch >> > functionality. >> > > > If >> > > > > > it's >> > > > > > > > >> false >> > > > > > > > >> > > then the currently existing functionality must remain >> > > as-is >> > > > > > > > >> > > * Let's have a LocalFSWatchService which is a >> singleton >> > > > which >> > > > > > has >> > > > > > > no >> > > > > > > > >> path >> > > > > > > > >> > > registrations by default >> > > > > > > > >> > > * Add path registration functionality which is >> > > synchronised >> > > > > > where >> > > > > > > a >> > > > > > > > >> > > callback can be registered >> > > > > > > > >> > > * Let's have a ReloadableSSLContext which implements >> the >> > > > > > mentioned >> > > > > > > > >> > callback >> > > > > > > > >> > > * Inside the callback set an atomic dirty flag only >> > (this >> > > > can >> > > > > > > handle >> > > > > > > > >> > > multiple events for the same directory change + event >> > > > handling >> > > > > > in >> > > > > > > > the >> > > > > > > > >> > watch >> > > > > > > > >> > > service must be extreme fast) >> > > > > > > > >> > > * Inside ReloadableSSLContext all SSLContext actions >> > must >> > > be >> > > > > > > > >> overridden. >> > > > > > > > >> > At >> > > > > > > > >> > > the beginning of each function dirty flag must be >> > checked >> > > > > > > > >> > > and if dirty then certificates must be reloaded, flag >> > can >> > > be >> > > > > set >> > > > > > > > back >> > > > > > > > >> to >> > > > > > > > >> > > false (then original functionality call). It's >> extremely >> > > > > > important >> > > > > > > > >> that >> > > > > > > > >> > > context reload must be synchronised. >> > > > > > > > >> > > A synchronised boolean check + possible context >> reload >> > can >> > > > > > consume >> > > > > > > > >> some >> > > > > > > > >> > > time but I wouldn't expect any significant >> performance >> > > drop. >> > > > > > > > >> > > >> > > > > > > > >> > > My proposal main drivers: >> > > > > > > > >> > > * Original code path must run when no watch service >> > asked >> > > > > > > > >> > > * Super fast event handling because million events >> may >> > > come >> > > > in >> > > > > > > (not >> > > > > > > > >> > > expecting but we should be prepared) >> > > > > > > > >> > > * Clean separation between dir/file watch and >> file/dir >> > > usage >> > > > > > > > >> > > * Well considered synchronisation model >> > > > > > > > >> > > * Extensive unit testing because we're intended to >> touch >> > > the >> > > > > > heart >> > > > > > > > of >> > > > > > > > >> > Flink >> > > > > > > > >> > > >> > > > > > > > >> > > Happy to hear other opinions. >> > > > > > > > >> > > >> > > > > > > > >> > > BR, >> > > > > > > > >> > > G >> > > > > > > > >> > > >> > > > > > > > >> > > >> > > > > > > > >> > > On Mon, Apr 28, 2025 at 1:54 PM Nicolas Fraison >> > > > > > > > >> > > <nicolas.frai...@datadoghq.com.invalid> wrote: >> > > > > > > > >> > > >> > > > > > > > >> > > > Thanks all for your feedback and sorry for the late >> > > answer >> > > > > (I >> > > > > > > was >> > > > > > > > on >> > > > > > > > >> > > > holiday). >> > > > > > > > >> > > > >> > > > > > > > >> > > > 1. Indeed it would add 4 threads. >> > > > > > > > >> > > > For the non pekko component we can indeed have one >> > > watcher >> > > > > > > service >> > > > > > > > >> used >> > > > > > > > >> > > to >> > > > > > > > >> > > > reload SSLContext for those components >> > > > > > > > >> > > > For pekko this is a little more challenging as the >> > > > creation >> > > > > of >> > > > > > > the >> > > > > > > > >> > pekko >> > > > > > > > >> > > > ssl engine is managed by pekko himself. >> > > > > > > > >> > > > Flink only generates appropriate config with class >> to >> > > > > execute >> > > > > > to >> > > > > > > > >> > initiate >> > > > > > > > >> > > > the pekko ssl engine [1]. This means that I will >> not >> > be >> > > > able >> > > > > > to >> > > > > > > > >> provide >> > > > > > > > >> > > the >> > > > > > > > >> > > > watcher service to this ssl engine. >> > > > > > > > >> > > > One solution would be to rely on a singleton >> instead >> > of >> > > a >> > > > > > > service >> > > > > > > > >> > > injected >> > > > > > > > >> > > > in each component but I'm not sure this is fine to >> use >> > > > such >> > > > > in >> > > > > > > > >> flink. >> > > > > > > > >> > > > WDYT? >> > > > > > > > >> > > > >> > > > > > > > >> > > > We can also add a specific flink configuration >> > > > > > > > >> > > > (security.ssl.internal.keystore.reload.enable) to >> only >> > > add >> > > > > > this >> > > > > > > > >> watcher >> > > > > > > > >> > > > mechanism if the config is enabled to avoid adding >> > those >> > > > > > threads >> > > > > > > > if >> > > > > > > > >> > this >> > > > > > > > >> > > is >> > > > > > > > >> > > > not needed. >> > > > > > > > >> > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > > 2. I'm fine with the LocalFSWatchService naming. >> > > > > > > > >> > > > >> > > > > > > > >> > > > 3. Also agree that some e2e tests must be added. >> > > > > > > > >> > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > > @doguscan namal, Thanks for challenging the >> proposal. >> > > > > > > > >> > > > FYI, we are planning to rely on certificates with >> > really >> > > > > short >> > > > > > > > >> > validity. >> > > > > > > > >> > > > >> > > > > > > > >> > > > D1. It seems that the proposal to rely on a reload >> > > period >> > > > > will >> > > > > > > > still >> > > > > > > > >> > face >> > > > > > > > >> > > > potential issues: >> > > > > > > > >> > > > - receiving events while file content updates are >> > still >> > > in >> > > > > > > > progress: >> > > > > > > > >> > > > There is no guarantee that we will not load the >> > > > certificate >> > > > > > > while >> > > > > > > > >> file >> > > > > > > > >> > > > content updates are still in progress >> > > > > > > > >> > > > - while it will not be affected by multiple >> > > notifications, >> > > > > we >> > > > > > > can >> > > > > > > > >> > reach a >> > > > > > > > >> > > > point where only the truststore is updated when the >> > > reload >> > > > > > > > happens. >> > > > > > > > >> > > > Which means that if the keystore is updated just >> > after, >> > > it >> > > > > > will >> > > > > > > > not >> > > > > > > > >> be >> > > > > > > > >> > > > taken in account before next run of the reload >> > mechanism >> > > > > > > > >> > > > >> > > > > > > > >> > > > I think that with WatchService and appropriate >> reload >> > > > grace >> > > > > > > period >> > > > > > > > >> > > > mechanism we should be able to mitigate those 2 >> issues >> > > > > > (ensuring >> > > > > > > > >> > minimum >> > > > > > > > >> > > > reload even with multiple notify) >> > > > > > > > >> > > > >> > > > > > > > >> > > > From KIP-1119 [2] it looks like the same kind of >> > > > > requirements >> > > > > > is >> > > > > > > > >> under >> > > > > > > > >> > > > discussion for Kafka to also rely on the >> WatchService >> > > Java >> > > > > API >> > > > > > > > >> > > (SpringBoot >> > > > > > > > >> > > > seems to also rely on this API to manage ssl reload >> > > [3]). >> > > > > > > > >> > > > >> > > > > > > > >> > > > D3. Do we have a real use case for this? >> > > > > > > > >> > > > >> > > > > > > > >> > > > [1] >> > > > > > > > >> > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://github.com/apache/flink/blob/b523264ab45d37cd9584a0e8c06f1ef6bd1aaed7/flink-rpc/flink-rpc-akka/src/main/java/org/apache/flink/runtime/rpc/pekko/PekkoUtils.java#L372 >> > > > > > > > >> > > > >> > > > > > > > >> > > > [2] >> > > > > > > > >> > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1119%3A+Add+support+for+SSL+auto+reload >> > > > > > > > >> > > > >> > > > > > > > >> > > > [3] >> > > > > > > > >> > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://docs.spring.io/spring-boot/reference/features/ssl.html#features.ssl.reloading >> > > > > > > > >> > > > >> > > > > > > > >> > > > Regards, >> > > > > > > > >> > > > Nicolas >> > > > > > > > >> > > > >> > > > > > > > >> > > > On Wed, Apr 23, 2025 at 12:14 PM Doğuşcan Namal < >> > > > > > > > >> > > namal.dogus...@gmail.com> >> > > > > > > > >> > > > wrote: >> > > > > > > > >> > > > >> > > > > > > > >> > > > > Hi Nicolas, thanks for the FLIP. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > I am fully supportive of the motivation and we >> > should >> > > be >> > > > > > > > >> supporting >> > > > > > > > >> > > this >> > > > > > > > >> > > > > feature. Here are couple of comments from my >> side: >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > D1) >> > > > > > > > >> > > > > Since you shared the implementation details on >> the >> > > FLIP >> > > > as >> > > > > > > > well, I >> > > > > > > > >> > > would >> > > > > > > > >> > > > > like to discuss whether using Java's >> WatchService is >> > > the >> > > > > > best >> > > > > > > > >> choice >> > > > > > > > >> > > > here. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > I believe that the certificate renewal is not a >> > > frequent >> > > > > > > > >> operation. >> > > > > > > > >> > > Even >> > > > > > > > >> > > > > once a day certificate renewals are not >> > > realistic(except >> > > > > for >> > > > > > > > test >> > > > > > > > >> > cases >> > > > > > > > >> > > > > maybe) but let's assume that this covers up the >> > p99.99 >> > > > of >> > > > > > the >> > > > > > > > use >> > > > > > > > >> > > cases. >> > > > > > > > >> > > > I >> > > > > > > > >> > > > > am confident on this estimation since there >> hasn't >> > > been >> > > > a >> > > > > > > > request >> > > > > > > > >> > from >> > > > > > > > >> > > > the >> > > > > > > > >> > > > > community for this feature so far, which confirms >> > that >> > > > > > people >> > > > > > > > were >> > > > > > > > >> > okay >> > > > > > > > >> > > > > with infrequent cluster restarts. Following that >> it >> > is >> > > > > > > > >> infrequent, I >> > > > > > > > >> > > > > believe that spawning up a thread that watches >> the >> > > file >> > > > > > > > >> modification >> > > > > > > > >> > > > > operations is not the best use of the limited >> > > resources >> > > > > on a >> > > > > > > > >> cluster. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > There are some known limitations of the >> WatchService >> > > as >> > > > > well >> > > > > > > > such >> > > > > > > > >> as >> > > > > > > > >> > > > > receiving multiple modification events for the >> same >> > > > > > occurence >> > > > > > > > [1], >> > > > > > > > >> > > > > inotify's (WatchService's underlying mechanism in >> > > Linux >> > > > > > > > >> environments) >> > > > > > > > >> > > > > problems on containerized environments due to >> remote >> > > > file >> > > > > > > > systems >> > > > > > > > >> [2] >> > > > > > > > >> > > or >> > > > > > > > >> > > > > receiving events while file content updates are >> > still >> > > in >> > > > > > > > progress. >> > > > > > > > >> > > [3]. I >> > > > > > > > >> > > > > do not know if these limitations are addressed in >> > the >> > > > > newer >> > > > > > > > >> versions >> > > > > > > > >> > > but >> > > > > > > > >> > > > > regardless of that it is clear that we may face >> with >> > > > some >> > > > > > ugly >> > > > > > > > >> edge >> > > > > > > > >> > > cases >> > > > > > > > >> > > > > due to that. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > Given these complications, I would recommend just >> > > > > creating a >> > > > > > > new >> > > > > > > > >> > > > SSLContext >> > > > > > > > >> > > > > after a configured duration is expired. We could >> > > record >> > > > > the >> > > > > > > > >> timestamp >> > > > > > > > >> > > > when >> > > > > > > > >> > > > > the previous SSLContext is created and update it >> > > after a >> > > > > > > > >> configured >> > > > > > > > >> > > > > duration is passed. This will be much easier to >> test >> > > and >> > > > > > > reason >> > > > > > > > >> about >> > > > > > > > >> > > > when >> > > > > > > > >> > > > > it is running on production. This will eliminate >> the >> > > > > > necessity >> > > > > > > > to >> > > > > > > > >> > > reason >> > > > > > > > >> > > > > about the file modification operations as well. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > I briefly skimmed through the classes that need >> to >> > be >> > > > > > modified >> > > > > > > > >> and it >> > > > > > > > >> > > > > looked feasible for me. Let me know what are your >> > > > comments >> > > > > > on >> > > > > > > > >> these. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > Note that this is already used in the Kafka world >> > > where >> > > > a >> > > > > > new >> > > > > > > > >> > > SSLContext >> > > > > > > > >> > > > is >> > > > > > > > >> > > > > created after 12 hours. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > D2) We could provide a configuration to the user, >> > such >> > > > as >> > > > > > > > >> > > > > "security.ssl.internal.keystore.reload.duration" >> so >> > > they >> > > > > > could >> > > > > > > > >> decide >> > > > > > > > >> > > how >> > > > > > > > >> > > > > often the new certificates should be loaded. >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > D3) >> > > > > > > > >> > > > > On the other hand, I wonder whether we should >> also >> > > > handle >> > > > > > > > >> supporting >> > > > > > > > >> > > > > updating the file paths of the truststores and >> > > keystores >> > > > > > under >> > > > > > > > >> this >> > > > > > > > >> > > FLIP >> > > > > > > > >> > > > as >> > > > > > > > >> > > > > well. Since the name of the FLIP is "Handle TLS >> > > > > Certificate >> > > > > > > > >> Renewal" >> > > > > > > > >> > I >> > > > > > > > >> > > > > think we could bring that into scope too :) >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > Thanks >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > [1] >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://stackoverflow.com/questions/16777869/java-7-watchservice-ignoring-multiple-occurrences-of-the-same-event >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > [2] >> > > > > > > > >> https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/ >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > [3] >> > > > > > > > >> > > > >> > > > > > > > >> >> > > > > > >> > > https://surajatreyac.github.io/2014-07-29/reactive_file_handling.html >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > [4] See also Platform Dependencies - >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://docs.oracle.com/javase/8/docs/api/index.html?java/nio/file/WatchService.html >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > On Fri, 18 Apr 2025 at 18:25, Gabor Somogyi < >> > > > > > > > >> > gabor.g.somo...@gmail.com >> > > > > > > > >> > > > >> > > > > > > > >> > > > > wrote: >> > > > > > > > >> > > > > >> > > > > > > > >> > > > > > Hi Robert, >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > Since I've added the same feature to the >> operator >> > > I'll >> > > > > > take >> > > > > > > a >> > > > > > > > >> look >> > > > > > > > >> > at >> > > > > > > > >> > > > it. >> > > > > > > > >> > > > > > Though it won't be lightning fast since I'm >> having >> > > > > several >> > > > > > > > weeks >> > > > > > > > >> > off. >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > Your questions are valid especially considering >> > the >> > > > fact >> > > > > > > that >> > > > > > > > >> this >> > > > > > > > >> > > > > feature >> > > > > > > > >> > > > > > touches the hearth of the authentication so >> this >> > > must >> > > > be >> > > > > > > rock >> > > > > > > > >> solid >> > > > > > > > >> > > > > > in order to avoid grey hair :) >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > (1) I would vote on a single service which is >> > > heavily >> > > > > unit >> > > > > > > > >> tested >> > > > > > > > >> > > with >> > > > > > > > >> > > > > > all the possible combinations including >> threading. >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > Some standalone app could be added to really >> play >> > > with >> > > > > it >> > > > > > > > (that >> > > > > > > > >> > would >> > > > > > > > >> > > > > help >> > > > > > > > >> > > > > > review). >> > > > > > > > >> > > > > > I mean, create X files, start Y threads, and >> make >> > > > > > > assertions. >> > > > > > > > >> > > > > > The reason why I'm suggesting it is the fact >> that >> > > > AFAIR >> > > > > > the >> > > > > > > > >> watch >> > > > > > > > >> > > > service >> > > > > > > > >> > > > > > is quite sensitive even in single thread. If we >> > > could >> > > > do >> > > > > > > this >> > > > > > > > >> in a >> > > > > > > > >> > > > finite >> > > > > > > > >> > > > > > time >> > > > > > > > >> > > > > > consuming unit test then it's even better. >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > (2) +1 on that name to avoid confusion >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > (3) I agree that some e2e is must, however this >> > can >> > > be >> > > > > > > easily >> > > > > > > > >> and >> > > > > > > > >> > > > deeply >> > > > > > > > >> > > > > > unit >> > > > > > > > >> > > > > > tested so that part is also essential. One key >> > test >> > > > here >> > > > > > is >> > > > > > > > when >> > > > > > > > >> > > > > > certificates >> > > > > > > > >> > > > > > are not changing then no action must be >> performed >> > > (not >> > > > > to >> > > > > > > > break >> > > > > > > > >> the >> > > > > > > > >> > > > whole >> > > > > > > > >> > > > > > system apart). >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > Purely personal opinion but such feature >> > > developments >> > > > > are >> > > > > > > slow >> > > > > > > > >> by >> > > > > > > > >> > > > nature >> > > > > > > > >> > > > > > because >> > > > > > > > >> > > > > > of edge case / stress testing. >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > BR, >> > > > > > > > >> > > > > > G >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > On Fri, Apr 18, 2025 at 4:53 PM Robert Metzger >> < >> > > > > > > > >> > rmetz...@apache.org> >> > > > > > > > >> > > > > > wrote: >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > > > Hi Nicolas, >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > This looks like a nice improvement, thanks >> for >> > the >> > > > > write >> > > > > > > up. >> > > > > > > > >> > > > > > > Are you in touch with any committer who's >> > willing >> > > to >> > > > > > > review >> > > > > > > > / >> > > > > > > > >> > merge >> > > > > > > > >> > > > > this? >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > Some random questions on the FLIP: >> > > > > > > > >> > > > > > > (1) "Each service that depends on TLS >> > > certificates >> > > > > will >> > > > > > > > >> > > initialize a >> > > > > > > > >> > > > > > > FileSytemWatchService" >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > It seems that there are 4 components using >> SSL, >> > > does >> > > > > > this >> > > > > > > > mean >> > > > > > > > >> > > there >> > > > > > > > >> > > > > will >> > > > > > > > >> > > > > > > be 4 additional threads running, watching the >> > same >> > > > set >> > > > > > of >> > > > > > > > >> files? >> > > > > > > > >> > > > > > > Wouldn't it be better to introduce a central >> > file >> > > > > > watching >> > > > > > > > >> > service, >> > > > > > > > >> > > > and >> > > > > > > > >> > > > > > SSL >> > > > > > > > >> > > > > > > users can subscribe to updates, to reduce the >> > > number >> > > > > of >> > > > > > > > >> threads? >> > > > > > > > >> > > > > > > If this makes the whole effort 4x more >> > > complicated, >> > > > I >> > > > > > > > wouldn't >> > > > > > > > >> > > > consider >> > > > > > > > >> > > > > > it, >> > > > > > > > >> > > > > > > but if its roughly the same effort, we >> should :) >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > (2) "FileSytemWatchService" >> > > > > > > > >> > > > > > > When I read this name, I was wondering, >> whether >> > > this >> > > > > is >> > > > > > > > >> somehow >> > > > > > > > >> > > > related >> > > > > > > > >> > > > > > to >> > > > > > > > >> > > > > > > the Flink "FileSystem" classes. Which I think >> > its' >> > > > > not. >> > > > > > > > >> > > > > > > Maybe a different name, that makes this >> > separation >> > > > > more >> > > > > > > > >> explicit, >> > > > > > > > >> > > > would >> > > > > > > > >> > > > > > > make sense. Maybe "LocalFSWatchService"? >> > > > > > > > >> > > > > > > (I'm sorry to bring up naming stuff -- its >> very >> > > > > > > subjective, >> > > > > > > > >> and >> > > > > > > > >> > > > > > difficult) >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > (3) For the test plan: There seem to be some >> SSL >> > > > > related >> > > > > > > e2e >> > > > > > > > >> > tests: >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/common_ssl.sh >> > > > > > > > >> > > > > > > It would be nice to extend them to cover this >> > > > feature >> > > > > as >> > > > > > > > >> well. I >> > > > > > > > >> > > > would >> > > > > > > > >> > > > > > hate >> > > > > > > > >> > > > > > > for this feature to slowly break by future >> > > changes, >> > > > so >> > > > > > > good >> > > > > > > > >> e2e >> > > > > > > > >> > > test >> > > > > > > > >> > > > > > > coverage is key, in particular bc so many >> > > components >> > > > > are >> > > > > > > > >> > involved. >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > Best, >> > > > > > > > >> > > > > > > Robert >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > On Wed, Apr 16, 2025 at 11:55 AM Nicolas >> Fraison >> > > > > > > > >> > > > > > > <nicolas.frai...@datadoghq.com.invalid> >> wrote: >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > > > Hi All, >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > I'd like to start a discussion to Handle >> TLS >> > > > > > Certificate >> > > > > > > > >> > Renewal >> > > > > > > > >> > > > > > > > Please provide some feedback on this >> proposal: >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-523%3A+Handle+TLS+Certificate+Renewal >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > Regards, >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > Nicolas Fraison >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > >> > > > > > >> > > > > > > > >> > > > > >> > > > > > > > >> > > > >> > > > > > > > >> > > >> > > > > > > > >> > >> > > > > > > > >> >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >