In my offline discussion with Rajini and other folks, we basically had a
better understanding for the following problems:

1. Whether there should be a clear disabling mechanism for periodical
refreshing. We thought about using -1 for a special refreshing interval
value to disable it, but as a matter of fact users could just set the value
to max long in order to disable periodical refresh when they don't want to
have it in a rare case, so we don't need to have this extra complexity.

2. Whether we need file-watch triggering at all. It is agreed that having a
hybrid approach will increase the implementation complexity, and we could
potentially ask existing customers to avoid doing in-place file updates in
the upgrade notes if we don't implement the file-watcher at all. Compared
with the plain periodical refreshing approach, file-watcher still has the
merits of maintaining the same semantic behavior for existing in-place
security store swap, and it is not necessarily too complicated to block the
project progress. Above all, this features an internal implementation
detail which we don't need to resolve at the KIP discussion stage, so we
should keep it and revisit if things indeed go down.

Let me know your thoughts.

Best,
Boyang

On Fri, Jan 8, 2021 at 10:42 AM Boyang Chen <reluctanthero...@gmail.com>
wrote:

>
>
> On Fri, Jan 8, 2021 at 4:46 AM Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
>
>> Hi Boyang,
>>
>> Thanks for the responses. Follow up comments on a couple of those:
>>
>> 4) Can you provide some more details on the scenarios where file watcher
>> is
>> useful. You mentioned hybrid, but it is not clear to me why a watcher that
>> reloads 99% of the time would be useful. There are a few cases to
>> consider:
>> 4a) Refresh SSL stores prior to expiry. In this case, time-interval-based
>> reloading makes sense and this is similar to refresh of other security
>> credentials. It seems reasonable to expect that stores are updated well
>> before expiry and that the refresh interval would be configured
>> accordingly.
>> 4b) Remove a compromised client certificate from the trust store. This is
>> a
>> critical case. With the current code, you send an admin client request and
>> get feedback on whether the update happened. A watcher that may or may not
>> work is pretty much useless. The only reliable way to handle this case
>> after this KIP would be to change the file name and send an admin client
>> request.
>> 4c) Change broker key store, which also perhaps requires change to the
>> truststore. This is the ordering issue introduced by this KIP. This is
>> also
>> the issue that David Jacot brought up. With the current code, you update
>> keystore and truststore and send admin client requests to perform the
>> update and hence you have total control on when each update is performed
>> and whether they are performed together. With the proposed KIP, we lose
>> this control. Like 4b), the only reliable way after this KIP to handle
>> this
>> case would be to change filenames and send an admin client request.
>> 4d) Add a new client certificate to the broker truststore. This is the
>> only
>> case where a file watcher seems useful to reload sooner.
>>
>> To summarize, two of the four cases are no longer reliable without file
>> name change and it would be better to document these as limitations. These
>> are rare cases where file name change seems reasonable. Periodic refresh,
>> which is the most common case and one that we would want to support in
>> clients some day can be supported without a file watcher. That leaves 4d)
>> which I am not sure justifies the use of a file watcher that cannot be
>> disabled.
>>
>>  My understanding is that the hybrid approach reduces the amount of time
> to
> propagate security changes, instead of waiting for unnecessarily long
> time. Secondly IIUC,
> even with automatic tooling, you need to do the key store and trust store
> updates in two separate AlterConfigs,
> where one must happen after the other is successful. If we have to use an
> RPC, it has to decouple from
> AlterConfig and sent twice to a specific broker, which we discussed as a
> rejected alternative due to the complexity and overhead.
>
> To put it straight, how many use cases today do we have to do the same
> name store file content update? My impression was
> that KIP-687 was just trying to solve a very special edge case with a
> minimal effort, not trying to officially claim that
> in-place security file update should be a main-stream use case.
>
> 5) Dynamic updates are restricted to ensure brokers don't become unusable
>> after an update. For example, we don't allow the distinguished name (DN)
>>  in the certificate that is used to determine user identity to be changed
>> dynamically. Similarly, you cannot remove host names included in the
>> certificates  since that may fail client validation. For certificates used
>> for inter-broker communication, we validate the key store against the
>> trust
>> store to ensure that brokers are able to communicate to each other. With
>> admin client based updates, we attempt an update and provide feedback so
>> that the user can take action. The KIP needs more detail on how errors
>> will
>> be handled with the watcher or periodic updates. It also needs more detail
>> on the new requirements being introduced for ordering of updates of files
>> on the file system or clearly state the cases which will no longer be
>> supported without file  name change.
>>
>> I guess we will have to rely on the application error log to detect an
> update failure.
> If it helps, we could get a separate log file specifically for security
> store reloading progress, but
> I don't think we could propagate the validation error in the admin client
> as before. Could you
> show me the documentation on how the tooling is being used today for
> security store reloading which
> has an ordering that we might need to change?
>
>>
>> On Thu, Jan 7, 2021 at 6:46 PM Boyang Chen <reluctanthero...@gmail.com>
>> wrote:
>>
>> > Hey David, thanks for the feedback.
>> >
>> > On Thu, Jan 7, 2021 at 2:37 AM David Jacot <dja...@confluent.io> wrote:
>> >
>> > > Hi Boyang,
>> > >
>> > > Thanks for the KIP. I am fine with it in general. I just have a few
>> > > comments.
>> > >
>> > > With the proposal, we don't have the guarantee that both the new
>> keystore
>> > > and the new truststore will be picked up together so we may end up
>> with
>> > > the new keystore and the old truststore for a short period of time, or
>> > > permanently
>> > > if the second one can't be reloaded for any reason.
>> > >
>> > > This could disallow clients to authenticate for a while if the new
>> > keystore
>> > > and the
>> > > new trustore are not crafted to work with their old versions.
>> > >
>> > > I wonder how this would work in practice. Do we already have guards in
>> > > place to avoid this or could we add something to ensure that listeners
>> > are
>> > > updated only if both the truststore and the keystore works with each
>> > other?
>> > >
>> > > We don't have this issue today as both the truststore and the keystore
>> > are
>> > > reloaded when the AlterConfig RPC is received so the admin can control
>> > > this process. It is all or nothing.
>> > >
>> > > I'm not sure how we achieve that today, if we are updating both key
>> store
>> > and trust store
>> > in the same request, it is still possible that one update succeeded but
>> > another failed. Does users
>> > have awareness to make the guarantee by themselves? My
>> > point is that today there is no reliable way to achieve atomic update
>> > either, and hoping users
>> > to make the correct sequence of decisions would be hard.
>> >
>> > I think that this is acceptable but it is worth clearly mentioning that
>> > > there is no
>> > > guarantee from that regard in the KIP, and later in the doc. Perhaps,
>> we
>> > > could
>> > > also mention that updating them in place is not a best practice and
>> that
>> > > using
>> > > new paths gives better control to the admin.
>> > >
>> > > Best,
>> > > David
>> > >
>> > > On Wed, Jan 6, 2021 at 6:55 PM Jason Gustafson <ja...@confluent.io>
>> > wrote:
>> > >
>> > > > Thanks Boyang. Someone mentioned my email never showed up, but
>> > basically
>> > > I
>> > > > suggested tying the refresh configuration more directly to the
>> > > > configurations it would affect. I'm happy with the updates.
>> > > >
>> > > > -Jason
>> > > >
>> > > > On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <
>> reluctanthero...@gmail.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > Thanks Jason for the feedback. I separated the time configs for
>> key
>> > > store
>> > > > > and trust store, and rename the configs as you proposed.
>> > > > >
>> > > > > Best,
>> > > > > Boyang
>> > > > >
>> > > > > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <
>> > > reluctanthero...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hey there,
>> > > > > >
>> > > > > > bumping up this thread to see if there are further questions
>> > > regarding
>> > > > > the
>> > > > > > updated proposal.
>> > > > > >
>> > > > > > Best,
>> > > > > > Boyang
>> > > > > >
>> > > > > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
>> > > > reluctanthero...@gmail.com
>> > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > >> After some offline discussions, we believe that it's the right
>> > > > direction
>> > > > > >> to go by doing a hybrid approach which includes both file-watch
>> > > > trigger
>> > > > > and
>> > > > > >> interval based reloading. The former guarantees a swift change
>> in
>> > > 99%
>> > > > > time,
>> > > > > >> while the latter provides a time-based guarantee in the worst
>> case
>> > > > when
>> > > > > the
>> > > > > >> file-watch does not take effect. The current default reloading
>> > > > interval
>> > > > > is
>> > > > > >> set to 5 min. I have updated the KIP and ticket, feel free to
>> > check
>> > > > out
>> > > > > and
>> > > > > >> see if it makes sense.
>> > > > > >>
>> > > > > >> Best,
>> > > > > >> Boyang
>> > > > > >>
>> > > > > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
>> > > > reluctanthero...@gmail.com>
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >>> Hey Gwen, thanks for the feedback.
>> > > > > >>>
>> > > > > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <
>> g...@confluent.io>
>> > > > > wrote:
>> > > > > >>>
>> > > > > >>>> Agree with Igor. IIRC, we also encountered cases where
>> filewatch
>> > > was
>> > > > > >>>> not triggered as expected. An interval will give us a better
>> > > > > >>>> worse-case scenario that is easily controlled by the Kafka
>> > admin.
>> > > > > >>>>
>> > > > > >>>> Are the cases you were referring to happening in the cloud
>> > > > > environment?
>> > > > > >>> Should we investigate instead of simply assuming the standard
>> API
>> > > > won't
>> > > > > >>> work? I checked around and found a similar complaint here
>> > > > > >>> <
>> https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
>> > > > > >>>
>> > > > > >>> I would be partially agreeing that we want to have a reliable
>> > > > approach
>> > > > > >>> for all different operating systems in general, but would be
>> > great
>> > > if
>> > > > > we
>> > > > > >>> could reach a quantitative measure of file-watch success rate
>> if
>> > > > > possible
>> > > > > >>> for us to make the call. Eventually, the benefit of
>> file-watch is
>> > > > more
>> > > > > >>> prompt reaction time and less configuration to the broker.
>> > > > > >>>
>> > > > > >>>> Gwen
>> > > > > >>>>
>> > > > > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me>
>> wrote:
>> > > > > >>>> >
>> > > > > >>>> >
>> > > > > >>>> > > > The proposed change relies on a file watch, why not
>> also
>> > > have
>> > > > a
>> > > > > >>>> polling
>> > > > > >>>> > > > interval to check the file for changes?
>> > > > > >>>> > > >
>> > > > > >>>> > > > The periodical check could work, the slight downside is
>> > that
>> > > > we
>> > > > > >>>> need
>> > > > > >>>> > > additional configurations to schedule the interval. Do
>> you
>> > > think
>> > > > > the
>> > > > > >>>> > > file-watch approach has any extra overhead than the
>> interval
>> > > > based
>> > > > > >>>> solution?
>> > > > > >>>> >
>> > > > > >>>> > I don't think so. The reason I'm asking this is the KIP
>> > > currently
>> > > > > >>>> includes:
>> > > > > >>>> >
>> > > > > >>>> >   "When the file watch does not work for unknown reason,
>> user
>> > > > could
>> > > > > >>>> still try to change the store path in an explicit AlterConfig
>> > call
>> > > > in
>> > > > > the
>> > > > > >>>> worst case."
>> > > > > >>>> >
>> > > > > >>>> > Having the interval in addition to the file watch could
>> result
>> > > in
>> > > > a
>> > > > > >>>> better worst case scenario.
>> > > > > >>>> > I understand it would require introducing at least one new
>> > > > > >>>> configuration for the interval, so maybe this doesn't have to
>> > > solved
>> > > > > in
>> > > > > >>>> this KIP.
>> > > > > >>>> >
>> > > > > >>>> > --
>> > > > > >>>> > Igor
>> > > > > >>>> >
>> > > > > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
>> > > > > >>>> > > Hey Igor, thanks for the feedback.
>> > > > > >>>> > >
>> > > > > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me>
>> > > wrote:
>> > > > > >>>> > >
>> > > > > >>>> > > > Hi Boyang,
>> > > > > >>>> > > >
>> > > > > >>>> > >
>> > > > > >>>> > >
>> > > > > >>>> > > > What happens if the file is changed into an invalid
>> store?
>> > > > Does
>> > > > > >>>> the
>> > > > > >>>> > > > previous store stay in use?
>> > > > > >>>> > > >
>> > > > > >>>> > > > If the reload fails, the previous store should be
>> > > effective. I
>> > > > > >>>> will state
>> > > > > >>>> > > that in the KIP.
>> > > > > >>>> > >
>> > > > > >>>> > >
>> > > > > >>>> > > > Thanks,
>> > > > > >>>> > > >
>> > > > > >>>> > > > --
>> > > > > >>>> > > > Igor
>> > > > > >>>> > > >
>> > > > > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
>> > > > > >>>> > > > > Hey there,
>> > > > > >>>> > > > >
>> > > > > >>>> > > > > I would like to start the discussion thread for
>> KIP-687:
>> > > > > >>>> > > > >
>> > > > > >>>> > > >
>> > > > > >>>>
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
>> > > > > >>>> > > > >
>> > > > > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API
>> > > support
>> > > > > of
>> > > > > >>>> updating
>> > > > > >>>> > > > > the security store by reloading path in-place, and
>> > replace
>> > > > > with
>> > > > > >>>> a
>> > > > > >>>> > > > > file-watch mechanism inside the broker. Let me know
>> what
>> > > you
>> > > > > >>>> think.
>> > > > > >>>> > > > >
>> > > > > >>>> > > > > Best,
>> > > > > >>>> > > > > Boyang
>> > > > > >>>> > > > >
>> > > > > >>>> > > >
>> > > > > >>>> > >
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>> --
>> > > > > >>>> Gwen Shapira
>> > > > > >>>> Engineering Manager | Confluent
>> > > > > >>>> 650.450.2760 | @gwenshap
>> > > > > >>>> Follow us: Twitter | blog
>> > > > > >>>>
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to