Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Boyang Chen Thu, 07 Jan 2021 10:29:56 -0800

Thanks Rajini for the comments.

On Thu, Jan 7, 2021 at 2:27 AM Rajini Sivaram <rajinisiva...@gmail.com>
wrote:


> Hi Boyang,
>
> Thanks for the KIP, I have a few questions:
>
> 1) Will it be possible to enable/disable automatic file reloading? If not,
> we should mention in the compatibility section.
>
I don't think we need to support disabling reloading as it will become the
major mechanism to refresh
the security store. Will make that clear in the KIP.

>
> 2) We are introducing new common SSL configs and updating common code to
> perform automated updates. What does this mean for clients? Are we going to
> automatically reload client key stores and trust stores?
>
> This config would not be applied to clients. Will make that clear in the
config comments.

3) We should mention in the compatibility section that we are changing the
> audit log/authorization model for dynamic updates of SSL stores. At the
> moment, only a user with powerful Cluster:Alter permissions can dynamically
> update SSL stores on brokers. The KIP removes this restriction and relies
> purely on file system permissions for file-based stores, unlike for example
> PEM store updates which would still rely on Kafka permissions.
>
> Sounds good, will make that clear.

4) Is it really necessary to add a file watcher that is not guaranteed 100%
> of the time, if we are adding refresh interval configs? In particular, if
> we extend this to clients now or at some point in the future, wouldn't it
> be better just to use deterministic refresh intervals without the overhead
> of watching?
>
> It is a used hybrid approach in certain server side scenarios. I guess the
main advantage
is that we have a best case guarantee for immediate refresh, as well as a
worst case
guarantee when the trigger fails, but not too frequent to introduce
performance problems.


> 5) The KIP doesn't talk about validation of key and trust stores. I am
> guessing we will continue to perform the same validation that we perform
> today. But what happens if validation fails? We should mention in the KIP
> that we no longer provide feedback for this case (unlike admin client
> requests that returned an error).
>
We suggested to expose error metrics and write error messages in the server
side log. As we
decouple the update mechanism from AlterConfig request, this is
unavoidable. Note that we are
only applying an in-place security file update without name change, which I
believe is one edge case.


> 5a) If a key store doesn't conform (e.g. the DN was changed), we would fail
> the update if we apply the current validation. Would we do the validation
> every 5 minutes after that forever even though the file wasn't updated
> since? Or will we remember the last reloaded time to avoid reloading if
> file hasn't changed?
>
Could you elaborate the definition for `DN`? Curious what does `conform`
suggest?


> 5b) We perform additional validation for inter-broker key and trust stores
> to ensure we never break the broker with dynamic updates. Since this
> validation matches key store with trust store, it relies on the order in
> which stores are updated. With reloading in the admin client, user had
> control over the update. We should document any restrictions on the order
> in which files need to be updated on the file system to perform updates of
> inter-broker SSL stores. And as with 5a), it will be good to document the
> behaviour if validation fails.
>
> So you are suggesting that the key store update should happen before the
trust store, or vice versa?
I'm not familiar with the restriction TBH.

Regards,
>
> Rajini
>
>
>
> On Wed, Jan 6, 2021 at 5:55 PM Jason Gustafson <ja...@confluent.io> wrote:
>
> > Thanks Boyang. Someone mentioned my email never showed up, but basically
> I
> > suggested tying the refresh configuration more directly to the
> > configurations it would affect. I'm happy with the updates.
> >
> > -Jason
> >
> > On Tue, Jan 5, 2021 at 8:34 PM Boyang Chen <reluctanthero...@gmail.com>
> > wrote:
> >
> > > Thanks Jason for the feedback. I separated the time configs for key
> store
> > > and trust store, and rename the configs as you proposed.
> > >
> > > Best,
> > > Boyang
> > >
> > > On Mon, Dec 14, 2020 at 3:47 PM Boyang Chen <
> reluctanthero...@gmail.com>
> > > wrote:
> > >
> > > > Hey there,
> > > >
> > > > bumping up this thread to see if there are further questions
> regarding
> > > the
> > > > updated proposal.
> > > >
> > > > Best,
> > > > Boyang
> > > >
> > > > On Thu, Dec 10, 2020 at 11:52 AM Boyang Chen <
> > reluctanthero...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> After some offline discussions, we believe that it's the right
> > direction
> > > >> to go by doing a hybrid approach which includes both file-watch
> > trigger
> > > and
> > > >> interval based reloading. The former guarantees a swift change in
> 99%
> > > time,
> > > >> while the latter provides a time-based guarantee in the worst case
> > when
> > > the
> > > >> file-watch does not take effect. The current default reloading
> > interval
> > > is
> > > >> set to 5 min. I have updated the KIP and ticket, feel free to check
> > out
> > > and
> > > >> see if it makes sense.
> > > >>
> > > >> Best,
> > > >> Boyang
> > > >>
> > > >> On Tue, Dec 8, 2020 at 8:58 PM Boyang Chen <
> > reluctanthero...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hey Gwen, thanks for the feedback.
> > > >>>
> > > >>> On Sun, Dec 6, 2020 at 10:06 PM Gwen Shapira <g...@confluent.io>
> > > wrote:
> > > >>>
> > > >>>> Agree with Igor. IIRC, we also encountered cases where filewatch
> was
> > > >>>> not triggered as expected. An interval will give us a better
> > > >>>> worse-case scenario that is easily controlled by the Kafka admin.
> > > >>>>
> > > >>>> Are the cases you were referring to happening in the cloud
> > > environment?
> > > >>> Should we investigate instead of simply assuming the standard API
> > won't
> > > >>> work? I checked around and found a similar complaint here
> > > >>> <https://blog.arkey.fr/2019/09/13/watchservice-and-bind-mount/>.
> > > >>>
> > > >>> I would be partially agreeing that we want to have a reliable
> > approach
> > > >>> for all different operating systems in general, but would be great
> if
> > > we
> > > >>> could reach a quantitative measure of file-watch success rate if
> > > possible
> > > >>> for us to make the call. Eventually, the benefit of file-watch is
> > more
> > > >>> prompt reaction time and less configuration to the broker.
> > > >>>
> > > >>>> Gwen
> > > >>>>
> > > >>>> On Sun, Dec 6, 2020 at 8:17 AM Igor Soarez <i...@soarez.me> wrote:
> > > >>>> >
> > > >>>> >
> > > >>>> > > > The proposed change relies on a file watch, why not also
> have
> > a
> > > >>>> polling
> > > >>>> > > > interval to check the file for changes?
> > > >>>> > > >
> > > >>>> > > > The periodical check could work, the slight downside is that
> > we
> > > >>>> need
> > > >>>> > > additional configurations to schedule the interval. Do you
> think
> > > the
> > > >>>> > > file-watch approach has any extra overhead than the interval
> > based
> > > >>>> solution?
> > > >>>> >
> > > >>>> > I don't think so. The reason I'm asking this is the KIP
> currently
> > > >>>> includes:
> > > >>>> >
> > > >>>> >   "When the file watch does not work for unknown reason, user
> > could
> > > >>>> still try to change the store path in an explicit AlterConfig call
> > in
> > > the
> > > >>>> worst case."
> > > >>>> >
> > > >>>> > Having the interval in addition to the file watch could result
> in
> > a
> > > >>>> better worst case scenario.
> > > >>>> > I understand it would require introducing at least one new
> > > >>>> configuration for the interval, so maybe this doesn't have to
> solved
> > > in
> > > >>>> this KIP.
> > > >>>> >
> > > >>>> > --
> > > >>>> > Igor
> > > >>>> >
> > > >>>> > On Fri, Dec 4, 2020, at 5:14 PM, Boyang Chen wrote:
> > > >>>> > > Hey Igor, thanks for the feedback.
> > > >>>> > >
> > > >>>> > > On Fri, Dec 4, 2020 at 5:24 AM Igor Soarez <i...@soarez.me>
> wrote:
> > > >>>> > >
> > > >>>> > > > Hi Boyang,
> > > >>>> > > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > What happens if the file is changed into an invalid store?
> > Does
> > > >>>> the
> > > >>>> > > > previous store stay in use?
> > > >>>> > > >
> > > >>>> > > > If the reload fails, the previous store should be
> effective. I
> > > >>>> will state
> > > >>>> > > that in the KIP.
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > > Thanks,
> > > >>>> > > >
> > > >>>> > > > --
> > > >>>> > > > Igor
> > > >>>> > > >
> > > >>>> > > > On Fri, Dec 4, 2020, at 1:28 AM, Boyang Chen wrote:
> > > >>>> > > > > Hey there,
> > > >>>> > > > >
> > > >>>> > > > > I would like to start the discussion thread for KIP-687:
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-687%3A+Automatic+Reloading+of+Security+Store
> > > >>>> > > > >
> > > >>>> > > > > This KIP is trying to deprecate the AlterConfigs API
> support
> > > of
> > > >>>> updating
> > > >>>> > > > > the security store by reloading path in-place, and replace
> > > with
> > > >>>> a
> > > >>>> > > > > file-watch mechanism inside the broker. Let me know what
> you
> > > >>>> think.
> > > >>>> > > > >
> > > >>>> > > > > Best,
> > > >>>> > > > > Boyang
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Gwen Shapira
> > > >>>> Engineering Manager | Confluent
> > > >>>> 650.450.2760 | @gwenshap
> > > >>>> Follow us: Twitter | blog
> > > >>>>
> > > >>>
> > >
> >
>

Re: [DISCUSS] KIP-687: Automatic Reloading of Security Store

Reply via email to