Hey,

> All the rest that is public can be considered an accidental leak (for
> example "DelegationTokenManager", since Java doesn't support internal
> interfaces).

If the classes are internal, then they should be annotated @Internal. This
is a Flink thing [1]. My explicit recommendation is to ensure all
classes/interfaces that you expect end users or developers to use are
annotated @PublicEvolving or @Experimental and others are @Internal.
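
For example (purely illustrative, the two interfaces are shown together for
brevity and with abridged signatures; only the annotations matter here):

import org.apache.flink.annotation.Internal;
import org.apache.flink.annotation.PublicEvolving;

// User/developer facing extension point -> carries a stability guarantee.
@PublicEvolving
public interface DelegationTokenProvider {
    String serviceName();
}

// Framework-internal orchestration -> explicitly excluded from the API surface.
@Internal
public interface DelegationTokenManager {
    void start() throws Exception;
}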

Thanks for the explanation of how the connectors will retrieve the tokens.
Do you intend for this to be via a static method or via the operator
context, or something else? With regard to the AWS clients, we would
typically not rebuild the credential provider for each request. It would be
simple enough to implement a cached credential provider that is backed by
the TokenContainer. I suppose this is simpler than the hook/subscription
mechanism; we can always add that later if the need arises.
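
To illustrate what I have in mind (just a sketch; the TokenContainer accessor,
the token key and the deserialization are placeholders since the FLIP does not
define them yet):

import java.util.Arrays;

import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;

// Sketch: serves cached credentials and only re-derives them when the
// serialized token in the container changes.
public class DelegationTokenCredentialsProvider implements AwsCredentialsProvider {

    private byte[] lastSeenToken;
    private AwsCredentials cachedCredentials;

    @Override
    public synchronized AwsCredentials resolveCredentials() {
        // Placeholder lookup: whether this is a static accessor or comes via the
        // operator context, and the key name, are open questions.
        byte[] serialized = TokenContainer.getTokens().get("s3");
        if (cachedCredentials == null || !Arrays.equals(serialized, lastSeenToken)) {
            // Reverse of whatever serialization the provider applied on its side.
            cachedCredentials = deserialize(serialized);
            lastSeenToken = serialized;
        }
        return cachedCredentials;
    }

    private AwsCredentials deserialize(byte[] serializedToken) {
        // Format depends on the provider implementation, intentionally left out here.
        throw new UnsupportedOperationException("depends on the provider-side serialization");
    }
}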

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees

Thanks,
Danny

On Mon, Nov 21, 2022 at 5:45 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Hi Danny,
>
> Thanks for spending your time on making the design better. My comments are
> added inline.
> Please share your opinion or just correct me if I've misunderstood your
> suggestions.
>
> G
>
>
> On Mon, Nov 21, 2022 at 5:10 PM Danny Cranmer <dannycran...@apache.org>
> wrote:
>
> > Thanks for the FLIP Gabor.
> >
> > Generally looks good, have a couple of comments.
> >
> > If I understand correctly we plan on pulling the current implementation
> > apart and adding an extra layer of abstraction, to allow arbitrary
> > implementations. I notice that most of the delegation token classes [1]
> are
> > missing compatibility annotations, although a couple are @Experimental.
> Can
> > we update these classes to add the compatibility annotations ASAP?
> > The DelegationTokenManager class [2] you talk about adding already
> exists,
> > without any compatibility guarantees. The recent FLIPs [3][4] do not
> cover
> > what support un-annotated classes default to, unless I missed it.
> >
>
> You've touched on an important point which every component must define/obey
> sooner or later, when it's mature enough.
> The feature has just been finished, so it is marked as fully experimental as
> a whole.
> The intended developer-facing API is "DelegationTokenProvider" (at least
> for now).
> All the rest that is public can be considered an accidental leak (for
> example "DelegationTokenManager", since Java doesn't support internal
> interfaces).
>
> The "DelegationTokenManager" interface came into the picture in a comment
> to make it maybe pluggable but now I think that we must delete/change it.
> The reason behind is that "obtainDelegationTokens" API has a "Credentials"
> class as parameter which binding the whole stuff to Hadoop.
> I personally don't insist to any of the choices just saying that an API
> shouldn't contain implementation details.
>
> What we can do in the short term: if you think there are places where
> additional annotations are needed then please share your explicit
> suggestions.
> When that's clear I can create a Jira/PR and we can get rid of those gaps.
>
> In the mid/long term, when we see that a fully custom provider can be added
> to the framework without Hadoop, API guarantees can come into the picture.
>
>
> >
> > > External connectors need to look into the TokenContainer and when token
> > is found with a specific key then it must be used
> > Will we implement some form of hook that connectors can subscribe to?
> > Otherwise I feel we will duplicate a bunch of logic across connectors
> > periodically polling this TokenContainer for updates.
> >
>
> Good point; in short, this wasn't considered until you mentioned it.
> In detail, Hadoop components check the content of the UGI when
> authentication is needed.
> Of course this doesn't mean we must do the same here. We are going to do
> what is more convenient from the API user perspective.
> My original idea is super simple, only 2 lines of implementation:
> 1. Get a byte array from the TokenContainer map with a specific key (the
> provider puts the serialized token byte array into the map with this key).
> 2. Deserialize the byte array in the actual connector (the reverse of the
> action done by the provider).
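>
> In code that would be roughly this (sketch only, the container accessor,
> key and serializer names are illustrative):
>
> byte[] serialized = tokenContainer.getTokens().get("kafka"); // 1. lookup by provider key
> KafkaDelegationToken token = KafkaTokenSerializer.deserialize(serialized); // 2. reverse of the provider side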
>
> Deserializing the tokens all the time can be time-consuming, so I tend to
> agree that maybe we can add listener logic. That way a connector can
> subscribe to specific tokens. When a token arrives it receives a callback
> where deserialization happens. The deserialized tokens can be stored, which
> would definitely be a speed-up.
>
> If we intend to add a listener mechanism in this area then my suggestions
> would be:
> * Define some sort of timeout for the listener execution. I know, normally
> this is just storing the value somewhere, but if that is somehow broken
> then it can kill whole clusters.
> * Some sort of retry logic would be good, but implementing it is not
> trivial.
>
> All in all the listener mechanism would be more brittle, but I see your
> point and it makes sense.
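>
> As a strawman, the listener itself could be as thin as this (name and
> signature are illustrative, nothing is decided):
>
> public interface DelegationTokenListener {
>     // Called by the framework when new tokens for the subscribed provider arrive.
>     // Implementations should only deserialize and store the token; heavy or
>     // blocking work must not happen here (hence the timeout suggestion above).
>     void onNewTokens(String providerName, byte[] serializedTokens);
> }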
> Please share your thoughts.
>
>
> > Thanks,
> >
> > [1]
> >
> >
> https://github.com/apache/flink/tree/0634d0cc0a401d268cc1b3788c8ee63a91ff33a6/flink-runtime/src/main/java/org/apache/flink/runtime/security/token
> > [2]
> >
> >
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/token/DelegationTokenManager.java
> > [3]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees
> > [4]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
> >
> > On Wed, Nov 9, 2022 at 4:46 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > wrote:
> >
> > > OK, removed the fallback part.
> > >
> > > Thanks for the help!
> > >
> > > G
> > >
> > >
> > > On Wed, Nov 9, 2022 at 5:03 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
> > > wrote:
> > >
> > > > looks better! no further concerns
> > > >
> > > > Cheers,
> > > > Matyas
> > > >
> > > > On Mon, Nov 7, 2022 at 9:21 AM Gabor Somogyi <
> > gabor.g.somo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Oh gosh, I copied the wrong config keys, so I fixed my last mail in green.
> > > > >
> > > > > On Mon, Nov 7, 2022 at 6:07 PM Gabor Somogyi <
> > > gabor.g.somo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Matyas,
> > > > > >
> > > > > > In the meantime I was thinking about the per provider re-obtain
> > > > > > feature and here are my thoughts related to that:
> > > > > > * I think it's a good feature in general but, as mentioned, I would
> > > > > > add it in a separate FLIP
> > > > > > * In case of Hadoop providers it just wouldn't work (HBase doesn't
> > > > > > have an end timestamp, so actually HDFS is triggering the re-obtain)
> > > > > > but for all non-hadoop providers it's a good idea
> > > > > > * Adding "security.delegation.tokens.renewal.retry.backoff" and
> > > > > > "security.delegation.tokens.renewal.time-ratio" is needed but, as
> > > > > > you mentioned, fallback to kerberos configs just doesn't make sense
> > > > > > * In a later FLIP we can add per provider
> > > > > > "security.delegation.token.provider.{providerName}.renewal.retry.backoff"
> > > > > > and/or
> > > > > > "security.delegation.token.provider.{providerName}.renewal.time-ratio"
> > > > > > for non-hadoop providers
> > > > > > * This is an additional feature which justifies separating Hadoop
> > > > > > and non-hadoop providers on the API level
> > > > > >
> > > > > > Waiting on your opinion.
> > > > > >
> > > > > > G
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 7, 2022 at 4:17 PM Gabor Somogyi <
> > > > gabor.g.somo...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Matyas,
> > > > > >>
> > > > > >> Thanks for your comments, answered inline.
> > > > > >>
> > > > > >> G
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Nov 7, 2022 at 2:58 PM Őrhidi Mátyás <
> > > matyas.orh...@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi Gabor,
> > > > > >>>
> > > > > >>> Thanks for driving this effort! A few thoughts on the topic:
> > > > > >>> - Could you please add a few examples of the delegation token
> > > > > >>> providers we expect to be added in the near future? Ideally these
> > > > > >>> providers could be configured independently from each other.
> > > > > >>> However the configuration defaults mentioned in the FLIP are
> > > > > >>> derived from hadoop configuration. I don't see the point here.
> > > > > >>>
> > > > > >> A clear plan is to add S3 now and Kafka possibly later on.
> > > > > >>
> > > > > >> S3 looks straightforward but it doesn't fit into the existing
> > > > > >> framework.
> > > > > >> On the Kafka side I've added the Kafka provider to Spark so I can
> > > > > >> imagine a similar solution w/ minor differences.
> > > > > >> Please see the Spark solution:
> > > > > >> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security
> > > > > >> Here the minor planned difference is that Spark handles the Kafka
> > > > > >> token inside UGI, which is not nice but works.
> > > > > >> I've just seen a similar generalization effort on the Spark side
> > > > > >> too, so the Kafka part may or may not change there.
> > > > > >>
> > > > > >> Not sure what you mean by configs being derived from Hadoop.
> > > > > >>
> > > > > >> What I can think of is that
> > > > > >> "security.delegation.tokens.renewal.retry.backoff" is defaulting to
> > > > > >> "security.kerberos.tokens.renewal.retry.backoff".
> > > > > >> This is basically happening for backward compatibility purposes.
> > > > > >>
> > > > > >> The other thing I can think of is that you're missing the
> > > > > >> independent provider token obtain functionality.
> > > > > >> When I say independent configuration I mean each provider has its
> > > > > >> own set of keys which don't collide, but the voting system for when
> > > > > >> token obtain must happen remains and is not touched in this FLIP.
> > > > > >> By voting system I mean each provider may send back its end
> > > > > >> timestamp and the lowest wins
> > > > > >> (at that timestamp all tokens are going to be re-obtained).
> > > > > >> If that's what you mean we can think about solutions, but it has
> > > > > >> nothing to do with framework generalization.
> > > > > >>
> > > > > >>
> > > > > >>> - Are we planning to support scenarios where we need to
> > > > > >>> read/write from different authentication realms in the same
> > > > > >>> application? Two Hadoop clusters, Kafka clusters etc? This would
> > > > > >>> need an authentication provider per source/sink.
> > > > > >>>
> > > > > >>>
> > > > > >> It doesn't need 2 different providers per source/sink to do Kafka
> > > > > >> to Kafka. In such cases cross-realm trust can be configured w/
> > > > > >> principal mapping.
> > > > > >> Please see the details here:
> > > > > >> https://gist.github.com/gaborgsomogyi/c636f352ccec7730ff41ac1d524cb87d
> > > > > >> Even though the gist was originally created for Spark, I tend to do
> > > > > >> the same here in the future.
> > > > > >>
> > > > > >> Just as an extract, here is what the Kafka principal mapping looks
> > > > > >> like:
> > > > > >> sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@DT2HOST.COMPANY.COM)s/@.*//,DEFAULT
> > > > > >>
> > > > > >> Providing 2 users in the Hadoop world is just impossible because
> > > > > >> UGI is basically a singleton.
> > > > > >> Of course everything can be hacked around (for ex. changing the
> > > > > >> current user on-the-fly), but that would be such a headache that I
> > > > > >> think we must avoid it. It would end up in synchronization and
> > > > > >> performance hell.
> > > > > >> I did some experiments in my Spark era and it would be the same
> > > > > >> here :)
> > > > > >>
> > > > > >>
> > > > > >>> Thanks,
> > > > > >>> Matyas
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Mon, Nov 7, 2022 at 5:10 AM Gabor Somogyi <
> > > > > gabor.g.somo...@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> > Hi team,
> > > > > >>> >
> > > > > >>> > Delegation token framework is going to be finished soon
> (added
> > in
> > > > > >>> FLIP-211
> > > > > >>> > <
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework?src=contextnavpagetreemode
> > > > > >>> > >
> > > > > >>> > ).
> > > > > >>> > Previously there were concerns that the current
> implementation
> > is
> > > > > >>> bound to
> > > > > >>> > Hadoop and Kerberos authentication. This is fair concern and
> > as a
> > > > > >>> result
> > > > > >>> > we've created a proposal to generalize the delegation token
> > > > framework
> > > > > >>> > (practically making it authentication agnostic).
> > > > > >>> >
> > > > > >>> > This can open the path to add further non-hadoop and
> > non-Kerberos
> > > > > based
> > > > > >>> > providers like S3 or many others.
> > > > > >>> >
> > > > > >>> > One can find the FLIP in:
> > > > > >>> > - Wiki:
> > > > > >>> >
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
> > > > > >>> > - document:
> > > > > >>> >
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/12tFdx1AZVuW9BjwBht_pMNELgrqro8Z5-hzWeaRY4pc/edit?usp=sharing
> > > > > >>> >
> > > > > >>> > I would like to start a discussion to make the framework
> > better.
> > > > > >>> >
> > > > > >>> > BR,
> > > > > >>> > G
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
>
