Hi Matyas,

In the meantime I was thinking about the per-provider re-obtain feature, and here are my thoughts on it:

* I think it's a good feature in general but, as mentioned, I would add it in a separate FLIP.
* In the case of Hadoop providers it just wouldn't work (HBase doesn't have an end timestamp, so actually HDFS is triggering the re-obtain), but for all non-Hadoop providers it's a good idea.
* Adding "security.delegation.tokens.renewal.retry.backoff" and "security.delegation.tokens.renewal.time-ratio" is needed but, as you mentioned, falling back to the Kerberos configs just doesn't make sense.
* In a later FLIP we can add a per-provider "security.kerberos.token.provider.{providerName}.renewal.retry.backoff" and/or "security.kerberos.token.provider.{providerName}.renewal.time-ratio".
* This is an additional feature which justifies separating Hadoop and non-Hadoop providers at the API level.
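[Editor's note: as an illustration of the keys discussed above, a per-provider override might look roughly like this in flink-conf.yaml. The provider name "s3" and the exact key names are assumptions for the sake of the example, not something the FLIP has finalized.]

```yaml
# Framework-wide renewal defaults (proposed in this FLIP)
security.delegation.tokens.renewal.retry.backoff: 1 h
security.delegation.tokens.renewal.time-ratio: 0.75

# Hypothetical per-provider overrides (subject of a later FLIP;
# the "s3" provider name and key names are illustrative only)
security.kerberos.token.provider.s3.renewal.retry.backoff: 10 min
security.kerberos.token.provider.s3.renewal.time-ratio: 0.5
```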
Waiting on your opinion.

G

On Mon, Nov 7, 2022 at 4:17 PM Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> Hi Matyas,
>
> Thanks for your comments, answered inline.
>
> G
>
>
> On Mon, Nov 7, 2022 at 2:58 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
> wrote:
>
>> Hi Gabor,
>>
>> Thanks for driving this effort! A few thoughts on the topic:
>> - Could you please add a few examples of the delegation token providers
>> we expect to be added in the near future? Ideally these providers could
>> be configured independently from each other. However, the configuration
>> defaults mentioned in the FLIP are derived from the Hadoop configuration.
>> I don't see the point here.
>>
> A clear plan is to add S3 now and possibly Kafka later on.
>
> S3 looks straightforward, but it doesn't fit into the existing framework.
> On the Kafka side, I've already added the Kafka provider to Spark, so I
> can imagine a similar solution w/ minor differences.
> Please see the Spark solution:
> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security
> The minor planned difference here is that Spark handles the Kafka token
> inside UGI, which is not nice but works.
> I've just seen a similar generalization effort on the Spark side too, so
> the Kafka part may or may not change there.
>
> I'm not sure what you mean by the configs being derived from Hadoop.
>
> What I can think of is that
> "security.delegation.tokens.renewal.retry.backoff" defaults to
> "security.kerberos.tokens.renewal.retry.backoff".
> This is basically happening for backward compatibility purposes.
>
> The other thing I can think of is that you may be missing the independent
> provider token obtain functionality.
> By independent configuration I mean that each provider has its own set of
> keys which don't collide, but the voting system that decides when token
> obtain must happen remains untouched in this FLIP.
> Under "voting system" I mean that each provider may send back its end
> timestamp and the lowest one wins
> (at that timestamp all tokens are going to be re-obtained).
> If that's what you mean, we can think about solutions, but it has nothing
> to do with framework generalization.
>
>
>> - Are we planning to support scenarios where we need to read/write
>> from different authentication realms in the same application? Two Hadoop
>> clusters, Kafka clusters, etc.? This would need an authentication
>> provider per source/sink.
>>
>>
> It doesn't need 2 different providers per source/sink to do Kafka to Kafka.
> In such cases cross-realm trust can be configured w/ principal mapping.
> Please see the details here:
> https://gist.github.com/gaborgsomogyi/c636f352ccec7730ff41ac1d524cb87d
> Even though the gist was originally created for Spark, I tend to do the
> same here in the future.
>
> Just to give an extract here, a Kafka principal mapping looks like this:
> sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@DT2HOST.COMPANY.COM)s/@.*//,DEFAULT
>
> Providing 2 users in the Hadoop world is just impossible because UGI is
> basically a singleton.
> Of course everything can be hacked around (for ex. changing the current
> user on-the-fly), but that would be such a headache that I think we must
> avoid it. It would end up in synchronization and performance hell.
> I made some experiments in my Spark era, and it would be the same here :)
>
>
>> Thanks,
>> Matyas
>>
>>
>>
>> On Mon, Nov 7, 2022 at 5:10 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> wrote:
>>
>> > Hi team,
>> >
>> > The delegation token framework is going to be finished soon (added in
>> > FLIP-211
>> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework?src=contextnavpagetreemode>
>> > ).
>> > Previously there were concerns that the current implementation is
>> > bound to Hadoop and Kerberos authentication.
>> > This is a fair concern, and as a result
>> > we've created a proposal to generalize the delegation token framework
>> > (practically making it authentication agnostic).
>> >
>> > This can open the path to adding further non-Hadoop and non-Kerberos
>> > based providers like S3 or many others.
>> >
>> > One can find the FLIP in:
>> > - Wiki:
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
>> > - document:
>> > https://docs.google.com/document/d/12tFdx1AZVuW9BjwBht_pMNELgrqro8Z5-hzWeaRY4pc/edit?usp=sharing
>> >
>> > I would like to start a discussion to make the framework better.
>> >
>> > BR,
>> > G
>> >
>>
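[Editor's note: to make the "voting system" described earlier in the thread concrete — each provider may report an end timestamp for its obtained tokens, the lowest reported timestamp wins, and all tokens are re-obtained at that time. A minimal sketch of that selection logic follows; the class and method names are illustrative, not the FLIP's actual API.]

```java
import java.util.List;
import java.util.Optional;

public class RenewalVoting {
    // Each provider may or may not report an end timestamp (epoch millis)
    // for the tokens it obtained; e.g. HBase reports nothing, so HDFS's
    // timestamp ends up driving the re-obtain in the Hadoop case.
    // The lowest reported timestamp wins; if no provider reports one,
    // there is nothing to schedule.
    static Optional<Long> nextReobtainTime(List<Optional<Long>> providerEndTimes) {
        return providerEndTimes.stream()
                .filter(Optional::isPresent)
                .map(Optional::get)
                .min(Long::compare); // the lowest end timestamp wins
    }
}
```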