Oh gosh, I copied the wrong config keys, so I fixed my last mail with the
corrections marked in green.
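For reference, the corrected key names would look something like this in
flink-conf.yaml (the values here are purely illustrative, and the
per-provider variants are only the form proposed below for a later FLIP,
not existing options):

security.delegation.tokens.renewal.retry.backoff: 1 h
security.delegation.tokens.renewal.time-ratio: 0.75
# hypothetical per-provider form from a later FLIP ("s3" is just an example):
security.delegation.token.provider.s3.renewal.retry.backoff: 1 h
security.delegation.token.provider.s3.renewal.time-ratio: 0.75

On Mon, Nov 7, 2022 at 6:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote: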
> Hi Matyas,
>
> In the meantime I was thinking about the per-provider re-obtain feature
> and here are my thoughts related to it:
> * I think it's a good feature in general, but as mentioned I would add it
> in a separate FLIP
> * In case of Hadoop providers it just wouldn't work (HBase doesn't have an
> end timestamp, so actually HDFS is triggering the re-obtain), but for all
> non-Hadoop providers it's a good idea
> * Adding "security.delegation.tokens.renewal.retry.backoff" and
> "security.delegation.tokens.renewal.time-ratio" is needed, but as you
> mentioned, falling back to the Kerberos configs just doesn't make sense
> * In a later FLIP we can add the per-provider
> "security.delegation.token.provider.{providerName}.renewal.retry.backoff"
> and/or "security.delegation.token.provider.{providerName}.renewal.time-ratio"
> for non-Hadoop providers
> * This is an additional feature which justifies separating Hadoop and
> non-Hadoop providers at the API level
>
> Waiting on your opinion.
>
> G
>
>
> On Mon, Nov 7, 2022 at 4:17 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
>> Hi Matyas,
>>
>> Thanks for your comments, answered inline.
>>
>> G
>>
>>
>> On Mon, Nov 7, 2022 at 2:58 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
>> wrote:
>>
>>> Hi Gabor,
>>>
>>> Thanks for driving this effort! A few thoughts on the topic:
>>> - Could you please add a few examples of the delegation token providers
>>> we expect to be added in the near future? Ideally these providers could
>>> be configured independently from each other. However, the configuration
>>> defaults mentioned in the FLIP are derived from the Hadoop
>>> configuration. I don't see the point here.
>>>
>> A clear plan is to add S3 now and Kafka possibly later on.
>>
>> S3 looks straightforward, but it doesn't fit into the existing framework.
>> On the Kafka side, I've added the Kafka provider to Spark, so I can
>> imagine a similar solution w/ minor differences.
>> Please see the Spark solution:
>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security
>> The minor planned difference is that Spark handles the Kafka token
>> inside UGI, which is not nice but works.
>> I've just seen a similar generalization effort on the Spark side too, so
>> the Kafka part may or may not change there.
>>
>> I'm not sure what you mean by configs being derived from Hadoop.
>>
>> What I can think of is that
>> "security.delegation.tokens.renewal.retry.backoff" defaults to
>> "security.kerberos.tokens.renewal.retry.backoff".
>> This is basically happening for backward compatibility purposes.
>>
>> The other thing I can think of is that you're missing the independent
>> provider token obtain functionality.
>> By independent configuration I mean that each provider has its own set
>> of keys which don't collide, but the voting system deciding when token
>> obtain must happen remains as-is and is not touched in this FLIP.
>> By voting system I mean that each provider may send back its end
>> timestamp and the lowest wins (at that timestamp all tokens are going to
>> be re-obtained).
>> If that's what you mean we can think about solutions, but it has nothing
>> to do with framework generalization.
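>> To make the voting concrete, a minimal sketch of the idea (names are
>> illustrative only, not the actual framework code):
>>
>>   import java.util.List;
>>   import java.util.Optional;
>>
>>   final class RenewalVoting {
>>       // Each provider votes with the end timestamp (millis) of the
>>       // tokens it obtained; empty means no opinion (e.g. HBase tokens
>>       // carry no end timestamp).
>>       static Optional<Long> nextReobtainTime(List<Optional<Long>> votes) {
>>           return votes.stream()
>>                   .filter(Optional::isPresent)
>>                   .map(Optional::get)
>>                   // lowest wins: all tokens are re-obtained at this time
>>                   .min(Long::compare);
>>       }
>>   }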
>>
>>
>>> - Are we planning to support scenarios where we need to read/write
>>> across different authentication realms from the same application (two
>>> Hadoop clusters, two Kafka clusters, etc.)? This would need an
>>> authentication provider per source/sink.
>>>
>>>
>> Kafka to Kafka doesn't need 2 different providers per source/sink.
>> In such cases cross-realm trust can be configured w/ principal mapping.
>> Please see the details here:
>> https://gist.github.com/gaborgsomogyi/c636f352ccec7730ff41ac1d524cb87d
>> Even if the gist was originally created for Spark, I intend to do the
>> same here in the future.
>>
>> Just to make an extract here, the Kafka principal mapping looks like this:
>> sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@DT2HOST.COMPANY.COM)s/@.*//,DEFAULT
>> (principals from the DT2HOST.COMPANY.COM realm are mapped to their short
>> name by cutting off the realm part; everything else falls back to DEFAULT)
>>
>> Providing 2 users in the Hadoop world is just impossible because UGI is
>> basically a singleton.
>> Of course everything can be hacked around (for ex. changing the current
>> user on-the-fly), but that would be such a headache that I think we must
>> avoid it. It would end up in synchronization and performance hell.
>> I've made some experiments in my Spark era and it would be the same
>> here :)
>>
>>
>>> Thanks,
>>> Matyas
>>>
>>>
>>>
>>> On Mon, Nov 7, 2022 at 5:10 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>> wrote:
>>>
>>> > Hi team,
>>> >
>>> > The delegation token framework is going to be finished soon (added in
>>> > FLIP-211
>>> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework?src=contextnavpagetreemode>).
>>> > Previously there were concerns that the current implementation is
>>> > bound to Hadoop and Kerberos authentication. This is a fair concern,
>>> > and as a result we've created a proposal to generalize the delegation
>>> > token framework (practically making it authentication agnostic).
>>> >
>>> > This can open the path to add further non-Hadoop and non-Kerberos
>>> > based providers like S3 or many others.
>>> >
>>> > One can find the FLIP in:
>>> > - Wiki:
>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
>>> > - document:
>>> > https://docs.google.com/document/d/12tFdx1AZVuW9BjwBht_pMNELgrqro8Z5-hzWeaRY4pc/edit?usp=sharing
>>> >
>>> > I would like to start a discussion to make the framework better.
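>>> >
>>> > To give a feeling for what "authentication agnostic" could mean, a
>>> > very rough sketch of a generalized provider interface (illustrative
>>> > names only; the real API is described in the FLIP):
>>> >
>>> >   import java.util.Optional;
>>> >
>>> >   // Hypothetical, simplified shape of a provider with no Hadoop or
>>> >   // Kerberos type in its signature.
>>> >   public interface DelegationTokenProvider {
>>> >       // Unique name, used e.g. in per-provider config keys.
>>> >       String serviceName();
>>> >       // Whether tokens are needed with the current configuration.
>>> >       boolean delegationTokensRequired() throws Exception;
>>> >       // Serialized tokens plus the provider's optional "vote" (end
>>> >       // timestamp in millis) for the next re-obtain time.
>>> >       ObtainedTokens obtainDelegationTokens() throws Exception;
>>> >
>>> >       final class ObtainedTokens {
>>> >           public final byte[] tokens;
>>> >           public final Optional<Long> validUntil;
>>> >
>>> >           public ObtainedTokens(byte[] tokens, Optional<Long> validUntil) {
>>> >               this.tokens = tokens;
>>> >               this.validUntil = validUntil;
>>> >           }
>>> >       }
>>> >   }
>>> >
>>> > BR,
>>> > G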