Hi Matyas,

In the meantime I was thinking about the per-provider re-obtain feature and
here are my thoughts on it:
* I think it's a good feature in general but, as mentioned, I would add it in
a separate FLIP
* In the case of Hadoop providers it just wouldn't work (HBase doesn't have an
end timestamp, so it's actually HDFS that triggers the re-obtain), but for all
non-Hadoop providers it's a good idea
* Adding "security.delegation.tokens.renewal.retry.backoff" and
"security.delegation.tokens.renewal.time-ratio" is needed but, as you
mentioned, falling back to the Kerberos configs just doesn't make sense
* In a later FLIP we can add per-provider
"security.kerberos.token.provider.{providerName}.renewal.retry.backoff"
and/or "security.kerberos.token.provider.{providerName}.renewal.time-ratio"
* This is an additional feature which justifies separating Hadoop and
non-Hadoop providers at the API level
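To illustrate the bullets above, a config could look something like the following sketch (key names follow the patterns mentioned above; the "s3" provider name and all values are illustrative only, not decided anywhere):

```yaml
# Framework-level defaults (keys from this FLIP, values illustrative)
security.delegation.tokens.renewal.retry.backoff: 1 min
security.delegation.tokens.renewal.time-ratio: 0.75

# Hypothetical per-provider overrides from a later FLIP
security.kerberos.token.provider.s3.renewal.retry.backoff: 30 s
security.kerberos.token.provider.s3.renewal.time-ratio: 0.5
```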

Looking forward to your opinion.

G


On Mon, Nov 7, 2022 at 4:17 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote:

> Hi Matyas,
>
> Thanks for your comments, answered inline.
>
> G
>
>
> On Mon, Nov 7, 2022 at 2:58 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
> wrote:
>
>> Hi Gabor,
>>
>> Thanks for driving this effort! A few thoughts on the topic:
>> - Could you please add a few examples of the delegation token providers we
>> expected to be added in the near future? Ideally these providers could be
>> configured independently from each other.  However the configuration
>> defaults mentioned in the FLIP are derived from hadoop configuration. I
>> don't see the point here.
>>
> A clear plan is to add S3 now and Kafka possibly later on.
>
> S3 looks straightforward but that doesn't fit into the existing framework.
> On the Kafka side, I've added the Kafka provider to Spark, so I can imagine
> a similar solution with minor differences.
> Please see the Spark solution:
> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security
> Here the minor planned difference is that Spark handles the Kafka token
> inside UGI, which is not nice but works.
> I've just seen a similar generalization effort on the Spark side too, so the
> Kafka part may or may not change there.
>
> Not sure what you mean by the configs being derived from Hadoop.
>
> What I can think of is that
> "security.delegation.tokens.renewal.retry.backoff" is defaulting to
> "security.kerberos.tokens.renewal.retry.backoff".
> This basically happens for backward compatibility purposes.
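The backward-compatible fallback described above could be sketched like this (illustrative only; the key names come from the FLIP, but the flat Map stands in for Flink's Configuration, and the "1 min" default is made up):

```java
import java.util.Map;

// Illustrative sketch: the new generic key wins if set, otherwise the legacy
// Kerberos key is used, otherwise a (made-up) framework default.
public class RenewalConfigFallback {

    static String renewalRetryBackoff(Map<String, String> conf) {
        // New generic key takes precedence when explicitly set.
        String v = conf.get("security.delegation.tokens.renewal.retry.backoff");
        if (v != null) {
            return v;
        }
        // Fall back to the legacy Kerberos key for backward compatibility.
        return conf.getOrDefault(
                "security.kerberos.tokens.renewal.retry.backoff", "1 min");
    }

    public static void main(String[] args) {
        // Only the legacy Kerberos key is set -> its value is picked up.
        System.out.println(renewalRetryBackoff(
                Map.of("security.kerberos.tokens.renewal.retry.backoff", "2 min")));
        // prints 2 min
    }
}
```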
>
> The other thing I can think of is that you might be missing the independent
> provider token obtain functionality.
> By independent configuration I mean that each provider has its own set of
> keys which don't collide, but the voting system that decides when token
> obtainment must happen remains untouched in this FLIP.
> By voting system I mean that each provider may send back its end timestamp
> and the lowest wins
> (at that timestamp all tokens are going to be re-obtained).
> If that's what you mean we can think about solutions, but it has nothing to
> do with framework generalization.
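The voting described above could be sketched as follows (illustrative only, not the actual Flink API; the class and method names are assumptions):

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch of the "voting": each provider may report an end
// timestamp for its tokens, and the framework re-obtains ALL tokens at the
// earliest (lowest) reported timestamp.
public class TokenRenewalVote {

    static Optional<Long> nextReobtainTime(List<Optional<Long>> providerEndTimes) {
        return providerEndTimes.stream()
                .filter(Optional::isPresent) // providers without an end time (e.g. HBase) don't vote
                .map(Optional::get)
                .min(Long::compare);         // the lowest end timestamp wins
    }

    public static void main(String[] args) {
        // HDFS reports 1000, Kafka reports 1500, HBase reports nothing.
        Optional<Long> next = nextReobtainTime(List.of(
                Optional.of(1000L), Optional.of(1500L), Optional.empty()));
        System.out.println(next.get()); // prints 1000
    }
}
```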
>
>
>> - Are we planning to support such scenarios where we need to read/write
>> from different authentication realms from the same application. Two Hadoop
>> clusters, Kafka clusters etc? This would need an authentication provider
>> per source/sink.
>>
>>
> It doesn't need two different providers per source/sink to do Kafka to Kafka.
> In such cases cross-realm trust can be configured with principal mapping.
> Please see the details here:
> https://gist.github.com/gaborgsomogyi/c636f352ccec7730ff41ac1d524cb87d
> Even though the gist was originally created for Spark, I intend to do the
> same here in the future.
>
> Just as an extract, the Kafka principal mapping looks like this:
> sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@DT2HOST.COMPANY.COM)s/@.*//,DEFAULT
>
> Providing two users in the Hadoop world is just impossible because UGI is
> basically a singleton.
> Of course everything can be hacked around (for example, changing the current
> user on-the-fly) but that would be such a headache that I think we must
> avoid it. It would end up in synchronization and performance hell.
> I made some experiments in my Spark era and it would be the same here
> :)
>
>
>> Thanks,
>> Matyas
>>
>>
>>
>> On Mon, Nov 7, 2022 at 5:10 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>> wrote:
>>
>> > Hi team,
>> >
>> > The delegation token framework is going to be finished soon (added in
>> FLIP-211
>> > <
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework?src=contextnavpagetreemode
>> > >
>> > ).
>> > Previously there were concerns that the current implementation is bound
>> to
>> > Hadoop and Kerberos authentication. This is a fair concern and, as a result,
>> > we've created a proposal to generalize the delegation token framework
>> > (practically making it authentication agnostic).
>> >
>> > This can open the path to add further non-Hadoop and non-Kerberos based
>> > providers like S3 or many others.
>> >
>> > One can find the FLIP in:
>> > - Wiki:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
>> > - document:
>> >
>> >
>> https://docs.google.com/document/d/12tFdx1AZVuW9BjwBht_pMNELgrqro8Z5-hzWeaRY4pc/edit?usp=sharing
>> >
>> > I would like to start a discussion to make the framework better.
>> >
>> > BR,
>> > G
>> >
>>
>
