Oh gosh, I copied the wrong config keys, so I fixed my last mail with the
corrections marked in green.
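For reference, the corrected key names would look something like this in
flink-conf.yaml (the values here are purely illustrative, and the
per-provider variants are only the form proposed below for a later FLIP,
not existing options):

security.delegation.tokens.renewal.retry.backoff: 1 h
security.delegation.tokens.renewal.time-ratio: 0.75
# hypothetical per-provider form from a later FLIP ("s3" is just an example):
security.delegation.token.provider.s3.renewal.retry.backoff: 1 h
security.delegation.token.provider.s3.renewal.time-ratio: 0.75

On Mon, Nov 7, 2022 at 6:07 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
wrote: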
> Hi Matyas,
>
> In the meantime I was thinking about the per-provider re-obtain feature
> and here are my thoughts related to it:
> * I think it's a good feature in general, but as mentioned I would add it
> in a separate FLIP
> * In case of Hadoop providers it just wouldn't work (HBase doesn't have an
> end timestamp, so actually HDFS is triggering the re-obtain), but for all
> non-Hadoop providers it's a good idea
> * Adding "security.delegation.tokens.renewal.retry.backoff" and
> "security.delegation.tokens.renewal.time-ratio" is needed, but as you
> mentioned, falling back to the Kerberos configs just doesn't make sense
> * In a later FLIP we can add the per-provider
> "security.delegation.token.provider.{providerName}.renewal.retry.backoff"
> and/or "security.delegation.token.provider.{providerName}.renewal.time-ratio"
> for non-Hadoop providers
> * This is an additional feature which justifies separating Hadoop and
> non-Hadoop providers at the API level
>
> Waiting on your opinion.
>
> G
>
>
> On Mon, Nov 7, 2022 at 4:17 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> wrote:
>
>> Hi Matyas,
>>
>> Thanks for your comments, answered inline.
>>
>> G
>>
>>
>> On Mon, Nov 7, 2022 at 2:58 PM Őrhidi Mátyás <matyas.orh...@gmail.com>
>> wrote:
>>
>>> Hi Gabor,
>>>
>>> Thanks for driving this effort! A few thoughts on the topic:
>>> - Could you please add a few examples of the delegation token providers
>>> we expect to be added in the near future? Ideally these providers could
>>> be configured independently from each other. However, the configuration
>>> defaults mentioned in the FLIP are derived from the Hadoop
>>> configuration. I don't see the point here.
>>>
>> A clear plan is to add S3 now and Kafka possibly later on.
>>
>> S3 looks straightforward, but it doesn't fit into the existing framework.
>> On the Kafka side, I've added the Kafka provider to Spark, so I can
>> imagine a similar solution w/ minor differences.
>> Please see the Spark solution:
>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#security
>> The minor planned difference is that Spark handles the Kafka token
>> inside UGI, which is not nice but works.
>> I've just seen a similar generalization effort on the Spark side too, so
>> the Kafka part may or may not change there.
>>
>> I'm not sure what you mean by configs being derived from Hadoop.
>>
>> What I can think of is that
>> "security.delegation.tokens.renewal.retry.backoff" defaults to
>> "security.kerberos.tokens.renewal.retry.backoff".
>> This is basically happening for backward compatibility purposes.
>>
>> The other thing I can think of is that you're missing the independent
>> provider token obtain functionality.
>> By independent configuration I mean that each provider has its own set
>> of keys which don't collide, but the voting system deciding when token
>> obtain must happen remains as-is and is not touched in this FLIP.
>> By voting system I mean that each provider may send back its end
>> timestamp and the lowest wins (at that timestamp all tokens are going to
>> be re-obtained).
>> If that's what you mean we can think about solutions, but it has nothing
>> to do with framework generalization.
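>> To make the voting concrete, a minimal sketch of the idea (names are
>> illustrative only, not the actual framework code):
>>
>>   import java.util.List;
>>   import java.util.Optional;
>>
>>   final class RenewalVoting {
>>       // Each provider votes with the end timestamp (millis) of the
>>       // tokens it obtained; empty means no opinion (e.g. HBase tokens
>>       // carry no end timestamp).
>>       static Optional<Long> nextReobtainTime(List<Optional<Long>> votes) {
>>           return votes.stream()
>>                   .filter(Optional::isPresent)
>>                   .map(Optional::get)
>>                   // lowest wins: all tokens are re-obtained at this time
>>                   .min(Long::compare);
>>       }
>>   }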
>>
>>
>>> - Are we planning to support scenarios where we need to read/write
>>> across different authentication realms from the same application (two
>>> Hadoop clusters, two Kafka clusters, etc.)? This would need an
>>> authentication provider per source/sink.
>>>
>>>
>> Kafka to Kafka doesn't need 2 different providers per source/sink.
>> In such cases cross-realm trust can be configured w/ principal mapping.
>> Please see the details here:
>> https://gist.github.com/gaborgsomogyi/c636f352ccec7730ff41ac1d524cb87d
>> Even if the gist was originally created for Spark, I intend to do the
>> same here in the future.
>>
>> Just to make an extract here, the Kafka principal mapping looks like this:
>> sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@DT2HOST.COMPANY.COM)s/@.*//,DEFAULT
>> (principals from the DT2HOST.COMPANY.COM realm are mapped to their short
>> name by cutting off the realm part; everything else falls back to DEFAULT)
>>
>> Providing 2 users in the Hadoop world is just impossible because UGI is
>> basically a singleton.
>> Of course everything can be hacked around (for ex. changing the current
>> user on-the-fly), but that would be such a headache that I think we must
>> avoid it. It would end up in synchronization and performance hell.
>> I've made some experiments in my Spark era and it would be the same
>> here :)
>>
>>
>>> Thanks,
>>> Matyas
>>>
>>>
>>>
>>> On Mon, Nov 7, 2022 at 5:10 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
>>> wrote:
>>>
>>> > Hi team,
>>> >
>>> > The delegation token framework is going to be finished soon (added in
>>> > FLIP-211
>>> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework?src=contextnavpagetreemode>).
>>> > Previously there were concerns that the current implementation is
>>> > bound to Hadoop and Kerberos authentication. This is a fair concern,
>>> > and as a result we've created a proposal to generalize the delegation
>>> > token framework (practically making it authentication agnostic).
>>> >
>>> > This can open the path to add further non-Hadoop and non-Kerberos
>>> > based providers like S3 or many others.
>>> >
>>> > One can find the FLIP in:
>>> > - Wiki:
>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-272%3A+Generalized+delegation+token+support
>>> > - document:
>>> > https://docs.google.com/document/d/12tFdx1AZVuW9BjwBht_pMNELgrqro8Z5-hzWeaRY4pc/edit?usp=sharing
>>> >
>>> > I would like to start a discussion to make the framework better.
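>>> >
>>> > To give a feeling for what "authentication agnostic" could mean, a
>>> > very rough sketch of a generalized provider interface (illustrative
>>> > names only; the real API is described in the FLIP):
>>> >
>>> >   import java.util.Optional;
>>> >
>>> >   // Hypothetical, simplified shape of a provider with no Hadoop or
>>> >   // Kerberos type in its signature.
>>> >   public interface DelegationTokenProvider {
>>> >       // Unique name, used e.g. in per-provider config keys.
>>> >       String serviceName();
>>> >       // Whether tokens are needed with the current configuration.
>>> >       boolean delegationTokensRequired() throws Exception;
>>> >       // Serialized tokens plus the provider's optional "vote" (end
>>> >       // timestamp in millis) for the next re-obtain time.
>>> >       ObtainedTokens obtainDelegationTokens() throws Exception;
>>> >
>>> >       final class ObtainedTokens {
>>> >           public final byte[] tokens;
>>> >           public final Optional<Long> validUntil;
>>> >
>>> >           public ObtainedTokens(byte[] tokens, Optional<Long> validUntil) {
>>> >               this.tokens = tokens;
>>> >               this.validUntil = validUntil;
>>> >           }
>>> >       }
>>> >   }
>>> >
>>> > BR,
>>> > G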