First and foremost, I'll echo the comments about deprecation. IMO, we must
deprecate these implementations in a release before completely removing
them in a subsequent release.

I agree that the ZK and Journal implementations should be deprecated for
the reasons stated.

Concerning replacing the alias-based implementation with Derby, I
share some of the same concerns expressed by Larry:
- Attila has mentioned that Derby supports data encryption, but do we
enable it by default? Should we require it always?
- The questions around copying Derby data remains unanswered, at least
partially if the migration utility proposal was intended to address this
topic.


On Mon, Nov 13, 2023 at 8:02 AM Sandor Molnar <smol...@cloudera.com.invalid>
wrote:

> Hello folks!
>
> I'm starting this thread because I am convinced we should remove the
> following TokenStateService implementations:
> - AliasBasedtokenStateService
> - ZookeeperTokenStateService
> - JournalBasedTokenStateService
>
> The reason behind this idea for the last two implementations in the above
> list is quite simple:
>
> 1. ZookeeperTokenStateService was our first approach to provide HA support
> for Knox Token Integration. However, our internal tests have shown that ZK
> is just simply not the right tool for that feature. Eventual consistency is
> only one part of this issue (we could make this work with re-tried ZK
> queries). Performance-wise ZK proved to be a wrong decision. In our test
> environment, where hundreds of tokens were generated in every minute, ZK
> was not enough to scale.
>
> 2. JournalBasedTokensSateService is
>   2.1 insecure (it stores plain data on the FS),
>   2.2 missing features (no impersonation or SSO Cookie support)
>
> In the case of the AliasBasedtokenStateService, the reason is not that
> simple. It's true, that keystore-related operations are expensive, but the
> background thread that actually persists the token state improved a lot in
> this respect. However, it's still slow compared to the supported databases
> we added for the JDBC implementation when it comes to token verification.
> In addition to that, the current implementation creates at least 3 aliases
> per token, which makes the __gateway really big in case of lots of tokens.
> Even worse, we try to read all tokens into memory from __gateway credential
> store in a background thread that also consumes memory, CPU which we could
> avoid.
> To be honest, I don't see any reason why could not we achieve the same
> functionality with a pre-configured Derby database that stores its data in
> a dedicated sub-folder within the KNOX_DATA_DIR. This would be the default
> choice, so users will still not need to configure everything for the
> KnoxToken service even if token state management is enabled.
>
> We could also write a small KNOX CLI command to migrate existing tokens
> from keystores to Derby upon upgrade.
>
> Advantages of the above:
> - only one implementation will be kept (JDBCTokenStateService) which is
> proven to be robust enough and can scale well
> - easier to maintain the product
> - easier to troubleshoot in PROD environments (Derby has very powerful
> tools to connect and run SQL queries)
> - eliminate background threads which make debugging hard,
> resource-consuming, and adds complexity
> - the non-desired side effects of reading lots of tokens into memory from
> __gateway credential store that may make the
>
> I'm curious about what you think of the above and I'd like to hear back
> from you with your suggestions and ideas.
>
> Cheers,
> Sandor
>

Reply via email to