gaborgsomogyi commented on code in PR #19372: URL: https://github.com/apache/flink/pull/19372#discussion_r849691193
##########
flink-runtime/src/main/java/org/apache/flink/runtime/security/token/KerberosDelegationTokenManager.java:
##########
@@ -110,13 +139,62 @@ public void obtainDelegationTokens(Credentials credentials) {
      * task managers.
      */
     @Override
-    public void start() {
-        LOG.info("Starting renewal task");
+    public void start() throws Exception {
+        checkNotNull(scheduledExecutor, "Scheduled executor must not be null");
+        checkNotNull(executorService, "Executor service must not be null");
+        checkState(tgtRenewalFuture == null, "Manager is already started");
+
+        if (!kerberosRenewalPossibleProvider.isRenewalPossible()) {
+            LOG.info("Renewal is NOT possible, skipping to start renewal task");
+            return;
+        }
+
+        startTGTRenewal();
+    }
+
+    private void startTGTRenewal() throws IOException {
+        LOG.debug("Starting credential renewal task");
+
+        UserGroupInformation currentUser = UserGroupInformation.getCurrentUser();
+        if (currentUser.isFromKeytab()) {
+            // In Hadoop 2.x, renewal of the keytab-based login seems to be automatic, but in Hadoop
+            // 3.x, it is configurable (see hadoop.kerberos.keytab.login.autorenewal.enabled, added
+            // in HADOOP-9567). This task will make sure that the user stays logged in regardless of
+            // that configuration's value. Note that checkTGTAndReloginFromKeytab() is a no-op if
+            // the TGT does not need to be renewed yet.
+            long tgtRenewalPeriod = configuration.get(KERBEROS_RELOGIN_PERIOD).toMillis();
+            tgtRenewalFuture =
+                    scheduledExecutor.scheduleAtFixedRate(
+                            () ->
+                                    executorService.execute(
+                                            () -> {
+                                                try {
+                                                    LOG.debug("Renewing TGT");
+                                                    currentUser.checkTGTAndReloginFromKeytab();

Review Comment:
I've had a deeper look; here is a summary of my findings:
* No mocking framework is allowed.
* The `UserGroupInformation` class has no public constructor, so an instance can only be created through reflection, which I'm pretty sure won't initialize it properly. As a result, every place where `UserGroupInformation` is used needs to be hacked around.
* If we go in the `KerberosClient` direction, we would see functions like `KerberosClient.hasCurrentUserKerberosCredentials()` instead of `UserGroupInformation.getCurrentUser().hasKerberosCredentials()`, which is hacky but doable.
* But as soon as the condition gets more complicated, for example `Option(currentUser.getRealUser()).getOrElse(currentUser).hasKerberosCredentials()`, how should the corresponding `KerberosClient` function be named?

In Spark and other components I've tried to mock, reimplement, modify, or otherwise make `UserGroupInformation` testable, without any success. I think we have the same situation here unless you have a clear, doable suggestion.

Realistically, I think we have the following possibilities for this case:
* Mock the static function `UserGroupInformation.getCurrentUser()` and return a mocked `UserGroupInformation` instance. The PowerMock runner simply does not work with JUnit 5, and the Mockito version is too old to mock static functions, so only a Mockito version upgrade could be a potential solution here.
* Use reflection to call the hidden `UserGroupInformation` constructor. Here I have no idea what will happen: how well initialized the instance will be, and how to modify its behavior so that it returns something hardcoded.
* Don't write automated tests for places where `UserGroupInformation` is embedded.
* Introduce `KerberosClient` and create functions like `currentUserRealUserOrElseCurrentUserHasKerberosCredentials()` from expressions like `Option(currentUser.getRealUser()).getOrElse(currentUser).hasKerberosCredentials()`.

None of these proposals looks good, but I've not found the holy grail in Flink any more than in other places. In Spark, bullet point 3 has been implemented, which is definitely debatable. Bullet point 1 would in fact be possible there with PowerMock, but nobody has ever done it.
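
To make the `KerberosClient` option concrete, here is a minimal sketch of one possible shape, not a definitive design. It assumes a hypothetical `KerberosUser` view and a hypothetical `KerberosClient` seam (neither exists in Flink or Hadoop). The production implementation, omitted here, would presumably delegate to `UserGroupInformation`. The compound condition is written once against the seam and exercised with a hand-written fake, so no mocking framework is needed.

```java
import java.util.Optional;

/** Sketch only: every type in this file is hypothetical and exists in neither Flink nor Hadoop. */
final class KerberosClientSketch {

    /** Narrow view of the few UserGroupInformation methods the caller needs. */
    interface KerberosUser {
        boolean hasKerberosCredentials();

        /** Empty unless this is a proxy user; stands in for the null check on getRealUser(). */
        Optional<KerberosUser> realUser();
    }

    /** Seam that would replace the static UserGroupInformation.getCurrentUser() call. */
    interface KerberosClient {
        KerberosUser currentUser();
    }

    /** Hand-written test double; no mocking framework involved. */
    static final class FakeUser implements KerberosUser {
        private final boolean hasCredentials;
        private final KerberosUser realUser; // null when this user is not a proxy user

        FakeUser(boolean hasCredentials, KerberosUser realUser) {
            this.hasCredentials = hasCredentials;
            this.realUser = realUser;
        }

        @Override
        public boolean hasKerberosCredentials() {
            return hasCredentials;
        }

        @Override
        public Optional<KerberosUser> realUser() {
            return Optional.ofNullable(realUser);
        }
    }

    /** The compound condition from the comment, expressed against the seam. */
    static boolean effectiveUserHasKerberosCredentials(KerberosClient client) {
        KerberosUser current = client.currentUser();
        return current.realUser().orElse(current).hasKerberosCredentials();
    }

    public static void main(String[] args) {
        // A proxy user without credentials whose real user has them.
        KerberosUser real = new FakeUser(true, null);
        KerberosUser proxy = new FakeUser(false, real);
        KerberosClient client = () -> proxy;
        System.out.println(effectiveUserHasKerberosCredentials(client));
    }
}
```

In this shape the seam mirrors the handful of `UserGroupInformation` methods the caller needs instead of growing one method per expression. Whether that extra layer is acceptable remains a judgment call.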

There is a reason why [such](https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/ugi.html) writings are created: `If there is one class guaranteed to strike fear into anyone with experience in Hadoop+Kerberos code it is UserGroupInformation, abbreviated to "UGI"`

Let's hear your opinion.