[ https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398208#comment-17398208 ]
Daniel Osvath edited comment on HDFS-16165 at 8/12/21, 5:57 PM:
----------------------------------------------------------------

This request is on behalf of [Confluent, Inc|http://confluent.io].

was (Author: dosvath):
This request is on behalf [Confluent, Inc|confluent.io].

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> ------------------------------------------------------------------
>
>                 Key: HDFS-16165
>                 URL: https://issues.apache.org/jira/browse/HDFS-16165
>             Project: Hadoop HDFS
>          Issue Type: Wish
>         Environment: Can be reproduced in a Docker HDFS environment with
> Kerberos:
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>            Reporter: Daniel Osvath
>            Priority: Major
>
> *Problem Description*
> For more than a year, Apache Kafka Connect users have been running into a
> Kerberos renewal issue that causes our HDFS2 connectors to fail.
> We have been able to consistently reproduce the issue under high load with 40
> connectors (threads) that use the library. When we try an alternate
> workaround that uses the Kerberos keytab on the system, the connector operates
> without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x
> library that causes the ticket renewal to fail with the error below:
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached this conclusion about the root cause once we ran the same
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which
> operated without renewal issues. Additionally, finding that the
> synchronization issue has been fixed in the newer Hadoop 3.x releases
> confirmed our hypothesis about the root cause.
> There are many changes in HDFS 3
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
> related to UGI synchronization, which were done as part of
> https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest
> that race conditions were happening with the older version, i.e. HDFS 2.x,
> which would explain why we can reproduce the problem with HDFS2.
> For example (among others):
> {code:java}
> private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>     throws IOException {
>   // ensure the relogin is atomic to avoid leaving credentials in an
>   // inconsistent state. prevents other ugi instances, SASL, and SPNEGO
>   // from accessing or altering credentials during the relogin.
>   synchronized(login.getSubjectLock()) {
>     // another racing thread may have beat us to the relogin.
>     if (login == getLogin()) {
>       unprotectedRelogin(login, ignoreLastLoginTime);
>     }
>   }
> }
> {code}
> None of these changes were backported to Hadoop 2.x (our HDFS2 connector
> uses 2.10.1), on which several CDH distributions are based.
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop
> 2.x so that our users can operate without issues.
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in
> production by our community. Currently, the issue causes our HDFS connector
> to fail, as it is unable to recover and renew the ticket at a later point.
> Having the backported fix would allow our users to operate without issues
> that require manual intervention every week (or every few days in some
> cases). The only workaround available to the community for the issue is to
> run a command or restart their workers.
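
The check-under-lock pattern in the quoted relogin method can be illustrated without any Hadoop dependency. The sketch below is not Hadoop code: the names ReloginSketch, Login, subjectLock and raceRelogins are hypothetical stand-ins, and replacing a Login object stands in for a real Kerberos relogin. It only shows why comparing the observed login with the current one inside a shared lock lets 40 racing threads trigger a single relogin, assuming they all observed the same stale login.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReloginSketch {
    /** Hypothetical stand-in for a login context; not a Hadoop class. */
    static final class Login {
        final int generation;
        Login(int generation) { this.generation = generation; }
    }

    // One lock shared by every caller, analogous in spirit to getSubjectLock().
    private final Object subjectLock = new Object();
    private volatile Login current = new Login(0);
    private final AtomicInteger reloginCount = new AtomicInteger();

    Login getLogin() { return current; }

    int getReloginCount() { return reloginCount.get(); }

    /** Replaces the login only if no other thread has already replaced it. */
    void relogin(Login observed) {
        synchronized (subjectLock) {
            // Another racing thread may have beaten us to the relogin.
            if (observed == current) {
                current = new Login(observed.generation + 1);
                reloginCount.incrementAndGet();
            }
        }
    }

    /** Starts the given number of threads that all try to renew the same stale login. */
    static int raceRelogins(int threads) {
        ReloginSketch ugi = new ReloginSketch();
        Login stale = ugi.getLogin();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at once to provoke the race
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                ugi.relogin(stale);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return ugi.getReloginCount();
    }

    public static void main(String[] args) {
        System.out.println("relogins performed: " + raceRelogins(40));
    }
}
```

Without the identity check inside the synchronized block, each of the 40 threads would perform its own relogin in turn; without the lock, two threads could replace the credentials at the same time. Which of these corresponds to the failure seen with Hadoop 2.x is not established by this sketch.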