Hello Lars,

I can't say I've personally run HDFS on Kubernetes with Kerberos enabled. However, some of the issues you raise seem to overlap with the HDFS multi-homing features:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html

Have you seen this? Does anything look helpful there?

Chris Nauroth

On Fri, Jun 24, 2022 at 4:55 AM Lars Francke <lars.fran...@gmail.com> wrote:

> Hi everyone,
>
> we're trying to get HDFS running in Kubernetes using Kerberos.
> This has some challenges as you might expect.
> We have created an issue for that including a spike:
> https://issues.apache.org/jira/browse/HDFS-16577
>
> Currently (as of 3.2.2, but reading through the release notes this doesn't
> seem to have changed since then) DataNodes use the same properties for
> deciding which port to bind each service to, as for deciding which ports
> are included in the `DatanodeRegistration` sent to the NameNode. Further,
> NameNodes overwrite the DataNode's IP address with the incoming address
> during registration.
>
> Both of these prevent external users from connecting to DataNodes that are
> hosted behind some sort of NAT (such as Kubernetes).
>
> We'd go ahead with a proper implementation/PR but we thought about asking
> for comments/feedback first. Maybe someone else has already done some work
> here that we might have missed etc.
>
> Thank you!
>
> Cheers,
> Lars
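To make the two problems Lars describes concrete, here is a toy model in Python. Everything in it is hypothetical: `Endpoint`, `DataNodeAddresses`, the `advertised` field and the `keep_reported_host` switch are illustrative names, not Hadoop classes, APIs or configuration properties, and the IP addresses and ports are made up. It only models the two behaviours from the thread (one address used for both binding and registration, and the NameNode substituting the connection's source IP) to show why both have to change before a client outside the NAT receives a reachable address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical host/port pair standing in for the address fields of a registration."""
    host: str
    port: int


@dataclass(frozen=True)
class DataNodeAddresses:
    # Address the DataNode process binds to inside the pod.
    bind: Endpoint
    # Hypothetical separate address for external clients (e.g. behind NAT).
    # None models the current behaviour: advertise exactly what you bind.
    advertised: Optional[Endpoint] = None


def reported_endpoint(dn: DataNodeAddresses) -> Endpoint:
    """Endpoint the DataNode puts into its (modelled) registration."""
    return dn.advertised if dn.advertised is not None else dn.bind


def endpoint_given_to_clients(reported: Endpoint, source_ip: str,
                              keep_reported_host: bool) -> Endpoint:
    """Endpoint the (modelled) NameNode hands to clients for this DataNode.

    With keep_reported_host=False the reported host is replaced by the
    source IP of the registration connection, as described in the thread.
    """
    host = reported.host if keep_reported_host else source_ip
    return Endpoint(host, reported.port)


if __name__ == "__main__":
    POD_IP = "10.244.1.7"                       # made-up pod-internal address
    EXTERNAL = Endpoint("203.0.113.10", 31866)  # made-up externally reachable address

    # 1. Current behaviour: one address for bind and registration, source IP substituted.
    today = DataNodeAddresses(bind=Endpoint(POD_IP, 9866))
    print(endpoint_given_to_clients(reported_endpoint(today), POD_IP, keep_reported_host=False))
    # -> Endpoint(host='10.244.1.7', port=9866)

    # 2. Separate advertised address, but the NameNode still substitutes the source IP.
    split = DataNodeAddresses(bind=Endpoint(POD_IP, 9866), advertised=EXTERNAL)
    print(endpoint_given_to_clients(reported_endpoint(split), POD_IP, keep_reported_host=False))
    # -> Endpoint(host='10.244.1.7', port=31866)

    # 3. Both changes: clients receive the externally reachable address.
    print(endpoint_given_to_clients(reported_endpoint(split), POD_IP, keep_reported_host=True))
    # -> Endpoint(host='203.0.113.10', port=31866)
```

Case 2 is the interesting one in this model: decoupling the advertised port from the bind port alone still yields an address that external clients cannot reach, as long as the NameNode keeps replacing the reported IP.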