Hi everyone,

we're trying to get HDFS running in Kubernetes with Kerberos. As you might expect, this comes with some challenges. We have opened an issue for this, including a spike: https://issues.apache.org/jira/browse/HDFS-16577
Currently, DataNodes use the same properties to decide which port each service binds to and which ports are included in the `DatanodeRegistration` sent to the NameNode. (This is as of 3.2.2; judging by the release notes, it does not seem to have changed since then.) In addition, NameNodes overwrite the DataNode's IP address with the incoming address during registration. Both behaviors prevent external clients from connecting to DataNodes that are hosted behind some sort of NAT (such as in Kubernetes).

We would go ahead with a proper implementation and PR, but we wanted to ask for comments and feedback first. Maybe someone else has already done some work here that we have missed.

Thank you!

Cheers,
Lars
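
P.S. In case it helps the discussion, here is a toy model of the two behaviors described above. It is not Hadoop code and only a sketch: the class and method names (`NatRegistrationSketch`, `dataNodeRegistration`, `nameNodeAccept`) are hypothetical, and all addresses and ports are made up for illustration.

```java
import java.util.Objects;

public class NatRegistrationSketch {

    /** Simplified, hypothetical stand-in for what a DataNode reports about itself. */
    static final class Registration {
        final String ipAddr;
        final int xferPort;

        Registration(String ipAddr, int xferPort) {
            this.ipAddr = Objects.requireNonNull(ipAddr);
            this.xferPort = xferPort;
        }

        String endpoint() {
            return ipAddr + ":" + xferPort;
        }
    }

    /** DataNode side: the port it binds to is also the port it reports. */
    static Registration dataNodeRegistration(String localIp, int bindPort) {
        return new Registration(localIp, bindPort);
    }

    /** NameNode side: the reported IP is replaced by the address the connection came from. */
    static Registration nameNodeAccept(Registration reported, String remoteAddr) {
        return new Registration(remoteAddr, reported.xferPort);
    }

    public static void main(String[] args) {
        // Made-up values: a pod-internal address and the endpoint an
        // external client would actually have to use to get through the NAT.
        String podIp = "10.42.0.17";
        int bindPort = 9866;
        String externalEndpoint = "203.0.113.10:31866";

        Registration sent = dataNodeRegistration(podIp, bindPort);
        // The NameNode sees the connection arriving from the pod IP.
        Registration stored = nameNodeAccept(sent, podIp);

        System.out.println("handed out by NameNode: " + stored.endpoint());
        System.out.println("reachable from outside: " + externalEndpoint);
    }
}
```

In this model the NameNode can only ever hand out the pod-internal IP and the bind port, so a client outside the cluster never learns the externally reachable endpoint, whatever the DataNode is configured with.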