I'd like to propose dropping support for Hadoop 2 in Druid 28. That would not be the very next release (which I assume will be Druid 27) but the one after it, likely in the late 2023 timeframe.
In 2021, we had a discussion about moving away from Hadoop 2: https://lists.apache.org/thread/zmc389trnkh6x444so8mdb2h0x0noqq4. For various reasons, it didn't seem like the right time then. However, I believe now is the right time:

1) We didn't support Hadoop 3 in 2021, but we support it now. There is a Hadoop 3 build profile, as well as convenience binaries on https://druid.apache.org/downloads.html.

2) We now have SQL-based ingestion with MSQ tasks, which provides a built-in, scalable, and robust alternative to using Hadoop at all.

3) Two more years have passed. Hadoop 2 is that much older, more time has passed since it was superseded by Hadoop 3, and people have had that much more time to migrate.

4) The original main reason for wanting to move away from Hadoop 2 is still relevant. It keeps us on various old dependencies, including an ancient version of Guava, which in turn has kept us on an ancient version of Calcite. The Calcite community has graciously decided to support this old version of Guava for at least one more release, but plans to drop that support by Calcite 1.36, which would leave us back in the same position. Managing this situation is time-consuming for both Druid and Calcite maintainers.

5) Other solutions besides dropping Hadoop 2 support were proposed in 2021, such as reworking Hadoop support to be purely extension-based and reworking extensions to be more isolated from each other. However, both are substantially more complex than dropping support, and in the two years since the original thread, neither has been implemented.

So, I think we need to move forward with the simpler solution of dropping support.

Gian