Hi all, I've written up a status report for the current state of Hadoop 3 on the wiki. I've also pasted it below for your convenience.
Cheers,
Andrew

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-08-25

Another month flew by without an update. This is a big one.

Red flags:

- 11 blockers still on the dashboard, with some filed recently. Need to burn these down.
- There are many branch merge proposals flying around for features that were not originally being tracked for beta1 and GA. Introducing new code always comes with risk, so I'm working with the different contributors involved to discuss target versions, confirm readiness, and define quality bars for merge.

Miscellaneous blockers:

- HADOOP-14284 <https://issues.apache.org/jira/browse/HADOOP-14284> (Shade Guava everywhere): We have agreement to shade the yarn client JAR. Shading hadoop-hdfs is still being discussed.
- HADOOP-13363 <https://issues.apache.org/jira/browse/HADOOP-13363> (Upgrade to protobuf 3): Waiting on the Guava shading first.
- YARN-7076 <https://issues.apache.org/jira/browse/YARN-7076>: New blocker, we need an assignee.
- YARN-7094 <https://issues.apache.org/jira/browse/YARN-7094> (Document that server-side graceful decom is currently not recommended): Robert has a patch up, needs review. This is a stopgap for the old blocker YARN-5464.
- YARN-5536 <https://issues.apache.org/jira/browse/YARN-5536> (Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout): Robert has a proposal that needs to be pushed on.

beta1 features:

- Erasure coding
  - There are three must-dos. Two have patches, one might not be a must-do.
  - I pinged the pluggable policy JIRA to see if metadata and API compatibility is complete.
- Addressing incompatible changes (YARN-6142 and HDFS-11096)
  - Sean has HDFS rolling upgrade scripts up, waiting on Ray to add some YARN/MR coverage too.
  - Need to do a final runthrough of the JACC reports for YARN and HDFS.
- Classpath isolation (HADOOP-11656)
  - We're down to the wire on this, I pinged Sean for an update.
- Compat guide (HADOOP-13714 <https://issues.apache.org/jira/browse/HADOOP-13714>)
  - I pinged the JIRA on this too, no updated patch since May.

Features under discussion:

I discussed these features, which were previously not on my radar, with a number of lead contributors.

3.0.0-beta1:

- YARN native services (Jian He)
  - I was convinced that this is very separate from the core. I'll get someone from Cloudera to run it through our integration tests to verify it doesn't break anything downstream, then happy to merge.
- TSv2 alpha 2 (Vrushali C)
  - Despite being called "alpha 2", this is more like "beta" in terms of readiness. Twitter is planning to roll it out to production. Seems quite done.
  - I double checked with Haibo, and he successfully ran it through our internal integration testing.

3.0.0 GA:

- Resource profiles (Wangda Tan)
  - Alpha feature, APIs are not stable yet. Has some compatible PB changes, will verify rolling upgrade from branch-2. Touches some core parts of YARN.
  - Decided that it's too close to beta1 for this, we're going to test it a lot and make sure it's ready for 3.0.0 GA.
- HDFS router-based federation (Chris Douglas)
  - This is like YARN federation: very separate, doesn't add new APIs, and runs in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine putting this in for GA.

3.1.0:

- Storage Policy Satisfier (Uma Gangumalla)
  - We're resolving some design discussions on JIRA. Plan is to do some MVP work on the API to get this into 3.1, and if we're happy with the second phase, consider for 3.0 GA.
- HDFS tiered storage (Chris Douglas)
  - This touches some core stuff, and the write path is still being worked on. Still somewhat useful with just the read path. Targeting 3.1.0 gives enough time to wrap this up.