Thanks Sean for the insight and detailed analysis! I think it makes sense to revert HBASE-18118; maintaining backward compatibility there is not as trivial as it looks. Keeping "NONE" as the default storage policy does no harm, and additional documentation would help avoid confusion (since "NONE" is not a supported HDFS HSM option).
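To make the behavior under discussion concrete, here is a rough sketch of the pattern (this is not the actual FSUtils/CommonFSUtils code and the class/method names are made up for illustration; it assumes an HDFS DistributedFileSystem, whose setStoragePolicy(Path, String) is available in Hadoop 2.6+):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class WalStoragePolicySketch {

  // Config key and defaults as described in the thread below.
  static final String WAL_STORAGE_POLICY = "hbase.wal.storage.policy";
  static final String DEFAULT_1_X = "NONE"; // HBase 1.1 - 1.4
  static final String DEFAULT_2_0 = "HOT";  // HBase 2 and 1.5+ after HBASE-18118

  // Mirrors the early-return check: if the configured policy equals the
  // default we skip the HDFS call entirely, so the WAL directory simply
  // inherits whatever policy its parent directory already carries.
  static void setWalStoragePolicy(FileSystem fs, Path walDir, Configuration conf,
      String defaultPolicy) throws IOException {
    String policy = conf.get(WAL_STORAGE_POLICY, defaultPolicy).toUpperCase();
    if (policy.equals(defaultPolicy)) {
      // With default "NONE" this avoids passing an invalid policy to HDFS;
      // with default "HOT" it also means HOT is never explicitly applied.
      return;
    }
    if (fs instanceof DistributedFileSystem) {
      ((DistributedFileSystem) fs).setStoragePolicy(walDir, policy);
    }
  }
}

With the default now "HOT", the comparison short-circuits and the WAL directory just keeps whatever policy the parent (e.g. "/hbase") already has, which is exactly the surprise Sean describes below.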
On Mon, May 14, 2018 at 9:38 PM Yu Li <car...@gmail.com> wrote:

> Thanks for pointing this out, Sean. IMHO, after re-checking the code, HBASE-18118 needs an addendum (at least). The proposal was to set the storage policy of the WAL directory to HOT by default, but the current implementation does not achieve this: it follows the old "NONE" logic and skips calling the API when the policy matches the default, whereas for "HOT" we need an explicit call to HDFS.
>
> Furthermore, I think the old logic of leaving the default as "NONE" is even better: if the admin sets hbase.root.dir to some policy like ALL_SSD the WAL will simply follow, and if not the policy is HOT by default. So maybe reverting HBASE-18118 is the better choice, although I could see my own +1 on HBASE-18118 there?... @Andrew, what's your opinion here?
>
> And btw, I have opened HBASE-20479 for documenting the whole HSM solution in HBase, including HFile/WAL/Bulkload etc. (but I still haven't got enough time to complete it). JFYI.
>
> Best Regards,
> Yu
>
> On 15 May 2018 at 05:14, Sean Busbey <bus...@apache.org> wrote:
>
>> Hi folks!
>>
>> I'm trying to reason through our "set a storage policy for WALs" feature and having some difficulty. I want to get some feedback before I fix our docs or submit a patch to change behavior.
>>
>> Here's the history of the feature as I understand it:
>>
>> 1) Starting in HBase 1.1 you can change the setting "hbase.wal.storage.policy", and if the underlying Hadoop installation supports storage policies[1] then we'll call the needed APIs to set policies as we create WALs.
>>
>> The main use case is to tell HDFS that you want the HBase WAL on SSDs in a mixed hardware deployment.
>>
>> 2) In HBase 1.1 - 1.4, the above setting defaulted to the value "NONE". Our utility code for setting storage policies expressly checks any config value against the default, and when it matches opts to log a message rather than call the actual Hadoop API[2]. This is important since "NONE" isn't actually a valid storage policy, so if we passed it to the Hadoop API we'd get a bunch of log noise.
>>
>> 3) In HBase 2 and 1.5+, the setting defaults to "HOT" as of HBASE-18118. Now if we were to pass the value to the Hadoop API we wouldn't get log noise. The utility code does the same check against our default. The Hadoop default storage policy is "HOT", so presumably we save an RPC call by not setting it again.
>>
>> ----
>>
>> If the above is correct, how do I specify that I want WALs to have a storage policy of HOT in the event that HDFS already has some other policy in place for a parent directory?
>>
>> e.g. In HBase 1.1 - 1.4, I can set the storage policy (via Hadoop admin tools) for "/hbase" to be COLD and I can change "hbase.wal.storage.policy" to HOT. In HBase 2 and 1.5+, AFAICT my WALs will still have the COLD policy.
>>
>> Related, but different problem: I can use Hadoop admin tools to set the storage policy for "/hbase" to be "ALL_SSD", and if I leave HBase configs on defaults then I end up with WALs having "ALL_SSD" as their policy in all versions. But in HBase 2 and 1.5+ the HBase configs claim the policy is HOT.
>>
>> Should we always set the policy if the API is available? To avoid having to double-configure in something like the second case, do we still need a way to say "please do not expressly set a storage policy"?
>> (as an alternative we could just call out "be sure to update your WAL config" in docs)
>>
>> [1]: "Storage Policy" gets called several things in Hadoop, like Archival Storage, Heterogeneous Storage, HSM, and "Hierarchical Storage". In all cases I'm talking about the feature documented here:
>>
>> http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
>> http://hadoop.apache.org/docs/r3.0.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
>>
>> I think it's available in Hadoop 2.6.0+, 3.0.0+.
>>
>> [2]:
>>
>> In rel/1.2.0 you can see the default check by tracing starting at FSHLog:
>>
>> https://s.apache.org/BqAk
>>
>> The constants referred to in that code are in HConstants:
>>
>> https://s.apache.org/OJyR
>>
>> And in FSUtils we exit the function early when the default matches what we pull out of configs:
>>
>> https://s.apache.org/A4GA
>>
>> In rel/2.0.0 the code works essentially the same but has moved around. The starting point is now AbstractFSWAL:
>>
>> https://s.apache.org/pp6T
>>
>> The constants now use HOT instead of NONE as a default:
>>
>> https://s.apache.org/7K2J
>>
>> and in CommonFSUtils we do the same early return:
>>
>> https://s.apache.org/fYKr

--
A very happy Clouderan