Thanks for pointing this out, Sean. IMHO, after re-checking the code, HBASE-18118 needs an addendum (at least). The proposal was to set the storage policy of the WAL directory to HOT by default, but the current implementation cannot achieve this: it keeps the old "NONE" logic of skipping the API call when the configured policy matches the default, whereas for "HOT" we need an explicit call to HDFS.
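
To make the problem concrete, here is a minimal sketch of the early-return check as I read it. The class and method names are hypothetical and this is not the actual FSUtils/CommonFSUtils code; the trimmed, case-insensitive comparison is also an assumption. It only models the decision "do we call HDFS or not".

```java
public class WalStoragePolicySketch {

    /**
     * Hypothetical model of the early return: the HDFS API is only called
     * when the configured policy differs from HBase's own default.
     */
    public static boolean callsHdfs(String configuredPolicy, String defaultPolicy) {
        // Assumption: comparison ignores case and surrounding whitespace.
        return !configuredPolicy.trim().equalsIgnoreCase(defaultPolicy);
    }

    public static void main(String[] args) {
        // Default "NONE" (1.1 - 1.4): asking for HOT triggers an explicit call.
        System.out.println(callsHdfs("HOT", "NONE"));  // true

        // Default "HOT" (2.0 and 1.5+): asking for HOT is skipped, so a policy
        // inherited from a parent directory (e.g. COLD) stays in effect.
        System.out.println(callsHdfs("HOT", "HOT"));   // false

        // Default "NONE" left untouched: no call, the WAL follows the parent.
        System.out.println(callsHdfs("NONE", "NONE")); // false
    }
}
```

The second case is the one that breaks the intent of HBASE-18118: the config says HOT, but nothing is ever sent to HDFS.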
Furthermore, I think the old logic of leaving the default as "NONE" is even better: if the admin sets the hbase.rootdir directory to some policy like ALL_SSD, the WAL will simply follow, and if not, the policy is HOT by default. So maybe reverting HBASE-18118 is the better choice, although I can see my own +1 on HBASE-18118 there... @Andrew, what's your opinion here?

And btw, I have opened HBASE-20479 for documenting the whole HSM solution in HBase, including HFile/WAL/Bulkload etc. (but still haven't had enough time to complete it), JFYI.

Best Regards,
Yu

On 15 May 2018 at 05:14, Sean Busbey <bus...@apache.org> wrote:

> Hi folks!
>
> I'm trying to reason through our "set a storage policy for WALs"
> feature and having some difficulty. I want to get some feedback before
> I fix our docs or submit a patch to change behavior.
>
> Here's the history of the feature as I understand it:
>
> 1) Starting in HBase 1.1 you can change the setting
> "hbase.wal.storage.policy" and if the underlying Hadoop installation
> supports storage policies[1] then we'll call the needed APIs to set
> policies as we create WALs.
>
> The main use case is to tell HDFS that you want the HBase WAL on SSDs
> in a mixed hardware deployment.
>
> 2) In HBase 1.1 - 1.4, the above setting defaulted to the value
> "NONE". Our utility code for setting storage policies expressly checks
> any config value against the default and when it matches opts to log a
> message rather than call the actual Hadoop API[2]. This is important
> since "NONE" isn't actually a valid storage policy, so if we pass it
> to the Hadoop API we'll get a bunch of log noise.
>
> 3) In HBase 2 and 1.5+, the setting defaults to "HOT" as of
> HBASE-18118. Now if we were to pass the value to the Hadoop API we
> won't get log noise. The utility code does the same check against our
> default. The Hadoop default storage policy is "HOT" so presumably we
> save an RPC call by not setting it again.
>
> ----
>
> If the above is correct, how do I specify that I want WALs to have a
> storage policy of HOT in the event that HDFS already has some other
> policy in place for a parent directory?
>
> e.g. In HBase 1.1 - 1.4, I can set the storage policy (via Hadoop
> admin tools) for "/hbase" to be COLD and I can change
> "hbase.wal.storage.policy" to HOT. In HBase 2 and 1.5+, AFAICT my WALs
> will still have the COLD policy.
>
> Related, but different problem: I can use Hadoop admin tools to set
> the storage policy for "/hbase" to be "ALL_SSD" and if I leave HBase
> configs on defaults then I end up with WALs having "ALL_SSD" as their
> policy in all versions. But in HBase 2 and 1.5+ the HBase configs
> claim the policy is HOT.
>
> Should we always set the policy if the api is available? To avoid
> having to double-configure in something like the second case, do we
> still need a way to say "please do not expressly set a storage
> policy"? (as an alternative we could just call out "be sure to update
> your WAL config" in docs)
>
>
>
> [1]: "Storage Policy" gets called several things in Hadoop, like
> Archival Storage, Heterogenous Storage, HSM, and "Hierarchical
> Storage". In all cases I'm talking about the feature documented here:
>
> http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
> http://hadoop.apache.org/docs/r3.0.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
>
> I think it's available in Hadoop 2.6.0+, 3.0.0+.
>
> [2]:
>
> In rel/1.2.0 you can see the default check by tracing starting at FSHLog:
>
> https://s.apache.org/BqAk
>
> The constants referred to in that code are in HConstants:
>
> https://s.apache.org/OJyR
>
> And in FSUtils we exit the function early when the default matches
> what we pull out of configs:
>
> https://s.apache.org/A4GA
>
> In rel/2.0.0 the code works essentially the same but has moved around.
> The starting point is now AbstractFSWAL:
>
> https://s.apache.org/pp6T
>
> The constants now use HOT instead of NONE as a default:
>
> https://s.apache.org/7K2J
>
> and in CommonFSUtils we do the same early return:
>
> https://s.apache.org/fYKr
>