Hi folks!

I'm trying to reason through our "set a storage policy for WALs"
feature and I'm having some difficulty. I want to get some feedback
before I fix our docs or submit a patch to change the behavior.

Here's the history of the feature as I understand it:

1) Starting in HBase 1.1, you can set "hbase.wal.storage.policy", and
if the underlying Hadoop installation supports storage policies[1] we
call the needed APIs to apply that policy as we create WALs.

The main use case is to tell HDFS that you want the HBase WAL on SSDs
in a mixed hardware deployment.

2) In HBase 1.1 - 1.4, the above setting defaulted to the value
"NONE". Our utility code for setting storage policies expressly checks
the configured value against that default and, when it matches, logs a
message rather than calling the actual Hadoop API[2] (sketched after
this list). This matters because "NONE" isn't actually a valid storage
policy, so passing it to the Hadoop API would just generate log noise.

3) In HBase 2 and 1.5+, the setting defaults to "HOT" as of
HBASE-18118. Passing that value to the Hadoop API wouldn't generate
log noise, but the utility code still does the same check against our
default. Since the Hadoop default storage policy is already "HOT",
skipping the call presumably saves an RPC.

----

If the above is correct, how do I specify that I want WALs to have a
storage policy of HOT in the event that HDFS already has some other
policy in place for a parent directory?

e.g. I can set the storage policy (via Hadoop admin tools) for
"/hbase" to COLD and change "hbase.wal.storage.policy" to HOT. In
HBase 1.1 - 1.4 that works, because HOT differs from our default and
we make the API call. In HBase 2 and 1.5+, AFAICT, HOT matches our
default, we skip the call, and my WALs will still have the COLD policy.
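
For concreteness, that setup looks roughly like this through the Java
API (an illustrative sketch only; in practice the first step would be
done with the Hadoop admin tooling):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.DistributedFileSystem;

  public class ColdParentHotWalExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Assumes fs.defaultFS points at HDFS.
      DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

      // The admin-side step: "/hbase" (and, by inheritance, everything
      // under it that doesn't set its own policy) becomes COLD.
      dfs.setStoragePolicy(new Path("/hbase"), "COLD");

      // The HBase-side step. In 1.1 - 1.4, "HOT" differs from the default
      // ("NONE"), so HBase calls setStoragePolicy on the WAL dir and the
      // WALs get HOT. In 2.0 / 1.5+, "HOT" equals the default, HBase
      // skips the call, and the WAL dir inherits COLD from "/hbase".
      conf.set("hbase.wal.storage.policy", "HOT");
    }
  }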

Related, but different, problem: I can use Hadoop admin tools to set
the storage policy for "/hbase" to "ALL_SSD", and if I leave the HBase
configs at their defaults I end up with WALs having "ALL_SSD" as their
policy in all versions. But in HBase 2 and 1.5+ the HBase configs
claim the policy is HOT.

Should we always set the policy if the API is available? To avoid
having to double-configure in something like the second case, would we
still need a way to say "please do not expressly set a storage
policy"? (As an alternative we could just call out "be sure to update
your WAL config" in the docs.)



[1]: "Storage Policy" gets called several things in Hadoop, like
Archival Storage, Heterogenous Storage, HSM, and "Hierarchical
Storage". In all cases I'm talking about the feature documented here:

http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
http://hadoop.apache.org/docs/r3.0.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html

I think it's available in Hadoop 2.6.0+, 3.0.0+.

[2]:

In rel/1.2.0 you can see the default check by tracing from FSHLog:

https://s.apache.org/BqAk

The constants referred to in that code are in HConstants:

https://s.apache.org/OJyR

And in FSUtils we exit the function early when the default matches
what we pull out of configs:

https://s.apache.org/A4GA

In rel/2.0.0 the code works essentially the same but has moved around.
The starting point is now AbstractFSWAL:

https://s.apache.org/pp6T

The constants now use HOT instead of NONE as a default:

https://s.apache.org/7K2J

and in CommonFSUtils we do the same early return:

https://s.apache.org/fYKr
