Thanks Sean for the insight & detailed analysis!

I think it makes sense to revert HBASE-18118.
It's not trivial to maintain backward compatibility otherwise. Keeping "NONE"
as the default HSM setting doesn't do any harm. Having additional documentation
is helpful to avoid confusion (since "NONE" is not a supported HDFS HSM option).

On Mon, May 14, 2018 at 9:38 PM Yu Li <car...@gmail.com> wrote:

> Thanks for pointing this out, Sean. IMHO, after re-checking the code,
> HBASE-18118 needs an addendum (at least). The proposal was to set the
> storage policy of the WAL directory to HOT by default, but the current
> implementation cannot achieve this: it follows the old "NONE" logic of
> skipping the API call when the policy matches the default, but for "HOT"
> we need an explicit call to HDFS.
>
> Furthermore, I think the old logic of leaving the default as "NONE" is
> even better: if an admin sets the HBase root directory (hbase.rootdir) to
> some policy like ALL_SSD, the WAL will simply follow it, and if not, the
> policy is HOT by default.
> So maybe reverting HBASE-18118 is the better choice, although I can see my
> own +1 on HBASE-18118 there... @Andrew, what's your opinion here?
>
> And BTW, I have opened HBASE-20479 to document the whole HSM solution
> in HBase, including HFile/WAL/Bulkload etc. (though I still haven't had
> enough time to complete it), JFYI.
>
>
> Best Regards,
> Yu
>
> On 15 May 2018 at 05:14, Sean Busbey <bus...@apache.org> wrote:
>
>> Hi folks!
>>
>> I'm trying to reason through our "set a storage policy for WALs"
>> feature and having some difficulty. I want to get some feedback before
>> I fix our docs or submit a patch to change behavior.
>>
>> Here's the history of the feature as I understand it:
>>
>> 1) Starting in HBase 1.1, you can change the setting
>> "hbase.wal.storage.policy", and if the underlying Hadoop installation
>> supports storage policies[1], we'll call the needed APIs to set
>> policies as we create WALs.
>>
>> The main use case is to tell HDFS that you want the HBase WAL on SSDs
>> in a mixed hardware deployment.
>>
>> 2) In HBase 1.1 - 1.4, the above setting defaulted to the value
>> "NONE". Our utility code for setting storage policies expressly checks
>> any config value against the default and, when it matches, opts to log
>> a message rather than call the actual Hadoop API[2]. This is important
>> since "NONE" isn't actually a valid storage policy, so if we passed it
>> to the Hadoop API we'd get a bunch of log noise.
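The following minimal Java model illustrates the early-return check in item 2. It is not the FSUtils code. `PolicySetter`, `RecordingSetter`, `EarlyReturnModel` and the `/hbase/WALs` path are hypothetical stand-ins. The recording setter replaces the real Hadoop call so you can see whether that call would have happened. The case-insensitive comparison is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the HDFS storage policy call (not a Hadoop type).
interface PolicySetter {
    void setStoragePolicy(String path, String policy);
}

// Records each call instead of talking to HDFS.
class RecordingSetter implements PolicySetter {
    final List<String> calls = new ArrayList<>();

    @Override
    public void setStoragePolicy(String path, String policy) {
        calls.add(path + "=" + policy);
    }
}

class EarlyReturnModel {
    // Compare the configured value against the default and skip the API call
    // when they match. Returns true only if the setter was invoked.
    static boolean maybeSetPolicy(PolicySetter fs, String path,
                                  String configured, String defaultPolicy) {
        // null models "not configured", so the default applies.
        if (configured == null || configured.equalsIgnoreCase(defaultPolicy)) {
            System.out.println("Policy matches default " + defaultPolicy
                    + "; not setting it on " + path);
            return false;
        }
        fs.setStoragePolicy(path, configured);
        return true;
    }

    public static void main(String[] args) {
        RecordingSetter fs = new RecordingSetter();
        // 1.1 - 1.4 shape: the default "NONE" is never sent to HDFS.
        maybeSetPolicy(fs, "/hbase/WALs", "NONE", "NONE");
        maybeSetPolicy(fs, "/hbase/WALs", "ALL_SSD", "NONE");
        System.out.println(fs.calls); // [/hbase/WALs=ALL_SSD]
    }
}
```

The same function with `"HOT"` as both the configured value and the default also returns early. That is the behavior item 3 describes.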
>>
>> 3) In HBase 2 and 1.5+, the setting defaults to "HOT" as of
>> HBASE-18118. Now, if we were to pass the value to the Hadoop API, we
>> wouldn't get log noise. The utility code does the same check against our
>> default. The Hadoop default storage policy is "HOT", so presumably we
>> save an RPC call by not setting it again.
>>
>> ----
>>
>> If the above is correct, how do I specify that I want WALs to have a
>> storage policy of HOT in the event that HDFS already has some other
>> policy in place for a parent directory?
>>
>> E.g., in HBase 1.1 - 1.4, I can set the storage policy (via Hadoop
>> admin tools) for "/hbase" to be COLD and I can change
>> "hbase.wal.storage.policy" to HOT. In HBase 2 and 1.5+, AFAICT my WALs
>> will still have the COLD policy.
>>
>> A related but different problem: I can use Hadoop admin tools to set
>> the storage policy for "/hbase" to be "ALL_SSD" and if I leave HBase
>> configs on defaults then I end up with WALs having "ALL_SSD" as their
>> policy in all versions. But in HBase 2 and 1.5+ the HBase configs
>> claim the policy is HOT.
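Both scenarios depend on how HDFS resolves the policy for a path that has none set explicitly. Assume the path inherits from its nearest ancestor, and that HOT applies when nothing in the chain sets a policy. The toy model below combines that rule with the early-return check from item 2 to reproduce both cases. `ToyNamespace` and `InheritanceDemo` are hypothetical classes, not Hadoop APIs, and the paths are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Toy namespace. Assumed rule: a path without an explicit policy inherits
// from its nearest ancestor, falling back to HOT at the root.
class ToyNamespace {
    private final Map<String, String> explicit = new HashMap<>();

    void setStoragePolicy(String path, String policy) {
        explicit.put(path, policy);
    }

    String effectivePolicy(String path) {
        String p = path;
        while (true) {
            String policy = explicit.get(p);
            if (policy != null) return policy;
            if (p.equals("/")) return "HOT";
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "/" : p.substring(0, slash); // walk to the parent
        }
    }
}

class InheritanceDemo {
    // Early-return shape from item 2: only call HDFS if the value differs
    // from HBase's default.
    static void configureWal(ToyNamespace ns, String walDir,
                             String configured, String defaultPolicy) {
        if (!configured.equalsIgnoreCase(defaultPolicy)) {
            ns.setStoragePolicy(walDir, configured);
        }
    }

    public static void main(String[] args) {
        // Case 1: /hbase is COLD and hbase.wal.storage.policy is HOT.
        ToyNamespace v14 = new ToyNamespace();
        v14.setStoragePolicy("/hbase", "COLD");
        configureWal(v14, "/hbase/WALs", "HOT", "NONE"); // 1.1 - 1.4 default
        System.out.println(v14.effectivePolicy("/hbase/WALs")); // HOT

        ToyNamespace v2 = new ToyNamespace();
        v2.setStoragePolicy("/hbase", "COLD");
        configureWal(v2, "/hbase/WALs", "HOT", "HOT"); // 2.x / 1.5+ default
        System.out.println(v2.effectivePolicy("/hbase/WALs")); // COLD

        // Case 2: /hbase is ALL_SSD and the HBase config is left at HOT.
        ToyNamespace ssd = new ToyNamespace();
        ssd.setStoragePolicy("/hbase", "ALL_SSD");
        configureWal(ssd, "/hbase/WALs", "HOT", "HOT");
        System.out.println(ssd.effectivePolicy("/hbase/WALs")); // ALL_SSD
    }
}
```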
>>
>> Should we always set the policy if the API is available? To avoid
>> having to double-configure in something like the second case, do we
>> still need a way to say "please do not expressly set a storage
>> policy"? (As an alternative, we could just call out "be sure to update
>> your WAL config" in docs.)
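One possible answer to these questions could look like the Java sketch below. It is a sketch under stated assumptions, not an HBase patch. It drops the comparison against HBase's own default, so any real policy, HOT included, is sent to HDFS whenever the API exists. It also keeps a sentinel that means "do not expressly set a storage policy." `WalPolicyApplier` and its constructor arguments are hypothetical. Reusing "NONE" as the sentinel is one assumed choice. The `BiConsumer` stands in for the Hadoop call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;

class WalPolicyApplier {
    // Hypothetical sentinel: leave the WAL dir alone so it inherits.
    static final String DO_NOT_SET = "NONE";

    private final boolean apiAvailable;                 // e.g. storage policy API present
    private final BiConsumer<String, String> setPolicy; // stand-in for the HDFS call

    WalPolicyApplier(boolean apiAvailable, BiConsumer<String, String> setPolicy) {
        this.apiAvailable = apiAvailable;
        this.setPolicy = setPolicy;
    }

    // Returns true if a storage policy was explicitly set on walDir.
    boolean apply(String walDir, String configured) {
        if (!apiAvailable) {
            return false;
        }
        String policy = (configured == null)
                ? DO_NOT_SET
                : configured.trim().toUpperCase(Locale.ROOT);
        if (policy.equals(DO_NOT_SET)) {
            return false; // inherit whatever the parent directory has
        }
        // No comparison against HBase's default: HOT is sent explicitly too.
        setPolicy.accept(walDir, policy);
        return true;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        WalPolicyApplier applier =
                new WalPolicyApplier(true, (dir, p) -> calls.add(dir + "=" + p));
        applier.apply("/hbase/WALs", "hot");
        applier.apply("/hbase/WALs", "NONE");
        System.out.println(calls); // [/hbase/WALs=HOT]
    }
}
```

With this shape, the second case above would need no double configuration: leaving the WAL setting at the sentinel lets it follow an ALL_SSD root directory. An explicit HOT would still override a COLD parent.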
>>
>>
>>
>> [1]: "Storage Policy" gets called several things in Hadoop, like
>> Archival Storage, Heterogeneous Storage, HSM, and "Hierarchical
>> Storage". In all cases I'm talking about the feature documented here:
>>
>>
>> http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
>>
>> http://hadoop.apache.org/docs/r3.0.2/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
>>
>> I think it's available in Hadoop 2.6.0+, 3.0.0+.
>>
>> [2]:
>>
>> In rel/1.2.0 you can see the default check by tracing from FSHLog:
>>
>> https://s.apache.org/BqAk
>>
>> The constants referred to in that code are in HConstants:
>>
>> https://s.apache.org/OJyR
>>
>> And in FSUtils we exit the function early when the default matches
>> what we pull out of configs:
>>
>> https://s.apache.org/A4GA
>>
>> In rel/2.0.0 the code works essentially the same but has moved around.
>> The starting point is now AbstractFSWAL:
>>
>> https://s.apache.org/pp6T
>>
>> The constants now use HOT instead of NONE as a default:
>>
>> https://s.apache.org/7K2J
>>
>> and in CommonFSUtils we do the same early return:
>>
>> https://s.apache.org/fYKr
>>
>
>

-- 
A very happy Clouderan
