[
https://issues.apache.org/jira/browse/HBASE-30222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HBASE-30222:
-----------------------------------
Labels: pull-request-available (was: )
> DefaultCipherProvider re-parses HBase configuration
> (hbase-default.xml/hbase-site.xml) on every instantiation
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-30222
> URL: https://issues.apache.org/jira/browse/HBASE-30222
> Project: HBase
> Issue Type: Bug
> Components: encryption
> Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.15, 2.6.6
> Reporter: Hari Krishna Dara
> Assignee: Hari Krishna Dara
> Priority: Major
> Labels: pull-request-available
>
> h3. Summary
> {{DefaultCipherProvider}} initializes its {{conf}} field inline:
> // DefaultCipherProvider.java
> private Configuration conf = HBaseConfiguration.create();
> Every time the provider is constructed, this parses hbase-default.xml and
> hbase-site.xml off the classpath to build a fresh Configuration. That
> Configuration is then immediately discarded: providers are constructed via
> {{Encryption.getCipherProvider(conf)}} ->
> {{ReflectionUtils.newInstance(class, conf)}},
> which invokes the no-arg constructor (running the field initializer) and
> *then*
> calls {{setConf(conf)}} -- CipherProvider extends Configurable --
> overwriting the
> field with the caller's conf. So the default-config parse performed in the
> constructor is pure waste on every call.
> {{Encryption.getCipherProvider(conf)}} constructs a new provider instance
> on every
> invocation (it does not reuse the {{getInstance()}} singleton), so the
> wasteful
> parse happens on every cipher resolution.
> h3. Impact
> Each call to {{Encryption.getCipher(conf, name)}} /
> {{Encryption.getCipherProvider(conf)}} parses the full HBase configuration.
> A JMH
> microbenchmark of a single provider construction measured ~2.3 ms latency
> and
> ~1.1 MB of allocation, essentially all of it attributable to
> {{HBaseConfiguration.create()}} -- confirmed by control benchmarks
> isolating the
> config parse from cipher construction and the AES operations.
> (Microbenchmark on
> Apple M4 Max / JDK 17; the absolute numbers are environment-dependent, but
> the
> attribution -- ~100% of the cost is the config parse -- is the point.)
> For encryption at rest as it exists today the practical impact is small,
> because
> cipher resolution happens only at infrequent, amortized points:
> - per WAL file: AbstractProtobufLogWriter, SecureProtobufLogReader
> - per HFile / encryption-context creation:
> SecurityUtil.createEncryptionContext
> and related paths
> These are once-per-file events amortized over many cells, so the cost is
> real but
> negligible in normal operation. However, the inefficiency is latent: any
> code path
> that resolves ciphers at higher frequency (e.g. per-key or per-row) would
> pay a
> full config parse each time, turning a harmless cost into a significant
> latency/allocation problem. Fixing it removes a foot-gun for current and
> future
> callers and eliminates needless GC churn.
> h3. Root cause
> {{DefaultCipherProvider}} has no Configuration-accepting constructor, so
> construction *always* calls {{HBaseConfiguration.create()}}, even though
> every
> production caller overwrites that value via {{setConf()}} before use. There
> is no
> way to suppress the parse.
> The {{getInstance()}} singleton is used only by tests (TestAES,
> TestCommonsAES),
> both of which already call {{setConf(conf)}} immediately afterward; it is
> effectively a test-only accessor but is not annotated as such.
> h3. Proposed fix
> Remove the inline initializer so construction does not parse config:
> private Configuration conf; // set via setConf() before use
> This is safe for production: {{ReflectionUtils.newInstance()}} always calls
> {{setConf()}} before the provider is used, and {{getConf()}} is first
> dereferenced
> later (in the AES/AES256GCM constructor), by which point conf is set.
> Additionally:
> - Annotate {{getInstance()}} as
> @InterfaceAudience.LimitedPrivate(UNITTEST) (or
> otherwise document it as test-only), since tests are its only callers
> and they
> already set conf explicitly.
> - Optionally add a fail-fast guard in {{getConf()}}
> (Preconditions.checkState(conf != null, ...)) so a caller that forgets
> setConf() fails loudly instead of NPE-ing inside cipher construction --
> preserving the "always non-null" property the field initializer
> accidentally
> provided.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)