[ 
https://issues.apache.org/jira/browse/HBASE-30222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-30222:
-----------------------------------
    Labels: pull-request-available  (was: )

> DefaultCipherProvider re-parses HBase configuration 
> (hbase-default.xml/hbase-site.xml) on every instantiation
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-30222
>                 URL: https://issues.apache.org/jira/browse/HBASE-30222
>             Project: HBase
>          Issue Type: Bug
>          Components: encryption
>    Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.15, 2.6.6
>            Reporter: Hari Krishna Dara
>            Assignee: Hari Krishna Dara
>            Priority: Major
>              Labels: pull-request-available
>
>   h3. Summary
>   {{DefaultCipherProvider}} initializes its {{conf}} field inline:
>     // DefaultCipherProvider.java
>     private Configuration conf = HBaseConfiguration.create();
>   Every time the provider is constructed, this parses hbase-default.xml and
>   hbase-site.xml off the classpath to build a fresh Configuration. That
>   Configuration is then immediately discarded: providers are constructed via
>   {{Encryption.getCipherProvider(conf)}} -> 
> {{ReflectionUtils.newInstance(class, conf)}},
>   which invokes the no-arg constructor (running the field initializer) and 
> *then*
>   calls {{setConf(conf)}} -- CipherProvider extends Configurable -- 
> overwriting the
>   field with the caller's conf. So the default-config parse performed in the
>   constructor is pure waste on every call.
>   {{Encryption.getCipherProvider(conf)}} constructs a new provider instance 
> on every
>   invocation (it does not reuse the {{getInstance()}} singleton), so the 
> wasteful
>   parse happens on every cipher resolution.
>   h3. Impact
>   Each call to {{Encryption.getCipher(conf, name)}} /
>   {{Encryption.getCipherProvider(conf)}} parses the full HBase configuration. 
> A JMH
>   microbenchmark of a single provider construction measured ~2.3 ms latency 
> and
>   ~1.1 MB of allocation, essentially all of it attributable to
>   {{HBaseConfiguration.create()}} -- confirmed by control benchmarks 
> isolating the
>   config parse from cipher construction and the AES operations. 
> (Microbenchmark on
>   Apple M4 Max / JDK 17; the absolute numbers are environment-dependent, but 
> the
>   attribution -- ~100% of the cost is the config parse -- is the point.)
>   For encryption at rest as it exists today the practical impact is small, 
> because
>   cipher resolution happens only at infrequent, amortized points:
>     - per WAL file: AbstractProtobufLogWriter, SecureProtobufLogReader
>     - per HFile / encryption-context creation: 
> SecurityUtil.createEncryptionContext
>       and related paths
>   These are once-per-file events amortized over many cells, so the cost is 
> real but
>   negligible in normal operation. However, the inefficiency is latent: any 
> code path
>   that resolves ciphers at higher frequency (e.g. per-key or per-row) would 
> pay a
>   full config parse each time, turning a harmless cost into a significant
>   latency/allocation problem. Fixing it removes a foot-gun for current and 
> future
>   callers and eliminates needless GC churn.
>   h3. Root cause
>   {{DefaultCipherProvider}} has no Configuration-accepting constructor, so
>   construction *always* calls {{HBaseConfiguration.create()}}, even though 
> every
>   production caller overwrites that value via {{setConf()}} before use. There 
> is no
>   way to suppress the parse.
>   The {{getInstance()}} singleton is used only by tests (TestAES, 
> TestCommonsAES),
>   both of which already call {{setConf(conf)}} immediately afterward; it is
>   effectively a test-only accessor but is not annotated as such.
>   h3. Proposed fix
>   Remove the inline initializer so construction does not parse config:
>     private Configuration conf;   // set via setConf() before use
>   This is safe for production: {{ReflectionUtils.newInstance()}} always calls
>   {{setConf()}} before the provider is used, and {{getConf()}} is first 
> dereferenced
>   later (in the AES/AES256GCM constructor), by which point conf is set.
>   Additionally:
>     - Annotate {{getInstance()}} as 
> @InterfaceAudience.LimitedPrivate(UNITTEST) (or
>       otherwise document it as test-only), since tests are its only callers 
> and they
>       already set conf explicitly.
>     - Optionally add a fail-fast guard in {{getConf()}}
>       (Preconditions.checkState(conf != null, ...)) so a caller that forgets
>       setConf() fails loudly instead of NPE-ing inside cipher construction --
>       preserving the "always non-null" property the field initializer 
> accidentally
>       provided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to