Agreed the definition of LP(CONFIG) would need to be tweaked.

We do not quite have enough support for analyzing configuration key set changes 
in the current API audit tool. Removal of a public constant field from an LP 
annotated class would be flagged but a modification of the constant would not. 
However it is a OSS project consisting of Perl scripts that might accept a 
contribution or at least could be patched or extended. Or we can build a new 
audit tool for the purpose, which is what I would recommend. The Perl based 
tool shells out to javah and is quite expensive, and on some platforms, 
depending on comparison, can fail due to command line length limits. A Java 
tool would likely be more efficient at processing Java class annotations and 
introspecting string constants. 

> 
> On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <[email protected]> wrote:
> 
> On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <[email protected]>
> wrote:
> 
>> Although collecting all configuration keys into a single file is
>> definitely an anti-pattern I’m not sure the same is true of package or
>> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
>> DFSConfigKeys but geared toward our API/release auditing.
>> 
>> This would seem virtuous for a couple of reasons. Relevant configuration
>> key constants for the package or module would be grouped in a well known
>> place for users and developers alike. The LP(CONFIG) designation would
>> require developers to think about deprecation cycle if contemplating a
>> change, thus providing some back pressure against snap decisions. Or, if
>> not then, then at release candidate evaluation time, user configuration
>> breaking changes could be caught be a release automation tool that diffs
>> LP(CONFIG) annotated classes. Something like this would improve the state
>> of configuration key management quite dramatically, because currently it’s
>> ad hoc.
>> 
> 
> I am supportive of trying such an effort. We'll need to tweak the meaning
> of LP(CONFIG) as we define it currently, but that can be done. I don't know
> what our current tools do or assume regarding this annotation. I think
> there is something custom happening in the compatibility reports that we
> generate as part of each RC.
> 
>>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault 
>>> <[email protected]>
>>> wrote:
>>> 
>>> Thanks for your detailed response, Nick!
>>> 
>>>> I think that none of my comments address your intended topic: how do we
>>> publish our configuration points as an API that can be consumed by user
>>> applications? (Do I have that correct?)
>>> 
>>> This is a good summary, and I appreciate the other
>> thoughts/clarifications
>>> as well. I also realize this is probably hard to get perfect and any
>> choice
>>> must be weighed against the effort necessary to change/maintain.
>>> 
>>> One example I know is Hadoop/HDFS, and I bet some on this list have much
>>> more knowledge of that project's history than I do. For HDFS they have
>>> DFSConfigKeys which in my experience does seem to include most configs. I
>>> believe they even have unit tests which verify that all configs in the
>>> various site.xml files are represented in code. In more recent versions
>>> they have split that class up into smaller groupings, for example
>>> DfsClientConf and the various inner classes there.
>>> 
>>> In a vacuum, from a code design perspective, I'm not commenting on
>> whether
>>> that's a good or bad pattern. I also don't know of the politics of the
>>> project or what sorts of pain points they've discovered in that pattern
>>> over the years. But from *user's perspective*, this is a handy way to
>>> handle things in my opinion.
>>> 
>>> At my company, in general we try to avoid "magic strings" [1] and instead
>>> always try to use constants. We can and do define our own constants to
>> try
>>> to mirror some of the "private" magic strings in the hbase client. This
>> is
>>> better than nothing but even better would be to use hbase-provided
>>> constants so that we can build more defensive applications, using the
>>> compiler to verify that the configs we reference still do anything.
>>> 
>>> I unfortunately can't speak to the original issues with HConstants that
>>> turned it into an anti-pattern. What I do notice is there are definitely
>>> examples in the hbase codebase of duplicated config strings, one of which
>>> is called out in one of the jiras I linked in my original email. These
>> are
>>> just bugs waiting to happen in my opinion, either for hbase itself or for
>>> users which may reference them.
>>> 
>>> [1] https://deviq.com/antipatterns/magic-strings
>>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]>
>> wrote:
>>>> 
>>>> Hi Bryan,
>>>> 
>>>> Thanks for bringing this up.
>>>> 
>>>> I agree with Duo (and I think we have this settled as project-wide
>>>> consensus) that HConstants is/was an anti-pattern, that we are actively
>>>> against adding new fields there, and opportunistically removing fields
>> when
>>>> we can. Further, the documented meaning of the
>>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear
>> in
>>>> user facing configuration files", so this isn't really appropriate for
>>>> marking a field that exposes a configuration key to user applications. I
>>>> will also note that there appears to be two categories of tunable
>>>> parameters -- configuration points that we expect users to tweak are
>>>> catalogued and documented in the book [0] and everything else is left to
>>>> the obscurity of code-grep.
>>>> 
>>>> While we are actively squashing use of fields in HConstants, I don't
>> know
>>>> that we have proposed some alternative to the user community. For my
>> part,
>>>> when I write and review code that involves configuration keys, I
>> generally
>>>> implement the key constant string as a private field in an appropriate
>>>> class, and the unit test coverage for that configuration key replicates
>> the
>>>> string in the test. My reasoning being that the string is a part of our
>>>> public API and making a change to the public API should be detected from
>>>> the unit test. I also have (on occasion) gone out of my way to write
>> about
>>>> the configuration keys in the package or class-level javadoc.
>>>> 
>>>> I think that none of my comments address your intended topic: how do we
>>>> publish our configuration points as an API that can be consumed by user
>>>> applications? (Do I have that correct?)
>>>> 
>>>> I am of the mind that we don't need/want an API of configurations ; we
>> want
>>>> a catalogue, i.e., what has been started in our book. Perhaps
>> accompanied
>>>> by/generated from an authoritative hbase-defaults.xml file. In fact, we
>>>> already do generate from hbase-default.xml, the result is [1] ; I don't
>>>> believe it is authoritative.
>>>> 
>>>> If we did have an AP thoughI, what would be better than the HConstants
>>>> approach of key-strings as public fields ? What if we had a
>>>> ConfigurationBuilder type of class, which had methods tied to
>> configuration
>>>> keys? I would think that such a globally applicable class would have the
>>>> same maintenance issues as HConstants. But what if we had some kind of
>>>> ConfigurationSetter class, perhaps per package, that performed this
>>>> function? That might be maintainable for us and useful for users.
>>>> 
>>>> I'm keen to hear what other ideas are out there, or better, examples and
>>>> counter-examples from other projects.
>>>> 
>>>> Thanks,
>>>> Nick
>>>> 
>>>> [0]: https://hbase.apache.org/book.html#important_configurations
>>>> <https://hbase.apache.org/book.html#important_configurations>
>>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
>>>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>>>> 
>>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>>>> <[email protected]> wrote:
>>>> 
>>>>> Hi devs,
>>>>> 
>>>>> As a major user of hbase, my company has thousands of clients deployed
>>>>> which use the hbase client to connect to a variety of hbase clusters.
>> We
>>>>> have a common library which handles configuring all clients by setting
>> up
>>>>> the Configuration object prior to creating a Connection. Our library
>> sets
>>>>> configurations using the various configs in HConstants, but there are
>>>> also
>>>>> a bunch of configs which don't exist in HConstants. For these we have
>>>>> hardcoded config strings in our client.
>>>>> 
>>>>> We're now working on an hbase upgrade and need to go through our client
>>>>> library and check how the configs may have changed in the new version.
>>>> This
>>>>> is relatively easy to do for those HConstants cases -- configs may be
>>>>> marked @Deprecated which will show up in one's editor, they may be
>>>> removed
>>>>> entirely which would show up is a compile error, and otherwise one can
>>>>> easily click through or bring up the javadoc. For the others that don't
>>>>> exist in HConstants, we need to go manually search the hbase codebase
>> for
>>>>> those strings.
>>>>> 
>>>>> Without doing this painstaking manual process, we would potentially
>>>> deploy
>>>>> the upgraded client with configs which are no longer used or deprecated
>>>> by
>>>>> the hbase client. For those using HConstants, this is immediately
>> obvious
>>>>> because the HConstant field may have been removed. This is a clear
>>>>> indication of needing to investigate the config. In this case it's
>>>>> preferred to face the compile failure because it's clearer than having
>>>>> something silently disappear or change.
>>>>> 
>>>>> I opened 3 jiras to move some fields to HConstants, but got some
>>>> reasonable
>>>>> pushback from Duo:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/HBASE-26845
>>>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>>>> https://issues.apache.org/jira/browse/HBASE-26846
>>>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>>>> https://issues.apache.org/jira/browse/HBASE-26847
>>>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>>>> 
>>>>> Duo's pushback is that HConstants is an anti-pattern and these configs
>>>> are
>>>>> not part of our public API. I can agree that a catch-all constants
>> class
>>>>> might be an anti-pattern, but would argue that consolidating configs
>>>> there
>>>>> is very useful for end-users. I can also potentially agree that
>> exposing
>>>>> these as part of our public API might limit the flexibility of
>>>> development
>>>>> due to compatibility constraints about IA.Public.
>>>>> 
>>>>> To me it seems odd to add a configuration, whose whole point is to make
>>>>> something tuneable, but then bury it in a private class despite having
>>>> real
>>>>> implications for how the application runs. If a configuration is not
>>>> meant
>>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
>> should
>>>> be
>>>>> exposed for reference.
>>>>> 
>>>>> I'm wondering if there is some compromise we can achieve which makes it
>>>>> easier for end-users to integrate with tunable configs.
>>>>> 
>>>>> One can imagine a large project to clean up all of our configs under
>> some
>>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
>> perfect
>>>>> (needing to migrate all configs) the enemy of good.
>>>>> 
>>>>> A better option might be to make those classes which expose configs
>>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
>>>>> ConnectionImplementation. That might be the most incremental change we
>>>>> could make. We could handle this on a case-by-case basis.
>>>>> 
>>>>> Does anyone have any thoughts?
>>>>> 
>>>> 
>> 

Reply via email to