On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <[email protected]>
wrote:

> Although collecting all configuration keys into a single file is
> definitely an anti-pattern I’m not sure the same is true of package or
> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
> DFSConfigKeys but geared toward our API/release auditing.
>
> This would seem virtuous for a couple of reasons. Relevant configuration
> key constants for the package or module would be grouped in a well known
> place for users and developers alike. The LP(CONFIG) designation would
> require developers to think about deprecation cycle if contemplating a
> change, thus providing some back pressure against snap decisions. Or, if
> not then, then at release candidate evaluation time, user configuration
> breaking changes could be caught be a release automation tool that diffs
> LP(CONFIG) annotated classes. Something like this would improve the state
> of configuration key management quite dramatically, because currently it’s
> ad hoc.
>

I am supportive of trying such an effort. We'll need to tweak the meaning
of LP(CONFIG) as we define it currently, but that can be done. I don't know
what our current tools do or assume regarding this annotation. I think
there is something custom happening in the compatibility reports that we
generate as part of each RC.

> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault 
> <[email protected]>
> wrote:
> >
> > Thanks for your detailed response, Nick!
> >
> >> I think that none of my comments address your intended topic: how do we
> > publish our configuration points as an API that can be consumed by user
> > applications? (Do I have that correct?)
> >
> > This is a good summary, and I appreciate the other
> thoughts/clarifications
> > as well. I also realize this is probably hard to get perfect and any
> choice
> > must be weighed against the effort necessary to change/maintain.
> >
> > One example I know is Hadoop/HDFS, and I bet some on this list have much
> > more knowledge of that project's history than I do. For HDFS they have
> > DFSConfigKeys which in my experience does seem to include most configs. I
> > believe they even have unit tests which verify that all configs in the
> > various site.xml files are represented in code. In more recent versions
> > they have split that class up into smaller groupings, for example
> > DfsClientConf and the various inner classes there.
> >
> > In a vacuum, from a code design perspective, I'm not commenting on
> whether
> > that's a good or bad pattern. I also don't know of the politics of the
> > project or what sorts of pain points they've discovered in that pattern
> > over the years. But from *user's perspective*, this is a handy way to
> > handle things in my opinion.
> >
> > At my company, in general we try to avoid "magic strings" [1] and instead
> > always try to use constants. We can and do define our own constants to
> try
> > to mirror some of the "private" magic strings in the hbase client. This
> is
> > better than nothing but even better would be to use hbase-provided
> > constants so that we can build more defensive applications, using the
> > compiler to verify that the configs we reference still do anything.
> >
> > I unfortunately can't speak to the original issues with HConstants that
> > turned it into an anti-pattern. What I do notice is there are definitely
> > examples in the hbase codebase of duplicated config strings, one of which
> > is called out in one of the jiras I linked in my original email. These
> are
> > just bugs waiting to happen in my opinion, either for hbase itself or for
> > users which may reference them.
> >
> > [1] https://deviq.com/antipatterns/magic-strings
> >> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]>
> wrote:
> >>
> >> Hi Bryan,
> >>
> >> Thanks for bringing this up.
> >>
> >> I agree with Duo (and I think we have this settled as project-wide
> >> consensus) that HConstants is/was an anti-pattern, that we are actively
> >> against adding new fields there, and opportunistically removing fields
> when
> >> we can. Further, the documented meaning of the
> >> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear
> in
> >> user facing configuration files", so this isn't really appropriate for
> >> marking a field that exposes a configuration key to user applications. I
> >> will also note that there appears to be two categories of tunable
> >> parameters -- configuration points that we expect users to tweak are
> >> catalogued and documented in the book [0] and everything else is left to
> >> the obscurity of code-grep.
> >>
> >> While we are actively squashing use of fields in HConstants, I don't
> know
> >> that we have proposed some alternative to the user community. For my
> part,
> >> when I write and review code that involves configuration keys, I
> generally
> >> implement the key constant string as a private field in an appropriate
> >> class, and the unit test coverage for that configuration key replicates
> the
> >> string in the test. My reasoning being that the string is a part of our
> >> public API and making a change to the public API should be detected from
> >> the unit test. I also have (on occasion) gone out of my way to write
> about
> >> the configuration keys in the package or class-level javadoc.
> >>
> >> I think that none of my comments address your intended topic: how do we
> >> publish our configuration points as an API that can be consumed by user
> >> applications? (Do I have that correct?)
> >>
> >> I am of the mind that we don't need/want an API of configurations ; we
> want
> >> a catalogue, i.e., what has been started in our book. Perhaps
> accompanied
> >> by/generated from an authoritative hbase-defaults.xml file. In fact, we
> >> already do generate from hbase-default.xml, the result is [1] ; I don't
> >> believe it is authoritative.
> >>
> >> If we did have an AP thoughI, what would be better than the HConstants
> >> approach of key-strings as public fields ? What if we had a
> >> ConfigurationBuilder type of class, which had methods tied to
> configuration
> >> keys? I would think that such a globally applicable class would have the
> >> same maintenance issues as HConstants. But what if we had some kind of
> >> ConfigurationSetter class, perhaps per package, that performed this
> >> function? That might be maintainable for us and useful for users.
> >>
> >> I'm keen to hear what other ideas are out there, or better, examples and
> >> counter-examples from other projects.
> >>
> >> Thanks,
> >> Nick
> >>
> >> [0]: https://hbase.apache.org/book.html#important_configurations
> >> <https://hbase.apache.org/book.html#important_configurations>
> >> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
> >> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >>
> >> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> >> <[email protected]> wrote:
> >>
> >>> Hi devs,
> >>>
> >>> As a major user of hbase, my company has thousands of clients deployed
> >>> which use the hbase client to connect to a variety of hbase clusters.
> We
> >>> have a common library which handles configuring all clients by setting
> up
> >>> the Configuration object prior to creating a Connection. Our library
> sets
> >>> configurations using the various configs in HConstants, but there are
> >> also
> >>> a bunch of configs which don't exist in HConstants. For these we have
> >>> hardcoded config strings in our client.
> >>>
> >>> We're now working on an hbase upgrade and need to go through our client
> >>> library and check how the configs may have changed in the new version.
> >> This
> >>> is relatively easy to do for those HConstants cases -- configs may be
> >>> marked @Deprecated which will show up in one's editor, they may be
> >> removed
> >>> entirely which would show up is a compile error, and otherwise one can
> >>> easily click through or bring up the javadoc. For the others that don't
> >>> exist in HConstants, we need to go manually search the hbase codebase
> for
> >>> those strings.
> >>>
> >>> Without doing this painstaking manual process, we would potentially
> >> deploy
> >>> the upgraded client with configs which are no longer used or deprecated
> >> by
> >>> the hbase client. For those using HConstants, this is immediately
> obvious
> >>> because the HConstant field may have been removed. This is a clear
> >>> indication of needing to investigate the config. In this case it's
> >>> preferred to face the compile failure because it's clearer than having
> >>> something silently disappear or change.
> >>>
> >>> I opened 3 jiras to move some fields to HConstants, but got some
> >> reasonable
> >>> pushback from Duo:
> >>>
> >>> https://issues.apache.org/jira/browse/HBASE-26845
> >> <https://issues.apache.org/jira/browse/HBASE-26845>
> >>> https://issues.apache.org/jira/browse/HBASE-26846
> >> <https://issues.apache.org/jira/browse/HBASE-26846>
> >>> https://issues.apache.org/jira/browse/HBASE-26847
> >> <https://issues.apache.org/jira/browse/HBASE-26847>
> >>>
> >>> Duo's pushback is that HConstants is an anti-pattern and these configs
> >> are
> >>> not part of our public API. I can agree that a catch-all constants
> class
> >>> might be an anti-pattern, but would argue that consolidating configs
> >> there
> >>> is very useful for end-users. I can also potentially agree that
> exposing
> >>> these as part of our public API might limit the flexibility of
> >> development
> >>> due to compatibility constraints about IA.Public.
> >>>
> >>> To me it seems odd to add a configuration, whose whole point is to make
> >>> something tuneable, but then bury it in a private class despite having
> >> real
> >>> implications for how the application runs. If a configuration is not
> >> meant
> >>> to be tuned, it shouldn't be a configuration at all. Otherwise it
> should
> >> be
> >>> exposed for reference.
> >>>
> >>> I'm wondering if there is some compromise we can achieve which makes it
> >>> easier for end-users to integrate with tunable configs.
> >>>
> >>> One can imagine a large project to clean up all of our configs under
> some
> >>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
> perfect
> >>> (needing to migrate all configs) the enemy of good.
> >>>
> >>> A better option might be to make those classes which expose configs
> >>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> >>> ConnectionImplementation. That might be the most incremental change we
> >>> could make. We could handle this on a case-by-case basis.
> >>>
> >>> Does anyone have any thoughts?
> >>>
> >>
>

Reply via email to