One sticking point: ConnectionConfiguration and
AsyncConnectionConfiguration are both IA.Private.

On Thu, Mar 24, 2022 at 17:21 Bryan Beaudreault
<[email protected]> wrote:

> I realized there already exists a good candidate for this in hbase-client
> -- ConnectionConfiguration. My latest commit in
> https://github.com/apache/hbase/pull/4180 adds a new config constant there
> and marks it as LP(Config), but I'd also be happy to revert that part of
> the PR and instead handle that in a dedicated jira for this topic if
> desired.
>
> On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <
> [email protected]>
> wrote:
>
> > Thank you both for the input here!
> >
> > It seems like we've come to the conclusion that it would be ok to:
> >
> > - Update docs around LP(CONFIG) to say that it also encompasses some
> > module-aggregated classes which hold config constants
> > - Maybe document this convention elsewhere for contributors as well,
> > though need to look into where (guidance welcome, but will look around)
> > - I can create the first constant class in hbase-client for the use-cases
> > in my jira. I also have another config constant I've added in
> > https://github.com/apache/hbase/pull/4180 prior to this discussion, so
> > may want to use that class there.
> > - Look into updates to tooling to audit the LP(CONFIG) classes
> >
> > If this sounds good, I'll create a JIRA to track the work. The first 3
> > todos will be relatively easy, and I'll look into options for the audit
> tool
> >
> >
> > On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <
> [email protected]>
> > wrote:
> >
> >> Agreed the definition of LP(CONFIG) would need to be tweaked.
> >>
> >> We do not quite have enough support for analyzing configuration key set
> >> changes in the current API audit tool. Removal of a public constant
> field
> >> from an LP annotated class would be flagged but a modification of the
> >> constant would not. However it is a OSS project consisting of Perl
> scripts
> >> that might accept a contribution or at least could be patched or
> extended.
> >> Or we can build a new audit tool for the purpose, which is what I would
> >> recommend. The Perl based tool shells out to javah and is quite
> expensive,
> >> and on some platforms, depending on comparison, can fail due to command
> >> line length limits. A Java tool would likely be more efficient at
> >> processing Java class annotations and introspecting string constants.
> >>
> >> >
> >> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <[email protected]>
> wrote:
> >> >
> >> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
> >> [email protected]>
> >> > wrote:
> >> >
> >> >> Although collecting all configuration keys into a single file is
> >> >> definitely an anti-pattern I’m not sure the same is true of package
> or
> >> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat
> like
> >> >> DFSConfigKeys but geared toward our API/release auditing.
> >> >>
> >> >> This would seem virtuous for a couple of reasons. Relevant
> >> configuration
> >> >> key constants for the package or module would be grouped in a well
> >> known
> >> >> place for users and developers alike. The LP(CONFIG) designation
> would
> >> >> require developers to think about deprecation cycle if contemplating
> a
> >> >> change, thus providing some back pressure against snap decisions. Or,
> >> if
> >> >> not then, then at release candidate evaluation time, user
> configuration
> >> >> breaking changes could be caught be a release automation tool that
> >> diffs
> >> >> LP(CONFIG) annotated classes. Something like this would improve the
> >> state
> >> >> of configuration key management quite dramatically, because currently
> >> it’s
> >> >> ad hoc.
> >> >>
> >> >
> >> > I am supportive of trying such an effort. We'll need to tweak the
> >> meaning
> >> > of LP(CONFIG) as we define it currently, but that can be done. I don't
> >> know
> >> > what our current tools do or assume regarding this annotation. I think
> >> > there is something custom happening in the compatibility reports that
> we
> >> > generate as part of each RC.
> >> >
> >> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
> >> <[email protected]>
> >> >>> wrote:
> >> >>>
> >> >>> Thanks for your detailed response, Nick!
> >> >>>
> >> >>>> I think that none of my comments address your intended topic: how
> do
> >> we
> >> >>> publish our configuration points as an API that can be consumed by
> >> user
> >> >>> applications? (Do I have that correct?)
> >> >>>
> >> >>> This is a good summary, and I appreciate the other
> >> >> thoughts/clarifications
> >> >>> as well. I also realize this is probably hard to get perfect and any
> >> >> choice
> >> >>> must be weighed against the effort necessary to change/maintain.
> >> >>>
> >> >>> One example I know is Hadoop/HDFS, and I bet some on this list have
> >> much
> >> >>> more knowledge of that project's history than I do. For HDFS they
> have
> >> >>> DFSConfigKeys which in my experience does seem to include most
> >> configs. I
> >> >>> believe they even have unit tests which verify that all configs in
> the
> >> >>> various site.xml files are represented in code. In more recent
> >> versions
> >> >>> they have split that class up into smaller groupings, for example
> >> >>> DfsClientConf and the various inner classes there.
> >> >>>
> >> >>> In a vacuum, from a code design perspective, I'm not commenting on
> >> >> whether
> >> >>> that's a good or bad pattern. I also don't know of the politics of
> the
> >> >>> project or what sorts of pain points they've discovered in that
> >> pattern
> >> >>> over the years. But from *user's perspective*, this is a handy way
> to
> >> >>> handle things in my opinion.
> >> >>>
> >> >>> At my company, in general we try to avoid "magic strings" [1] and
> >> instead
> >> >>> always try to use constants. We can and do define our own constants
> to
> >> >> try
> >> >>> to mirror some of the "private" magic strings in the hbase client.
> >> This
> >> >> is
> >> >>> better than nothing but even better would be to use hbase-provided
> >> >>> constants so that we can build more defensive applications, using
> the
> >> >>> compiler to verify that the configs we reference still do anything.
> >> >>>
> >> >>> I unfortunately can't speak to the original issues with HConstants
> >> that
> >> >>> turned it into an anti-pattern. What I do notice is there are
> >> definitely
> >> >>> examples in the hbase codebase of duplicated config strings, one of
> >> which
> >> >>> is called out in one of the jiras I linked in my original email.
> These
> >> >> are
> >> >>> just bugs waiting to happen in my opinion, either for hbase itself
> or
> >> for
> >> >>> users which may reference them.
> >> >>>
> >> >>> [1] https://deviq.com/antipatterns/magic-strings
> >> <https://deviq.com/antipatterns/magic-strings>
> >> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <[email protected]
> >
> >> >> wrote:
> >> >>>>
> >> >>>> Hi Bryan,
> >> >>>>
> >> >>>> Thanks for bringing this up.
> >> >>>>
> >> >>>> I agree with Duo (and I think we have this settled as project-wide
> >> >>>> consensus) that HConstants is/was an anti-pattern, that we are
> >> actively
> >> >>>> against adding new fields there, and opportunistically removing
> >> fields
> >> >> when
> >> >>>> we can. Further, the documented meaning of the
> >> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that
> >> appear
> >> >> in
> >> >>>> user facing configuration files", so this isn't really appropriate
> >> for
> >> >>>> marking a field that exposes a configuration key to user
> >> applications. I
> >> >>>> will also note that there appears to be two categories of tunable
> >> >>>> parameters -- configuration points that we expect users to tweak
> are
> >> >>>> catalogued and documented in the book [0] and everything else is
> >> left to
> >> >>>> the obscurity of code-grep.
> >> >>>>
> >> >>>> While we are actively squashing use of fields in HConstants, I
> don't
> >> >> know
> >> >>>> that we have proposed some alternative to the user community. For
> my
> >> >> part,
> >> >>>> when I write and review code that involves configuration keys, I
> >> >> generally
> >> >>>> implement the key constant string as a private field in an
> >> appropriate
> >> >>>> class, and the unit test coverage for that configuration key
> >> replicates
> >> >> the
> >> >>>> string in the test. My reasoning being that the string is a part of
> >> our
> >> >>>> public API and making a change to the public API should be detected
> >> from
> >> >>>> the unit test. I also have (on occasion) gone out of my way to
> write
> >> >> about
> >> >>>> the configuration keys in the package or class-level javadoc.
> >> >>>>
> >> >>>> I think that none of my comments address your intended topic: how
> do
> >> we
> >> >>>> publish our configuration points as an API that can be consumed by
> >> user
> >> >>>> applications? (Do I have that correct?)
> >> >>>>
> >> >>>> I am of the mind that we don't need/want an API of configurations ;
> >> we
> >> >> want
> >> >>>> a catalogue, i.e., what has been started in our book. Perhaps
> >> >> accompanied
> >> >>>> by/generated from an authoritative hbase-defaults.xml file. In
> fact,
> >> we
> >> >>>> already do generate from hbase-default.xml, the result is [1] ; I
> >> don't
> >> >>>> believe it is authoritative.
> >> >>>>
> >> >>>> If we did have an AP thoughI, what would be better than the
> >> HConstants
> >> >>>> approach of key-strings as public fields ? What if we had a
> >> >>>> ConfigurationBuilder type of class, which had methods tied to
> >> >> configuration
> >> >>>> keys? I would think that such a globally applicable class would
> have
> >> the
> >> >>>> same maintenance issues as HConstants. But what if we had some kind
> >> of
> >> >>>> ConfigurationSetter class, perhaps per package, that performed this
> >> >>>> function? That might be maintainable for us and useful for users.
> >> >>>>
> >> >>>> I'm keen to hear what other ideas are out there, or better,
> examples
> >> and
> >> >>>> counter-examples from other projects.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Nick
> >> >>>>
> >> >>>> [0]: https://hbase.apache.org/book.html#important_configurations
> >> <https://hbase.apache.org/book.html#important_configurations>
> >> >>>> <https://hbase.apache.org/book.html#important_configurations
> >> <https://hbase.apache.org/book.html#important_configurations>
> >> >
> >> >>>> [1]:
> https://hbase.apache.org/book.html#hbase_default_configurations
> >> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations
> >> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >> >
> >> >>>>
> >> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> >> >>>> <[email protected]> wrote:
> >> >>>>
> >> >>>>> Hi devs,
> >> >>>>>
> >> >>>>> As a major user of hbase, my company has thousands of clients
> >> deployed
> >> >>>>> which use the hbase client to connect to a variety of hbase
> >> clusters.
> >> >> We
> >> >>>>> have a common library which handles configuring all clients by
> >> setting
> >> >> up
> >> >>>>> the Configuration object prior to creating a Connection. Our
> library
> >> >> sets
> >> >>>>> configurations using the various configs in HConstants, but there
> >> are
> >> >>>> also
> >> >>>>> a bunch of configs which don't exist in HConstants. For these we
> >> have
> >> >>>>> hardcoded config strings in our client.
> >> >>>>>
> >> >>>>> We're now working on an hbase upgrade and need to go through our
> >> client
> >> >>>>> library and check how the configs may have changed in the new
> >> version.
> >> >>>> This
> >> >>>>> is relatively easy to do for those HConstants cases -- configs may
> >> be
> >> >>>>> marked @Deprecated which will show up in one's editor, they may be
> >> >>>> removed
> >> >>>>> entirely which would show up is a compile error, and otherwise one
> >> can
> >> >>>>> easily click through or bring up the javadoc. For the others that
> >> don't
> >> >>>>> exist in HConstants, we need to go manually search the hbase
> >> codebase
> >> >> for
> >> >>>>> those strings.
> >> >>>>>
> >> >>>>> Without doing this painstaking manual process, we would
> potentially
> >> >>>> deploy
> >> >>>>> the upgraded client with configs which are no longer used or
> >> deprecated
> >> >>>> by
> >> >>>>> the hbase client. For those using HConstants, this is immediately
> >> >> obvious
> >> >>>>> because the HConstant field may have been removed. This is a clear
> >> >>>>> indication of needing to investigate the config. In this case it's
> >> >>>>> preferred to face the compile failure because it's clearer than
> >> having
> >> >>>>> something silently disappear or change.
> >> >>>>>
> >> >>>>> I opened 3 jiras to move some fields to HConstants, but got some
> >> >>>> reasonable
> >> >>>>> pushback from Duo:
> >> >>>>>
> >> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
> >> <https://issues.apache.org/jira/browse/HBASE-26845>
> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
> >> <https://issues.apache.org/jira/browse/HBASE-26845>
> >> >
> >> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
> >> <https://issues.apache.org/jira/browse/HBASE-26846>
> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
> >> <https://issues.apache.org/jira/browse/HBASE-26846>
> >> >
> >> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
> >> <https://issues.apache.org/jira/browse/HBASE-26847>
> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
> >> <https://issues.apache.org/jira/browse/HBASE-26847>
> >> >
> >> >>>>>
> >> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
> >> configs
> >> >>>> are
> >> >>>>> not part of our public API. I can agree that a catch-all constants
> >> >> class
> >> >>>>> might be an anti-pattern, but would argue that consolidating
> configs
> >> >>>> there
> >> >>>>> is very useful for end-users. I can also potentially agree that
> >> >> exposing
> >> >>>>> these as part of our public API might limit the flexibility of
> >> >>>> development
> >> >>>>> due to compatibility constraints about IA.Public.
> >> >>>>>
> >> >>>>> To me it seems odd to add a configuration, whose whole point is to
> >> make
> >> >>>>> something tuneable, but then bury it in a private class despite
> >> having
> >> >>>> real
> >> >>>>> implications for how the application runs. If a configuration is
> not
> >> >>>> meant
> >> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
> >> >> should
> >> >>>> be
> >> >>>>> exposed for reference.
> >> >>>>>
> >> >>>>> I'm wondering if there is some compromise we can achieve which
> >> makes it
> >> >>>>> easier for end-users to integrate with tunable configs.
> >> >>>>>
> >> >>>>> One can imagine a large project to clean up all of our configs
> under
> >> >> some
> >> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
> >> >> perfect
> >> >>>>> (needing to migrate all configs) the enemy of good.
> >> >>>>>
> >> >>>>> A better option might be to make those classes which expose
> configs
> >> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> >> >>>>> ConnectionImplementation. That might be the most incremental
> change
> >> we
> >> >>>>> could make. We could handle this on a case-by-case basis.
> >> >>>>>
> >> >>>>> Does anyone have any thoughts?
> >> >>>>>
> >> >>>>
> >> >>
> >>
> >
>

Reply via email to