On Thu, Oct 5, 2017 at 9:30 AM, Josh Elser <els...@apache.org> wrote:

> (I think I understand the problems enough to comment, but, admittedly, my
> 5minute read is probably lacking)
>
>
Thanks for chiming-in Josh.



> I think the only argument against what you all have outlined here is if,
> in the future, we have some intent to create new implementations of HRegion
> or HStore. If that's the case, Region could still be the CP's "view" of
> what a region is, we could introduce another interface which defines what
> the internal/private "view" of a region is, and then we plug in the
> implementation of choice (e.g. HRegion as we know it now would become
> something like HRegionImpl).
>
>

Currently our codebase entertains the 'fantasy' that it is possible to plug
in alternate Region and Store implementations; there is a config that
allows you stipulate another Region class. As part of the Duo/Anoop
pare-back of CPs on RegionServer-side, it was noted that we should disabuse
ourselves of all such notions. It just doesn't work. We've not expended the
effort to keep clean Region/Store Interfaces (Witness this note on
confusion around intent of Region Interface for example). There are no
tests.



> However, I'm struggling to come up with a concrete use-case as to why we
> would need that extra level of indirection. As such, I think the below
> proposal makes sense.
>
>
Yeah. We could do the work to backfill an 'HRegion Interface', but years
later, there have been no takers. It'd be a fun project for sure but would
have to have good justification (Justification would have to include why
use HBase at all and not something like Apache Helix instead).

Good stuff,
S





> The distillation of a tricky issue is quite appreciated!
>
>
> On 10/4/17 6:51 PM, Andrew Purtell wrote:
>
>> I think it is fine to rebrand these interfaces as for coprocessors and tag
>> them LP(COPROC):
>>
>>      Region (use HRegion in internals)
>>
>>      Store (use HStore in internals)
>>
>>      MasterServices (use HMaster in internals)
>>
>>      RegionServerServices (use HRegionServer in internals)
>>
>> and pare them down. This is not a complete list, just a handful of
>> examples.
>>
>> Some specific points of feedback I have had:
>>
>> In Region, it would be good for CPs to be able to schedule async flushes
>> and compactions, poll or wait for completion of a specific request, or
>> wait
>> for all pending flushes and compactions to complete. There is a Phoenix
>> use
>> case for this in indexing.
>>
>> Security coprocessors use MasterServices to create their system tables.
>> Maybe this can be replaced by using the normal admin API for same.
>>
>> When removing access to internal services, consider if there are client
>> API
>> equivalents that the CP can use, and if embedded calls to such client APIs
>> from the coprocessor context would be a good idea. CP invocation of an
>> internal service is simply an in-process method call. That's good and bad,
>> right? The bad part, direct access, is the thing we want to restrict, the
>> motivation for this work (in part). But the good thing is it avoids a lot
>> of the fat client logic unnecessary for all-in-process service invocation,
>> which might even not work correctly. Removing everything is as drastic as
>> allowing CPs access to everything. It could be fine to drastically pare
>> down, but please consider it.
>>
>> Some changes have been proposed that removes access to metrics (e.g.
>> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core
>> function and replace it. Until and unless we remove the bypass semantic
>> (under discussion) we should continue to allow CPs access to metrics
>> objects so they can update metrics as expected by admins and users when
>> replacing functionality (via bypass). Metrics are a public facing API. I
>> agree this is kind of dodgy. I believe we should remove the bypass
>> semantic. Once that is done, coprocessors can only mix in additional
>> functionality. No more cause to touch core metrics. They can export their
>> own metrics if so desired.
>> ​​
>>
>> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote:
>> ​​
>>
>> A bunch of us are making good progress on the next alpha release,
>>
>>> hbase-2.0.0-alpha-4. The theme for this release is "Fixing the
>>> Coprocessor
>>> API", mostly undoing access accidentally granted Coprocessors. I'm
>>> talking
>>> out loud about a particularly awkward item here rather than in a comment
>>> up
>>> in JIRA so the airing sees a broader audience. Interested in any opinions
>>> or input you might have.
>>>
>>> TL;DR MasterService/RegionServerService and Region, etc., Interfaces
>>> were
>>> overloaded serving two, disparate roles; a load of refactoring has to be
>>> done to undo the damage. Suggestions for how to avoid making same mistake
>>> in future?
>>>
>>> I'm working on "HBASE-12260 MasterServices - remove from coprocessor API
>>> (Discuss)". MasterServices started out as a subset of the Master
>>> functionality. The idea back then was that certain Services and Managers
>>> could make do w/ less-than-full-access to the HMaster process. If so, we
>>> could test the Service and Manager without having to standup a full
>>> HMaster
>>> instance (This usually required our putting up a cluster too). If
>>> MasterServices had but a few methods, a Mock would be easy, making
>>> testing
>>> easier still. We did the same thing on the RegionServer side where we had
>>> RegionServerServices.
>>>
>>> MasterServices (and RegionServerServices) were also exposed to
>>> Coprocessors. The idea was that CPs could ask-for particular Master
>>> functions via MasterService. In this role MasterServices was meant to
>>> constrain what CPs had access to.
>>>
>>> MasterServices therefore had two consumers; one internal, the other not.
>>>
>>> With time, MasterServices got fat as internal Services and Managers
>>> needed
>>> more of HMaster. Everytime we added to MasterServices, CPs got access.
>>>
>>> On survey as part of the recent HBASE-12260 work, it turns out that the
>>> bulk of the methods in MasterServices are actually annotated
>>> InterfaceAudience.Private; i.e. for internal use only, not for
>>> Coprocessors. A brutal purge of Private audience objects, makes for a
>>> MasterServices we can pass Coprocessors but Coprocessors now have much
>>> less
>>> facility available (for parts, there are alternatives; Andy review
>>> suggests
>>> that CPs are severely crimped if this patch goes in). But MasterServices
>>> can no longer be used for its original purpose, passing Services and
>>> Managers a subset of HMaster. The latter brings on a substantial
>>> refactor.
>>>
>>> Another example of the double-role problem outlined above was found by
>>> Duo
>>> and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a
>>> similar tangle. There was the RegionServerServices <=> MasterServices
>>> case
>>> but also the exposure of HRegion internals. In this latter, Region was
>>> introduced by Andy EXPLICITLY as a subset of HRegion facility FOR
>>> Coprocessors. Subsequently, we all confused his original intent and went
>>> ahead and thought of Region (as opposed to HRegion) as an Interface for
>>> HRegion and plumbed it throughout the code base in place of explicit
>>> HRegion references. As Region picked up functionality, Coprocessors
>>> gained
>>> more access.
>>>
>>> The refactoring pattern that has emerged out of the RegionServer-side
>>> refactoring (which is ahead of the Master-work), is that we move to use
>>> the
>>> HRegion implementation everywhere internally, undoing use of Region
>>> Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp. I'm
>>> following suit on the Master side moving to use HMaster in place of
>>> MasterServices in all launched Services and Managers.
>>>
>>> How do we avoid this mistake in future? Should we rename Region as
>>> CoprocessorRegion so it more plain that its consumer is Coprocessors?
>>> Ditto
>>> on MasterServices?
>>>
>>> Thanks,
>>> S
>>>
>>>
>>
>>
>>

Reply via email to