On Thu, Oct 5, 2017 at 9:30 AM, Josh Elser <els...@apache.org> wrote:
> (I think I understand the problems enough to comment, but, admittedly, my > 5minute read is probably lacking) > > Thanks for chiming-in Josh. > I think the only argument against what you all have outlined here is if, > in the future, we have some intent to create new implementations of HRegion > or HStore. If that's the case, Region could still be the CP's "view" of > what a region is, we could introduce another interface which defines what > the internal/private "view" of a region is, and then we plug in the > implementation of choice (e.g. HRegion as we know it now would become > something like HRegionImpl). > > Currently our codebase entertains the 'fantasy' that it is possible to plug in alternate Region and Store implementations; there is a config that allows you stipulate another Region class. As part of the Duo/Anoop pare-back of CPs on RegionServer-side, it was noted that we should disabuse ourselves of all such notions. It just doesn't work. We've not expended the effort to keep clean Region/Store Interfaces (Witness this note on confusion around intent of Region Interface for example). There are no tests. > However, I'm struggling to come up with a concrete use-case as to why we > would need that extra level of indirection. As such, I think the below > proposal makes sense. > > Yeah. We could do the work to backfill an 'HRegion Interface', but years later, there have been no takers. It'd be a fun project for sure but would have to have good justification (Justification would have to include why use HBase at all and not something like Apache Helix instead). Good stuff, S > The distillation of a tricky issue is quite appreciated! > > > On 10/4/17 6:51 PM, Andrew Purtell wrote: > >> I think it is fine to rebrand these interfaces as for coprocessors and tag >> them LP(COPROC): >> >> Region (use HRegion in internals) >> >> Store (use HStore in internals) >> >> MasterServices (use HMaster in internals) >> >> RegionServerServices (use HRegionServer in internals) >> >> and pare them down. This is not a complete list, just a handful of >> examples. >> >> Some specific points of feedback I have had: >> >> In Region, it would be good for CPs to be able to schedule async flushes >> and compactions, poll or wait for completion of a specific request, or >> wait >> for all pending flushes and compactions to complete. There is a Phoenix >> use >> case for this in indexing. >> >> Security coprocessors use MasterServices to create their system tables. >> Maybe this can be replaced by using the normal admin API for same. >> >> When removing access to internal services, consider if there are client >> API >> equivalents that the CP can use, and if embedded calls to such client APIs >> from the coprocessor context would be a good idea. CP invocation of an >> internal service is simply an in-process method call. That's good and bad, >> right? The bad part, direct access, is the thing we want to restrict, the >> motivation for this work (in part). But the good thing is it avoids a lot >> of the fat client logic unnecessary for all-in-process service invocation, >> which might even not work correctly. Removing everything is as drastic as >> allowing CPs access to everything. It could be fine to drastically pare >> down, but please consider it. >> >> Some changes have been proposed that removes access to metrics (e.g. >> RegionMetrics, MasterMetrics). Right now coprocessors can bypass core >> function and replace it. Until and unless we remove the bypass semantic >> (under discussion) we should continue to allow CPs access to metrics >> objects so they can update metrics as expected by admins and users when >> replacing functionality (via bypass). Metrics are a public facing API. I >> agree this is kind of dodgy. I believe we should remove the bypass >> semantic. Once that is done, coprocessors can only mix in additional >> functionality. No more cause to touch core metrics. They can export their >> own metrics if so desired. >> >> >> On Wed, Oct 4, 2017 at 3:15 PM, Stack <st...@duboce.net> wrote: >> >> >> A bunch of us are making good progress on the next alpha release, >> >>> hbase-2.0.0-alpha-4. The theme for this release is "Fixing the >>> Coprocessor >>> API", mostly undoing access accidentally granted Coprocessors. I'm >>> talking >>> out loud about a particularly awkward item here rather than in a comment >>> up >>> in JIRA so the airing sees a broader audience. Interested in any opinions >>> or input you might have. >>> >>> TL;DR MasterService/RegionServerService and Region, etc., Interfaces >>> were >>> overloaded serving two, disparate roles; a load of refactoring has to be >>> done to undo the damage. Suggestions for how to avoid making same mistake >>> in future? >>> >>> I'm working on "HBASE-12260 MasterServices - remove from coprocessor API >>> (Discuss)". MasterServices started out as a subset of the Master >>> functionality. The idea back then was that certain Services and Managers >>> could make do w/ less-than-full-access to the HMaster process. If so, we >>> could test the Service and Manager without having to standup a full >>> HMaster >>> instance (This usually required our putting up a cluster too). If >>> MasterServices had but a few methods, a Mock would be easy, making >>> testing >>> easier still. We did the same thing on the RegionServer side where we had >>> RegionServerServices. >>> >>> MasterServices (and RegionServerServices) were also exposed to >>> Coprocessors. The idea was that CPs could ask-for particular Master >>> functions via MasterService. In this role MasterServices was meant to >>> constrain what CPs had access to. >>> >>> MasterServices therefore had two consumers; one internal, the other not. >>> >>> With time, MasterServices got fat as internal Services and Managers >>> needed >>> more of HMaster. Everytime we added to MasterServices, CPs got access. >>> >>> On survey as part of the recent HBASE-12260 work, it turns out that the >>> bulk of the methods in MasterServices are actually annotated >>> InterfaceAudience.Private; i.e. for internal use only, not for >>> Coprocessors. A brutal purge of Private audience objects, makes for a >>> MasterServices we can pass Coprocessors but Coprocessors now have much >>> less >>> facility available (for parts, there are alternatives; Andy review >>> suggests >>> that CPs are severely crimped if this patch goes in). But MasterServices >>> can no longer be used for its original purpose, passing Services and >>> Managers a subset of HMaster. The latter brings on a substantial >>> refactor. >>> >>> Another example of the double-role problem outlined above was found by >>> Duo >>> and Anoop in the RegionServer Coprocessor refactor salt mine. They hit a >>> similar tangle. There was the RegionServerServices <=> MasterServices >>> case >>> but also the exposure of HRegion internals. In this latter, Region was >>> introduced by Andy EXPLICITLY as a subset of HRegion facility FOR >>> Coprocessors. Subsequently, we all confused his original intent and went >>> ahead and thought of Region (as opposed to HRegion) as an Interface for >>> HRegion and plumbed it throughout the code base in place of explicit >>> HRegion references. As Region picked up functionality, Coprocessors >>> gained >>> more access. >>> >>> The refactoring pattern that has emerged out of the RegionServer-side >>> refactoring (which is ahead of the Master-work), is that we move to use >>> the >>> HRegion implementation everywhere internally, undoing use of Region >>> Interface; Region Interface picks up a "FOR COPROCESSORS ONLY" stamp. I'm >>> following suit on the Master side moving to use HMaster in place of >>> MasterServices in all launched Services and Managers. >>> >>> How do we avoid this mistake in future? Should we rename Region as >>> CoprocessorRegion so it more plain that its consumer is Coprocessors? >>> Ditto >>> on MasterServices? >>> >>> Thanks, >>> S >>> >>> >> >> >>