Good ol' software engineering practices are certainly the core of what I intend, though I am also raising the question of whether we want to consign some defined set of things that plug in to the far side of those APIs, and whether that entails a more explicit notion of what constitutes a "plugin".
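
To make the two ideas concrete, here is a minimal Java sketch, not Solr code: a class loaded dynamically by a configured name, which only ever sees a narrow facade interface (the "far side" of the API). The names ClusterView, PlacementPlugin, FirstNodePlugin and PluginLoadingSketch are all made up for illustration, and a real facade would obviously expose far more than a list of nodes.

```java
import java.util.List;

/** Hypothetical facade: the only view of the cluster a plug-in is given. */
interface ClusterView {
    List<String> liveNodes();
}

/** Hypothetical plug-in contract living on the far side of the facade. */
interface PlacementPlugin {
    String pickNode(ClusterView cluster);
}

/** Sample plug-in; it can reach ClusterView and nothing else. */
class FirstNodePlugin implements PlacementPlugin {
    @Override
    public String pickNode(ClusterView cluster) {
        return cluster.liveNodes().get(0);
    }
}

class PluginLoadingSketch {
    /** Dynamic loading: the class is resolved from a name, not referenced statically. */
    static PlacementPlugin load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (PlacementPlugin) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load plug-in " + className, e);
        }
    }

    public static void main(String[] args) {
        ClusterView view = () -> List.of("node1:8983", "node2:8983");
        // In real use the name would come from configuration; it is derived
        // from the class here only to keep the sketch self-contained.
        PlacementPlugin plugin = load(FirstNodePlugin.class.getName());
        System.out.println(plugin.pickNode(view)); // prints node1:8983
    }
}
```

The question I am raising is whether "plugin" should mean anything that goes through something like load(), or only the things we deliberately restrict to something like ClusterView.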

On Mon, Jul 27, 2020 at 9:18 AM David Smiley <dsmi...@apache.org> wrote:

> To everyone and especially Gus: I think the "plugin" word in this thread is basically a stop-word to the intent/scope of the thread. A plugin to Solr both has been and will be nothing more than a class that's loaded *dynamically* by a configurable name -- as opposed to a class within Solr that isn't pluggable (*statically* referenced). Whether a class is statically loaded or dynamically loaded, it has some sort of API to itself where it receives and provides other abstractions provided by Solr. I *think* what's being proposed in this thread are some better higher level abstractions within Solr that could be used to hide implementation details that are found in some APIs currently in Solr. Good ol' software engineering practices. Am I missing something?
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Sun, Jul 26, 2020 at 6:11 PM Varun Thacker <va...@vthacker.in> wrote:
>
>> On Sun, Jul 26, 2020 at 1:05 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>>
>>> Varun, you're correct.
>>> This PR was built based on what's needed for creation (easiest starting point for me and likely most urgent need). It's still totally WIP and following steps include building the API required for move and other placement based needs, then also everything related to triggers (see the Jira).
>>>
>>> Collection API commands (Solr provided implementation, not a plug-in) will build the requests they need, then call the plug-in (custom one or a default one), and use the returned "work items" (more types of work items will be introduced of course) to do the job (know where to place or where to move or what to remove or add etc.)
>>>
>>
>> This sounds perfect!
>>
>> I'd be interested to see how we can use SamplePluginMinimizeCores for, say, create collection but use FooPluginMinimizeLoad for add-replica
>>
>>> Ilan
>>>
>>> On Sun, Jul 26, 2020 at 04:13, Varun Thacker <va...@vthacker.in> wrote:
>>>
>>>> Hi Ilan,
>>>>
>>>> I like where we're going with https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am wrong, but my understanding of this PR is we're defining the interfaces for creating policies
>>>>
>>>> What's not clear to me is how will existing collection APIs like create-collections/add-replica etc make use of it? Is that something that has been discussed somewhere that I could read up on?
>>>>
>>>> On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>>
>>>>> Thanks Gus!
>>>>> This makes a lot of sense but significantly increases IMO the scope and effort to define an "Autoscaling" framework interface.
>>>>>
>>>>> I'd be happy to try to see what concepts could be shared and how a generic plugin facade could be defined.
>>>>>
>>>>> What are the other types of plugins that would share such a unified approach? Do they already exist under another form or are just projects at this stage, like Autoscaling plugins?
>>>>>
>>>>> But... Assuming this is the first "facade" layer to be defined between Solr and external code, it might be hard to make it generic and get it right. There's value in starting simple, understanding the tradeoffs and generalizing later.
>>>>>
>>>>> Also I'd like to make sure we're not paying a performance "genericity tax" in Autoscaling for unneeded features.
>>>>>
>>>>> Ilan
>>>>>
>>>>> On Sat, Jul 25, 2020 at 16:02, Gus Heck <gus.h...@gmail.com> wrote:
>>>>>
>>>>>> Scanned through the PR and read some of this thread. I likely have missed much other discussion, so forgive me if I'm dredging up some things that are already discussed elsewhere.
>>>>>>
>>>>>> The idea of designing the interfaces defining what information is available seems good here, but I worry that it's too auto-scaling focused. In my imagination, I would see solr having a standard informational interface that is useful to any plugin of any sort. Autoscaling should be leveraging that and we should be enhancing that to enable autoscaling. The current state of the system is one key type of information, but another type of information that should exist within solr and be exposed to plugins (including autoscaling) is events. When a new node joins there should be an event for example so that plugins can listen for that rather than incessantly polling and comparing the list of 100 nodes to a cached list of 100 nodes.
>>>>>>
>>>>>> In the PR I see a bunch of classes all off in a separate package, which looks like an autoscaling fiefdom which will be tempted if not forced to duplicate lots of stuff relative to other plugins and/or core.
>>>>>>
>>>>>> As a side note I would think the metrics system could be a plugin that leverages the same set of informational interfaces....
>>>>>>
>>>>>> So there should be 3 parts to this as I imagine it.
>>>>>>
>>>>>> 1) Enhancements to the **plugin system** that make information about the cluster available to ALL plugins
>>>>>> 2) Enhancements to the **plugin system** APIs provided to ALL plugins that allow them to mutate solr safely.
>>>>>> 3) A plugin that we intend to support for our users currently using auto scaling utilizes the enhanced information to provide a similar level of functionality as is *promised* by our current documentation of autoscaling, there might be some gaps or differences but we should be discussing what they are and providing recommended workarounds for users that relied on those promises to the users. Even if there were cases where we failed to deliver, if there were at least some conditions under which we could deliver the promised functionality those should be supported. Only if we never were able to deliver and it never worked under any circumstance should we rip stuff out entirely.
>>>>>>
>>>>>> Implicit in the above is the concept that there should be a facade between plugins and the core of solr.
>>>>>>
>>>>>> WRT #1 which will necessarily involve information collected from remote nodes, we need to be designing that thinking about what informational guarantees it provides. Latency, consistency, delivery, etc. We also need to think about what is exposed in a read-only fashion vs what plugins might write back to solr. Certainly there will be a lot of information that most plugins ignore, and we might consider having groupings of information and interfaces or annotations that indicate what info is provided, but the simplest default state is to just give plugins a reference to a class that they can use to drill into information about the cluster as needed. (SolrInformationBooth? ... or less tongue in cheek... enhance SolrInfoBean? )
>>>>>>
>>>>>> Finally a fourth thing that occurs to me as I write is we need to consider what information one plugin might make available to the rest of the solr plugins.
>>>>>> This might come later, and is hard because it's very hard to anticipate what info might be generated by unknown plugins in the future.
>>>>>>
>>>>>> So some humorous, not seriously suggested but hopefully memorable class names encapsulating the concepts:
>>>>>>
>>>>>> SolrInformationBooth (place to query)
>>>>>> SolrLoudspeaker (event announcements)
>>>>>> SolrControlLevers (mutate solr cluster)
>>>>>> SolrPluginFacebookPage (info published by the plugin that others can watch)
>>>>>>
>>>>>> The "facade" provided to plugins by the plugin system should grow and expand such that more and more plugins can rely on it. This effort should grow it enough to move autoscaling onto it without dropping (much) functionality that we've previously published.
>>>>>>
>>>>>> -Gus
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>>>>>
>>>>>>> Not clear to me what type of "alternative proposal" you're thinking of Jan
>>>>>>>
>>>>>>> That would be the responsibility of Noble and others who have concerns to detail - and try to convince other peers.
>>>>>>> It’s hard for me as a spectator to know whether to agree with Noble without a clear picture of what the alternative API or approach would look like.
>>>>>>> I’m often a fan of loosely typed APIs since they tend to cause less boilerplate code, but strong typing may indeed be a sound choice in this API.
>>>>>>>
>>>>>>> Jan Høydahl
>>>>>>>
>>>>>>> On Jul 24, 2020, at 01:44, Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>>>>>
>>>>>>> In my opinion we have to (and therefore will) ship at least a basic prod ready implementation on top of the API that does simple things (not sure about rack, but for example balance cores and disk size without co locating replicas of same shard on same node).
>>>>>>> Without such an implementation, I suspect adoption will be low. Moreover, it's always a lot more friendly to start coding from a working example than from scratch.
>>>>>>>
>>>>>>> Not clear to me what type of "alternative proposal" you're thinking of Jan. Alternative API proposal? Alternative approach to replace Autoscaling?
>>>>>>>
>>>>>>> Ilan
>>>>>>>
>>>>>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan....@cominvent.com> wrote:
>>>>>>>
>>>>>>>> Important discussion indeed.
>>>>>>>>
>>>>>>>> I don’t have time to dive deep into the PR or make up my mind whether there is a simpler and more future proof way of designing these APIs. But I understand that autoscaling is a complex beast and it is important we get it right.
>>>>>>>>
>>>>>>>> One question regarding having to write code vs config. Is the plan to ship some very simple light weight default placement rules ootb that gives 80% of users what they need with simple config, or would every user need to write code to e.g. spread replicas across hosts/racks? I’d be interested in seeing an alternative proposal laid out, perhaps not in code but with a design that can be compared and discussed.
>>>>>>>>
>>>>>>>> Jan Høydahl
>>>>>>>>
>>>>>>>> On Jul 23, 2020, at 17:53, Houston Putman <houstonput...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I think this is a valid thing to discuss on the dev list, since this isn't just about code comments.
>>>>>>>> It seems to me that Ilan wants to discuss the philosophy around how to design plugins and the interfaces in Solr which the plugins will talk to.
>>>>>>>> This is broad and affects much more than just the Autoscaling framework.
>>>>>>>>
>>>>>>>> As a community & product, we have so far agreed that Solr should be lighter weight and additional features should live in plugins that are managed separately from Solr itself.
>>>>>>>> At that point we need to think about the lifetime and support of these plugins. People love to refactor stuff in the solr core, which before plugins wasn't a large issue.
>>>>>>>> However if we are now intending for many customers to rely on plugins, then we need to come up with standards and guarantees so that these plugins don't:
>>>>>>>>
>>>>>>>> - Stall people from upgrading Solr (minor or major versions)
>>>>>>>> - Hinder the development of Solr Core
>>>>>>>> - Cause us more headaches trying to keep multiple repos of plugins up to date with recent versions of Solr
>>>>>>>>
>>>>>>>> I am not completely sure where I stand right now, but this is definitely something that we should be thinking about when migrating all of this functionality to plugins.
>>>>>>>>
>>>>>>>> - Houston
>>>>>>>>
>>>>>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <is...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> I think we should move the discussion back to the PR because it has more context and inline comments are possible. Having this discussion in 4 places (jira, pr, slack and dev list) is very hard to keep track of.
>>>>>>>>>
>>>>>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilans...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> [I’m moving a discussion from the PR <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613 <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list for a wider audience.
>>>>>>>>>> This is about replacing the now (in master) gone Autoscaling framework with a way for clients to write their customized placement code]
>>>>>>>>>>
>>>>>>>>>> It took me a long time to write this mail and it's quite long, sorry.
>>>>>>>>>> Please anybody interested in the future of Autoscaling (not only those I cc'ed) do read it and provide feedback. Very impacting decisions have to be made now.
>>>>>>>>>>
>>>>>>>>>> Thanks Noble for your feedback.
>>>>>>>>>> I believe it is important that we are aligned on what we build here, esp. at the early defining stages (now).
>>>>>>>>>>
>>>>>>>>>> Let me try to elaborate on your concerns and provide in general the rationale behind the approach.
>>>>>>>>>>
>>>>>>>>>> *> Anyone who wishes to implement this should not require to learn a lot before even getting started*
>>>>>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard, Replica) and basic notions related to Autoscaling (getting variables representing current state to make decisions), there’s not much to learn. The framework uses the same concepts, often with the same names.
>>>>>>>>>>
>>>>>>>>>> *> I don't believe we should have a set of interfaces that duplicate existing classes just for this functionality.*
>>>>>>>>>> Where appropriate we can have existing classes be the implementations for these interfaces and be passed to the plugins, that would be perfectly ok. The proposal doesn’t include implementations at this stage, therefore there’s no duplication, or not yet... (we must get the interfaces right and agreed upon before implementation).
>>>>>>>>>> If some interface methods in the proposal have a different name from equivalent methods in internal classes we plan to use, of course let's rename one or the other.
>>>>>>>>>>
>>>>>>>>>> Existing internal abstractions are most of the time concrete classes and not interfaces (Replica, Slice, DocCollection, ClusterState). Making these visible to contrib code living elsewhere is making future refactoring hard and contrib code will most likely end up reaching to methods it shouldn’t be using. If we define a clean set of interfaces for plugins, I wouldn’t hesitate to break external plugins that reach out to other internal Solr classes, but will make everything possible to keep the API backward compatible so existing plugins can be recompiled without change.
>>>>>>>>>>
>>>>>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>>>>>> I don’t consider the number of classes or interfaces a metric of complexity or of engineering quality. There are sample <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c> plugin implementations to serve as a base for plugin writers (and for us defining this framework) and I believe the process is relatively simple. Trying to do the same things with existing Solr classes might prove a lot harder (but might be worth the effort for comparison purposes to make sure we agree on the approach?
>>>>>>>>>> For example, getting sister replicas of a given replica in the proposed API is: replica.getShard() <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27> .getReplicas() <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>. Doing so with the internal classes likely involves getting the DocCollection and Slice name from the Replica, then get the DocCollection from the cluster state, there get the Slice based on its name and finally getReplicas() from the Slice). I consider the role of this new framework is to make life as easy as possible for writing placement code and the like, make life easy for us to maintain it, make it easy to write a simulation engine (should be at least an order of magnitude simpler than the previous one), etc.
>>>>>>>>>>
>>>>>>>>>> An example regarding readability and number of interfaces: rather than defining an enum with runtime annotation for building its instances (Variable.Type <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>) and then very generic access methods, the proposal defines a specific interface for each “variable type” (called properties <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>>>>>> Rather than concatenating strings to specify the data to return from a remote node (based on snitches <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>, see doc <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>), the proposal is explicit and strongly typed (here <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb> example to get a specific system property from a node). This definitely does increase the number of interfaces, but reduces IMO the effort to code to these abstractions and provides a lot more compile time and IDE assistance.
>>>>>>>>>>
>>>>>>>>>> Goal is to hide all the boilerplate code and machinery (and to a point - complexity) in the implementations of these interfaces rather than have each plugin writer deal with the same problems.
>>>>>>>>>>
>>>>>>>>>> We’re moving from something that was complex and hard to read and debug yet functionally extremely rich, to something simpler for us, more demanding for users (write code rather than policy config if there's a need for new behavior) but that should not be less "expressive" in any significant way. One could even imagine reimplementing the former Autoscaling config Domain Specific Language on top of these APIs (maybe as a summer internship project :)
>>>>>>>>>>
>>>>>>>>>> *> This is a common mistake that we all do. When we design a feature we think that is the most important thing.*
>>>>>>>>>> If by *"most important thing"* you mean investing the best reasonable effort to do things right then yes.
>>>>>>>>>> If you mean trying to make a minor feature look more important and inflated than it is, I disagree.
>>>>>>>>>> As a personal note, replica placement is not the aspect of SolrCloud I'm most interested in, but the first bottleneck we hit when pushing the scale of SolrCloud. I approach this with a state of mind "let's do it right and get it out of the way" to move to topics I really want to work on (around distribution in SolrCloud and the role of Overseer). Implementing Autoscaling in a way that simplifies future refactoring (or that does not make them harder than they already are) is therefore *very high* on my priority list, to support modest changes (Slice to Shard renaming) and more ambitious ones (replacing Zookeeper, removing Overseer, you name it).
>>>>>>>>>>
>>>>>>>>>> Thanks for reading, again sorry for the long email, but I hope this helps (at least helps the discussion),
>>>>>>>>>> Ilan
>>>>>>>>>>
>>>>>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notificati...@github.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I don't believe we should have a set of interfaces that duplicate existing classes just for this functionality. This is a common mistake that we all do. When we design a feature we think that is the most important thing. We end up over designing and over engineering things. This feature will remain a tiny part of Solr. Anyone who wishes to implement this should not require to learn a lot before even getting started. Let's try to have a minimal set of interfaces so that people who try to implement them do not have a huge learning curve.
>>>>>>>>>>>
>>>>>>>>>>> Let's try to understand the requirement
>>>>>>>>>>>
>>>>>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>>>>>> - The implementation wants to know what is the current state of the cluster so that it can make those decisions
>>>>>>>>>>>
>>>>>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You are receiving this because you authored the thread.
>>>>>>>>>>> Reply to this email directly, view it on GitHub <https://github.com/apache/lucene-solr/pull/1684#issuecomment-662837142>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AKIOMCFT5GU2II347GZ4HTTR47IVTANCNFSM4PC3HDKQ>.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> http://www.needhamsoftware.com (work)
>>>>>> http://www.the111shift.com (play)
>>>>>>
>>>>>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
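
P.S. For the "events rather than polling" point in my earlier mail quoted above, a rough Java sketch of what I have in mind. Everything here is hypothetical (NodeJoinListener, NodeEventAnnouncer and EventSketch do not exist in Solr); it only shows the shape of a push-style API where the core announces a node joining and plugins listen, instead of each plugin diffing the node list on a timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener a plug-in would register; not an existing Solr interface. */
interface NodeJoinListener {
    void onNodeJoined(String nodeName);
}

/** Hypothetical announcer (the "SolrLoudspeaker" idea): pushes events to listeners. */
class NodeEventAnnouncer {
    // CopyOnWriteArrayList lets listeners register while an announcement is in progress.
    private final List<NodeJoinListener> listeners = new CopyOnWriteArrayList<>();

    void register(NodeJoinListener listener) {
        listeners.add(listener);
    }

    /** Would be called by the core when it learns that a node has joined. */
    void announceNodeJoined(String nodeName) {
        for (NodeJoinListener listener : listeners) {
            listener.onNodeJoined(nodeName);
        }
    }
}

class EventSketch {
    /** Registers one listener, announces one node, returns what the listener saw. */
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        NodeEventAnnouncer announcer = new NodeEventAnnouncer();
        announcer.register(seen::add);
        announcer.announceNodeJoined("node3:8983");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [node3:8983]
    }
}
```

Delivery guarantees (ordering, missed events while a plugin is down) are exactly the kind of thing the facade would need to spell out; the sketch ignores them.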