On Sun, Jul 26, 2020 at 1:05 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
> Varun, you're correct.
> This PR was built based on what's needed for creation (easiest starting
> point for me and likely most urgent need). It's still totally WIP and
> following steps include building the API required for move and other
> placement based needs, then also everything related to triggers (see the
> Jira).
>
> Collection API commands (Solr provided implementation, not a plug-in) will
> build the requests they need, then call the plug-in (custom one or a default
> one), and use the returned "work items" (more types of work items will be
> introduced of course) to do the job (know where to place or where to move
> or what to remove or add etc.)
>

This sounds perfect! I'd be interested to see how we can use
SamplePluginMinimizeCores for, say, create collection but use
FooPluginMinimizeLoad for add-replica.

>
> Ilan
>
> On Sun, Jul 26, 2020 at 04:13, Varun Thacker <va...@vthacker.in> wrote:
>
>> Hi Ilan,
>>
>> I like where we're going with
>> https://github.com/apache/lucene-solr/pull/1684 . Correct me if I am
>> wrong, but my understanding of this PR is we're defining the interfaces for
>> creating policies
>>
>> What's not clear to me is how existing collection APIs like
>> create-collections/add-replica etc will make use of it. Is that something
>> that has been discussed somewhere that I could read up on?
>>
>>
>>
>> On Sat, Jul 25, 2020 at 2:03 PM Ilan Ginzburg <ilans...@gmail.com> wrote:
>>
>>> Thanks Gus!
>>> This makes a lot of sense but significantly increases IMO the scope and
>>> effort to define an "Autoscaling" framework interface.
>>>
>>> I'd be happy to try to see what concepts could be shared and how a
>>> generic plugin facade could be defined.
>>>
>>> What are the other types of plugins that would share such a unified
>>> approach? Do they already exist under another form or are just projects at
>>> this stage, like Autoscaling plugins?
>>>
>>> But... Assuming this is the first "facade" layer to be defined between
>>> Solr and external code, it might be hard to make it generic and get it
>>> right. There's value in starting simple, understanding the tradeoffs and
>>> generalizing later.
>>>
>>> Also I'd like to make sure we're not paying a performance "genericity
>>> tax" in Autoscaling for unneeded features.
>>>
>>> Ilan
>>>
>>> On Sat, Jul 25, 2020 at 16:02, Gus Heck <gus.h...@gmail.com> wrote:
>>>
>>>> Scanned through the PR and read some of this thread. I likely have
>>>> missed much other discussion, so forgive me if I'm dredging up some things
>>>> that are already discussed elsewhere.
>>>>
>>>> The idea of designing the interfaces defining what information is
>>>> available seems good here, but I worry that it's too auto-scaling focused.
>>>> In my imagination, I would see solr having a standard informational
>>>> interface that is useful to any plugin of any sort. Autoscaling should be
>>>> leveraging that and we should be enhancing that to enable autoscaling. The
>>>> current state of the system is one key type of information, but another
>>>> type of information that should exist within solr and be exposed to plugins
>>>> (including autoscaling) is events. When a new node joins there should be an
>>>> event for example so that plugins can listen for that rather than
>>>> incessantly polling and comparing the list of 100 nodes to a cached list of
>>>> 100 nodes.
>>>>
>>>> In the PR I see a bunch of classes all off in a separate package, which
>>>> looks like an autoscaling fiefdom which will be tempted if not forced to
>>>> duplicate lots of stuff relative to other plugins and/or core.
>>>>
>>>> As a side note I would think the metrics system could be a plugin that
>>>> leverages the same set of informational interfaces....
>>>>
>>>> So there should be 3 parts to this as I imagine it.
>>>>
>>>> 1) Enhancements to the **plugin system** that make information about
>>>> the cluster available to ALL plugins
>>>> 2) Enhancements to the **plugin system** APIs provided to ALL plugins
>>>> that allow them to mutate solr safely.
>>>> 3) A plugin that we intend to support for our users currently using
>>>> auto scaling utilizes the enhanced information to provide a similar level
>>>> of functionality as is *promised* by our current documentation of
>>>> autoscaling. There might be some gaps or differences but we should be
>>>> discussing what they are and providing recommended workarounds for users
>>>> that relied on those promises. Even if there were cases where
>>>> we failed to deliver, if there were at least some conditions under which we
>>>> could deliver the promised functionality those should be supported. Only if
>>>> we never were able to deliver and it never worked under any circumstance
>>>> should we rip stuff out entirely.
>>>>
>>>> Implicit in the above is the concept that there should be a facade
>>>> between plugins and the core of solr.
>>>>
>>>> WRT #1 which will necessarily involve information collected from remote
>>>> nodes, we need to be designing that thinking about what informational
>>>> guarantees it provides. Latency, consistency, delivery, etc. We also need
>>>> to think about what is exposed in a read-only fashion vs what plugins might
>>>> write back to solr. Certainly there will be a lot of information that most
>>>> plugins ignore, and we might consider having groupings of information and
>>>> interfaces or annotations that indicate what info is provided, but the
>>>> simplest default state is to just give plugins a reference to a class that
>>>> they can use to drill into information about the cluster as needed.
>>>> (SolrInformationBooth? ... or less tongue in cheek... enhance
>>>> SolrInfoBean?)
>>>>
>>>> Finally a fourth thing that occurs to me as I write is we need to
>>>> consider what information one plugin might make available to the rest of
>>>> the solr plugins. This might come later, and is hard because it's very hard
>>>> to anticipate what info might be generated by unknown plugins in the
>>>> future.
>>>>
>>>> So some humorous, not seriously suggested but hopefully memorable class
>>>> names encapsulating the concepts:
>>>>
>>>> SolrInformationBooth (place to query)
>>>> SolrLoudspeaker (event announcements)
>>>> SolrControlLevers (mutate solr cluster)
>>>> SolrPluginFacebookPage (info published by the plugin that others can
>>>> watch)
>>>>
>>>> The "facade" provided to plugins by the plugin system should grow and
>>>> expand such that more and more plugins can rely on it. This effort should
>>>> grow it enough to move autoscaling onto it without dropping (much)
>>>> functionality that we've previously published.
>>>>
>>>> -Gus
>>>>
>>>> On Fri, Jul 24, 2020 at 4:40 PM Jan Høydahl <jan....@cominvent.com>
>>>> wrote:
>>>>
>>>>> Not clear to me what type of "alternative proposal" you're thinking of
>>>>> Jan
>>>>>
>>>>>
>>>>> That would be the responsibility of Noble and others who have concerns
>>>>> to detail - and try to convince other peers.
>>>>> It’s hard for me as a spectator to know whether to agree with Noble
>>>>> without a clear picture of what the alternative API or approach would look
>>>>> like.
>>>>> I’m often a fan of loosely typed APIs since they tend to cause less
>>>>> boilerplate code, but strong typing may indeed be a sound choice in this
>>>>> API.
>>>>>
>>>>> Jan Høydahl
>>>>>
>>>>> On Jul 24, 2020 at 01:44, Ilan Ginzburg <ilans...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> In my opinion we have to (and therefore will) ship at least a basic
>>>>> prod ready implementation on top of the API that does simple things (not
>>>>> sure about rack, but for example balance cores and disk size without
>>>>> co-locating replicas of same shard on same node).
>>>>> Without such an implementation, I suspect adoption will be low.
>>>>> Moreover, it's always a lot more friendly to start coding from a working
>>>>> example than from scratch.
>>>>>
>>>>> Not clear to me what type of "alternative proposal" you're thinking of
>>>>> Jan. Alternative API proposal? Alternative approach to replace
>>>>> Autoscaling?
>>>>>
>>>>> Ilan
>>>>>
>>>>> On Fri, Jul 24, 2020 at 12:11 AM Jan Høydahl <jan....@cominvent.com>
>>>>> wrote:
>>>>>
>>>>>> Important discussion indeed.
>>>>>>
>>>>>> I don’t have time to dive deep into the PR or make up my mind whether
>>>>>> there is a simpler and more future proof way of designing these APIs.
>>>>>> But I understand that autoscaling is a complex beast and it is
>>>>>> important we get it right.
>>>>>>
>>>>>> One question regarding having to write code vs config. Is the plan to
>>>>>> ship some very simple light weight default placement rules ootb that
>>>>>> give 80% of users what they need with simple config, or would every
>>>>>> user need to write code to e.g. spread replicas across hosts/racks?
>>>>>> I’d be interested in seeing an alternative proposal laid out, perhaps
>>>>>> not in code but with a design that can be compared and discussed.
>>>>>>
>>>>>> Jan Høydahl
>>>>>>
>>>>>> On Jul 23, 2020 at 17:53, Houston Putman <houstonput...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> I think this is a valid thing to discuss on the dev list, since this
>>>>>> isn't just about code comments.
>>>>>> It seems to me that Ilan wants to discuss the philosophy around how
>>>>>> to design plugins and the interfaces in Solr which the plugins will talk
>>>>>> to.
>>>>>> This is broad and affects much more than just the Autoscaling
>>>>>> framework.
>>>>>>
>>>>>> As a community & product, we have so far agreed that Solr should be
>>>>>> lighter weight and additional features should live in plugins that are
>>>>>> managed separately from Solr itself.
>>>>>> At that point we need to think about the lifetime and support of
>>>>>> these plugins. People love to refactor stuff in the solr core, which
>>>>>> before plugins wasn't a large issue.
>>>>>> However if we are now intending for many customers to rely on
>>>>>> plugins, then we need to come up with standards and guarantees so that
>>>>>> these plugins don't:
>>>>>>
>>>>>> - Stall people from upgrading Solr (minor or major versions)
>>>>>> - Hinder the development of Solr Core
>>>>>> - Cause us more headaches trying to keep multiple repos of
>>>>>> plugins up to date with recent versions of Solr
>>>>>>
>>>>>>
>>>>>> I am not completely sure where I stand right now, but this is
>>>>>> definitely something that we should be thinking about when migrating
>>>>>> all of this functionality to plugins.
>>>>>>
>>>>>> - Houston
>>>>>>
>>>>>> On Thu, Jul 23, 2020 at 9:27 AM Ishan Chattopadhyaya <
>>>>>> is...@apache.org> wrote:
>>>>>>
>>>>>>> I think we should move the discussion back to the PR because it has
>>>>>>> more context and inline comments are possible. Having this discussion
>>>>>>> in 4 places (jira, pr, slack and dev list) is very hard to keep track
>>>>>>> of.
>>>>>>>
>>>>>>> On Thu, 23 Jul, 2020, 5:57 pm Ilan Ginzburg, <ilans...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> [I’m moving a discussion from the PR
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684> for SOLR-14613
>>>>>>>> <https://issues.apache.org/jira/browse/SOLR-14613> to the dev list
>>>>>>>> for a wider audience. This is about replacing the now (in master) gone
>>>>>>>> Autoscaling framework with a way for clients to write their customized
>>>>>>>> placement code]
>>>>>>>>
>>>>>>>> It took me a long time to write this mail and it's quite long,
>>>>>>>> sorry.
>>>>>>>> Please anybody interested in the future of Autoscaling (not only
>>>>>>>> those I cc'ed) do read it and provide feedback. High-impact
>>>>>>>> decisions have to be made now.
>>>>>>>>
>>>>>>>> Thanks Noble for your feedback.
>>>>>>>> I believe it is important that we are aligned on what we build
>>>>>>>> here, esp. at the early defining stages (now).
>>>>>>>>
>>>>>>>> Let me try to elaborate on your concerns and provide in general the
>>>>>>>> rationale behind the approach.
>>>>>>>>
>>>>>>>> *> Anyone who wishes to implement this should not require to learn
>>>>>>>> a lot before even getting started*
>>>>>>>> For somebody who knows Solr (what is a Node, Collection, Shard,
>>>>>>>> Replica) and basic notions related to Autoscaling (getting variables
>>>>>>>> representing current state to make decisions), there’s not much to
>>>>>>>> learn.
>>>>>>>> The framework uses the same concepts, often with the same names.
>>>>>>>>
>>>>>>>> *> I don't believe we should have a set of interfaces that
>>>>>>>> duplicate existing classes just for this functionality.*
>>>>>>>> Where appropriate we can have existing classes be the
>>>>>>>> implementations for these interfaces and be passed to the plugins, that
>>>>>>>> would be perfectly ok. The proposal doesn’t include implementations at
>>>>>>>> this stage, therefore there’s no duplication, or not yet... (we must
>>>>>>>> get the interfaces right and agreed upon before implementation). If
>>>>>>>> some interface methods in the proposal have a different name from
>>>>>>>> equivalent methods in internal classes we plan to use, of course let's
>>>>>>>> rename one or the other.
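
The point above, that an existing class can itself be the implementation handed to plugins, might look roughly like the Java sketch below. Every name in it (`PluginReplica`, `InternalReplica`, `SamplePlugin`, `FacadeDemo`) is a hypothetical stand-in chosen for illustration; none of them is taken from PR 1684 or from Solr's real classes.

```java
// Runs the illustration end to end.
final class FacadeDemo {
    public static void main(String[] args) {
        PluginReplica replica = new InternalReplica("core_node3", "host1:8983_solr");
        System.out.println(SamplePlugin.describe(replica));
    }
}

// Hypothetical plugin-facing interface (illustrative; not the one in the PR).
interface PluginReplica {
    String getReplicaName();
    String getNodeName();
}

// Stand-in for an existing internal concrete class (NOT Solr's real Replica).
// It already owns the data, so it implements the interface directly and no
// duplicate wrapper class is needed.
final class InternalReplica implements PluginReplica {
    private final String name;
    private final String nodeName;

    InternalReplica(String name, String nodeName) {
        this.name = name;
        this.nodeName = nodeName;
    }

    @Override public String getReplicaName() { return name; }
    @Override public String getNodeName() { return nodeName; }

    // Internal-only method: it is not part of PluginReplica, so plugin code
    // written against the interface cannot come to depend on it.
    String internalStateDump() {
        return "InternalReplica[" + name + "," + nodeName + "]";
    }
}

// Plugin code compiles against the interface only.
final class SamplePlugin {
    static String describe(PluginReplica replica) {
        return replica.getReplicaName() + " on " + replica.getNodeName();
    }
}
```

Under this arrangement, renaming or removing `internalStateDump()` cannot break a plugin, because only the two interface methods are visible to it.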
>>>>>>>>
>>>>>>>> Existing internal abstractions are most of the time concrete
>>>>>>>> classes and not interfaces (Replica, Slice, DocCollection,
>>>>>>>> ClusterState). Making these visible to contrib code living
>>>>>>>> elsewhere makes future refactoring hard, and contrib code will most
>>>>>>>> likely end up reaching for methods it shouldn’t be using. If we define a
>>>>>>>> clean set of interfaces for plugins, I wouldn’t hesitate to break
>>>>>>>> external plugins that reach out to other internal Solr classes, but
>>>>>>>> will do everything possible to keep the API backward compatible so
>>>>>>>> existing plugins can be recompiled without change.
>>>>>>>>
>>>>>>>> *> 24 interfaces to do this is definitely over engineering*
>>>>>>>> I don’t consider the number of classes or interfaces a metric of
>>>>>>>> complexity or of engineering quality. There are sample
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-ddbe185b5e7922b91b90dfabfc50df4c>
>>>>>>>> plugin implementations to serve as a base for plugin writers (and for
>>>>>>>> us defining this framework) and I believe the process is relatively
>>>>>>>> simple.
>>>>>>>> Trying to do the same things with existing Solr classes might prove a
>>>>>>>> lot harder (but might be worth the effort for comparison purposes to
>>>>>>>> make sure we agree on the approach? For example, getting sister
>>>>>>>> replicas of a given replica in the proposed API is: replica.getShard()
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-a2d49bd52fddde54bb7fd2e96238507eR27>
>>>>>>>> .getReplicas()
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-9633f5e169fa3095062451599daac213R31>.
>>>>>>>> Doing so with the internal classes likely involves getting the
>>>>>>>> DocCollection and Slice name from the Replica, then get the
>>>>>>>> DocCollection from the cluster state, there get the Slice based on
>>>>>>>> its name and finally getReplicas() from the Slice). I consider that the
>>>>>>>> role of this new framework is to make life as easy as possible for
>>>>>>>> writing placement code and the like, make life easy for us to maintain
>>>>>>>> it, make it easy to write a simulation engine (should be at least an
>>>>>>>> order of magnitude simpler than the previous one), etc.
>>>>>>>>
>>>>>>>> An example regarding readability and number of interfaces: rather
>>>>>>>> than defining an enum with runtime annotation for building its
>>>>>>>> instances (Variable.Type
>>>>>>>> <https://github.com/apache/lucene-solr/blob/branch_8_6/solr/solrj/src/java/org/apache/solr/client/solrj/cloud/autoscaling/Variable.java#L98>)
>>>>>>>> and then very generic access methods, the proposal defines a specific
>>>>>>>> interface for each “variable type” (called properties
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4c0fa84354f93cb00e6643aefd00fd3c>).
>>>>>>>> Rather than concatenating strings to specify the data to return from a
>>>>>>>> remote node (based on snitches
>>>>>>>> <https://github.com/apache/lucene-solr/blame/branch_8_6/solr/core/src/java/org/apache/solr/cloud/rule/ImplicitSnitch.java#L60>,
>>>>>>>> see doc
>>>>>>>> <https://lucene.apache.org/solr/guide/8_1/solrcloud-autoscaling-policy-preferences.html#node-selector>),
>>>>>>>> the proposal is explicit and strongly typed (here
>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684/files#diff-4ec32958f54ec8e1f7e2d5ce8de331bb>
>>>>>>>> example
>>>>>>>> to get a specific system property from a node). This definitely does
>>>>>>>> increase the number of interfaces, but reduces IMO the effort to code
>>>>>>>> to these abstractions and provides a lot more compile time and IDE
>>>>>>>> assistance.
>>>>>>>>
>>>>>>>> Goal is to hide all the boilerplate code and machinery (and to a
>>>>>>>> point - complexity) in the implementations of these interfaces rather
>>>>>>>> than have each plugin writer deal with the same problems.
>>>>>>>>
>>>>>>>> We’re moving from something that was complex and hard to read and
>>>>>>>> debug yet functionally extremely rich, to something simpler for us,
>>>>>>>> more demanding for users (write code rather than policy config if
>>>>>>>> there's a need for new behavior) but that should not be less
>>>>>>>> "expressive" in any significant way. One could even imagine
>>>>>>>> reimplementing the former Autoscaling config Domain Specific Language
>>>>>>>> on top of these APIs (maybe as a summer internship project :)
>>>>>>>>
>>>>>>>> *> This is a common mistake that we all do. When we design a
>>>>>>>> feature we think that is the most important thing.*
>>>>>>>> If by *"most important thing"* you mean investing the best
>>>>>>>> reasonable effort to do things right then yes.
>>>>>>>> If you mean trying to make a minor feature look more important and
>>>>>>>> inflated than it is, I disagree.
>>>>>>>> As a personal note, replica placement is not the aspect of
>>>>>>>> SolrCloud I'm most interested in, but the first bottleneck we hit when
>>>>>>>> pushing the scale of SolrCloud. I approach this with a state of mind
>>>>>>>> "let's do it right and get it out of the way" to move to topics I
>>>>>>>> really want to work on (around distribution in SolrCloud and the role
>>>>>>>> of Overseer).
>>>>>>>> Implementing Autoscaling in a way that simplifies future refactorings
>>>>>>>> (or that does not make them harder than they already are) is therefore
>>>>>>>> *very high* on my priority list, to support modest changes (Slice to
>>>>>>>> Shard renaming) and more ambitious ones (replacing Zookeeper,
>>>>>>>> removing Overseer, you name it).
>>>>>>>>
>>>>>>>> Thanks for reading, again sorry for the long email, but I hope this
>>>>>>>> helps (at least helps the discussion),
>>>>>>>> Ilan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu 23 Jul 2020 at 08:16, Noble Paul <notificati...@github.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I don't believe we should have a set of interfaces that duplicate
>>>>>>>>> existing classes just for this functionality. This is a common
>>>>>>>>> mistake that we all do. When we design a feature we think that is the
>>>>>>>>> most important thing. We end up over designing and over engineering
>>>>>>>>> things. This feature will remain a tiny part of Solr. Anyone who
>>>>>>>>> wishes to implement this should not require to learn a lot before
>>>>>>>>> even getting started. Let's try to have a minimal set of interfaces
>>>>>>>>> so that people who try to implement them do not have a huge learning
>>>>>>>>> curve.
>>>>>>>>>
>>>>>>>>> Let's try to understand the requirement
>>>>>>>>>
>>>>>>>>> - Solr wants a set of positions to place a few replicas
>>>>>>>>> - The implementation wants to know what is the current state
>>>>>>>>> of the cluster so that it can make those decisions
>>>>>>>>>
>>>>>>>>> 24 interfaces to do this is definitely over engineering
>>>>>>>>>
>>>>>>>>> View it on GitHub
>>>>>>>>> <https://github.com/apache/lucene-solr/pull/1684#issuecomment-662837142>
>>>>
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
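
The sister-replica comparison in Ilan's message (following a back-reference with replica.getShard().getReplicas() versus resolving collection and slice names through the cluster state) can be sketched in Java as below. This is only an illustration under assumptions: `FacadeShard`, `FacadeReplica`, `NamedReplica` and `LookupClusterState` are hypothetical classes written for this sketch, not the interfaces from PR 1684 and not Solr's Replica, Slice, DocCollection or ClusterState. In both styles the returned list includes the starting replica itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SisterReplicasDemo {
    public static void main(String[] args) {
        // Facade style: one hop through the object graph.
        FacadeShard shard = new FacadeShard("shard1");
        FacadeReplica r1 = shard.addReplica("core_node1");
        shard.addReplica("core_node2");
        System.out.println(r1.getShard().getReplicas().size());

        // Lookup style: the replica only carries names, resolved via cluster state.
        LookupClusterState state = new LookupClusterState();
        NamedReplica n1 = new NamedReplica("core_node1", "books", "shard1");
        state.add(n1);
        state.add(new NamedReplica("core_node2", "books", "shard1"));
        System.out.println(state.sisterReplicas(n1).size());
    }
}

// Hypothetical facade types: a replica keeps a reference to its shard.
final class FacadeShard {
    private final String name;
    private final List<FacadeReplica> replicas = new ArrayList<>();

    FacadeShard(String name) { this.name = name; }

    FacadeReplica addReplica(String replicaName) {
        FacadeReplica replica = new FacadeReplica(replicaName, this);
        replicas.add(replica);
        return replica;
    }

    String getName() { return name; }
    List<FacadeReplica> getReplicas() { return Collections.unmodifiableList(replicas); }
}

final class FacadeReplica {
    private final String name;
    private final FacadeShard shard;

    FacadeReplica(String name, FacadeShard shard) {
        this.name = name;
        this.shard = shard;
    }

    String getName() { return name; }
    FacadeShard getShard() { return shard; }
}

// Hypothetical name-based types: the replica knows only names.
final class NamedReplica {
    final String name;
    final String collectionName;
    final String sliceName;

    NamedReplica(String name, String collectionName, String sliceName) {
        this.name = name;
        this.collectionName = collectionName;
        this.sliceName = sliceName;
    }
}

final class LookupClusterState {
    // collection name -> slice name -> replicas
    private final Map<String, Map<String, List<NamedReplica>>> collections = new HashMap<>();

    void add(NamedReplica replica) {
        collections.computeIfAbsent(replica.collectionName, k -> new HashMap<>())
                   .computeIfAbsent(replica.sliceName, k -> new ArrayList<>())
                   .add(replica);
    }

    // Names from the replica, then collection, then slice, then its replicas.
    List<NamedReplica> sisterReplicas(NamedReplica replica) {
        Map<String, List<NamedReplica>> slices = collections.get(replica.collectionName);
        if (slices == null) {
            return Collections.emptyList();
        }
        List<NamedReplica> found = slices.get(replica.sliceName);
        return found == null ? Collections.emptyList() : Collections.unmodifiableList(found);
    }
}
```

The sketch is meant to show only the difference in call-site effort: the facade version needs no access to a cluster-state object at the call site, while the lookup version must be handed one and has to deal with missing collections or slices.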