WRT the JettySolrRunner tangent, I wonder if our lack of a web.xml and thus lack of metadata-complete=true in web.xml means that we wind up scanning for annotations?
On Tue, Aug 17, 2021 at 10:06 AM Gus Heck <[email protected]> wrote: > https://issues.apache.org/jira/browse/SOLR-15590 > > On Tue, Aug 17, 2021 at 9:44 AM Gus Heck <[email protected]> wrote: > >> Ok well it would be interesting to compare quickstarting vs >> JettySolrRunner, and reading up on quickstart >> <https://webtide.com/jetty-9-quick-start/> gives ideas about >> pre-building the quickstart xml, but web.xml for JettySolrRunner is a >> tangent. Either way we are still in a container and I think I hear some >> agreement that something should be done about the dispatch here, and both >> of you seem to agree that an actual servlet would make more sense than a >> filter, so I'll make a ticket/pr to make it easier to track & easier to >> read the code. >> >> On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <[email protected]> >> wrote: >> >>> The downside to respecting web.xml and making JettySolrServer serve a >>> webapp is that loading a webapp is very expensive and slow in comparison. >>> JettySolrServer actually starts up extremely quickly. It’s almost more >>> appealing to change the Server to use the JettySolrServer strategy. It’s so >>> slow to load a webapp because of all the things it needs to support and >>> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty >>> QuickStart does improve the situation if used at least. But there is really >>> no need to eat the whole webapp standard for a non webapp app to have Jetty >>> manage more of the dispatching. I always wondering about the motivation / >>> upside of hacking in a straight servlet myself. But for a non webapp stuck >>> in a servlet container, it’s actually a beautiful move that side steps a >>> bunch of slow crap even better than the QuickStart pushes ever will, and >>> still allows for matching any support we would want. >>> >>> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I >>> made a slim SolrCall base class that each extends. All that logic is hairy >>> enough to follow without them interleaving and sharing in a kind of wild >>> collaboration. It’s pretty unfortunate they both still exist. >>> >>> You are right, leaving that path in that state is a poor idea. If there >>> was anything that could be considered the “hot” path, it’s right there - >>> feverishly checking and ripping up streams for anyone and anything. Is a >>> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path >>> down? Do a dance? Treat the string like a date and see if that works. >>> >>> So yeah, agreed, nobody does or would dispatch this way, you have to >>> frog boil into it. It’s slow, almost incomprehensible, and incredibly good >>> at holding onto life, even building on it. But being a webapp and parsing >>> web.xml has little to do with dispatching and having sensible http api that >>> doesn’t work like it’s a perl compiler. >>> >>> Anyway, I can’t imagine trying to trace through that code with any solid >>> feeling about having a handle on it via inspection. But just like Perl, if >>> you debug / trace log the hell out of it, it is all actually pretty simple >>> to adjust and reign in. >>> >>> >>> MRM >>> >>> >>> >>> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <[email protected]> wrote: >>> >>>> So I'm not interested in *adhering* to anything here, just using stuff >>>> where it helps... I see tools sitting on the shelf, already bought and paid >>>> for and carried with us to every job but then they sit in the back of the >>>> truck mostly unused... and (I think) they look useful. >>>> >>>> This should not be seen as in any way advocating a return to war file >>>> deployment or "choose your container". All of what I would suggest should >>>> likely be compatible with a jetty startup controlled by our code (I assume >>>> we can convince it to read web.xml properly when doing so). Certainly point >>>> out anything that would interfere with that. >>>> >>>> Here we sit in a container that is a tool box containing: >>>> >>>> - A nice mechanism for initializing stuff at the application level, >>>> (ServletContextListener) and a well defined guarantee that it runs >>>> before >>>> any filter or servlet, >>>> - An automatic shutdown mechanism that it will call for us on >>>> graceful shutdown (ServletContextListner again). >>>> - A nice layered filter chain mechanism which could have been used >>>> to layer in things like tracing and authentication, and close shields >>>> etc >>>> as small succinct filters rather than weaving them into an ever more >>>> complex filter & call class. >>>> - In more recent versions of the spec, for listeners defined in >>>> web.xml the order is also guaranteed. >>>> - Servlet classes that are *already* set up to distinguish http >>>> verbs automatically when desired >>>> >>>> So why is this better? Because monster classes that do a hundred things >>>> are really hard to understand and maintain. Small methods, small classes >>>> whenever possible. I also suspect that there may be some gains in >>>> performance to be had if we rely more on the container (which will already >>>> be dispatching based on path) to choose our code paths (at least at a >>>> coarse level) and then have less custom dispatch logic executed on *every* >>>> request >>>> >>>> Obviously I'm wrong and if the net result is less performant to any >>>> significant degree forget it that wouldn't be worth it. (wanted: >>>> standardized solr benchmarks) >>>> >>>> There WILL be complications with v2 because it is a subclass of >>>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it >>>> should be a separate servlet from v1, but we don't want to duplicate code >>>> either... so work to do there... >>>> >>>> I think an incremental approach is necessary since very few of us have >>>> the bandwidth to more than that and certainly it becomes difficult to find >>>> anyone with the time to review large changes. What I have thus far is >>>> stable with respect to the tests, and simplifies some stuff already which >>>> is why I chose this point to start the discussion >>>> >>>> But yeah, look at what I did and say what you like and what you don't. >>>> :). >>>> >>>> -Gus >>>> >>>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <[email protected]> >>>> wrote: >>>> >>>>> Sounds like an interesting adventure last weekend. >>>>> >>>>> I'm unclear what the point of going this direction is; my instinct is >>>>> to go the opposite direction. You seem to suggest there are some >>>>> simplification/organization benefits, which I love, so I'll need to look >>>>> at >>>>> what you've done to judge that for myself. Yes Jetty supports the Servlet >>>>> spec but we need not embrace it. Adhering to that is useful if you have a >>>>> generic web app that can be deployed to a container of the user's >>>>> convenience/choosing. No doubt this is why Solr started this way, and why >>>>> the apps I built in my early days adhered to that spec. But since 6.0, we >>>>> view Solr as self-contained and more supportable if the project makes >>>>> these >>>>> decisions and thus not needlessly constrain itself as well. >>>>> >>>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* >>>>> and not a *Servlet* itself. >>>>> >>>>> Also, I suspect there may be complications in changes here relating to >>>>> Solr's v1 vs v2 API. And most definitely also what you discovered -- >>>>> JettySolrRunner. >>>>> >>>>> ~ David Smiley >>>>> Apache Lucene/Solr Search Developer >>>>> http://www.linkedin.com/in/davidwsmiley >>>>> >>>>> >>>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <[email protected]> wrote: >>>>> >>>>>> *TLDR:* I've got a working branch >>>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where >>>>>> CoreContainer & our startup process is extracted from SolrDispatch >>>>>> Filter. >>>>>> Do other folks think that is interesting enough that I should make a JIRA >>>>>> and/or PR with intention to make this change? >>>>>> >>>>>> *Details:* >>>>>> >>>>>> Jetty is a servlet container, yet we more or less ignore it by >>>>>> stuffing everything into a single filter (almost, admin UI is served >>>>>> separately). I'm sure there are lots of historical reasons for this, >>>>>> probably including servlet containers and their specs were much less >>>>>> mature >>>>>> when solr was first started. Maybe also the early authors were more >>>>>> focused >>>>>> on making search work than leveraging what the container could do for >>>>>> them >>>>>> (but I wasn't there for that so that's just a guess). >>>>>> >>>>>> The result is that we have a couple of very large classes that are >>>>>> touched by almost every request, and they have a lot of conditional logic >>>>>> trying to decide what the user is asking. Normally this sort of dispatch >>>>>> is >>>>>> done by the container based on the request URL to direct it to the >>>>>> appropriate servlet. Solr does a LOT of different things so this code is >>>>>> extremely tricky and complex to understand. Specifically, I'm speaking of >>>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being >>>>>> two classes probably makes them harder to understand. >>>>>> >>>>>> It seems to me (and this mail is asking if you agree) that these >>>>>> classes are long overdue for some subdivision. The most obvious thing to >>>>>> pull out is all the admin calls. Admin code paths really have little or >>>>>> nothing to do with query or update code paths since there are no >>>>>> documents >>>>>> to route or sub-requests to some subset of nodes. >>>>>> >>>>>> The primary obstacle to any such separation and simplification is >>>>>> that most requests have some interaction with CoreContainer, and the >>>>>> things >>>>>> it holds, and this is initialized and held by a field in >>>>>> SolrDispatchFilter. After spending a significant chunk of time reading >>>>>> this >>>>>> code in the prior weeks and a timely and motivating conversation with >>>>>> Eric >>>>>> Pugh, I dumped a chunk of my weekend into an experiment to see if I could >>>>>> pull CoreContainer out of the dispatch filter, and leverage the >>>>>> facilities >>>>>> of our servlet container. >>>>>> >>>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was >>>>>> very confusing, and worrisome since I've realized it's not respecting our >>>>>> web.xml at all, and any configuration in web.xml needs to be duplicated >>>>>> for >>>>>> our tests in JettySolrRunner (tangent alert) >>>>>> >>>>>> The result is that CoreContainer is now held by a class called >>>>>> CoreService (please help me name things if you don't like my names :) ). >>>>>> CoreService is a ServletContextListener, appropriately configured in >>>>>> web.xml, and has a static method that can be used to get a reference to >>>>>> the >>>>>> CoreContainer corresponding to the ServletContext in which code wanting a >>>>>> core container is running (this supports having multiple JettySolrRunners >>>>>> in tests, though probably never has more than one CoreContainer in the >>>>>> running application) >>>>>> >>>>>> I achieved this in 4 stages shown here: >>>>>> https://github.com/gus-asf/solr/tree/servlet-solr >>>>>> >>>>>> Ignore the AdminServlet class, it's a placeholder, and can be >>>>>> subtracted without harm. >>>>>> >>>>>> Since the current state of the code in that branch is apparently >>>>>> test-stable (4 runs of check in a row passing, none slower than any run >>>>>> of >>>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on >>>>>> the >>>>>> machine)... >>>>>> >>>>>> Do we want to push this refactor in now to avoid making a huge ball >>>>>> of changes that gets harder and harder to merge? The next push point >>>>>> would >>>>>> probably be when AdminServlet was functional (if/when that happens) (and >>>>>> we >>>>>> could not push that class for now). >>>>>> >>>>>> If you read this far, thanks :) I wasn't sure how feasible this would >>>>>> be so I felt the need to prove it to my self in code before wasting your >>>>>> time, but please don't hesitate to point to costs I might be missing or >>>>>> anything that looks like a mistake, or say why this was a total waste of >>>>>> time :) >>>>>> >>>>>> Thoughts? >>>>>> >>>>>> -Gus >>>>>> -- >>>>>> http://www.needhamsoftware.com (work) >>>>>> http://www.the111shift.com (play) >>>>>> >>>>> >>>> >>>> -- >>>> http://www.needhamsoftware.com (work) >>>> http://www.the111shift.com (play) >>>> >>> -- >>> - Mark >>> >>> http://about.me/markrmiller >>> >> >> >> -- >> http://www.needhamsoftware.com (work) >> http://www.the111shift.com (play) >> > > > -- > http://www.needhamsoftware.com (work) > http://www.the111shift.com (play) > -- http://www.needhamsoftware.com (work) http://www.the111shift.com (play)
