Re: Solr and Jetty and Servlets

Gus Heck Tue, 17 Aug 2021 06:45:16 -0700

Ok well it would be interesting to compare quickstarting vs
JettySolrRunner, and reading up on quickstart
<https://webtide.com/jetty-9-quick-start/> gives ideas about pre-building
the quickstart xml, but web.xml for JettySolrRunner is a tangent. Either
way we are still in a container and I think I hear some agreement that
something should be done about the dispatch here, and both of you seem to
agree that an actual servlet would make more sense than a filter, so I'll
make a ticket/pr to make it easier to track & easier to read the code.


On Tue, Aug 17, 2021 at 1:41 AM Mark Miller <[email protected]> wrote:

> The downside to respecting web.xml and making JettySolrServer serve a
> webapp is that loading a webapp is very expensive and slow in comparison.
> JettySolrServer actually starts up extremely quickly. It’s almost more
> appealing to change the Server to use the JettySolrServer strategy. It’s so
> slow to load a webapp because of all the things it needs to support and
> scan jars for - in a kind of JavaEE situation, though not as bad. Jetty
> QuickStart does improve the situation if used at least. But there is really
> no need to eat the whole webapp standard for a non webapp app to have Jetty
> manage more of the dispatching. I always wondering about the motivation /
> upside of hacking in a straight servlet myself. But for a non webapp stuck
> in a servlet container, it’s actually a beautiful move that side steps a
> bunch of slow crap even better than the QuickStart pushes ever will, and
> still allows for matching any support we would want.
>
> The HttpSolrCallV2 extending HttpSolrCall is horrendous. Personally, I
> made a slim SolrCall base class that each extends. All that logic is hairy
> enough to follow without them interleaving and sharing in a kind of wild
> collaboration. It’s pretty unfortunate they both still exist.
>
> You are right, leaving that path in that state is a poor idea. If there
> was anything that could be considered the “hot” path, it’s right there -
> feverishly checking and ripping up streams for anyone and anything. Is a
> core? A collection? A handler? Maybe a V2 handler? What I’d we cut the path
> down? Do a dance? Treat the string like a date and see if that works.
>
> So yeah, agreed, nobody does or would dispatch this way, you have to frog
> boil into it. It’s slow, almost incomprehensible, and incredibly good at
> holding onto life, even building on it. But being a webapp and parsing
> web.xml has little to do with dispatching and having sensible http api that
> doesn’t work like it’s a perl compiler.
>
> Anyway, I can’t imagine trying to trace through that code with any solid
> feeling about having a handle on it via inspection. But just like Perl, if
> you debug / trace log the hell out of it, it is all actually pretty simple
> to adjust and reign in.
>
>
> MRM
>
>
>
> On Mon, Aug 16, 2021 at 10:39 PM Gus Heck <[email protected]> wrote:
>
>> So I'm not interested in *adhering* to anything here, just using stuff
>> where it helps... I see tools sitting on the shelf, already bought and paid
>> for and carried with us to every job but then they sit in the back of the
>> truck mostly unused... and (I think) they look useful.
>>
>> This should not be seen as in any way advocating a return to war file
>> deployment or "choose your container". All of what I would suggest should
>> likely be compatible with a jetty startup controlled by our code (I assume
>> we can convince it to read web.xml properly when doing so). Certainly point
>> out anything that would interfere with that.
>>
>> Here we sit in a container that is a tool box containing:
>>
>>    - A nice mechanism for initializing stuff at the application level,
>>    (ServletContextListener) and a well defined guarantee that it runs before
>>    any filter or servlet,
>>    - An automatic shutdown mechanism that it will call for us on
>>    graceful shutdown (ServletContextListner again).
>>    - A nice layered filter chain mechanism which could have been used to
>>    layer in things like tracing and authentication, and close shields etc as
>>    small succinct filters rather than weaving them into an ever more complex
>>    filter & call class.
>>    - In more recent versions of the spec, for listeners defined in
>>    web.xml the order is also guaranteed.
>>    - Servlet classes that are *already* set up to distinguish http verbs
>>    automatically when desired
>>
>> So why is this better? Because monster classes that do a hundred things
>> are really hard to understand and maintain. Small methods, small classes
>> whenever possible. I also suspect that there may be some gains in
>> performance to be had if we rely more on the container (which will already
>> be dispatching based on path) to choose our code paths (at least at a
>> coarse level) and then have less custom dispatch logic executed on *every*
>> request
>>
>> Obviously I'm wrong and if the net result is less performant to any
>> significant degree forget it that wouldn't be worth it. (wanted:
>> standardized solr benchmarks)
>>
>> There WILL be complications with v2 because it is a subclass of
>> HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
>> should be a separate servlet from v1, but we don't want to duplicate code
>> either... so work to do there...
>>
>> I think an incremental approach is necessary since very few of us have
>> the bandwidth to more than that and certainly it becomes difficult to find
>> anyone with the time to review large changes. What I have thus far is
>> stable with respect to the tests, and simplifies some stuff already which
>> is why I chose this point to start the discussion
>>
>> But yeah, look at what I did and say what you like and what you don't.
>> :).
>>
>> -Gus
>>
>> On Mon, Aug 16, 2021 at 9:42 PM David Smiley <[email protected]> wrote:
>>
>>> Sounds like an interesting adventure last weekend.
>>>
>>> I'm unclear what the point of going this direction is; my instinct is to
>>> go the opposite direction.  You seem to suggest there are some
>>> simplification/organization benefits, which I love, so I'll need to look at
>>> what you've done to judge that for myself.  Yes Jetty supports the Servlet
>>> spec but we need not embrace it.  Adhering to that is useful if you have a
>>> generic web app that can be deployed to a container of the user's
>>> convenience/choosing.  No doubt this is why Solr started this way, and why
>>> the apps I built in my early days adhered to that spec.  But since 6.0, we
>>> view Solr as self-contained and more supportable if the project makes these
>>> decisions and thus not needlessly constrain itself as well.
>>>
>>> It is super weird to me that SolrDispatchFilter is a Servlet *Filter*
>>> and not a *Servlet* itself.
>>>
>>> Also, I suspect there may be complications in changes here relating to
>>> Solr's v1 vs v2 API.  And most definitely also what you discovered --
>>> JettySolrRunner.
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <[email protected]> wrote:
>>>
>>>> *TLDR:* I've got a working branch
>>>> <https://github.com/gus-asf/solr/tree/servlet-solr> where
>>>> CoreContainer & our startup process is extracted from SolrDispatch Filter.
>>>> Do other folks think that is interesting enough that I should make a JIRA
>>>> and/or PR with intention to make this change?
>>>>
>>>> *Details:*
>>>>
>>>> Jetty is a servlet container, yet we more or less ignore it by stuffing
>>>> everything into a single filter (almost, admin UI is served separately).
>>>> I'm sure there are lots of historical reasons for this, probably including
>>>> servlet containers and their specs were much less mature when solr was
>>>> first started. Maybe also the early authors were more focused on making
>>>> search work than leveraging what the container could do for them (but I
>>>> wasn't there for that so that's just a guess).
>>>>
>>>> The result is that we have a couple of very large classes that are
>>>> touched by almost every request, and they have a lot of conditional logic
>>>> trying to decide what the user is asking. Normally this sort of dispatch is
>>>> done by the container based on the request URL to direct it to the
>>>> appropriate servlet. Solr does a LOT of different things so this code is
>>>> extremely tricky and complex to understand. Specifically, I'm speaking of
>>>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>>>> two classes probably makes them harder to understand.
>>>>
>>>> It seems to me (and this mail is asking if you agree) that these
>>>> classes are long overdue for some subdivision. The most obvious thing to
>>>> pull out is all the admin calls. Admin code paths really have little or
>>>> nothing to do with query or update code paths since there are no documents
>>>> to route or sub-requests to some subset of nodes.
>>>>
>>>> The primary obstacle to any such separation and simplification is that
>>>> most requests have some interaction with CoreContainer, and the things it
>>>> holds, and this is initialized and held by a field in SolrDispatchFilter.
>>>> After spending a significant chunk of time reading this code in the prior
>>>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
>>>> chunk of my weekend into an experiment to see if I could pull CoreContainer
>>>> out of the dispatch filter, and leverage the facilities of our servlet
>>>> container.
>>>>
>>>> That wasn't too terribly hard, but keeping JettySolrRunner happy was
>>>> very confusing, and worrisome since I've realized it's not respecting our
>>>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>>>> our tests in JettySolrRunner (tangent alert)
>>>>
>>>> The result is that CoreContainer is now held by a class called
>>>> CoreService (please help me name things if you don't like my names :) ).
>>>> CoreService is a ServletContextListener, appropriately configured in
>>>> web.xml, and has a static method that can be used to get a reference to the
>>>> CoreContainer corresponding to the ServletContext in which code wanting a
>>>> core container is running (this supports having multiple JettySolrRunners
>>>> in tests, though probably never has more than one CoreContainer in the
>>>> running application)
>>>>
>>>> I achieved this in 4 stages shown here:
>>>> https://github.com/gus-asf/solr/tree/servlet-solr
>>>>
>>>> Ignore the AdminServlet class, it's a placeholder, and can be
>>>> subtracted without harm.
>>>>
>>>> Since the current state of the code in that branch is apparently
>>>> test-stable (4 runs of check in a row passing, none slower than any run of
>>>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>>>> machine)...
>>>>
>>>> Do we want to push this refactor in now to avoid making a huge ball of
>>>> changes that gets harder and harder to merge? The next push point would
>>>> probably be when AdminServlet was functional (if/when that happens) (and we
>>>> could not push that class for now).
>>>>
>>>> If you read this far, thanks :) I wasn't sure how feasible this would
>>>> be so I felt the need to prove it to my self in code before wasting your
>>>> time, but please don't hesitate to point to costs I might be missing or
>>>> anything that looks like a mistake, or say why this was a total waste of
>>>> time :)
>>>>
>>>> Thoughts?
>>>>
>>>> -Gus
>>>> --
>>>> http://www.needhamsoftware.com (work)
>>>> http://www.the111shift.com (play)
>>>>
>>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
> --
> - Mark
>
> http://about.me/markrmiller
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Reply via email to