Re: Solr and Jetty and Servlets

Gus Heck Mon, 16 Aug 2021 20:39:23 -0700

So I'm not interested in *adhering* to anything here, just using stuff
where it helps... I see tools sitting on the shelf, already bought and paid
for and carried with us to every job but then they sit in the back of the
truck mostly unused... and (I think) they look useful.


This should not be seen as in any way advocating a return to war file
deployment or "choose your container". All of what I would suggest should
likely be compatible with a jetty startup controlled by our code (I assume
we can convince it to read web.xml properly when doing so). Certainly point
out anything that would interfere with that.

Here we sit in a container that is a tool box containing:

   - A nice mechanism for initializing stuff at the application level,
   (ServletContextListener) and a well defined guarantee that it runs before
   any filter or servlet,
   - An automatic shutdown mechanism that it will call for us on graceful
   shutdown (ServletContextListner again).
   - A nice layered filter chain mechanism which could have been used to
   layer in things like tracing and authentication, and close shields etc as
   small succinct filters rather than weaving them into an ever more complex
   filter & call class.
   - In more recent versions of the spec, for listeners defined in web.xml
   the order is also guaranteed.
   - Servlet classes that are *already* set up to distinguish http verbs
   automatically when desired

So why is this better? Because monster classes that do a hundred things are
really hard to understand and maintain. Small methods, small classes
whenever possible. I also suspect that there may be some gains in
performance to be had if we rely more on the container (which will already
be dispatching based on path) to choose our code paths (at least at a
coarse level) and then have less custom dispatch logic executed on *every*
request

Obviously I'm wrong and if the net result is less performant to any
significant degree forget it that wouldn't be worth it. (wanted:
standardized solr benchmarks)

There WILL be complications with v2 because it is a subclass of
HttpSolrCall, which will take a bit of teasing apart for sure. Ideally it
should be a separate servlet from v1, but we don't want to duplicate code
either... so work to do there...

I think an incremental approach is necessary since very few of us have the
bandwidth to more than that and certainly it becomes difficult to find
anyone with the time to review large changes. What I have thus far is
stable with respect to the tests, and simplifies some stuff already which
is why I chose this point to start the discussion

But yeah, look at what I did and say what you like and what you don't. :).

-Gus

On Mon, Aug 16, 2021 at 9:42 PM David Smiley <[email protected]> wrote:

> Sounds like an interesting adventure last weekend.
>
> I'm unclear what the point of going this direction is; my instinct is to
> go the opposite direction.  You seem to suggest there are some
> simplification/organization benefits, which I love, so I'll need to look at
> what you've done to judge that for myself.  Yes Jetty supports the Servlet
> spec but we need not embrace it.  Adhering to that is useful if you have a
> generic web app that can be deployed to a container of the user's
> convenience/choosing.  No doubt this is why Solr started this way, and why
> the apps I built in my early days adhered to that spec.  But since 6.0, we
> view Solr as self-contained and more supportable if the project makes these
> decisions and thus not needlessly constrain itself as well.
>
> It is super weird to me that SolrDispatchFilter is a Servlet *Filter* and
> not a *Servlet* itself.
>
> Also, I suspect there may be complications in changes here relating to
> Solr's v1 vs v2 API.  And most definitely also what you discovered --
> JettySolrRunner.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Mon, Aug 16, 2021 at 11:37 AM Gus Heck <[email protected]> wrote:
>
>> *TLDR:* I've got a working branch
>> <https://github.com/gus-asf/solr/tree/servlet-solr> where CoreContainer
>> & our startup process is extracted from SolrDispatch Filter. Do other folks
>> think that is interesting enough that I should make a JIRA and/or PR with
>> intention to make this change?
>>
>> *Details:*
>>
>> Jetty is a servlet container, yet we more or less ignore it by stuffing
>> everything into a single filter (almost, admin UI is served separately).
>> I'm sure there are lots of historical reasons for this, probably including
>> servlet containers and their specs were much less mature when solr was
>> first started. Maybe also the early authors were more focused on making
>> search work than leveraging what the container could do for them (but I
>> wasn't there for that so that's just a guess).
>>
>> The result is that we have a couple of very large classes that are
>> touched by almost every request, and they have a lot of conditional logic
>> trying to decide what the user is asking. Normally this sort of dispatch is
>> done by the container based on the request URL to direct it to the
>> appropriate servlet. Solr does a LOT of different things so this code is
>> extremely tricky and complex to understand. Specifically, I'm speaking of
>> SolrDispatchFilter and HttpSolrCall, which are so inseparable that being
>> two classes probably makes them harder to understand.
>>
>> It seems to me (and this mail is asking if you agree) that these classes
>> are long overdue for some subdivision. The most obvious thing to pull out
>> is all the admin calls. Admin code paths really have little or nothing to
>> do with query or update code paths since there are no documents to route or
>> sub-requests to some subset of nodes.
>>
>> The primary obstacle to any such separation and simplification is that
>> most requests have some interaction with CoreContainer, and the things it
>> holds, and this is initialized and held by a field in SolrDispatchFilter.
>> After spending a significant chunk of time reading this code in the prior
>> weeks and a timely and motivating conversation with Eric Pugh, I dumped a
>> chunk of my weekend into an experiment to see if I could pull CoreContainer
>> out of the dispatch filter, and leverage the facilities of our servlet
>> container.
>>
>> That wasn't too terribly hard, but keeping JettySolrRunner happy was very
>> confusing, and worrisome since I've realized it's not respecting our
>> web.xml at all, and any configuration in web.xml needs to be duplicated for
>> our tests in JettySolrRunner (tangent alert)
>>
>> The result is that CoreContainer is now held by a class called
>> CoreService (please help me name things if you don't like my names :) ).
>> CoreService is a ServletContextListener, appropriately configured in
>> web.xml, and has a static method that can be used to get a reference to the
>> CoreContainer corresponding to the ServletContext in which code wanting a
>> core container is running (this supports having multiple JettySolrRunners
>> in tests, though probably never has more than one CoreContainer in the
>> running application)
>>
>> I achieved this in 4 stages shown here:
>> https://github.com/gus-asf/solr/tree/servlet-solr
>>
>> Ignore the AdminServlet class, it's a placeholder, and can be subtracted
>> without harm.
>>
>> Since the current state of the code in that branch is apparently
>> test-stable (4 runs of check in a row passing, none slower than any run of
>> 3 runs of main, both as low as 10.5 min if I don't continue working on the
>> machine)...
>>
>> Do we want to push this refactor in now to avoid making a huge ball of
>> changes that gets harder and harder to merge? The next push point would
>> probably be when AdminServlet was functional (if/when that happens) (and we
>> could not push that class for now).
>>
>> If you read this far, thanks :) I wasn't sure how feasible this would be
>> so I felt the need to prove it to my self in code before wasting your time,
>> but please don't hesitate to point to costs I might be missing or anything
>> that looks like a mistake, or say why this was a total waste of time :)
>>
>> Thoughts?
>>
>> -Gus
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

Re: Solr and Jetty and Servlets

Reply via email to