Re: BadApple

Mark Miller Wed, 06 Nov 2019 10:50:59 -0800

Some of these are hard to trigger without doing a lot of other things. For
example, you have to make the overseer much faster. Much as I dislike that
thing, you can make it much, much faster as it is, and that will help. Many,
many bugs hide because we crawl.


On Wed, Nov 6, 2019 at 12:19 PM Mark Miller <[email protected]> wrote:

> Let's see, as long as it's top of mind...
>
> SolrDispatchFilter should wait for a core if it's loading.
> The proxy remote request path can be crazy - maybe less crazy if you fix
> other things, but see my starburst branch (that's missing so much good
> stuff :( ) for a better impl that uses HTTP/2.
> Get all our SolrCloud tests off that solrtest4j, or whatever it's called,
> base class. It wasn't designed for that, and it causes all sorts of little
> issues.
> The way we track objects in maps in SolrResourceLoader - there is a nasty
> bug where we use the wrong collection field name - but that concurrency
> is also slow.
> ZkSolrResourceLoader SHOULD NOT fall back to SolrResourceLoader - a lot of
> this type of crap also hides bugs.
>
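The first item above, having the request path wait for a core that is still
loading rather than failing right away, could be sketched roughly as follows.
This is a stdlib-only illustration, not Solr code: CoreRegistry, startLoading,
finishLoading, and awaitCore are hypothetical names, and a String stands in
for the core object.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a core container; not Solr's actual API.
final class CoreRegistry {
    private final Map<String, CompletableFuture<String>> cores = new ConcurrentHashMap<>();

    // Register the core as "loading" before the slow load begins.
    void startLoading(String name) {
        cores.putIfAbsent(name, new CompletableFuture<>());
    }

    // Mark the core as ready; this releases every request waiting on it.
    void finishLoading(String name, String core) {
        cores.computeIfAbsent(name, n -> new CompletableFuture<>()).complete(core);
    }

    // Returns the core, waiting up to timeoutMs if it is still loading.
    // Returns null if the core is unknown or did not finish loading in time.
    String awaitCore(String name, long timeoutMs) {
        CompletableFuture<String> loading = cores.get(name);
        if (loading == null) {
            return null;
        }
        try {
            return loading.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }
}
```

A bounded wait keeps a request from hanging forever on a core that never
finishes loading; the timeout value is left to the caller here.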
> On Wed, Nov 6, 2019 at 11:48 AM Mark Miller <[email protected]> wrote:
>
>> This BadApple stuff would have more value after more valuable work though.
>>
>> I can't stress it enough - you have to make this fast to fix it.
>>
>> I'll give you some more items to consider:
>>
>> * Our XML parsing is deathly slow and blocking. All blocking stuff when
>> cores start is death to multicore. You can use a non-blocking, modern, fast
>> parser to parse our docs and config.
>> * You can also find various statics that are expensive to init and block
>> - moving some of those to init right away can help multicore a lot as well.
>> Getting multicore faster than deathly slow is a big, big help to find stuff.
>> * Generating the encryption key is slow and blocks - don't make it for
>> every test and every core when it's not needed.
>> * The metrics stuff is sllooow at startup and shutdown. Do that stuff in
>> parallel.
>> * SolrCoreState has issues where it doesn't always clean up - I think that
>> hurts reload the most.
>> * reload has lots of holes especially on failure cases. I don't know -
>> make more tests.
>> * CoreAware stuff and listeners can be multi-threaded - all that being
>> single-threaded is no good - like modern hardware, man.
>> * Most of the stuff people get wrong can be pulled into easy-to-use util
>> classes.
>> * We need to allow jetty time to stop for good startup and shutdown -
>> you have to fix other stuff first - things like the overseer make shutdown
>> a nightmare in tests.
>> * With the current Overseer it's best to reorg tests to try and shut it
>> down last. I know this sucks, fix that too.
>> * One small help: the system doesn't properly wait on shutdown, like it
>> tries to, for the overseer to run its queue.
>> * A lot of the close and shutdown code is slow, does things in the wrong
>> order, and is gnarly.
>> * We need to have a cluster shutdown if it will ever actually be clean -
>> how about writing to a control znode to trigger it?
>> * How about creating our znodes for a cluster up front in like an install
>> process? Right now there are many races around this. Often the config you
>> specify in tests (or more than often?) is not the one you think.
>> * We throw a lot of already-closed exceptions and stuff where we should
>> not - this is to get around our broken shutdown - they are bad, so fix
>> shutdown, remove them - they should usually only exist where something is
>> trying to start a resource, not use it.
>> * There are also concurrency issues in SolrCores. Plus I'd speed a lot of
>> that locking up. There are searcher leaks in SolrCore as well.
>>
>> hmmm... lots more, but even that is a nice dent. Mostly make things fast,
>> the tests will start to whisper the secrets.
>>
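Several of the items above (metrics startup and shutdown, CoreAware
listeners) come down to doing independent work concurrently instead of one
piece at a time. A minimal sketch of closing independent resources in
parallel, using only the JDK, might look like this. ParallelCloser and
closeAll are hypothetical names, not existing Solr utilities, and the pool
size cap of 8 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper; the name and shape are assumptions, not Solr code.
final class ParallelCloser {
    // Closes every resource concurrently and returns the failures, if any.
    static List<Exception> closeAll(List<? extends AutoCloseable> resources) {
        if (resources.isEmpty()) {
            return new ArrayList<>();
        }
        // Arbitrary cap on threads; tune for the real workload.
        ExecutorService pool = Executors.newFixedThreadPool(Math.min(resources.size(), 8));
        List<Exception> failures = Collections.synchronizedList(new ArrayList<>());
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (AutoCloseable resource : resources) {
                pending.add(CompletableFuture.runAsync(() -> {
                    try {
                        resource.close();
                    } catch (Exception e) {
                        failures.add(e); // collect instead of aborting the other closes
                    }
                }, pool));
            }
            // Block until every close has either finished or failed.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(failures);
    }
}
```

This only suits resources with no ordering dependency between them. Anything
that must go down last, as suggested above for the Overseer, would still be
closed separately after the parallel batch completes.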
>
>
> --
> - Mark
>
> http://about.me/markrmiller
>
-- 
- Mark

http://about.me/markrmiller