This BadApple stuff would have more value after more valuable work though.

I can't stress it enough - you have to make this fast to fix it.

I'll give you some more items to consider:

* Our xml parsing is deathly slow and blocking. All blocking stuff when
cores start is death to multicore. You can use a non blocking, modern fast
parser to parse our docs and config.
* You can also find various statics that are expensive to init and block -
moving some of those to init right away can help multicore a lot as well.
Getting multicore past deathly slow is a big, big help for finding stuff.
* Making the encryption key stuff is slow and blocks - don't make it for
every test and every core when it's not needed.
* The metrics stuff is sllooow startup and shutdown. Do that stuff in
parallel.
* SolrCoreState has issues where it doesn't always clean up - I think that
can hurt reload the most.
* reload has lots of holes especially on failure cases. I don't know - make
more tests.
* CoreAware stuff and listeners can be multi threaded - all that being
single threaded is no good - like modern hardware man.
* Most of the stuff people get wrong can be pulled into easy to use util
classes.
* We need to allow jetty time to stop for good startup and shutdown - you
have to fix other stuff first - things like the overseer make shutdown a
nightmare in tests.
* With the current Overseer it's best to reorg tests to try and shut it
down last. I know this sucks, fix that too.
* One small help, the system doesn't properly wait on shutdown, like it
tries to, for the overseer to run its queue.
* A lot of close and shutdown is slower than it should be, does stuff in
the wrong order, and is gnarly.
* We need to have a cluster shutdown if it will ever actually be clean -
how about writing to a control znode to trigger it?
* How about creating our znodes for a cluster up front in like an install
process? Right now there are many races around this. Often the config you
specify in tests (or more than often?) is not the one you think.
* We throw a lot of already-closed exceptions and stuff where we should not
- this is to get around our broken shutdown - they are bad, so fix
shutdown, remove them - they should usually only exist where something is
trying to start a resource, not use it.
* There are also concurrency issues in SolrCores. Plus I'd speed a lot of
that locking up. There are searcher leaks in SolrCore as well.
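
To make the "multi threaded listeners" item above a bit more concrete, here
is a rough sketch of fanning the callbacks out over a pool and collecting
failures rather than dying on the first one. CoreListener, ParallelInform
and informAll are made-up names for illustration, not Solr's actual
interfaces, and it assumes the listeners are independent of each other and
thread safe, which would need checking case by case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a core-aware plugin; NOT Solr's real interface.
interface CoreListener {
    void inform(String coreName) throws Exception;
}

final class ParallelInform {
    // Runs every listener's inform() on a fixed pool and waits for all of them.
    // Failures are collected and returned so one bad listener does not hide the rest.
    static List<Throwable> informAll(List<CoreListener> listeners, String coreName, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (CoreListener listener : listeners) {
                // Returning null makes this a Callable, so checked exceptions are allowed.
                futures.add(pool.submit(() -> {
                    listener.inform(coreName);
                    return null;
                }));
            }
            List<Throwable> failures = new ArrayList<>();
            for (Future<?> future : futures) {
                try {
                    future.get(); // blocks until this listener has finished
                } catch (ExecutionException e) {
                    failures.add(e.getCause());
                }
            }
            return failures;
        } finally {
            pool.shutdown(); // all tasks are already done at this point
        }
    }

    public static void main(String[] args) throws Exception {
        List<CoreListener> listeners = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            listeners.add(name -> Thread.sleep(100)); // simulate slow init work
        }
        long start = System.nanoTime();
        List<Throwable> failures = informAll(listeners, "core1", 4);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("failures=" + failures.size() + " elapsedMs=" + millis);
    }
}
```

The same shape would apply to the parallel metrics startup and shutdown
mentioned above, again only where the pieces don't depend on each other.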

hmmm... lots more, but even that is a nice dent. Mostly make things fast,
the tests will start to whisper the secrets.
