Re: Distributed Search and the Stale Check
On Mon, Feb 25, 2013 at 8:26 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Please file a JIRA issue and attach your patch.
>
> Great write-up! (Saw it pop up on Twitter, so I read it a little earlier).

Done. https://issues.apache.org/jira/browse/SOLR-4509
RE: Distributed Search and the Stale Check
I don't have anything to add besides saying this is awesome. Great analysis. -Michael
Re: Distributed Search and the Stale Check
On Feb 25, 2013, at 8:14 PM, Ryan Zezeski <rzeze...@gmail.com> wrote:
> I would like to see a similar fix made upstream and that is why I am posting here.

Please file a JIRA issue and attach your patch.

Great write-up! (Saw it pop up on Twitter, so I read it a little earlier).

- Mark
Re: Distributed Search and the Stale Check
> On my particular benchmark rig, each stale check call accounted for an additional ~10ms.

That's insane!

It's still not even clear to me how the stale check works (reliably). Couldn't the server still close the connection between the stale check and the send of data by the client?

-Yonik
http://lucidworks.com

On Mon, Feb 25, 2013 at 8:14 PM, Ryan Zezeski <rzeze...@gmail.com> wrote:
> Hello Solr Users,
>
> I just wrote up a piece about some work I did recently to improve the throughput of distributed search.
>
> http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html
>
> The short of it is that the stale check in Apache HttpClient, which SolrJ uses, can add a lot of latency to a distributed search request. This is especially true because a distributed search is actually made up of two stages, each of which must perform its own stale check. For my particular benchmark setup I saw a 2-4x increase in throughput and a drop in latency of more than 100ms.
>
> All my work has been done in the context of a larger project, Yokozuna [1], so the patch currently lives in that project. I would like to see a similar fix made upstream, and that is why I am posting here. I was hoping the Solr sages could offer their input.
>
> My fix is very basic: it simply disables the check and adds a sweeper thread to prevent socket reset errors [2]. But if I had more time, I think a rewrite using the latest Apache HttpComponents might be in order. I'm not sure.
>
> I'm happy to answer any questions and give more details on my test setup.
>
> -Z
>
> [1] https://github.com/rzezeski/yokozuna
> [2] https://github.com/rzezeski/yokozuna/blob/a731748f07ee2156b5b3eb558e6b8a3efda4bfe4/solr-patches/no-stale-check.patch
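The fix described above (skip the stale check on checkout, and have a background thread close idle connections before the server does) can be sketched as follows. This is not the linked patch, which is not reproduced here; it is a minimal stdlib-only Java sketch in which `Pool`, `PooledConnection`, `acquire`, `release`, `closeIdle` and `startSweeper` are hypothetical names, not SolrJ or HttpClient APIs. In real code, HttpClient 4.x connection managers expose `closeIdleConnections(long, TimeUnit)`, which a scheduled task could call in the same way.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IdleSweeperSketch {

    /** Hypothetical pooled connection; only its last-use time matters here. */
    static final class PooledConnection {
        final String route;
        volatile long lastUsedNanos;
        volatile boolean open = true;

        PooledConnection(String route) { this.route = route; }

        void close() { open = false; } // a real pool would close the socket here
    }

    /** Hypothetical pool holding at most one idle connection per route (e.g. per shard). */
    static final class Pool {
        private final Map<String, PooledConnection> idle = new ConcurrentHashMap<>();

        /** Check-out WITHOUT a stale check: the costly step this approach skips. */
        PooledConnection acquire(String route) {
            PooledConnection c = idle.remove(route);
            return c != null ? c : new PooledConnection(route);
        }

        /** Return a connection to the pool, stamping the time it became idle. */
        void release(PooledConnection c, long nowNanos) {
            c.lastUsedNanos = nowNanos;
            PooledConnection prev = idle.put(c.route, c);
            if (prev != null && prev != c) prev.close(); // keep one idle connection per route
        }

        /** Close and evict every connection idle longer than maxIdleNanos; returns the count. */
        int closeIdle(long maxIdleNanos, long nowNanos) {
            int closed = 0;
            for (PooledConnection c : idle.values()) {
                if (nowNanos - c.lastUsedNanos > maxIdleNanos && idle.remove(c.route, c)) {
                    c.close();
                    closed++;
                }
            }
            return closed;
        }

        int idleCount() { return idle.size(); }
    }

    /**
     * Background sweeper. Choose maxIdle + period below the server's keep-alive timeout,
     * so the client always closes an idle connection before the server can.
     */
    static ScheduledExecutorService startSweeper(Pool pool, long maxIdleMillis, long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-sweeper");
            t.setDaemon(true); // don't keep the JVM alive just for sweeping
            return t;
        });
        long maxIdleNanos = TimeUnit.MILLISECONDS.toNanos(maxIdleMillis);
        exec.scheduleAtFixedRate(() -> pool.closeIdle(maxIdleNanos, System.nanoTime()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        long sec = TimeUnit.SECONDS.toNanos(1);
        pool.release(pool.acquire("shard1"), 0);        // idle since t = 0 s
        pool.release(pool.acquire("shard2"), 25 * sec); // idle since t = 25 s
        // Sweep at t = 31 s with a 30 s limit: only shard1 has been idle too long.
        int closed = pool.closeIdle(30 * sec, 31 * sec);
        System.out.println("closed=" + closed + " stillIdle=" + pool.idleCount());
        // A long-running client would instead start the sweeper once at startup,
        // e.g. startSweeper(pool, 30_000, 5_000), with timestamps from System.nanoTime().
    }
}
```

With the synthetic timestamps in `main`, this prints `closed=1 stillIdle=1`. The design trade-off is that every checkout becomes free, at the cost of tying the sweep settings to the server's idle timeout: if the server is configured to drop connections sooner than `maxIdle + period`, reset errors come back.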
Re: Distributed Search and the Stale Check
On Mon, Feb 25, 2013 at 8:42 PM, Yonik Seeley <yo...@lucidworks.com> wrote:
> That's insane!

It is insane. Keep in mind this was a 5-node cluster on a single physical machine, with all nodes sharing the same resources: 5 SmartOS zones on the same global zone. On my MacBook Pro I saw ~1.5ms per stale check, but that was not under load. (I'm honestly not sure whether load makes a difference; it didn't seem to on my SmartOS cluster.) I could probably get to the root of this with DTrace/BTrace, but alas, I haven't bothered.

> It's still not even clear to me how the stale check works (reliably). Couldn't the server still close the connection between the stale check and the send of data by the client?

The stale check isn't 100% reliable, but it works most of the time. As you say, the server could close the socket between the stale check completing and the request data being sent. I'm pretty sure Oleg, one of the maintainers, has said as much, but I can't find the original context.

-Z
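The race Yonik and Ryan describe is easier to see with a concrete check. HttpClient 4.x's stale check is, roughly, a read with a 1 ms socket timeout: end-of-stream means the server has already closed the connection, while a timeout is taken to mean it is still usable. The sketch below is a simplified stdlib-only Java illustration of that idea, not HttpClient's code; `isStale` and `demo` are hypothetical names. It shows why a healthy connection always pays for the check (it has to wait out the timeout) and why the check cannot be airtight (it only observes a close that has already happened).

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StaleCheckSketch {

    /**
     * Simplified, hypothetical stale check: wait at most 1 ms for a byte from the server.
     * EOF (-1) or an I/O error means the connection is dead; a timeout means "probably alive".
     * A healthy idle connection therefore always pays the full read timeout.
     */
    static boolean isStale(Socket socket) {
        if (socket.isClosed()) return true;
        try {
            int oldTimeout = socket.getSoTimeout();
            socket.setSoTimeout(1);
            try {
                // A real client would buffer any byte read here; this sketch just drops it.
                return socket.getInputStream().read() < 0;
            } finally {
                socket.setSoTimeout(oldTimeout);
            }
        } catch (SocketTimeoutException e) {
            return false; // nothing arrived within 1 ms: assume the connection is usable
        } catch (IOException e) {
            return true;  // e.g. connection reset by peer
        }
    }

    /** Returns {staleBeforeServerClose, staleAfterServerClose} for a loopback connection. */
    static boolean[] demo() {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, server.getLocalPort());
             Socket serverSide = server.accept()) {
            boolean before = isStale(client);
            // The server drops the idle connection (e.g. its keep-alive timeout fired).
            serverSide.close();
            Thread.sleep(100); // let the FIN reach the client; timing is not guaranteed in general
            boolean after = isStale(client);
            // The race: had the close landed just AFTER a passing check but before the
            // request was written, the check could not have caught it.
            return new boolean[] {before, after};
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("stale before close=" + r[0] + ", after close=" + r[1]);
    }
}
```

On a loopback connection this typically reports `false` before the server closes its end and `true` afterwards. Nothing in the check can detect a close that happens after it returns, which is why disabling it and sweeping idle connections, as in Ryan's patch, trades a per-request cost for a timeout-configuration constraint rather than giving up any real guarantee.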