Re: Distributed Search and the Stale Check

2013-02-27 Thread Ryan Zezeski
On Mon, Feb 25, 2013 at 8:26 PM, Mark Miller markrmil...@gmail.com wrote:

 Please file a JIRA issue and attach your patch. Great write up! (Saw it
 pop up on twitter, so I read it a little earlier).


Done.

https://issues.apache.org/jira/browse/SOLR-4509


RE: Distributed Search and the Stale Check

2013-02-25 Thread Michael Ryan
I don't have anything to add besides saying this is awesome. Great analysis.

-Michael


Re: Distributed Search and the Stale Check

2013-02-25 Thread Mark Miller

On Feb 25, 2013, at 8:14 PM, Ryan Zezeski rzeze...@gmail.com wrote:

 I would like to see a
 similar fix made upstream and that is why I am posting here.

Please file a JIRA issue and attach your patch. Great write up! (Saw it pop up 
on twitter, so I read it a little earlier).

- Mark

Re: Distributed Search and the Stale Check

2013-02-25 Thread Yonik Seeley
 On my particular benchmark rig, each stale check call accounted for an 
 additional ~10ms.

That's insane!

It's still not even clear to me how the stale check works (reliably).
Couldn't the server still close the connection between the stale check
and the send of data by the client?

-Yonik
http://lucidworks.com


On Mon, Feb 25, 2013 at 8:14 PM, Ryan Zezeski rzeze...@gmail.com wrote:
 Hello Solr Users,

 I just wrote up a piece about some work I did recently to improve the
 throughput of distributed search.

 http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html

 The short of it is that the stale check in Apache's HTTP Client, which
 SolrJ uses, can add a lot of latency to a distributed search request,
 especially given that distributed search is actually made up of 2 stages,
 each of which must perform its own stale check.  For my particular benchmark
 setup, the fix gave a 2-4x increase in throughput and a 100ms+ drop in
 latency.  All my work has been done in the context of a larger project,
 Yokozuna [1], and thus the patch is currently local to that project.  I
 would like to see a similar fix made upstream and that is why I am posting
 here.  I was hoping the Solr sages could offer their input.  My fix is very
 basic: it simply disables the check and adds a sweeper thread to prevent
 socket reset errors [2].  But if I had more time I think a rewrite using
 the latest Apache HTTP Components might be in order.  I'm not sure.  I'm
 happy to answer any questions and give more details on my test setup.

 -Z

 [1] https://github.com/rzezeski/yokozuna

 [2]
 https://github.com/rzezeski/yokozuna/blob/a731748f07ee2156b5b3eb558e6b8a3efda4bfe4/solr-patches/no-stale-check.patch
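
The "disable the check, sweep in the background" idea can be sketched with the JDK alone. This is not the linked patch and does not use HttpClient's API: the class `IdleSweeperSketch`, its methods, and the idle limit are hypothetical names chosen for illustration. It assumes the idle limit is set below the server's own idle timeout, so that pooled connections are dropped before the server closes them.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy connection pool (hypothetical): maps a connection id to its last-use time. */
final class IdleSweeperSketch {
    private final Map<String, Long> lastUsedMillis = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleSweeperSketch(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Return a connection to the pool and record when it was last used. */
    void release(String connId, long nowMillis) {
        lastUsedMillis.put(connId, nowMillis);
    }

    /** Lease without any per-request stale check: only a map lookup. */
    boolean lease(String connId) {
        return lastUsedMillis.remove(connId) != null;
    }

    /** Drop every pooled connection idle for at least maxIdleMillis; returns the count. */
    int sweep(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = lastUsedMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= maxIdleMillis) {
                it.remove(); // a real pool would also close the socket here
                closed++;
            }
        }
        return closed;
    }

    int size() {
        return lastUsedMillis.size();
    }

    /** Run sweep() periodically on a daemon thread, off the request path. */
    ScheduledExecutorService startSweeper(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-sweeper");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> sweep(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) {
        IdleSweeperSketch pool = new IdleSweeperSketch(5_000);
        pool.release("shard-1", 0);
        pool.release("shard-2", 4_000);
        System.out.println("closed by sweep: " + pool.sweep(6_000));
        System.out.println("still pooled: " + pool.size());
    }
}
```

The design trade-off is that the cost of detecting dead connections moves from every request to a periodic background task. A connection can still die between sweeps, so callers should be prepared for an occasional failed request.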


Re: Distributed Search and the Stale Check

2013-02-25 Thread Ryan Zezeski
On Mon, Feb 25, 2013 at 8:42 PM, Yonik Seeley yo...@lucidworks.com wrote:


 That's insane!


It is insane.  Keep in mind this was a 5-node cluster on the
same physical machine sharing the same resources.  It consists of 5 SmartOS
zones on the same global zone.  On my MacBook Pro I saw ~1.5ms per stale
check, but that was not under load (I'm honestly not sure whether load
makes a difference, as it didn't seem to on my SmartOS cluster).  I could
probably get to the root of this with DTrace/BTrace, but alas I haven't
bothered.



 It's still not even clear to me how the stale check works (reliably).
 Couldn't the server still close the connection between the stale check
 and the send of data by the client?


The stale check isn't 100% reliable, but it works most of the time.  As you
say, the server could close the socket between the stale check completing
and the request data being sent.  I'm pretty sure Oleg, one of the
maintainers, has said as much but I can't find the original context.
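
To make the mechanics concrete, here is a minimal sketch of a stale probe using only the JDK. It assumes the check amounts to a read with a very short socket timeout, which is my understanding of the general technique and not a copy of HttpClient's code. The class `StaleProbeSketch`, the 1 ms timeout, and the loopback demo are all illustrative assumptions.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class StaleProbeSketch {

    /** Returns true if the peer appears to have closed the connection. */
    static boolean looksStale(Socket socket, PushbackInputStream in) {
        try {
            int previousTimeout = socket.getSoTimeout();
            socket.setSoTimeout(1); // block for at most about 1 ms
            try {
                int b = in.read();
                if (b == -1) {
                    return true;    // end of stream: the peer closed
                }
                in.unread(b);       // real data: push it back for the caller
                return false;
            } finally {
                socket.setSoTimeout(previousTimeout);
            }
        } catch (SocketTimeoutException e) {
            return false;           // nothing to read and no EOF: looks alive
        } catch (IOException e) {
            return true;            // reset or other I/O error: treat as stale
        }
    }

    /** Loopback demo: probe before and after the server side closes. */
    static boolean[] demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            PushbackInputStream in = new PushbackInputStream(client.getInputStream());
            boolean before = looksStale(client, in);
            accepted.close();       // the server hangs up
            boolean after = false;
            long deadline = System.nanoTime() + 2_000_000_000L;
            while (!after && System.nanoTime() < deadline) {
                after = looksStale(client, in); // poll until the close is visible
            }
            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("stale before close: " + r[0] + ", after close: " + r[1]);
    }
}
```

On a healthy connection the probe has to wait out its timeout before it can answer, which is where the per-request cost comes from. A `false` result only describes the instant of the read, so the server can still close the socket before the request bytes go out.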

-Z