Re: Cloud Behavior when using numShards=1

2016-12-27 Thread Dave Seltzer
Thanks Erick,

That's pretty much where I'd landed on the issue. To me Solr Cloud is
clearly the preferable option here - especially when it comes to indexing
and cluster management. I'll give "preferLocalShards" a try and see what
happens.

Many thanks for your in-depth analysis!

-Dave

Dave Seltzer 
Chief Systems Architect
TVEyes
(203) 254-3600 x222

On Tue, Dec 27, 2016 at 12:22 PM, Erick Erickson wrote:

> The form of the query doesn't enter into whether a query is passed on to
> a different replica, IIUC. preferLocalShards was created to keep this
> from happening though. There's discussion at
> https://issues.apache.org/jira/browse/SOLR-6832.
>
> BTW, "it's just a parameter". At root, the sugar methods (SolrJ, but I
> assume SolrNet too) for setting specific options (rows say) are just
> adding to an underlying map. A SolrJ example SolrQuery.setRows()
> eventually resolves itself to a Map.put("rows", ###). I'm pretty sure
> SolrNet has a generic "setParam" or similar that also just adds a
> value to the list of parameters.
>
> As for whether traditional master/slave would be a better choice...
> Since you only have one shard it's more ambiguous than if you had a
> bunch of shards.
>
> The biggest advantage you get with SolrCloud in your setup is that all
> the pesky issues about failover are handled for you.
>
> The other advantage of SolrCloud is that the client (CloudSolrClient)
> is aware of Zookeeper and can "do the right thing" when nodes come and
> go. In that setup, you don't necessarily even need a load balancer.
> AFAIK, SolrNet hasn't implemented that capability so that's
> irrelevant, and you're using HAProxy anyway so I doubt you care much.
>
> Say you're using M/S rather than SolrCloud. Now say you're indexing
> and the master fails. How difficult is it to recover? How
> mission-critical is uninterrupted, up-to-date service? How long can
> recovery take without unduly impacting the business?
>
> A few scenarios.
>
> 1> Worst case: you can't re-index from some arbitrary point in the
> past because the system of record isn't available. Thus if your master
> dies you may have lost documents. You really don't want M/S in this
> case.
>
> 2> Next-worst case: the master dies. Can you have an unchanging index
> on the replicas that you're querying while you spin up a new master,
> then re-index all your data and point your slaves at the new
> master? Then M/S is fine.
>
> 3> Less bad case: you're indexing and the master dies. Can you stand
> an unchanging index while you promote one of the slaves to be master
> and pick up indexing from some time X where you're guaranteed that
> the newly-promoted master replicated from the old master? Then M/S is
> fine.
>
> 4> Best case: you index once per day (or week or month or...).
> Rebuilding your entire index from the system of record takes X hours,
> and the business can wait X hours (possibly using the old index and
> not serving as many queries). M/S is simpler in this case than
> SolrCloud.
>
> So really, IMO, it's a question of whether the failover goodness you
> get with SolrCloud outweighs the complexity of maintaining Zookeeper
> and of dealing with questions like the one you're asking now.
>
> IOW "It Depends" (tm).
>
> Best,
> Erick
>
> On Tue, Dec 27, 2016 at 7:59 AM, Dave Seltzer  wrote:
> > Hehe Good Tip :-)
> >
> > preferLocalShards may indeed be a good solution. I'll have to figure out
> > how to pass that parameter using SolrNet.
> >
> > The queries are quite complex. We're sampling audio, calculating hashes
> and
> > comparing them to known hashes.
> >
> > I'll paste an example below.
> >
> > Are nested queries more likely to be distributed in this fashion?
> >
> > -Dave
> >
> > q=_query_:"{!edismax mm=5}hashTable_0:359079936 hashTable_1:440999735
> > hashTable_2:1376147226 hashTable_3:35668745 hashTable_4:671810129
> > hashTable_5:536885545 hashTable_6:453337089 hashTable_7:1279281410
> > hashTable_8:772478009 hashTable_9:806096663 hashTable_10:1779768130
> > hashTable_11:1699416602 hashTable_12:135229216 hashTable_13:68107537
> > hashTable_14:134963224 hashTable_15:772210781 hashTable_16:51315463
> > hashTable_17:306522185 hashTable_18:575080513 hashTable_19:623118387
> > hashTable_20:1159227396 hashTable_21:907954972 hashTable_22:219782400
> > hashTable_23:268848920 hashTable_24:185729340" _query_:"{!edismax
> > mm=5}hashTable_0:830515738 hashTable_1:135401527 hashTable_2:2098135824
> > hashTable_3:2065698563 hashTable_4:672596488 hashTable_5:470813767
> > hashTable_6:453977870 hashTable_7:906104066 hashTable_8:21772611
> > hashTable_9:813630732 hashTable_10:-1973675256 hashTable_11:1577323034
> > hashTable_12:135152649 hashTable_13:236264215 hashTable_14:68300817
> > hashTable_15:85790523 hashTable_16:186191879 hashTable_17:306083351
> > hashTable_18:2011629862 hashTable_19:1364872503 hashTable_20:4128772
> > hashTable_21:689650435 hashTable_22:222499855 hashTable_23:17187346
> > 

Re: Cloud Behavior when using numShards=1

2016-12-27 Thread Erick Erickson
The form of the query doesn't enter into whether a query is passed on to
a different replica, IIUC. preferLocalShards was created to keep this
from happening though. There's discussion at
https://issues.apache.org/jira/browse/SOLR-6832.

BTW, "it's just a parameter". At root, the sugar methods (SolrJ, but I
assume SolrNet too) for setting specific options (rows say) are just
adding to an underlying map. A SolrJ example SolrQuery.setRows()
eventually resolves itself to a Map.put("rows", ###). I'm pretty sure
SolrNet has a generic "setParam" or similar that also just adds a
value to the list of parameters.
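
To make that concrete, a rough SolrJ sketch (SolrJ 6.x-era API; the
preferLocalShards line is just an example of passing an arbitrary parameter):

import org.apache.solr.client.solrj.SolrQuery;

public class ParamSketch {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery("*:*");

        // The dedicated setter...
        query.setRows(10);
        // ...and the generic form both end up in the same underlying parameter map.
        query.set("rows", 10);

        // Any request parameter can be passed the same way, e.g. preferLocalShards:
        query.set("preferLocalShards", true);

        // Prints the accumulated parameter map (URL-encoded), e.g.
        // q=*%3A*&rows=10&preferLocalShards=true
        System.out.println(query);
    }
}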

As for whether traditional master/slave would be a better choice...
Since you only have one shard it's more ambiguous than if you had a
bunch of shards.

The biggest advantage you get with SolrCloud in your setup is that all
the pesky issues about failover are handled for you.

The other advantage of SolrCloud is that the client (CloudSolrClient)
is aware of Zookeeper and can "do the right thing" when nodes come and
go. In that setup, you don't necessarily even need a load balancer.
AFAIK, SolrNet hasn't implemented that capability so that's
irrelevant, and you're using HAProxy anyway so I doubt you care much.
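
If you ever did move to a ZooKeeper-aware client, it would look roughly like
this (a SolrJ 6.x sketch; the ZooKeeper addresses are placeholders and the
collection name is taken from your earlier mail):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQuerySketch {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient watches ZooKeeper for cluster state, so it knows which
        // replicas are live and routes requests without an external load balancer.
        try (CloudSolrClient client =
                 new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.setDefaultCollection("sf_fingerprints");

            SolrQuery query = new SolrQuery("*:*");
            query.set("preferLocalShards", true); // keep the work on the node that received the request

            QueryResponse response = client.query(query);
            System.out.println("numFound: " + response.getResults().getNumFound());
        }
    }
}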

Say you're using M/S rather than SolrCloud. Now say you're indexing
and the master fails. How difficult is it to recover? How
mission-critical is uninterrupted, up-to-date service? How long can
recovery take without unduly impacting the business?

A few scenarios.

1> Worst case: you can't re-index from some arbitrary point in the
past because the system of record isn't available. Thus if your master
dies you may have lost documents. You really don't want M/S in this
case.

2> Next-worst case: the master dies. Can you have an unchanging index
on the replicas that you're querying while you spin up a new master,
then re-index all your data and point your slaves at the new
master? Then M/S is fine.

3> Less bad case: you're indexing and the master dies. Can you stand
an unchanging index while you promote one of the slaves to be master
and pick up indexing from some time X where you're guaranteed that
the newly-promoted master replicated from the old master? Then M/S is
fine.

4> Best case: you index once per day (or week or month or...).
Rebuilding your entire index from the system of record takes X hours,
and the business can wait X hours (possibly using the old index and
not serving as many queries). M/S is simpler in this case than
SolrCloud.

So really, IMO, it's a question of whether the failover goodness you
get with SolrCloud outweighs the complexity of maintaining Zookeeper
and of dealing with questions like the one you're asking now.

IOW "It Depends" (tm).

Best,
Erick

On Tue, Dec 27, 2016 at 7:59 AM, Dave Seltzer  wrote:
> Hehe Good Tip :-)
>
> preferLocalShards may indeed be a good solution. I'll have to figure out
> how to pass that parameter using SolrNet.
>
> The queries are quite complex. We're sampling audio, calculating hashes and
> comparing them to known hashes.
>
> I'll paste an example below.
>
> Are nested queries more likely to be distributed in this fashion?
>
> -Dave
>
> q=_query_:"{!edismax mm=5}hashTable_0:359079936 hashTable_1:440999735
> hashTable_2:1376147226 hashTable_3:35668745 hashTable_4:671810129
> hashTable_5:536885545 hashTable_6:453337089 hashTable_7:1279281410
> hashTable_8:772478009 hashTable_9:806096663 hashTable_10:1779768130
> hashTable_11:1699416602 hashTable_12:135229216 hashTable_13:68107537
> hashTable_14:134963224 hashTable_15:772210781 hashTable_16:51315463
> hashTable_17:306522185 hashTable_18:575080513 hashTable_19:623118387
> hashTable_20:1159227396 hashTable_21:907954972 hashTable_22:219782400
> hashTable_23:268848920 hashTable_24:185729340" _query_:"{!edismax
> mm=5}hashTable_0:830515738 hashTable_1:135401527 hashTable_2:2098135824
> hashTable_3:2065698563 hashTable_4:672596488 hashTable_5:470813767
> hashTable_6:453977870 hashTable_7:906104066 hashTable_8:21772611
> hashTable_9:813630732 hashTable_10:-1973675256 hashTable_11:1577323034
> hashTable_12:135152649 hashTable_13:236264215 hashTable_14:68300817
> hashTable_15:85790523 hashTable_16:186191879 hashTable_17:306083351
> hashTable_18:2011629862 hashTable_19:1364872503 hashTable_20:4128772
> hashTable_21:689650435 hashTable_22:222499855 hashTable_23:17187346
> hashTable_24:1913783558" _query_:"{!edismax mm=5}hashTable_0:622538010
> hashTable_1:337383479 hashTable_2:-1272249576 hashTable_3:271847194
> hashTable_4:522322513 hashTable_5:1110312368 hashTable_6:-1757546994
> hashTable_7:-1939467262 hashTable_8:20196637 hashTable_9:572261655
> hashTable_10:-702476280 hashTable_11:453716754 hashTable_12:134877193
> hashTable_13:169152357 hashTable_14:136117838 hashTable_15:875044907
> hashTable_16:1797459972 hashTable_17:303711774 hashTable_18:1847132476
> hashTable_19:978126878 hashTable_20:120193028 hashTable_21:487858837
> hashTable_22:223803151 hashTable_23:-2079961818 hashTable_24:387645702"
> 

Re: Cloud Behavior when using numShards=1

2016-12-27 Thread Dave Seltzer
Hehe Good Tip :-)

preferLocalShards may indeed be a good solution. I'll have to figure out
how to pass that parameter using SolrNet.

The queries are quite complex. We're sampling audio, calculating hashes and
comparing them to known hashes.

I'll paste an example below.

Are nested queries more likely to be distributed in this fashion?

-Dave

q=_query_:"{!edismax mm=5}hashTable_0:359079936 hashTable_1:440999735
hashTable_2:1376147226 hashTable_3:35668745 hashTable_4:671810129
hashTable_5:536885545 hashTable_6:453337089 hashTable_7:1279281410
hashTable_8:772478009 hashTable_9:806096663 hashTable_10:1779768130
hashTable_11:1699416602 hashTable_12:135229216 hashTable_13:68107537
hashTable_14:134963224 hashTable_15:772210781 hashTable_16:51315463
hashTable_17:306522185 hashTable_18:575080513 hashTable_19:623118387
hashTable_20:1159227396 hashTable_21:907954972 hashTable_22:219782400
hashTable_23:268848920 hashTable_24:185729340" _query_:"{!edismax
mm=5}hashTable_0:830515738 hashTable_1:135401527 hashTable_2:2098135824
hashTable_3:2065698563 hashTable_4:672596488 hashTable_5:470813767
hashTable_6:453977870 hashTable_7:906104066 hashTable_8:21772611
hashTable_9:813630732 hashTable_10:-1973675256 hashTable_11:1577323034
hashTable_12:135152649 hashTable_13:236264215 hashTable_14:68300817
hashTable_15:85790523 hashTable_16:186191879 hashTable_17:306083351
hashTable_18:2011629862 hashTable_19:1364872503 hashTable_20:4128772
hashTable_21:689650435 hashTable_22:222499855 hashTable_23:17187346
hashTable_24:1913783558" _query_:"{!edismax mm=5}hashTable_0:622538010
hashTable_1:337383479 hashTable_2:-1272249576 hashTable_3:271847194
hashTable_4:522322513 hashTable_5:1110312368 hashTable_6:-1757546994
hashTable_7:-1939467262 hashTable_8:20196637 hashTable_9:572261655
hashTable_10:-702476280 hashTable_11:453716754 hashTable_12:134877193
hashTable_13:169152357 hashTable_14:136117838 hashTable_15:875044907
hashTable_16:1797459972 hashTable_17:303711774 hashTable_18:1847132476
hashTable_19:978126878 hashTable_20:120193028 hashTable_21:487858837
hashTable_22:223803151 hashTable_23:-2079961818 hashTable_24:387645702"
_query_:"{!edismax mm=5}hashTable_0:269046593 hashTable_1:202510337
hashTable_2:-1908118760 hashTable_3:557125123 hashTable_4:622985745
hashTable_5:1112540520 hashTable_6:-1760619239 hashTable_7:302584834
hashTable_8:774853149 hashTable_9:407637521 hashTable_10:503842575
hashTable_11:973810450 hashTable_12:386551297 hashTable_13:520687392
hashTable_14:2031254298 hashTable_15:253050461 hashTable_16:1697657095
hashTable_17:307316254 hashTable_18:321716292 hashTable_19:887500833
hashTable_20:120193028 hashTable_21:353632786 hashTable_22:221726992
hashTable_23:1359367954 hashTable_24:218981212" _query_:"{!edismax
mm=5}hashTable_0:354102618 hashTable_1:440534785 hashTable_2:1780351770
hashTable_3:35596035 hashTable_4:371327546 hashTable_5:620958505
hashTable_6:823926785 hashTable_7:106959874 hashTable_8:775171357
hashTable_9:570891537 hashTable_10:470295321 hashTable_11:823007555
hashTable_12:459162889 hashTable_13:163586959 hashTable_14:-1065149104
hashTable_15:422450690 hashTable_16:487142404 hashTable_17:222040067
hashTable_18:323450677 hashTable_19:36375841 hashTable_20:244600580
hashTable_21:1510146588 hashTable_22:571998720 hashTable_23:235287562
hashTable_24:1981482410" _query_:"{!edismax mm=5}hashTable_0:443429471
hashTable_1:437060151 hashTable_2:1145334291 hashTable_3:269043481
hashTable_4:371327531 hashTable_5:288896278 hashTable_6:19277121
hashTable_7:419565314 hashTable_8:1375944989 hashTable_9:571285015
hashTable_10:1728606735 hashTable_11:1560438339 hashTable_12:1263078657
hashTable_13:639901719 hashTable_14:980304657 hashTable_15:889786370
hashTable_16:288954532 hashTable_17:69543944 hashTable_18:52866077
hashTable_19:1174882581 hashTable_20:159002116 hashTable_21:218507036
hashTable_22:286916626 hashTable_23:17128202 hashTable_24:-1235483301"
_query_:"{!edismax mm=5}hashTable_0:1578862134 hashTable_1:439820032
hashTable_2:1715571972 hashTable_3:51184175 hashTable_4:371655241
hashTable_5:473500713 hashTable_6:20579091 hashTable_7:67600402
hashTable_8:336281885 hashTable_9:218958103 hashTable_10:170691901
hashTable_11:153224477 hashTable_12:941347926 hashTable_13:335611671
hashTable_14:352541245 hashTable_15:87010585 hashTable_16:36323236
hashTable_17:304437256 hashTable_18:1850568961 hashTable_19:34031890
hashTable_20:544884996 hashTable_21:588907548 hashTable_22:204955669
hashTable_23:1510304271 hashTable_24:555417973" _query_:"{!edismax
mm=5}hashTable_0:-1844085066 hashTable_1:441982775 hashTable_2:1176983556
hashTable_3:118293016 hashTable_4:374481425 hashTable_5:439943942
hashTable_6:19079169 hashTable_7:321782530 hashTable_8:538016737
hashTable_9:813316631 hashTable_10:169561147 hashTable_11:973210906
hashTable_12:1547197978 hashTable_13:957701387 hashTable_14:1679907747
hashTable_15:356169241 hashTable_16:1378732772 hashTable_17:313198851
hashTable_18:624714050 hashTable_19:67582263 

Re: Cloud Behavior when using numShards=1

2016-12-27 Thread Dorian Hoxha
I think Solr itself tries to load-balance. Read this page
https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
(preferLocalShards!)
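
It's just another request parameter, so regardless of the client library you
can append it to the select URL yourself, e.g. (using the host and collection
names from your mail):

http://SERVER1:8983/solr/sf_fingerprints/select?q=*:*&preferLocalShards=true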

Also, please post the query.

Tip: fill in the "To" address only after you've finished writing the email.

On Tue, Dec 27, 2016 at 4:31 PM, Dave Seltzer  wrote:

> [Forgive the repeat here, I accidentally clicked send too early]
>
> Hi Everyone,
>
> I have a Solr index which is quite small (400,000 documents totaling 157
> MB) with a query load which is quite large. I therefore want to spread the
> load across multiple Solr servers.
>
> To accomplish this I've created a Solr Cloud cluster with two collections.
> The collections are configured with only 1 shard, but with 3 replicas in
> order to make sure that each of the three Solr servers has all of the data
> and can therefore answer any query without having to request data from
> another server. I use the following command:
>
> solr create -c sf_fingerprints -shards 1 -n fingerprints -replicationFactor 3
>
> I use HAProxy to spread the load across the three servers by directing the
> query to the server with the fewest current connections.
>
> However, when I turn up the load during testing I'm seeing some stuff in
> the logs of SERVER1 which makes me question my understanding of Solr Cloud:
>
> SERVER1: HttpSolrCall null:org.apache.solr.common.SolrException: Error
> trying to proxy request for url:
> http://SERVER3:8983/solr/sf_fingerprints/select
>
> I'm curious why SERVER1 would be proxying requests to SERVER3 in a
> situation where the sf_fingerprints index is completely present on the
> local system.
>
> Is this a situation where I should be using generic replication rather than
> Cloud?
>
> Many thanks!
>
> -Dave
>


Cloud Behavior when using numShards=1

2016-12-27 Thread Dave Seltzer
[Forgive the repeat here, I accidentally clicked send too early]

Hi Everyone,

I have a Solr index which is quite small (400,000 documents totaling 157
MB) with a query load which is quite large. I therefore want to spread the
load across multiple Solr servers.

To accomplish this I've created a Solr Cloud cluster with two collections.
The collections are configured with only 1 shard, but with 3 replicas in
order to make sure that each of the three Solr servers has all of the data
and can therefore answer any query without having to request data from
another server. I use the following command:

solr create -c sf_fingerprints -shards 1 -n fingerprints -replicationFactor 3
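
(For reference, the roughly equivalent Collections API call, assuming the
uploaded configset is named "fingerprints", looks like:

http://SERVER1:8983/solr/admin/collections?action=CREATE&name=sf_fingerprints&numShards=1&replicationFactor=3&collection.configName=fingerprints
)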

I use HAProxy to spread the load across the three servers by directing the
query to the server with the fewest current connections.

However, when I turn up the load during testing I'm seeing some stuff in
the logs of SERVER1 which makes me question my understanding of Solr Cloud:

SERVER1: HttpSolrCall null:org.apache.solr.common.SolrException: Error
trying to proxy request for url:
http://SERVER3:8983/solr/sf_fingerprints/select

I'm curious why SERVER1 would be proxying requests to SERVER3 in a
situation where the sf_fingerprints index is completely present on the
local system.

Is this a situation where I should be using generic replication rather than
Cloud?

Many thanks!

-Dave