Hey Nik - Thanks for the response.
- Ian

On Mon, Oct 13, 2014 at 4:28 PM, Nikolas Everett <nik9...@gmail.com> wrote:

> On Mon, Oct 13, 2014 at 11:12 AM, Ian Rose <ianr...@fullstory.com> wrote:
>
>> Hi -
>>
>> My team has used Solr in its single-node configuration (without
>> SolrCloud) for a few years now. In our current product we are now looking
>> at transitioning to SolrCloud, but before we make that leap I wanted to
>> take a good look at whether Elasticsearch would be a better fit for our
>> needs. Although ES has some nice advantages (such as automatic shard
>> rebalancing), I'm trying to figure out how to live in a world without
>> shard splitting. In brief, our situation is as follows:
>>
>> - We use one index ("collection" in Solr) per customer.
>> - The indexes vary quite a bit in size, following something like a
>> power-law distribution: many small indexes (let's guess < 250k
>> documents), some medium-sized indexes (up to a few million documents),
>> and a few large indexes (hundreds of millions of documents).
>> - So the number of shards required per index will vary greatly and will
>> be hard to predict accurately at creation time.
>>
>> How do people generally approach this kind of problem? Do you just make
>> a best guess at the appropriate number of shards for each new index and
>> then do a full re-index (with more shards) if the number of documents
>> grows bigger than expected?
>
> I'm in a pretty similar boat and have done just fine without shard
> splitting. I maintain the search index for about 900 wikis
> <http://noc.wikimedia.org/conf/all.dblist>. Each wiki gets two
> Elasticsearch indexes, and those indexes vary in size, update rate, and
> query rate a ton. Most wikis get a single shard for all of their indexes,
> but many of them use more
> <https://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/747fc7436226774d1735775c2ef41c911d59b5d2/wmf-config%2FInitialiseSettings.php#L13828>.
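The "best guess" the question asks about usually boils down to picking an initial shard count from the expected document volume. As a purely hypothetical illustration (the docs-per-shard target below is an assumption for the sketch, not an Elasticsearch recommendation):

```python
import math

# Hypothetical sizing target - an assumption for illustration only,
# not a figure from Elasticsearch documentation.
DOCS_PER_SHARD = 5_000_000

def guess_shards(expected_docs: int) -> int:
    """Guess an initial shard count from expected document volume,
    with a floor of one shard."""
    return max(1, math.ceil(expected_docs / DOCS_PER_SHARD))

# The small / medium / large customers from the power-law distribution above:
print(guess_shards(250_000))      # small index  -> 1
print(guess_shards(3_000_000))    # medium index -> 1
print(guess_shards(300_000_000))  # large index  -> 60
```

Whatever the formula, the point of the thread is that the guess will be wrong for some customers, and the fix is to reindex those into more shards.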
> I basically just guesstimated and reindexed the ones that were too big
> into more shards.
>
> We have a script that creates a new index with the new configuration,
> copies all the documents from the old index to the new one, and then
> swaps the aliases (which we use for updates and queries) over to the new
> index. Then it re-does any updates or deletes that occurred since the
> copy script started. Having something like that is pretty common. I
> rarely use it to change sharding configuration - it's much more common
> that I'll use it to change how a field in the document is analyzed.
>
> Elasticsearch also has another way to handle this problem (we don't use
> it for other reasons) where you create a single index for all customers
> and then filter them at query time. You also add routing values to your
> documents and queries so all documents from the same customer get routed
> to the same shard. That way you can serve queries for a single customer
> out of one shard, which is pretty cool. For larger customers that don't
> fit on a single shard you still create indexes just for them.
>
> One thing to watch out for, though, is that Elasticsearch doesn't use the
> shard's size when determining where to place the shard. It'll check to
> make sure the shard won't fill the disk beyond some percentage, but it
> won't try to spread out the large shards, so you can get somewhat
> unbalanced disk usage. I have an open pull request for something to do
> that, so this probably won't be true forever, but it is true for now.
>
> How big are your documents, and how frequently do you think you'll need
> shard splitting? If your documents are pretty small you may be able to
> get away with just reindexing all of them for the customer when you need
> more shards, like I do. It sure isn't optimal, but it gets the job done.
>
> Another way to do things is, once a customer gets too big, you create a
> new index and route all of their new data there. You then have to query
> both indexes.
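The "create a new index once a customer outgrows the old one" idea at the end can be sketched roughly as below. The index-naming scheme, the doc-count threshold, and the in-memory bookkeeping are all made up for illustration; a real system would track this state durably rather than in dictionaries:

```python
from collections import defaultdict

MAX_DOCS_PER_INDEX = 1_000  # illustrative threshold, far smaller than real life

customer_indexes = defaultdict(list)  # customer -> ordered list of index names
doc_counts = defaultdict(int)         # index name -> documents written so far

def write_index(customer: str) -> str:
    """Return the index new documents for this customer should go to,
    rolling over to a fresh index once the current one is 'full'."""
    names = customer_indexes[customer]
    if not names or doc_counts[names[-1]] >= MAX_DOCS_PER_INDEX:
        names.append(f"{customer}-{len(names) + 1}")
    return names[-1]

def index_doc(customer: str, doc: dict) -> str:
    name = write_index(customer)
    doc_counts[name] += 1
    return name

def query_indexes(customer: str) -> list:
    # The cost of this scheme: a query for the customer has to fan out
    # over every index they have ever been given.
    return list(customer_indexes[customer])

for i in range(2_500):
    index_doc("acme", {"id": i})
print(query_indexes("acme"))  # ['acme-1', 'acme-2', 'acme-3']
```

Writes always land in a single (the newest) index, which is why, as the next message notes, this is close to how people commonly handle time-series data like logs.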
> This is _kind of_ how people handle log messages, and it might work,
> depending on your use case.
>
> Nik
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "elasticsearch" group.
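The key property behind the shared-index-plus-routing approach described above is that a hash of the routing value selects the shard, so one customer's documents (and any query carrying the same routing value) all land on a single shard. A simplified model of that behavior, with md5 standing in for Elasticsearch's actual routing hash (which is a different function):

```python
import hashlib

NUM_SHARDS = 5

def shard_for(routing: str) -> int:
    """Map a routing value to a shard number. md5 is a stand-in here -
    Elasticsearch uses its own hash function, not md5 - but the
    deterministic hash-modulo-shard-count structure is the point."""
    digest = hashlib.md5(routing.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every document routed with the same customer id maps to one shard,
# so a query sent with that routing value only needs to touch one shard.
shards = {shard_for("customer-42") for _ in range(100)}
print(len(shards))  # 1
```

This also shows why the approach breaks down for customers too big for one shard: the hash pins all of their data to a single shard regardless of its size, which is why the message above says you still create dedicated indexes for the largest customers.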