Re: Please explain the flow of data?

Josh Harrison Fri, 21 Mar 2014 15:37:13 -0700

Awesome, ok, thank you.
Is the logic behind not allowing storage on master nodes twofold:
to take advantage of a system with limited storage resources,
and
to have a dedicated results aggregator/search handler?


I can imagine that, if I had a particularly badly written, gnarly search, 
trying to deal with the results on a master while it is also querying its 
own data could be bad.

So in a 16-node cluster you'd want to have 9 nodes allowed to be masters, 
i.e. (n/2)+1 = (16/2)+1 = 9?
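
To sanity-check that arithmetic, here is a minimal Python sketch of the (n/2)+1 majority rule, assuming integer (floor) division. The function name master_quorum is made up for illustration and is not an Elasticsearch API.

```python
def master_quorum(n: int) -> int:
    """Smallest majority of n nodes: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("need at least one node")
    return n // 2 + 1


# Work through a few cluster sizes, including the 16-node case above.
for n in (3, 4, 16):
    print(f"{n} nodes -> {master_quorum(n)}")
# prints:
# 3 nodes -> 2
# 4 nodes -> 3
# 16 nodes -> 9
```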

Thanks again!
Josh


On Friday, March 21, 2014 3:20:24 PM UTC-7, Mark Walkom wrote:
>
> A couple of things:
>
>    1. You should have n/2+1 masters in your cluster, where n = number of 
>    nodes. This helps prevent split brain situations and is best practise.
>    2. Your master nodes can store data, this way you don't need to add 
>    more nodes to fulfil the above. 
>
> Your indexing scenario is correct. 
> For searching, replicas and primaries can be queried.
> For both - Adding more masters adds redundancy as per the first two 
> points. Adding more search nodes won't do much though other than reduce the 
> load on your masters (unless someone else can add anything I don't know :p).
>
> And for your final question, yes that is correct.
>
> To give you an idea of practical application, we don't use search nodes 
> but have 3 non-data masters that handle all queries, and a bunch of data 
> only nodes for storing everything.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 22 March 2014 08:25, Josh Harrison <hij...@gmail.com> wrote:
>
>> I'm trying to build a basic understanding of how indexing and searching 
>> works, hopefully someone can either point me to good resources or explain!
>> I'm trying to figure out what having multiple "coordinator" nodes as 
>> defined in the elasticsearch.yml would do, and what having multiple "search 
>> load balancer" nodes would do. Both in the context of indexing and 
>> searching.
>> Is there a functional difference between a "coordinator" node and a 
>> "search load balancer" node, beyond the fact that a "search load balancer" 
>> node can't be elected master?
>>
>>
>> Say I have a 4 node cluster. There's a master only "coordinator" node, 
>> that doesn't store data, named "master". 
>> node.master: true
>> node.data: false
>>
>> There are three data only nodes, "A", "B" and "C" 
>> node.master: false
>> node.data: true
>>
>> I have an index "test" with two shards and one replica. Primary shard 0 
>> lives on A, primary shard 1 lives on C, replica shard 0 lives on B, replica 
>> shard 1 lives on A.
>>
>> I send the command
>> curl -XPOST http://master:9200/test/test -d '{"foo":"bar"}'
>>
>> A connection is made to master, and the data is sent to master to be 
>> indexed. Master randomly decides to place this document in shard 1, so it 
>> gets sent to the primary shard 1 on C and replica shard 1 on A, right? This 
>> is where routing can come in, I can say that that document really should go 
>> to shard 0 because I said so.
>>
>> So this is a fairly simple scenario, assuming I'm correct.
>>
>> What benefit do I get to indexing when I add more "coordinator" nodes?
>> node.master: true
>> node.data: false
>>
>> What about if I add "search load balancer" nodes?
>> node.master: false
>> node.data: false
>>
>>
>>
>> How about on the searching side of things?
>> I send a search to master,
>> curl -XPOST http://master:9200/test/test/_search -d '{"query":{"match_all":{}}}'
>>
>> Master sends these queries off to A, B and C, who each generate their own 
>> results and return them to master. Each data node queries all the relevant 
>> shards that are present locally and then combines those results for 
>> delivery to master. Do only primary shards get queried, or are replica 
>> shards queried too? 
>> Master takes these combined results from all the relevant nodes and 
>> combines them into the final query response.
>>
>> Same questions:
>> What benefit do I get to searching when I add more nodes that are like 
>> master?
>> node.master: true
>> node.data: false
>>
>> What about if I add "search load balancer" nodes?
>> node.master: false
>> node.data: false
>>  
>>
>> Is the only difference between a 
>> node.master: true
>> node.data: false
>> and a
>> node.master: false
>> node.data: false
>> that the node is a candidate to be a master, should it be elected?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/eaff1d85-1e85-422d-bfba-9a0825ed5da9%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
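
For reference, the node.master / node.data combinations from the quoted messages can be summarized in a small Python lookup. This is only a sketch: node_role is a hypothetical helper, and the role labels reuse the thread's wording rather than any official terminology.

```python
def node_role(master: bool, data: bool) -> str:
    """Describe a node from its node.master / node.data settings."""
    if master and data:
        return "master-eligible data node"
    if master:
        return "coordinator (master-eligible, no data)"
    if data:
        return "data-only node"
    return "search load balancer (no data, never master)"


# The 4-node example from the thread: one dedicated master plus A, B and C.
cluster = {
    "master": (True, False),
    "A": (False, True),
    "B": (False, True),
    "C": (False, True),
}
for name, (is_master, has_data) in cluster.items():
    print(f"{name}: {node_role(is_master, has_data)}")
```

By these labels, the only difference between the two non-data roles is whether the node may be elected master, which is the question the original message ends on.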
