Please explain the flow of data?

Josh Harrison Fri, 21 Mar 2014 14:26:55 -0700

I'm trying to build a basic understanding of how indexing and searching
work. Hopefully someone can either point me to good resources or explain!
I'm trying to figure out what having multiple "coordinator" nodes as
defined in the elasticsearch.yml would do, and what having multiple "search
load balancer" nodes would do, both in the context of indexing and
searching.
Is there a functional difference between a "coordinator" node and a "search 
load balancer" node, beyond the fact that a "search load balancer" node 
can't be elected master?

Say I have a 4-node cluster. There's a master-only "coordinator" node that
doesn't store data, named "master":
node.master: true
node.data: false

There are three data-only nodes, "A", "B" and "C":
node.master: false
node.data: true

I have an index "test" with two shards and one replica. Primary shard 0
lives on A, primary shard 1 lives on C, replica shard 0 lives on B, replica
shard 1 lives on A.
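
To keep the layout straight, here it is written out as a small Python sketch. This only restates the placement described above; LAYOUT and nodes_for_shard are names I made up for illustration, not anything from Elasticsearch.

```python
# Shard layout from the description above: (shard number, role) -> node name.
LAYOUT = {
    (0, "primary"): "A",
    (1, "primary"): "C",
    (0, "replica"): "B",
    (1, "replica"): "A",
}


def nodes_for_shard(shard: int) -> dict:
    """Return {role: node} for every copy of the given shard."""
    return {role: node for (num, role), node in LAYOUT.items() if num == shard}


# Which nodes would have to receive a document written to shard 1?
print(nodes_for_shard(1))
```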

I send the command
curl -XPOST http://master:9200/test/test -d '{"foo":"bar"}'

A connection is made to master, and the data is sent to master to be
indexed. Master randomly decides to place this document in shard 1, so it
gets sent to the primary shard 1 on C and replica shard 1 on A, right? This
is where routing can come in: I can say that the document really should go
to shard 0 because I said so.
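
If the choice turns out to be deterministic rather than random (the documented rule, as far as I know, is shard = hash(routing) % number_of_primary_shards, with the document ID as the default routing value), then shard selection could be sketched roughly like this in Python. The crc32 hash and the pick_shard name are stand-ins of mine, not what Elasticsearch really uses internally.

```python
import zlib

NUM_PRIMARY_SHARDS = 2  # the "test" index above has two primary shards


def pick_shard(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value to a shard number.

    zlib.crc32 is only a stand-in hash for illustration; Elasticsearch
    uses its own hash function internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# By default the routing value would be the document ID.
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id))

# A custom routing value replaces the ID in the hash, so every document
# sharing that value lands on the same shard.
print("routing=user-42 -> shard", pick_shard("user-42"))
```

Under this model I would not name shard 0 directly; I would supply a routing value, and all documents with that value would end up together on whichever shard it hashes to.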

So this is a fairly simple scenario, assuming I'm correct.

What benefit do I get for indexing when I add more "coordinator" nodes?
node.master: true
node.data: false

What about if I add "search load balancer" nodes?
node.master: false
node.data: false

How about on the searching side of things?
I send a search to master:
curl -XPOST http://master:9200/test/test/_search -d '{"query":{"match_all":{}}}'

Master sends these queries off to A, B and C, who each generate their own
results and return them to master. Each data node queries all the relevant
shards that are present locally and then combines those results for
delivery to master. Do only primary shards get queried, or are replica
shards queried too?
Master takes these combined results from all the relevant nodes and
combines them into the final query response.
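
In code, the picture I have of that gather step is roughly the following Python sketch. The hits are made up, each data node is assumed to return its hits already sorted by descending score, and merge_hits is a hypothetical helper of mine, not an Elasticsearch API.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def merge_hits(per_node: Dict[str, List[Hit]], size: int = 10) -> List[Hit]:
    """Merge per-node hit lists (each sorted by descending score) into one
    global top-`size` list, the way I imagine the coordinating node does."""
    merged = heapq.merge(*per_node.values(), key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, size))


# Made-up partial results returned to master by each data node.
per_node = {
    "A": [(0.9, "a1"), (0.4, "a2")],
    "B": [(0.7, "b1")],
    "C": [(0.8, "c1"), (0.1, "c2")],
}

print(merge_hits(per_node, size=3))
```

If this model is right, the merge is the only search work a non-data node does, which is why I'm asking what adding more of them would buy me.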

Same questions:
What benefit do I get for searching when I add more nodes that are like
master?
node.master: true
node.data: false

What about if I add "search load balancer" nodes?
node.master: false
node.data: false

Is the only difference between a
node.master: true
node.data: false
node and a
node.master: false
node.data: false
node that the first is eligible to be elected master?
