Hi

I have a basic question about how reads and writes are split into tasks. 
It is easiest to explain with an example from my implementation.

Architecture: A 3-node cluster with a single index and 32 shards. A type 
"data" holds several months of data, with roughly 40K-50K documents per 
month. A routing value built from the month and year is used to route this 
data to a shard. So, in short, one month of data goes to one shard.
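
To make the routing concrete, here is a minimal sketch of the idea. The 
key format ("2015-03") and the helper names are hypothetical, and 
zlib.crc32 is only a stand-in: Elasticsearch picks the shard as 
hash(routing) % number_of_primary_shards, but with its own hash function, 
so the shard numbers printed here will not match a real cluster.

```python
import zlib

NUM_SHARDS = 32  # primary shards in the index described above


def routing_value(year: int, month: int) -> str:
    """Build the routing key for one month of data (format is hypothetical)."""
    return f"{year:04d}-{month:02d}"


def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard as hash(routing) % num_shards.

    zlib.crc32 is only a stand-in hash for illustration; Elasticsearch
    uses its own hash function, so real shard numbers will differ.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for month in range(1, 13):
        key = routing_value(2015, month)
        print(key, "-> shard", shard_for(key))
```

All documents with the same routing value land on the same shard, but two 
different months can hash to the same shard. So "one month per shard" 
really means "each month lives entirely inside one shard".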

Requirement: The requirement is simple: run a query, get the data, update 
each document, and write it back to the same shard. Since there are 32 
shards, the read creates 32 tasks; each task fetches one month of data, 
updates it, and sends it back to ES with the same routing value so that it 
overwrites the previous document.
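
For reference, this is roughly the write configuration I have in mind, 
shown as a plain Python dict so it does not depend on Spark. The setting 
names are the elasticsearch-hadoop ones; the index name "myindex" and the 
field names "doc_id" and "month_year" are placeholders, not my real ones.

```python
def es_write_conf(resource, id_field, routing_field, nodes="localhost"):
    """Return elasticsearch-hadoop write settings as a plain dict.

    All values are strings, which is what the connector expects.
    """
    return {
        "es.nodes": nodes,
        "es.resource": resource,              # "<index>/<type>"
        "es.write.operation": "index",        # same id => document is replaced
        "es.mapping.id": id_field,            # field holding the document id
        "es.mapping.routing": routing_field,  # field holding the routing value
    }


if __name__ == "__main__":
    # "myindex", "doc_id" and "month_year" are hypothetical names.
    conf = es_write_conf("myindex/data", "doc_id", "month_year")
    for key, value in sorted(conf.items()):
        print(key, "=", value)
```

Writing with the same id and the same routing value is what should make 
the updated document replace the old one instead of creating a duplicate.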

Flow: The retrieval seems easy: 32 tasks are created, one per shard, and 
the data is brought into a single RDD. The next step updates each document. 
The step after that is the write, which raises the following question:

How does the write operation divide itself into tasks?
Going by the documentation, it depends on es.batch.size.bytes and 
es.batch.size.entries; as I read it, the values of these two properties 
define the number of tasks. What I presumed was that the RDD is partitioned 
again into n tasks depending on the values of these parameters, and that 
many tasks then run to index/update the data. However, when I ran a write 
with only 5 documents and es.batch.size.entries set to 10,000, I still saw 
32 tasks writing to my es.resource. I am still confused about how task 
allocation works here. Can you please explain?
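
One explanation I can think of, sketched below as a plain Python 
simulation (no Spark involved): if the write runs one task per RDD 
partition, and the batch settings only control how often each task flushes 
its own bulk buffer, then the task count would stay at 32 no matter how 
few documents there are. This is my assumption, not confirmed behaviour; 
the function name is made up and the es.batch.size.bytes limit is ignored.

```python
import math


def simulate_write(partition_sizes, batch_entries):
    """Return (number of write tasks, bulk requests sent by each task).

    Assumes one write task per RDD partition and a per-task buffer that
    is flushed every `batch_entries` documents; the es.batch.size.bytes
    limit is ignored to keep the sketch short.
    """
    bulk_requests = [math.ceil(n / batch_entries) for n in partition_sizes]
    return len(partition_sizes), bulk_requests


if __name__ == "__main__":
    # 5 documents spread over the 32 partitions produced by the read.
    sizes = [1] * 5 + [0] * 27
    tasks, bulks = simulate_write(sizes, batch_entries=10_000)
    print(tasks, "tasks,", sum(bulks), "bulk requests in total")
```

Under that assumption the 32 tasks I saw would simply be the 32 partitions 
left over from the read, most of them with nothing to send.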

Now another question: in a standalone write to ES, how does the code 
identify which shard holds which routing value? My assumption was that all 
the tasks send their data to an ES node, which then distributes the data 
to the shards based on the routing value, just like a normal bulk index 
operation.

Can you please explain how tasks are created for the two operations: 
read-update-write and write-only?

Thanks in advance
Piyush
