Hi,

A very basic question about implementation, best explained through an example.
Architecture: A 3-node cluster with a single index and 32 shards. A type "data" holds months of data, with roughly 40K-50K documents per month. A routing value built from the month and year is used to route this data, so in short, one month of data goes to one shard.

Requirement: A simple one: pass a query, get the data, update each document, and write it back to the same shard. Since there are 32 shards, 32 tasks are created; each task fetches one month of data, updates it, and sends it back to ES with the same routing value so that it overwrites the previous document.

Flow: Retrieval seems easy: 32 tasks are created, one per shard, and they bring the data into a single RDD. The next step updates each document. Then comes the write, which raises the following question:

How does the write operation divide itself into tasks? Going by the documentation, I understood that it depends on es.batch.size.bytes and es.batch.size.entries, and that the values of these two properties define the number of tasks. What I presumed was that the RDD is partitioned again into n tasks depending on the values of these parameters, and that this many tasks then run to index/update the data. However, when I ran a write with just 5 documents and es.batch.size.entries set to 10,000, I still saw as many as 32 tasks performing a write against my es.resource. I am still confused about how task allocation works here. Can you please explain?

Now another question: in a standalone write to ES, how does the code identify which shard holds which routing value? My assumption was that all tasks send their data to an ES node, which then distributes the data to the shards based on the routing value, just like a normal bulk index operation.

Can you please explain how tasks are created for the two operations: read-update-write and write-only?

Thanks in advance,
Piyush
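
P.S. One reading that would be consistent with the 32-task observation, sketched as a small pure-Python model. It assumes (unconfirmed in this thread, and exactly what the question asks about) that the write runs one task per RDD partition, and that es.batch.size.entries only caps the size of each bulk request inside a task. The helper names split_into_partitions and bulk_requests_for_task are made up for illustration and are not part of elasticsearch-hadoop or Spark; the round-robin placement of documents is also just a simplification.

```python
from typing import List


def split_into_partitions(docs: List[dict], num_partitions: int) -> List[List[dict]]:
    """Spread docs round-robin over a fixed number of partitions.

    Partitions that receive no documents stay empty but still exist,
    mirroring an RDD that keeps its partition count after a read.
    """
    partitions: List[List[dict]] = [[] for _ in range(num_partitions)]
    for i, doc in enumerate(docs):
        partitions[i % num_partitions].append(doc)
    return partitions


def bulk_requests_for_task(partition: List[dict], batch_size_entries: int) -> int:
    """Count the bulk requests one task would send under the assumed model:
    one flush per full batch, plus one final flush for any remainder."""
    full, rest = divmod(len(partition), batch_size_entries)
    return full + (1 if rest else 0)


# 5 documents, 32 partitions (inherited from the 32-shard read), batch size 10,000.
docs = [{"id": i} for i in range(5)]
partitions = split_into_partitions(docs, num_partitions=32)

write_tasks = len(partitions)  # assumed: one write task per partition
bulk_requests = sum(bulk_requests_for_task(p, 10_000) for p in partitions)

print(write_tasks, bulk_requests)  # prints "32 5"
```

Under this model the batch settings change how many bulk requests each task sends, not how many tasks run, which would explain seeing 32 tasks for 5 documents. Whether the connector actually behaves this way is the open question.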