Re: Why ES node starts recovering all the data from other nodes after reboot?
On 2014-11-22 09:35, Otis Gospodnetic wrote:

Hi Konstantin, check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/

Good writing! Thanks.

I wonder if there's any drawback to cutting indices into smaller (tiny?) shards. My thinking is this: we don't really change data in our bigger indices, we just keep adding to them, so when we rebuild a node, all nodes should already have the same version of the old shards, which should make a restart, and even a rebuild from backups, much faster.

--
http://yves.zioup.com gpg: 4096R/32B0F416

--
You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5470DE32.5070902%40zioup.com. For more options, visit https://groups.google.com/d/optout.
Re: Why ES node starts recovering all the data from other nodes after reboot?
Thanks Nikolas. Is this true on version 0.90, or only on 1.x? I've had nodes die and restart, and they did copy everything!

On 2014-11-20 22:02, Nikolas Everett wrote:

The thing is that this is a disk-level operation. It pretty much rsyncs the files from the current primary shard to the node when it comes back online. This would be OK if the replica shards matched the primary, but that is normally only the case if the shard was moved to the node after it was mostly complete and has had only a few writes since. Normally shards don't match each other because the way the index is maintained is nondeterministic. The translog replay is only used as a catch-up after the rsync-like step. This is something that is being worked on. It's certainly my biggest complaint about Elasticsearch, but I'm confident that it'll get better.

Nik

On Nov 20, 2014 11:11 PM, Mark Walkom markwal...@gmail.com wrote:

It will enter recovery, where it syncs at the segment level from the current primary; then the translog gets shipped over and (re)played, which brings it all up to date.

On 21 November 2014 14:51, Yves Dorfsman y...@zioup.com wrote:

If you do disable allocation before you reboot a node and a client writes to a shard that had a replica on that node, does the entire replica get copied when the node comes up? Or does it just get updated?

On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:

You should disable allocation before you reboot; that will save a lot of shard shuffling: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades

On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote:

I work on an experimental cluster of ES nodes running on Windows Server machines. Once in a while we need to reboot machines.

The initial state: the cluster is green and well balanced. One machine is gracefully taken offline and, after the necessary service is performed, it comes back online. All the hardware and file system content is intact. As soon as the ES service starts on that machine, it assumes that there is no usable data locally and recovers as much data as it deems necessary for balancing from other nodes.

This behavior puzzles me, because most of the data shards stored on that machine's file system can be reused as they are. The cluster stores logs, so all indices except those for the current day never change until they get deleted. Can't an ES node detect that it has perfect copies of some (actually most) of the shards and, instead of copying them over, just mark them as up to date? I suspect there is some step I don't know about that enables this behavior, and I'm looking to enable it. Any advice?

Thank you!
Konstantin
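The advice above to disable allocation before a reboot could be sketched roughly as follows. This is only an illustration: the function name `allocation_settings_body` and the `es_major_version` parameter are made up for the example, and the setting names (`cluster.routing.allocation.enable` on 1.x, `cluster.routing.allocation.disable_allocation` on 0.90) are taken from the reference documentation of those releases and should be verified against the version you run. The function only builds the JSON body for `PUT /_cluster/settings`; it does not contact a cluster.

```python
import json


def allocation_settings_body(enabled, es_major_version=1):
    """Build the JSON body for PUT /_cluster/settings.

    enabled=False is meant for just before a node is stopped,
    enabled=True for after the node has rejoined the cluster.
    The function name and es_major_version are hypothetical helpers.
    """
    if es_major_version >= 1:
        # Assumed 1.x setting: "all" allows allocation, "none" blocks it.
        settings = {
            "cluster.routing.allocation.enable": "all" if enabled else "none"
        }
    else:
        # Assumed 0.90 setting, which has the inverted sense.
        settings = {
            "cluster.routing.allocation.disable_allocation": not enabled
        }
    # "transient" settings do not survive a full cluster restart.
    return json.dumps({"transient": settings}, sort_keys=True)


if __name__ == "__main__":
    print(allocation_settings_body(False))
    print(allocation_settings_body(True))
```

Going by the replies above, this saves shard shuffling while the node is away, but it may not prevent a full file copy of any shard that was written to in the meantime.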
adding a new node: how to prime the data
We upgrade our clusters by adding new nodes, increasing the number of replicas on the indices, letting the new node catch up, then excluding the old node and reducing the number of replicas on the indices. One cluster has a large index for which this operation takes hours. We tried copying the data from an existing node first, but the new node still copies everything (I suspect it has no way to know what's new and what isn't?). We do plan to split that index into smaller shards, but in the meantime we are wondering if there is a better way of doing this.

Thanks.

--
http://yves.zioup.com gpg: 4096R/32B0F416
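For readers who want to see the node-swap procedure described above spelled out, here is a minimal sketch of the REST calls it seems to correspond to. It is an assumption-laden illustration, not the poster's actual tooling: the function name `node_swap_requests` and its parameters are hypothetical, and the endpoints used (index `_settings`, `_cluster/health` with `wait_for_status`, and the `cluster.routing.allocation.exclude._ip` cluster setting) should be checked against the documentation for your release. The function only returns the requests in order and sends nothing.

```python
import json


def node_swap_requests(index, old_node_ip, replicas_before):
    """Return (method, path, body) tuples for swapping out a node.

    Assumes the new node has already joined the cluster.
    All names here are hypothetical and for illustration only.
    """
    def body(obj):
        return json.dumps(obj, sort_keys=True)

    return [
        # 1. Add one replica so the new node receives a copy.
        ("PUT", "/%s/_settings" % index,
         body({"index": {"number_of_replicas": replicas_before + 1}})),
        # 2. Let the new node catch up: returns once the index is green
        #    or the request times out, so poll until it reports green.
        ("GET", "/_cluster/health/%s?wait_for_status=green" % index, None),
        # 3. Exclude the old node so shards move off it.
        ("PUT", "/_cluster/settings",
         body({"transient": {
             "cluster.routing.allocation.exclude._ip": old_node_ip}})),
        # 4. Drop back to the original replica count.
        ("PUT", "/%s/_settings" % index,
         body({"index": {"number_of_replicas": replicas_before}})),
    ]


if __name__ == "__main__":
    for method, path, payload in node_swap_requests("logs", "10.0.0.5", 1):
        print(method, path, payload or "")
```

Note that step 1 is where the hours go for a large index: the extra replica is built by copying the shard files to the new node.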
Re: Why ES node starts recovering all the data from other nodes after reboot?
If you do disable allocation before you reboot a node and a client writes to a shard that had a replica on that node, does the entire replica get copied when the node comes up? Or does it just get updated?

On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:

You should disable allocation before you reboot; that will save a lot of shard shuffling: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades

On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote:

I work on an experimental cluster of ES nodes running on Windows Server machines. Once in a while we need to reboot machines.

The initial state: the cluster is green and well balanced. One machine is gracefully taken offline and, after the necessary service is performed, it comes back online. All the hardware and file system content is intact. As soon as the ES service starts on that machine, it assumes that there is no usable data locally and recovers as much data as it deems necessary for balancing from other nodes.

This behavior puzzles me, because most of the data shards stored on that machine's file system can be reused as they are. The cluster stores logs, so all indices except those for the current day never change until they get deleted. Can't an ES node detect that it has perfect copies of some (actually most) of the shards and, instead of copying them over, just mark them as up to date? I suspect there is some step I don't know about that enables this behavior, and I'm looking to enable it. Any advice?

Thank you!
Konstantin
Re: priming data for a new node
So if a shard has been updated since the data copy, will it copy the entire shard, or just update it?

On Wednesday, 19 November 2014 23:34:01 UTC-7, Mark Walkom wrote:

It doesn't copy everything, only what it needs to balance the shards.

On 20 November 2014 17:20, Yves Dorfsman yv...@zioup.com wrote:

When adding a new node to a cluster, is there a way to prevent it from having to copy all the data from the other nodes? We tried to copy the data on disk from an existing node (one that had all the data for the given indices), but it still copied everything. Is there a way to make it update what is new only?

Thanks.

--
http://yves.zioup.com gpg: 4096R/32B0F416
upgrading from 0.90.7 to 1.4. Gotchas?
Are there any precautions to take before upgrading from 0.90.7 to 1.4? Different data types? Different API calls? Anything else?

And what is the best way to upgrade? Can we just add a node running the newer version and let it pull the data?

Thanks.

--
http://yves.zioup.com gpg: 4096R/32B0F416
priming data for a new node
When adding a new node to a cluster, is there a way to prevent it from having to copy all the data from the other nodes? We tried to copy the data on disk from an existing node (one that had all the data for the given indices), but it still copied everything. Is there a way to make it copy only what is new?

Thanks.

--
http://yves.zioup.com gpg: 4096R/32B0F416
Is it possible to isolate search queries to a single node?
I have a job that makes heavy use of ES, to the point that it affects the cluster. Is it possible to:

- add a replica
- force the extra replica onto a specific node
- isolate some of the queries to that particular node?

Thanks.
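As a possible starting point for the last item in the list above, the search API accepts a `preference` parameter, and the value `_only_node:<node id>` is documented as restricting a search to one node. The sketch below only builds such a URL. The function name `node_pinned_search_url` and the example node id are hypothetical, and it assumes the chosen node holds a copy of every shard of the index, since otherwise the search cannot be served from that node alone. It does not address how to place the extra replica.

```python
from urllib.parse import urlencode


def node_pinned_search_url(base_url, index, node_id):
    """Build a _search URL that asks ES to use a single node.

    node_id is the node's id as reported by the cluster, not its
    host name. Function and parameter names are hypothetical.
    """
    # urlencode percent-encodes the colon in "_only_node:<id>".
    query = urlencode({"preference": "_only_node:" + node_id})
    return "%s/%s/_search?%s" % (base_url.rstrip("/"), index, query)


if __name__ == "__main__":
    print(node_pinned_search_url("http://localhost:9200", "logs", "abC123"))
```

Whether this actually keeps the heavy job from affecting the rest of the cluster would need to be tested; indexing traffic still reaches every copy of a shard.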
Can a replica be updated with the deltas only?
When I shut down a node that holds a replica while updates are happening on the rest of the cluster, then restart this node, it seems that the entire replica is copied to that node again. Is there a way to make ES send that node only the updates that happened while it was down?

Thanks.