Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-22 Thread Yves Dorfsman
On 2014-11-22 09:35, Otis Gospodnetic wrote:
 Hi Konstantin,
 
 Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/
 

Good writing! Thanks.

I wonder if there's any drawback to cutting indices into smaller (tiny?) shards?

My thinking is this: we don't really change data in our bigger indices, we
just keep adding to them. So as we rebuild nodes, they should all end up with
identical copies of the old shards, which should make restarts, and even
rebuilds from backups, much faster.

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5470DE32.5070902%40zioup.com.
For more options, visit https://groups.google.com/d/optout.


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-21 Thread Yves Dorfsman
Thanks Nikolas.

Is this true on version 0.90 as well, or only on 1.x?
I've had nodes die and restart, and they did copy everything!

On 2014-11-20 22:02, Nikolas Everett wrote:
 The thing is that this is a disk-level operation. It pretty much rsyncs the
 files from the current primary shard to the node when it comes back online.
 This would be OK if the replica shards matched the primary, but that is
 normally only the case if the shard was moved to the node after it was mostly
 complete and you've had only a few writes since. Normally shards don't match
 each other because the way the index is maintained is nondeterministic.
 
 The translog replay is only used as a catch up after the rsync-like step.
 
 This is something that is being worked on. It's certainly my biggest complaint
 about elasticsearch but I'm confident that it'll get better.
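
A toy model may help show why a file-level sync ends up copying nearly
everything. This is only a sketch, not Elasticsearch's recovery code: the
function name, the file names, and the use of SHA-1 to compare files are all
assumptions made for illustration. A file is reused only when the replica
already holds a file with the same name and identical bytes.

```python
import hashlib


def files_to_copy(primary_files, replica_files):
    """Return the names of primary files the replica would have to fetch.

    Both arguments map file name -> file bytes (hypothetical stand-ins for
    segment files on disk).
    """
    def digest(data):
        return hashlib.sha1(data).hexdigest()

    needed = []
    for name, data in sorted(primary_files.items()):
        local = replica_files.get(name)
        if local is None or digest(local) != digest(data):
            needed.append(name)
    return needed


# Same documents on both copies, but each copy merged its segments on its
# own schedule, so the files on disk differ.
primary = {"_a.cfs": b"doc1doc2doc3", "_b.cfs": b"doc4"}
replica = {"_a.cfs": b"doc1doc2", "_c.cfs": b"doc3doc4"}
print(files_to_copy(primary, replica))

# A replica whose files are byte-for-byte copies of the primary's.
print(files_to_copy(primary, dict(primary)))
```

In this model the first call lists every primary file and the second lists
none, which matches the description above: only a replica whose files are
identical to the primary's can skip the copy.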
 
 Nik
 
 On Nov 20, 2014 11:11 PM, Mark Walkom markwal...@gmail.com wrote:
 
 It will enter recovery where it syncs at the segment level from the
 current primary, then the translog gets shipped over and (re)played, which
 brings it all up to date.
 
 On 21 November 2014 14:51, Yves Dorfsman y...@zioup.com wrote:
 
 
 If you do disable allocation before you reboot a node and a client
 writes to a shard that had a replica on that node, does the entire
 replica get copied when the node comes up, or does it just get
 updated?
 
 On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:
 
 You should disable allocation before you reboot, that will save a
 lot of shard shuffling -
 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
 
 On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com 
 wrote:
 
 I work on an experimental cluster of ES nodes running on
 Windows Server machines. Once in a while we have a need to
 reboot machines. The initial state - cluster is green and well
 balanced. One machine is gracefully taken offline and then
 after necessary service is performed it comes back online. All
 the hardware and file system content is intact. As soon as ES
 service starts on that machine, it assumes that there is no
 usable data locally and recovers as much data as it deems
 necessary for balancing from other nodes. 
 
 This behavior puzzles me, because most of the data shards
 stored on that machine file system can be reused as they are.
 Cluster stores logs, so all indices except those for
 the current day never ever change until they get deleted.
 Can't ES node detect that it has perfect copies of some
 (actually most) of the shards and instead of copying them over
 just mark them as up to date? 
 
 I suspect I don't know about some step to enable this behavior
 and I'm looking to enable it. Any advice? 
 
 Thank you!
 Konstantin
 

adding a new node: how to prime the data

2014-11-20 Thread Yves Dorfsman
We upgrade our clusters by adding new nodes, increasing the number of 
replicas on the indices, letting the new node catch up, then excluding the 
old node and reducing the number of replicas on the indices.
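
The sequence above can be written down as a short list of settings calls.
This is a sketch under assumptions: the index name, the IP address and the
helper names are made up, and it only builds the request bodies rather than
sending them. It relies on the `index.number_of_replicas` index setting and
the `cluster.routing.allocation.exclude._ip` cluster setting.

```python
import json


def replica_settings(count):
    """Body for PUT /<index>/_settings that changes the replica count."""
    return {"index": {"number_of_replicas": count}}


def exclude_node_settings(ip):
    """Body for PUT /_cluster/settings that moves shards off one node."""
    return {"transient": {"cluster.routing.allocation.exclude._ip": ip}}


def upgrade_plan(index, old_ip, replicas):
    """Return (method, path, body) steps in the order described above."""
    return [
        ("PUT", "/%s/_settings" % index, replica_settings(replicas + 1)),
        ("PUT", "/_cluster/settings", exclude_node_settings(old_ip)),
        ("PUT", "/%s/_settings" % index, replica_settings(replicas)),
    ]


# Hypothetical values: index "logs", old node at 10.0.0.5, one replica.
for method, path, body in upgrade_plan("logs", "10.0.0.5", 1):
    print(method, path, json.dumps(body))
```

Each step would normally be followed by a wait for the cluster to go green
before the next one is sent; that wait is left out of the sketch.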

One cluster has a large index for which this operation takes hours. We 
tried to copy data from an existing node, but it copies everything 
regardless (I suspect it has no way to know what's new or not?). We do 
plan to split that index into smaller shards, but in the meantime we are 
wondering if there is a better way of doing this.

Thanks.

---
http://yves.zioup.com
gpg: 4096R/32B0F416 



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Yves Dorfsman

If you do disable allocation before you reboot a node and a client writes 
to a shard that had a replica on that node, does the entire replica get 
copied when the node comes up, or does it just get updated?
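
For reference, the allocation toggle from the rolling-upgrade page linked in
the quoted reply below could be scripted roughly as follows. This is a sketch
assuming a 1.x cluster, where the setting is
`cluster.routing.allocation.enable` (0.90 uses a different setting name), and
assuming the node listens on localhost:9200. The helper name is made up, and
the request is built but not sent.

```python
import json
import urllib.request


def allocation_request(base_url, enabled):
    """Build (but do not send) a PUT to the cluster settings endpoint."""
    value = "all" if enabled else "none"
    body = {"transient": {"cluster.routing.allocation.enable": value}}
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Send the first request before the reboot and the second once the node is back.
before_reboot = allocation_request("http://localhost:9200", enabled=False)
after_reboot = allocation_request("http://localhost:9200", enabled=True)

print(before_reboot.get_method(), before_reboot.full_url)
print(before_reboot.data.decode("utf-8"))
# To apply for real: urllib.request.urlopen(before_reboot)
```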

On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:

 You should disable allocation before you reboot, that will save a lot of 
 shard shuffling - 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades

 On 21 November 2014 13:48, Konstantin Erman kon...@gmail.com wrote:

 I work on an experimental cluster of ES nodes running on Windows Server 
 machines. Once in a while we have a need to reboot machines. The initial 
 state - cluster is green and well balanced. One machine is gracefully taken 
 offline and then after necessary service is performed it comes back online. 
 All the hardware and file system content is intact. As soon as ES service 
 starts on that machine, it assumes that there is no usable data locally and 
 recovers as much data as it deems necessary for balancing from other nodes. 

 This behavior puzzles me, because most of the data shards stored on that 
 machine file system can be reused as they are. Cluster stores logs, so all 
 indices except those for the current day never ever change until they get 
 deleted. Can't ES node detect that it has perfect copies of some (actually 
 most) of the shards and instead of copying them over just mark them as up 
 to date? 

 I suspect I don't know about some step to enable this behavior and I'm 
 looking to enable it. Any advice? 

 Thank you!
 Konstantin





Re: priming data for a new node

2014-11-20 Thread Yves Dorfsman

So if a shard has been updated since the data copy, will it copy the entire 
shard, or just update it?

On Wednesday, 19 November 2014 23:34:01 UTC-7, Mark Walkom wrote:

 It doesn't copy everything, only what it needs to balance the shards.

 On 20 November 2014 17:20, Yves Dorfsman yv...@zioup.com wrote:

 When adding a new node to a cluster, is there a way to prevent it from having
 to copy all the data from the other nodes?

 We tried to copy the data on disk from an existing node (one that had all the
 data for the given indices), but it still copied everything. Is there a way to
 make it update what is new only?

 Thanks.

 --
 http://yves.zioup.com
 gpg: 4096R/32B0F416






upgrading from 0.90.7 to 1.4. Gotchas?

2014-11-19 Thread Yves Dorfsman
Are there any precautions to take before upgrading from 0.90 to 1.4?

Different data types?
Different API calls?
etc...

And, what is the best way to upgrade? Can we just add a node at the newer 
version and let it pull the data?

Thanks.

http://yves.zioup.com
gpg: 4096R/32B0F416 



priming data for a new node

2014-11-19 Thread Yves Dorfsman
When adding a new node to a cluster, is there a way to prevent it from having
to copy all the data from the other nodes?

We tried to copy the data on disk from an existing node (one that had all the
data for the given indices), but it still copied everything. Is there a way to
make it update what is new only?

Thanks.

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



Is it possible to isolate search queries to a single node

2014-03-10 Thread Yves Dorfsman
I have a job that makes heavy use of ES, to the point that it affects the 
cluster. Is it possible to:

  - add a replica
  - force the extra replica to a specific node
  - isolate some of the queries to that particular node?
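
For the last item, one option to look at is the search `preference`
parameter with the value `_only_node:<node id>`. A minimal sketch, assuming
a node reachable at localhost:9200; the index name, the node id and the
helper name are made up for illustration.

```python
from urllib.parse import urlencode


def search_url(base_url, index, node_id):
    """Search URL pinned to one node through the `preference` parameter."""
    query = urlencode({"preference": "_only_node:" + node_id})
    return "%s/%s/_search?%s" % (base_url.rstrip("/"), index, query)


# Hypothetical index "logs" and node id "abc123".
print(search_url("http://localhost:9200", "logs", "abc123"))
```

This only helps if the chosen node holds a copy of every shard being
searched, which is where the extra replica would come in.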

Thanks.



Can a replica be updated with the deltas only?

2014-03-10 Thread Yves Dorfsman
When I shut down a node that holds a replica while updates are happening on 
the rest of the cluster, then restart this node, it seems that the entire 
replica is copied to that node again.

Is there a way to make ES just update that node with the updates that 
happened while it was down?


Thanks.

