Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-24 Thread Konstantin Erman
I don't believe we can disable indexing. The ES cluster collects logs from 
a huge server farm, and those logs come in at a steady rate. If we disable 
indexing, we start dropping log events! Do I have that right?

On Monday, November 24, 2014 8:58:21 AM UTC-8, Ivan Brusic wrote:
>
> It used to be 2 concurrent streams. Has the default been upped in recent 
> versions? I agree, that number is awfully low. If you can disable indexing 
> during rolling restarts, those numbers can be much higher.
>
> -- 
> Ivan
>
> On Sun, Nov 23, 2014 at 5:48 PM, joerg...@gmail.com wrote:
>
>> The default indices recovery performance is limited by 3 concurrent 
>> streams and 20MB/sec. This is very slow on my machines. YMMV.
>>
>> Jörg
>>
>> On Sun, Nov 23, 2014 at 9:01 PM, Konstantin Erman wrote:
>>
>>> Advice to increase indices.recovery.concurrent_streams sounds
>>> suspiciously specific to me :-) What made you so confident that it is the
>>> bottleneck for recovery in most cases? And how
>>> cluster.routing.allocation.node_concurrent_recoveries should be set?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/fcbe921b-b381-4d92-93ed-81c6c32f0204%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-24 Thread Ivan Brusic
It used to be 2 concurrent streams. Has the default been upped in recent
versions? I agree, that number is awfully low. If you can disable indexing
during rolling restarts, those numbers can be much higher.

-- 
Ivan

On Sun, Nov 23, 2014 at 5:48 PM, joergpra...@gmail.com wrote:

> The default indices recovery performance is limited by 3 concurrent
> streams and 20MB/sec. This is very slow on my machines. YMMV.
>
> Jörg
>
> On Sun, Nov 23, 2014 at 9:01 PM, Konstantin Erman wrote:
>
>> Advice to increase indices.recovery.concurrent_streams sounds
>> suspiciously specific to me :-) What made you so confident that it is the
>> bottleneck for recovery in most cases? And how
>> cluster.routing.allocation.node_concurrent_recoveries should be set?


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-23 Thread joergpra...@gmail.com
By default, index recovery is limited to 3 concurrent streams and
20MB/sec. This is very slow on my machines. YMMV.
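
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes the 20MB/sec throttle applies to the node as a whole and is the only bottleneck, which is a simplification. The 500 GB figure is made up for illustration and does not come from anyone's cluster in this thread.

```python
MB = 1024 * 1024
GB = 1024 * MB


def recovery_seconds(shard_bytes: int, max_bytes_per_sec: int) -> float:
    """Lower bound on copy time if the throttle is the only limit."""
    return shard_bytes / max_bytes_per_sec


# Hypothetical node holding 500 GB of shard data (illustrative number only).
node_data = 500 * GB

# Default throttle mentioned above: 20MB/sec.
default_seconds = recovery_seconds(node_data, 20 * MB)
default_hours = default_seconds / 3600

# Same data with a hypothetical 100MB/sec throttle, for comparison.
faster_hours = recovery_seconds(node_data, 100 * MB) / 3600

print(f"20MB/sec:  {default_hours:.2f} hours")
print(f"100MB/sec: {faster_hours:.2f} hours")
```

Under those assumptions a full copy takes a bit over seven hours at the default throttle, so the limit alone could explain very long recoveries on large nodes.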

Jörg

On Sun, Nov 23, 2014 at 9:01 PM, Konstantin Erman  wrote:

> Advice to increase indices.recovery.concurrent_streams sounds
> suspiciously specific to me :-) What made you so confident that it is the
> bottleneck for recovery in most cases? And how 
> cluster.routing.allocation.node_concurrent_recoveries
> should be set?


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-23 Thread Konstantin Erman
The advice to increase indices.recovery.concurrent_streams sounds 
suspiciously specific to me :-) What makes you so confident that it is the 
bottleneck for recovery in most cases? And how should 
cluster.routing.allocation.node_concurrent_recoveries be set?

On Sunday, November 23, 2014 6:27:40 AM UTC-8, Jörg Prante wrote:
>
> FWIW with Lucene 5, segment and index checksums will arrive, and 
> Elasticsearch will stop retrieving shards if local ones match remote ones.
>
> In the meantime you can increase indices.recovery.concurrent_streams to 
> something like 16 or even higher, so the recovery takes less than a minute.
>
> Jörg
>
>
>
> On Sat, Nov 22, 2014 at 7:43 PM, Konstantin Erman wrote:
>
>> Yes, I have noticed that article right away, simply because I keep 
>> googling ES related questions every day :-)
>>
>> Unfortunately the only practical advice I could learn from that article 
>> is to use doc_values instead of field data and it does not really help with 
>> "full node rebuild after short down time" issue. 
>>
>> One thing I noticed while watching Windows Resource Monitor during those 
>> lengthy node rebuilds though is that rebuilding node actively reads 
>> its shards, so there is a chance that it tries to bring its local data back 
>> to life and not just copies everything over from replicas, but 
>> unfortunately performance wise reviving local shards takes as long if not 
>> longer than dumb copying everything over from replicas. 
>>
>>
>> On Saturday, November 22, 2014 8:35:59 AM UTC-8, Otis Gospodnetic wrote:
>>>
>>> Hi Konstantin,
>>>
>>> Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/
>>>
>>> Otis
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>>
>>>
>>> On Thursday, November 20, 2014 9:48:56 PM UTC-5, Konstantin Erman wrote:

>>>> I work on an experimental cluster of ES nodes running on Windows Server
>>>> machines. Once in a while we have a need to reboot machines. The initial
>>>> state - cluster is green and well balanced. One machine is
>>>> gracefully taken offline and then after necessary service is performed it
>>>> comes back online. All the hardware and file system content is intact. As
>>>> soon as ES service starts on that machine, it assumes that there is no
>>>> usable data locally and recovers as much data as it deems necessary for
>>>> balancing from other nodes.
>>>>
>>>> This behavior puzzles me, because most of the data shards stored on
>>>> that machine file system can be reused as they are. Cluster stores logs,
>>>> so all indices except those for the current day never ever change until
>>>> they get deleted. Can't ES node detect that it has perfect copies of some
>>>> (actually most) of the shards and instead of copying them over just mark
>>>> them as up to date?
>>>>
>>>> I suspect I don't know about some step to enable this behavior and I'm
>>>> looking to enable it. Any advice?
>>>>
>>>> Thank you!
>>>> Konstantin



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-23 Thread joergpra...@gmail.com
FWIW with Lucene 5, segment and index checksums will arrive, and
Elasticsearch will stop retrieving shards if local ones match remote ones.

In the meantime you can increase indices.recovery.concurrent_streams to
something like 16 or even higher, so the recovery takes less than a minute.
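
A minimal sketch of how that change could be sent to a running cluster, assuming an ES 1.x node with the cluster update settings API reachable over HTTP. The address `http://localhost:9200` and the `100mb` value are placeholders, and `indices.recovery.max_bytes_per_sec` is my reading of the 20MB/sec throttle rather than something named in this thread, so check the reference docs for your version before relying on it.

```python
import json
import urllib.request


def recovery_settings_body(concurrent_streams: int, max_bytes_per_sec: str) -> bytes:
    """Build a transient cluster-settings payload for the recovery limits."""
    settings = {
        "transient": {
            "indices.recovery.concurrent_streams": concurrent_streams,
            # Assumed setting name for the bandwidth throttle; verify for your version.
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }
    return json.dumps(settings).encode("utf-8")


def apply_settings(base_url: str, body: bytes) -> int:
    """PUT the payload to /_cluster/settings and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


body = recovery_settings_body(16, "100mb")
print(body.decode("utf-8"))

# Uncomment to send it to a real node (placeholder address):
# apply_settings("http://localhost:9200", body)
```

Transient settings are lost on a full cluster restart, which seems appropriate for a temporary bump during maintenance.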

Jörg



On Sat, Nov 22, 2014 at 7:43 PM, Konstantin Erman  wrote:

> Yes, I have noticed that article right away, simply because I keep
> googling ES related questions every day :-)
>
> Unfortunately the only practical advice I could learn from that article is
> to use doc_values instead of field data and it does not really help with
> "full node rebuild after short down time" issue.
>
> One thing I noticed while watching Windows Resource Monitor during those
> lengthy node rebuilds though is that rebuilding node actively reads
> its shards, so there is a chance that it tries to bring its local data back
> to life and not just copies everything over from replicas, but
> unfortunately performance wise reviving local shards takes as long if not
> longer than dumb copying everything over from replicas.
>
>
> On Saturday, November 22, 2014 8:35:59 AM UTC-8, Otis Gospodnetic wrote:
>>
>> Hi Konstantin,
>>
>> Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/
>>
>> Otis
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>> On Thursday, November 20, 2014 9:48:56 PM UTC-5, Konstantin Erman wrote:
>>>
>>> I work on an experimental cluster of ES nodes running on Windows Server
>>> machines. Once in a while we have a need to reboot machines. The initial
>>> state - cluster is green and well balanced. One machine is
>>> gracefully taken offline and then after necessary service is performed it
>>> comes back online. All the hardware and file system content is intact. As
>>> soon as ES service starts on that machine, it assumes that there is no
>>> usable data locally and recovers as much data as it deems necessary for
>>> balancing from other nodes.
>>>
>>> This behavior puzzles me, because most of the data shards stored on that
>>> machine file system can be reused as they are. Cluster stores logs, so all
>>> indices except those for the current day never ever change until they get
>>> deleted. Can't ES node detect that it has perfect copies of some (actually
>>> most) of the shards and instead of copying them over just mark them as up
>>> to date?
>>>
>>> I suspect I don't know about some step to enable this behavior and I'm
>>> looking to enable it. Any advice?
>>>
>>> Thank you!
>>> Konstantin
>>>


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-22 Thread Nikolas Everett
Tiny shards have more overhead and aren't going to score results as
accurately.
On Nov 22, 2014 2:04 PM, "Yves Dorfsman"  wrote:

> On 2014-11-22 09:35, Otis Gospodnetic wrote:
> > Hi Konstantin,
> >
> > Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/
> >
>
> Good writing! Thanks.
>
> I wonder if there's any drawback from cutting indices in smaller (tiny?)
> shards?
>
> My thinking is this: We don't really change data in our bigger indices, we
> just keep adding to them, so ultimately as we re-build node, they should
> all have the same version of the old shards, which should make re-start,
> and even re-build from backups much faster.
>
> --
> http://yves.zioup.com
> gpg: 4096R/32B0F416


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-22 Thread Yves Dorfsman
On 2014-11-22 09:35, Otis Gospodnetic wrote:
> Hi Konstantin,
> 
> Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/
> 

Good writing! Thanks.

I wonder if there's any drawback to cutting indices into smaller (tiny?) shards?

My thinking is this: we don't really change data in our bigger indices, we
just keep adding to them, so ultimately, as we rebuild a node, all nodes should
have the same version of the old shards, which should make a restart, and even
a rebuild from backups, much faster.

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-22 Thread Konstantin Erman
Yes, I noticed that article right away, simply because I keep googling 
ES-related questions every day :-)

Unfortunately, the only practical advice I could take from that article is 
to use doc_values instead of field data, and that does not really help with 
the "full node rebuild after short downtime" issue. 

One thing I noticed while watching Windows Resource Monitor during those 
lengthy node rebuilds, though, is that the rebuilding node actively reads 
its shards. So there is a chance that it tries to bring its local data back 
to life rather than just copying everything over from the replicas. 
Unfortunately, performance-wise, reviving local shards takes as long as, if 
not longer than, dumbly copying everything over. 

On Saturday, November 22, 2014 8:35:59 AM UTC-8, Otis Gospodnetic wrote:
>
> Hi Konstantin,
>
> Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Thursday, November 20, 2014 9:48:56 PM UTC-5, Konstantin Erman wrote:
>>
>> I work on an experimental cluster of ES nodes running on Windows Server 
>> machines. Once in a while we have a need to reboot machines. The initial 
>> state - cluster is green and well balanced. One machine is 
>> gracefully taken offline and then after necessary service is performed it 
>> comes back online. All the hardware and file system content is intact. As 
>> soon as ES service starts on that machine, it assumes that there is no 
>> usable data locally and recovers as much data as it deems necessary for 
>> balancing from other nodes. 
>>
>> This behavior puzzles me, because most of the data shards stored on that 
>> machine file system can be reused as they are. Cluster stores logs, so all 
>> indices except those for the current day never ever change until they get 
>> deleted. Can't ES node detect that it has perfect copies of some (actually 
>> most) of the shards and instead of copying them over just mark them as up 
>> to date? 
>>
>> I suspect I don't know about some step to enable this behavior and I'm 
>> looking to enable it. Any advice? 
>>
>> Thank you!
>> Konstantin
>>
>



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-22 Thread Otis Gospodnetic
Hi Konstantin,

Check out http://gibrown.com/2014/11/19/elasticsearch-the-broken-bits/

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On Thursday, November 20, 2014 9:48:56 PM UTC-5, Konstantin Erman wrote:
>
> I work on an experimental cluster of ES nodes running on Windows Server 
> machines. Once in a while we have a need to reboot machines. The initial 
> state - cluster is green and well balanced. One machine is gracefully taken 
> offline and then after necessary service is performed it comes back online. 
> All the hardware and file system content is intact. As soon as ES service 
> starts on that machine, it assumes that there is no usable data locally and 
> recovers as much data as it deems necessary for balancing from other nodes. 
>
> This behavior puzzles me, because most of the data shards stored on that 
> machine file system can be reused as they are. Cluster stores logs, so all 
> indices except those for the current day never ever change until they get 
> deleted. Can't ES node detect that it has perfect copies of some (actually 
> most) of the shards and instead of copying them over just mark them as up 
> to date? 
>
> I suspect I don't know about some step to enable this behavior and I'm 
> looking to enable it. Any advice? 
>
> Thank you!
> Konstantin
>



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-21 Thread Nikolas Everett
It's been true for as long as I've used Elasticsearch, which was .9X (can't
remember which).  Copying everything is pretty common because the index
operations are non-deterministic, so the files might not be the same at
all.  I pretty much assume I'm going to have to copy _almost_ everything
every time I restart.  It turns my rolling restarts into two-day-long
affairs.  I know it's being worked on.

Nik

On Fri, Nov 21, 2014 at 10:51 AM, Yves Dorfsman  wrote:

> Thanks Nicolas.
>
> Is this true on versions 0.9, or only on > 1?
> I've had nodes die and restart, and they did copy everything!
>
> On 2014-11-20 22:02, Nikolas Everett wrote:
> > The thing is that this is a disk level operation. It pretty much rsyncs
> > the files from the current master shard to the node when it comes back
> > online. This would be OK if the replica shards matched the master but
> > that is only normally the case if the shard was moved to the node after
> > it was mostly complete and then you've had only a few writes. Normally
> > shards don't match each other because the way the index is maintained
> > is nondeterministic.
> >
> > The translog replay is only used as a catch up after the rsync-like step.
> >
> > This is something that is being worked on. Its certainly my biggest
> > complaint about elasticsearch but I'm confident that it'll get better.
> >
> > Nik
> >
> > On Nov 20, 2014 11:11 PM, "Mark Walkom" wrote:
> >
> > It will enter recovery where it syncs at the segment level from the
> > current primary, then the translog gets shipped over and (re)played,
> > which brings it all up to date.
> >
> > On 21 November 2014 14:51, Yves Dorfsman wrote:
> >
> > If you do disable allocation before you reboot a node and a client
> > writes to a shard that had a replica on that node, does the entire
> > replica gets copied when the node come up? Or does it get just updated?
> >
> > On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:
> >
> > You should disable allocation before you reboot, that will save a
> > lot of shard shuffling -
> > http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
> >
> > On 21 November 2014 13:48, Konstantin Erman <kon...@gmail.com> wrote:
> >
> > I work on an experimental cluster of ES nodes running on Windows
> > Server machines. Once in a while we have a need to reboot machines.
> > The initial state - cluster is green and well balanced. One machine is
> > gracefully taken offline and then after necessary service is performed
> > it comes back online. All the hardware and file system content is
> > intact. As soon as ES service starts on that machine, it assumes that
> > there is no usable data locally and recovers as much data as it deems
> > necessary for balancing from other nodes.
> >
> > This behavior puzzles me, because most of the data shards stored on
> > that machine file system can be reused as they are. Cluster stores
> > logs, so all indices except those for the current day never ever
> > change until they get deleted. Can't ES node detect that it has
> > perfect copies of some (actually most) of the shards and instead of
> > copying them over just mark them as up to date?
> >
> > I suspect I don't know about some step to enable this behavior and
> > I'm looking to enable it. Any advice?
> >
> > Thank you!
> > Konstantin

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-21 Thread Yves Dorfsman
Thanks, Nikolas.

Is this true on version 0.9, or only on versions > 1?
I've had nodes die and restart, and they did copy everything!

On 2014-11-20 22:02, Nikolas Everett wrote:
> The thing is that this is a disk level operation. It pretty much rsyncs the
> files from the current master shard to the node when it comes back online.
> This would be OK if the replica shards matched the master but that is only
> normally the case if the shard was moved to the node after it was mostly
> complete and then you've had only a few writes. Normally shards don't match
> each other because the way the index is maintained is nondeterministic.
> 
> The translog replay is only used as a catch up after the rsync-like step.
> 
> This is something that is being worked on. Its certainly my biggest complaint
> about elasticsearch but I'm confident that it'll get better.
> 
> Nik
> 
> On Nov 20, 2014 11:11 PM, "Mark Walkom" wrote:
> 
> It will enter recovery where it syncs at the segment level from the
> current primary, then the translog gets shipped over and (re)played, which
> brings it all up to date.
> 
> On 21 November 2014 14:51, Yves Dorfsman wrote:
> 
> 
> If you do disable allocation before you reboot a node and a client
> writes to a shard that had a replica on that node, does the entire
> replica gets copied when the node come up? Or does it get just 
> updated?
> 
> On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:
> 
> You should disable allocation before you reboot, that will save a
> lot of shard shuffling -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
> 
> On 21 November 2014 13:48, Konstantin Erman wrote:
> 
> I work on an experimental cluster of ES nodes running on
> Windows Server machines. Once in a while we have a need to
> reboot machines. The initial state - cluster is green and well
> balanced. One machine is gracefully taken offline and then
> after necessary service is performed it comes back online. All
> the hardware and file system content is intact. As soon as ES
> service starts on that machine, it assumes that there is no
> usable data locally and recovers as much data as it deems
> necessary for balancing from other nodes. 
> 
> This behavior puzzles me, because most of the data shards
> stored on that machine file system can be reused as they are.
> Cluster stores logs, so all indices except those for
> the current day never ever change until they get deleted.
> Can't ES node detect that it has perfect copies of some
> (actually most) of the shards and instead of copying them over
> just mark them as up to date? 
> 
> I suspect I don't know about some step to enable this behavior
> and I'm looking to enable it. Any advice? 
> 
> Thank you!
> Konstantin
> 

Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-21 Thread Ivan Brusic
Disabling allocation helps, but it does not solve the problem completely.
As for Nik, this is one of my complaints (although not my primary one). :)

I found that recovery gets easier as a rolling restart progresses. The first
few servers always rebalance; the last ones do not.
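
For anyone following along, the "disable allocation, restart, re-enable" routine could be sketched roughly as below. This assumes ES 1.x, where the `cluster.routing.allocation.enable` setting accepts `none` and `all`. The node names and the "wait for green" step are illustrative only, and the sketch just builds the plan instead of talking to a cluster.

```python
import json
from typing import List, Tuple


def allocation_body(enabled: bool) -> str:
    """JSON payload for PUT /_cluster/settings toggling shard allocation."""
    value = "all" if enabled else "none"
    return json.dumps({"transient": {"cluster.routing.allocation.enable": value}})


def rolling_restart_plan(nodes: List[str]) -> List[Tuple[str, str]]:
    """Return the ordered (action, detail) steps for restarting nodes one by one."""
    steps = []
    for node in nodes:
        steps.append(("PUT /_cluster/settings", allocation_body(False)))
        steps.append(("restart node", node))
        steps.append(("PUT /_cluster/settings", allocation_body(True)))
        # Let recovery finish before touching the next node.
        steps.append(("wait for green", node))
    return steps


# Hypothetical node names, for illustration only.
plan = rolling_restart_plan(["es-node-1", "es-node-2"])
for action, detail in plan:
    print(action, detail)
```

As said above, this avoids the shard shuffling while a node is down, but it does not by itself stop the restarted node from copying segment files that no longer match the primary.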

-- 
Ivan

On Thu, Nov 20, 2014 at 9:51 PM, Mark Walkom  wrote:

> You should disable allocation before you reboot, that will save a lot of
> shard shuffling -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
>
> On 21 November 2014 13:48, Konstantin Erman  wrote:
>
>> I work on an experimental cluster of ES nodes running on Windows Server
>> machines. Once in a while we have a need to reboot machines. The initial
>> state - cluster is green and well balanced. One machine is gracefully taken
>> offline and then after necessary service is performed it comes back online.
>> All the hardware and file system content is intact. As soon as ES service
>> starts on that machine, it assumes that there is no usable data locally and
>> recovers as much data as it deems necessary for balancing from other nodes.
>>
>> This behavior puzzles me, because most of the data shards stored on that
>> machine file system can be reused as they are. Cluster stores logs, so all
>> indices except those for the current day never ever change until they get
>> deleted. Can't ES node detect that it has perfect copies of some (actually
>> most) of the shards and instead of copying them over just mark them as up
>> to date?
>>
>> I suspect I don't know about some step to enable this behavior and I'm
>> looking to enable it. Any advice?
>>
>> Thank you!
>> Konstantin
>>


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Nikolas Everett
The thing is that this is a disk-level operation. When the node comes back
online, it pretty much rsyncs the files from the current primary shard to
the node. This would be OK if the replica shards matched the primary, but
that is normally only the case if the shard was moved to the node after it
was mostly complete and has had only a few writes since. Normally shard
copies don't match each other because the way the index is maintained is
nondeterministic.

The translog replay is only used as a catch up after the rsync-like step.
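
To make the point concrete, here is a toy sketch in Python of an rsync-like
file comparison. It is not Elasticsearch's actual recovery code; the function
name, the file names and the checksums are all made up for illustration. It
only shows why two copies holding the same documents can still share almost
no files once each copy has merged its segments on its own.

```python
def files_to_copy(primary_files, replica_files):
    """Return the names of files the recovering copy must fetch.

    Both arguments map a file name to a checksum. A local file is reused
    only if a file with the same name and the same checksum exists on the
    primary; everything else has to be copied over.
    """
    return sorted(
        name
        for name, checksum in primary_files.items()
        if replica_files.get(name) != checksum
    )


# Hypothetical file listings. Same documents, but each copy merged its
# segments independently, so names and checksums differ.
primary = {"_a.cfs": "c1", "_b.cfs": "c2", "segments_5": "c3"}
replica = {"_x.cfs": "d1", "segments_4": "d2"}
print(files_to_copy(primary, replica))

# A copy that was moved over wholesale and has seen few writes since
# still shares most files with the primary.
relocated = {"_a.cfs": "c1", "_b.cfs": "c2", "segments_4": "d2"}
print(files_to_copy(primary, relocated))
```

In the first case every file is copied; in the second only the one file that
changed is.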

This is something that is being worked on. It's certainly my biggest
complaint about Elasticsearch, but I'm confident that it'll get better.

Nik
On Nov 20, 2014 11:11 PM, "Mark Walkom"  wrote:

> It will enter recovery where it syncs at the segment level from the
> current primary, then the translog gets shipped over and (re)played, which
> brings it all up to date.
>
> On 21 November 2014 14:51, Yves Dorfsman  wrote:
>
>>
>> If you do disable allocation before you reboot a node and a client writes
>> to a shard that had a replica on that node, does the entire replica get
>> copied when the node comes up? Or does it just get updated?
>>
>> On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:
>>>
>>> You should disable allocation before you reboot; that will save a lot of
>>> shard shuffling -
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
>>>
>>> On 21 November 2014 13:48, Konstantin Erman  wrote:
>>>
>>>> I work on an experimental cluster of ES nodes running on Windows Server
>>>> machines. Once in a while we need to reboot machines. Initially the
>>>> cluster is green and well balanced. One machine is gracefully taken
>>>> offline and then, after the necessary service is performed, it comes
>>>> back online. All the hardware and file system content is intact. As
>>>> soon as the ES service starts on that machine, it assumes that there is
>>>> no usable data locally and recovers as much data as it deems necessary
>>>> for balancing from other nodes.
>>>>
>>>> This behavior puzzles me, because most of the data shards stored on
>>>> that machine's file system can be reused as they are. The cluster stores
>>>> logs, so all indices except those for the current day never change
>>>> until they get deleted. Can't an ES node detect that it has perfect
>>>> copies of some (actually most) of the shards and, instead of copying
>>>> them over, just mark them as up to date?
>>>>
>>>> I suspect there is some step I don't know about that enables this
>>>> behavior, and I'd like to enable it. Any advice?
>>>>
>>>> Thank you!
>>>> Konstantin
>>>>


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Mark Walkom
It will enter recovery where it syncs at the segment level from the current
primary, then the translog gets shipped over and (re)played, which brings
it all up to date.
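
As a rough illustration of the second phase, here is a toy Python model of a
translog replay. It assumes the file sync has already finished and that the
translog is simply an ordered list of index and delete operations; the
function name and the data layout are invented for this sketch and are not
how Elasticsearch stores things internally.

```python
def replay_translog(docs, translog):
    """Apply buffered operations, in order, to a copy of the synced docs.

    `docs` maps a document id to its source. `translog` is a list of
    (operation, doc_id, source) tuples recorded while the files were
    being copied.
    """
    result = dict(docs)
    for op, doc_id, source in translog:
        if op == "index":
            result[doc_id] = source
        elif op == "delete":
            result.pop(doc_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# State of the copy right after the segment-level sync (made-up data).
synced = {"1": {"msg": "boot"}, "2": {"msg": "login"}}

# Writes that reached the primary while the sync was running.
translog = [
    ("index", "3", {"msg": "logout"}),
    ("delete", "1", None),
    ("index", "2", {"msg": "login ok"}),
]

recovered = replay_translog(synced, translog)
print(recovered)
```

Only the operations in the translog are applied in this step; the bulk of the
data arrives in the earlier file-level sync.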

On 21 November 2014 14:51, Yves Dorfsman  wrote:

>
> If you do disable allocation before you reboot a node and a client writes
> to a shard that had a replica on that node, does the entire replica get
> copied when the node comes up? Or does it just get updated?
>
> On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:
>>
>> You should disable allocation before you reboot; that will save a lot of
>> shard shuffling -
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
>>
>> On 21 November 2014 13:48, Konstantin Erman  wrote:
>>
>>> I work on an experimental cluster of ES nodes running on Windows Server
>>> machines. Once in a while we need to reboot machines. Initially the
>>> cluster is green and well balanced. One machine is gracefully taken
>>> offline and then, after the necessary service is performed, it comes
>>> back online. All the hardware and file system content is intact. As soon
>>> as the ES service starts on that machine, it assumes that there is no
>>> usable data locally and recovers as much data as it deems necessary for
>>> balancing from other nodes.
>>>
>>> This behavior puzzles me, because most of the data shards stored on that
>>> machine's file system can be reused as they are. The cluster stores logs,
>>> so all indices except those for the current day never change until they
>>> get deleted. Can't an ES node detect that it has perfect copies of some
>>> (actually most) of the shards and, instead of copying them over, just
>>> mark them as up to date?
>>>
>>> I suspect there is some step I don't know about that enables this
>>> behavior, and I'd like to enable it. Any advice?
>>>
>>> Thank you!
>>> Konstantin
>>>


Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Yves Dorfsman

If you do disable allocation before you reboot a node and a client writes
to a shard that had a replica on that node, does the entire replica get
copied when the node comes up? Or does it just get updated?

On Thursday, 20 November 2014 19:52:26 UTC-7, Mark Walkom wrote:
>
> You should disable allocation before you reboot; that will save a lot of
> shard shuffling -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
>
> On 21 November 2014 13:48, Konstantin Erman wrote:
>
>> I work on an experimental cluster of ES nodes running on Windows Server
>> machines. Once in a while we need to reboot machines. Initially the
>> cluster is green and well balanced. One machine is gracefully taken
>> offline and then, after the necessary service is performed, it comes back
>> online. All the hardware and file system content is intact. As soon as
>> the ES service starts on that machine, it assumes that there is no usable
>> data locally and recovers as much data as it deems necessary for
>> balancing from other nodes.
>>
>> This behavior puzzles me, because most of the data shards stored on that
>> machine's file system can be reused as they are. The cluster stores logs,
>> so all indices except those for the current day never change until they
>> get deleted. Can't an ES node detect that it has perfect copies of some
>> (actually most) of the shards and, instead of copying them over, just mark
>> them as up to date?
>>
>> I suspect there is some step I don't know about that enables this
>> behavior, and I'd like to enable it. Any advice?
>>
>> Thank you!
>> Konstantin
>>
>>



Re: Why ES node starts recovering all the data from other nodes after reboot?

2014-11-20 Thread Mark Walkom
You should disable allocation before you reboot; that will save a lot of
shard shuffling -
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-upgrade.html#rolling-upgrades
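
For reference, a minimal Python sketch of what "disable allocation" can look
like as a request to the cluster settings API. It assumes the transient
setting cluster.routing.allocation.enable described on the linked page for
1.x releases and a node listening on http://localhost:9200; both are
assumptions, so check the documentation for your version first. The helper
name is made up, and the sketch only builds the requests without sending
them.

```python
import json
import urllib.request

# Setting name as recalled from the 1.x rolling-upgrade docs; verify it
# against the documentation for the version you run.
SETTING = "cluster.routing.allocation.enable"


def allocation_request(base_url, value):
    """Build (but do not send) a PUT request to the cluster settings API."""
    if value not in ("all", "none"):
        raise ValueError("this sketch only handles 'all' and 'none'")
    body = json.dumps({"transient": {SETTING: value}}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Before shutting the node down: stop the cluster from moving shards.
before_reboot = allocation_request("http://localhost:9200", "none")

# After the node has rejoined: let allocation resume.
after_reboot = allocation_request("http://localhost:9200", "all")

# To actually apply a change, send it, for example:
#   urllib.request.urlopen(before_reboot)
print(before_reboot.get_method(), before_reboot.full_url)
```

Re-enabling allocation once the node is back matters as much as disabling it;
otherwise replicas stay unassigned.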

On 21 November 2014 13:48, Konstantin Erman  wrote:

> I work on an experimental cluster of ES nodes running on Windows Server
> machines. Once in a while we need to reboot machines. Initially the
> cluster is green and well balanced. One machine is gracefully taken
> offline and then, after the necessary service is performed, it comes back
> online. All the hardware and file system content is intact. As soon as the
> ES service starts on that machine, it assumes that there is no usable data
> locally and recovers as much data as it deems necessary for balancing from
> other nodes.
>
> This behavior puzzles me, because most of the data shards stored on that
> machine's file system can be reused as they are. The cluster stores logs,
> so all indices except those for the current day never change until they
> get deleted. Can't an ES node detect that it has perfect copies of some
> (actually most) of the shards and, instead of copying them over, just mark
> them as up to date?
>
> I suspect there is some step I don't know about that enables this
> behavior, and I'd like to enable it. Any advice?
>
> Thank you!
> Konstantin
>