Re: does snapshot restore lead to a memory leak?

2014-07-03 Thread Igor Motov
So, you are running out of threads, not memory. Are you re-registering the 
repository every time you restore from it? If you do, you might be running 
into this issue: https://github.com/elasticsearch/elasticsearch/issues/6181
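
If that's the case, a possible workaround is to register the repository only 
when the cluster doesn't already know about it, instead of re-issuing the PUT 
before every restore. A minimal sketch, with a made-up repository name and 
bucket (adjust the settings to your own cloud-aws configuration):

    # Register the S3 repository only if it isn't already present.
    if ! curl -sf localhost:9200/_snapshot/my_s3_repo > /dev/null; then
      curl -XPUT localhost:9200/_snapshot/my_s3_repo -d '{
        "type": "s3",
        "settings": { "bucket": "my-snapshot-bucket" }
      }'
    fi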

On Thursday, July 3, 2014 2:06:38 PM UTC-4, JoeZ99 wrote:

 Igor.
 I'm posting a PDF document with some graphs I think are quite enlightening. 
 The JVM threads graph is particularly interesting. 
 The times are UTC-4, and the period when the JVM thread count grows is when 
 most of the restore processes have been taking place.
 Igor, I will send you the reports you asked for by email, since they 
 contain filesystem data. Hope you don't mind. 

 The graphs contain data from two Elasticsearch clusters. ES1 is the one 
 we've been talking about in this thread. ES4 is a cluster devoted to two 
 indices, not very big but with a high search demand.


 txs!!!

 On Monday, June 30, 2014 8:53:10 AM UTC-4, JoeZ99 wrote:

 We have a one-machine cluster with about 1k indices. It used to work 
 flawlessly (albeit under a high load, of course),

 but since we started to use the snapshot-restore feature heavily, it gets 
 its memory exhausted within 7 days of use. The cluster performs about 
 700 restore operations during the day. Maybe there are some memory 
 considerations when using the restore feature???





Re: does snapshot restore lead to a memory leak?

2014-07-03 Thread José de Zárate
Precisely!!! I re-issue the repository PUT command every time I do the
restore. I know it's not the smartest thing in the world, but I wanted to
make sure the repository would always be available without having to worry
about whether the Elasticsearch cluster was newly created or not.

I'll look into that.


On Thu, Jul 3, 2014 at 2:17 PM, Igor Motov imo...@gmail.com wrote:

 So, you are running out of threads, not memory. Are you re-registering the
 repository every time you restore from it? If you do, you might be running
 into this issue: https://github.com/elasticsearch/elasticsearch/issues/6181


 On Thursday, July 3, 2014 2:06:38 PM UTC-4, JoeZ99 wrote:

 Igor.
 I'm posting a PDF document with some graphs I think are quite
 enlightening. The JVM threads graph is particularly interesting.
 The times are UTC-4, and the period when the JVM thread count grows is when
 most of the restore processes have been taking place.
 Igor, I will send you the reports you asked for by email, since they
 contain filesystem data. Hope you don't mind.

 The graphs contain data from two Elasticsearch clusters. ES1 is the one
 we've been talking about in this thread. ES4 is a cluster devoted to two
 indices, not very big but with a high search demand.


 txs!!!

 On Monday, June 30, 2014 8:53:10 AM UTC-4, JoeZ99 wrote:

 We have a one-machine cluster with about 1k indices. It used to work
 flawlessly (albeit under a high load, of course),

 but since we started to use the snapshot-restore feature heavily, it gets
 its memory exhausted within 7 days of use. The cluster performs about
 700 restore operations during the day. Maybe there are some memory
 considerations when using the restore feature???




Re: does snapshot restore lead to a memory leak?

2014-07-02 Thread Igor Motov
So, your search-only machines are running out of memory, while your 
index-only machines are doing fine. Did I understand you correctly? Could 
you send me nodes stats (curl localhost:9200/_nodes/stats?pretty) from 
the machine that runs out of memory? Please run the stats a few times, at 
1-hour intervals; I would like to see how memory consumption is increasing 
over time. Please also run nodes info once (curl localhost:9200/_nodes) 
and post the results here (or send them to me by email). Thanks!
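
In case it helps, a minimal sketch of collecting that on the affected host 
(the file names and the hourly loop are just an example, not a required format):

    # Node info only needs to be captured once.
    curl -s 'localhost:9200/_nodes?pretty' > nodes_info.json

    # Capture node stats once per hour so the memory growth can be compared over time.
    while true; do
      curl -s 'localhost:9200/_nodes/stats?pretty' > "nodes_stats_$(date +%Y%m%d_%H%M).json"
      sleep 3600
    done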

On Wednesday, July 2, 2014 10:15:46 AM UTC-4, JoeZ99 wrote:

 Hey, Igor, thanks for answering! And sorry for the delay; I didn't catch the 
 update.

 Let me explain:

 - we have one cluster of one machine which is only meant for serving 
 search requests. The goal is not to index anything to it. It contains 1.7k 
 indices, give or take. 
 - every day, those 1.7k indices are reindexed and snapshotted in pairs 
 to an S3 repository (producing 850 snapshots). 
 - every day, the read-only cluster from the first point restores 
 those 850 snapshots from that same S3 repository to update its 1.7k 
 indices (a rough sketch of these calls is below). 
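
 (For illustration only: the repository, snapshot, and index names below are 
 made up, not our real ones; this is just a rough sketch of what one 
 snapshot/restore pair of calls looks like.)

     # On an indexing machine: snapshot a pair of freshly reindexed indices into the S3 repository.
     curl -XPUT 'localhost:9200/_snapshot/s3_repo/pair_001?wait_for_completion=true' -d '{
       "indices": "index_a,index_b",
       "include_global_state": false
     }'

     # On the search-only cluster: close the two indices, then restore the snapshot over them.
     curl -XPOST 'localhost:9200/index_a,index_b/_close'
     curl -XPOST 'localhost:9200/_snapshot/s3_repo/pair_001/_restore?wait_for_completion=true'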

 It works like a real charm. Load has dropped dramatically, and we can set up 
 a farm of temporary machines to do the indexing duties. 

 But memory consumption never stops growing.

 We don't get any out-of-memory error or anything. In fact, there is 
 nothing in the logs that shows any error, but after a week or a few days 
 the host has its memory almost exhausted and Elasticsearch is not 
 responding. The memory consumption is, of course, way beyond the HEAP_SIZE. 
 We have to restart it, and when we do we get the following error:

 java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
     at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
     at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
     at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
     at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
     at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
     at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:158)
     at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:106)
     at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:98)
     at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.innerFinishHim(TransportSearchQueryAndFetchAction.java:94)
     at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.moveToSecondPhase(TransportSearchQueryAndFetchAction.java:77)
     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.innerMoveToSecondPhase(TransportSearchTypeAction.java:425)
     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:243)
     at org.elasticsearch.action.search. ...


Re: does snapshot restore lead to a memory leak?

2014-07-02 Thread JoeZ99
Igor.
Yes, that's right. My index-only machines are machines that are booted just 
for the indexing-snapshotting task. Once there are no more tasks in the 
queue, those machines are terminated. They only handle a few indices each 
time (their only purpose is to snapshot).

I will do as you say. I guess I'd better wait for the timeframe in which 
most of the restores occur, because that's when the memory consumption 
grows the most, so expect those postings in 5 or 6 hours. 

On Wednesday, July 2, 2014 10:29:53 AM UTC-4, Igor Motov wrote:

 So, your search-only machines are running out of memory, while your 
 index-only machines are doing fine. Did I understand you correctly? Could 
 you send me nodes stats (curl localhost:9200/_nodes/stats?pretty) from 
 the machine that runs out of memory? Please run the stats a few times, at 
 1-hour intervals; I would like to see how memory consumption is increasing 
 over time. Please also run nodes info once (curl localhost:9200/_nodes) 
 and post the results here (or send them to me by email). Thanks!

 On Wednesday, July 2, 2014 10:15:46 AM UTC-4, JoeZ99 wrote:

 Hey, Igor, thanks for answering! And sorry for the delay; I didn't catch 
 the update.

 Let me explain:

 - we have one cluster of one machine which is only meant for serving 
 search requests. The goal is not to index anything to it. It contains 1.7k 
 indices, give or take. 
 - every day, those 1.7k indices are reindexed and snapshotted in pairs 
 to an S3 repository (producing 850 snapshots). 
 - every day, the read-only cluster from the first point 
 restores those 850 snapshots from that same S3 repository to update its 
 1.7k indices. 

 It works like a real charm. Load has dropped dramatically, and we can set up 
 a farm of temporary machines to do the indexing duties. 

 But memory consumption never stops growing.

 We don't get any out-of-memory error or anything. In fact, there is 
 nothing in the logs that shows any error, but after a week or a few days 
 the host has its memory almost exhausted and Elasticsearch is not 
 responding. The memory consumption is, of course, way beyond the HEAP_SIZE. 
 We have to restart it, and when we do we get the following error:

 java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
     at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
     at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
     at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
     at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
     at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
     at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:158)
     at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:106)
     at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:98)
     at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.innerFinishHim(TransportSearchQueryAndFetchAction.java:94)
  

Re: does snapshot restore lead to a memory leak?

2014-07-02 Thread joergpra...@gmail.com
This memory issue report might be related

https://groups.google.com/forum/#!topic/elasticsearch/EH76o1CIeQQ

Jörg


On Wed, Jul 2, 2014 at 5:34 PM, JoeZ99 jzar...@gmail.com wrote:

 Igor.
 Yes, that's right. My index-only machines are machines that are booted just
 for the indexing-snapshotting task. Once there are no more tasks in the
 queue, those machines are terminated. They only handle a few indices each
 time (their only purpose is to snapshot).

 I will do as you say. I guess I'd better wait for the timeframe in which
 most of the restores occur, because that's when the memory consumption
 grows the most, so expect those postings in 5 or 6 hours.


 On Wednesday, July 2, 2014 10:29:53 AM UTC-4, Igor Motov wrote:

 So, your search-only machines are running out of memory, while your
 index-only machines are doing fine. Did I understand you correctly? Could
 you send me nodes stats (curl localhost:9200/_nodes/stats?pretty) from
 the machine that runs out of memory? Please run the stats a few times, at
 1-hour intervals; I would like to see how memory consumption is increasing
 over time. Please also run nodes info once (curl localhost:9200/_nodes)
 and post the results here (or send them to me by email). Thanks!

 On Wednesday, July 2, 2014 10:15:46 AM UTC-4, JoeZ99 wrote:

 Hey, Igor, thanks for answering! And sorry for the delay; I didn't catch
 the update.

 Let me explain:

 - we have one cluster of one machine which is only meant for serving
 search requests. The goal is not to index anything to it. It contains 1.7k
 indices, give or take.
 - every day, those 1.7k indices are reindexed and snapshotted in
 pairs to an S3 repository (producing 850 snapshots).
 - every day, the read-only cluster from the first point
 restores those 850 snapshots from that same S3 repository to update its
 1.7k indices.

 It works like a real charm. Load has dropped dramatically, and we can
 set up a farm of temporary machines to do the indexing duties.

 But memory consumption never stops growing.

 We don't get any out-of-memory error or anything. In fact, there is
 nothing in the logs that shows any error, but after a week or a few days
 the host has its memory almost exhausted and Elasticsearch is not
 responding. The memory consumption is, of course, way beyond the HEAP_SIZE.
 We have to restart it, and when we do we get the following error:

 java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
     at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
     at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
     at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
     at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
     at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
     at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
     at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
     at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
     at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
     at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:158)
     at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:106)
     at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:98)

does snapshot restore lead to a memory leak?

2014-06-30 Thread José de Zárate
We have a one-machine cluster with about 1k indices. It used to work
flawlessly (albeit under a high load, of course),

but since we started to use the snapshot-restore feature heavily, it gets
its memory exhausted within 7 days of use. The cluster performs about
700 restore operations during the day. Maybe there are some memory
considerations when using the restore feature???




Re: does snapshot restore lead to a memory leak?

2014-06-30 Thread Igor Motov
Just to make sure I got it right, you really meant 700 restores (not just 
700 snapshots), correct? What type of repository are you using? Could you 
add a bit more detail about your use case?

On Monday, June 30, 2014 8:53:10 AM UTC-4, JoeZ99 wrote:

 We have a one-machine cluster with about 1k indices. It used to work 
 flawlessly (albeit under a high load, of course),

 but since we started to use the snapshot-restore feature heavily, it gets 
 its memory exhausted within 7 days of use. The cluster performs about 
 700 restore operations during the day. Maybe there are some memory 
 considerations when using the restore feature???

