Re: does snapshot restore lead to a memory leak?
So you are running out of threads, not memory. Are you re-registering the repository every time you restore from it? If so, you might be running into this issue: https://github.com/elasticsearch/elasticsearch/issues/6181

On Thursday, July 3, 2014 2:06:38 PM UTC-4, JoeZ99 wrote: Igor, I'm posting a PDF document with some graphs I think are quite enlightening. The JVM threads graph is particularly interesting. The times are UTC-4, and the period when the JVM thread count grows is when most of the restore processes have been taking place. I will send you the reports you asked for by email, since they contain filesystem data; hope you don't mind. The graphs contain data from two Elasticsearch clusters: ES1 is the one we've been talking about in this thread; ES4 is a cluster devoted to two indices, not very big but with high search demand. Thanks!
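To confirm the thread diagnosis, the JVM thread count in node stats is the number to watch. A quick sketch, assuming the 1.x-era nodes stats endpoint on the default port (the interval is arbitrary):

    # Watch the JVM thread count over time; a steadily climbing
    # "count" points at leaked threads rather than leaked heap.
    while true; do
      date
      curl -s 'localhost:9200/_nodes/stats/jvm?pretty' | grep -A 2 '"threads"'
      sleep 600
    done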
Re: does snapshot restore lead to a memory leak?
Precisely!!! I re-issue the repository PUT command every time I do the restore. I know it's not the smartest thing in the world, but I wanted to make sure the repository would always be available, without worrying about whether the Elasticsearch cluster was newly created or not. I'll look into that.
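For what it's worth, one way to keep the repository always available without re-registering it on every restore is to check for it first and only PUT when it is missing. A minimal sketch (the repository name and settings here are made up):

    # Register the repository only if the cluster doesn't know it yet;
    # GET on a missing repository returns 404, so curl -f fails and
    # the PUT branch runs.
    if ! curl -sf 'localhost:9200/_snapshot/my_s3_repo' > /dev/null; then
      curl -XPUT 'localhost:9200/_snapshot/my_s3_repo' -d '{
        "type": "s3",
        "settings": { "bucket": "my-snapshot-bucket" }
      }'
    fi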
Re: does snapshot restore lead to a memory leak?
So, your search-only machine is running out of memory, while your index-only machines are doing fine. Did I understand you correctly? Could you send me node stats (curl localhost:9200/_nodes/stats?pretty) from the machine that runs out of memory? Please run the stats a few times at one-hour intervals; I would like to see how memory consumption increases over time. Please also run nodes info once (curl localhost:9200/_nodes) and post the results here (or send them to me by email). Thanks!

On Wednesday, July 2, 2014 10:15:46 AM UTC-4, JoeZ99 wrote: Hey, Igor, thanks for answering! And sorry for the delay; I didn't catch the update. To explain:

- We have a one-machine cluster which is only meant to serve search requests; the goal is not to index anything on it. It contains 1.7k indices, give or take.
- Every day, those 1.7k indices are reindexed and snapshotted in pairs to an S3 repository (producing 850 snapshots).
- Every day, the read-only cluster from the first point restores those 850 snapshots from that same S3 repository to update its 1.7k indices.

It works like a real charm. Load has dropped dramatically, and we can set up a farm of temporary machines to do the indexing duties. But memory consumption never stops growing. We don't get any out-of-memory error; in fact, there is nothing in the logs that shows any error at all, but after a few days to a week the host has its memory almost exhausted and Elasticsearch stops responding. The memory consumption is, of course, way beyond HEAP_SIZE. We have to restart it, and when we do we get the following error:

java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
    at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
    at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:781)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:725)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
    at org.elasticsearch.common.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
    at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
    at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
    at org.elasticsearch.http.netty.NettyHttpChannel.sendResponse(NettyHttpChannel.java:158)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:106)
    at org.elasticsearch.rest.action.search.RestSearchAction$1.onResponse(RestSearchAction.java:98)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.innerFinishHim(TransportSearchQueryAndFetchAction.java:94)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.moveToSecondPhase(TransportSearchQueryAndFetchAction.java:77)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.innerMoveToSecondPhase(TransportSearchTypeAction.java:425)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:243)
    ...
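A small sketch of the collection being asked for above, assuming the default localhost:9200 endpoint (the file names are illustrative):

    # Capture node stats a few times at one-hour intervals,
    # then node info once, as requested.
    for i in 1 2 3 4; do
      curl -s 'localhost:9200/_nodes/stats?pretty' > "nodes_stats_$(date +%Y%m%d_%H%M).json"
      sleep 3600
    done
    curl -s 'localhost:9200/_nodes?pretty' > nodes_info.json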
Re: does snapshot restore lead to a memory leak?
Igor, yes, that's right. My index-only machines are booted just for the indexing-snapshotting task; once there are no more tasks in the queue, those machines are terminated. They only handle a few indices each time (their only purpose is to snapshot). I will do as you say. I guess I'd better wait for the timeframe in which most of the restores occur, because that's when memory consumption grows the most, so expect those postings in 5 or 6 hours.
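As a sketch of the snapshot side of this cycle (repository, snapshot, and index names invented for illustration), each indexing machine would issue something like:

    # Snapshot two freshly reindexed indices into a single snapshot;
    # wait_for_completion=true blocks until the snapshot finishes.
    curl -XPUT 'localhost:9200/_snapshot/my_s3_repo/pair_0001?wait_for_completion=true' -d '{
      "indices": "index_a,index_b"
    }'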
Re: does snapshot restore lead to a memory leak?
This memory issue report might be related: https://groups.google.com/forum/#!topic/elasticsearch/EH76o1CIeQQ

Jörg
does snapshot restore lead to a memory leak?
We have a one-machine cluster with about 1k indices. It used to work flawlessly (albeit under a high load, of course), but since we started using the snapshot-restore feature heavily, it gets its memory exhausted within 7 days of use. The cluster performs about 700 restore operations per day. Are there memory considerations to be aware of when using the restore feature?
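For reference, the 700 daily restores described here would amount to roughly this per snapshot (repository and snapshot names invented; note that in 1.x an index has to be closed or absent before it can be restored over):

    # Restore each of the day's snapshots in turn, blocking on each.
    for snap in pair_0001 pair_0002; do   # snapshot names illustrative
      curl -XPOST "localhost:9200/_snapshot/my_s3_repo/$snap/_restore?wait_for_completion=true"
    done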
Re: does snapshot restore lead to a memory leak?
Just to make sure I got it right: you really meant 700 restores (not just 700 snapshots), correct? What type of repository are you using? Could you add a bit more detail about your use case?