I managed to reproduce the issue locally; I'm looking into it.
On Fri, Apr 11, 2014 at 9:45 AM, Robin Wallin <walro...@gmail.com> wrote:

Hello,

We are experiencing a related problem with 1.1.0. Segments do not seem to merge as they should during indexing, and the optimize API does practically nothing to lower the segment count either. The problem persists through a cluster restart. The vast number of segments seems to be impacting the performance of the cluster in a very negative way.

We currently have 414 million documents across 3 nodes, and each shard has on average 1200 segments(!).

With 1.0.1 we had even more documents, ~650 million, without any segment problems. Looking in Marvel, we were hovering at around 30-40 segments per shard back then.

Best Regards,
Robin

On Friday, April 11, 2014 1:35:42 AM UTC+2, Adrien Grand wrote:

Thanks for reporting this, the behavior is definitely unexpected. I'll test _optimize on very large numbers of shards to see if I can reproduce the issue.

On Thu, Apr 10, 2014 at 2:10 PM, Elliott Bradshaw <ebrad...@gmail.com> wrote:

Adrien,

Just an FYI, after resetting the cluster, things seem to have improved. Optimize calls now lead to CPU/IO activity over their duration. max_num_segments=1 does not seem to be fully applied on any given call, as each call would only reduce the segment count by about 600-700, but I ran 10 calls in sequence overnight and actually got down to 4 segments (1 per shard)!

I'm glad I got the index optimized; searches are literally 10-20 times faster without 1500 segments per shard to deal with. It's awesome.

That said, any thoughts on why the index wasn't merging on its own, or why optimize was returning prematurely?

On Wednesday, April 9, 2014 11:10:56 AM UTC-4, Elliott Bradshaw wrote:

Hi Adrien,

I kept the logs up over the last optimize call, and I did see an exception. I Ctrl-C'd a curl optimize call before making another one, but I don't think that caused this exception. The error is essentially as follows:

    netty - Caught exception while handling client http traffic, closing
    connection [id: 0x4d8f1a90, /127.0.0.1:33480 :> /127.0.0.1:9200]

    java.nio.channels.ClosedChannelException
        at AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
        at AbstractNioWorker.writeFromUserCode
        at NioServerSocketPipelineSink.handleAcceptedSocket
        at NioServerSocketPipelineSink.eventSunk
        at DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream
        at Channels.write
        at OneToOneEncoder.doEncode
        at OneToOneEncoder.handleDownstream
        at DefaultChannelPipeline.sendDownstream
        at DefaultChannelPipeline.sendDownstream
        at Channels.write
        at AbstractChannel.write
        at NettyHttpChannel.sendResponse
        at RestOptimizeAction$1.onResponse(95)
        at RestOptimizeAction$1.onResponse(85)
        at TransportBroadcastOperationAction$AsyncBroadcastAction.finishHim
        at TransportBroadcastOperationAction$AsyncBroadcastAction.onOperation
        at TransportBroadcastOperationAction$AsyncBroadcastAction$2.run

Sorry about the abbreviated stack trace. Still, it looks like this might point to a problem! The exception fired about an hour after I kicked off the optimize. Any thoughts?
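The exception above is the node failing to write the optimize response back to an HTTP connection that has already gone away (the Ctrl-C'd curl), so the merge itself appears to keep running regardless. For reference, a long optimize can be left to run detached so an interrupted client does not get in the way. A minimal sketch, assuming the same local node and index name ("index") used in the commands in this thread; nohup and the output file name are just illustrative choices:

    # Kick off an optimize down to one segment per shard and detach from the
    # terminal, so Ctrl-C or a dropped session does not close the HTTP
    # connection while the merge is still running on the cluster.
    nohup curl -XPOST \
      'http://localhost:9200/index/_optimize?max_num_segments=1' \
      > optimize-response.json 2>&1 &

The call only returns once the merging finishes, so progress has to be watched separately (for example via the segment counts discussed further down).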
On Wednesday, April 9, 2014 10:06:57 AM UTC-4, Elliott Bradshaw wrote:

Hi Adrien,

I did customize my merge policy, although I did so only because I was so surprised by the number of segments left over after the load. I'm pretty sure the optimize problem was happening before I made this change, but either way here are my settings:

    "index" : {
        "merge" : {
            "policy" : {
                "max_merged_segment" : "20gb",
                "segments_per_tier" : 5,
                "floor_segment" : "10mb"
            },
            "scheduler" : "concurrentmergescheduler"
        }
    }

Not sure whether this setup could be a contributing factor or not. Nothing really jumps out at me in the logs. In fact, when I kick off the optimize, I don't see any logging at all. Should I?

I'm running the following command:

    curl -XPOST http://localhost:9200/index/_optimize

Thanks!

On Wednesday, April 9, 2014 8:56:35 AM UTC-4, Adrien Grand wrote:

Hi Elliott,

1500 segments per shard is certainly way too much, and it is not normal that optimize doesn't manage to reduce the number of segments.
- Is there anything suspicious in the logs?
- Have you customized the merge policy or scheduler? [1]
- Does the issue still reproduce if you restart your cluster?

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html

On Wed, Apr 9, 2014 at 2:38 PM, Elliott Bradshaw <ebrad...@gmail.com> wrote:

Any other thoughts on this? Would 1500 segments per shard be significantly impacting performance? Have you guys noticed this behavior elsewhere?

Thanks.

On Monday, April 7, 2014 8:56:38 AM UTC-4, Elliott Bradshaw wrote:

Adrien,

I ran the following command:

    curl -XPUT http://localhost:9200/_settings -d '{"indices.store.throttle.max_bytes_per_sec" : "10gb"}'

and received a { "acknowledged" : "true" } response. The logs showed "cluster state updated". I did have to close my index prior to changing the setting and reopen it afterward.

I've since begun another optimize, but again it doesn't look like much is happening. The optimize isn't returning, and the total CPU usage on every node is holding at about 2% of a single core. I would copy a hot_threads stack trace, but I'm unfortunately on a closed network and this isn't possible. I can tell you that refreshes of hot_threads show very little happening. The occasional [merge] thread (always in a LinkedTransferQueue.awaitMatch() state) or [optimize] thread (doing nothing on a waitForMerge() call) shows up, but it's always consuming 0-1% CPU. It sure feels like something isn't right. Any thoughts?

On Fri, Apr 4, 2014 at 3:24 PM, Adrien Grand <adrien...@elasticsearch.com> wrote:

Did you see a message in the logs confirming that the setting has been updated? It would be interesting to see the output of hot threads [1] to see what your node is doing.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
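Since hot threads keep coming up in this thread, here is a minimal sketch of capturing them periodically so they can be reviewed later. It only uses the nodes hot threads endpoint referenced in [1] above; the output file name and the 30-second interval are arbitrary choices:

    # Append a hot-threads snapshot from every node to a local file every 30s.
    while true; do
        date >> hot_threads.log
        curl -s 'http://localhost:9200/_nodes/hot_threads' >> hot_threads.log
        sleep 30
    done

On a mostly idle cluster like the one described above, the snapshots should show the [merge] and [optimize] threads and what they are waiting on.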
On Fri, Apr 4, 2014 at 7:18 PM, Elliott Bradshaw <ebrad...@gmail.com> wrote:

Yes. I have run max_num_segments=1 every time.

On Fri, Apr 4, 2014 at 12:26 PM, Michael Sick <michae...@serenesoftware.com> wrote:

Have you tried max_num_segments=1 on your optimize?

On Fri, Apr 4, 2014 at 11:27 AM, Elliott Bradshaw <ebrad...@gmail.com> wrote:

Any thoughts on this? I've run optimize several more times, and the number of segments falls each time, but I'm still over 1000 segments per shard. Has anyone else run into something similar?

On Thursday, April 3, 2014 11:21:29 AM UTC-4, Elliott Bradshaw wrote:

OK. Optimize finally returned, so I suppose something was happening in the background, but I'm still seeing over 6500 segments, even after setting max_num_segments=5. Does this seem right? Queries are a little faster (350-400ms) but still not great. Bigdesk is still showing a fair amount of file IO.
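For anyone following along, per-shard segment counts like the ones quoted in this thread can be pulled from the index segments API. A minimal sketch, assuming the index is literally named "index" as in the earlier commands; the grep/wc-style count is just one rough way to tally entries and is not an official metric:

    # Dump the segment listing for every shard of the index...
    curl 'http://localhost:9200/index/_segments?pretty' > segments.json

    # ...and get a rough total segment count by counting per-segment entries.
    grep -c '"num_docs"' segments.json

Aggregate segment counts are also visible in Marvel, as mentioned earlier in the thread.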
On Thursday, April 3, 2014 8:47:32 AM UTC-4, Elliott Bradshaw wrote:

Hi All,

I've recently upgraded to Elasticsearch 1.1.0. I've got a 4-node cluster, each node with 64G of RAM, with 24G allocated to Elasticsearch on each. I've batch loaded approximately 86 million documents into a single index (4 shards) and have started benchmarking cross_field/multi_match queries on them. The index has one replica and takes up a total of 111G. I've run several batches of warming queries, but queries are not as fast as I had hoped, approximately 400-500ms each. Given that top (on CentOS) shows 5-8 GB of free memory on each server, I would assume that the entire index has been paged into memory (I had worried about disk performance previously, as we are working in a virtualized environment).

A stats query on the index in question shows that the index is composed of more than 7000 segments. This seemed high to me, but maybe it's appropriate. Regardless, I dispatched an optimize command, but I am not seeing any progress and the command has not returned. Current merges remains at zero, and the segment count is not changing. Checking out hot threads in ElasticHQ, I initially saw an optimize call in the stack that was blocked on a waitForMerge() call. This however has disappeared, and I'm seeing no evidence that the optimize is occurring.

Does any of this seem out of the norm or unusual? Has anyone else had similar issues? This is the second time I have tried to optimize an index since upgrading, and I've gotten the same result both times.

Thanks in advance for any help/tips!

- Elliott
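As a footnote to the "current merges remains at zero" observation above, merge activity can be watched directly from the indices stats API while an optimize is running. A minimal sketch, again assuming an index named "index" on a local node; the 10-second polling interval and the grep-based filtering are arbitrary choices:

    # Poll the merge stats; "current" is the number of merges running right
    # now, and "total" / "total_time_in_millis" grow as merges complete.
    while true; do
        curl -s 'http://localhost:9200/index/_stats/merge?pretty' | grep -A 3 '"merges"'
        sleep 10
    done

If "current" stays at zero and "total" never moves while an optimize is outstanding, that matches the behavior described in this thread.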
--
Adrien Grand