Hi Robert,

Or maybe it's worth rethinking the architecture so you can avoid tricks like running with no replicas for an hour. Kafka in front of ES comes to mind. We use this setup for Logsene <http://sematext.com/logsene/> and don't have the problem with log loss, so it may work well for you, too.

I think you could also replace Redis + the 3 Logstash servers with a single rsyslog server running omelasticsearch, which has built-in buffering in memory and on disk (see the links below for config examples; a minimal sketch also follows).

Some pointers that may be helpful:
* http://blog.sematext.com/?s=omelasticsearch
* http://blog.sematext.com/2013/07/01/recipe-rsyslog-elasticsearch-kibana/

Otis
--
Elasticsearch Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
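As a minimal sketch of that omelasticsearch setup, assuming rsyslog 7+ with RainerScript syntax; the host, queue file name, and sizes below are illustrative assumptions, not settings from this thread:

    # Load the Elasticsearch output module
    module(load="omelasticsearch")

    # Bulk-index into ES through an in-memory queue that spills to disk,
    # so log lines survive ES hiccups and rsyslog restarts.
    action(type="omelasticsearch"
           server="localhost"
           serverport="9200"
           bulkmode="on"
           queue.type="linkedlist"
           queue.filename="es_action_queue"
           queue.size="100000"
           queue.saveonshutdown="on"
           action.resumeretrycount="-1")

With bulkmode="on" the module batches documents into ES bulk requests, and action.resumeretrycount="-1" makes rsyslog retry indefinitely instead of dropping messages while ES is unavailable.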
On Wednesday, August 13, 2014 7:24:09 PM UTC+2, Robert Gardam wrote:
>
> I appreciate your answers. I think IO could be a contributing factor. I'm
> thinking of splitting the index into an hourly index with no replicas for
> bulk importing and then switching replicas on afterwards.
>
> I think the risk of losing data would be too high if it was any longer
> than that. Also, does the async replication from the logstash side of
> things cause unknown issues?
>
> On Wednesday, August 13, 2014 7:08:05 PM UTC+2, Jörg Prante wrote:
>>
>> If Elasticsearch rejects bulk actions, this is serious and you should
>> examine the cluster to find out why. Slow disks, cluster health, and
>> capacity problems all come to mind. But if you ignore the underlying
>> problem and merely disable bulk resource control instead, you open the
>> gate wide to unpredictable node crashes, and at some point you won't be
>> able to control the cluster at all.
>>
>> To reduce the number of active bulk requests per timeframe, you could,
>> for example, increase the number of actions per bulk request. Or simply
>> increase the number of nodes. Or think about the shard/replica
>> organization while indexing - it can be an advantage to bulk index into
>> an index at replica level 0 and increase the replica level later.
>>
>> Jörg
>>
>> On Wed, Aug 13, 2014 at 6:50 PM, Robert Gardam <robert...@fyber.com>
>> wrote:
>>
>>> Hi,
>>> The reason this is set is because without it we reject messages and
>>> therefore don't have all the log entries.
>>>
>>> I'm happy to be told this isn't required, but I'm pretty sure it is. We
>>> are constantly bulk indexing large numbers of events.
>>>
>>> On Wednesday, August 13, 2014 6:09:46 PM UTC+2, Jörg Prante wrote:
>>>
>>>> Because you set queue_size: -1 in the bulk thread pool, you explicitly
>>>> allowed the node to crash.
>>>>
>>>> You should use reasonable resource limits. The default settings are
>>>> reasonable and sufficient in most cases.
>>>>
>>>> Jörg
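To make the resource-limit point concrete: a bounded bulk thread pool lets an overloaded node reject extra work early, so the client can back off and retry instead of the node exhausting its heap. A sketch, with illustrative values only:

    # elasticsearch.yml - size and queue_size are assumptions to tune,
    # not recommendations from this thread
    threadpool:
      bulk:
        type: fixed
        size: 32
        queue_size: 100

With queue_size: -1 the queue is unbounded, so a burst of 5k-document bulks can pile up on a node until it runs out of memory, which is consistent with the "one node starts having issues" pattern Robert describes below.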
>>>>
>>>> On Wed, Aug 13, 2014 at 5:18 PM, Robert Gardam <robert...@fyber.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We have a 10 node Elasticsearch cluster which is receiving roughly
>>>>> 10k/s worth of log lines from our application.
>>>>>
>>>>> Each Elasticsearch node has 132 GB of memory with a 48 GB heap. The
>>>>> disk subsystem is not great, but it seems to be keeping up. (This
>>>>> could be an issue, but I'm not sure that it is.)
>>>>>
>>>>> The logs path is:
>>>>>
>>>>> app server -> redis (via logstash) -> logstash filters (3 dedicated
>>>>> boxes) -> elasticsearch_http
>>>>>
>>>>> We currently bulk import from logstash at 5k documents per flush to
>>>>> keep up with the volume of data that comes in.
>>>>>
>>>>> Here are the non-standard ES configs:
>>>>>
>>>>> indices.memory.index_buffer_size: 50%
>>>>> index.translog.flush_threshold_ops: 50000
>>>>> # Refresh tuning
>>>>> index.refresh_interval: 15s
>>>>> # Field data cache tuning
>>>>> indices.fielddata.cache.size: 24g
>>>>> indices.fielddata.cache.expire: 10m
>>>>> # Segment merging tuning
>>>>> index.merge.policy.max_merged_segment: 15g
>>>>> # Thread tuning
>>>>> threadpool:
>>>>>   bulk:
>>>>>     type: fixed
>>>>>     queue_size: -1
>>>>>
>>>>> We have not had this cluster stay up for more than a week, and it also
>>>>> seems to crash for no real reason. It seems like one node starts
>>>>> having issues and then takes the entire cluster down.
>>>>>
>>>>> Does anyone from the community have any experience with this kind of
>>>>> setup?
>>>>>
>>>>> Thanks in advance,
>>>>> Rob
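For the hourly-index idea Robert floats above, the replica level can be changed on a live index through the index settings API. A sketch using a hypothetical index name:

    # Bulk-load the current hourly index with no replicas
    # ("logs-2014.08.13.18" is a hypothetical name):
    curl -XPUT 'http://localhost:9200/logs-2014.08.13.18/_settings' -d '
    { "index": { "number_of_replicas": 0 } }'

    # Once the hour rolls over, restore redundancy:
    curl -XPUT 'http://localhost:9200/logs-2014.08.13.18/_settings' -d '
    { "index": { "number_of_replicas": 1 } }'

The trade-off Robert notes still applies: while replicas are off, each document exists on a single shard copy, so a node failure during that hour loses data.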