Re: Large Scale Elasticsearch Logstash collection system

2014-08-17 Thread Otis Gospodnetic
Hi Robert,

Or maybe it's worth rethinking the architecture to avoid having to do 
tricks like no-replicas for 1h.  Kafka in front of ES comes to mind.  We 
use this setup for Logsene and don't have a problem with log loss, so it 
may work well for you, too.
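
Roughly, Kafka becomes the buffer between your shippers and your indexers, 
so ES backpressure no longer drops logs. An untested sketch of the Logstash 
side (broker/topic/ZooKeeper names are placeholders, and it assumes a Kafka 
input/output plugin is installed for your Logstash version, e.g. the 
community logstash-kafka plugin):

# shipper: write events to Kafka instead of Redis
output {
  kafka {
    broker_list => "kafka1:9092"   # placeholder broker
    topic_id    => "logs"          # placeholder topic
  }
}

# indexer: consume from Kafka at whatever rate ES can sustain
input {
  kafka {
    zk_connect => "zk1:2181"       # placeholder ZooKeeper ensemble
    topic_id   => "logs"
  }
}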

I think you could also replace Redis + 3 Logstash servers with 1 rsyslog 
server running omelasticsearch, which has built-in buffering in memory and 
on disk (see the links below for config examples).

Some pointers that may be helpful:
* http://blog.sematext.com/?s=omelasticsearch
* http://blog.sematext.com/2013/07/01/recipe-rsyslog-elasticsearch-kibana/
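
To make the rsyslog option concrete, a rough sketch (untested; assumes 
rsyslog 7.x+ with the omelasticsearch module installed; server, index, and 
queue names are placeholders):

module(load="omelasticsearch")
action(type="omelasticsearch"
       server="es-node-1" serverport="9200"   # placeholder ES endpoint
       searchIndex="logstash-index" searchType="logs"
       bulkmode="on"                          # batch docs into bulk requests
       queue.type="LinkedList"                # in-memory buffer in front of ES
       queue.filename="es_queue"              # spills to disk when memory fills
       queue.saveOnShutdown="on"              # keep queued logs across restarts
       action.resumeRetryCount="-1")          # retry forever instead of dropping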

Otis
--
Elasticsearch Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wednesday, August 13, 2014 7:24:09 PM UTC+2, Robert Gardam wrote:
>
> I appreciate your answers. I think IO could be a contributing factor. I'm 
> thinking of splitting the index into hourly indices with no replicas for 
> bulk importing and then switching replicas on afterwards. 
>
> I think the risk of losing data would be too high if it was any longer 
> than that. Also, does the async replication on the Logstash side of things 
> cause any unknown issues?
>
>
>
> On Wednesday, August 13, 2014 7:08:05 PM UTC+2, Jörg Prante wrote:
>>
>> If Elasticsearch rejects bulk actions, that is serious and you should 
>> examine the cluster to find out why. Slow disks, cluster health, or 
>> capacity problems all come to mind. But if you ignore solving the problem 
>> and merely disable bulk resource control instead, you open the gate wide 
>> to unpredictable node crashes, and at some point you won't be able to 
>> control the cluster.
>>
>> To reduce the number of concurrent bulk requests, you could, for example, 
>> increase the number of actions per bulk request. Or simply add more nodes. 
>> Or think about the shard/replica organization while indexing - it can be 
>> an advantage to bulk index into an index with replica level 0 and 
>> increase the replica level later.
>>
>> Jörg
>>
>>
>> On Wed, Aug 13, 2014 at 6:50 PM, Robert Gardam  
>> wrote:
>>
>>> Hi,
>>> This is set because without it we reject messages and therefore don't 
>>> have all the log entries.
>>>
>>> I'm happy to be told this isn't required, but I'm pretty sure it is. We 
>>> are constantly bulk indexing large numbers of events.
>>>
>>>
>>>
>>> On Wednesday, August 13, 2014 6:09:46 PM UTC+2, Jörg Prante wrote:
>>>
 Because you set queue_size: -1 in the bulk thread pool, you explicitly 
 allowed the node to crash.

 You should use reasonable resource limits. Default settings, which are 
 reasonable, are sufficient in most cases.

 Jörg


 On Wed, Aug 13, 2014 at 5:18 PM, Robert Gardam  
 wrote:

> Hello 
>
>
> We have a 10-node Elasticsearch cluster which is receiving roughly 
> 10k log lines per second from our application.
>
> Each Elasticsearch node has 132GB of memory (48GB heap). The disk 
> subsystem is not great, but it seems to be keeping up. (This could be an 
> issue, but I'm not sure that it is.)
>
> The logs path is: 
>
> app server -> redis (via logstash) -> logstash filters (3 dedicated 
> boxes) -> elasticsearch_http 
>
>  
> We currently bulk import from logstash at 5k documents per flush to 
> keep up with the volume of data that comes in. 
>
> Here are the non-standard ES configs.
>
> indices.memory.index_buffer_size: 50%
> index.translog.flush_threshold_ops: 5
> # Refresh tuning.
> index.refresh_interval: 15s
> # Field Data cache tuning
> indices.fielddata.cache.size: 24g
> indices.fielddata.cache.expire: 10m
> #Segment Merging Tuning
> index.merge.policy.max_merged_segment: 15g
> # Thread Tuning
> threadpool:
>   bulk:
>     type: fixed
>     queue_size: -1
>
> We have not had this cluster stay up for more than a week; it seems to 
> crash for no apparent reason. 
>
> It seems like one node starts having issues and then it takes the 
> entire cluster down. 
>
> Does anyone from the community have any experience with this kind of 
> setup?
>
> Thanks in Advance,
> Rob
>
>
>

Re: Large Scale Elasticsearch Logstash collection system

2014-08-13 Thread Robert Gardam
I appreciate your answers. I think IO could be a contributing factor. I'm 
thinking of splitting the index into hourly indices with no replicas for 
bulk importing and then switching replicas on afterwards. 

I think the risk of losing data would be too high if it was any longer 
than that. Also, does the async replication on the Logstash side of things 
cause any unknown issues?
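
Concretely, I mean something like this (index name is just an example):

curl -XPUT 'localhost:9200/logs-2014.08.13.18' -d '{
  "settings": { "number_of_replicas": 0 }
}'

# ... bulk import for the hour ...

curl -XPUT 'localhost:9200/logs-2014.08.13.18/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'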



On Wednesday, August 13, 2014 7:08:05 PM UTC+2, Jörg Prante wrote:
>
> If Elasticsearch rejects bulk actions, that is serious and you should 
> examine the cluster to find out why. Slow disks, cluster health, or 
> capacity problems all come to mind. But if you ignore solving the problem 
> and merely disable bulk resource control instead, you open the gate wide 
> to unpredictable node crashes, and at some point you won't be able to 
> control the cluster.
>
> To reduce the number of concurrent bulk requests, you could, for example, 
> increase the number of actions per bulk request. Or simply add more nodes. 
> Or think about the shard/replica organization while indexing - it can be 
> an advantage to bulk index into an index with replica level 0 and 
> increase the replica level later.
>
> Jörg
>
>
> On Wed, Aug 13, 2014 at 6:50 PM, Robert Gardam  > wrote:
>
>> Hi,
>> This is set because without it we reject messages and therefore don't 
>> have all the log entries.
>>
>> I'm happy to be told this isn't required, but I'm pretty sure it is. We 
>> are constantly bulk indexing large numbers of events.
>>
>>
>>
>> On Wednesday, August 13, 2014 6:09:46 PM UTC+2, Jörg Prante wrote:
>>
>>> Because you set queue_size: -1 in the bulk thread pool, you explicitly 
>>> allowed the node to crash.
>>>
>>> You should use reasonable resource limits. Default settings, which are 
>>> reasonable, are sufficient in most cases.
>>>
>>> Jörg
>>>
>>>
>>> On Wed, Aug 13, 2014 at 5:18 PM, Robert Gardam  
>>> wrote:
>>>
 Hello 


 We have a 10-node Elasticsearch cluster which is receiving roughly 
 10k log lines per second from our application.

 Each Elasticsearch node has 132GB of memory (48GB heap). The disk 
 subsystem is not great, but it seems to be keeping up. (This could be an 
 issue, but I'm not sure that it is.)

 The logs path is: 

 app server -> redis (via logstash) -> logstash filters (3 dedicated 
 boxes) -> elasticsearch_http 

  
 We currently bulk import from logstash at 5k documents per flush to 
 keep up with the volume of data that comes in. 

 Here are the non-standard ES configs.

 indices.memory.index_buffer_size: 50%
 index.translog.flush_threshold_ops: 5
 # Refresh tuning.
 index.refresh_interval: 15s
 # Field Data cache tuning
 indices.fielddata.cache.size: 24g
 indices.fielddata.cache.expire: 10m
 #Segment Merging Tuning
 index.merge.policy.max_merged_segment: 15g
 # Thread Tuning
 threadpool:
   bulk:
     type: fixed
     queue_size: -1

 We have not had this cluster stay up for more than a week; it seems to 
 crash for no apparent reason. 

 It seems like one node starts having issues and then it takes the 
 entire cluster down. 

 Does anyone from the community have any experience with this kind of 
 setup?

 Thanks in Advance,
 Rob





Re: Large Scale Elasticsearch Logstash collection system

2014-08-13 Thread joergpra...@gmail.com
If Elasticsearch rejects bulk actions, that is serious and you should
examine the cluster to find out why. Slow disks, cluster health, or
capacity problems all come to mind. But if you ignore solving the problem
and merely disable bulk resource control instead, you open the gate wide
to unpredictable node crashes, and at some point you won't be able to
control the cluster.

To reduce the number of concurrent bulk requests, you could, for example,
increase the number of actions per bulk request. Or simply add more nodes.
Or think about the shard/replica organization while indexing - it can be
an advantage to bulk index into an index with replica level 0 and
increase the replica level later.
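
If I read your setup right, the actions per bulk request would be the
flush_size of your elasticsearch_http output - a sketch (host is a
placeholder):

output {
  elasticsearch_http {
    host => "es-node-1"    # placeholder
    flush_size => 10000    # e.g. double the current 5000 actions per request
  }
}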

Jörg


On Wed, Aug 13, 2014 at 6:50 PM, Robert Gardam 
wrote:

> Hi,
> This is set because without it we reject messages and therefore don't
> have all the log entries.
>
> I'm happy to be told this isn't required, but I'm pretty sure it is. We
> are constantly bulk indexing large numbers of events.
>
>
>
> On Wednesday, August 13, 2014 6:09:46 PM UTC+2, Jörg Prante wrote:
>
>> Because you set queue_size: -1 in the bulk thread pool, you explicitly
>> allowed the node to crash.
>>
>> You should use reasonable resource limits. Default settings, which are
>> reasonable, are sufficient in most cases.
>>
>> Jörg
>>
>>
>> On Wed, Aug 13, 2014 at 5:18 PM, Robert Gardam 
>> wrote:
>>
>>> Hello
>>>
>>>
>>> We have a 10-node Elasticsearch cluster which is receiving roughly
>>> 10k log lines per second from our application.
>>>
>>> Each Elasticsearch node has 132GB of memory (48GB heap). The disk
>>> subsystem is not great, but it seems to be keeping up. (This could be an
>>> issue, but I'm not sure that it is.)
>>>
>>> The logs path is:
>>>
>>> app server -> redis (via logstash) -> logstash filters (3 dedicated
>>> boxes) -> elasticsearch_http
>>>
>>>
>>> We currently bulk import from logstash at 5k documents per flush to keep
>>> up with the volume of data that comes in.
>>>
>>> Here are the non-standard ES configs.
>>>
>>> indices.memory.index_buffer_size: 50%
>>> index.translog.flush_threshold_ops: 5
>>> # Refresh tuning.
>>> index.refresh_interval: 15s
>>> # Field Data cache tuning
>>> indices.fielddata.cache.size: 24g
>>> indices.fielddata.cache.expire: 10m
>>> #Segment Merging Tuning
>>> index.merge.policy.max_merged_segment: 15g
>>> # Thread Tuning
>>> threadpool:
>>>   bulk:
>>>     type: fixed
>>>     queue_size: -1
>>>
>>> We have not had this cluster stay up for more than a week; it seems to
>>> crash for no apparent reason.
>>>
>>> It seems like one node starts having issues and then it takes the entire
>>> cluster down.
>>>
>>> Does anyone from the community have any experience with this kind of
>>> setup?
>>>
>>> Thanks in Advance,
>>> Rob
>>>
>>>
>>>



Re: Large Scale Elasticsearch Logstash collection system

2014-08-13 Thread Robert Gardam
Hi,
This is set because without it we reject messages and therefore don't 
have all the log entries.

I'm happy to be told this isn't required, but I'm pretty sure it is. We 
are constantly bulk indexing large numbers of events.



On Wednesday, August 13, 2014 6:09:46 PM UTC+2, Jörg Prante wrote:
>
> Because you set queue_size: -1 in the bulk thread pool, you explicitly 
> allowed the node to crash.
>
> You should use reasonable resource limits. Default settings, which are 
> reasonable, are sufficient in most cases.
>
> Jörg
>
>
> On Wed, Aug 13, 2014 at 5:18 PM, Robert Gardam  > wrote:
>
>> Hello 
>>
>>
>> We have a 10-node Elasticsearch cluster which is receiving roughly 
>> 10k log lines per second from our application.
>>
>> Each Elasticsearch node has 132GB of memory (48GB heap). The disk 
>> subsystem is not great, but it seems to be keeping up. (This could be an 
>> issue, but I'm not sure that it is.)
>>
>> The logs path is: 
>>
>> app server -> redis (via logstash) -> logstash filters (3 dedicated 
>> boxes) -> elasticsearch_http 
>>
>>  
>> We currently bulk import from logstash at 5k documents per flush to keep 
>> up with the volume of data that comes in. 
>>
>> Here are the non-standard ES configs.
>>
>> indices.memory.index_buffer_size: 50%
>> index.translog.flush_threshold_ops: 5
>> # Refresh tuning.
>> index.refresh_interval: 15s
>> # Field Data cache tuning
>> indices.fielddata.cache.size: 24g
>> indices.fielddata.cache.expire: 10m
>> #Segment Merging Tuning
>> index.merge.policy.max_merged_segment: 15g
>> # Thread Tuning
>> threadpool:
>>   bulk:
>>     type: fixed
>>     queue_size: -1
>>
>> We have not had this cluster stay up for more than a week; it seems to 
>> crash for no apparent reason. 
>>
>> It seems like one node starts having issues and then it takes the entire 
>> cluster down. 
>>
>> Does anyone from the community have any experience with this kind of 
>> setup?
>>
>> Thanks in Advance,
>> Rob
>>
>>
>>



Re: Large Scale Elasticsearch Logstash collection system

2014-08-13 Thread joergpra...@gmail.com
Because you set queue_size: -1 in the bulk thread pool, you explicitly
allowed the node to crash: an unbounded queue accepts bulk requests until
the heap is exhausted.

You should use reasonable resource limits. The default settings are
reasonable and sufficient in most cases.
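
For example, a bounded bulk queue in elasticsearch.yml (a sketch - 50 is
the 1.x default, tune as needed):

threadpool:
  bulk:
    type: fixed
    queue_size: 50    # excess requests are rejected instead of exhausting heap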

Jörg


On Wed, Aug 13, 2014 at 5:18 PM, Robert Gardam 
wrote:

> Hello
>
>
> We have a 10-node Elasticsearch cluster which is receiving roughly
> 10k log lines per second from our application.
>
> Each Elasticsearch node has 132GB of memory (48GB heap). The disk
> subsystem is not great, but it seems to be keeping up. (This could be an
> issue, but I'm not sure that it is.)
>
> The logs path is:
>
> app server -> redis (via logstash) -> logstash filters (3 dedicated boxes)
> -> elasticsearch_http
>
>
> We currently bulk import from logstash at 5k documents per flush to keep
> up with the volume of data that comes in.
>
> Here are the non-standard ES configs.
>
> indices.memory.index_buffer_size: 50%
> index.translog.flush_threshold_ops: 5
> # Refresh tuning.
> index.refresh_interval: 15s
> # Field Data cache tuning
> indices.fielddata.cache.size: 24g
> indices.fielddata.cache.expire: 10m
> #Segment Merging Tuning
> index.merge.policy.max_merged_segment: 15g
> # Thread Tuning
> threadpool:
>   bulk:
>     type: fixed
>     queue_size: -1
>
> We have not had this cluster stay up for more than a week; it seems to
> crash for no apparent reason.
>
> It seems like one node starts having issues and then it takes the entire
> cluster down.
>
> Does anyone from the community have any experience with this kind of setup?
>
> Thanks in Advance,
> Rob
>
>
>
