Re: Elasticsearch performance tuning

2015-02-22 Thread Mark Walkom
Are you running a single cluster with all of those nodes included?
Have you changed the roles that these nodes play, i.e. master, data, or client, or are
they the defaults?

On 20 February 2015 at 16:30, Deva Raj  wrote:

> [quoted text snipped]

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEYi1X-M05jDiiGk1m8TMBYOD3qcWLtVNyatKzxEZzAmf9Kthw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Elasticsearch performance tuning

2015-02-19 Thread Deva Raj

I have listed the instances and their heap sizes below.

Medium instance: 3.75 GB RAM, 1 core, storage: 4 GB SSD, 64-bit network
Java heap size: 2 GB

R3 Large: 15.25 GB RAM, 2 cores, storage: 32 GB SSD
Java heap size: 7 GB

R3 High-Memory Extra Large (r3.xlarge): 30.5 GB RAM, 4 cores
Java heap size: 15 GB


Thanks
Devaraj

On Friday, February 20, 2015 at 4:15:12 AM UTC+5:30, Mark Walkom wrote:
> [quoted reply and Logstash conf snipped]



Re: Elasticsearch performance tuning

2015-02-19 Thread Mark Walkom
Don't change cache and buffer sizes unless you know what is happening; the
defaults are going to be fine.
How much heap did you give ES?

I'm not sure you can do much about the date filter though, maybe someone
else has pointers.

On 19 February 2015 at 21:12, Deva Raj  wrote:

> [quoted Logstash conf snipped]
>



Re: Elasticsearch performance tuning

2015-02-19 Thread Deva Raj
Hi Mark Walkom,

I have given my Logstash conf file below:

 
  Logstash conf

input {
  file {

  }
}

filter {
  mutate {
    gsub => ["message", "\n", " "]
  }
  mutate {
    gsub => ["message", "\t", " "]
  }
  multiline {
    pattern => "^ "
    what => "previous"
  }

  grok {
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}\:%{DATE:logdate}.log" ]
    break_on_match => false
  }

  # To check whether location is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      code => "temp = event['_machine'].split('_')
               if !temp.nil? || !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }

  mutate {
    add_field => ["event_timestamp", "%{@timestamp}"]
    replace => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => ["loccode"]
    # Remove the 'logdate' field since we don't need it anymore.
    remove => "logdate"
  }

  # to get all site details (site name, city and co-ordinates)
  sitelocator {
    sitename => "loccode"
    datafile => "vendor/sitelocator/SiteDetails.csv"
  }

  date {
    locale => "en"
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}

output {
  elasticsearch {
  }
}
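As an aside (my own observation, not something raised in the thread): `String#split` never returns nil, so in the ruby filter the guard `if !temp.nil? || !temp.empty?` is always true for a non-nil value, and with `||` it would not actually protect against nil anyway. A simpler equivalent, using a made-up machine value for illustration:

```ruby
# Sketch of the ruby-filter logic with a simplified guard.
# "SITE_host_01" is a hypothetical example value for event['_machine'].
machine = "SITE_host_01"
temp = machine.split('_')                 # split never returns nil
machine = temp.first unless temp.empty?
puts machine                              # => SITE
```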



I checked the filters step by step to find the bottleneck. The filter below took
the most time. Can you guide me on how I can tune it to be faster?

date { locale=>"en" match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ] }
<http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558>
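Two commonly suggested tweaks, offered here as assumptions rather than tested fixes: the date filter tries its patterns in order and stops at the first match, so listing the format that actually matches most of your events first avoids repeated failed parses; and the two gsub mutates can be merged into one. A sketch:

```
filter {
  mutate {
    # one mutate with both substitutions instead of two separate blocks
    gsub => ["message", "\n", " ",
             "message", "\t", " "]
  }
  date {
    locale => "en"
    # put the format that matches most of your events first;
    # the filter stops trying patterns at the first match
    match => [ "log_time", "MM-dd-yyyy HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss", "ISO8601" ]
  }
}
```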


Thanks
Devaraj



Re: Elasticsearch performance tuning

2015-02-18 Thread Deva Raj
Hi Mark Walkom,

Thanks, Mark. Did I miss anything in tuning the performance of Elasticsearch?

  Added the following to the Elasticsearch settings:
Java heap size: half of physical memory
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 5
indices.memory.index_buffer_size: 50%
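For reference, those options as they would sit in elasticsearch.yml (ES 1.x option names; the comment on the translog threshold is my own observation, not from the thread):

```yaml
# elasticsearch.yml (ES 1.x era option names)
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
# Flushing the translog every 5 operations is extremely aggressive and can
# itself throttle indexing; values in the thousands are more typical.
index.translog.flush_threshold_ops: 5
indices.memory.index_buffer_size: 50%
```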


On Thursday, February 19, 2015 at 7:25:27 AM UTC+5:30, Mark Walkom wrote:
> [quoted reply snipped]



Re: Elasticsearch performance tuning

2015-02-18 Thread Mark Walkom
1. It depends
2. It depends
3. It depends
4. It also depends.

The performance of ES is dependent on you; your data, your use, your
queries, your hardware, your configuration. If that is the results you got
then it is indicative to your setup and thus is your benchmark, and from
there you can tweak and try to improve performance.

Monitoring LS is a little harder as there are no APIs for it (yet). Most of
its performance comes down to your filters (especially grok).
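To illustrate why filters, grok in particular, tend to dominate Logstash cost, here is a small Ruby sketch (my own example, not from the thread) comparing an anchored regex with an unanchored one on lines that do not match; an unanchored pattern must try every position in the line before giving up, much as an unanchored grok pattern does:

```ruby
require 'benchmark'

# A line that matches neither pattern; grok failures behave similarly.
line = "x" * 200
unanchored = /\d+\|\w+\|/    # scans every position before giving up
anchored   = /\A\d+\|\w+\|/  # fails immediately at position 0
n = 100_000

Benchmark.bm(12) do |b|
  b.report("unanchored") { n.times { unanchored =~ line } }
  b.report("anchored")   { n.times { anchored =~ line } }
end
```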

On 18 February 2015 at 20:48, Deva Raj  wrote:

> [quoted benchmark details snipped]



Elasticsearch performance tuning

2015-02-18 Thread Deva Raj
Hi All,

In a single-node Elasticsearch setup running alongside Logstash, we tested
parsing a 20 MB and a 200 MB file into Elasticsearch on different types of AWS
instance, i.e. Medium, Large and XLarge.

Environment details: Medium instance, 3.75 GB RAM, 1 core, storage: 4 GB SSD,
64-bit, network performance: moderate
Instance running: Logstash, Elasticsearch

Scenario 1

**With default settings** 
Result :
20mb logfile 23 mins Events Per/second 175
200mb logfile 3 hrs 3 mins Events Per/second 175


Added the following to settings:
Java heap size : 2GB
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 5
indices.memory.index_buffer_size: 50%

# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100

**With added settings** 
Result:
20mb logfile 22 mins Events Per/second 180
200mb logfile 3 hrs 07 mins Events Per/second 180

Scenario 2

Environment details: R3 Large, 15.25 GB RAM, 2 cores, storage: 32 GB SSD,
64-bit, network performance: moderate
Instance running: Logstash, Elasticsearch

**With default settings** 
Result :
  20mb logfile 7 mins Events Per/second 750
  200mb logfile 65 mins Events Per/second 800

Added the following to settings:
Java heap size: 7gb
other parameters same as above

**With added settings** 
Result:
20mb logfile 7 mins Events Per/second 800
200mb logfile 55 mins Events Per/second 800

Scenario 3

Environment details: R3 High-Memory Extra Large (r3.xlarge), 30.5 GB RAM,
4 cores, storage: 32 GB SSD, 64-bit, network performance: moderate
Instance running: Logstash, Elasticsearch

**With default settings** 
  Result:
  20mb logfile 7 mins Events Per/second 1200
  200mb logfile 34 mins Events Per/second 1200

 Added the following to settings:
Java heap size: 15gb
other parameters same as above

**With added settings** 
Result:
20mb logfile 7 mins Events Per/second 1200
200mb logfile 34 mins Events Per/second 1200
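The figures above can be sanity-checked with quick arithmetic (my own calculation from the quoted numbers, not part of the original post). Ingest throughput tops out around 100 KB/s even on the largest instance, which suggests the pipeline, not the ES heap, is the limit:

```ruby
# Rough ingest throughput from the 200 MB results quoted above.
results_minutes = {
  "medium (3.75 GB)"    => 183,  # 3 h 3 min
  "r3.large (15.25 GB)" => 65,
  "r3.xlarge (30.5 GB)" => 34,
}

results_minutes.each do |instance, minutes|
  kb_per_sec = 200.0 * 1024 / (minutes * 60)
  puts "#{instance}: #{kb_per_sec.round} KB/s"
end
```

This prints roughly 19, 53 and 100 KB/s respectively.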

I wanted to know:

1. What is the benchmark for this performance?
2. Does the performance meet the benchmark, or is it below it?
3. Why am I not able to see a difference even after I increased the
Elasticsearch JVM heap?
4. How do I monitor Logstash and improve its performance?

I'd appreciate any help on this, as I am new to Logstash and Elasticsearch.
