Re: Elasticsearch configuration for uninterrupted indexing

2014-03-25 Thread Rujuta Deshpande
Well, it was for the entire machine. I have now changed it to a 4 GB 
machine, but even 4 GB is not enough and I still face the same problem. 
I am trying to benchmark the minimum/maximum heap size I would have to 
allocate to an elasticsearch instance to achieve uninterrupted indexing 
without running into memory errors. So, are you saying that the only 
solution to this problem is an increase in memory?
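As a rough sketch of how that benchmarking could be done, heap usage on each 
node can be polled over the nodes stats API while indexing and querying run. 
This assumes the elasticsearch-py client and a node listening on 
localhost:9200; the polling interval and duration are only illustrative:

    # sketch: watch JVM heap on every node while the test workload runs
    import time
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    for _ in range(60):                       # poll for roughly 10 minutes
        stats = es.nodes.stats(metric="jvm")
        for node_id, node in stats["nodes"].items():
            mem = node["jvm"]["mem"]
            print(node["name"],
                  mem["heap_used_in_bytes"], "/", mem["heap_max_in_bytes"])
        time.sleep(10)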

Thanks,
Rujuta

On Monday, March 24, 2014 9:56:25 PM UTC+5:30, Ivan Brusic wrote:

 I do not think splitting the application into 2 separate JVMs will solve 
 your issues. Is the 2GB per JVM or the total of the machine? For analytic 
 applications, with multiples facets, 2 GBs might not be sufficient.

 -- 
 Ivan


 On Sun, Mar 23, 2014 at 10:04 PM, Rujuta Deshpande ruj...@gmail.com wrote:

 Hi, 

 Thank you for the response. However, in our scenario, both the nodes are 
 on the same machine. Our setup doesn't allow us to have two separate 
 machines for each node. Also, we're indexing logs using logstash. 
 Sometimes, we have to query data from the logs over a period of two or 
 three months and then, we're thrown an out of memory error. This affects 
 the indexing that is simultaneously going on and we lose events. 

 I'm not sure what configuration of elasticsearch will help achieve this.

 Thanks,
 Rujuta

 On Friday, March 21, 2014 10:36:51 PM UTC+5:30, Ivan Brusic wrote:

 One of the main usage of having a data-less node is that it would act as 
 a coordinator between the other nodes. It will gather all the responses 
 from the other nodes/shards and reduce them into one.

 In your case, the data-less node is gathering all the data from just one 
 node. In other words, it is not doing much since the reduce phase is 
 basically a pass-thru operation. With a two node cluster, I would say you 
 are better off having both machines act as full nodes.

 Cheers,

 Ivan



 On Fri, Mar 21, 2014 at 5:04 AM, Rujuta Deshpande ruj...@gmail.com wrote:

 Hi, 

 I am setting up a system consisting of elasticsearch-logstash-kibana 
 for log analysis. I am using one machine (2 GB RAM, 2 CPUs) running 
 logstash, kibana and  two instances of elasticsearch. Two other machines, 
 each running  logstash-forwarder are pumping logs into the ELK system. 

 The reasoning behind using two ES instances was this - I needed one 
 uninterrupted instance to index the incoming logs and I also needed to 
 query the currently existing indices. However, I didn't want any complex 
 querying to result in loss of events owing to Out of Memory Errors because 
 of excessive querying. 

 So, one elasticsearch node was master = true  and data = true which did 
 the indexing (called the writer node) and the other node, was master = 
 false and data = false (this was the workhorse or reader node) .

 I assumed that, in cases of excessive querying, although the data is 
 stored on the writer node, the reader node will query the data and all the 
 processing will take place on the reader as a result of which issues like 
 out of memory error etc will be avoided and uninterrupted indexing will 
 take place. 

 However, while testing this, I realized that the reader hardly uses the 
 heap memory ( Checked this in Marvel )  and when I fire a complex search 
 query - which was a search request using the python API where the 'size' 
 parameter was set to 1, the writer node throws an out of memory error, 
 indicating that the processing also takes place on the writer node only. 
 My 
 min and max heap size was set to 256m  for this test. I also ensured that 
 I 
 was firing the search query to the port on which the reader node was 
 listening (Port 9200). The writer node was running on Port 9201.  

 Was my previous understanding of the problem incorrect - i.e. having 
 one reader and one writer node, doesn't help in uninterrupted indexing of 
 documents? If this is so, what is the use of having a separate workhorse 
 or 
 reader node? 

 My eventual aim is to be able to query elasticsearch and fetch large 
 amounts of data at a time without interrupting/slowing down the indexing 
 of 
 documents. 

 Thank you. 

 Rujuta 


Re: Elasticsearch configuration for uninterrupted indexing

2014-03-25 Thread joergpra...@gmail.com
While it is possible to create an ES cluster with dedicated reader/writer
nodes, this is not the default, and in many cases dedicating nodes is not
required at all. ES has good heuristics built in that relieve the admin of
tedious jobs like setting up dedicated nodes.

So I wonder how you understand what a reader or a writer node is. Note that
if you connect a client to a node, that node does not necessarily do the
heavy work; it automatically forwards the requests to the nodes that hold
the shards.

You should use a replica level > 0 to distribute the query load. Replica
levels duplicate shards for exactly this reason - to allow better
distributed forwarding of search requests, and to provide some resilience in
case of node failures.
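As a sketch of what that would look like with the Python client (assuming a
logstash-style index name, and assuming both nodes hold data - as suggested
elsewhere in this thread - so the replica shards have somewhere to be
allocated):

    # sketch: add one replica so each shard has a copy on both data nodes
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])
    es.indices.put_settings(
        index="logstash-2014.03.21",              # illustrative index name
        body={"index": {"number_of_replicas": 1}},
    )

With one replica, a search can be served from either copy of a shard, which
is what spreads the query load across the nodes.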

256m heap is very small for the massive Elasticsearch filter queries that
Kibana uses.
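On the heap: with the 1.x startup scripts the usual way to raise it is the
ES_HEAP_SIZE environment variable set before starting each node. The value
below is only an example; the right size depends on the machine and workload:

    # example only - e.g. give the node 1 GB of heap
    export ES_HEAP_SIZE=1g
    bin/elasticsearch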

Jörg



On Fri, Mar 21, 2014 at 1:04 PM, Rujuta Deshpande rujd...@gmail.com wrote:

 Hi,

 I am setting up a system consisting of elasticsearch-logstash-kibana for
 log analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash,
 kibana and  two instances of elasticsearch. Two other machines, each
 running  logstash-forwarder are pumping logs into the ELK system.

 The reasoning behind using two ES instances was this - I needed one
 uninterrupted instance to index the incoming logs and I also needed to
 query the currently existing indices. However, I didn't want any complex
 querying to result in loss of events owing to Out of Memory Errors because
 of excessive querying.

 So, one elasticsearch node was master = true  and data = true which did
 the indexing (called the writer node) and the other node, was master =
 false and data = false (this was the workhorse or reader node) .

 I assumed that, in cases of excessive querying, although the data is
 stored on the writer node, the reader node will query the data and all the
 processing will take place on the reader as a result of which issues like
 out of memory error etc will be avoided and uninterrupted indexing will
 take place.

 However, while testing this, I realized that the reader hardly uses the
 heap memory ( Checked this in Marvel )  and when I fire a complex search
 query - which was a search request using the python API where the 'size'
 parameter was set to 1, the writer node throws an out of memory error,
 indicating that the processing also takes place on the writer node only. My
 min and max heap size was set to 256m  for this test. I also ensured that I
 was firing the search query to the port on which the reader node was
 listening (Port 9200). The writer node was running on Port 9201.

 Was my previous understanding of the problem incorrect - i.e. having one
 reader and one writer node, doesn't help in uninterrupted indexing of
 documents? If this is so, what is the use of having a separate workhorse or
 reader node?

 My eventual aim is to be able to query elasticsearch and fetch large
 amounts of data at a time without interrupting/slowing down the indexing of
 documents.

 Thank you.

 Rujuta



Re: Elasticsearch configuration for uninterrupted indexing

2014-03-24 Thread Ivan Brusic
I do not think splitting the application into two separate JVMs will solve
your issues. Is the 2 GB per JVM or the total for the machine? For analytics
applications with multiple facets, 2 GB might not be sufficient.

-- 
Ivan


On Sun, Mar 23, 2014 at 10:04 PM, Rujuta Deshpande rujd...@gmail.com wrote:

 Hi,

 Thank you for the response. However, in our scenario, both the nodes are
 on the same machine. Our setup doesn't allow us to have two separate
 machines for each node. Also, we're indexing logs using logstash.
 Sometimes, we have to query data from the logs over a period of two or
 three months and then, we're thrown an out of memory error. This affects
 the indexing that is simultaneously going on and we lose events.

 I'm not sure what configuration of elasticsearch will help achieve this.

 Thanks,
 Rujuta

 On Friday, March 21, 2014 10:36:51 PM UTC+5:30, Ivan Brusic wrote:

 One of the main usage of having a data-less node is that it would act as
 a coordinator between the other nodes. It will gather all the responses
 from the other nodes/shards and reduce them into one.

 In your case, the data-less node is gathering all the data from just one
 node. In other words, it is not doing much since the reduce phase is
 basically a pass-thru operation. With a two node cluster, I would say you
 are better off having both machines act as full nodes.

 Cheers,

 Ivan



 On Fri, Mar 21, 2014 at 5:04 AM, Rujuta Deshpande ruj...@gmail.com wrote:

 Hi,

 I am setting up a system consisting of elasticsearch-logstash-kibana for
 log analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash,
 kibana and  two instances of elasticsearch. Two other machines, each
 running  logstash-forwarder are pumping logs into the ELK system.

 The reasoning behind using two ES instances was this - I needed one
 uninterrupted instance to index the incoming logs and I also needed to
 query the currently existing indices. However, I didn't want any complex
 querying to result in loss of events owing to Out of Memory Errors because
 of excessive querying.

 So, one elasticsearch node was master = true  and data = true which did
 the indexing (called the writer node) and the other node, was master =
 false and data = false (this was the workhorse or reader node) .

 I assumed that, in cases of excessive querying, although the data is
 stored on the writer node, the reader node will query the data and all the
 processing will take place on the reader as a result of which issues like
 out of memory error etc will be avoided and uninterrupted indexing will
 take place.

 However, while testing this, I realized that the reader hardly uses the
 heap memory ( Checked this in Marvel )  and when I fire a complex search
 query - which was a search request using the python API where the 'size'
 parameter was set to 1, the writer node throws an out of memory error,
 indicating that the processing also takes place on the writer node only. My
 min and max heap size was set to 256m  for this test. I also ensured that I
 was firing the search query to the port on which the reader node was
 listening (Port 9200). The writer node was running on Port 9201.

 Was my previous understanding of the problem incorrect - i.e. having one
 reader and one writer node, doesn't help in uninterrupted indexing of
 documents? If this is so, what is the use of having a separate workhorse or
 reader node?

 My eventual aim is to be able to query elasticsearch and fetch large
 amounts of data at a time without interrupting/slowing down the indexing of
 documents.

 Thank you.

 Rujuta


Re: Elasticsearch configuration for uninterrupted indexing

2014-03-23 Thread Rujuta Deshpande
Hi, 

Thank you for the response. However, in our scenario, both nodes are on the 
same machine; our setup doesn't allow us a separate machine for each node. 
Also, we're indexing logs using logstash. Sometimes we have to query data 
from the logs over a period of two or three months, and then we get an 
out-of-memory error. This affects the indexing that is going on at the same 
time, and we lose events. 

I'm not sure what elasticsearch configuration will let us run such queries without disturbing the indexing.
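One way to pull two or three months of logs without building one huge 
response in memory is to page through the results with the scroll API. A 
rough sketch with the Python client - the index pattern, field name and 
time range are only examples, not our exact setup:

    # sketch: stream a long time range in pages instead of one big response
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["localhost:9200"])

    query = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "range": {"@timestamp": {"gte": "now-3M", "lte": "now"}}
                },
            }
        }
    }

    # helpers.scan uses scroll under the hood and yields hits in batches
    count = 0
    for hit in helpers.scan(es, query=query, index="logstash-*",
                            size=500, scroll="5m"):
        count += 1          # replace with real processing of hit["_source"]
    print(count, "events in the last three months")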

Thanks,
Rujuta

On Friday, March 21, 2014 10:36:51 PM UTC+5:30, Ivan Brusic wrote:

 One of the main usage of having a data-less node is that it would act as a 
 coordinator between the other nodes. It will gather all the responses from 
 the other nodes/shards and reduce them into one.

 In your case, the data-less node is gathering all the data from just one 
 node. In other words, it is not doing much since the reduce phase is 
 basically a pass-thru operation. With a two node cluster, I would say you 
 are better off having both machines act as full nodes.

 Cheers,

 Ivan



 On Fri, Mar 21, 2014 at 5:04 AM, Rujuta Deshpande ruj...@gmail.com wrote:

 Hi, 

 I am setting up a system consisting of elasticsearch-logstash-kibana for 
 log analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash, 
 kibana and  two instances of elasticsearch. Two other machines, each 
 running  logstash-forwarder are pumping logs into the ELK system. 

 The reasoning behind using two ES instances was this - I needed one 
 uninterrupted instance to index the incoming logs and I also needed to 
 query the currently existing indices. However, I didn't want any complex 
 querying to result in loss of events owing to Out of Memory Errors because 
 of excessive querying. 

 So, one elasticsearch node was master = true  and data = true which did 
 the indexing (called the writer node) and the other node, was master = 
 false and data = false (this was the workhorse or reader node) .

 I assumed that, in cases of excessive querying, although the data is 
 stored on the writer node, the reader node will query the data and all the 
 processing will take place on the reader as a result of which issues like 
 out of memory error etc will be avoided and uninterrupted indexing will 
 take place. 

 However, while testing this, I realized that the reader hardly uses the 
 heap memory ( Checked this in Marvel )  and when I fire a complex search 
 query - which was a search request using the python API where the 'size' 
 parameter was set to 1, the writer node throws an out of memory error, 
 indicating that the processing also takes place on the writer node only. My 
 min and max heap size was set to 256m  for this test. I also ensured that I 
 was firing the search query to the port on which the reader node was 
 listening (Port 9200). The writer node was running on Port 9201.  

 Was my previous understanding of the problem incorrect - i.e. having one 
 reader and one writer node, doesn't help in uninterrupted indexing of 
 documents? If this is so, what is the use of having a separate workhorse or 
 reader node? 

 My eventual aim is to be able to query elasticsearch and fetch large 
 amounts of data at a time without interrupting/slowing down the indexing of 
 documents. 

 Thank you. 

 Rujuta 



Elasticsearch configuration for uninterrupted indexing

2014-03-21 Thread Rujuta Deshpande
Hi, 

I am setting up an elasticsearch-logstash-kibana system for log analysis. I 
am using one machine (2 GB RAM, 2 CPUs) running logstash, kibana and two 
instances of elasticsearch. Two other machines, each running 
logstash-forwarder, are pumping logs into the ELK system. 

The reasoning behind using two ES instances was this: I needed one 
uninterrupted instance to index the incoming logs, and I also needed to 
query the existing indices. However, I didn't want complex queries to cause 
a loss of events owing to out-of-memory errors. 

So, one elasticsearch node had master = true and data = true and did the 
indexing (the writer node), and the other node had master = false and 
data = false (the workhorse or reader node).
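In elasticsearch.yml terms, the two nodes were configured roughly like this 
(a sketch - each block goes in that node's own config file, and the node 
names are only illustrative):

    # writer node (holds the data and does the indexing, listens on 9201)
    node.name: writer
    node.master: true
    node.data: true
    http.port: 9201

    # reader / workhorse node (no data, listens on 9200)
    node.name: reader
    node.master: false
    node.data: false
    http.port: 9200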

I assumed that, in cases of heavy querying, although the data is stored on 
the writer node, the reader node would handle the queries and all the 
processing would take place on the reader, so that out-of-memory errors 
would be avoided and indexing would continue uninterrupted. 

However, while testing this, I realized that the reader hardly uses any 
heap memory (I checked this in Marvel), and when I fire a complex search 
query - a search request using the python API where the 'size' parameter 
was set to 1 - the writer node throws an out-of-memory error, indicating 
that the processing takes place only on the writer node. My min and max 
heap size was set to 256m for this test. I also made sure I was sending the 
search query to the port on which the reader node was listening (port 
9200); the writer node was running on port 9201.
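For reference, the test looked roughly like the sketch below with the Python 
client, pointed at the reader node on port 9200 (the index name and the size 
value here are placeholders, not the exact ones from my test):

    # sketch: send the search to the reader node (9200), not the writer (9201)
    from elasticsearch import Elasticsearch

    reader = Elasticsearch(["localhost:9200"])    # reader / workhorse node

    resp = reader.search(
        index="logstash-*",
        body={"query": {"match_all": {}}},
        size=10000,                               # placeholder value
    )
    print(resp["hits"]["total"])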

Was my understanding of the problem incorrect - i.e. does having one reader 
and one writer node not help achieve uninterrupted indexing of documents? 
If so, what is the use of having a separate workhorse or reader node? 

My eventual aim is to be able to query elasticsearch and fetch large 
amounts of data at a time without interrupting/slowing down the indexing of 
documents. 

Thank you. 

Rujuta 



Re: Elasticsearch configuration for uninterrupted indexing

2014-03-21 Thread Ivan Brusic
One of the main uses of a data-less node is to act as a coordinator between
the other nodes. It gathers all the responses from the other nodes/shards
and reduces them into one.

In your case, the data-less node is gathering all the data from just one
node. In other words, it is not doing much, since the reduce phase is
basically a pass-through operation. With a two-node cluster, I would say you
are better off having both machines act as full nodes.
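Concretely, 'full nodes' just means leaving both nodes on the defaults; a
sketch (on 1.x these are already the default values):

    # elasticsearch.yml on both nodes - both are master-eligible and hold data
    node.master: true
    node.data: true
    # combined with one replica per index, each shard then has a copy on
    # each node, so either node can serve a search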

Cheers,

Ivan



On Fri, Mar 21, 2014 at 5:04 AM, Rujuta Deshpande rujd...@gmail.com wrote:

 Hi,

 I am setting up a system consisting of elasticsearch-logstash-kibana for
 log analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash,
 kibana and  two instances of elasticsearch. Two other machines, each
 running  logstash-forwarder are pumping logs into the ELK system.

 The reasoning behind using two ES instances was this - I needed one
 uninterrupted instance to index the incoming logs and I also needed to
 query the currently existing indices. However, I didn't want any complex
 querying to result in loss of events owing to Out of Memory Errors because
 of excessive querying.

 So, one elasticsearch node was master = true  and data = true which did
 the indexing (called the writer node) and the other node, was master =
 false and data = false (this was the workhorse or reader node) .

 I assumed that, in cases of excessive querying, although the data is
 stored on the writer node, the reader node will query the data and all the
 processing will take place on the reader as a result of which issues like
 out of memory error etc will be avoided and uninterrupted indexing will
 take place.

 However, while testing this, I realized that the reader hardly uses the
 heap memory ( Checked this in Marvel )  and when I fire a complex search
 query - which was a search request using the python API where the 'size'
 parameter was set to 1, the writer node throws an out of memory error,
 indicating that the processing also takes place on the writer node only. My
 min and max heap size was set to 256m  for this test. I also ensured that I
 was firing the search query to the port on which the reader node was
 listening (Port 9200). The writer node was running on Port 9201.

 Was my previous understanding of the problem incorrect - i.e. having one
 reader and one writer node, doesn't help in uninterrupted indexing of
 documents? If this is so, what is the use of having a separate workhorse or
 reader node?

 My eventual aim is to be able to query elasticsearch and fetch large
 amounts of data at a time without interrupting/slowing down the indexing of
 documents.

 Thank you.

 Rujuta
