I do not think splitting the application into two separate JVMs will solve your issues. Is the 2 GB per JVM, or the total for the machine? For analytic applications with multiple facets, 2 GB might not be sufficient.
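You can check what each JVM actually gets via the nodes stats API. A quick sketch with the Python client (assuming the reader port from your setup; adjust host/port as needed):

    from elasticsearch import Elasticsearch

    # ask every node in the cluster for its JVM stats
    es = Elasticsearch(["localhost:9200"])
    stats = es.nodes.stats(metric="jvm")

    for node_id, node in stats["nodes"].items():
        heap_mb = node["jvm"]["mem"]["heap_max_in_bytes"] // (1024 * 1024)
        print("%s: %d MB max heap" % (node["name"], heap_mb))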
--
Ivan

On Sun, Mar 23, 2014 at 10:04 PM, Rujuta Deshpande <rujd...@gmail.com> wrote:

> Hi,
>
> Thank you for the response. However, in our scenario, both nodes are on
> the same machine. Our setup doesn't allow us to have a separate machine
> for each node. Also, we're indexing logs using logstash. Sometimes we
> have to query data from the logs over a period of two or three months,
> and then we're thrown an out-of-memory error. This affects the indexing
> that is going on at the same time, and we lose events.
>
> I'm not sure what configuration of elasticsearch will help achieve this.
>
> Thanks,
> Rujuta
>
> On Friday, March 21, 2014 10:36:51 PM UTC+5:30, Ivan Brusic wrote:
>
>> One of the main uses of a data-less node is to act as a coordinator
>> between the other nodes: it gathers the responses from the other
>> nodes/shards and reduces them into one.
>>
>> In your case, the data-less node is gathering all the data from just
>> one node. In other words, it is not doing much, since the reduce phase
>> is basically a pass-through operation. With a two-node cluster, I would
>> say you are better off having both machines act as full nodes.
>>
>> Cheers,
>>
>> Ivan
>>
>> On Fri, Mar 21, 2014 at 5:04 AM, Rujuta Deshpande <ruj...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am setting up an elasticsearch-logstash-kibana system for log
>>> analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash,
>>> kibana, and two instances of elasticsearch. Two other machines, each
>>> running logstash-forwarder, are pumping logs into the ELK system.
>>>
>>> The reasoning behind using two ES instances was this: I needed one
>>> uninterrupted instance to index the incoming logs, and I also needed
>>> to query the existing indices. However, I didn't want complex queries
>>> to cause loss of events through out-of-memory errors.
>>>
>>> So one elasticsearch node had master = true and data = true and did
>>> the indexing (the writer node), and the other node had master = false
>>> and data = false (the workhorse or reader node).
>>>
>>> I assumed that, under heavy querying, although the data is stored on
>>> the writer node, the reader node would execute the queries and all
>>> the processing would take place on the reader, so problems such as
>>> out-of-memory errors would be avoided and indexing would continue
>>> uninterrupted.
>>>
>>> However, while testing this, I realized that the reader hardly uses
>>> the heap memory (checked in Marvel), and when I fire a complex search
>>> query (a search request through the Python API with the 'size'
>>> parameter set to 10000), the writer node throws an out-of-memory
>>> error, indicating that the processing takes place on the writer node
>>> only. My min and max heap sizes were set to 256m for this test. I
>>> also made sure I sent the search query to the port the reader node
>>> was listening on (port 9200); the writer node was running on port
>>> 9201.
>>>
>>> Was my understanding of the problem incorrect, i.e. does having one
>>> reader and one writer node not help keep indexing uninterrupted? If
>>> so, what is the use of having a separate workhorse or reader node?
>>>
>>> My eventual aim is to be able to query elasticsearch and fetch large
>>> amounts of data at a time without interrupting or slowing down the
>>> indexing of documents.
>>>
>>> Thank you.
>>>
>>> Rujuta
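For reference, the writer/reader split described above maps onto the node settings in each instance's elasticsearch.yml. A sketch of the two configs (ports taken from your description; the file paths are just illustrative):

    # writer node (config/writer/elasticsearch.yml): holds the data and indexes
    node.master: true
    node.data: true
    http.port: 9201

    # reader / workhorse node (config/reader/elasticsearch.yml): coordinates only
    node.master: false
    node.data: false
    http.port: 9200

Keep in mind that even with this split, the query and fetch phases still run on whichever node holds the shards, so the data node carries most of the load, which matches what you observed.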
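On the eventual aim of fetching large amounts of data at a time: a single search with size set to 10000 makes every shard build that many hits in memory before the response is assembled, which is what blows a 256m heap. The usual way to pull a large result set is the scan/scroll API, which streams it in small batches. A rough sketch with the Python client's scan helper (the index pattern and query are placeholders):

    from elasticsearch import Elasticsearch, helpers

    # point the client at the reader node
    es = Elasticsearch(["localhost:9200"])

    # stream matching documents in batches instead of one huge response
    for hit in helpers.scan(es,
                            index="logstash-*",
                            query={"query": {"match_all": {}}},
                            size=500,      # hits per shard per round trip
                            scroll="5m"):  # keep the scroll context alive
        print(hit["_source"])

Each round trip only holds one small batch on the heap, so indexing on the same JVM is much less likely to be starved.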