Re: Bulk Indexing Problems

joergpra...@gmail.com Tue, 09 Sep 2014 09:54:02 -0700

Set ES_HEAP_SIZE to at least 1 GB. With smaller heaps such as 512m and
around 1 million docs to index, you need extra fine-tuning, which is
complicated. Your machine can comfortably run a 4 GB heap, which is 50% of
its 8 GB RAM.
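
As a rough illustration of the sizing rule above (half of RAM, never below 1 GB), here is a minimal Python sketch. The function name `suggested_heap_mb` and its `floor_mb` parameter are illustrative assumptions, not part of Elasticsearch; the sketch only encodes the rule of thumb from this message.

```python
def suggested_heap_mb(ram_mb, floor_mb=1024):
    """Return a heap size in MB: half of RAM, but not below floor_mb.

    Hypothetical helper for illustration only. It encodes the rule of
    thumb from this message (50% of RAM, at least 1 GB).
    """
    return max(floor_mb, ram_mb // 2)


# For the 8 GB VM discussed in this thread:
heap_mb = suggested_heap_mb(8 * 1024)
print("ES_HEAP_SIZE=%dm" % heap_mb)  # prints ES_HEAP_SIZE=4096m
```

The printed line is the value one would put in /etc/default/elasticsearch in place of the current 512m setting.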


Jörg

On Tue, Sep 9, 2014 at 5:39 PM, Joshua P <jpetersen...@gmail.com> wrote:

> Here is /etc/default/elasticsearch
>
> # Run Elasticsearch as this user ID and group ID
> #ES_USER=elasticsearch
> #ES_GROUP=elasticsearch
>
> # Heap Size (defaults to 256m min, 1g max)
> ES_HEAP_SIZE=512m
>
> # Heap new generation
> #ES_HEAP_NEWSIZE=
>
> # max direct memory
> #ES_DIRECT_SIZE=
>
> # Maximum number of open files, defaults to 65535.
> MAX_OPEN_FILES=65535
>
> # Maximum locked memory size. Set to "unlimited" if you use the
> # bootstrap.mlockall option in elasticsearch.yml. You must also set
> # ES_HEAP_SIZE.
> MAX_LOCKED_MEMORY=unlimited
>
> # Maximum number of VMA (Virtual Memory Areas) a process can own
> #MAX_MAP_COUNT=262144
>
> # Elasticsearch log directory
> #LOG_DIR=/var/log/elasticsearch
>
> # Elasticsearch data directory
> #DATA_DIR=/var/lib/elasticsearch
>
> # Elasticsearch work directory
> #WORK_DIR=/tmp/elasticsearch
>
> # Elasticsearch configuration directory
> #CONF_DIR=/etc/elasticsearch
>
> # Elasticsearch configuration file (elasticsearch.yml)
> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>
> # Additional Java OPTS
> #ES_JAVA_OPTS=
>
> # Configure restart on package upgrade (true, every other setting will
> # lead to not restarting)
> #RESTART_ON_UPGRADE=true
>
> I also see the same setting in /etc/init.d/elasticsearch. Do you know
> which file takes priority? And what a good size would be?
>
> On Tuesday, September 9, 2014 11:32:19 AM UTC-4, vineeth mohan wrote:
>>
>> Hello Joshua,
>>
>> I am not sure which variable you are referring to in the memory settings
>> in the config file; please paste the comment and config.
>> I usually change the config in the init.d script.
>>
>> The best approach would be to bulk index, say, 10,000 feeds in sync mode,
>> wait until everything is indexed, and then proceed to the next batch.
>> I am not sure about the Java API, but long ago I used to curl the
>> stats API to see how many requests were rejected.
>>
>> Thanks
>>           Vineeth
>>
>> On Tue, Sep 9, 2014 at 8:58 PM, Joshua P <jpeter...@gmail.com> wrote:
>>
>>> You also said you wouldn't recommend indexing that much information at
>>> once. How would you suggest breaking it up and what status should I look
>>> for before doing another batch? I have to come up with some process that is
>>> repeatable and mostly automated.
>>>
>>> On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>>>>
>>>> Thanks for the reply, Vineeth!
>>>>
>>>> What's a practical heap size? I've seen some people saying they set it
>>>> to 30gb but this confuses me because in the /etc/default/elasticsearch
>>>> file, the comment suggests the max is only 1gb?
>>>>
>>>> I'll look into the threadpool issue. Is there a Java API for monitoring
>>>> Cluster Node health? Can you point me at an example or give me a link to
>>>> that?
>>>>
>>>> Thanks!
>>>>
>>>> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>>>>
>>>>> Hello Joshua,
>>>>>
>>>>> I have a feeling this has something to do with the threadpool.
>>>>> There is a limit on the number of feeds that can be queued for indexing.
>>>>>
>>>>> Try increasing the queue size of the index and bulk threadpools to a
>>>>> large number.
>>>>> Also, through the cluster nodes stats API for the threadpool, you can
>>>>> see whether any request has failed.
>>>>> Monitor this API for any requests that fail due to large volume.
>>>>>
>>>>> Threadpool - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>>>> Threadpool stats - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>>>>
>>>>> Having said that, I wouldn't recommend bulk indexing that much
>>>>> information at a time, and 512 MB is not going to help much.
>>>>>
>>>>> Thanks
>>>>>           Vineeth
>>>>>
>>>>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>>>
>>>>>> Hi there!
>>>>>>
>>>>>> I'm trying to do a one-time index of about 800,000 records into an
>>>>>> instance of Elasticsearch, but I'm having a bit of trouble. It
>>>>>> continually fails around 200,000 records. Looking at it in the
>>>>>> Elasticsearch Head Plugin, my index goes offline and becomes
>>>>>> unrecoverable.
>>>>>>
>>>>>> For now, I have it running on a VM on my personal machine.
>>>>>>
>>>>>> VM Config:
>>>>>> Ubuntu Server 14.04 64-Bit
>>>>>> 8 GB RAM
>>>>>> 2 Processors
>>>>>> 32 GB SSD
>>>>>>
>>>>>> Java
>>>>>> java version "1.7.0_65"
>>>>>> OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>>>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>>>>
>>>>>> Elasticsearch is using mostly the defaults. This is the output of:
>>>>>> curl http://localhost:9200/_nodes/process?pretty
>>>>>> {
>>>>>>   "cluster_name" : "property_transaction_data",
>>>>>>   "nodes" : {
>>>>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>>>>       "name" : "Marvin Flumm",
>>>>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>>>>       "host" : "ubuntu-es",
>>>>>>       "ip" : "127.0.1.1",
>>>>>>       "version" : "1.3.2",
>>>>>>       "build" : "dee175d",
>>>>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>>>>       "process" : {
>>>>>>         "refresh_interval_in_millis" : 1000,
>>>>>>         "id" : 1092,
>>>>>>         "max_file_descriptors" : 65535,
>>>>>>         "mlockall" : true
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> I adjusted ES_HEAP_SIZE to 512mb.
>>>>>>
>>>>>> I'm using the following code to pull data from SQL Server and index
>>>>>> it.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>
>
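
The batching advice quoted above (index about 10,000 documents synchronously, then check the thread pool stats for rejections before sending the next batch) could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the original poster's code: `send_batch` and `fetch_stats` are hypothetical callables standing in for a real bulk request and a real nodes stats request, and the stats layout (`nodes -> <node id> -> thread_pool -> bulk -> rejected`) is assumed from the nodes stats API linked in the thread.

```python
def batches(records, size):
    """Yield consecutive lists of at most `size` records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def total_rejected(stats, pool="bulk"):
    """Sum the `rejected` counter of one thread pool across all nodes."""
    return sum(
        node.get("thread_pool", {}).get(pool, {}).get("rejected", 0)
        for node in stats.get("nodes", {}).values()
    )


def index_in_batches(records, send_batch, fetch_stats, size=10000):
    """Send batches one at a time; stop when the rejected count grows.

    send_batch(batch) is assumed to block until the batch is indexed.
    fetch_stats() is assumed to return the parsed nodes stats response.
    Returns the number of records sent.
    """
    sent = 0
    baseline = total_rejected(fetch_stats())
    for batch in batches(records, size):
        send_batch(batch)
        sent += len(batch)
        if total_rejected(fetch_stats()) > baseline:
            break  # requests were rejected; back off before continuing
    return sent


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a cluster.
    fake_stats = {"nodes": {"node-1": {"thread_pool": {"bulk": {"rejected": 0}}}}}
    sent = index_in_batches(range(25000), lambda b: None, lambda: fake_stats)
    print(sent)  # prints 25000
```

Passing the two operations in as callables keeps the loop independent of any particular client library; in a real run they would wrap whatever bulk and stats calls the indexing program already uses.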

