Set ES_HEAP_SIZE to at least 1 GB. With smaller heaps such as 512m, indexing around 1 million docs needs additional fine-tuning, which is complicated. Your machine can handle a 4 GB heap, which is 50% of its 8 GB RAM.
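As a rough illustration of the 50% rule above, here is a minimal Python sketch. The helper name `heap_setting` and its parameters are made up for this example and are not part of Elasticsearch; it only formats the line that would go into /etc/default/elasticsearch.

```python
def heap_setting(total_ram_gb, fraction=0.5, minimum_gb=1):
    """Return an ES_HEAP_SIZE line for /etc/default/elasticsearch.

    Takes `fraction` of the machine's RAM (50% by default) and never
    goes below `minimum_gb` (1 GB, the floor suggested in this reply).
    """
    heap_gb = max(minimum_gb, int(total_ram_gb * fraction))
    return "ES_HEAP_SIZE={}g".format(heap_gb)


# The 8 GB VM described in this thread
print(heap_setting(8))  # prints "ES_HEAP_SIZE=4g"
```
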
Jörg

On Tue, Sep 9, 2014 at 5:39 PM, Joshua P <jpetersen...@gmail.com> wrote:

> Here is /etc/default/elasticsearch
>
> # Run Elasticsearch as this user ID and group ID
> #ES_USER=elasticsearch
> #ES_GROUP=elasticsearch
>
> # Heap Size (defaults to 256m min, 1g max)
> ES_HEAP_SIZE=512m
>
> # Heap new generation
> #ES_HEAP_NEWSIZE=
>
> # max direct memory
> #ES_DIRECT_SIZE=
>
> # Maximum number of open files, defaults to 65535.
> MAX_OPEN_FILES=65535
>
> # Maximum locked memory size. Set to "unlimited" if you use the
> # bootstrap.mlockall option in elasticsearch.yml. You must also set
> # ES_HEAP_SIZE.
> MAX_LOCKED_MEMORY=unlimited
>
> # Maximum number of VMA (Virtual Memory Areas) a process can own
> #MAX_MAP_COUNT=262144
>
> # Elasticsearch log directory
> #LOG_DIR=/var/log/elasticsearch
>
> # Elasticsearch data directory
> #DATA_DIR=/var/lib/elasticsearch
>
> # Elasticsearch work directory
> #WORK_DIR=/tmp/elasticsearch
>
> # Elasticsearch configuration directory
> #CONF_DIR=/etc/elasticsearch
>
> # Elasticsearch configuration file (elasticsearch.yml)
> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>
> # Additional Java OPTS
> #ES_JAVA_OPTS=
>
> # Configure restart on package upgrade (true, every other setting will
> # lead to not restarting)
> #RESTART_ON_UPGRADE=true
>
> I also see the same setting in /etc/init.d/elasticsearch. Do you know
> which file takes priority? And what would a good size be?
>
> On Tuesday, September 9, 2014 11:32:19 AM UTC-4, vineeth mohan wrote:
>>
>> Hello Joshua,
>>
>> I am not sure which variable you are referring to in the memory settings
>> in the config file; please paste the comment and config.
>> I usually change the config in the init.d script.
>>
>> The best approach would be to bulk index, say, 10,000 feeds in sync mode,
>> wait until everything is indexed, and then proceed to the next batch.
>> I am not sure about the Java API, but long ago I used to curl the
>> stats API and see how many requests were rejected.
>>
>> Thanks
>> Vineeth
>>
>> On Tue, Sep 9, 2014 at 8:58 PM, Joshua P <jpeter...@gmail.com> wrote:
>>
>>> You also said you wouldn't recommend indexing that much information at
>>> once. How would you suggest breaking it up and what status should I look
>>> for before doing another batch? I have to come up with some process that is
>>> repeatable and mostly automated.
>>>
>>> On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>>>>
>>>> Thanks for the reply, Vineeth!
>>>>
>>>> What's a practical heap size? I've seen some people saying they set it
>>>> to 30gb but this confuses me because in the /etc/default/elasticsearch
>>>> file, the comment suggests the max is only 1gb?
>>>>
>>>> I'll look into the threadpool issue. Is there a Java API for monitoring
>>>> Cluster Node health? Can you point me at an example or give me a link to
>>>> that?
>>>>
>>>> Thanks!
>>>>
>>>> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>>>>
>>>>> Hello Joshua,
>>>>>
>>>>> I have a feeling this has something to do with the threadpool.
>>>>> There is a limit on the number of feeds that can be queued for indexing.
>>>>>
>>>>> Try increasing the size of the threadpool queue for index and bulk to a
>>>>> large number.
>>>>> Also, through the cluster node API on threadpool, you can see if any
>>>>> request has failed.
>>>>> Monitor this API for any failed request due to large volume.
>>>>>
>>>>> Threadpool - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>>>> Threadpool stats - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>>>>
>>>>> Having said that, I won't recommend bulk indexing that much
>>>>> information at a time, and 512 MB is not going to help much.
>>>>>
>>>>> Thanks
>>>>> Vineeth
>>>>>
>>>>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>>>
>>>>>> Hi there!
>>>>>>
>>>>>> I'm trying to do a one-time index of about 800,000 records into an
>>>>>> instance of elasticsearch. But I'm having a bit of trouble. It
>>>>>> continually fails around 200,000 records. Looking at it in the
>>>>>> Elasticsearch Head plugin, my index goes offline and becomes
>>>>>> unrecoverable.
>>>>>>
>>>>>> For now, I have it running on a VM on my personal machine.
>>>>>>
>>>>>> VM Config:
>>>>>> Ubuntu Server 14.04 64-Bit
>>>>>> 8 GB RAM
>>>>>> 2 Processors
>>>>>> 32 GB SSD
>>>>>>
>>>>>> Java
>>>>>> java version "1.7.0_65"
>>>>>> OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>>>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>>>>
>>>>>> Elasticsearch is using mostly the defaults. This is the output of:
>>>>>> curl http://localhost:9200/_nodes/process?pretty
>>>>>> {
>>>>>>   "cluster_name" : "property_transaction_data",
>>>>>>   "nodes" : {
>>>>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>>>>       "name" : "Marvin Flumm",
>>>>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>>>>       "host" : "ubuntu-es",
>>>>>>       "ip" : "127.0.1.1",
>>>>>>       "version" : "1.3.2",
>>>>>>       "build" : "dee175d",
>>>>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>>>>       "process" : {
>>>>>>         "refresh_interval_in_millis" : 1000,
>>>>>>         "id" : 1092,
>>>>>>         "max_file_descriptors" : 65535,
>>>>>>         "mlockall" : true
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> I adjusted ES_HEAP_SIZE to 512mb.
>>>>>>
>>>>>> I'm using the following code to pull data from SQL Server and index
>>>>>> it.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com
>>>>>> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoE38FnrB-4k59PdF86cQVX-FGv-%2BH9eT%2B4L2eyT8NXu1w%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
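
The batching advice quoted in this thread (bulk index about 10,000 documents synchronously, then check the thread pool stats for rejected requests before sending the next batch) could be sketched roughly as follows in Python. This is only an illustration under assumptions: `send_batch` and `fetch_stats` are hypothetical callables you would supply yourself, and the stats layout (`nodes` -> node id -> `thread_pool` -> `bulk` -> `rejected`) is what the nodes stats API is assumed to return on your version, so verify it against your own cluster's output.

```python
from itertools import islice


def batches(records, size=10000):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def rejected_count(nodes_stats, pool="bulk"):
    """Sum the `rejected` counter of one thread pool across all nodes.

    `nodes_stats` is the parsed JSON of a nodes stats response
    (assumed layout, see the note above).
    """
    total = 0
    for node in nodes_stats.get("nodes", {}).values():
        total += node.get("thread_pool", {}).get(pool, {}).get("rejected", 0)
    return total


def index_in_batches(records, send_batch, fetch_stats, size=10000):
    """Send one batch at a time and stop once rejections increase.

    send_batch(chunk) -- hypothetical; must block until the bulk
                         request for `chunk` has returned
    fetch_stats()     -- hypothetical; returns parsed nodes stats JSON
    Returns the number of records handed to send_batch.
    """
    sent = 0
    baseline = rejected_count(fetch_stats())
    for chunk in batches(records, size):
        send_batch(chunk)
        sent += len(chunk)
        if rejected_count(fetch_stats()) > baseline:
            break  # the node is rejecting work; back off before retrying
    return sent
```

Stopping on the first increase in rejections is a deliberately cautious choice for a one-time load; a repeatable process would more likely pause and retry the rejected batch.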