Re: Bulk Indexing Problems

Joshua P Tue, 09 Sep 2014 10:50:44 -0700

Just reran the indexer and found this error coming up. I'm running out of 
disk space on the partition ES wants to write to.


F38KqHhnRDWtiJCss5Wz0g -- INTERNAL_SERVER_ERROR -- 
TranslogException[[index_type][0] Failed to write operation 
[org.elasticsearch.index.translog.Translog$Create@6f1f6b1e]]; nested: 
IOException[No space left on device];  -- index_type

Where would I change the write location? Which config file? 
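The `IOException[No space left on device]` comes from the translog write, which means the partition holding the Elasticsearch data directory is full. In the `/etc/default/elasticsearch` file quoted further down this thread, that location is the commented-out `DATA_DIR` line (default `/var/lib/elasticsearch`). It can also be set with `path.data` in `elasticsearch.yml`. The indexer also needs a repeatable check before each batch, so below is a minimal sketch of a free-space test it could run first. It assumes bash and a POSIX-style `df`; the function names and the 1024 MB threshold are hypothetical.

```shell
# Hypothetical pre-flight check: print the free space, in MB, on the
# filesystem that holds the given directory.
free_mb_on() {
  # With -P (POSIX format) and -k (1024-byte blocks), line 2 of the df
  # output is the filesystem row and column 4 is "Available".
  df -P -k "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed only if at least MIN_MB megabytes are free under DIR.
# The 1024 MB default is an arbitrary, illustrative threshold.
has_room_for_batch() {
  local dir="$1" min_mb="${2:-1024}" free
  free=$(free_mb_on "$dir") || return 1
  [ -n "$free" ] && [ "$free" -ge "$min_mb" ]
}
```

For example, `has_room_for_batch /var/lib/elasticsearch 2048 || exit 1` would stop the indexer before sending a batch when less than 2 GB is free. Segment merges temporarily need extra disk space on top of the translog, so a generous threshold is safer than a tight one.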

On Tuesday, September 9, 2014 1:28:21 PM UTC-4, Joshua P wrote:
>
> Hi Jörg, 
>
> Can you elaborate on what you mean when you say I still need more fine tuning?
>
> I've upped the heap size to 4g (in both places I mentioned before, because 
> it's not clear to me which one ES actually uses). I haven't tried to index 
> again yet. 
> Other than throttling my indexing, what are some other things I need to be 
> thinking about? 
>
> On Tuesday, September 9, 2014 12:53:35 PM UTC-4, Jörg Prante wrote:
>>
>> Set ES_HEAP_SIZE to at least 1 GB. With a smaller heap like 512m and 
>> around 1 million docs to index, you need more fine tuning, which is 
>> complicated. Your machine can take a 4 GB heap, which is 50% of its 8 GB 
>> RAM.
>>
>> Jörg
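Jörg's rule of thumb (heap = 50% of RAM) can be computed instead of guessed. Below is a minimal sketch that assumes Linux (`/proc/meminfo`) and bash. The function names are hypothetical. The 30 GB cap is an assumption based on the commonly cited advice to keep the JVM heap below the roughly 32 GB compressed-object-pointer threshold, which is why people mention 30gb heaps. The `1g max` in the `/etc/default/elasticsearch` comment is the default maximum heap, not an upper limit.

```shell
# Hypothetical helper: suggest a heap size in MB, given total RAM in kB.
# Rule of thumb from the thread: give the heap about half of the RAM.
# Assumption: cap it at 30 GB (30720 MB) to stay under the JVM's
# compressed-oops threshold of roughly 32 GB.
suggest_heap_mb() {
  local mem_kb="$1"
  local half_mb=$(( mem_kb / 1024 / 2 ))
  local cap_mb=30720
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Read MemTotal (reported in kB) from /proc/meminfo on Linux.
total_mem_kb() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}
```

For example, `echo "ES_HEAP_SIZE=$(suggest_heap_mb "$(total_mem_kb)")m"` prints a line that could go into `/etc/default/elasticsearch`. With exactly 8 GB (8388608 kB) of RAM it suggests 4096 MB, which matches Jörg's 4 GB.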
>>
>> On Tue, Sep 9, 2014 at 5:39 PM, Joshua P <jpeter...@gmail.com> wrote:
>>
>>> Here is /etc/default/elasticsearch
>>>
>>> # Run Elasticsearch as this user ID and group ID
>>> #ES_USER=elasticsearch
>>> #ES_GROUP=elasticsearch
>>>
>>> # Heap Size (defaults to 256m min, 1g max)
>>> ES_HEAP_SIZE=512m
>>>
>>> # Heap new generation
>>> #ES_HEAP_NEWSIZE=
>>>
>>> # max direct memory
>>> #ES_DIRECT_SIZE=
>>>
>>> # Maximum number of open files, defaults to 65535.
>>> MAX_OPEN_FILES=65535
>>>
>>> # Maximum locked memory size. Set to "unlimited" if you use the
>>> # bootstrap.mlockall option in elasticsearch.yml. You must also set
>>> # ES_HEAP_SIZE.
>>> MAX_LOCKED_MEMORY=unlimited
>>>
>>> # Maximum number of VMA (Virtual Memory Areas) a process can own
>>> #MAX_MAP_COUNT=262144
>>>
>>> # Elasticsearch log directory
>>> #LOG_DIR=/var/log/elasticsearch
>>>
>>> # Elasticsearch data directory
>>> #DATA_DIR=/var/lib/elasticsearch
>>>
>>> # Elasticsearch work directory
>>> #WORK_DIR=/tmp/elasticsearch
>>>
>>> # Elasticsearch configuration directory
>>> #CONF_DIR=/etc/elasticsearch
>>>
>>> # Elasticsearch configuration file (elasticsearch.yml)
>>> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>>>
>>> # Additional Java OPTS
>>> #ES_JAVA_OPTS=
>>>
>>> # Configure restart on package upgrade (true, every other setting will lead to not restarting)
>>> #RESTART_ON_UPGRADE=true
>>>
>>> I also see the same setting in /etc/init.d/elasticsearch. Do you know 
>>> which file takes priority? And what would a good size be?
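On priority: assume the init script sets its own defaults and then sources `/etc/default/elasticsearch`, as the Debian init script shipped with the 1.x packages does. You can check for a line in your `/etc/init.d/elasticsearch` that sources `$DEFAULT`. Under that assumption, the value in `/etc/default/elasticsearch` wins because it is assigned last. The self-contained sketch below demonstrates that shell rule with two temporary files. Their contents and the function name are illustrative stand-ins, not the real scripts.

```shell
# Demonstrate "last assignment wins" when a script sources a defaults file.
# Both files are throwaway stand-ins for /etc/init.d/elasticsearch and
# /etc/default/elasticsearch.
effective_heap_size() {
  local tmp
  tmp=$(mktemp -d) || return 1

  # Stand-in for the defaults file (the value the user edited).
  printf 'ES_HEAP_SIZE=4g\n' > "$tmp/default"

  # Stand-in for the init script: set a value, then source the defaults file.
  cat > "$tmp/init" <<EOF
ES_HEAP_SIZE=512m
DEFAULT="$tmp/default"
if [ -f "\$DEFAULT" ]; then
  . "\$DEFAULT"
fi
echo "\$ES_HEAP_SIZE"
EOF

  sh "$tmp/init"
  rm -rf "$tmp"
}
```

Running `effective_heap_size` prints the value from the sourced defaults file, not the one the init script set first.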
>>>
>>> On Tuesday, September 9, 2014 11:32:19 AM UTC-4, vineeth mohan wrote:
>>>>
>>>> Hello Joshua,
>>>>
>>>> I am not sure which variable in the config file you are referring to 
>>>> for the memory settings; please paste the comment and the config.
>>>> I usually change the config from the init.d script.
>>>>
>>>> The best approach would be to bulk index, say, 10,000 feeds in sync mode, 
>>>> wait until everything is indexed, and then proceed to the next batch.
>>>> I am not sure about the Java API, but a while back I used to curl the 
>>>> stats API to see how many requests were rejected.
>>>>
>>>> Thanks
>>>>           Vineeth
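Vineeth's batch-and-wait approach can be sketched with curl against the `_bulk` REST endpoint instead of the Java API. This minimal sketch makes three assumptions. First, the documents are already in a bulk-format file where every document takes exactly two lines (an action line and a source line). Second, the node answers on `localhost:9200`. Third, the response carries the top-level `errors` flag. The function names and the file name are hypothetical. curl waits for each response, so each batch finishes before the next one is sent. The loop stops at the first batch that fails or reports `"errors": true`.

```shell
# Lines per batch, assuming every document is exactly 2 lines
# (an action line followed by a source line).
lines_per_batch() {
  echo $(( $1 * 2 ))
}

# Succeed only if a _bulk response body does not report "errors": true.
bulk_response_ok() {
  ! printf '%s' "$1" | grep -Eq '"errors"[[:space:]]*:[[:space:]]*true'
}

# Split a bulk-format file into batches of DOCS documents and send them one
# at a time, stopping at the first batch that fails or reports errors.
# Usage (hypothetical file name): index_in_batches docs.bulk 10000 http://localhost:9200
index_in_batches() {
  local bulk_file="$1" docs="${2:-10000}" es_url="${3:-http://localhost:9200}"
  local workdir chunk response
  workdir=$(mktemp -d) || return 1
  split -l "$(lines_per_batch "$docs")" "$bulk_file" "$workdir/chunk_" || {
    rm -rf "$workdir"; return 1; }

  for chunk in "$workdir"/chunk_*; do
    [ -e "$chunk" ] || continue   # no chunks were produced (empty input)
    # curl blocks until Elasticsearch has answered, so batches never overlap.
    if ! response=$(curl -sSf -XPOST "$es_url/_bulk" \
        -H 'Content-Type: application/x-ndjson' --data-binary "@$chunk"); then
      echo "request failed for $chunk" >&2
      rm -rf "$workdir"; return 1
    fi
    if ! bulk_response_ok "$response"; then
      echo "item errors in $chunk; stopping" >&2
      rm -rf "$workdir"; return 1
    fi
    echo "sent $chunk"
  done
  rm -rf "$workdir"
}
```

Stopping on the first error keeps the process repeatable. You can fix the cause, such as disk space or queue size, and rerun from the failed chunk instead of re-sending everything blindly.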
>>>>
>>>> On Tue, Sep 9, 2014 at 8:58 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>>
>>>>> You also said you wouldn't recommend indexing that much information at 
>>>>> once. How would you suggest breaking it up and what status should I look 
>>>>> for before doing another batch? I have to come up with some process that 
>>>>> is repeatable and mostly automated.
>>>>>
>>>>> On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>>>>>>
>>>>>> Thanks for the reply, Vineeth! 
>>>>>>
>>>>>> What's a practical heap size? I've seen some people say they set it to 
>>>>>> 30gb, but this confuses me because the comment in the 
>>>>>> /etc/default/elasticsearch file suggests the max is only 1gb.
>>>>>>
>>>>>> I'll look into the threadpool issue. Is there a Java API for 
>>>>>> monitoring cluster node health? Can you point me at an example or give 
>>>>>> me a link to one?
>>>>>>
>>>>>> Thanks! 
>>>>>>
>>>>>> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>>>>>>
>>>>>>> Hello Joshua,
>>>>>>>
>>>>>>> I have a feeling this has something to do with the threadpool.
>>>>>>> There is a limit on the number of feeds that can be queued for indexing.
>>>>>>>
>>>>>>> Try increasing the queue size of the index and bulk threadpools to a 
>>>>>>> large number.
>>>>>>> Also, through the nodes stats API for the threadpool, you can see whether 
>>>>>>> any request has been rejected.
>>>>>>> Monitor this API for requests rejected due to large volume.
>>>>>>>
>>>>>>> Threadpool - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>>>>>> Threadpool stats - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>>>>>>
>>>>>>> Having said that, I wouldn't recommend bulk indexing that much 
>>>>>>> information at a time, and a 512 MB heap is not going to help much.
>>>>>>>
>>>>>>> Thanks
>>>>>>>           Vineeth
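For the monitoring Vineeth describes, one scriptable option is the `_cat/thread_pool` API, which prints one row per node. The sketch assumes the 1.x output layout, where `h=host,bulk.rejected` selects the host and bulk-rejection columns. Column names differ in later versions. It also assumes the `localhost:9200` address, and the function names are hypothetical. The rejection counters are cumulative since each node started, so compare totals taken before and after a batch rather than reading a single value. The queue sizes themselves are the `threadpool.bulk.queue_size` and `threadpool.index.queue_size` settings in `elasticsearch.yml`, described on the threadpool page linked above.

```shell
# Sum the second column of "host rejected" rows read from stdin.
sum_rejected() {
  awk '{ total += $2 } END { print total + 0 }'
}

# Total bulk rejections across all nodes (1.x _cat/thread_pool layout assumed).
bulk_rejected_total() {
  local es_url="${1:-http://localhost:9200}"
  curl -sSf "$es_url/_cat/thread_pool?h=host,bulk.rejected" | sum_rejected
}
```

A batch loop could record `before=$(bulk_rejected_total)`, send a batch, and then back off or resend that batch if `$(bulk_rejected_total)` has grown past `$before`.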
>>>>>>>
>>>>>>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi there! 
>>>>>>>>
>>>>>>>> I'm trying to do a one-time index of about 800,000 records into an 
>>>>>>>> instance of Elasticsearch, but I'm having a bit of trouble. It 
>>>>>>>> continually fails at around 200,000 records. Looking at it in the 
>>>>>>>> Elasticsearch Head plugin, my index goes offline and becomes 
>>>>>>>> unrecoverable.
>>>>>>>>
>>>>>>>> For now, I have it running on a VM on my personal machine. 
>>>>>>>>
>>>>>>>> VM Config: 
>>>>>>>> Ubuntu Server 14.04 64-Bit
>>>>>>>> 8 GB RAM
>>>>>>>> 2 Processors
>>>>>>>> 32 GB SSD
>>>>>>>>
>>>>>>>> Java
>>>>>>>> java version "1.7.0_65"
>>>>>>>> OpenJDK Runtime Environment (IcedTea 2.5.1) 
>>>>>>>> (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>>>>>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>>>>>>
>>>>>>>> Elasticsearch is using mostly the defaults. This is the output of: 
>>>>>>>> curl http://localhost:9200/_nodes/process?pretty
>>>>>>>> {
>>>>>>>>   "cluster_name" : "property_transaction_data",
>>>>>>>>   "nodes" : {
>>>>>>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>>>>>>       "name" : "Marvin Flumm",
>>>>>>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>>>>>>       "host" : "ubuntu-es",
>>>>>>>>       "ip" : "127.0.1.1",
>>>>>>>>       "version" : "1.3.2",
>>>>>>>>       "build" : "dee175d",
>>>>>>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>>>>>>       "process" : {
>>>>>>>>         "refresh_interval_in_millis" : 1000,
>>>>>>>>         "id" : 1092,
>>>>>>>>         "max_file_descriptors" : 65535,
>>>>>>>>         "mlockall" : true
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> I adjusted ES_HEAP_SIZE to 512mb. 
>>>>>>>>
>>>>>>>> I'm using the following code to pull data from SQL Server and index 
>>>>>>>> it. 
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "elasticsearch" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3
>>>>>>>> f-462f-bdcf-df717cbc6269%40googlegroups.com 
>>>>>>>> <https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>>
