Re: Bulk Indexing Problems

vineeth mohan Tue, 09 Sep 2014 08:32:42 -0700

Hello Joshua,

I am not sure which memory setting in the config file you are referring to;
please paste the comment and the config.
I usually change the config from the init.d script.


The best approach would be to bulk index, say, 10,000 feeds in sync mode, wait
until everything is indexed, and then proceed to the next batch.
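
A minimal sketch of that batching idea in Python, using only the standard
library. The host, the index and type names, and the helper names are
placeholders of mine, not something from this thread; the body format is the
newline-delimited action/source pairs that the _bulk endpoint expects. Each
request blocks until Elasticsearch answers, so the next batch is only sent
after the previous one has been accepted.

```python
import json
from urllib import request

BATCH_SIZE = 10_000  # batch size suggested in this thread; tune as needed


def batches(records, size=BATCH_SIZE):
    """Yield successive lists of at most `size` records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch, index, doc_type):
    """Build the newline-delimited body for one _bulk request."""
    lines = []
    for doc in batch:
        # Action line first, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def index_all(records, index, doc_type, host="http://localhost:9200"):
    """Send one batch at a time; stop at the first batch with item errors."""
    for batch in batches(records):
        req = request.Request(
            host + "/_bulk",
            data=bulk_body(batch, index, doc_type).encode("utf-8"),
            headers={"Content-Type": "application/x-ndjson"},
        )
        with request.urlopen(req) as resp:  # blocks until ES responds
            result = json.load(resp)
        if result.get("errors"):
            raise RuntimeError("bulk request reported item failures")
```

Stopping on the first failed batch is just one choice; for a repeatable job
you may prefer to log the failed items and retry them.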
I am not sure about the Java API, but a long time ago I used to curl the
thread pool stats API and check how many requests had been rejected.
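
For an automated check between batches, something along these lines might
do. It assumes the nodes stats response has the shape
nodes -> <node id> -> thread_pool -> <pool> -> rejected, and that the stats
are served at /_nodes/stats/thread_pool; please verify both against your
version. The function names are placeholders of mine.

```python
import json
from urllib import request


def rejected_counts(stats, pools=("index", "bulk")):
    """Return {node_name: {pool: rejected}} from a nodes stats response."""
    out = {}
    for node_id, node in stats.get("nodes", {}).items():
        thread_pool = node.get("thread_pool", {})
        # Fall back to the node id when the node has no name field.
        out[node.get("name", node_id)] = {
            pool: thread_pool[pool]["rejected"]
            for pool in pools
            if pool in thread_pool
        }
    return out


def fetch_thread_pool_stats(host="http://localhost:9200"):
    """Fetch thread pool stats for all nodes (endpoint path is an assumption)."""
    with request.urlopen(host + "/_nodes/stats/thread_pool") as resp:
        return json.load(resp)
```

Comparing rejected_counts(fetch_thread_pool_stats()) before and after a batch
shows whether that batch caused any rejections.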

Thanks
          Vineeth

On Tue, Sep 9, 2014 at 8:58 PM, Joshua P <jpetersen...@gmail.com> wrote:

> You also said you wouldn't recommend indexing that much information at
> once. How would you suggest breaking it up and what status should I look
> for before doing another batch? I have to come up with some process that is
> repeatable and mostly automated.
>
> On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>>
>> Thanks for the reply, Vineeth!
>>
>> What's a practical heap size? I've seen some people saying they set it to
>> 30gb but this confuses me because in the /etc/default/elasticsearch file,
>> the comment suggests the max is only 1gb?
>>
>> I'll look into the threadpool issue. Is there a Java API for monitoring
>> Cluster Node health? Can you point me at an example or give me a link to
>> that?
>>
>> Thanks!
>>
>> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>>
>>> Hello Joshua,
>>>
>>> I have a feeling this has something to do with the threadpool.
>>> There is a limit on the number of feeds that can be queued for indexing.
>>>
>>> Try increasing the queue size of the index and bulk threadpools to a large
>>> number.
>>> Also, through the cluster node API on threadpool, you can see if any
>>> request has failed.
>>> Monitor this API for any requests that fail due to large volume.
>>>
>>> Threadpool - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>> Threadpool stats - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>>
>>> Having said that, I wouldn't recommend bulk indexing that much information
>>> at a time, and 512 MB is not going to help much.
>>>
>>> Thanks
>>>           Vineeth
>>>
>>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <jpeter...@gmail.com> wrote:
>>>
>>>> Hi there!
>>>>
>>>> I'm trying to do a one-time index of about 800,000 records into an
>>>> instance of elasticsearch. But I'm having a bit of trouble. It continually
>>>> fails around 200,000 records. Looking at it in the Elasticsearch Head Plugin,
>>>> my index goes offline and becomes unrecoverable.
>>>>
>>>> For now, I have it running on a VM on my personal machine.
>>>>
>>>> VM Config:
>>>> Ubuntu Server 14.04 64-Bit
>>>> 8 GB RAM
>>>> 2 Processors
>>>> 32 GB SSD
>>>>
>>>> Java
>>>> java version "1.7.0_65"
>>>> OpenJDK Runtime Environment (IcedTea 2.5.1)
>>>> (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>>
>>>> Elasticsearch is using mostly the defaults. This is the output of:
>>>> curl http://localhost:9200/_nodes/process?pretty
>>>> {
>>>>   "cluster_name" : "property_transaction_data",
>>>>   "nodes" : {
>>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>>       "name" : "Marvin Flumm",
>>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>>       "host" : "ubuntu-es",
>>>>       "ip" : "127.0.1.1",
>>>>       "version" : "1.3.2",
>>>>       "build" : "dee175d",
>>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>>       "process" : {
>>>>         "refresh_interval_in_millis" : 1000,
>>>>         "id" : 1092,
>>>>         "max_file_descriptors" : 65535,
>>>>         "mlockall" : true
>>>>       }
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> I adjusted ES_HEAP_SIZE to 512mb.
>>>>
>>>> I'm using the following code to pull data from SQL Server and index it.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>> msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%
>>>> 40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>

