Thanks guys.

I've made some changes to my bulk indexing.  I'm now kicking off Java bulk 
loaders with 8 threads apiece on 3 of our 11 servers.  This initially did 
not help, so I went in and checked the hot_threads output in ElasticHQ.  
Virtually all CPU time was going to building SpatialPrefixTrees!  I 
changed the geo_shape resolution on the index from 1 km to 10 km and began 
reindexing.  I've now been averaging 125 million records/hour over the past 
20 minutes!  What's more, indexing speed has remained relatively constant 
over the course of the load!
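
For anyone following along, here is a minimal sketch of what a coarser 
geo_shape mapping could look like, built as a Python dict and printed as 
JSON.  The type name "record" and field name "shape" are made up for 
illustration and are not from my actual index.  The "precision" parameter 
is the one described in the geo_shape mapping docs; please check the docs 
for your Elasticsearch version before relying on it.

```python
import json


def geo_shape_mapping(type_name, field_name, precision):
    """Build a mapping body for a geo_shape field at the given precision.

    type_name and field_name are hypothetical placeholders; precision is a
    distance string such as "1km" or "10km".
    """
    return {
        type_name: {
            "properties": {
                field_name: {
                    "type": "geo_shape",
                    # Coarser precision means fewer prefix-tree cells per shape.
                    "precision": precision,
                }
            }
        }
    }


if __name__ == "__main__":
    # Assumed names: "record" (type) and "shape" (field).
    body = geo_shape_mapping("record", "shape", "10km")
    print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a put-mapping request when 
the index is created, before any documents are loaded.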

What doesn't make sense to me is that building the SpatialPrefixTrees 
should cost roughly constant CPU per record over the course of the bulk 
index, yet performance was degrading dramatically as the index got bigger.  
Has anyone else experienced this problem?  Where before it was taking 10-12 
hours to index the data, it will now likely finish within the hour.  That 
seems like an awfully big difference (though it might also have to do with 
the new Java loaders)...
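
A back-of-the-envelope sketch of the numbers above.  The load-time part 
uses only figures from this thread (120 million records, 125 million 
records/hour).  The grid part is a deliberately simplified model, NOT how 
the prefix tree is actually built: it assumes a shape is tiled by a uniform 
grid of square cells, and the 50 km shape size is invented purely for 
illustration.

```python
import math


def hours_to_load(records, records_per_hour):
    """Time to index `records` at a constant rate, in hours."""
    return records / records_per_hour


def grid_cells(shape_side_km, cell_km):
    """Cells of side cell_km needed to tile a square of side shape_side_km.

    Simplified uniform-grid model (an assumption), used only to show how
    the cell count scales with resolution.
    """
    per_side = math.ceil(shape_side_km / cell_km)
    return per_side * per_side


if __name__ == "__main__":
    print(hours_to_load(120_000_000, 125_000_000))  # 0.96

    fine = grid_cells(50, 1)     # hypothetical 50 km shape at 1 km cells
    coarse = grid_cells(50, 10)  # same shape at 10 km cells
    print(fine, coarse, fine // coarse)  # 2500 25 100
```

Under this toy model a 10x coarser resolution means about 100x fewer cells 
per shape, so a large speedup from the resolution change alone would not 
be surprising.  It does not explain why the slowdown got worse as the 
index grew.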

On Wednesday, March 12, 2014 5:35:31 PM UTC-4, Mark Walkom wrote:
>
> Use a plugin like Marvel or ElasticHQ.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 12 March 2014 23:29, Elliott Bradshaw <ebrad...@gmail.com> wrote:
>
>> Thanks Binh, Mark.
>>
>> I'm using Oracle's Java 7 (1.7.0_51).  I will try to upgrade to 
>> Elasticsearch 1.0.1 if possible.
>>
>> It could definitely be a disk speed issue.  Unfortunately, we're working 
>> in a virtualized environment and cannot upgrade to SSD storage.
>>
>> What utility do you use for GC collection times and hot threads?
>>
>> Thanks!
>>
>> On Tuesday, March 11, 2014 6:13:51 PM UTC-4, Mark Walkom wrote:
>>>
>>> What Java version are you using?
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 12 March 2014 00:34, Elliott Bradshaw <ebrad...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We are currently attempting to optimize our configuration for a static 
>>>> index of roughly 120 million records.  In time, this index will probably 
>>>> be much larger, but for now this is the working set.  We've been playing 
>>>> around with Elasticsearch for several months now, and have made great 
>>>> progress with performance tuning.  However, we still run into issues which 
>>>> leave us scratching our heads.  One such issue is an unexpected indexing 
>>>> speed drop as the index grows.
>>>>
>>>> We are working on an 11-node cluster.  Each node has 8 CPUs and 16 GB of 
>>>> memory.  Heap size of each JVM is set to a min/max of 8 GB.  vm.swappiness 
>>>> has been set to 0 on all of the systems, as they are being used solely for 
>>>> Elasticsearch.  The Elasticsearch version is 0.90.7.  We are focusing on 
>>>> loading a single index, and it has been initialized with 48 shards, with a 
>>>> refresh interval of 120 seconds.  We're currently using Elasticsearch HQ 
>>>> for real-time monitoring of the system state, along with Linux utilities 
>>>> like top, iotop and iftop.  Everything appears to be in order.
>>>>
>>>> Frequently we have to reindex the entire dataset as we are working in a 
>>>> development environment and are still determining how best to structure 
>>>> the dataset.  We are indexing via a batch load script that fires off 
>>>> 10,000-record curl requests to the _bulk endpoint.  We partition the 
>>>> entire dataset between three servers and run the batch load script 
>>>> simultaneously on each one.
>>>>
>>>> At first, this appears to work great.  Initial indexing speeds are 
>>>> roughly 50 million/hour, which would load the entire dataset in a little 
>>>> over 2 hours.  However, once the index approaches 20 million records, 
>>>> indexing performance drops significantly (down to roughly 10 
>>>> million/hour).  As the index continues to grow, performance continues to 
>>>> degrade, and I have seen it drop below 1 million records per hour.  All 
>>>> in all, it takes nearly a day to index the entire dataset of 120 million 
>>>> records.
>>>>
>>>> I was hoping that the community might be able to offer some advice as 
>>>> to what we might be doing wrong, or suggest other diagnostic approaches.  
>>>> We're really trying to ratchet this system up to prepare it for production 
>>>> mode, and are currently left scratching our heads.  Any thoughts, 
>>>> opinions, or tips would be greatly appreciated.
>>>>
>>>> Thanks!
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/98958587-eaf9-4451-84ee-78c38e7eab42%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

