Some questions on wikipedia river and cluster config

jorge canellas Tue, 18 Mar 2014 10:53:09 -0700

Hi!

I am trying to index the wikipedia dumps (downloaded and uncompressed) 
using the wikipedia river, but it takes about 6 hours in the cluster.
I have increased the number of nodes and primary shards from 5 to 8, but 
the performance does not increase.


   - I have set the refresh time to -1, the number of replicas to 0, 
   increased the amount of memory from 256MB/1GB to 3GB/4GB, allowed the 
   mlockall, and set the buffer to 30% from 10%.
   - I have changed the river settings increasing the bulk_size from 100 to 
   10k, refresh interval is set to 1s
   - I have changed the mapping to only index title and text. And set 
   _source to false.


I do not know what else can I modify to increase the indexing ratio, at 
this moment it is indexing about 120 docs/sec.

Any ideas?

Kind regards,

Jorge

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c9bd6437-3241-46a4-838e-5327067bc3cc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Some questions on wikipedia river and cluster config

Reply via email to