Hi! I am trying to index the wikipedia dumps (downloaded and uncompressed) using the wikipedia river, but it takes about 6 hours in the cluster. I have increased the number of nodes and primary shards from 5 to 8, but the performance does not increase.
- I have set the refresh time to -1, the number of replicas to 0, increased the amount of memory from 256MB/1GB to 3GB/4GB, allowed the mlockall, and set the buffer to 30% from 10%. - I have changed the river settings increasing the bulk_size from 100 to 10k, refresh interval is set to 1s - I have changed the mapping to only index title and text. And set _source to false. I do not know what else can I modify to increase the indexing ratio, at this moment it is indexing about 120 docs/sec. Any ideas? Kind regards, Jorge -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c9bd6437-3241-46a4-838e-5327067bc3cc%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.