So we've been using Elasticsearch (ES) for a while now, and I have an 
architecture set up that I'm not at all sure is right. I'd like to lay it 
out and see if anyone can tell me where I might be going wrong.

We have, as our data set, roughly 10 million documents. Each one represents 
a product and then a bunch of data on that product suitable for queries. 
Our queries are pretty good (because someone else writes them :-)) and we 
get the results we want.

We have five nodes. Three are in one data center (call it data center M) 
and two are in another (call it data center B). There is a nice, fat pipe 
between the two so communication is acceptable.

I replicate every shard on every node. We have plenty of disk space, the 
data set isn't so big that it fills up memory, and I really do want to 
optimize for reads. I can afford to, because we only reload our index once 
per day, in the middle of the night.

To do this, I create a new index, load all the data, and then move an index 
alias from the old to the new. No downtime. I wrote a job that loads the 
data via the bulk API. I'm pretty happy with this, too.

In the M data center, machine M1 is the one I use to load the data. It is 
NOT in our load balancing rotation for reads. Machines M2 and M3 are, as 
are both machines in data center B.

All M machines are master=true data=true. All B machines are master=false 
data=true. The reason I made the B machines master=false was so that while 
M1 builds the new index each night, the elected master is never a machine 
in the other data center. I presume this is wise. I'm not sure.

I write in batches of 2000 documents and get about 1300 documents per 
second.
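To make the batching concrete, here is a minimal sketch of how the loader might cut the documents into batches of 2000 and turn each batch into a bulk request body (one action line plus one source line per document, newline-terminated). The `id` field name and the helper names are assumptions for illustration, and older ES versions also expect a `_type` in the action line.

```python
import json
from itertools import islice


def batches(docs, size=2000):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_index_body(index, docs, id_field="id"):
    """Build the newline-delimited body for one POST /_bulk request."""
    lines = []
    for doc in docs:
        # Action line: which index and document id this entry targets.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical documents standing in for the product data.
    products = ({"id": i, "name": "product %d" % i} for i in range(4500))
    for batch in batches(products):
        body = bulk_index_body("products_new", batch)
        print(len(batch), "documents,", len(body), "bytes")
```

At about 1300 documents per second, a full load of 10 million documents works out to a little over two hours.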

I also have ONE job that does scripted upserts in batches of 1000 each that 
gets about 300 documents per second. This is slower than I'd like. I'm 
unsure how I might speed this up.

So... anything stand out as bad?

Could I maybe speed up writes by turning replication off on the new index 
while loading and then back on when done, so that the cluster isn't copying 
every write to every node during the load? Since I keep the index alias 
pointed at the previous index until the new one is ready, this should be 
okay, right?
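What I have in mind is something like the following sketch, which only builds the bodies for two PUT requests to the new index's `_settings` endpoint: one before the load with no replicas, and one after the load with enough replicas to put a copy on every node. The helper names are mine; `number_of_replicas` is the index setting involved, and the count of four assumes one primary copy plus a replica on each of the other four nodes.

```python
import json


def replica_settings(number_of_replicas):
    """Body for PUT /<index>/_settings that changes the replica count."""
    return {"index": {"number_of_replicas": number_of_replicas}}


def replicas_for_full_copy(node_count):
    """Replicas needed so every data node holds a copy of each shard.

    One copy is the primary, so the remaining node_count - 1 are replicas.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count - 1


if __name__ == "__main__":
    before_load = replica_settings(0)
    after_load = replica_settings(replicas_for_full_copy(5))
    print("before load:", json.dumps(before_load))
    print("after load: ", json.dumps(after_load))
```

The alias would only move once the replicas have finished being allocated, so the read nodes never serve from a partially copied index.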

Anything I might be missing?

THANK YOU TONS if you can chime in. ES is wonderful, but as we all know, 
there's a lot to learn!
