Rip apart my cluster architecture, would you? ;)

Christopher Ambler Tue, 18 Nov 2014 08:25:46 -0800

So we've been using ES for a while now, and I've set up an architecture
that I'm not 100% sure is right. I'd like to lay it out and see if anyone
can tell me where I might be going wrong.

We have, as our data set, roughly 10 million documents. Each one represents
a product plus a bunch of data about that product that our queries run
against. Our queries are pretty good (because someone else writes them :-))
and we get the results we want.

We have five nodes. Three are in one data center (call it data center M)
and two are in another (call it data center B). There is a nice, fat pipe
between the two so communication is acceptable.

I replicate every shard on every node. We have plenty of disk space, the
data set isn't so huge that it fills up memory, and I really do want to
optimize for reads. Writes are rare: we reload our index only once per day,
in the middle of the night.
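To make "every shard on every node" concrete: with N data nodes, one primary plus N - 1 replicas puts a copy of each shard on every node, because Elasticsearch never allocates two copies of the same shard to one node. The sketch below just builds the create-index settings body for that case; the shard count of 5 (the 1.x default) is an assumption, not something stated above. As an alternative, the `index.auto_expand_replicas` setting with the value `"0-all"` asks Elasticsearch to track the data-node count automatically.

```python
import json


def full_replication_settings(node_count: int, shards: int = 5) -> dict:
    """Create-index body that places a copy of every shard on every data node.

    One primary plus (node_count - 1) replicas gives each of the node_count
    nodes exactly one copy of each shard.
    """
    if node_count < 1:
        raise ValueError("need at least one data node")
    return {
        "settings": {
            "number_of_shards": shards,            # assumed shard count
            "number_of_replicas": node_count - 1,  # 5 nodes -> 4 replicas
        }
    }


body = full_replication_settings(node_count=5)
print(json.dumps(body))
```

The trade-off is that every indexed document is written five times, which is cheap for a read-heavy index loaded once a day.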

To do this, I create a new index, load all the data, and then move an index
alias from the old index to the new one. No downtime. I wrote a job that
loads the data via the bulk API. I'm pretty happy with this, too.
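The alias move can be done in a single request to the `_aliases` endpoint with a `remove` and an `add` action; Elasticsearch applies the actions in one request atomically, so readers never see the alias pointing at zero indices or at both. This minimal sketch only builds the request body; the alias name `products` and the dated index names are hypothetical.

```python
import json


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions are sent in one request so the switch is atomic.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: one dated index per nightly load, read through "products".
body = alias_swap_actions("products", "products_20141117", "products_20141118")
print(json.dumps(body, indent=2))
```

After the swap, the previous night's index can be kept briefly as a rollback target and then deleted.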

In the M data center, machine M1 is the one I use to load the data. It is
NOT in our load balancing rotation for reads. Machines M2 and M3 are, as
are both machines in data center B.

All M machines are master=true data=true. All B machines are master=false
data=true. I made the B machines master=false so that while M1 builds the
new index each night, it never has to go to a B machine as the master. I
presume this is wise, but I'm not sure.
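One related setting worth checking with this layout is `discovery.zen.minimum_master_nodes` (Elasticsearch 1.x), which is normally set to a quorum of the master-eligible nodes, floor(n / 2) + 1. With three master-eligible nodes that is 2, so data center M can elect a master on its own, while data center B (no master-eligible nodes) can never elect one, which guards against two masters if the link between sites drops. The sketch below encodes the roles described above; the names `B1` and `B2` are hypothetical labels for the two B machines.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Roles as described in the post; B1/B2 are hypothetical names.
nodes = {
    "M1": {"node.master": True, "node.data": True},
    "M2": {"node.master": True, "node.data": True},
    "M3": {"node.master": True, "node.data": True},
    "B1": {"node.master": False, "node.data": True},
    "B2": {"node.master": False, "node.data": True},
}

eligible = sum(1 for cfg in nodes.values() if cfg["node.master"])
print(eligible, minimum_master_nodes(eligible))  # 3 2
```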

I write in bulk batches of 2000 documents and get about 1300 documents per
second.
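For reference, a bulk request body is newline-delimited JSON: an action line followed by the document source, with the whole body ending in a newline. This sketch batches documents 2000 at a time and also works out how long a 10-million-document load takes at about 1300 documents per second. It assumes each document dict carries an `id` field and uses a hypothetical type name; 1.x bulk actions include `_type`.

```python
import json
from typing import Iterable, Iterator


def bulk_bodies(docs: Iterable[dict], index: str, doc_type: str,
                batch_size: int = 2000) -> Iterator[str]:
    """Yield bulk API bodies holding at most batch_size documents each."""
    lines, count = [], 0
    for doc in docs:
        # Action line, then the document source on the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # body must end with a newline
            lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"


# Rough duration of the nightly load at the observed rate.
total_docs, docs_per_sec = 10_000_000, 1300
print(f"{total_docs / docs_per_sec / 3600:.1f} hours")  # 2.1 hours
```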

I also have ONE job that does scripted upserts in batches of 1000, and it
gets only about 300 documents per second. This is slower than I'd like, and
I'm unsure how I might speed it up.
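Part of the gap is likely inherent: each scripted update has the node fetch the existing source, run the script, and reindex the result, whereas a plain index action just writes. This sketch shows the two bulk lines for one scripted upsert (`scripted_upsert: true` runs the script even when the document does not exist yet, starting from the `upsert` document). The script text and the `price` parameter are hypothetical, and the body layout shown (`params` beside a string `script`) follows 1.x conventions, so check it against your version.

```python
import json

# Hypothetical inline script; assumes dynamic scripting is enabled.
SCRIPT = "ctx._source.price = price"


def scripted_upsert_lines(index: str, doc_type: str, doc_id: str,
                          params: dict) -> str:
    """Action line plus update body for one scripted upsert in a bulk request."""
    action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
    body = {
        "script": SCRIPT,
        "params": params,         # exposed to the script as variables
        "scripted_upsert": True,  # run the script for new documents too
        "upsert": {},             # empty starting document for new ids
    }
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


batch = "".join(
    scripted_upsert_lines("products", "product", f"sku-{i}", {"price": 10 + i})
    for i in range(3)
)
print(batch)
```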

So... anything stand out as bad?

Could I maybe speed up writes by turning replication off while writing and
then back on when done, so that my cluster isn't updating every node during
the writes? Since I keep the index alias pointed at the previous index
until the new one is ready, this should be okay, right?
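`number_of_replicas` is a dynamic setting, so it can be changed on the new index through `PUT /<index>/_settings` while nothing reads from it. A minimal sketch of the two settings bodies, assuming the load starts with zero replicas and ends at four (one copy per node on five nodes): when replicas are raised again, Elasticsearch copies the finished primaries to the other nodes, and it is prudent to wait until cluster health reports green for the new index before moving the alias. With zero replicas, losing a node mid-load loses that part of the index, which here only means rerunning the load.

```python
import json


def load_phase_settings(replicas_after: int) -> tuple:
    """Settings bodies for PUT /<new_index>/_settings before and after the load."""
    before = {"index": {"number_of_replicas": 0}}
    after = {"index": {"number_of_replicas": replicas_after}}
    return before, after


before, after = load_phase_settings(replicas_after=4)
print(json.dumps(before))  # {"index": {"number_of_replicas": 0}}
print(json.dumps(after))   # {"index": {"number_of_replicas": 4}}
```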

Anything I might be missing?

THANK YOU TONS if you can chime in. ES is wonderful, but as we all know,
there's a lot to learn!

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/9348104d-efa7-42ae-baac-f1c63d849e6c%40googlegroups.com.
