(First; Roberto, I swear if you do that thing where you spam three e-mails
in a row one more time I'm blocking you from the list. To be honest I do
that too occasionally, but I limit myself to two responses and I try a lot
harder to be useful.)

Mohit; Sorry for the confusion here. I hope you can see what happened and
that it was without malice:

- User comes in with a vague request (I didn't even understand what "httpd
to cache" meant until just now)
- People who have solved this problem themselves answer with what works. I
can vouch that these people have good reasons for the answers they give.
What you're suggesting isn't something you can innovate around; it's
something you can either hack or fail at.

If you're willing, I'd like to try one more time with a bit less vagueness,
so I've picked up your message here:

> Currently there is no memcache. I was thinking of using memcache to
> store per-site user session info and redirect if a user came to the
> wrong site, but memcached seems to not be meant for master/master or
> master/slave either. Generally industry practice in such use cases is
> to use cookies, but we can't even use that because most of our apps are
> non-browser-based apps. The flow is something like this, e.g.:
>
> desktop client -> service X  -> our service in either data center.
>
> So essentially client calls the server which then calls us.

I'm going to take some shots in the dark, but it would be useful to have
even more information about the behavior of this client, so:

You say that you want to prevent a client from hitting the wrong DC once
they're supposed to be "sticky". Are you saying that the desktop client
has to guess every single time it makes a request? i.e., the request goes
to a random DC, then gets *redirected* to the correct one if it guessed
wrong?

Is there any sense of keepalive? i.e., if one request succeeds, is the
connection held open for further requests? Or do you have to do this
discovery every single time?

Your description of the non-browser-based thing is fuzzy as well. How much
control do you have over the service calling you? Cookies do in fact work
anywhere you bother to parse them; they're not particular to browsers :P
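For instance, here's a stdlib-only sketch of parsing a cookie in a
non-browser client (the cookie names here are made up):

    # Nothing browser-specific about cookies: any client can stash a
    # Set-Cookie value and replay it. Parsing one in Python:
    from http.cookies import SimpleCookie

    jar = SimpleCookie()
    jar.load("dc=192.168.5.1; session=abc123")  # a raw Cookie header value
    print(jar["dc"].value)                      # -> 192.168.5.1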

So correct me if I'm wrong, but it looks like you're running a service
where "some client" makes a single HTTP request to you, but will honor
HTTP redirects, and you have zero ability to change the way this guy makes
requests? Or can you actually add some control here?

So some shots in the dark:

1) You might even be able to do this with redirects and no client smarts
(there's a toy sketch of the decision logic after the notes below). The
workflow would be something like:

- Client sends WRITE req to http://happytimes.com/api/whatever
- RR DNS sends them to 192.168.5.1
- 192.168.5.1 has no record of talking to the client before. It creates a
local session and stores a note about seeing this recent request, then it
redirects to the other DC (192.168.6.1)
- 192.168.6.1 does the same, and redirects back again (since it hasn't
seen this client before). It also notes that the original request had been
redirected once, so in the future it should bounce all requests back to
192.168.5.1
- 192.168.5.1 gets the request again, has a session which says it's seen
this client before and that this DC was the originator of the request, so
it serves the request.
- Client sends second WRITE req in
- RR DNS sends them to 192.168.6.1
- 192.168.6.1 has a note about that client, knows it's not the master, and
redirects back to 192.168.5.1, who then serves the request.

* This *requires* a method of uniquely identifying the remote client
* This works best with RR DNS. If you want to make a decision about
whether or not to handle the incoming client, the exchange gets trickier:
if a DC has received a request it hasn't seen before, but doesn't want to
handle it, it has to redirect to a special URL telling the other side to
become master and handle the request.
* This *requires* that if one DC has never seen a client before, both
*must* see the request before *either* serves it.

* It's also a huge hack, and the extra redirects will cause annoying lag.
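
To make the exchange concrete, here's a toy simulation of the decision
each DC makes. Everything in it (the DC class, the client id, the
"marked" redirect flag) is illustrative, not a real API; in real life the
session store would be memcached or similar, and each peer.handle() call
would be an HTTP 302 carrying a marker header or query param:

    # Toy simulation of the option-1 redirect ping-pong.
    class DC:
        def __init__(self, name, peer=None):
            self.name = name
            self.peer = peer
            self.sessions = {}  # client id -> "master" or "peer"

        def handle(self, cid, marked=False):
            role = self.sessions.get(cid)
            if role == "master":
                return f"{self.name} serves {cid}"
            if role == "peer":
                return self.peer.handle(cid)     # bounce to the owner
            if marked:
                # The peer saw this client first; it's the master.
                self.sessions[cid] = "peer"
                return self.peer.handle(cid)
            # First sighting anywhere: claim mastership, but make the
            # peer see the request once before we serve anything.
            self.sessions[cid] = "master"
            return self.peer.handle(cid, marked=True)

    a = DC("192.168.5.1")
    b = DC("192.168.6.1")
    a.peer, b.peer = b, a
    print(a.handle("client-42"))  # bounces 5.1 -> 6.1 -> 5.1, then serves
    print(b.handle("client-42"))  # 6.1 knows it's not master; 5.1 serves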

2) Use GeoDNS.

I'm not sure why you said you couldn't. Something something 70% of users
in one area, 30% in another? I always thought it'd be a cute trick to mess
with a GeoIP database a bit. If 50% of my users are in California but my
servers are in Texas and Chicago, I can just hack the GeoIP database to
redistribute which areas get which responses (NorCal to Chicago, SoCal to
Texas). So far as I know nothing stops you from dicking with the data.

Some users will get worse latency, but if the goal was to serve all users
without extra latency, I'd *hopefully* assume your business took that into
account before deciding it's better to split traffic 50/50 and double your
chances of catastrophic downtime.
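
If you wanted to try it, the "hack" is just rewriting the table before
your GeoDNS loads it. A sketch, with a completely made-up CSV layout and
region names:

    # Remap which regions resolve to which DC; geo.csv and its
    # network,region,dc columns are invented for illustration.
    import csv

    OVERRIDES = {"US-CA-NORCAL": "chicago", "US-CA-SOCAL": "texas"}

    with open("geo.csv") as src, \
         open("geo-hacked.csv", "w", newline="") as dst:
        out = csv.writer(dst)
        for network, region, dc in csv.reader(src):
            out.writerow([network, region, OVERRIDES.get(region, dc)])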

3) Add a teeny tiny bit of app smarts. When they send a write, read back
the cookie and use it :P But given my assumption about your client above,
I'm guessing this isn't possible. If business has some shitty reason for
not doing it, I would push harder, unless you have a good way to dodge the
blame when the hack you put in place ultimately fails.
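
For the record, the "smarts" needed are tiny. If the caller can use
anything like a cookie jar, it's a few lines; the "dc" cookie name below
is an assumption about what your service would set:

    # Sketch: a requests.Session replays whatever Set-Cookie your
    # service returns, so a "dc" pin set on the first response sticks.
    import requests

    s = requests.Session()  # remembers cookies between requests
    s.post("http://happytimes.com/api/whatever", json={"op": "write"})
    print(s.cookies.get("dc"))  # e.g. 192.168.5.1, set by the first DC
    s.post("http://happytimes.com/api/whatever", json={"op": "write"})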

4) Assuming the client hits a *random datacenter* *every single time*,
ALL OTHER OPTIONS that rely on asynchronous replication will have a race
condition failure. If you want to implement this, you *must* have the
source datacenter block the client's response until it has written the
session note to the remote datacenter. You can probably get away with
refreshing that note once per hour.
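
A sketch of that blocking write, using pymemcache against a memcached in
each DC; the key naming, the hourly TTL, and the addresses are all
assumptions:

    # Don't answer the client until the ownership note exists in BOTH
    # DCs, or the next request can race ahead of async replication.
    from pymemcache.client.base import Client

    local = Client(("127.0.0.1", 11211))
    remote = Client(("192.168.6.1", 11211))  # memcached in the other DC

    def note_session(cid, dc="192.168.5.1", ttl=3600):
        key = f"owner:{cid}"
        local.set(key, dc, expire=ttl)
        remote.set(key, dc, expire=ttl)  # the blocking, synchronous part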

5) None of this will work if the client can make multiple requests at a
time, or if your service makes decisions based on *any* data that isn't
uniquely paired to that original client (like a feed list or Twitter
timeline).

6) I can think of more variations of #1 using backhauls, but tbh they're
all super gross.

n' stuff.
-Dormando
