Thanks! These points are on my list, but none of them work in our case. As
I think I mentioned before, most of the servers sending requests to us are
hosted inside the company, just by a different group. So geo-replication
will not help here, since 70% of requests come from one region, in fact
from the same data center.

Point #1 that you mentioned is the best option, but I am running into some
challenges there. The problem, as I mentioned, is that user A connects to
one of the servers in the pool, and that server sends an HTTP request to
our server. User A can then sign out, connect to another server in the
pool, and we get the next request from that one instead. The only way we
can solve this cleanly is by changing that server code, which would be
best. However, we are having a hard time getting that done, so I am trying
to see if there are other solutions, like, say, a NoSQL distributed DB that
keeps track of user sessions.
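
To make that concrete, here is a minimal sketch of the kind of lookup I
have in mind: a shared store, reachable from both data centers, keyed by
user ID and recording which data center first served that user. The store
location, key format, and get_home_dc() helper are all made up for
illustration; it's shown with a memcached client only as an example, and a
real deployment would need a store that is actually shared or replicated
across the DCs.

    # minimal sketch; uses the python-memcached client
    import memcache

    mc = memcache.Client(['session-store.internal:11211'])

    def get_home_dc(user_id, current_dc):
        # Return the DC that owns this user, claiming ownership if unset.
        key = 'home_dc:%s' % user_id
        home = mc.get(key)
        if home is None:
            # add() only succeeds if the key does not exist yet, so the
            # first DC to see the user wins the race.
            if mc.add(key, current_dc, time=3600):
                return current_dc
            home = mc.get(key)
        return home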

On Tue, Apr 5, 2011 at 5:24 PM, dormando <dorma...@rydia.net> wrote:
> (First; Roberto I swear if you do that thing where you spam three e-mails
> in a row one more time I'm blocking you from the list. To be honest I do
> that too occasionally, but I limit myself to two responses and I try a lot
> harder to be useful.)
>
> Mohit; Sorry for the confusion here. I hope you can see what happened and
> that it was without malice:
>
> - User comes in with a vague request (I didn't even understand what "httpd
> to cache" meant until just now)
> - People who have solved this problem themselves answer with what works. I
> can vouch for these people that there's a reason why they give these
> answers. What you're suggesting isn't something you can innovate around,
> it's something you can either hack or fail at.
>
> If you're willing I'd like to try one more time with a bit less vagueness,
> so I've picked up your message here:
>
>> Currently there is no memcached. I was thinking of using memcached to
>> store per-site user session info and redirect if a user came to the wrong
>> site, but memcached does not seem to be meant for master/master or
>> master/slave setups either. Generally, industry practice in such use cases
>> is to use cookies, but we can't even use those because most of our apps
>> are non-browser-based. The flow is something like this, for example:
>>
>> desktop client -> service X  -> our service in either data center.
>>
>> So essentially client calls the server which then calls us.
>
> I'm going to take some shots in the dark, but it would be useful to have
> even more information about the behavior of this client, so:
>
> You say that you want to prevent a client from hitting the wrong DC once
> they're supposed to be "sticky". Are you saying that the desktop client
> has to guess every single time it needs to make a request? ie; request
> goes to random DC, then *redirects* to the correct one if it guessed
> wrong?
>
> Is there any sense of keepalive? ie; if one request succeeds, is the
> connection held open for further requests? Or do you have to do this
> discovery every single time?
>
> Your description of the non-browser-based thing is fuzzy as well. How much
> control do you have over the service calling you? Cookies do in fact work
> anywhere you bother to parse them, they're not particular to browsers :P
>
> So correct me if I'm wrong, but it looks like you're running a service
> where "some client" makes a single HTTP request to you, but will honor
> HTTP redirects, and you have zero ability to change the way this guy makes
> requests? Or can you actually add some control here?
>
> So some shots in the dark:
>
> 1) You might even be able to do this with redirects and no client smarts.
> The workflow would be something like:
>
> - Client sends WRITE req to http://happytimes.com/api/whatever
> - RR DNS sends them to 192.168.5.1
> - 192.168.5.1 has no record of talking to the client before. It creates a
> local session and stores a note about seeing this recent request, then it
> redirects to the other DC (192.168.6.1)
> - 192.168.6.1 does the same, and redirects back again (since it hasn't
> seen this client before). It also notes that the original request had been
> redirected once, so in the future it should bounce all requests back to
> 192.168.5.1
> - 192.168.5.1 gets the request again, has a session which says it's seen
> this client before and that this DC was the originator of the request, so
> it serves the request.
> - Client sends second WRITE req in
> - RR DNS sends them to 192.168.6.1
> - 192.168.6.1 has a note about that client, knows it's not the master, and
> redirects back to 192.168.5.1, who then serves the request.
>
> * This *requires* a method of uniquely identifying the remote client
> * This works best with RR DNS. If you want to make a decision about
> whether or not to handle the incoming client, the exchange gets trickier.
> If a DC has received a request it hasn't seen before, but doesn't want to
> handle it, it has to redirect to a special url telling the other side to
> become master and handle the request.
> * This *requires* that if one DC has never seen a client before, both
> *must* see the request before *either* serves it.
>
> * It's also a huge hack, and the extra redirects will cause annoying lag.
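>
> If it helps, here's a rough, untested sketch of the per-DC bookkeeping I'm
> describing. Everything in it is invented for illustration: the X-Client-Id
> header standing in for "a method of uniquely identifying the remote
> client", the bounced=1 marker, and the in-memory SESSIONS dict (which
> would really be your local session store).
>
>     import urllib.parse
>     from http.server import BaseHTTPRequestHandler, HTTPServer
>
>     THIS_DC = "192.168.5.1"   # flip these two on the other DC
>     OTHER_DC = "192.168.6.1"
>     SESSIONS = {}             # client_id -> "master" or "peer"
>
>     class StickyHandler(BaseHTTPRequestHandler):
>         def do_POST(self):
>             client_id = self.headers.get("X-Client-Id", "")
>             bounced = "bounced=1" in urllib.parse.urlparse(self.path).query
>             role = SESSIONS.get(client_id)
>             if role == "master":
>                 self.serve_locally()          # we own this client
>             elif role == "peer":
>                 self.redirect(OTHER_DC)       # the other DC owns it
>             elif bounced:
>                 # the other DC saw this client first; remember that and
>                 # bounce every future request straight back to it
>                 SESSIONS[client_id] = "peer"
>                 self.redirect(OTHER_DC)
>             else:
>                 # brand new client: claim it, but make the other DC take
>                 # note before either side serves anything
>                 SESSIONS[client_id] = "master"
>                 self.redirect(OTHER_DC, mark_bounced=True)
>
>         def serve_locally(self):
>             self.send_response(200)
>             self.end_headers()
>             self.wfile.write(("handled by " + THIS_DC).encode())
>
>         def redirect(self, dc, mark_bounced=False):
>             target = "http://%s%s" % (dc, self.path)
>             if mark_bounced:
>                 target += ("&" if "?" in target else "?") + "bounced=1"
>             self.send_response(307)           # 307 preserves method + body
>             self.send_header("Location", target)
>             self.end_headers()
>
>     if __name__ == "__main__":
>         HTTPServer(("", 8080), StickyHandler).serve_forever()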
>
> 2) Use GeoDNS.
>
> I'm not sure why you said you couldn't. Something something 70% of users
> in one area, 30% in another? I always thought it'd be a cute trick to mess
> with a GeoIP database a bit. If 50% of my users are in California but my
> servers are in Texas and Chicago, I can just hack the GeoIP database to
> redistribute which areas get which responses (NorCal to Chicago, SoCal to
> Texas). So far as I know nothing stops you from dicking with the data.
>
> Some users will get worse latency, but if you wanted to serve all the
> users without extra latency I'd *hopefully* assume your business has taken
> this into account before deciding it's better to split traffic 50/50 and
> double your chances of catastrophic downtime.
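>
> Just to illustrate what I mean by "redistribute": the whole hack is an
> override table consulted before the stock GeoIP answer. In real life this
> lives in whatever GeoDNS/GeoIP config you use; the region codes, DC names,
> and lookup_region() stub below are all made up.
>
>     OVERRIDES = {
>         "US-CA-NORCAL": "dc-chicago",  # push northern California to Chicago
>         "US-CA-SOCAL":  "dc-texas",    # and southern California to Texas
>     }
>     DEFAULT_DC = "dc-texas"
>
>     def lookup_region(client_ip):
>         # stand-in for the real GeoIP lookup your DNS server does
>         return "US-CA-NORCAL"
>
>     def pick_datacenter(client_ip):
>         return OVERRIDES.get(lookup_region(client_ip), DEFAULT_DC)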
>
> 3) Add a teeny tiny bit of app smarts. When they send a write, read back
> the cookie and use it :P But given my assumption about your client above,
> I'm guessing this isn't possible. If the business has some shitty reason
> for not doing it, I would push harder, unless you have a good way to dodge
> the fallout when the hack you put in place ultimately fails.
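>
> And for the record, the cookie dance on the calling side really is only a
> few lines. Sketch below; the "dc" cookie name and the idea that your
> service hands it out in Set-Cookie are invented for the example.
>
>     import urllib.request
>
>     jar = {}   # remembered cookies for this upstream service
>
>     def call_service(url, body):
>         req = urllib.request.Request(url, data=body)   # body is bytes
>         if "dc" in jar:
>             req.add_header("Cookie", "dc=%s" % jar["dc"])
>         resp = urllib.request.urlopen(req)
>         # remember which DC the service pinned us to
>         set_cookie = resp.headers.get("Set-Cookie", "")
>         if set_cookie.startswith("dc="):
>             jar["dc"] = set_cookie.split(";")[0][len("dc="):]
>         return resp.read()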
>
> 4) Assuming that the client hits a *random datacenter* *every single
> time*, ALL OTHER OPTIONS, which use asynchronous replication, will have a
> race condition failure. If you want to implement this, you *must* have the
> source datacenter block the client's response until it has written the
> session note to the remote datacenter. Perhaps you only need to do this
> once per hour.
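>
> Shaped roughly like this on the write path; the /internal/session-note
> endpoint on the peer DC is invented, and you'd want to skip the call when
> the note was already pushed recently (that's the "once per hour" part).
>
>     import urllib.request
>
>     PEER_NOTE_URL = "http://192.168.6.1/internal/session-note"
>
>     def handle_write(client_id, do_the_write):
>         result = do_the_write()
>         # Block until the peer DC has stored the "this client is mine"
>         # note. Answering before that leaves a window where the client's
>         # next request can land on the peer and race us.
>         note = ("client_id=%s" % client_id).encode()
>         urllib.request.urlopen(PEER_NOTE_URL, data=note, timeout=2).read()
>         return result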
>
> 5) None of this will work if the client can make multiple requests at a
> time, or if your service makes decisions based on *any* data that isn't
> uniquely paired to that original client (like a feed list or Twitter
> timeline).
>
> 6) I can think of more variations of #1 while using backhauls, but tbh
> they're all super gross.
>
> n' stuff.
> -Dormando
>
