Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-27 Thread moses wejuli
thnx

On 27 March 2010 13:50, Martin Grotzke wrote:

> Great, thanx!
>
> Cheers,
> Martin
>
>
> On Mon, Mar 22, 2010 at 6:04 PM, Adam Lee  wrote:
> > On Sat, Mar 20, 2010 at 7:31 PM, Martin Grotzke
> >  wrote:
> >> Ok, thanx for sharing your experience. Do you have some app online
> >> implemented like this I can have a look at?
> >
> > http://www.fotolog.com/
> >
> > --
> > awl
> >
>
>
>
> --
> Martin Grotzke
> http://www.javakaffee.de/blog/
>



Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-27 Thread Martin Grotzke
Great, thanx!

Cheers,
Martin


On Mon, Mar 22, 2010 at 6:04 PM, Adam Lee  wrote:
> On Sat, Mar 20, 2010 at 7:31 PM, Martin Grotzke
>  wrote:
>> Ok, thanx for sharing your experience. Do you have some app online
>> implemented like this I can have a look at?
>
> http://www.fotolog.com/
>
> --
> awl
>



-- 
Martin Grotzke
http://www.javakaffee.de/blog/



Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-22 Thread Adam Lee
On Sat, Mar 20, 2010 at 7:31 PM, Martin Grotzke
 wrote:
> Ok, thanx for sharing your experience. Do you have some app online
> implemented like this I can have a look at?

http://www.fotolog.com/

-- 
awl



Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-20 Thread Martin Grotzke
On Thu, Mar 18, 2010 at 6:49 PM, dormando  wrote:
> I had a thought; You could use ADD to "ping" the sessions every time
> they're accessed. When a session is served from local memory, do an async
> ADD of a 0 byte value with a 1 second expiration time against that
> session. If the add command fails it actually promotes the existing item
> to the head of the LRU. If it doesn't fail you probably want to SET the
> session back into memcached.
Cool! I wasn't aware of these ADD semantics...


> Feels like your background thread is a little wrong as well. If you can
> journal in the session somewhere the last time it was written to
> memcached, your thread could update purely based on that. ie; if they're
> off by 5+ minutes, sync them asap. So then:
>
> - session created, 1hr expiration. session notes it is "clean" with
> memcached as of that second.
>
> - session accessed 10 minutes later, not modified. 1hr expiration, 50 mins
> in memcached. Session is pinged via ADD and moves to head of LRU. Fresh.
>
> - 5 minutes later, background thread trawls through sessions in local
> memory whose "memcached clean" timestamp is 5+ minutes off from its last
> accessed time. syncs to memcached, updates session locally to note when it
> was synced? Session is still relatively "fresh" (last accessed 10 minutes
> ago). Bumping it to the top of the LRU isn't as bad of a problem as
> bumping a 50 minute old session to the top for no reason.
>
> - 55 minutes later, session expires from tomcat. Thread issues DELETE
> against memcached which cleans up the session, if it's still there.
Sounds like a good solution for the potential issue, and adding the
"clean" timestamp is no problem.

Thanx && cheers,
Martin



Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-20 Thread Martin Grotzke
On Tue, Mar 16, 2010 at 5:31 PM, Adam Lee  wrote:
> Yes.  As described in my previous post, all necessary state
> information is contained in the request-- if you want to pass state
> information to the next request, it's easiest to encode it in a sort
> of cryptographically secure "token."  I find it easiest to think of it
> as almost like an FSM where the cookies, token and query parameters
> are inputs.  Obviously it's not purely deterministic or referentially
> transparent, since things do end up getting written to the datastores
> and such, but it is a somewhat useful abstraction.
Ok, thanx for sharing your experience. Do you have some app online
implemented like this I can have a look at?

Cheers,
Martin


>
> On Tue, Mar 16, 2010 at 4:02 AM, Martin Grotzke
>  wrote:
>> On Mon, Mar 15, 2010 at 6:57 PM, Adam Lee  wrote:
>>>
>>> On Sun, Mar 14, 2010 at 2:59 PM, Les Mikesell 
>>> wrote:

 Adam Lee wrote:
>
> well, it depends on what you mean by scalability... i'm personally of
> the opinion that traditional sessions should be avoided if you want to
> truly scale.

 And yet, everyone wants dynamic pages custom-generated to the user's
 preferences.  So how do you reconcile that?  You can help things a bit by
 splitting pages into iframe/image components that do/don't need sessions,
 and you can make the client do more of the work by sending back values in
 cookies instead of just the session key, but I'm not sure how far you can
 go.
>>>
>>> Well, I guess it depends on your definition of "session."  Obviously, you
>>> need to account for user preferences and such, but I don't consider those
>>> "session" data since they are consistent across any session that the user
>>> instantiates.
>>> Probably the easiest way to build a "stateless"/shared-nothing web
>>> application, and what we've done to scale, is to store user authentication
>>> data and the like in an encrypted cookie.  Any other session-like data (geo
>>> location from IP lookup, language preference, etc) can be set in separate
>>> cookies.  Since cookies are sent with every request, it is possible to
>>> easily authenticate that the user is who they say they are and discern the
>>> necessary data to build their page using only these cookies and you don't
>>> need to look anything up in any sort of centralized session cache.
>>> Data that is needed to authenticate a request or to display a message on a
>>> subsequent page view (things that would be stored in the Flash in Rails,
>>> from how I understand that to work) can be encoded into a cryptographically
>>> secure "token" that is passed to the following request.
>>> User preferences and settings, on the other hand, are not really session
>>> data, as I said above.  I've already described somewhat how we have this
>>> data stored in a few previous posts on this thread, but I guess I'll do a
>>> basic overview for the sake of completeness...
>>> Our central datastore for users is still (unfortunately) a database
>>> (mysql), but this is essentially only used for writes.  All user data is
>>> also written to TokyoTyrant, which is our primary persistent datastore for
>>> reads, and is replicated exactly in memcached.
>>> Since not all user data is needed for every page view, we've broken the
>>> user data into what we call "user chunks," which roughly correspond to what
>>> would be DB tables or separate objects in a traditional ORM.  We built a
>>> service that will get you the data you want for a specific user or set of
>>> users by taking name(s) and a bitmask for what chunks you want.  So, for
>>> example, if I wanted to load the basic user data, active photo and profile
>>> data for the user "admin," I'd just have to do something like this:
>>> RoUserCache.get("admin", USER | ACTIVE_PHOTO | PROFILE);
>>> The beauty of this is that the cache is smart-- it batches all of the
>>> requests from a thread into bulk gets, it does as much as possible
>>> asynchronously and it tries to get data from memcached first and, if it's
>>> not there, then gets it from TokyoTyrant. TokyoTyrant and memcached are both
>>> great at doing bulk gets, so this is pretty fast and, since they both speak
>>> the same protocol (memcached), it wasn't terribly difficult to build.  Doing
>>> it asynchronously means that most of the latency is absorbed, too, since we
>>> try to do these loads as early on in the page building process as possible,
>>> so it tends to be there by the time the page tries to use it.
>>> Anyway, I've strayed a bit from the topic at hand, but I guess I felt I
>>> should elaborate on what I meant...
>>
>> So you're one of the lucky guys that don't have to support users with
>> cookies disabled?
>> According to what you describe it seems you're not using sticky sessions. Do
>> you handle concurrency issues in any way, to make sure that concurrent
>> requests (e.g. tabbed browsing, AJAX) hitting different servers see the same
>> data?
>> Cheers,
>> Martin

Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-18 Thread Ren


On Mar 15, 5:57 pm, Adam Lee  wrote:
> On Sun, Mar 14, 2010 at 2:59 PM, Les Mikesell  wrote:
> > Adam Lee wrote:
>
> >> well, it depends on what you mean by scalability... i'm personally of
> >> the opinion that traditional sessions should be avoided if you want to
> >> truly scale.
>
> > And yet, everyone wants dynamic pages custom-generated to the user's
> > preferences.  So how do you reconcile that?  You can help things a bit by
> > splitting pages into iframe/image components that do/don't need sessions,
> > and you can make the client do more of the work by sending back values in
> > cookies instead of just the session key, but I'm not sure how far you can
> > go.
>
> Well, I guess it depends on your definition of "session."  Obviously, you
> need to account for user preferences and such, but I don't consider those
> "session" data since they are consistent across any session that the user
> instantiates.
>
> Probably the easiest way to build a "stateless"/shared-nothing web
> application, and what we've done to scale, is to store user authentication
> data and the like in an encrypted cookie.  Any other session-like data (geo
> location from IP lookup, language preference, etc) can be set in separate
> cookies.  Since cookies are sent with every request, it is possible to
> easily authenticate that the user is who they say they are and discern the
> necessary data to build their page using only these cookies and you don't
> need to look anything up in any sort of centralized session cache.
>
> Data that is needed to authenticate a request or to display a message on a
> subsequent page view (things that would be stored in the Flash in Rails,
> from how I understand that to work) can be encoded into a cryptographically
> secure "token" that is passed to the following request.

Yes, I like the implementation outlined in the "A secure cookie
protocol" paper, 
http://scholar.google.co.uk/scholar?cluster=1658505294292503872&hl=en&as_sdt=2000

And using technologies like ESI (http://en.wikipedia.org/wiki/Edge_Side_Includes),
where you can have HTML fragments that return Vary headers on the cookies,
you can move the session problem from the application server to an HTTP
accelerator (Varnish/Squid3/..) in front of it.

Jared



Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-18 Thread dormando
> Nope, basically it's like this: on every request (with an associated session) 
> the session is served from local memory (the jvm local session map). When the 
> request is finished, the session is sent to
> memcached (with the session timeout as expiration, e.g. 3600). It's still 
> held in the local session map, memcached is just there as a kind of backup 
> device. The session is only pulled from memcached, if a
> tomcat originally serving this session (say tomcat1) died and therefore 
> another tomcat (tomcat2) is asked to serve this session. tomcat2 then does 
> not have a session for the requested sessionId and
> therefore loads this session from memcached.
>
> The case that I was describing before (sending only modified sessions to 
> memcached) is a feature for performance optimization: the assumption is that 
> there are more requests just accessing the session than
> requests that actually modify the session. So the idea is that I don't 
> have to update a session in memcached that was accessed but not modified. The 
> issue that had to be handled then was only the case
> of the different timeouts/expiration times:
> - a session was first created/updated and stored in memcached: both in tomcat 
> and memcached it has an expiration of 1h
> - this session is accessed 10 minutes later; as it was not modified it is not 
> updated in memcached. Then this session has an expiration of 1 hour again in 
> tomcat, but in memcached it's already 10 minutes
> old, so it would expire 50 minutes later.
>
> To prevent this premature expiration I used the mentioned background thread 
> to update the expiration of such items in memcached to the remaining 
> expiration time they have in tomcat. In the example above
> nearly 50 minutes later the session would be updated with an expiration time 
> of 10 minutes.

For the sake of argument I think that background thread should just journal to
the DB, but eh :P

I had a thought: you could use ADD to "ping" the sessions every time
they're accessed. When a session is served from local memory, do an async
ADD of a 0 byte value with a 1 second expiration time against that
session. If the add command fails it actually promotes the existing item
to the head of the LRU. If it doesn't fail you probably want to SET the
session back into memcached.
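
A minimal sketch of that ADD "ping" with the spymemcached client (key name,
TTL and serialization are illustrative; the LRU-promotion side effect of a
failed ADD is as described above, not something the client API itself
guarantees):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class SessionPing {

    // Called when a session was just served from local memory.
    // ADD a 0-byte value with a 1 second expiry against the session key:
    // - if ADD fails, the item already exists in memcached and (per the
    //   behavior described above) gets promoted to the head of the LRU;
    // - if ADD succeeds, the backup copy was gone, so SET it back.
    static void ping(MemcachedClient mc, String sessionKey,
                     byte[] serializedSession, int ttlSeconds) throws Exception {
        boolean added = mc.add(sessionKey, 1, new byte[0]).get();
        if (added) {
            mc.set(sessionKey, ttlSeconds, serializedSession);
        }
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        ping(mc, "session:abc123", new byte[] { 1, 2, 3 }, 3600);
        mc.shutdown();
    }
}

(In a real session manager the ADD would be issued asynchronously, as
suggested; the blocking get() here just keeps the sketch short.)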

Feels like your background thread is a little wrong as well. If you can
journal in the session somewhere the last time it was written to
memcached, your thread could update purely based on that, i.e. if they're
off by 5+ minutes, sync them ASAP. So then (see the sketch after this list):

- session created, 1hr expiration. session notes it is "clean" with
memcached as of that second.

- session accessed 10 minutes later, not modified. 1hr expiration, 50 mins
in memcached. Session is pinged via ADD and moves to head of LRU. Fresh.

- 5 minutes later, background thread trawls through sessions in local
memory whose "memcached clean" timestamp is 5+ minutes off from its last
accessed time. syncs to memcached, updates session locally to note when it
was synced? Session is still relatively "fresh" (last accessed 10 minutes
ago). Bumping it to the top of the LRU isn't as bad of a problem as
bumping a 50 minute old session to the top for no reason.

- 55 minutes later, session expires from tomcat. Thread issues DELETE
against memcached which cleans up the session, if it's still there.
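
And a sketch of the matching background pass over the locally held sessions,
assuming each session records a "memcached clean" timestamp (the LocalSession
type and all field names are invented, not part of Tomcat or any real session
manager):

import java.util.concurrent.TimeUnit;
import net.spy.memcached.MemcachedClient;

class LocalSession {
    String id;
    byte[] serialized;
    long lastAccessedMillis;
    long memcachedCleanMillis;  // when this session was last written to memcached
    int maxInactiveSeconds;     // e.g. 3600
}

class SessionSyncTask implements Runnable {

    private final MemcachedClient mc;
    private final Iterable<LocalSession> localSessions;

    SessionSyncTask(MemcachedClient mc, Iterable<LocalSession> localSessions) {
        this.mc = mc;
        this.localSessions = localSessions;
    }

    @Override
    public void run() {
        long now = System.currentTimeMillis();
        long maxDrift = TimeUnit.MINUTES.toMillis(5);
        for (LocalSession s : localSessions) {
            long expiresAt = s.lastAccessedMillis
                    + TimeUnit.SECONDS.toMillis(s.maxInactiveSeconds);
            if (expiresAt <= now) {
                // Expired locally: clean up the memcached copy too.
                mc.delete(s.id);
            } else if (s.lastAccessedMillis - s.memcachedCleanMillis >= maxDrift) {
                // Local copy is 5+ minutes newer than the memcached copy:
                // re-sync with the remaining lifetime so both expire together.
                int remainingSeconds = (int) ((expiresAt - now) / 1000);
                mc.set(s.id, remainingSeconds, s.serialized);
                s.memcachedCleanMillis = now;
            }
        }
    }
}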

> That's already done: sessions that are expiring in tomcat are deleted from 
> memcached
>
> The issue that I described regarding sessions, that were only accessed by the 
> application but not modified and therefore were not updated in memached was 
> the following: when such a session (session A) is
> updated in memcached (just before it would expire in memcached) with a new 
> expiration time of say then 10 minutes (the time that it has left in tomcat), 
> it will be pushed to the head of the LRU.
> Another session (session B) might have been modified just 20 minutes before 
> and sent to memcached with an expiration of 60 minutes, this one will be 
> closer to the tail of the LRU than session A, even if
> session B will still have 40 minutes to live - 30 minutes more than session 
> A. And because it's closer to the tail session B might be dropped before 
> session A, even if session A would already be expired.
>
> However, this would only be an issue if there are too many sessions for the 
> available memory of a slab.

Yeah. Think what I described above solves almost everything :p granted you
can add that sync timestamp. That's essentially the same algorithm I
describe for syncing to a database in my old post, + the ADD trick. Guess
I should go rewrite the post.

This sorta gates on your background thread being able to update a
timestamp in the local memory session though... Failing that you still
have some workaround options but I can't think of any non-ugly ones.

I'll push again on my main point though... Don't overengineer the
slabbing unless you prove that it's a problem first. 1.4 ser

Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-16 Thread Adam Lee
Yes.  As described in my previous post, all necessary state
information is contained in the request-- if you want to pass state
information to the next request, it's easiest to encode it in a sort
of cryptographically secure "token."  I find it easiest to think of it
as almost like an FSM where the cookies, token and query parameters
are inputs.  Obviously it's not purely deterministic or referentially
transparent, since things do end up getting written to the datastores
and such, but it is a somewhat useful abstraction.

On Tue, Mar 16, 2010 at 4:02 AM, Martin Grotzke
 wrote:
> On Mon, Mar 15, 2010 at 6:57 PM, Adam Lee  wrote:
>>
>> On Sun, Mar 14, 2010 at 2:59 PM, Les Mikesell 
>> wrote:
>>>
>>> Adam Lee wrote:

 well, it depends on what you mean by scalability... i'm personally of
 the opinion that traditional sessions should be avoided if you want to
 truly scale.
>>>
>>> And yet, everyone wants dynamic pages custom-generated to the user's
>>> preferences.  So how do you reconcile that?  You can help things a bit by
>>> splitting pages into iframe/image components that do/don't need sessions,
>>> and you can make the client do more of the work by sending back values in
>>> cookies instead of just the session key, but I'm not sure how far you can
>>> go.
>>
>> Well, I guess it depends on your definition of "session."  Obviously, you
>> need to account for user preferences and such, but I don't consider those
>> "session" data since they are consistent across any session that the user
>> instantiates.
>> Probably the easiest way to build a "stateless"/shared-nothing web
>> application, and what we've done to scale, is to store user authentication
>> data and the like in an encrypted cookie.  Any other session-like data (geo
>> location from IP lookup, language preference, etc) can be set in separate
>> cookies.  Since cookies are sent with every request, it is possible to
>> easily authenticate that the user is who they say they are and discern the
>> necessary data to build their page using only these cookies and you don't
>> need to look anything up in any sort of centralized session cache.
>> Data that is needed to authenticate a request or to display a message on a
>> subsequent page view (things that would be stored in the Flash in Rails,
>> from how I understand that to work) can be encoded into a cryptographically
>> secure "token" that is passed to the following request.
>> User preferences and settings, on the other hand, are not really session
>> data, as I said above.  I've already described somewhat how we have this
>> data stored in a few previous posts on this thread, but I guess I'll do a
>> basic overview for the sake of completeness...
>> Our central datastore for users is still (unfortunately) a database
>> (mysql), but this is essentially only used for writes.  All user data is
>> also written to TokyoTyrant, which is our primary persistent datastore for
>> reads, and is replicated exactly in memcached.
>> Since not all user data is needed for every page view, we've broken the
>> user data into what we call "user chunks," which roughly correspond to what
>> would be DB tables or separate objects in a traditional ORM.  We built a
>> service that will get you the data you want for a specific user or set of
>> users by taking name(s) and a bitmask for what chunks you want.  So, for
>> example, if I wanted to load the basic user data, active photo and profile
>> data for the user "admin," I'd just have to do something like this:
>> RoUserCache.get("admin", USER | ACTIVE_PHOTO | PROFILE);
>> The beauty of this is that the cache is smart-- it batches all of the
>> requests from a thread into bulk gets, it does as much as possible
>> asynchronously and it tries to get data from memcached first and, if it's
>> not there, then gets it from TokyoTyrant. TokyoTyrant and memcached are both
>> great at doing bulk gets, so this is pretty fast and, since they both speak
>> the same protocol (memcached), it wasn't terribly difficult to build.  Doing
>> it asynchronously means that most of the latency is absorbed, too, since we
>> try to do these loads as early on in the page building process as possible,
>> so it tends to be there by the time the page tries to use it.
>> Anyway, I've strayed a bit from the topic at hand, but I guess I felt I
>> should elaborate on what I meant...
>
> So you're one of the lucky guys that don't have to support users with
> cookies disabled?
> According to what you describe it seems you're not using sticky sessions. Do
> you handle concurrency issues in any way, to make sure that concurrent
> requests (e.g. tabbed browsing, AJAX) hitting different servers see the same
> data?
> Cheers,
> Martin
>
>>
>> --
>> awl
>
>
>
> --
> Martin Grotzke
> http://www.javakaffee.de/blog/
>



-- 
awl


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-16 Thread Martin Grotzke
On Mon, Mar 15, 2010 at 6:57 PM, Adam Lee  wrote:

> On Sun, Mar 14, 2010 at 2:59 PM, Les Mikesell wrote:
>
>> Adam Lee wrote:
>>
>>> well, it depends on what you mean by scalability... i'm personally of
>>> the opinion that traditional sessions should be avoided if you want to
>>> truly scale.
>>>
>>
>> And yet, everyone wants dynamic pages custom-generated to the user's
>> preferences.  So how do you reconcile that?  You can help things a bit by
>> splitting pages into iframe/image components that do/don't need sessions,
>> and you can make the client do more of the work by sending back values in
>> cookies instead of just the session key, but I'm not sure how far you can
>> go.
>>
>
> Well, I guess it depends on your definition of "session."  Obviously, you
> need to account for user preferences and such, but I don't consider those
> "session" data since they are consistent across any session that the user
> instantiates.
>
> Probably the easiest way to build a "stateless"/shared-nothing web
> application, and what we've done to scale, is to store user authentication
> data and the like in an encrypted cookie.  Any other session-like data (geo
> location from IP lookup, language preference, etc) can be set in separate
> cookies.  Since cookies are sent with every request, it is possible to
> easily authenticate that the user is who they say they are and discern the
> necessary data to build their page using only these cookies and you don't
> need to look anything up in any sort of centralized session cache.
>
> Data that is needed to authenticate a request or to display a message on a
> subsequent page view (things that would be stored in the Flash in Rails,
> from how I understand that to work) can be encoded into a cryptographically
> secure "token" that is passed to the following request.
>
> User preferences and settings, on the other hand, are not really session
> data, as I said above.  I've already described somewhat how we have this
> data stored in a few previous posts on this thread, but I guess I'll do a
> basic overview for the sake of completeness...
>
> Our central datastore for users is still (unfortunately) a database
> (mysql), but this is essentially only used for writes.  All user data is
> also written to TokyoTyrant, which is our primary persistent datastore for
> reads, and is replicated exactly in memcached.
>
> Since not all user data is needed for every page view, we've broken the
> user data into what we call "user chunks," which roughly correspond to what
> would be DB tables or separate objects in a traditional ORM.  We built a
> service that will get you the data you want for a specific user or set of
> users by taking name(s) and a bitmask for what chunks you want.  So, for
> example, if I wanted to load the basic user data, active photo and profile
> data for the user "admin," I'd just have to do something like this:
>
> RoUserCache.get("admin", USER | ACTIVE_PHOTO | PROFILE);
>
> The beauty of this is that the cache is smart-- it batches all of the
> requests from a thread into bulk gets, it does as much as possible
> asynchronously and it tries to get data from memcached first and, if it's
> not there, then gets it from TokyoTyrant. TokyoTyrant and memcached are both
> great at doing bulk gets, so this is pretty fast and, since they both speak
> the same protocol (memcached), it wasn't terribly difficult to build.  Doing
> it asynchronously means that most of the latency is absorbed, too, since we
> try to do these loads as early on in the page building process as possible,
> so it tends to be there by the time the page tries to use it.
>
> Anyway, I've strayed a bit from the topic at hand, but I guess I felt I
> should elaborate on what I meant...
>
So you're one of the lucky guys that don't have to support users with
cookies disabled?

According to what you describe it seems you're not using sticky sessions. Do
you handle concurrency issues in any way, to make sure that concurrent
requests (e.g. tabbed browsing, AJAX) hitting different servers see the same
data?

Cheers,
Martin



>
> --
> awl
>



-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-15 Thread Adam Lee
On Sun, Mar 14, 2010 at 2:59 PM, Les Mikesell  wrote:

> Adam Lee wrote:
>
>> well, it depends on what you mean by scalability... i'm personally of
>> the opinion that traditional sessions should be avoided if you want to
>> truly scale.
>>
>
> And yet, everyone wants dynamic pages custom-generated to the user's
> preferences.  So how do you reconcile that?  You can help things a bit by
> splitting pages into iframe/image components that do/don't need sessions,
> and you can make the client do more of the work by sending back values in
> cookies instead of just the session key, but I'm not sure how far you can
> go.
>

Well, I guess it depends on your definition of "session."  Obviously, you
need to account for user preferences and such, but I don't consider those
"session" data since they are consistent across any session that the user
instantiates.

Probably the easiest way to build a "stateless"/shared-nothing web
application, and what we've done to scale, is to store user authentication
data and the like in an encrypted cookie.  Any other session-like data (geo
location from IP lookup, language preference, etc) can be set in separate
cookies.  Since cookies are sent with every request, it is possible to
easily authenticate that the user is who they say they are and discern the
necessary data to build their page using only these cookies and you don't
need to look anything up in any sort of centralized session cache.

Data that is needed to authenticate a request or to display a message on a
subsequent page view (things that would be stored in the Flash in Rails,
from how I understand that to work) can be encoded into a cryptographically
secure "token" that is passed to the following request.

User preferences and settings, on the other hand, are not really session
data, as I said above.  I've already described somewhat how we have this
data stored in a few previous posts on this thread, but I guess I'll do a
basic overview for the sake of completeness...

Our central datastore for users is still (unfortunately) a database (mysql),
but this is essentially only used for writes.  All user data is also written
to TokyoTyrant, which is our primary persistent datastore for reads, and is
replicated exactly in memcached.

Since not all user data is needed for every page view, we've broken the user
data into what we call "user chunks," which roughly correspond to what would
be DB tables or separate objects in a traditional ORM.  We built a service
that will get you the data you want for a specific user or set of users by
taking name(s) and a bitmask for what chunks you want.  So, for example, if
I wanted to load the basic user data, active photo and profile data for the
user "admin," I'd just have to do something like this:

RoUserCache.get("admin", USER | ACTIVE_PHOTO | PROFILE);

The beauty of this is that the cache is smart-- it batches all of the
requests from a thread into bulk gets, it does as much as possible
asynchronously and it tries to get data from memcached first and, if it's
not there, then gets it from TokyoTyrant. TokyoTyrant and memcached are both
great at doing bulk gets, so this is pretty fast and, since they both speak
the same protocol (memcached), it wasn't terribly difficult to build.  Doing
it asynchronously means that most of the latency is absorbed, too, since we
try to do these loads as early on in the page building process as possible,
so it tends to be there by the time the page tries to use it.
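
A rough sketch of that read path, assuming both stores are fronted by
spymemcached clients (possible because TokyoTyrant speaks the memcached
protocol, as mentioned above); RoUserCache itself, the chunk bitmask and the
key naming scheme are not shown here:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import net.spy.memcached.MemcachedClient;

public class ChunkReader {

    private final MemcachedClient memcached;    // fast, volatile copy
    private final MemcachedClient tokyoTyrant;  // persistent copy, same protocol

    public ChunkReader(MemcachedClient memcached, MemcachedClient tokyoTyrant) {
        this.memcached = memcached;
        this.tokyoTyrant = tokyoTyrant;
    }

    // Bulk-get the requested chunk keys from memcached first, then fall back
    // to TokyoTyrant for whatever was missing.
    public Map<String, Object> getChunks(Collection<String> keys) {
        Map<String, Object> result = new HashMap<>(memcached.getBulk(keys));
        List<String> missing = new ArrayList<>();
        for (String key : keys) {
            if (!result.containsKey(key)) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            result.putAll(tokyoTyrant.getBulk(missing));
        }
        return result;
    }
}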

Anyway, I've strayed a bit from the topic at hand, but I guess I felt I
should elaborate on what I meant...

-- 
awl


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Martin Grotzke
On Sun, Mar 14, 2010 at 7:59 PM, Les Mikesell  wrote:

> Adam Lee wrote:
>
>> well, it depends on what you mean by scalability... i'm personally of
>> the opinion that traditional sessions should be avoided if you want to
>> truly scale.
>
>
> And yet, everyone wants dynamic pages custom-generated to the user's
> preferences.  So how do you reconcile that?  You can help things a bit by
> splitting pages into iframe/image components that do/don't need sessions,
> and you can make the client do more of the work by sending back values in
> cookies instead of just the session key, but I'm not sure how far you can
> go.
>
Basically - yes. For true scalability avoid sessions.

Though I'm still looking for a way to create a so-called *stateless* but
"dynamic" ecommerce application that supports multivariate testing,
behavioral targeting and all the other stuff that marketing *needs* to have.

The thing is that you can't avoid state; the question is where to keep it.
So IMO there's no stateless application, at best a stateless server side
with state held on the client. But these approaches also have
disadvantages (security, data/content size etc.), and if you need to support
users without cookies it gets even harder. And POSTing every request to
transfer state in hidden form fields is also not nice. Recently I stumbled
over Play (playframework.org, off the top of my head), which aims to keep
things stateless; I want to have a look at what can be done with it.

So, if you have a solution for building an application without server-side
sessions that still meets these other conflicting requirements, please let
me know!

Cheers,
Martin


>
> --
>  Les Mikesell
>   lesmikes...@gmail.com
>
>
>


-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Martin Grotzke
On Sun, Mar 14, 2010 at 5:12 PM, moses wejuli wrote:

> there seems to be a general phobia out there of storing sessions in the DB

It's not a phobia, it's simply that it's easier to scale out by
throwing in more machines with app-server/memcached pairs than by scaling the
database. For our current project we also have a policy that for a single
"normal" request the database must not be touched (placing an order, in fact,
is allowed to hit the database).

The memcached-session-manager also serves sessions from local memory for
optimal performance, so that for a single request there's no I/O needed.
Writing sessions to memcached after the request is finished can be done
asynchronously. And if such a write occasionally fails, the application
can still work correctly, as the up-to-date session is still in local memory
(the primary data source).

Cheers,
Martin




> -- i know this coz i had it once, but overcame it by realizing (mentally
> and through this forum) that we really should use memcached for what it's
> good at and NOT as a persistent data store.
>
> unless you have abysmally low/poor server specs, i think you shouldn't
> worry about performance issues with regard to DB-based session handling --
> enhanced with memcached!


>
> On 14 March 2010 13:27, Martin Grotzke wrote:
>
>> Yes, what you described is similar to the issue with sessions that were not
>> updated in memcached because they were only read by the application.
>>
>> Still, I think if there's enough memory for all active sessions only
>> sessions should be dropped that are in fact expired. For this a simplified
>> slab configuration (one slab for all sessions) would be helpful AFAICS.
>>
>> Cheers,
>> Martin
>>
>>
>> On Sun, Mar 14, 2010 at 8:54 AM, Peter J. Holzer  wrote:
>>
>>> On 2010-03-12 17:07:25 -0800, dormando wrote:
>>> > Now, it should be obvious that if a user session has reached a point
>>> where
>>> > it would be evicted early, it is because you did not have enough memory
>>> to
>>> > store *all active sessions anyway*. The odds of it evicting someone who
>>> > has visited your site *after* me are highly unlikely. The longer I stay
>>> > off the site, the higher the odds of it being evicted early due to lack
>>> of
>>> > memory.
>>> >
>>> > This does mean, by way of painfully describing how an LRU works, that
>>> the
>>> > odds of you finding sessions in memcached which have not been expired,
>>> but
>>> > are being evicted from the LRU earlier than expired sessions, is very
>>> > unlikely.
>>> [...]
>>> > The caveat is that memcached has one LRU per slab class.
>>> >
>>> > So, lets say your traffic ends up looking like:
>>> >
>>> > - For the first 10,000 sessions, they are all 200 kilobytes. This ends
>>> up
>>> > having memcached allocate all of its slab memory toward something that
>>> > will fit 200k items.
>>> > - You get linked from the frontpage of digg.com and suddenly you have
>>> a
>>> > bunch of n00bass users hitting your site. They have smaller sessions
>>> since
>>> > they are newbies. 10k items.
>>> > - Memcached has only reserved 1 megabyte toward 10k items. So now all
>>> of
>>> > your newbies share a 1 megabyte store for sessions, instead of 200
>>> > megabytes.
>>>
>>> There's another caveat (I think Martin may have been referring to this
>>> scenario, but he wasn't very clear):
>>>
>>>
>>> Suppose you have two kinds of entries in your memcached, with different
>>> expire times. For example, in addition to your sessions with 3600s, you
>>> have some alert box with an expiry time of 60s. By chance,
>>> both items are approximately the same size and occupy the same slab
>>> class(es).
>>>
>>> You have enough memory to keep all sessions for 3600 seconds and enough
>>> memory to keep all alert boxes for 60 seconds. But you don't have enough
>>> memory to keep all alert boxes for 3600 seconds (why should you, they
>>> expire
>>> after 60 seconds).
>>>
>>> Now, when you walk the LRU chain, the search for expired items will only
>>> return expired alert boxes which are about as old as your oldest session.
>>> As soon as there are 50 (not yet expired) sessions older than the oldest
>>> (expired) alert box, you will evict a session although you still have a
>>> lot of expired alert boxes which you could reuse.
>>>
>>> The only workaround for this problem I can see is to use different
>>> memcached servers for items of (wildly) different expiration times.
>>>
>>> > However the slab out of balance thing is a real fault of ours. It's a
>>> > project on my plate to have automated slab rebalancing done in some
>>> usable
>>> > fashion within the next several weeks. This means that if a slab is out
>>> of
>>> > memory and under pressure, memcached will decide if it can pull memory
>>> > from another slab class to satisfy that need. As the size of your items
>>> > change over time, it will thus try to compensate.
>>>
>>> That's good to hear.
>>>
>>>hp
>>>

Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Les Mikesell

Adam Lee wrote:

well, it depends on what you mean by scalability... i'm personally of
the opinion that traditional sessions should be avoided if you want to
truly scale.


And yet, everyone wants dynamic pages custom-generated to the user's 
preferences.  So how do you reconcile that?  You can help things a bit by 
splitting pages into iframe/image components that do/don't need sessions, and 
you can make the client do more of the work by sending back values in cookies 
instead of just the session key, but I'm not sure how far you can go.


--
  Les Mikesell
   lesmikes...@gmail.com




Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Adam Lee
well, it depends on what you mean by scalability... i'm personally of
the opinion that traditional sessions should be avoided if you want to
truly scale.


On Sunday, March 14, 2010, Martin Grotzke  wrote:
> On Sun, Mar 14, 2010 at 5:37 PM, Les Mikesell  wrote:
>
> What about tomcat's ClusterManager?  Doesn't that provide replication across 
> server instances?
>
> Yes, but the DeltaManager does an all-to-all replication 
> which limits scalability. The BackupManager does a replication to another 
> tomcat (according to the docs this is not that much tested), and this 
> requires special configuration in the load balancer to support this.
>
> And both are using java serialization - for the memcached-session-manager I 
> implemented xml based serialization that allows easier code upgrades. Of 
> course you still need to think about code changes that affect classes stored 
> in the session, but removing fields is easier to support than with java 
> serialization.
>
> Cheers,
> Martin
>
> --
>   Les Mikesell
>    lesmikes...@gmail.com
>
>
>
> --
> Martin Grotzke
> http://www.javakaffee.de/blog/
>

-- 
awl


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Martin Grotzke
On Sun, Mar 14, 2010 at 5:37 PM, Les Mikesell  wrote:
>
> What about tomcat's ClusterManager?  Doesn't that provide replication
> across server instances?

Yes, but the DeltaManager does an all-to-all replication, which limits
scalability. The BackupManager replicates to another tomcat
(according to the docs this is not that well tested), and it requires
special configuration in the load balancer.

And both use Java serialization; for the memcached-session-manager I
implemented XML-based serialization, which allows easier code upgrades. Of
course you still need to think about code changes that affect classes stored
in the session, but removing fields is easier to support than with Java
serialization.

Cheers,
Martin


>
>
> --
>  Les Mikesell
>   lesmikes...@gmail.com
>
>


-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Les Mikesell

Martin Grotzke wrote:


Tomcat sounds like such a pisser :P Even with your backup thing I'd
probably still add an option to allow it to journal to a database, and I
say this knowing how to get every last ounce of efficiency out of
memcached.

Tomcat provides a PersistentManager ([1]) which allows sessions to be stored 
in the database. But this manager backs up all sessions in batches every 
10 seconds. For one thing, scalability of the application is then 
directly dependent on the database (more than it already is if a 
database is used), and there's a timeframe in which sessions can be lost. If 
the session backup frequency is shortened, the database is hit more 
often. Additionally, sessions are stored in the database again and again 
even if they were not changed at all. That was the reason why I decided 
not to use this.


What about tomcat's ClusterManager?  Doesn't that provide replication across 
server instances?


--
  Les Mikesell
   lesmikes...@gmail.com



Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread moses wejuli
there seems to be a general phobia out there of storing sessions in the DB
-- i know this coz i had it once, but overcame it by realizing (mentally and
through this forum) that we really should use memcached for what it's good
at and NOT as a persistent data store.

unless you have abysmally low/poor server specs, i think you shouldn't worry
about performance issues with regard to DB-based session handling -- enhanced
with memcached!

On 14 March 2010 13:27, Martin Grotzke wrote:

> Yes, what you described is similar to the issue with sessions that were not
> updated in memcached because they were only read by the application.
>
> Still, I think if there's enough memory for all active sessions only
> sessions should be dropped that are in fact expired. For this a simplified
> slab configuration (one slab for all sessions) would be helpful AFAICS.
>
> Cheers,
> Martin
>
>
> On Sun, Mar 14, 2010 at 8:54 AM, Peter J. Holzer  wrote:
>
>> On 2010-03-12 17:07:25 -0800, dormando wrote:
>> > Now, it should be obvious that if a user session has reached a point
>> where
>> > it would be evicted early, it is because you did not have enough memory
>> to
>> > store *all active sessions anyway*. The odds of it evicting someone who
>> > has visited your site *after* me are highly unlikely. The longer I stay
>> > off the site, the higher the odds of it being evicted early due to lack
>> of
>> > memory.
>> >
>> > This does mean, by way of painfully describing how an LRU works, that
>> the
>> > odds of you finding sessions in memcached which have not been expired,
>> but
>> > are being evicted from the LRU earlier than expired sessions, is very
>> > unlikely.
>> [...]
>> > The caveat is that memcached has one LRU per slab class.
>> >
>> > So, lets say your traffic ends up looking like:
>> >
>> > - For the first 10,000 sessions, they are all 200 kilobytes. This ends
>> up
>> > having memcached allocate all of its slab memory toward something that
>> > will fit 200k items.
>> > - You get linked from the frontpage of digg.com and suddenly you have a
>> > bunch of n00bass users hitting your site. They have smaller sessions
>> since
>> > they are newbies. 10k items.
>> > - Memcached has only reserved 1 megabyte toward 10k items. So now all of
>> > your newbies share a 1 megabyte store for sessions, instead of 200
>> > megabytes.
>>
>> There's another caveat (I think Martin may have been referring to this
>> scenario, but he wasn't very clear):
>>
>>
>> Suppose you have two kinds of entries in your memcached, with different
>> expire times. For example, in addition to your sessions with 3600s, you
>> have some alert box with an expiry time of 60s. By chance,
>> both items are approximately the same size and occupy the same slab
>> class(es).
>>
>> You have enough memory to keep all sessions for 3600 seconds and enough
>> memory to keep all alert boxes for 60 seconds. But you don't have enough
>> memory to keep all alert boxes for 3600 seconds (why should you, they
>> expire
>> after 60 seconds).
>>
>> Now, when you walk the LRU chain, the search for expired items will only
>> return expired alert boxes which are about as old as your oldest session.
>> As soon as there are 50 (not yet expired) sessions older than the oldest
>> (expired) alert box, you will evict a session although you still have a
>> lot of expired alert boxes which you could reuse.
>>
>> The only workaround for this problem I can see is to use different
>> memcached servers for items of (wildly) different expiration times.
>>
>> > However the slab out of balance thing is a real fault of ours. It's a
>> > project on my plate to have automated slab rebalancing done in some
>> usable
>> > fashion within the next several weeks. This means that if a slab is out
>> of
>> > memory and under pressure, memcached will decide if it can pull memory
>> > from another slab class to satisfy that need. As the size of your items
>> > change over time, it will thus try to compensate.
>>
>> That's good to hear.
>>
>>hp
>>
>> --
>>   _  | Peter J. Holzer| Openmoko has already embedded
>> |_|_) | Sysadmin WSR   | voting system.
>> | |   | h...@hjp.at | Named "If you want it -- write it"
>> __/   | http://www.hjp.at/ |  -- Ilja O. on commun...@lists.openmoko.org
>>
>>
>>
>
>
> --
> Martin Grotzke
> http://www.javakaffee.de/blog/
>


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Martin Grotzke
Yes, what you described is similar to the issue with sessions that were not
updated in memcached because they were only read by the application.

Still, I think that if there's enough memory for all active sessions, only
sessions that are in fact expired should be dropped. For this a simplified
slab configuration (one slab for all sessions) would be helpful AFAICS.

Cheers,
Martin


On Sun, Mar 14, 2010 at 8:54 AM, Peter J. Holzer  wrote:

> On 2010-03-12 17:07:25 -0800, dormando wrote:
> > Now, it should be obvious that if a user session has reached a point
> where
> > it would be evicted early, it is because you did not have enough memory
> to
> > store *all active sessions anyway*. The odds of it evicting someone who
> > has visited your site *after* me are highly unlikely. The longer I stay
> > off the site, the higher the odds of it being evicted early due to lack
> of
> > memory.
> >
> > This does mean, by way of painfully describing how an LRU works, that the
> > odds of you finding sessions in memcached which have not been expired,
> but
> > are being evicted from the LRU earlier than expired sessions, is very
> > unlikely.
> [...]
> > The caveat is that memcached has one LRU per slab class.
> >
> > So, lets say your traffic ends up looking like:
> >
> > - For the first 10,000 sessions, they are all 200 kilobytes. This ends up
> > having memcached allocate all of its slab memory toward something that
> > will fit 200k items.
> > - You get linked from the frontpage of digg.com and suddenly you have a
> > bunch of n00bass users hitting your site. They have smaller sessions
> since
> > they are newbies. 10k items.
> > - Memcached has only reserved 1 megabyte toward 10k items. So now all of
> > your newbies share a 1 megabyte store for sessions, instead of 200
> > megabytes.
>
> There's another caveat (I think Martin may have been referring to this
> scenario, but he wasn't very clear):
>
>
> Suppose you have two kinds of entries in your memcached, with different
> expire times. For example, in addition to your sessions with 3600s, you
> have some alert box with an expiry time of 60s. By chance,
> both items are approximately the same size and occupy the same slab
> class(es).
>
> You have enough memory to keep all sessions for 3600 seconds and enough
> memory to keep all alert boxes for 60 seconds. But you don't have enough
> memory to keep all alert boxes for 3600 seconds (why should you, they
> expire
> after 60 seconds).
>
> Now, when you walk the LRU chain, the search for expired items will only
> return expired alert boxes which are about as old as your oldest session.
> As soon as there are 50 (not yet expired) sessions older than the oldest
> (expired) alert box, you will evict a session although you still have a
> lot of expired alert boxes which you could reuse.
>
> The only workaround for this problem I can see is to use different
> memcached servers for items of (wildly) different expiration times.
>
> > However the slab out of balance thing is a real fault of ours. It's a
> > project on my plate to have automated slab rebalancing done in some
> usable
> > fashion within the next several weeks. This means that if a slab is out
> of
> > memory and under pressure, memcached will decide if it can pull memory
> > from another slab class to satisfy that need. As the size of your items
> > change over time, it will thus try to compensate.
>
> That's good to hear.
>
>hp
>
> --
>   _  | Peter J. Holzer| Openmoko has already embedded
> |_|_) | Sysadmin WSR   | voting system.
> | |   | h...@hjp.at | Named "If you want it -- write it"
> __/   | http://www.hjp.at/ |  -- Ilja O. on commun...@lists.openmoko.org
>
>
>


-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Martin Grotzke
Ok, I'll have a look at TokyoTyrant. Do you have numbers comparing the write
performance of memcached and tt?

Cheers,
Martin


On Sun, Mar 14, 2010 at 2:20 AM, Adam Lee  wrote:

> If your goal is only to make memcached into a reliable datastore, then I
> think you are perhaps going about it in the wrong way.  The memcached server
> is extremely well written and tuned and does it's job incredibly well and
> very efficiently.  If you want to ensure that it is deterministic, I think
> that you should do your code on the client side rather than on the server
> side.
>
> We, for example, did some work in the past to store our user data (we don't
> really use sessions in the traditional sense of the word, but this is probably
> the closest thing we have outside of our cookie) in memcached because the
> load on our primary database was just too high.  In order to make it
> deterministic, we wrote our own client and did a special setup.
>
> We had several servers (started with 3, ended up growing it to 5 before we
> replaced it with TokyoTyrant) that had identical configurations, such that
> each server had more than enough memory to fit the entire dataset.  We then
> wrote a client that had the following behavior:
>
> - Writes were sent to every server
> - All updates to the database had to also be written to memcached in order
> to be considered a success
> - Reads were performed on a randomly selected server
>
> We also wrote a populate-user-cache script that could fill a new server
> with the required data. Since we have about 30 million users, this job took
> quite a while, so we also built in the idea of an "is populated" flag.  This
> flag would not be set by the populate script until it was totally finished
> replicating the data.  The client code was written such that it could write
> to a server that didn't have the "is populated" flag, but would never read
> from it.  This meant that we could bring up new servers and they would be
> populated with new data, but only would be used once they were accurate (the
> populate-user-cache script only issued add commands, making sure that it
> didn't clobber any data being written by actual traffic).
>
> One of the key features of this setup was that every server had the full
> dataset-- this meant that we could build a page that needed data for, say,
> 500 users and load it with almost no more latency than needed to get the
> data for one user because of how well memcached handles multi-gets.
>
> We don't use this setup anymore because we moved to using TokyoTyrant as
> our persistent cache layer, but I will say that it worked pretty much
> flawlessly for about two years.  There was no way that our database would
> have been able to handle the necessary read load, but these servers
> performed exceedingly well-- easily handling over 30,000+ gets per second.
>
> Anyway, I think that building something similar might do a much better job
> of performing the task you're attempting.  The key thing to recognize is
> that memcached is built to do a specific task and it's _GREAT_ at it, so you
> should use it for what it does best. Let me know if any of this doesn't make
> sense to you or if you have any further questions.
>
> --
> awl
>
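
Sketched below is the client behavior Adam describes above (every write goes
to all nodes, reads come from a random fully-populated node); spymemcached is
assumed and all class/field names are invented for illustration, including the
boolean that stands in for the "is populated" flag set by the hypothetical
populate-user-cache script:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import net.spy.memcached.MemcachedClient;

public class ReplicatedUserCache {

    // One entry per memcached node; "populated" mirrors the flag the
    // populate script would set once the full dataset has been copied over.
    public static class Node {
        final MemcachedClient client;
        volatile boolean populated;

        public Node(MemcachedClient client, boolean populated) {
            this.client = client;
            this.populated = populated;
        }
    }

    private final List<Node> nodes;
    private final Random random = new Random();

    public ReplicatedUserCache(List<Node> nodes) {
        this.nodes = nodes;
    }

    // Writes go to every node, including nodes that are still being populated.
    public void set(String key, int ttlSeconds, Object value) {
        for (Node node : nodes) {
            node.client.set(key, ttlSeconds, value);
        }
    }

    // Reads pick a random node among those holding the full dataset.
    public Object get(String key) {
        List<Node> ready = new ArrayList<>();
        for (Node node : nodes) {
            if (node.populated) {
                ready.add(node);
            }
        }
        if (ready.isEmpty()) {
            return null;
        }
        return ready.get(random.nextInt(ready.size())).client.get(key);
    }
}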



-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Adam Lee
i find that people tend to do a lot of mental gymnastics to come up
with "what if?" scenarios for memcached (particularly in regard to
node failures and data integrity) and, while they're technically
possible, they very, very rarely happen in the wild. for this one,
i'll just say that memcached does a great job with its slab allocation
and, unless you're running with way too little memory, you'll not very
often see items evicted before their expiration time.

that said, memcached is a cache and should be treated as such unless
you jump through hoops to make it more deterministic (e.g. what i
described in my most recent mail to the list)

On Sunday, March 14, 2010, Peter J. Holzer  wrote:
> On 2010-03-12 17:07:25 -0800, dormando wrote:
>> Now, it should be obvious that if a user session has reached a point where
>> it would be evicted early, it is because you did not have enough memory to
>> store *all active sessions anyway*. The odds of it evicting someone who
>> has visited your site *after* me are highly unlikely. The longer I stay
>> off the site, the higher the odds of it being evicted early due to lack of
>> memory.
>>
>> This does mean, by way of painfully describing how an LRU works, that the
>> odds of you finding sessions in memcached which have not been expired, but
>> are being evicted from the LRU earlier than expired sessions, is very
>> unlikely.
> [...]
>> The caveat is that memcached has one LRU per slab class.
>>
>> So, lets say your traffic ends up looking like:
>>
>> - For the first 10,000 sessions, they are all 200 kilobytes. This ends up
>> having memcached allocate all of its slab memory toward something that
>> will fit 200k items.
>> - You get linked from the frontpage of digg.com and suddenly you have a
>> bunch of n00bass users hitting your site. They have smaller sessions since
>> they are newbies. 10k items.
>> - Memcached has only reserved 1 megabyte toward 10k items. So now all of
>> your newbies share a 1 megabyte store for sessions, instead of 200
>> megabytes.
>
> There's another caveat (I think Martin may have been referring to this
> scenario, but he wasn't very clear):
>
>
> Suppose you have two kinds of entries in your memcached, with different
> expire times. For example, in addition to your sessions with 3600s, you
> have some alert box with an expiry time of 60s. By chance,
> both items are approximately the same size and occupy the same slab
> class(es).
>
> You have enough memory to keep all sessions for 3600 seconds and enough
> memory to keep all alert boxes for 60 seconds. But you don't have enough
> memory to keep all alert boxes for 3600 seconds (why should you, they expire
> after 60 seconds).
>
> Now, when you walk the LRU chain, the search for expired items will only
> return expired alert boxes which are about as old as your oldest session.
> As soon as there are 50 (not yet expired) sessions older than the oldest
> (expired) alert box, you will evict a session although you still have a
> lot of expired alert boxes which you could reuse.
>
> The only workaround for this problem I can see is to use different
> memcached servers for items of (wildly) different expiration times.
>
>> However the slab out of balance thing is a real fault of ours. It's a
>> project on my plate to have automated slab rebalancing done in some usable
>> fashion within the next several weeks. This means that if a slab is out of
>> memory and under pressure, memcached will decide if it can pull memory
>> from another slab class to satisfy that need. As the size of your items
>> change over time, it will thus try to compensate.
>
> That's good to hear.
>
>         hp
>
> --
>    _  | Peter J. Holzer    | Openmoko has already embedded
> |_|_) | Sysadmin WSR       | voting system.
> | |   | h...@hjp.at         | Named "If you want it -- write it"
> __/   | http://www.hjp.at/ |  -- Ilja O. on commun...@lists.openmoko.org
>

-- 
awl


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-14 Thread Martin Grotzke
On Sun, Mar 14, 2010 at 1:10 AM, dormando  wrote:
>
> > That's right for the normal case. However, for the
> > memcached-session-manager I just implemented a feature so that sessions are
> > only sent to memcached for backup if session data was modified. To prevent
> > expiration of the session in memcached, a background thread is updating
> > sessions in memcached that would expire ~20 seconds later in memcached (if
> > there would be some kind of "touch" operation I'd just use
> > that one). When they are updated, they are set with an expiration of the
> > remaining expiration time in tomcat. This would introduce the issue, that
> > the update would push them to the head of the LRU, but
> > their expiration might be e.g. only 5 minutes. So they would be expired
> > but won't be reached when the 50 items are checked for expiration.
>
> Are sessions modified on every hit? or just fetched on every hit?

Nope, basically it's like this: on every request (with an associated
session) the session is served from local memory (the jvm-local session
map). When the request is finished, the session is sent to memcached (with
the session timeout as expiration, e.g. 3600). It's still held in the local
session map; memcached is just there as a kind of backup device. The session
is only pulled from memcached if the tomcat originally serving it
(say tomcat1) died and another tomcat (tomcat2) is therefore asked to serve
the session. tomcat2 then does not have a session for the requested
sessionId and therefore loads it from memcached.
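
A minimal sketch of that flow (not the actual memcached-session-manager code;
the class and helper names are made up, and spymemcached is assumed as the
client):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    // Illustration only: keep the session in a local map, push a copy to
    // memcached when the request has finished, and fall back to memcached
    // when this tomcat does not know the requested session id (failover).
    public class SessionBackupSketch {

        private final MemcachedClient memcached;
        private final Map<String, byte[]> localSessions = new ConcurrentHashMap<>();
        private final int sessionTimeoutSeconds = 3600; // tomcat session timeout

        public SessionBackupSketch(String nodes) throws IOException {
            List<InetSocketAddress> addrs = AddrUtil.getAddresses(nodes);
            this.memcached = new MemcachedClient(addrs);
        }

        // called when a request has finished
        public void afterRequest(String sessionId, byte[] serializedSession) {
            localSessions.put(sessionId, serializedSession);
            memcached.set(sessionId, sessionTimeoutSeconds, serializedSession);
        }

        // called when a request arrives for a session id this tomcat does not
        // know, e.g. because the tomcat that originally served it died
        public byte[] loadSession(String sessionId) {
            byte[] local = localSessions.get(sessionId);
            if (local != null) {
                return local;
            }
            return (byte[]) memcached.get(sessionId); // failover path
        }
    }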

The case that I was describing before (sending only modified sessions to
memcached) is a performance optimization: the assumption is that there are
more requests that just access the session than requests that actually
modify it. So the idea is that I don't have to update a session in memcached
that was accessed but not modified. The issue that then had to be handled
was the case of differing timeouts/expiration times:
- a session was first created/updated and stored in memcached: both in
tomcat and memcached it has an expiration of 1h
- this session is accessed 10 minutes later; as it was not modified it is
not updated in memcached. Then this session has an expiration of 1 hour
again in tomcat, but in memcached it's already 10 minutes old, so it would
expire 50 minutes later.

To prevent this premature expiration I used the mentioned background thread
to update the expiration of such items in memcached to the remaining
expiration time they have in tomcat. In the example above nearly 50 minutes
later the session would be updated with an expiration time of 10 minutes.
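
A sketch of that refresh thread, under the assumption that the manager tracks
per session when its memcached copy expires and when tomcat last saw it (the
TrackedSession class is made up for this example; spymemcached is used as the
client):

    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import net.spy.memcached.MemcachedClient;

    // Sessions that were accessed but not modified (and therefore not re-set in
    // memcached) get their memcached copy re-stored with the expiration they
    // still have left in tomcat, shortly before the old copy would expire.
    public class SessionRefreshSketch {

        static class TrackedSession {            // hypothetical bookkeeping
            String id;
            byte[] data;
            long memcachedExpiresAtMillis;       // when the memcached copy expires
            long lastAccessedMillis;             // last access seen by tomcat
            int maxInactiveSeconds = 3600;
        }

        private final MemcachedClient memcached;
        private final Map<String, TrackedSession> sessions;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public SessionRefreshSketch(MemcachedClient memcached,
                                    Map<String, TrackedSession> sessions) {
            this.memcached = memcached;
            this.sessions = sessions;
            scheduler.scheduleAtFixedRate(this::refreshExpiringBackups, 10, 10, TimeUnit.SECONDS);
        }

        private void refreshExpiringBackups() {
            long now = System.currentTimeMillis();
            for (TrackedSession s : sessions.values()) {
                long tomcatExpiry = s.lastAccessedMillis + s.maxInactiveSeconds * 1000L;
                // memcached copy is about to expire while tomcat still keeps the session
                if (s.memcachedExpiresAtMillis - now < 20_000 && tomcatExpiry > now) {
                    int remainingSeconds = (int) ((tomcatExpiry - now) / 1000);
                    memcached.set(s.id, remainingSeconds, s.data);
                    s.memcachedExpiresAtMillis = now + remainingSeconds * 1000L;
                }
            }
        }
    }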



> I don't
> see how this would be different since active sessions are still getting
> put to the front of the LRU.

This would be true if the application read them from memcached - which is not
the case here.


> When you modify a session isn't the
> expiration time of that session extended usually?

Yes, when accessed a session gets another hour on this earth (or whatever the
session timeout is).


> Honestly, if you have a background thread that can already find sessions
> when they're about to expire, why not issue a DELETE to memcached when
> they do expire? Or even a GET after they expire :)
>
That's already done: sessions that expire in tomcat are deleted from
memcached.

The issue I described regarding sessions that were only accessed by the
application but not modified, and therefore not updated in memcached, was
the following: when such a session (session A) is updated in memcached (just
before it would expire there) with a new expiration time of, say, 10 minutes
(the time it has left in tomcat), it will be pushed to the head of the LRU.
Another session (session B) might have been modified just 20 minutes earlier
and sent to memcached with an expiration of 60 minutes; this one will be
closer to the tail of the LRU than session A, even though session B still
has 40 minutes to live - 30 minutes more than session A. And because it is
closer to the tail, session B might be dropped before session A, even if
session A has already expired by then.

However, this would only be an issue if there are too many sessions for the
available memory of a slab.

[...]

> > In my case this is not that much an issue (users won't get logged out),
> > as sessions are served from local memory. Sessions are only in memcached for
> > the purpose of session failover. So this restart could be
> > done when operations could be *sure* that no tomcat will die.
>
> Tomcat sounds like such a pisser :P Even with your backup thing I'd
> probably still add an option to allow it to journal to a database, and I
> say this knowing how to get every last ounce of efficiency out of
> memcached.
>
Tomcat provides a PersistentManager ([1]) which allows storing sessions in
the database. But this manager backs up all sessions in batches every 10
seconds. For one thing, scalability of the application

Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-13 Thread Peter J. Holzer
On 2010-03-12 17:07:25 -0800, dormando wrote:
> Now, it should be obvious that if a user session has reached a point where
> it would be evicted early, it is because you did not have enough memory to
> store *all active sessions anyway*. The odds of it evicting someone who
> has visited your site *after* me are highly unlikely. The longer I stay
> off the site, the higher the odds of it being evicted early due to lack of
> memory.
> 
> This does mean, by way of painfully describing how an LRU works, that the
> odds of you finding sessions in memcached which have not been expired, but
> are being evicted from the LRU earlier than expired sessions, is very
> unlikely.
[...]
> The caveat is that memcached has one LRU per slab class.
> 
> So, lets say your traffic ends up looking like:
> 
> - For the first 10,000 sessions, they are all 200 kilobytes. This ends up
> having memcached allocate all of its slab memory toward something that
> will fit 200k items.
> - You get linked from the frontpage of digg.com and suddenly you have a
> bunch of n00bass users hitting your site. They have smaller sessions since
> they are newbies. 10k items.
> - Memcached has only reserved 1 megabyte toward 10k items. So now all of
> your newbies share a 1 megabyte store for sessions, instead of 200
> megabytes.

There's another caveat (I think Martin may have been referring to this
scenario, but he wasn't very clear):


Suppose you have two kinds of entries in your memcached, with different
expire times. For example, in addition to your sessions with 3600s, you
have some alert box with an expiry time of 60s. By chance,
both items are approximately the same size and occupy the same slab
class(es).

You have enough memory to keep all sessions for 3600 seconds and enough
memory to keep all alert boxes for 60 seconds. But you don't have enough
memory to keep all alert boxes for 3600 seconds (why should you, they expire
after 60 seconds). 

Now, when you walk the LRU chain, the search for expired items will only
return expired alert boxes which are about as old as your oldest session.
As soon as there are 50 (not yet expired) sessions older than the oldest
(expired) alert box, you will evict a session although you still have a
lot of expired alert boxes which you could reuse.

The only workaround for this problem I can see is to use different
memcached servers for items of (wildly) different expiration times.
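
A sketch of that workaround with two client instances, one per pool (the
hostnames are made up; spymemcached is used as an example client):

    import java.io.IOException;
    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    // Long-lived sessions and short-lived alert boxes go to separate memcached
    // instances, so they never compete in the same LRU / slab class.
    public class SplitPools {
        public static void main(String[] args) throws IOException {
            MemcachedClient sessionPool =
                    new MemcachedClient(AddrUtil.getAddresses("sess1:11211 sess2:11211"));
            MemcachedClient alertPool =
                    new MemcachedClient(AddrUtil.getAddresses("alerts1:11211"));

            sessionPool.set("session:abc123", 3600, "serialized session data");
            alertPool.set("alertbox:user42", 60, "rendered alert html");

            sessionPool.shutdown();
            alertPool.shutdown();
        }
    }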

> However the slab out of balance thing is a real fault of ours. It's a
> project on my plate to have automated slab rebalancing done in some usable
> fashion within the next several weeks. This means that if a slab is out of
> memory and under pressure, memcached will decide if it can pull memory
> from another slab class to satisfy that need. As the size of your items
> change over time, it will thus try to compensate.

That's good to hear.

hp

-- 
   _  | Peter J. Holzer| Openmoko has already embedded
|_|_) | Sysadmin WSR   | voting system.
| |   | h...@hjp.at | Named "If you want it -- write it"
__/   | http://www.hjp.at/ |  -- Ilja O. on commun...@lists.openmoko.org




Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-13 Thread Adam Lee
If your goal is only to make memcached into a reliable datastore, then I
think you are perhaps going about it in the wrong way.  The memcached server
is extremely well written and tuned and does its job incredibly well and
very efficiently.  If you want to ensure that it is deterministic, I think
that you should do your code on the client side rather than on the server
side.

We, for example, did some work in the past to store our user data (we don't
really use sessions in the traditional sense of the word, but this probably
the closest thing we have outside of our cookie) in memcached because the
load on our primary database was just too high.  In order to make it
deterministic, we wrote our own client and did a special setup.

We had several servers (started with 3, ended up growing it to 5 before we
replaced it with TokyoTyrant) that had identical configurations, such that
each server had more than enough memory to fit the entire dataset.  We then
wrote a client that had the following behavior:

- Writes were sent to every server
- All updates to the database had to also be written to memcached in order
to be considered a success
- Reads were performed on a randomly selected server

We also wrote a populate-user-cache script that could fill a new server with
the required data. Since we have about 30 million users, this job took quite
a while, so we also built in the idea of an "is populated" flag.  This flag
would not be set by the populate script until it was totally finished
replicating the data.  The client code was written such that it could write
to a server that didn't have the "is populated" flag, but would never read
from it.  This meant that we could bring up new servers and they would be
populated with new data, but only would be used once they were accurate (the
populate-user-cache script only issued add commands, making sure that it
didn't clobber any data being written by actual traffic).
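
A rough sketch of a client with that behavior (this is not Fotolog's actual
client; the "is_populated" flag key and the node handling are made up, and
spymemcached is used for the individual connections):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.ExecutionException;
    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    // Write to every node, read from one randomly chosen node, and only read
    // from nodes whose "is populated" flag has been set by the populate script.
    public class ReplicatedUserCacheSketch {

        private final List<MemcachedClient> nodes = new ArrayList<>();
        private final Random random = new Random();

        public ReplicatedUserCacheSketch(List<String> nodeAddresses) throws IOException {
            for (String addr : nodeAddresses) {
                nodes.add(new MemcachedClient(AddrUtil.getAddresses(addr)));
            }
        }

        // a write only counts as a success if every node accepted it
        public boolean put(String key, int exp, Object value)
                throws InterruptedException, ExecutionException {
            boolean ok = true;
            for (MemcachedClient node : nodes) {
                ok &= node.set(key, exp, value).get();
            }
            return ok;
        }

        // reads go to one randomly chosen node that has finished populating
        // (a real client would cache the flag instead of re-checking per read)
        public Object get(String key) {
            List<MemcachedClient> readable = new ArrayList<>();
            for (MemcachedClient node : nodes) {
                if (node.get("is_populated") != null) {
                    readable.add(node);
                }
            }
            if (readable.isEmpty()) {
                return null; // or fall back to the database
            }
            return readable.get(random.nextInt(readable.size())).get(key);
        }
    }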

One of the key features of this setup was that every server had the full
dataset-- this meant that we could build a page that needed data for, say,
500 users and load it with almost no more latency than needed to get the
data for one user because of how well memcached handles multi-gets.

We don't use this setup anymore because we moved to using TokyoTyrant as our
persistent cache layer, but I will say that it worked pretty much flawlessly
for about two years.  There was no way that our database would have been
able to handle the necessary read load, but these servers performed
exceedingly well-- easily handling 30,000+ gets per second.

Anyway, I think that building something similar might do a much better job
of performing the task you're attempting.  The key thing to recognize is
that memcached is built to do a specific task and it's _GREAT_ at it, so you
should use it for what it does best. Let me know if any of this doesn't make
sense to you or if you have any further questions.

-- 
awl


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-13 Thread dormando
> Cool. Would it be possible to make this number configurable via a cmd line 
> switch?

You really don't want to mess with this value. It will bring you the
absolute opposite results of what you expect. Memcached does this search
while holding a global mutex lock, so no other threads are able to access
the allocator at the same time. It must be very fast, and most people will
just crank it and grind the system to a halt, so I'm unsure if we will
make this configurable.

The number's just in a couple places in items.c if you wish to recompile
it and test, but again that's absolutely not a good idea if you want it to
perform.
  
> That's right for the normal case. However, for the memcached-session-manager 
> I just implemented a feature so that sessions are only sent to memcached for 
> backup if session data was modified. To prevent
> expiration of the session in memcached, a background thread is updating 
> sessions in memcached that would expire ~20 seconds later in memcached (if 
> there would be some kind of "touch" operation I'd just use
> that one). When they are updated, they are set with an expiration of the 
> remaining expiration time in tomcat. This would introduce the issue, that the 
> update would push them to the head of the LRU, but
> their expiration might be e.g. only 5 minutes. So they would be expired but 
> won't be reached when the 50 items are checked for expiration.

Are sessions modified on every hit? or just fetched on every hit? I don't
see how this would be different since active sessions are still getting
put to the front of the LRU. When you modify a session isn't the
expiration time of that session extended usually? Or do users get logged
out after they've run out of their 5 free hours of AOL?

Honestly, if you have a background thread that can already find sessions
when they're about to expire, why not issue a DELETE to memcached when
they do expire? Or even a GET after they expire :)

> That's true, this is the drawback that I'd have to accept.
>
> Do you see other disadvantages with this approach (e.g. performance wise)?

Shouldn't be any particular performance disadvantage, other than severely
reducing the effectiveness of your cache and caching fewer items overall
:P

> Yes, for this I (the memcached-session-manager) should provide some stats on 
> size distribution.
> Or does memcached already provide this information via its stats?

`stats items` and `stats slabs` provide many statistics on individual
slabs. You can monitor this to see if a particular slab size is having a
higher number of evictions and doesn't have enough slabs assigned to it.
As well as how many items are in each slab and all that junk.
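
For example, with the spymemcached client the same numbers can be pulled
programmatically (the per-class keys look like "items:<class>:number" and
"items:<class>:evicted"; the host is made up):

    import java.io.IOException;
    import java.net.SocketAddress;
    import java.util.Map;
    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    // Dump per-slab-class item counts and eviction counters so a slab class
    // under memory pressure (evictions rising) stands out.
    public class SlabStatsDump {
        public static void main(String[] args) throws IOException {
            MemcachedClient client =
                    new MemcachedClient(AddrUtil.getAddresses("localhost:11211"));

            // same data as sending "stats items" over telnet/nc
            Map<SocketAddress, Map<String, String>> itemStats = client.getStats("items");
            for (Map.Entry<SocketAddress, Map<String, String>> server : itemStats.entrySet()) {
                for (Map.Entry<String, String> stat : server.getValue().entrySet()) {
                    if (stat.getKey().endsWith(":evicted") || stat.getKey().endsWith(":number")) {
                        System.out.println(server.getKey() + " " + stat.getKey()
                                + "=" + stat.getValue());
                    }
                }
            }
            client.shutdown();
        }
    }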
  
> In my case this is not that much an issue (users won't get logged out), as 
> sessions are served from local memory. Session are only in memcached for the 
> purpose of session failover. So this restart could be
> done when operations could be *sure* that no tomcat will die.

Tomcat sounds like such a pisser :P Even with your backup thing I'd
probably still add an option to allow it to journal to a database, and I
say this knowing how to get every last ounce of efficiency out of
memcached.

-Dormando


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-13 Thread Martin Grotzke
Hi Carlos,

thanx for your answer!

I'm already using this option (-M - return error on memory exhausted (rather
than removing items)), it's working fine.
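
With -M a full cache refuses the write instead of evicting, so the application
has to check every store. A minimal sketch of that check (spymemcached assumed;
depending on the client version a refused store may surface as a false result
or as an exception):

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;
    import net.spy.memcached.MemcachedClient;

    // The server answers "SERVER_ERROR out of memory storing object" when -M is
    // set and the slab class is full, instead of evicting another item.
    public class StrictStoreSketch {

        private final MemcachedClient client;

        public StrictStoreSketch(MemcachedClient client) {
            this.client = client;
        }

        public void backupSession(String sessionId, byte[] data, int ttlSeconds)
                throws InterruptedException {
            Future<Boolean> result = client.set(sessionId, ttlSeconds, data);
            try {
                if (!result.get()) {
                    // cache full: log and skip the backup, the local copy still exists
                    System.err.println("session backup refused for " + sessionId);
                }
            } catch (ExecutionException e) {
                // some client versions report the server error as an exception instead
                System.err.println("session backup failed for " + sessionId + ": " + e.getMessage());
            }
        }
    }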

Cheers,
Martin


On Sat, Mar 13, 2010 at 3:03 PM, Carlos Alvarez  wrote:

> On Fri, Mar 12, 2010 at 8:56 PM, Martin Grotzke
>  wrote:
> > Hi Brian,
> > you're making a very clear point. However it would be nice if you'd
> provide
> > concrete answers to concrete questions. I want to get a better
> understanding
> > of memcached's memory model and I'm thankful for any help I'm getting
> here
> > on this list. If my intro was not supporting this please forgive me...
> > Cheers,
> > Martin
>
> Well, you asked "how do I ..." and the answer was "you can't". It
> sounds quite concrete to me. :-)
>
> Anyway, if you want to try (you'll face risk and your solution will be
> error prone, don't forget that) I remember lurking around the code and
> seeing an option of 'no evictions': ie when there is not enough memory
> the set/add fails. I don't know if it this option is fully functional,
> but the code is there.
>
>if (settings.evict_to_free == 0) {
>itemstats[id].outofmemory++;
>return NULL;
>}
>
> In this case, when you run out of memory to store sessions, you'll
> notice it on the 'overflow' session and not on an older one (I would
> prefer that).
>
> Anyway, remember that evictions are not the only cause of not being
> able to retrieve previously stored items.
>
>
>
> Carlos.
>



-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-13 Thread Martin Grotzke
Hi Dormando,

thanx for this great explanation!

On Sat, Mar 13, 2010 at 2:07 AM, dormando  wrote:
>
> - On read, if a key is past its expiry time, return its memory to the slab
> pool and return NOT FOUND
> - On write, try to allocate new memory:
>  * From the slab's pool of free memory.
>  * ... and if the slab is full, add another megabyte from the main pool
>  * ... if the main pool is fully assigned out to slab classes, find
>something to evict.
>  * for that slab class, look at the LRU tail (oldest items in the cache)
>  * walk up the tail looking for *items that are already expired*
>  * after 50 items have been examined, if no expired items are found,
>
Cool. Would it be possible to make this number configurable via a cmd line
switch?


>  * walk the list again, and *expire* the first thing you find
> (I'm leaving out refcount locks for sake of this discussion)
>

[...]


> This does mean, by way of painfully describing how an LRU works, that the
> odds of you finding sessions in memcached which have not been expired, but
> are being evicted from the LRU earlier than expired sessions, is very
> unlikely.
>
That's right for the normal case. However, for the memcached-session-manager
I just implemented a feature so that sessions are only sent to memcached for
backup if session data was modified. To prevent expiration of the session in
memcached, a background thread is updating sessions that would expire ~20
seconds later in memcached (if there were some kind of "touch" operation I'd
just use that one). When they are updated, they are set with an expiration
of the remaining expiration time in tomcat. This would introduce the issue
that the update would push them to the head of the LRU, but their expiration
might be e.g. only 5 minutes. So they would be expired but won't be reached
when the 50 items are checked for expiration.

[...]

> So, lets say your traffic ends up looking like:
>
> - For the first 10,000 sessions, they are all 200 kilobytes. This ends up
> having memcached allocate all of its slab memory toward something that
> will fit 200k items.
> - You get linked from the frontpage of digg.com and suddenly you have a
> bunch of n00bass users hitting your site. They have smaller sessions since
> they are newbies. 10k items.
> - Memcached has only reserved 1 megabyte toward 10k items. So now all of
> your newbies share a 1 megabyte store for sessions, instead of 200
> megabytes.
>
> If you set the minimum slab size to 200k, you unify the memory so the
> largest pool of memory is always available for your users. However, you
> drastically raise the memory overhead for users with 10k sessions. 90%+
> overhead.

That's true, this is the drawback that I'd have to accept.

Do you see other disadvantages with this approach (e.g. performance wise)?



> In reality when you start a memcached instance and throw traffic
> at it, sessions will either be very close together in size, or vary by
> some usable distribution.
>
Yes, for this I (the memcached-session-manager) should provide some stats on
size distribution.
Or does memcached already provide this information via its stats?


> If it ends up changing over time, the only present workaround is to
> restart memcached. This sucks since it'll end up logging your users out.
>
In my case this is not that much an issue (users won't get logged out), as
sessions are served from local memory. Sessions are only in memcached for the
purpose of session failover. So this restart could be done when operations
can be *sure* that no tomcat will die.

[...]

> However the slab out of balance thing is a real fault of ours. It's a
> project on my plate to have automated slab rebalancing done in some usable
> fashion within the next several weeks. This means that if a slab is out of
> memory and under pressure, memcached will decide if it can pull memory
> from another slab class to satisfy that need. As the size of your items
> change over time, it will thus try to compensate.
>
Yeah, great!

Cheers,
Martin


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-13 Thread Carlos Alvarez
On Fri, Mar 12, 2010 at 8:56 PM, Martin Grotzke
 wrote:
> Hi Brian,
> you're making a very clear point. However it would be nice if you'd provide
> concrete answers to concrete questions. I want to get a better understanding
> of memcached's memory model and I'm thankful for any help I'm getting here
> on this list. If my intro was not supporting this please forgive me...
> Cheers,
> Martin

Well, you asked "how do I ..." and the answer was "you can't". It
sounds quite concrete to me. :-)

Anyway, if you want to try (you'll face risk and your solution will be
error prone, don't forget that) I remember lurking around the code and
seeing an option of 'no evictions': ie when there is not enough memory
the set/add fails. I don't know if this option is fully functional,
but the code is there.

if (settings.evict_to_free == 0) {
itemstats[id].outofmemory++;
return NULL;
}

In this case, when you run out of memory to store sessions, you'll
notice it on the 'overflow' session and not on an older one (I would
prefer that).

Anyway, remember that evictions are not the only cause of not being
able to retrieve previously stored items.



Carlos.


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Dustin

On Mar 12, 6:10 pm, Les Mikesell  wrote:
> So what happens when a key is repeatedly written and it grows a bit each time?
> I had trouble with that long ago with a berkeleydb version that I think was
> eventually fixed.  As things work now, if the new storage has to move to a
> larger block, is the old space immediately freed?

  Every write is to a new block, then the previous is made available
for the next write.  Size isn't relevant for this case.


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Les Mikesell

dormando wrote:



To be most accurate, it is "how many chunks will fit into the max item
size, which by default is 1mb". The page size being == to the max item
size is just due to how the slabbing algorithm works. It creates slab
classes between a minimum and a maximum. So the maximum ends up being the
item size limit.

I can see this changing in the future, where we have a "max item size" of
whatever, and a "page size" of 1mb, then larger items are made up of
concatenated smaller pages or individual malloc's.


So what happens when a key is repeatedly written and it grows a bit each time? 
I had trouble with that long ago with a berkeleydb version that I think was 
eventually fixed.  As things work now, if the new storage has to move to a 
larger block, is the old space immediately freed?


--
  Les Mikesell
   lesmikes...@gmail.com





Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread dormando
> The memory allocation is a bit more subtle... but it's hard to explain and
> doesn't really affect anyone.
>
> Urr... I'll give it a shot.
>
> ./memcached -m 128
> ^ means memcached can use up to 128 megabytes of memory for item storage.
>
> Now lets say you store items that will fit in a slab class of "128", which
> means 128 bytes for the key + flags + CAS + value.
>
> The maximum item size is (by default) 1 megabyte. This ends up being the
> ceiling for how big a slab "page" can be.
>
> 8192 128 byte items will fit inside the slab "page" limit of 1mb.
>
> So now a slab page of 1mb is allocated, and split up into 8192 "chunks".
> Each chunk can store a single item.
>
> Slabs grow at a default factor of 1.20 or whatever -f is. So the next slab
> class after 128 bytes will be 154 bytes (rounding up). (note I don't
> actually recall offhand if it rounds up or down :P)
>
> 154 bytes is not evenly divisible into 1048576. You end up with
> 6808.93 chunks. So instead memcached allocates a page of 1048432 bytes,
> providing 6,808 chunks.
>
> This is slightly smaller than 1mb! So as your chunks grow, they
> don't allocate exactly a megabyte from the main pool of "128 megabytes",
> then split that into chunks. Memcached attempts to leave the little scraps
> of memory in the main pool in hopes that they'll add up to an extra page
> down the line, rather than be thrown out as overhead when if a slab class
> were to allocate a 1mb page.
>
> So in memcached, a slab "page" is sized to hold however many chunks of the
> class's chunk size fit into 1mb, and each chunk stores a single item. The
> slab growth factor determines how many slab classes exist.
>
> I'm gonna turn this into a wiki entry in a few days... been slowly
> whittling away at revising the whole thing.

To be most accurate, it is "how many chunks will fit into the max item
size, which by default is 1mb". The page size being == to the max item
size is just due to how the slabbing algorithm works. It creates slab
classes between a minimum and a maximum. So the maximum ends up being the
item size limit.

I can see this changing in the future, where we have a "max item size" of
whatever, and a "page size" of 1mb, then larger items are made up of
concatenated smaller pages or individual malloc's.


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread dormando
> Some details here:
> http://dev.mysql.com/doc/refman/5.0/en/ha-memcached-using-memory.html
>
> thanx for this link. Some details in the text are confusing to me. It says:
>
>   When you start to store data into the cache, memcached does not 
> allocate the memory for the data on an item by item basis. Instead, a slab 
> allocation is used to optimize memory usage and
>   prevent memory fragmentation when information expires from the cache.
>
>
>   With slab allocation, memory is reserved in blocks of 1MB. The slab is 
> divided up into a number of blocks of equal size.
>
> Ok, blocks of equal size, all blocks have 1 MB (as said in the sentence 
> before).
>
>   When you try to store a value into the cache, memcached checks the size 
> of the value that you are adding to the cache and determines which slab 
> contains the right size allocation for the item.
>   If a slab with the item size already exists, the item is written to the 
> block within the slab.
>
> "written to the block within the slab" sounds, as if there's one block for 
> one slab?
>  
>
>
>   If the new item is bigger than the size of any existing blocks,
>
> I thought all blocks are 1MB in their size? Should this be "of any existing 
> slab"?
>  
>   then a new slab is created, divided up into blocks of a suitable size.
>
> Again, I thought blocks are 1MB, then what is a suitable size here?
>  
>   If an existing slab with the right block size already exists,
>
> Confusing again.
>  
>   but there are no free blocks, a new slab is created.
>
>
> In the second part of this documentation, the terms "page" and "chunk" are 
> used, but not related to "block". "block" is not used at all in the second 
> part. Can you clarify the meaning of block in this
> context and create a link to slab, page and chunk?
>
> Btw, I found "Slabs, Pages, Chunks and Memcached" ([1]) really well written 
> and easy to understand. Would say this explanation is complete?

The memory allocation is a bit more subtle... but it's hard to explain and
doesn't really affect anyone.

Urr... I'll give it a shot.

./memcached -m 128
^ means memcached can use up to 128 megabytes of memory for item storage.

Now lets say you store items that will fit in a slab class of "128", which
means 128 bytes for the key + flags + CAS + value.

The maximum item size is (by default) 1 megabyte. This ends up being the
ceiling for how big a slab "page" can be.

8192 128 byte items will fit inside the slab "page" limit of 1mb.

So now a slab page of 1mb is allocated, and split up into 8192 "chunks".
Each chunk can store a single item.

Slabs grow at a default factor of 1.20 or whatever -f is. So the next slab
class after 128 bytes will be 154 bytes (rounding up). (note I don't
actually recall offhand if it rounds up or down :P)

154 bytes is not evenly divisible into 1048576. You end up with
6808.93 chunks. So instead memcached allocates a page of 1048432 bytes,
providing 6,808 chunks.

This is slightly smaller than 1mb! So as your chunks grow, they
don't allocate exactly a megabyte from the main pool of "128 megabytes",
then split that into chunks. Memcached attempts to leave the little scraps
of memory in the main pool in hopes that they'll add up to an extra page
down the line, rather than be thrown out as overhead when if a slab class
were to allocate a 1mb page.

So in memcached, a slab "page" is sized to hold however many chunks of the
class's chunk size fit into 1mb, and each chunk stores a single item. The
slab growth factor determines how many slab classes exist.
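
The arithmetic above can be reproduced with a few lines (this mirrors the
description, not the exact items.c code; rounding of the grown chunk size may
differ slightly from the real implementation):

    // Chunk sizes grow by the factor (-f); each page holds as many whole chunks
    // of that size as fit under the 1mb limit, and the page is shrunk to
    // chunkSize * chunksPerPage, e.g. 154 * 6808 = 1048432 bytes.
    public class SlabClassMath {
        public static void main(String[] args) {
            final int maxItemBytes = 1024 * 1024; // 1mb default page / item size limit
            final double growthFactor = 1.20;     // the -f default used above
            int chunkSize = 128;                  // example starting chunk size

            while (chunkSize <= maxItemBytes) {
                int chunksPerPage = maxItemBytes / chunkSize;   // whole chunks only
                int pageSize = chunksPerPage * chunkSize;
                System.out.printf("chunk=%7d bytes  chunks/page=%5d  page=%8d bytes%n",
                        chunkSize, chunksPerPage, pageSize);
                chunkSize = (int) Math.ceil(chunkSize * growthFactor);
            }
        }
    }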

I'm gonna turn this into a wiki entry in a few days... been slowly
whittling away at revising the whole thing.

-Dormando


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Martin Grotzke
I'm trying to get a better understanding of how memcached works. I'm
starting with a simple example here to see how this could be handled. The
numbers are just examples, if I didn't mention this already. Once this
simple example is ok, more complexity gets added and the numbers change.
Still, I want to start simple and bottom-up instead of with a full-blown
you-can't-handle-this-at-all example. And I don't expect mathematically
exact numbers; I just want to get an idea from this.

Back to the topic: I created the memcached-session-manager (a tomcat session
manager keeping sessions in memcached just for backup / session failover,
they're still stored in memory for normal operations; see [1]) and I want to
find out how memcached should be "tuned" to provide the "best" results. Of
course this will be application specific and require some knowledge of
memcached. Still, I want to be able to give advice. And in the end it would
be totally ok for me if the result were s.th. like "use the
max-possible-session-size as the min slab size, multiply this by the
max-number-of-sessions and this gives you 10% of the memory you need to give
memcached".
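
As a back-of-the-envelope version of that rule of thumb (the numbers are the
ones from this thread; real memcached adds per-item overhead for key, flags
and headers, so some headroom on top is needed):

    // max-possible-session-size as the chunk size, times max-number-of-sessions
    public class SessionCacheSizing {
        public static void main(String[] args) {
            int maxSessionBytes = 200 * 1024;  // largest expected serialized session
            int maxSessions = 1000;            // expected concurrent sessions per node

            long bytesNeeded = (long) maxSessionBytes * maxSessions;
            System.out.printf("roughly %d MB of memcached memory (-m) for %d sessions of up to %d KB%n",
                    bytesNeeded / (1024 * 1024), maxSessions, maxSessionBytes / 1024);
        }
    }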

Thanx for your help,
cheers,
Martin


[1] http://code.google.com/p/memcached-session-manager/



On Sat, Mar 13, 2010 at 1:07 AM, Marc Bollinger wrote:

> Part of the disconnect is that, "how do I have to run memcached to
> 'store' these sessions in memcached," is not a concrete question. It's
> wibbly wobbly at best to try and achieve this behavior, and, "You
> can't," _is_ concrete in that there is no way to do this in a
> mathematically provable way. The best you're going to get is something
> that anecdotally works, contingent on the existence of roughly
> homogeneous object sizes, and that you're allocating enough memory to
> memcached. The scheme you described above (maybe tweak the growth
> factor downward to taste?) will probably work, but that assumes a
> limit of 1000 users, which is unrealistic for the scale that memcached
> was really designed for in the first place.
>
> - Marc
>
> On Fri, Mar 12, 2010 at 3:56 PM, Martin Grotzke
>  wrote:
> > Hi Brian,
> > you're making a very clear point. However it would be nice if you'd
> provide
> > concrete answers to concrete questions. I want to get a better
> understanding
> > of memcached's memory model and I'm thankful for any help I'm getting
> here
> > on this list. If my intro was not supporting this please forgive me...
> > Cheers,
> > Martin
> >
> > On Sat, Mar 13, 2010 at 12:27 AM, Brian Moon  wrote:
> >>
> >> The resounding answer you will get from this list is: You don't, can't
> and
> >> won't with memcached. That is not its job. It will never be its job.
> Perhaps
> >> when storage engines are done, maybe. But then you won't get the
> performance
> >> that you get with memcached. There is a trade off for performance.
> >>
> >> Brian.
> >> 
> >> http://brian.moonspot.net/
> >>
> >> On 3/12/10 3:02 PM, martin.grotzke wrote:
> >>>
> >>> Hi,
> >>>
> >>> I know that this topic is rather burdened, as it was said often enough
> >>> that memcached never was created to be used like a reliable datastore.
> >>> Still, there are users interested in some kind of reliability, users
> >>> that want to store items in memcached and be "sure" that these items
> >>> can be pulled from memcached as long as they are not expired.
> >>>
> >>> I read the following on memcached's memory management:
> >>>   "Memcached has two separate memory management strategies:
> >>> - On read, if a key is past its expiry time, return NOT FOUND.
> >>> - On write, choose an appropriate slab class for the value; if it's
> >>> full, replace the oldest-used (read or written) key with the new one.
> >>> Note that the second strategy, LRU eviction, does not check the expiry
> >>> time at all." (from "peeping into memcached", [1])
> >>>
> >>> I also found "Slabs, Pages, Chunks and Memcached" ([2]) a really good
> >>> explanation of memcached's memory model.
> >>>
> >>> Having this as background, I wonder how it would be possible to get
> >>> more predictability regarding the availability of cached items.
> >>>
> >>> Asume that I want to store sessions in memcached. How could I run
> >>> memcached so that I can be sure that my sessions are available in
> >>> memcached when I try to "get" them? Additionally asume, that I expect
> >>> to have 1000 sessions at a time in max in one memcached node (and that
> >>> I can control/limit this in my application). Another asumption is,
> >>> that sessions are between 50kb and 200 kb.
> >>>
> >>> The question now is how do I have to run memcached to "store" these
> >>> sessions in memcached?
> >>>
> >>> Would it be an option to run memcached with a minimum slab size of
> >>> 200kb. Then I would know that for each session a 200kb chunk is used.
> >>> When I have 1000 session between 50kb and 200kb this should take 200mb
> >>> in total. When I run memcached with more than 200mb memory, could I be
> >>> s

Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Martin Grotzke
Hi Les,

On Sat, Mar 13, 2010 at 12:29 AM, Les Mikesell wrote:

> On 3/12/2010 5:10 PM, Martin Grotzke wrote:
>
>> With my question "how do I have to run memcached to 'store' these
>> sessions in memcached" I was not referring to a general approach, but I
>> was referring to the concrete memcached options (e.g. -n 204800 for
>> 200kb slabs) to use.
>>
>> The post you mentioned is very high level and does not answer my
>> question. For this you should go into a little more depth.
>>
>
> Some details here:
> http://dev.mysql.com/doc/refman/5.0/en/ha-memcached-using-memory.html

thanx for this link. Some details in the text are confusing to me. It says:

When you start to store data into the cache, memcached does not allocate the
> memory for the data on an item by item basis. Instead, a slab allocation is
> used to optimize memory usage and prevent memory fragmentation when
> information expires from the cache.


> With slab allocation, memory is reserved in blocks of 1MB. The slab is
> divided up into a number of blocks of equal size.

Ok, blocks of equal size, all blocks have 1 MB (as said in the sentence
before).

When you try to store a value into the cache, memcached checks the size of
> the value that you are adding to the cache and determines which slab
> contains the right size allocation for the item. If a slab with the item
> size already exists, the item is written to the block within the slab.

"written to the block within the slab" sounds, as if there's one block for
one slab?


>
> If the new item is bigger than the size of any existing blocks,

I thought all blocks are 1MB in their size? Should this be "of any existing
slab"?


> then a new slab is created, divided up into blocks of a suitable size.

Again, I thought blocks are 1MB, then what is a suitable size here?


> If an existing slab with the right block size already exists,

Confusing again.


> but there are no free blocks, a new slab is created.


In the second part of this documentation, the terms "page" and "chunk" are
used, but not related to "block". "block" is not used at all in the second
part. Can you clarify the meaning of block in this context and create a link
to slab, page and chunk?

Btw, I found "Slabs, Pages, Chunks and Memcached" ([1]) really well written
and easy to understand. Would you say this explanation is complete?



>
> But it's still not very clear how much extra you need to make sure that the
> hash to a server/slab will always find free space instead of evicting
> something even though space is available elsewhere.
>
What do you mean by "hash to a server/slab"?
I'm selecting the memcached node an item goes to manually btw, without
hashing.
The selected memcached node is stored in a cookie, so that I know where to
get my session from.
Btw: my sessions are stored only for backup in memcached, they're still kept
in local memory for normal operations (see [2] for more details).
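
A sketch of that idea, encoding a node id into the session id (and thus into
the cookie) so another tomcat knows which memcached node holds the backup; the
"-n1" suffix format and the node map are made up for this example:

    import java.util.Map;
    import net.spy.memcached.MemcachedClient;

    // The node id becomes part of the session id / JSESSIONID cookie, so on
    // failover the new tomcat can ask exactly the node that has the backup.
    public class NodeIdSketch {

        private final Map<String, MemcachedClient> nodes; // e.g. "n1" -> client for node 1

        public NodeIdSketch(Map<String, MemcachedClient> nodes) {
            this.nodes = nodes;
        }

        public String tagSessionId(String plainSessionId, String nodeId) {
            return plainSessionId + "-" + nodeId;          // ends up in the cookie
        }

        public Object loadForFailover(String taggedSessionId) {
            String nodeId = taggedSessionId.substring(taggedSessionId.lastIndexOf('-') + 1);
            MemcachedClient node = nodes.get(nodeId);
            return node != null ? node.get(taggedSessionId) : null;
        }
    }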

Regarding the extra space: assume we have a minimum space allocated for
key+value+flags of 1mb.
Then I have only a single slab class and each chunk is going to take 1mb. So
I'd know that I could "store" <total memory> / 1mb items (item here is
key+value+flags) in memcached (e.g. with 1gb, I could store 1000 items).
Is there s.th. missing?

Thanx && cheers,
Martin


[1] http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/
[2] http://code.google.com/p/memcached-session-manager/


>
> --
>  Les Mikesell
>   lesmikes...@gmail.com
>



-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Marc Bollinger
Part of the disconnect is that, "how do I have to run memcached to
'store' these sessions in memcached," is not a concrete question. It's
wibbly wobbly at best to try and achieve this behavior, and, "You
can't," _is_ concrete in that there is no way to do this in a
mathematically provable way. The best you're going to get is something
that anecdotally works, contingent on the existence of roughly
homogeneous object sizes, and that you're allocating enough memory to
memcached. The scheme you described above (maybe tweak the growth
factor downward to taste?) will probably work, but that assumes a
limit of 1000 users, which is unrealistic for the scale that memcached
was really designed for in the first place.

- Marc

On Fri, Mar 12, 2010 at 3:56 PM, Martin Grotzke
 wrote:
> Hi Brian,
> you're making a very clear point. However it would be nice if you'd provide
> concrete answers to concrete questions. I want to get a better understanding
> of memcached's memory model and I'm thankful for any help I'm getting here
> on this list. If my intro was not supporting this please forgive me...
> Cheers,
> Martin
>
> On Sat, Mar 13, 2010 at 12:27 AM, Brian Moon  wrote:
>>
>> The resounding answer you will get from this list is: You don't, can't and
>> won't with memcached. That is not its job. It will never be its job. Perhaps
>> when storage engines are done, maybe. But then you won't get the performance
>> that you get with memcached. There is a trade off for performance.
>>
>> Brian.
>> 
>> http://brian.moonspot.net/
>>
>> On 3/12/10 3:02 PM, martin.grotzke wrote:
>>>
>>> Hi,
>>>
>>> I know that this topic is rather burdened, as it was said often enough
>>> that memcached never was created to be used like a reliable datastore.
>>> Still, there are users interested in some kind of reliability, users
>>> that want to store items in memcached and be "sure" that these items
>>> can be pulled from memcached as long as they are not expired.
>>>
>>> I read the following on memcached's memory management:
>>>   "Memcached has two separate memory management strategies:
>>> - On read, if a key is past its expiry time, return NOT FOUND.
>>> - On write, choose an appropriate slab class for the value; if it's
>>> full, replace the oldest-used (read or written) key with the new one.
>>> Note that the second strategy, LRU eviction, does not check the expiry
>>> time at all." (from "peeping into memcached", [1])
>>>
>>> I also found "Slabs, Pages, Chunks and Memcached" ([2]) a really good
>>> explanation of memcached's memory model.
>>>
>>> Having this as background, I wonder how it would be possible to get
>>> more predictability regarding the availability of cached items.
>>>
>>> Asume that I want to store sessions in memcached. How could I run
>>> memcached so that I can be sure that my sessions are available in
>>> memcached when I try to "get" them? Additionally asume, that I expect
>>> to have 1000 sessions at a time in max in one memcached node (and that
>>> I can control/limit this in my application). Another asumption is,
>>> that sessions are between 50kb and 200 kb.
>>>
>>> The question now is how do I have to run memcached to "store" these
>>> sessions in memcached?
>>>
>>> Would it be an option to run memcached with a minimum slab size of
>>> 200kb. Then I would know that for each session a 200kb chunk is used.
>>> When I have 1000 session between 50kb and 200kb this should take 200mb
>>> in total. When I run memcached with more than 200mb memory, could I be
>>> sure, that the sessions are alive as long as they are not expired?
>>>
>>> What do you think about this?
>>>
>>> Cheers,
>>> Martin
>>>
>>>
>>> [1]
>>> http://blog.evanweaver.com/articles/2009/04/20/peeping-into-memcached/
>>> [2]
>>> http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/
>
>
>
> --
> Martin Grotzke
> http://www.javakaffee.de/blog/
>


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Martin Grotzke
Hi Brian,

you're making a very clear point. However it would be nice if you'd provide
concrete answers to concrete questions. I want to get a better understanding
of memcached's memory model and I'm thankful for any help I'm getting here
on this list. If my intro was not supporting this please forgive me...

Cheers,
Martin


On Sat, Mar 13, 2010 at 12:27 AM, Brian Moon  wrote:

> The resounding answer you will get from this list is: You don't, can't and
> won't with memcached. That is not its job. It will never be its job. Perhaps
> when storage engines are done, maybe. But then you won't get the performance
> that you get with memcached. There is a trade off for performance.
>
> Brian.
> 
> http://brian.moonspot.net/
>
>
> On 3/12/10 3:02 PM, martin.grotzke wrote:
>
>> Hi,
>>
>> I know that this topic is rather burdened, as it was said often enough
>> that memcached never was created to be used like a reliable datastore.
>> Still, there are users interested in some kind of reliability, users
>> that want to store items in memcached and be "sure" that these items
>> can be pulled from memcached as long as they are not expired.
>>
>> I read the following on memcached's memory management:
>>   "Memcached has two separate memory management strategies:
>> - On read, if a key is past its expiry time, return NOT FOUND.
>> - On write, choose an appropriate slab class for the value; if it's
>> full, replace the oldest-used (read or written) key with the new one.
>> Note that the second strategy, LRU eviction, does not check the expiry
>> time at all." (from "peeping into memcached", [1])
>>
>> I also found "Slabs, Pages, Chunks and Memcached" ([2]) a really good
>> explanation of memcached's memory model.
>>
>> Having this as background, I wonder how it would be possible to get
>> more predictability regarding the availability of cached items.
>>
>> Asume that I want to store sessions in memcached. How could I run
>> memcached so that I can be sure that my sessions are available in
>> memcached when I try to "get" them? Additionally asume, that I expect
>> to have 1000 sessions at a time in max in one memcached node (and that
>> I can control/limit this in my application). Another asumption is,
>> that sessions are between 50kb and 200 kb.
>>
>> The question now is how do I have to run memcached to "store" these
>> sessions in memcached?
>>
>> Would it be an option to run memcached with a minimum slab size of
>> 200kb. Then I would know that for each session a 200kb chunk is used.
>> When I have 1000 session between 50kb and 200kb this should take 200mb
>> in total. When I run memcached with more than 200mb memory, could I be
>> sure, that the sessions are alive as long as they are not expired?
>>
>> What do you think about this?
>>
>> Cheers,
>> Martin
>>
>>
>> [1]
>> http://blog.evanweaver.com/articles/2009/04/20/peeping-into-memcached/
>> [2]
>> http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/
>>
>


-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Les Mikesell

On 3/12/2010 5:10 PM, Martin Grotzke wrote:

With my question "how do I have to run memcached to 'store' these
sessions in memcached" I was not referring to a general approach, but I
was referring to the concrete memcached options (e.g. -n 204800 for
200kb slabs) to use.

The post you mentioned is very high level and does not answer my
question. For this you should go into a little more depth.


Some details here:
http://dev.mysql.com/doc/refman/5.0/en/ha-memcached-using-memory.html
But it's still not very clear how much extra you need to make sure that 
the hash to a server/slab will always find free space instead of 
evicting something even though space is available elsewhere.


--
  Les Mikesell
   lesmikes...@gmail.com


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Brian Moon
The resounding answer you will get from this list is: You don't, can't 
and won't with memcached. That is not its job. It will never be its job. 
Perhaps when storage engines are done, maybe. But then you won't get the 
performance that you get with memcached. There is a trade off for 
performance.


Brian.

http://brian.moonspot.net/

On 3/12/10 3:02 PM, martin.grotzke wrote:

Hi,

I know that this topic is rather burdened, as it was said often enough
that memcached never was created to be used like a reliable datastore.
Still, there are users interested in some kind of reliability, users
that want to store items in memcached and be "sure" that these items
can be pulled from memcached as long as they are not expired.

I read the following on memcached's memory management:
   "Memcached has two separate memory management strategies:
- On read, if a key is past its expiry time, return NOT FOUND.
- On write, choose an appropriate slab class for the value; if it's
full, replace the oldest-used (read or written) key with the new one.
Note that the second strategy, LRU eviction, does not check the expiry
time at all." (from "peeping into memcached", [1])

I also found "Slabs, Pages, Chunks and Memcached" ([2]) a really good
explanation of memcached's memory model.

Having this as background, I wonder how it would be possible to get
more predictability regarding the availability of cached items.

Assume that I want to store sessions in memcached. How could I run
memcached so that I can be sure that my sessions are available in
memcached when I try to "get" them? Additionally assume that I expect
to have at most 1000 sessions at a time in one memcached node (and that
I can control/limit this in my application). Another assumption is
that sessions are between 50kb and 200kb.

The question now is how do I have to run memcached to "store" these
sessions in memcached?

Would it be an option to run memcached with a minimum slab size of
200kb? Then I would know that for each session a 200kb chunk is used.
When I have 1000 sessions between 50kb and 200kb this should take 200mb
in total. When I run memcached with more than 200mb memory, could I be
sure, that the sessions are alive as long as they are not expired?

What do you think about this?

Cheers,
Martin


[1] http://blog.evanweaver.com/articles/2009/04/20/peeping-into-memcached/
[2] http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Martin Grotzke
With my question "how do I have to run memcached to 'store' these sessions
in memcached" I was not referring to a general approach, but I was referring
to the concrete memcached options (e.g. -n 204800 for 200kb slabs) to use.

The post you mentioned is very high level and does not answer my question.
For this you should go into a little more depth.

Thanx && cheers,
Martin



On Fri, Mar 12, 2010 at 10:27 PM, Ren  wrote:

> I believe most of this is covered in
>
> http://dormando.livejournal.com/495593.html
>
> Jared
>
> On Mar 12, 9:02 pm, "martin.grotzke" 
> wrote:
> > Hi,
> >
> > I know that this topic is rather burdened, as it was said often enough
> > that memcached never was created to be used like a reliable datastore.
> > Still, there are users interested in some kind of reliability, users
> > that want to store items in memcached and be "sure" that these items
> > can be pulled from memcached as long as they are not expired.
> >
> > I read the following on memcached's memory management:
> >   "Memcached has two separate memory management strategies:
> > - On read, if a key is past its expiry time, return NOT FOUND.
> > - On write, choose an appropriate slab class for the value; if it's
> > full, replace the oldest-used (read or written) key with the new one.
> > Note that the second strategy, LRU eviction, does not check the expiry
> > time at all." (from "peeping into memcached", [1])
> >
> > I also found "Slabs, Pages, Chunks and Memcached" ([2]) a really good
> > explanation of memcached's memory model.
> >
> > Having this as background, I wonder how it would be possible to get
> > more predictability regarding the availability of cached items.
> >
> > Asume that I want to store sessions in memcached. How could I run
> > memcached so that I can be sure that my sessions are available in
> > memcached when I try to "get" them? Additionally asume, that I expect
> > to have 1000 sessions at a time in max in one memcached node (and that
> > I can control/limit this in my application). Another asumption is,
> > that sessions are between 50kb and 200 kb.
> >
> > The question now is how do I have to run memcached to "store" these
> > sessions in memcached?
> >
> > Would it be an option to run memcached with a minimum slab size of
> > 200kb. Then I would know that for each session a 200kb chunk is used.
> > When I have 1000 session between 50kb and 200kb this should take 200mb
> > in total. When I run memcached with more than 200mb memory, could I be
> > sure, that the sessions are alive as long as they are not expired?
> >
> > What do you think about this?
> >
> > Cheers,
> > Martin
> >
> > [1]
> http://blog.evanweaver.com/articles/2009/04/20/peeping-into-memcached/
> > [2]
> http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/
>



-- 
Martin Grotzke
http://www.javakaffee.de/blog/


Re: How to get more predictable caching behavior - how to store sessions in memcached

2010-03-12 Thread Ren
I believe most of this is covered in

http://dormando.livejournal.com/495593.html

Jared

On Mar 12, 9:02 pm, "martin.grotzke" 
wrote:
> Hi,
>
> I know that this topic is rather burdened, as it was said often enough
> that memcached never was created to be used like a reliable datastore.
> Still, there are users interested in some kind of reliability, users
> that want to store items in memcached and be "sure" that these items
> can be pulled from memcached as long as they are not expired.
>
> I read the following on memcached's memory management:
>   "Memcached has two separate memory management strategies:
> - On read, if a key is past its expiry time, return NOT FOUND.
> - On write, choose an appropriate slab class for the value; if it's
> full, replace the oldest-used (read or written) key with the new one.
> Note that the second strategy, LRU eviction, does not check the expiry
> time at all." (from "peeping into memcached", [1])
>
> I also found "Slabs, Pages, Chunks and Memcached" ([2]) a really good
> explanation of memcached's memory model.
>
> Having this as background, I wonder how it would be possible to get
> more predictability regarding the availability of cached items.
>
> Asume that I want to store sessions in memcached. How could I run
> memcached so that I can be sure that my sessions are available in
> memcached when I try to "get" them? Additionally asume, that I expect
> to have 1000 sessions at a time in max in one memcached node (and that
> I can control/limit this in my application). Another asumption is,
> that sessions are between 50kb and 200 kb.
>
> The question now is how do I have to run memcached to "store" these
> sessions in memcached?
>
> Would it be an option to run memcached with a minimum slab size of
> 200kb. Then I would know that for each session a 200kb chunk is used.
> When I have 1000 session between 50kb and 200kb this should take 200mb
> in total. When I run memcached with more than 200mb memory, could I be
> sure, that the sessions are alive as long as they are not expired?
>
> What do you think about this?
>
> Cheers,
> Martin
>
> [1]http://blog.evanweaver.com/articles/2009/04/20/peeping-into-memcached/
> [2]http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/