Re: Optimising cache performance

2003-03-08 Thread gphat
 What implications does this have on the size of the cache that can be
 created with IPC::MM?

I believe that documentation is telling you that each OS governs the 
amount of shared memory you can have in different ways.  Linux, for 
example, has a variable called shmmax, accessible 
as /proc/sys/kernel/shmmax, which controls how much shared memory you 
are allowed to allocate.  I think Solaris' setting lives in /etc/system 
somewhere.
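For example, a quick Linux-specific check from Perl (root can raise the 
limit by writing a larger value back to the same /proc file):

  # read the current shared memory segment limit on Linux
  open my $fh, '<', '/proc/sys/kernel/shmmax' or die "can't read shmmax: $!";
  chomp(my $shmmax = <$fh>);
  printf "shmmax: %d bytes (%.1f MB)\n", $shmmax, $shmmax / (1024 * 1024);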

Cory 'G' Watson
http://gcdb.spleck.net





Re: Optimising cache performance

2003-03-08 Thread Perrin Harkins
Clinton Gormley wrote:
For now it's not a distributed system, and I have been using 
Cache::FileCache.  But that still means freezing and thawing objects - 
which I'm trying to minimise.
Other things (IPC::MM, MLDBM::Sync, Cache::Mmap, BerkeleyDB) are 
significantly faster than Cache::FileCache.  If you have tons of free 
memory, then go ahead and cache things in memory.  My feeling is that 
the very small amount of time that the fastest of these systems use to 
freeze and thaw is more than made up for by the huge memory savings, 
which allow you to run more server processes.
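If you want to see how small that cost is for your own data, a rough 
Benchmark sketch would look like this (the sample structure is arbitrary):

  use Benchmark qw(timethese);
  use Storable qw(freeze thaw);

  # a stand-in for one of your cached objects
  my $obj    = { id => 12345, name => 'widget', tags => [ 1 .. 50 ] };
  my $frozen = freeze($obj);

  timethese(50_000, {
      freeze => sub { freeze($obj)  },
      thaw   => sub { thaw($frozen) },
  });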

When you say that Cache::Mmap is only limited by the size of your disk, 
is that because the file in memory gets written to disk as part of VM? (I 
don't see any other mention of files in the docs.) Which presumably 
means resizing your VM to make space for the cache?
That's right, it uses your system's mmap() call.  I've never needed to 
adjust the amount of VM I have because of memory-mapping a file, but I 
suppose it could happen.  This would be a good question for the author 
of the module, or an expert on your system's mmap() implementation.
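For what it's worth, the file is just an ordinary file you name when you 
create the cache; basic usage looks roughly like this (file name and 
sizes are made up, check the Cache::Mmap docs for the full option list):

  use Cache::Mmap;

  my $cache = Cache::Mmap->new('/tmp/objects.cache', {
      buckets    => 256,      # number of buckets in the mapped file
      bucketsize => 65536,    # bytes per bucket
  });

  $cache->write('obj:12345', $object);   # serialised via Storable by default
  my $copy = $cache->read('obj:12345');  # undef if it has fallen out of the cache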

I see the author of IPC::MM has an e-toys address - was this something 
you used at e-toys?
It was used at one point, although not in the version of the system that 
I wrote about.  He originally wrote it as a wrapper around the mm 
library, and I asked if he could put in a shared hash just for fun.  It 
turned out to be very fast, largely because the sharing and the hash (or 
btree) are implemented in C.  The Perl part is just an interface to it.

I know very little about shared memory segments, 
but is MM used to share small data objects, rather than to keep large 
caches in shared memory?
It's a shared hash.  You can put whatever you want into it.  Apache uses 
mm to share data between processes.
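A sketch along the lines of the IPC::MM synopsis (sizes and file names 
are arbitrary; note that the hash holds plain strings, so objects still 
go through Storable on the way in and out, and the segment should be 
created in the parent before Apache forks):

  use IPC::MM;
  use Storable qw(nfreeze thaw);

  my $mm      = mm_create(65536, '/tmp/mm_lockfile');   # size, lock file
  my $mm_hash = mm_make_hash($mm);

  tie my %shared, 'IPC::MM::Hash', $mm_hash;

  # all Apache children forked after this point see the same hash
  $shared{'obj:12345'} = nfreeze($object);
  my $copy = thaw($shared{'obj:12345'});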

Ralph Engelschall writes in the MM documentation :
The maximum size of a continuous shared memory segment one can allocate 
depends on the underlaying platform. This cannot be changed, of course. 
But currently the high-level malloc(3)-style API just uses a single 
shared memory segment as the underlaying data structure for an MM object 
which means that the maximum amount of memory an MM object represents 
also depends on the platform.

What implications does this have on the size of the cache that can be 
created with IPC::MM?
It varies by platform, but I believe that on Linux it means each 
individual hash is limited to 64MB.  So maybe I spoke too soon about 
having unlimited storage, but you should be able to have as many hashes 
as you want.

If you're seriously concerned about storage limits like these, you could 
use one of the other options like MLDBM::Sync or BerkeleyDB which use 
disk storage.
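That would look something like the MLDBM::Sync synopsis (the file name 
is arbitrary; Storable does the freezing and thawing behind the tie):

  use MLDBM::Sync;
  use MLDBM qw(DB_File Storable);   # serialise values with Storable
  use Fcntl qw(:DEFAULT);

  my $sync = tie my %cache, 'MLDBM::Sync', '/tmp/object_cache.db',
                 O_CREAT | O_RDWR, 0640;

  $cache{12345} = $object;    # frozen transparently on store
  my $copy = $cache{12345};   # thawed transparently on fetch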

- Perrin



Optimising cache performance

2003-03-07 Thread Clinton Gormley
I'd appreciate some feedback on my logic to optimise my cache (under mod_perl 1)

I'm building a site which will have a large number of fairly complicated objects (each of which would require 5-20 queries to build from scratch) that are read frequently and updated relatively seldom.

I'm planning a two-level cache: 
 1) Live objects in each mod_perl process
 2) Serialised objects in a database

The logic goes as follows:
NORMAL READ-ONLY REQUEST
1) REQUEST FROM BROWSER
 * Request comes from the browser to view object 12345 
 (responding to this request may involve accessing 10 other objects)
2) PURGE OUTDATED LIVE OBJECTS
 * The mod_perl process runs a query to look for the IDs of any objects
 that have been updated since the last time this 
 query was run (last_modified_time).
 * Any object IDs returned by this query have their objects removed 
 from the in-memory, process-specific cache
3) REQUEST IS PROCESSED
 * Any objects required by this request are retrieved first from 
 the in-memory cache.
 * If they are not present, 
 the process looks in the serialised object cache in the database.
 * If not present there either, 
 the object is constructed from scratch from the relational DB 
 and stored in the serialised object cache.
 * The retrieved object is stored in the in-memory live object cache. 
4) TRIM LIVE OBJECT CACHE
 * Any live objects that are not in the 1000 most recently accessed 
 objects are deleted from the in-memory cache


UPDATE REQUEST
Steps as above, except: 
3a) UPDATING OBJECT
 * Any objects that are modified 
 * are deleted from the serialised object cache in the DB
 * and are deleted from the in-memory cache for this mod_perl 
 process only


This means that at the start of every request, each process has access to the most up-to-date version of each object, with a small (hopefully) penalty to pay in the form of the query checking last_modified_time.
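In code, the read path would go something like this (all the method, 
table and column names below are placeholders, not real code):

  use Storable qw(thaw);

  sub purge_outdated {
      my ($self) = @_;
      my $ids = $self->{dbh}->selectcol_arrayref(
          'SELECT id FROM objects WHERE last_modified > ?',
          undef, $self->{last_check});
      delete @{ $self->{live} }{ @$ids };  # step 2: drop stale live objects
      $self->{last_check} = time;          # ideally use the DB's own clock
  }

  sub fetch_object {
      my ($self, $id) = @_;

      # step 3: in-memory cache, then serialised cache, then from scratch
      return $self->{live}{$id} if exists $self->{live}{$id};

      my ($frozen) = $self->{dbh}->selectrow_array(
          'SELECT data FROM object_cache WHERE id = ?', undef, $id);
      my $obj = $frozen ? thaw($frozen) : $self->build_from_scratch($id);
      # build_from_scratch() would also store a frozen copy in object_cache

      $self->{live}{$id} = $obj;
      return $obj;
  }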

Does this sound reasonable, or is it overkill?

many thanks

Clinton Gormley





Re: Optimising cache performance

2003-03-07 Thread Perrin Harkins
Clinton Gormley wrote:
I'd appreciate some feedback on my logic to optimise my cache (under 
mod_perl 1)
First, I'm assuming this is for a distributed system running on multiple 
servers.  If not, you should just download one of the cache modules from 
CPAN.  They're good.

I'm planning a two-level cache:
1) Live objects in each mod_perl process
2) Serialised objects in a database
I suggest you use either Cache::Mmap or IPC::MM for your local cache. 
They are both very fast and will save you memory.  Also, Cache::Mmap is 
only limited by the size of your disk, so you don't have to do any purging.

You seem to be taking a lot of care to ensure that everything always has 
the latest version of the data.  If you can handle slightly out-of-date 
data, I would suggest that you simply keep objects in the local cache 
with a time-to-live (which Cache::Mmap or Cache::FileCache can do for 
you) and just look at the local version until it expires.  You would end 
up building the objects once per server, but that isn't so bad.
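A minimal sketch of that approach with Cache::FileCache (the namespace 
and lifetime are arbitrary):

  use Cache::FileCache;

  my $cache = Cache::FileCache->new({
      namespace          => 'objects',
      default_expires_in => 300,       # seconds
  });

  sub get_object {
      my ($id) = @_;
      my $obj = $cache->get($id);
      unless (defined $obj) {
          $obj = build_object($id);    # placeholder for your 5-20 queries
          $cache->set($id, $obj);      # expires after the 300-second default
      }
      return $obj;
  }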

If everything really does have to be 100% up-to-date, then what you're 
doing is reasonable.  It would be nice to not do the step that checks 
for outdated objects before processing the request, but instead do it in 
a cleanup handler, although that could lead to stale data being used now 
and then.

If you were using a shared cache like Cache::Mmap, you could have a cron 
job or a separate Perl daemon that simply purges outdated objects every 
minute or so, and leave that out of your mod_perl code completely.
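As a sketch (the DSN, table and file names are invented, and it assumes 
Cache::Mmap's documented delete() method; adjust for whichever cache you 
pick):

  #!/usr/bin/perl
  use strict;
  use DBI;
  use Cache::Mmap;

  my $cache = Cache::Mmap->new('/tmp/objects.cache', { buckets => 256 });
  my $dbh   = DBI->connect('dbi:mysql:mydb', 'user', 'pass',
                           { RaiseError => 1 });

  # anything modified in the last minute gets dropped from the shared cache
  my $ids = $dbh->selectcol_arrayref(
      'SELECT id FROM objects WHERE last_modified > NOW() - INTERVAL 1 MINUTE');

  $cache->delete("obj:$_") for @$ids;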

Yet another way to handle a distributed cache is to have each write to 
the cache send updates to the other caches using something like 
Spread::Queue.  This is a bit more complex, but it means you don't need 
a second tier in your cache to share updates.

- Perrin



Re: Optimising cache performance

2003-03-07 Thread Cory 'G' Watson
On Friday, March 7, 2003, at 12:45  PM, Perrin Harkins wrote:
You seem to be taking a lot of care to ensure that everything always 
has the latest version of the data.  If you can handle slightly 
out-of-date data, I would suggest that you simply keep objects in the 
local cache with a time-to-live (which Cache::Mmap or Cache::FileCache 
can do for you) and just look at the local version until it expires.  
You would end up building the objects once per server, but that isn't 
so bad.
I'm not sure if my way would fit in with your objects Clinton, but I 
have some code in the commit() method of all my objects which, when it 
is called, removes any cached copies of the object.  That's how I stay 
up to date.
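Something like this, roughly (cache_key() and the $cache handle are 
stand-ins, not my actual code):

  our $cache;   # however your objects reach the shared cache

  sub commit {
      my $self = shift;

      $self->_write_to_database;           # whatever persistence already happens
      $cache->remove( $self->cache_key );  # drop the cached copy; the next read
                                           # rebuilds and re-caches a fresh one
      return $self;
  }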

Cory 'G' Watson
http://gcdb.spleck.net


Re: Optimising cache performance

2003-03-07 Thread Perrin Harkins
Cory 'G' Watson wrote:
I'm not sure if my way would fit in with your objects Clinton, but I 
have some code in the commit() method of all my objects which, when it 
is called, removes any cached copies of the object.  That's how I stay 
up to date.
Why wouldn't it simply update the version in the cache when you commit? 
 Also, do you have a way of synchronizing changes across multiple machines?

- Perrin



Re: Optimising cache performance

2003-03-07 Thread Cory 'G' Watson
On Friday, March 7, 2003, at 02:20  PM, Perrin Harkins wrote:

Cory 'G' Watson wrote:
I'm not sure if my way would fit in with your objects Clinton, but I 
have some code in the commit() method of all my objects which, when 
it is called, removes any cached copies of the object.  That's how I 
stay up to date.
Why wouldn't it simply update the version in the cache when you 
commit?  Also, do you have a way of synchronizing changes across 
multiple machines?
I suppose it could, but I use it as a poor man's cache cleaning.  I 
suppose it would boost performance to do what you suggest.  I'll just 
implement a cache cleaner elsewhere.

I only run on one machine, so I don't do any synchronization.  I hope 
to have that problem some day ;)

Cory 'G' Watson
http://gcdb.spleck.net


Re: Optimising cache performance

2003-03-07 Thread Clinton Gormley
Thanks for your feedback - a couple more questions


First, I'm assuming this is for a distributed system running on multiple 
servers.  If not, you should just download one of the cache modules from 
CPAN.  They're good.


For now it's not a distributed system, and I have been using Cache::FileCache. But that still means freezing and thawing objects - which I'm trying to minimise.

I suggest you use either Cache::Mmap or IPC::MM for your local cache. 
They are both very fast and will save you memory.  Also, Cache::Mmap is 
only limited by the size of your disk, so you don't have to do any purging.


When you say that Cache::Mmap is only limited by the size of your disk, is that because the file in memory gets written to disk as part of VM? (I don't see any other mention of files in the docs.) Which presumably means resizing your VM to make space for the cache?

You seem to be taking a lot of care to ensure that everything always has 
the latest version of the data.  If you can handle slightly out-of-date 

Call me anal ;) Most of the time it wouldn't really matter, but sometimes it could be extremely off-putting.

If everything really does have to be 100% up-to-date, then what you're 
doing is reasonable.  It would be nice to not do the step that checks 
for outdated objects before processing the request, but instead do it in 
a cleanup handler, although that could lead to stale data being used now 
and then.

Yes - had considered that.

If you were using a shared cache like Cache::Mmap, you could have a cron 
job or a separate Perl daemon that simply purges outdated objects every 
minute or so, and leave that out of your mod_perl code completely.


I see the author of IPC::MM has an e-toys address - was this something you used at e-toys? I know very little about shared memory segments, but is MM used to share small data objects, rather than to keep large caches in shared memory?

Ralph Engelschall writes in the MM documentation: 
The maximum size of a continuous shared memory segment one can allocate depends on the underlaying platform. This cannot be changed, of course. But currently the high-level malloc(3)-style API just uses a single shared memory segment as the underlaying data structure for an MM object which means that the maximum amount of memory an MM object represents also depends on the platform.

What implications does this have on the size of the cache that can be created with IPC::MM?


thanks

Clinton Gormley