Re: Apache::Session - What goes in session?

2002-08-21 Thread Ask Bjoern Hansen

On Tue, 20 Aug 2002 [EMAIL PROTECTED] wrote:

 We are investigating using IPC rather than a file-based
 structure, but it's purely investigation at this point.

 What are the speed diffs between an IPC cache and a
 Berkeley DB cache? My gut instinct always screams 'Stay Off
 The Disk' but my gut is not always right.. Ok, rarely
 right.. ;)

IPC (for many definitions of that) has all sorts of odd limitations
and isn't that fast.  Don't go there.

The disk is usually much faster than you think.  Often overlooked
for caching is a simple file based cache.

Here's a story about that:

A while ago Graham Barr and I spent some time going through a number
of iterations for a self-cleaning cache system.  It would take
lots of writes and fewer reads.  In each cache entry a number of
integers would be stored.  Just storing the last thousand entries
would be enough.

We tried quite a few different approaches; the most noteworthy was a
system of semaphores to control access to a number of slots in a
BerkeleyDB.  That should be pretty fast, right?

It got a bit complicated as our systems didn't support that many
semaphores, so we had to come up with a system for sharing the
semaphores across multiple slots in the database.

Designing and writing this implementation took a few days.  It was
really cool.

Anyway, after fixing that and a few deadlocks we were benchmarking
away.  The system was so clever.  We thought it was simple and neat.
Okay, neat at least.  And it was really slow. Slow. (~200 writes a
second on a 400MHz Pentium II if I recall correctly).

First we suspected we did something wrong with the semaphores, but
further benchmarking showed that the BerkeleyDB just wasn't that
fast for writing.

30 minutes thinking and 30 minutes typing code later we had a
prototype for a simple file-based system.

Now using good old Fcntl to control access to simple flat files.
(Data serialized with pack("N*", ...); I don't think anything beats
pack and unpack for serializing data).

The expiration went into the data and purging the cache was a simple
cronjob to find files older than a few minutes and delete them.

The performance?  I don't remember the exact figure, but it was at
least several times faster than the BerkeleyDB system.  And *much*
simpler.
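
A minimal sketch of the kind of flat-file cache described above, not the code Graham and Ask actually wrote. It assumes one file per cache key, flock() with the constants exported by Fcntl as the access control, and an expiry timestamp stored as the first packed integer. The names cache_path, cache_write and cache_read, and the temporary cache directory, are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);

# Assumed layout: one flat file per key. Each file holds an expiry
# timestamp followed by the cached integers, all as 32-bit big-endian
# values ("N*"). Keys are assumed to be safe to use as file names.
my $CACHE_DIR = tempdir(CLEANUP => 1);

sub cache_path {
    my ($key) = @_;
    return "$CACHE_DIR/$key.cache";
}

sub cache_write {
    my ($key, $ttl, @ints) = @_;
    sysopen(my $fh, cache_path($key), O_RDWR | O_CREAT)
        or die "open: $!";
    flock($fh, LOCK_EX) or die "flock: $!";    # writers take an exclusive lock
    truncate($fh, 0)    or die "truncate: $!";
    defined syswrite($fh, pack("N*", time() + $ttl, @ints))
        or die "write: $!";
    close($fh) or die "close: $!";             # closing releases the lock
    return;
}

sub cache_read {
    my ($key) = @_;
    open(my $fh, '<', cache_path($key)) or return;   # no file: cache miss
    binmode($fh);
    flock($fh, LOCK_SH) or die "flock: $!";    # readers share the lock
    local $/;                                  # slurp the whole file
    my $raw = <$fh>;
    close($fh);
    return unless defined $raw && length $raw;
    my ($expires, @ints) = unpack("N*", $raw);
    return if $expires < time();               # expired: treat as a miss
    return @ints;
}

cache_write('user42', 300, 10, 20, 30);
print join(',', cache_read('user42')), "\n";
```

Because the expiry travels inside the data, readers never return stale entries, and the purge job only has to remove old files to reclaim disk space.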


The moral of the story:  Flat files rock!  ;-)


  - ask

-- 
ask bjoern hansen, http://www.askbjoernhansen.com/ !try; do();




Re: Apache::Session - What goes in session?

2002-08-21 Thread Peter J. Schoenster


On 21 Aug 2002 at 2:09, Ask Bjoern Hansen wrote:

 Now using good old Fcntl to control access to simple flat files.
 (Data serialized with pack("N*", ...); I don't think anything beats
 pack and unpack for serializing data).
 
 The expiration went into the data and purging the cache was a simple
 cronjob to find files older than a few minutes and delete them.
 
 The performance?  I don't remember the exact figure, but it was at
 least several times faster than the BerkeleyDB system.  And *much*
 simpler.
 
 
 The moral of the story:  Flat files rock!  ;-)

If I'm using Apache::DBI so I have a persistent connection to MySQL, 
would it not be faster to simply use a table in MySQL?


Peter



---
Reality is that which, when you stop believing in it, doesn't go
away.
-- Philip K. Dick




RE: Apache::Session - What goes in session?

2002-08-21 Thread Jesse Erlbaum

Hi Peter --

  The moral of the story:  Flat files rock!  ;-)

 If I'm using Apache::DBI so I have a persistent connection to MySQL,
 would it not be faster to simply use a table in MySQL?


Unlikely.  Even with cached database connections you are probably not going
to beat the performance of going to a flat text file.  Accessing files is
something the OS is optimized to do.  The process of issuing a SQL query,
having it parsed and retrieving results is probably more time-consuming than
you think.

One way to think about it is this:  MySQL stores its data in files.  There
are many layers of code between DBI and those files, each of which adds
processing time.  Going directly to files is far less code, and less code is
most often faster code.

The best way to be sure is to benchmark the difference yourself.  Try out
the Benchmark module.  Quantitative data trumps anecdotal data every time.
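
A small sketch of what such a comparison could look like with the core Benchmark module. It is only an assumption about how one might set it up: the flat-file candidate reads 1000 packed integers from a temporary file, and the second candidate is a deliberately simple in-memory stand-in. In a real test the stand-in would be replaced by the DBI query you want to compare against. The names read_flat_file, read_stand_in and run_benchmark are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use File::Temp qw(tempfile);

# Test fixture: 1000 integers, stored once as a packed flat file and once
# in a hash that stands in for "some other storage backend".
my @INTS = (1 .. 1000);
my ($tmp_fh, $PATH) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} pack("N*", @INTS);
close($tmp_fh) or die "close: $!";
my %STORE = (ints => join(',', @INTS));

# Candidate 1: open the flat file, slurp it and unpack the integers.
sub read_flat_file {
    open(my $in, '<', $PATH) or die "open: $!";
    binmode($in);
    local $/;
    my $raw = <$in>;
    close($in);
    my @ints = unpack("N*", $raw);
    return scalar @ints;
}

# Candidate 2: placeholder only. Swap in a DBI fetch to test MySQL.
sub read_stand_in {
    my @ints = split /,/, $STORE{ints};
    return scalar @ints;
}

# Run each candidate $count times; 'none' suppresses timethese's own output.
sub run_benchmark {
    my ($count) = @_;
    return timethese($count, {
        flat_file => \&read_flat_file,
        stand_in  => \&read_stand_in,
    }, 'none');
}

# Print a table of rates and relative differences.
cmpthese(run_benchmark(5000));
```

The absolute numbers matter less than the ratio between candidates measured on your own hardware with your own data sizes.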


Warmest regards,

-Jesse-


--

  Jesse Erlbaum
  The Erlbaum Group
  [EMAIL PROTECTED]
  Phone: 212-684-6161
  Fax: 212-684-6226






Re: Apache::Session - What goes in session?

2002-08-21 Thread James G Smith

Jesse Erlbaum [EMAIL PROTECTED] wrote:
 Hi Peter --

   The moral of the story:  Flat files rock!  ;-)

  If I'm using Apache::DBI so I have a persistent connection to MySQL,
  would it not be faster to simply use a table in MySQL?


 Unlikely.  Even with cached database connections you are probably not going
 to beat the performance of going to a flat text file.  Accessing files is
 something the OS is optimized to do.  The process of issuing a SQL query,
 having it parsed and retrieving results is probably more time-consuming than
 you think.

All depends on the file structure.  A linear search through a
thousand records can be slower than a binary search through a million
(500 ave. compares vs. about 20 max [10 ave.] compares - hope the
extra overhead for the binary search is worth the savings in
comparisons).
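
The 500 and 20 figures above can be checked with a few lines of Perl. This is only a back-of-the-envelope sketch: it assumes a successful linear search looks at about half the records on average, and that a binary search over n sorted records needs at most floor(log2 n) + 1 comparisons. The function names are hypothetical.

```perl
use strict;
use warnings;

# Rough average number of comparisons for a successful linear search.
sub linear_avg_compares {
    my ($n) = @_;
    return $n / 2;
}

# Worst-case number of comparisons for a binary search over $n sorted records.
sub binary_max_compares {
    my ($n) = @_;
    return int(log($n) / log(2)) + 1;
}

printf "linear search, 1,000 records:     about %d compares on average\n",
    linear_avg_compares(1_000);
printf "binary search, 1,000,000 records: at most %d compares\n",
    binary_max_compares(1_000_000);
```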

 One way to think about it is this:  MySQL stores its data in files.  There
 are many layers of code between DBI and those files, each of which adds
 processing time.  Going directly to files is far less code, and less code is
 most often faster code.

MySQL also stores indices.  As soon as you start having to store data
in files and maintain indices, you might as well start using a
database.

 The best way to be sure is to benchmark the difference yourself.  Try out
 the Benchmark module.  Quantitative data trumps anecdotal data every time.

Definitely.  But before you do, make sure the proper indices are
created on the MySQL side.  Wrong database configurations can kill
any performance gain.
-- 
James Smith [EMAIL PROTECTED], 979-862-3725
Texas AM CIS Operating Systems Group, Unix



RE: Apache::Session - What goes in session?

2002-08-21 Thread Jesse Erlbaum

Hey James --

 One way to think about it is this:  MySQL stores its data in files.  There
 are many layers of code between DBI and those files, each of which adds
 processing time.  Going directly to files is far less code, and less code is
 most often faster code.

 MySQL also stores indices.  As soon as you start having to store data
 in files and maintain indices, you might as well start using a
 database.


You bring up a really important point: Scale.  If a custom file-based data
storage system starts growing in both size and functionality it will sooner
or later reach a point where it is a far worse solution.

Relational databases are optimized for two things:  Ease of access and
management of large data sets.  If the data set is small and the functional
requirements are very narrow then a custom system can outperform a database
most of the time (not including poorly written systems!).  Once you have to
deal with large amounts of data, or you need to have an interface which
allows customizable retrieval of sub-sets of data (a la SQL), a database is
going to be the way to go.

The trick is knowing which path to choose.  Having an idea of the potential
growth of the system and use of the data, combined with a few well-chosen
benchmarks, comes in handy here.

TTYL,

-Jesse-


--

  Jesse Erlbaum
  The Erlbaum Group
  [EMAIL PROTECTED]
  Phone: 212-684-6161
  Fax: 212-684-6226





Re: Apache::Session - What goes in session?

2002-08-21 Thread Perrin Harkins

Ask Bjoern Hansen wrote:
 The performance?  I don't remember the exact figure, but it was at
 least several times faster than the BerkeleyDB system.  And *much*
 simpler.

In my benchmarks, recent versions of BerkeleyDB, used with the 
BerkeleyDB module and allowed to manage their own locking, beat all 
available flat-file modules.  It may be possible to improve the 
flat-file ones, but it even beat Tie::TextDir which is about as simple 
(and therefore fast) as they come.  The only thing that did better was 
IPC::MM.

- Perrin




Re: Apache::Session - What goes in session?

2002-08-21 Thread Perrin Harkins

Peter J. Schoenster wrote:
 If I'm using Apache::DBI so I have a persistent connection to MySQL, 
 would it not be faster to simply use a table in MySQL?

Probably not, if the MySQL server is on a separate machine.  If it's on 
the same machine, it would be close.  Remember, MySQL has more work to 
do (parse SQL statement, make query plan, etc.) than a simple hash-based 
system like BerkeleyDB does.  Best thing would be to benchmark it though.

- Perrin




Re: Apache::Session - What goes in session?

2002-08-20 Thread md


--- Perrin Harkins [EMAIL PROTECTED] wrote:

 There are a few ways to deal with this.  The simplest is to use the
 sticky load-balancing feature that many load-balancers have.  Failing
 that, you can store to a network file system like NFS or CIFS, or use a
 database.  (There are also fancier options with things like Spread, but
 that's getting a little ahead of the game.)  You can use MySQL for
 caching, and it will probably have similar performance to a networked
 file system.  Unfortunately, the Apache::Session code isn't all that
 easy to use for this, since it assumes you want to generate IDs for the
 objects you store rather than passing them in.  You could adapt the code
 from it to suit your needs though.  The important thing is to leave out
 all of the mutually exclusive locking it implements, since a cache is
 all about "get the latest as quick as you can" and lost updates are not
 a problem ("last save wins" is good enough for a cache).

I haven't looked at the cache modules docs yet...would
it be possible to build cache on the separate
load-balanced machines as we go along...as we do with
template caching? By that I mean if an item has cached
on machine one then further requests on machine one
will come from cache where if on machine two the same
item hasn't cached, it will be pulled from the db the
first time and then cached?

If this isn't possible, I'm not sure if I'll be able
to implement any caching or not (some of the site
configuration is out of my hands) and everything seems
so user specific...I'll definitely reread your posts
and go through my app for things that should be
cached.

I would be curious though that if my choice is simply
that the data is stored in the session or comes from
the database with each request, would it still be best
to essentially only store the session id in the
session and pull everything else from the db? It still
seems that something trivial like a greeting name (a
preference) could go in the session.

 The relationships to the features and pages differ by user, but there
 might be general information about the features themselves that is
 stored in the database and is not user-specific.  That could be cached
 separately, to save some trips to the db for each user.

The only thing I can think of right now is a
calendar...that should probably be cached. The only
gotcha would be that the calendar would need to update
every day, at least on the current month's pages. But
this is only on a feature page, not a user-created
page (that is, a user can click a link on their daily
page that takes them to a feature page where they can
go through archives).
 

 You can cache the names too if you want to, but keeping them out of the
 session means that you won't be slowed down by fetching that extra data
 and de-serializing it with Storable unless the page you're on actually
 needs it.

Even though there are some preset pages, the user
can change the names and the user can also create a
custom page with its own name. So there could be
thousands of unique page names, many (most) specific
to unique users (like Jim's Sports Page, etc.). Not
to mention that between the fact that the users' daily
pages can have any number of user selected features
per page and features themselves can have archive
depths of anywhere from 3 to 20 years, there's a lot
of info.

 It's also good to separate things that have to be reliable
 (like the ID of the current user, since without that you have to send
 them back to log in again) from things that don't need to be (you could
 always fetch the list of pages from the db if your cache went down).

Very good advice. I've found that occasionally
something happens to my session where the session id
is ok but some of the other data disappears (like
current page id) which really screws things up until
you log out and log back in again. This leads me to
suspect that I've answered my own question from above.
It's just whether I can cache or not.

Thanks for all your time and help.



__
Do You Yahoo!?
HotJobs - Search Thousands of New Jobs
http://www.hotjobs.com



Re: Apache::Session - What goes in session?

2002-08-20 Thread Tony Bowden

On Mon, Aug 19, 2002 at 06:54:01PM -0700, md wrote:
 I can definitely get it all from the db, but that doesn't
 seem very efficient.

Don't worry about whether it *seems* efficient. Do it right, and then
worry about how to speed that up - if, and only if, it's too slow.

Premature optimisation is the root of all evil, and all that ..

At BlackStar the session was just a single hashed ID and all other info
was loaded from the database every time. We thought about caching some
info a few times, but always ran into problems with replication.  In the
end we discovered that fetching everything from the database on every
request wasn't noticeably slower than anything else we could come up with,
and was a lot more flexible. Throwing more memory at the database servers
was usually quicker, cheaper and more effective than micro-optimising
our session vs caching strategy...

Tony



Re: Apache::Session - What goes in session?

2002-08-20 Thread siberian

We do see some slowdown on our language translation db 
calls since they are so intensive. Moving to a 'per child' 
cache for each string as it came out of the db sped page 
loads up from 4.5 seconds to .6-1.0 seconds per page, which 
is significant.

Currently we are working on a 'per machine' cache so all 
children can benefit from each child's initial database read 
of the translated string; the differential between 
children is annoying in the 'per child cache' strategy.

John-





Re: Apache::Session - What goes in session?

2002-08-20 Thread Dave Rolsky

On Tue, 20 Aug 2002 [EMAIL PROTECTED] wrote:

 Currently we are working on a 'per machine' cache so all
 children can benefit from each child's initial database read
 of the translated string; the differential between
 children is annoying in the 'per child cache' strategy.

Sounds like you want BerkeleyDB.pm (not DB_File), which is quite fast and
handles locking/concurrent access internally (when set up properly).

See the Alzabo::ObjectCache::{Store,Sync}::BerkeleyDB modules for
examples.

For Alzabo, I also have a caching system that caches data in a database,
for cross-machine caching/syncing.  I haven't really benchmarked it yet
but I imagine it could be a win in some situations.  For example, you
could set up the cache as a separate machine running MySQL and still pull
your data from another machine, possibly running a different RDBMS.


-dave

/*==
www.urth.org
we await the New Sun
==*/




Re: Apache::Session - What goes in session?

2002-08-20 Thread siberian

We are investigating using IPC rather than a file-based 
structure, but it's purely investigation at this point.

What are the speed diffs between an IPC cache and a 
Berkeley DB cache? My gut instinct always screams 'Stay Off 
The Disk' but my gut is not always right.. Ok, rarely 
right.. ;)

John-






Re: Apache::Session - What goes in session?

2002-08-20 Thread Perrin Harkins

md wrote:
 I haven't looked at the cache modules docs yet...would
 it be possible to build cache on the separate
 load-balanced machines as we go along...as we do with
 template caching?

Of course.  However, if a user is sent to a random machine each time you 
won't be able to cache anything that a user is allowed to change during 
their time on the site, because they could end up on a machine that has 
an old cached value for it.  Sticky load-balancing or a cluster-wide 
cache (which you can update when data changes) deals with this problem.

 everything seems so user specific...

That doesn't mean you can't cache it.  You can do basically the same 
thing you were doing with the session: stuff a hash of user-specific 
stuff into the cache.  The next time that user sends a request, you 
check the cache for data on that user ID (you get the user ID from the 
session) and if you don't find any you just fetch it from the db.

Pseudo-code:

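sub fetch_user_data {
    my $user_id = shift;
    my $user_data = fetch_from_cache($user_id);
    unless ($user_data) {
        # Cache miss: fall back to the database, then remember the result
        # so the next request for this user is served from the cache.
        $user_data = fetch_from_db($user_id);
        store_in_cache($user_id, $user_data);
    }
    return $user_data;
}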

 I would be curious though that if my choice is simply
 that the data is stored in the session or comes from
 the database with each request, would it still be best
 to essentially only store the session id in the
 session and pull everything else from the db? It still
 seems that something trivial like a greeting name (a
 preference) could go in the session.

Your decision about what to put in the session is not connected to your 
decision about what to pull from the db each time.  You can cache all 
the data if you want to, and still have very little in the session.

This might sound like an academic distinction, but I think it's 
important to keep the concepts separate: a session is a place to store 
transient state information that is irrelevant as soon as the user logs 
out, and a cache is a way of speeding up access to a slow resource like 
a database, and the two things should not be confused.  You can actually 
cache the session data if you need to (with a write-through cache that 
updates the backing database as well).  A cache will typically be faster 
than session storage because it doesn't need to be very reliable and 
because you can store and retrieve individual chunks of data (user's 
name, page names) when you need them instead of storing and retrieving 
everything on every request.  Separating these concepts allows you to do 
things like migrate the session storage to a transactional database some 
day, and move your cache storage to a distributed multicast cache when 
someone comes out with a module for that.

 The only
 gotcha would be that the calendar would need to update
 every day, at least on the current month's pages.

The cache modules I mentioned have a concept of timeout so that you 
can say "cache this for 12 hours" and then when it expires you fetch it 
again and update the cache for another 12 hours.
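
To make the timeout idea concrete, here is a minimal sketch using only core Perl. It is not the API of any of the cache modules mentioned in this thread: the in-process hash, the names cache_set, cache_get and fetch_calendar, and the build_calendar_from_db stand-in are all assumptions made for illustration. A real setup would more likely use one of those modules so that the cache is shared between Apache children.

```perl
use strict;
use warnings;

# Per-process cache for illustration only: key => [ expiry epoch, value ].
my %CACHE;

sub cache_set {
    my ($key, $value, $ttl_seconds) = @_;
    $CACHE{$key} = [ time() + $ttl_seconds, $value ];
    return $value;
}

sub cache_get {
    my ($key) = @_;
    my $entry = $CACHE{$key} or return;
    if ($entry->[0] <= time()) {
        # Expired: drop the entry so the caller fetches a fresh copy.
        delete $CACHE{$key};
        return;
    }
    return $entry->[1];
}

# Hypothetical stand-in for the expensive database query.
my $db_hits = 0;
sub build_calendar_from_db {
    $db_hits++;
    return "calendar (database fetch #$db_hits)";
}

# Serve from the cache when possible, otherwise rebuild and cache for 12 hours.
sub fetch_calendar {
    my $calendar = cache_get('calendar');
    return $calendar if defined $calendar;
    return cache_set('calendar', build_calendar_from_db(), 12 * 60 * 60);
}

print fetch_calendar(), "\n";
print fetch_calendar(), "\n";   # second call is served from the cache
```

With a 12 hour timeout the calendar is rebuilt at most twice a day per cache, which would cover the daily-update requirement without any explicit invalidation.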

 Even though there are some preset pages, the user
 can change the names and the user can also create a
 custom page with its own name.

No problem, you can cache data that's only useful for a single user, as 
I explained above.

 Not
 to mention that between the fact that the users' daily
 pages can have any number of user selected features
 per page and features themselves can have archive
 depths of anywhere from 3 to 20 years, there's a lot
 of info.

No problem, disks are cheap.  400MB of disk space will cost you about as 
much as a movie in New York these days.

- Perrin




Re: Apache::Session - What goes in session?

2002-08-20 Thread md


Thanks...you've given me plenty to work with. Great
explanation. This is good pragmatic stuff to know!





Re: Apache::Session - What goes in session?

2002-08-20 Thread Perrin Harkins

[EMAIL PROTECTED] wrote:
 We are investigating using IPC rather than a file-based structure, but 
 it's purely investigation at this point.
 
 What are the speed diffs between an IPC cache and a Berkeley DB cache? My 
 gut instinct always screams 'Stay Off The Disk' but my gut is not always 
 right.. Ok, rarely right.. ;)

Most of the shared memory modules are much slower than Berkeley DB.  The 
fastest option around is IPC::MM, but data you store in that does not 
persist if you restart the server, which is a problem for some. 
BerkeleyDB (the new one, not DB_File) is also very fast, and other 
options like Cache::Mmap and Cache::FileCache are much faster than 
anything based on IPC::ShareLite and the like.

I have charts and numbers in my TPC presentation, which I will be 
putting up soon.

- Perrin




Re: Apache::Session - What goes in session?

2002-08-20 Thread siberian

Thanks, you just saved us a ton of time.

Off to change course ;)

J






Re: Apache::Session - What goes in session?

2002-08-20 Thread jjore

Just to jump in here - as I understand it you can share a hash across 
multiple child processes if you preload it before apache forks. So load 
it in your startup.pl and get it in memory prior to forking. It'll stay 
in shared memory as long as you aren't writing to it. Or at least that's 
how I understand the theory to work anyway.
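
A rough sketch of the preloading idea, assuming a prefork server. In a real mod_perl setup the loading part would sit in startup.pl, which the parent Apache process runs before it forks. Here a plain fork() stands in for Apache so the example is self-contained. The %TRANSLATIONS hash, load_translations, translate and child_can_read are all hypothetical names, and the hard-coded strings stand in for a one-time database read.

```perl
use strict;
use warnings;

# Filled once in the parent, before any child exists.
our %TRANSLATIONS;

# Hypothetical loader; a real one would query the database a single time.
sub load_translations {
    return (hello => 'bonjour', goodbye => 'au revoir');
}

%TRANSLATIONS = load_translations();

sub translate {
    my ($key) = @_;
    return $TRANSLATIONS{$key};
}

# Fork a child and have it report a translation back through a pipe.
# The child never calls load_translations(); it inherits the parent's data.
sub child_can_read {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close($reader);
        print {$writer} translate('hello');
        close($writer);
        exit 0;
    }
    close($writer);
    my $seen = <$reader>;
    close($reader);
    waitpid($pid, 0);
    return $seen;
}
```

The operating system shares the preloaded pages between parent and children copy-on-write, so they stay shared only while nobody modifies them. That makes this approach a fit for read-only data such as translation strings, not for data that changes while the server runs.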

Josh











Re: Apache::Session - What goes in session?

2002-08-20 Thread siberian

I haven't had much luck with that but we will look at it 
again and see what we can get from it. We want to avoid 
preloading all data per child direct from the database but 
I wouldn't mind doing it on startup for the root process 
and then copying it to each child.

J










Re: Apache::Session - What goes in session?

2002-08-20 Thread Ian Struble

Not in the MS house that I am living in right now :^(

On Tue, 20 Aug 2002, Perrin Harkins wrote:

 Ian Struble wrote:
  And just to throw one more wrench into the works.  You could load up only
  the most popular data at startup and let the rest of the data get loaded
  on a cache miss.  
  
  That is one technique that we have used for some customer session
  servers.  It allowed each server to start up in well under a minute
  instead of in 15-30 minutes while pegging the DB.  The 15-30 minutes was
  when we were dealing with ~5mil total entries and I would hate to see it
  now that the size of the table has doubled.  Now we just need to do some
  batch processing to determine what subset gets loaded at startup.
 
 You could also just dump the whole thing into a Berkeley DB file every 
 now and then.
 
 - Perrin
 
 
 




Apache::Session - What goes in session?

2002-08-19 Thread md

I'm using mod_perl and Apache::Session on an app that
is similar to MyYahoo. I found a few bits of info from
a previous thread, but I'm curious as to what type of
information should go in the session and what should
come from the database.

Currently I'm putting very little in the session, but
what I am putting in the session is more global in
nature...greeting, current page number, current page
name...data that doesn't change very often. I'm
pulling a lot of info from the database and I wonder
if my design is sound. Most of the info being pulled
from the database is features for the page. 

Now I need to add global modules to the page which
will show user info like which pages they have created
and which features are being emailed to the user.
These modules will display on every page unless the
user turns them off. It seems that since this info
wouldn't change very often that I should put the data
in the session...

Anyone have any general tips on session design?

Thanks.




RE: Apache::Session - What goes in session?

2002-08-19 Thread Jesse Erlbaum

Hello md --

 I'm using mod_perl and Apache::Session on an app that
 is similar to MyYahoo. I found a few bits of info from
 a previous thread, but I'm curious as to what type of
 information should go in the session and what should
 come from the database.

One thing to watch out for is the trap of using session data as a dumping
ground for global variables.  Since you are asking what belongs in a
session, it seems you are already thinking along those lines.  I have found
that many people who are fond of sessions often use them to store data which
I would be personally inclined to store in hidden form data, in a simple
cookie, or retrieve from a database when needed.

In my systems I usually only store a single session ID in a cookie -- a
key which references a database row.  This allows me to have as much data as
I like but keep it all in the database.  There is one case where it might
make sense to put data into a session of some sort -- to cache information
which is very time-consuming to retrieve.  Minimizing time-consuming
database operations is an important thing to think about in large systems,
and a place where session data might come in handy.

Warmest regards,

-Jesse-


--

  Jesse Erlbaum
  The Erlbaum Group
  [EMAIL PROTECTED]
  Phone: 212-684-6161
  Fax: 212-684-6226






Re: Apache::Session - What goes in session?

2002-08-19 Thread Perrin Harkins

md wrote:
 Currently I'm putting very little in the session

Good.  You should put in as little as possible.

 what I am putting in the session is more global in
 nature...greeting, current page number, current page
 name...

That doesn't sound very global to me.  What happens when users open 
multiple browser windows on your site?  Doesn't it screw up the current 
page data?

 I'm
 pulling a lot of info from the database and I wonder
 if my design is sound.

Optimizing database fetches or caching data is independent of the 
session issue.  Nothing that is relevant to more than one user should 
ever go in the session.

 Now I need to add global modules to the page which
 will show user info like which pages they have created
 and which features are being emailed to the user.
 These modules will display on every page unless the
 user turns them off.

That sounds like a user or subscriptions object to me, not session data.

 It seems that since this info
 wouldn't change very often that I should put the data
 in the session...

No, that's caching.  Don't use the session for caching, use a cache for 
it.  They're not the same.  A session is often stored in a database so 
that it can be reliable.  A cache is usually stored on the file system 
so it can be fast.

Things like the login status of this session, and the user ID that is 
associated with it go in the session.  Status of a particular page has 
to be passed in query args or hidden fields, to avoid problems with 
multiple browser windows.  Data that applies to multiple users or lasts 
more than the current browsing session never goes in the session.
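
As a rough illustration of keeping per-page state in the URL rather than in the session, the sketch below builds a link whose query string carries that state. The helper name page_link and the parameter names are hypothetical, and the escaping assumes plain byte strings for the values (keys are not escaped here).

```perl
use strict;
use warnings;

# Build a link that carries per-page state in its query string.
sub page_link {
    my ($base, %state) = @_;
    my @pairs;
    for my $key (sort keys %state) {
        my $value = $state{$key};
        # Percent-encode everything outside the unreserved URL characters.
        $value =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
        push @pairs, "$key=$value";
    }
    return @pairs ? "$base?" . join('&', @pairs) : $base;
}

print page_link('/page', page_id => 7, name => 'My News'), "\n";
```

Each browser window then carries its own page_id in its own links, so two windows on different pages cannot overwrite each other's state.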

- Perrin




RE: Apache::Session - What goes in session?

2002-08-19 Thread Narins, Josh

Thanks though. That was succinctly put.

Could you go back in time and tell me that a year or two ago?

That would be great, thanks again.

-Josh

:)

 Things like the login status of this session, 
 and the user ID that is associated with it go
 in the session.  Status of a particular page 
 has to be passed in query args or hidden fields,
 to avoid problems with multiple browser windows.
 Data that applies to multiple users or lasts 
 more than the current browsing session never 
 goes in the session.









Re: Apache::Session - What goes in session?

2002-08-19 Thread md


--- Perrin Harkins [EMAIL PROTECTED] wrote:
 md wrote:

 That doesn't sound very global to me.  What happens when users open
 multiple browser windows on your site?  Doesn't it screw up the
 current page data?

I don't think "global" was the term I should have
used. What I mean is data that will be seen on all or
most pages by the same user...like "Hello Jim", where
"Jim" is pulled from the database when the session is
created and passed around in the session after that
(and updated in the db and session if the user changes
their greeting name).

Current page name and id are never stored in db, so
different browser windows can be on different
pages...I'm not sure if that's good or bad. However,
changes to the user name will be seen in both browser
windows since that's updated both in the session and
db.
 

 Optimizing database fetches or caching data is independent of the
 session issue.  Nothing that is relevant to more than one user
 should ever go in the session.

Correct. What little info I am putting in the session
corresponds directly to a single user.
 

 That sounds like a user or subscriptions object
 to me, not session data.

Once again, I shouldn't have used the term "global".
This is the subscriptions info for a single
user...that's why I had thought to put this in the
session instead of pulling from the db each page call
since the data will rarely change. This info will be
displayed on every page the user visits (unless they
turn off this module).

 
 No, that's caching.  Don't use the session for caching, use a cache
 for it.  They're not the same.  A session is often stored in a
 database so that it can be reliable.  A cache is usually stored on
 the file system so it can be fast.

The session is stored in a database
(Apache::Session::MySQL), and I am using TT caching
for the templates, but I'm not sure how to cache the
non-session data. I've seen this discussed but I
definitely need more info on this. As it stands I see
two options: get data from the session or get it from
the db...how do I bring caching into play?
 
 Things like the login status of this session, and the user ID that
 is associated with it go in the session.  Status of a particular
 page has to be passed in query args or hidden fields, to avoid
 problems with multiple browser windows.  Data that applies to
 multiple users or lasts more than the current browsing session
 never goes in the session.

What about something like "default page id", which is
the page that is considered your home page? This id is
stored permanently in the db (lasts more than the
current browsing session) but I keep it in the session
since this also rarely changes, so I don't want to keep
hitting the db to get it.

Thanks again...






Re: Apache::Session - What goes in session?

2002-08-19 Thread Perrin Harkins

md wrote:

 I don't think "global" was the term I should have
 used. What I mean is data that will be seen on all or
 most pages by the same user...like "Hello Jim"


Okay, don't put that in the session.  It belongs in a cache.  The 
session is for transient state information, that you don't want to keep 
after the user logs out.

 Current page name and id are never stored in db, so
 different browser windows can be on different
 pages...


I thought your session was all stored in MySQL.  Why are you putting 
these in the session exactly?  If these things are not relevant to more 
than one request (page), they don't belong in the session.  They should 
just be in ordinary variables.

 That sounds like a user or subscriptions object
 to me, not session data.

 Once again, I shouldn't have used the term "global".
 This is the subscriptions info for a single
 user...that's why I had thought to put this in the
 session instead of pulling from the db each page call
 since the data will rarely change.


You should use a cache for that, rather than the session.  This is 
long-term data that you just want quicker access to.

 I am using TT caching
 for the templates, but I'm not sure how to cache the
 non-session data.


Template Toolkit caches the compiled template code, but it doesn't cache 
your data or the output of the templates.  What you should do is grab a 
module like Cache::Cache or Cache::Mmap and take a look at the examples 
there.  You use it in a way that's very similar to what you're doing 
with Apache::Session for the things you referred to as global.  There 
are also good examples in the documentation for the Memoize module.
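
For a taste of what Memoize does, here is a small sketch using only the core module. The function feature_title is a hypothetical stand-in for a slow database lookup, and the counter exists only to show how often the real function runs. Note that this caches per process and never expires entries on its own.

```perl
use strict;
use warnings;
use Memoize;

my $db_calls = 0;

# Hypothetical slow lookup standing in for a database query.
sub feature_title {
    my ($feature_id) = @_;
    $db_calls++;
    return "Feature #$feature_id";
}

# Replace feature_title with a wrapper that remembers results per argument.
memoize('feature_title');

my @titles = map { scalar feature_title(3) } 1 .. 5;
print "lookups that reached the database: $db_calls\n";
```

Five calls with the same argument reach the underlying function once; the other four are answered from Memoize's table.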

There are various reasons to use a cache rather than treating the 
session like a cache.  If you put a lot of data in the session, it will 
slow down every hit loading and saving that data.  In a cache, you can 
just keep multiple cached items separately and only grab them if you 
need them for this page.  With a cache you can store things that come 
from the database but are not user-specific, like today's weather.
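
The "keep items separately and only grab what this page needs" idea can be sketched as a get-or-fetch helper. This is not the Cache::Cache API; it is a plain per-process hash with an expiry time, and the names cached, %cache, and the key 'weather:today' are all assumptions made for the example.

```perl
use strict;
use warnings;

my %cache;    # per-process store; each Apache child would hold its own copy

# Return the cached value for $key, calling $fetch only on a miss or after expiry.
sub cached {
    my ($key, $ttl, $fetch) = @_;
    my $entry = $cache{$key};
    if (!$entry || $entry->{expires} <= time()) {
        $entry = $cache{$key} = { value => $fetch->(), expires => time() + $ttl };
    }
    return $entry->{value};
}

my $fetches = 0;
my $weather = cached('weather:today', 600, sub { $fetches++; 'sunny' });
$weather    = cached('weather:today', 600, sub { $fetches++; 'rainy' });
print "$weather after $fetches fetch(es)\n";
```

A page that never asks for 'weather:today' never pays for it, which is the difference from a session that is loaded and saved whole on every hit.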

 What about something like "default page id", which is
 the page that is considered your home page? This id is
 stored permanently in the db (lasts more than the
 current browsing session) but I keep it in the session
 since this also rarely changes, so I don't want to keep
 hitting the db to get it.


I would have some kind of user object which has a property of 
default_page_id.  The first time the user logs in I would fetch that 
from the database, and then I would cache it so that I wouldn't need to 
go back to the database for it on future requests.
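
A sketch of such a user object, under stated assumptions: the class name My::User, the hash %users_table standing in for the database, and the in-process %cache are all hypothetical. The setter writes through to both so the cached value does not go stale when the user changes their home page.

```perl
use strict;
use warnings;

package My::User;    # hypothetical class name, not from the thread

my %users_table = (7 => { default_page_id => 31 });    # stand-in for the database
my %cache;                                             # stand-in for a real cache
our $db_reads = 0;

sub new {
    my ($class, $id) = @_;
    return bless { id => $id }, $class;
}

# Fetch from the "database" on first use, then answer from the cache.
sub default_page_id {
    my ($self) = @_;
    my $key = "user:$self->{id}:default_page_id";
    unless (exists $cache{$key}) {
        $db_reads++;
        $cache{$key} = $users_table{ $self->{id} }{default_page_id};
    }
    return $cache{$key};
}

# Write through: update the database first, then refresh the cached copy.
sub set_default_page_id {
    my ($self, $page_id) = @_;
    $users_table{ $self->{id} }{default_page_id} = $page_id;
    $cache{"user:$self->{id}:default_page_id"} = $page_id;
    return $page_id;
}

package main;

my $user = My::User->new(7);
$user->default_page_id for 1 .. 3;
print "page ", $user->default_page_id, ", database reads: $My::User::db_reads\n";
```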

- Perrin




Re: Apache::Session - What goes in session?

2002-08-19 Thread md


--- Perrin Harkins [EMAIL PROTECTED] wrote:

 Current page name and id are never stored in db, so
 different browser windows can be on different
 pages...

 I thought your session was all stored in MySQL.  Why are you putting
 these in the session exactly?  If these things are not relevant to
 more than one request (page), they don't belong in the session.
 They should just be in ordinary variables.

You are correct, these items are in the session in the
db. I meant that they aren't kept in long-term
storage in the db after the session ends, the way the
default page id and user name are. The current page
id/name is only relevant for an active session. Once a
session is started, the current page is set to whatever
the default page id is and changes as the user changes
pages. The only reason I did this (as I recall) is
so that I only have to look up the page name once.
 
 You should use a cache for that, rather than the session.  This is
 long-term data that you just want quicker access to.

Yes, that's exactly what I want to do. My main concern
is long-term data that I want quicker access to. I can
definitely get it all from the db, but that doesn't
seem very efficient.
 
 Template Toolkit caches the compiled template code, but it doesn't
 cache your data or the output of the templates.  What you should do
 is grab a module like Cache::Cache or Cache::Mmap and take a look at
 the examples there.  You use it in a way that's very similar to what
 you're doing with Apache::Session for the things you referred to as
 global.  There are also good examples in the documentation for the
 Memoize module.

Great...exactly the kind of info I was looking for.
I'll look at those. We are using a load-balanced
system; I should have mentioned that earlier. Won't
that be an issue with caching to disk? Is it possible
to cache to the db?

 There are various reasons to use a cache rather than treating the
 session like a cache.  If you put a lot of data in the session, it
 will slow down every hit loading and saving that data.  In a cache,
 you can just keep multiple cached items separately and only grab
 them if you need them for this page.  With a cache you can store
 things that come from the database but are not user-specific, like
 today's weather.

Thank you for all the excellent advice and
explanation (in this and other posts).

Most of the info I'll be pulling is *very*
user-specific...user name, which features to display
on which page, what features the user gets by email,
etc.

What happens is the user logs in and then the username
(greeting), the default page id (the user can create
many pages with different features per page) and what
features go on the default page are pulled from the
database and the default page is displayed, as well as
any module info.

The modules will consist of a "pages" module with
the names of all the pages the user has created (with
links) and an "emails" module which will display all
the features that the user is getting via email.
These modules will be displayed on every page.

You can see that almost everything is user-specific.

Right now I'm storing the page names/ids in a hash ref
in the session (the emails module isn't live yet), but
I thought that I would change that and only store the
module id and pull the names from the db (if the user
hasn't turned off the module) with each page call.

Thanks again for all the info!




Re: Apache::Session - What goes in session?

2002-08-19 Thread Perrin Harkins

md wrote:

 We are using a load-balanced
 system; I should have mentioned that earlier. Won't
 that be an issue with caching to disk? Is it possible
 to cache to the db?


There are a few ways to deal with this.  The simplest is to use the 
sticky load-balancing feature that many load-balancers have.  Failing 
that, you can store to a network file system like NFS or CIFS, or use a 
database.  (There are also fancier options with things like Spread, but 
that's getting a little ahead of the game.)  You can use MySQL for 
caching, and it will probably have similar performance to a networked 
file system.  Unfortunately, the Apache::Session code isn't all that 
easy to use for this, since it assumes you want to generate IDs for the 
objects you store rather than passing them in.  You could adapt the code 
from it to suit your needs though.  The important thing is to leave out 
all of the mutually exclusive locking it implements, since a cache is
all about "get the latest as quick as you can" and lost updates are not
a problem ("last save wins" is good enough for a cache).
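
A minimal sketch of a cache with those two properties: the caller passes in the key, and no mutually exclusive lock is taken. It is not adapted from the Apache::Session code; it uses flat files and Storable instead of MySQL, and the names cache_set, cache_get, and cache_path are hypothetical. The temporary directory is a demo location; a shared file system would take its place on a load-balanced setup. Values must be references, since that is what Storable serializes.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir tempfile);
use Storable qw(nstore retrieve);

my $cache_dir = tempdir(CLEANUP => 1);    # demo location only

# Map a caller-supplied key to a file name that is safe on any file system.
sub cache_path { File::Spec->catfile($cache_dir, md5_hex($_[0])) }

# Write to a temporary file, then rename it over the target.  No lock is
# taken: concurrent writers overwrite each other and the last rename wins.
sub cache_set {
    my ($key, $data) = @_;
    my ($fh, $tmp) = tempfile(DIR => $cache_dir);
    close $fh;
    nstore($data, $tmp);
    rename $tmp, cache_path($key) or die "rename failed: $!";
    return $data;
}

# Return the stored reference, or undef when nothing is cached for $key.
sub cache_get {
    my ($key) = @_;
    my $path = cache_path($key);
    return -e $path ? retrieve($path) : undef;
}

cache_set('user:7:pages', { 31 => 'Home' });
cache_set('user:7:pages', { 31 => 'Home', 32 => 'Sports' });
my $pages = cache_get('user:7:pages');
print join(', ', map { $pages->{$_} } sort keys %$pages), "\n";
```

Because the rename replaces the file in one step on a local POSIX file system, a reader sees either the old copy or the new one, never a half-written file; behaviour on network file systems may differ.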

 The modules will consist of a "pages" module with
 the names of all the pages the user has created (with
 links) and an "emails" module which will display all
 the features that the user is getting via email.
 These modules will be displayed on every page.

 You can see that almost everything is user-specific.


The relationships to the features and pages differ by user, but there 
might be general information about the features themselves that is 
stored in the database and is not user-specific.  That could be cached 
separately, to save some trips to the db for each user.

 Right now I'm storing the page names/ids in a hash ref
 in the session (the emails module isn't live yet), but
 I thought that I would change that and only store the
 module id and pull the names from the db (if the user
 hasn't turned off the module) with each page call.


You can cache the names too if you want to, but keeping them out of the 
session means that you won't be slowed down by fetching that extra data 
and de-serializing it with Storable unless the page you're on actually 
needs it.  It's also good to separate things that have to be reliable 
(like the ID of the current user, since without that you have to send 
them back to log in again) from things that don't need to be (you could 
always fetch the list of pages from the db if your cache went down).
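
The "cache may go away, the database is the fallback" point can be sketched like this. Both $cache_get and $db_fetch are hypothetical code refs supplied by the caller, and the key format is an assumption; the only idea being shown is that a cache failure is trapped and treated as a miss.

```perl
use strict;
use warnings;

# Try the cache first; on a miss or a cache failure, go to the database.
sub page_names_for {
    my ($user_id, $cache_get, $db_fetch) = @_;
    # A cache failure is not fatal: trap it and treat it as a miss.
    my $names = eval { $cache_get->("user:${user_id}:pages") };
    return $names if defined $names;
    return $db_fetch->($user_id);
}

my $broken_cache = sub { die "cache server unreachable\n" };
my $db           = sub { return { 31 => 'Home', 32 => 'Sports' } };
my $names        = page_names_for(7, $broken_cache, $db);
print join(', ', map { $names->{$_} } sort keys %$names), "\n";
```

The same is not true of the session: if the user ID stored there is lost, there is nothing to fall back on except sending the user to log in again.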

- Perrin