This is going to be a somewhat preliminary "feeler" post because we are not
yet able to fully describe or recreate the bug we're seeing, but I'm hoping
some of you have seen something similar.

We use Apache::Session::File as the storage module for our Apache::Session
sessions.  I have written an object (RMS::Session where RMS is our app) that
basically is just a wrapper class for the Apache::Sessions.  When I
instantiate a new RMS::Session, it goes and ties to the actual
Apache::Session, gets a hold of the session hash, populates it's member
variables with values from the session hash, and unties/undefs the session
hash.  Thus we end up with a perl object representing our session with a
friendly OO interface for our developers that they are used to, and the real
session is freed for use by other requests.

Everytime I instantiate a new RMS::Session, I timestamp the Apache::Session
and I increment a 'retrievals' variable.  Pretty much every request into our
app needs to look at the session for something, so the end result is that
sessions are being tied and written a lot.  In some cases, a user will click
into an area of our application that has say three frames, and the content
of all three frames will go and look at the session, so three requests for
the same session could come in at the same time, so it's probably exercising
the locking mechanism fairly well.

Here's the basic problem we're seeing...our sessions have a very well
defined set of variables in them so the size of the session file is very
predictable - in our case, they all are between 320-360 bytes at all times.
What seems to be happening is that sometimes (more on this later) the files
get written out in a corrupted state, and I've noticed it's a well-defined
"corruption" to where the session file will shrink to a size of either 150
bytes or 63 bytes. Once this happens, the session is corrupted, in that I
can no longer successfully retrieve any information from it.  The session is
still there, but the contents have been completely garbled.

Unfortunately, it's neither predictable nor easy to reproduce.  First, it
only happens occasionally.  we haven't yet found one set of actions that we
can take and cause it to happen every time.  One test we use to demonstrate
it is to simply log in and out several times.  Sometimes, 7 or 8 logins will
go by without incident, and then the 9th will cause a corrupted session.
Other times, 10 logins in a row will lead to a corrupted session.  Secondly,
it happens far more frequently on our production server than our development
server (same exact code and versions of perl and all modules).  I've begun
to suspect that perhaps it only happens after a certain period of latency.
Since our production server has a lot more data in it's database, operations
tend to take much longer than they would during development. Perhaps this
means that there's more opportunity in production for a request to ask for a
session that's still held/locked by another child request.  Like I said,
it's still very preliminary.

Anyway, my question for now is whether anyone has seen corruption like this
with Apache::Session::File in your typical multi-user mod_perl web app
environment?  We're just trying to narrow down the possibilities since it's
been two days of four engineers trying to come up with any sort of recipe
for reliable reproduction or pattern to the bug with no luck so far.

Thanks,
Fran

Reply via email to