Last week bloglines.com upgraded from 2.0.x to 2.2.x. At the same time, I switched us from mod_cgid to mod_cgi.

mod_cgid has problems if the path to its cgisock changes at any time. It really needs to call realpath() on the cgisock path to avoid issues with how we distribute releases. (We swap some symlinks around, and as soon as they were changed, every CGI request would return a 503...)
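To illustrate the realpath() idea, here is a minimal sketch in C. It is not mod_cgid code: the helper name resolve_sock_path() and its signature are made up for this example. It assumes the configured path already exists when it is resolved, since realpath() fails on a missing path.

```c
#define _XOPEN_SOURCE 700   /* for realpath() under strict compiler modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mod_cgid): resolve every symlink in the
 * configured socket path once, and copy the canonical path into `out`.
 * Using the canonical path afterwards means a later symlink swap no longer
 * changes where the socket is looked up.
 * Returns 0 on success, -1 if the path cannot be resolved or does not fit.
 */
static int resolve_sock_path(const char *configured, char *out, size_t outlen)
{
    assert(configured != NULL && out != NULL);

    /* With a NULL buffer, realpath() allocates the result with malloc(). */
    char *resolved = realpath(configured, NULL);
    if (resolved == NULL)
        return -1;              /* e.g. the path does not exist yet */

    size_t len = strlen(resolved);
    if (len >= outlen) {        /* no room for the string plus its NUL */
        free(resolved);
        return -1;
    }
    memcpy(out, resolved, len + 1);
    free(resolved);
    return 0;
}
```

The idea would be to call something like this once at startup and hand the resolved path to both the daemon and the request handlers, instead of the symlinked one.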

I decided instead to just switch to mod_cgi. According to POSIX, fork() in a threaded process is supposed to be thread local: only the calling thread is duplicated in the child. And Linux wasn't supposed to be stupid about it, unlike Solaris. So running mod_cgi on the worker MPM should have been safe.

However, we have started getting reports of a problem: some users' cookies are getting mixed up. We use a cookie to determine which language to display to the user, and randomly, some users would get put into the wrong language. We had a couple of reports and assumed at first it was user error, but then it happened to me... And I was damn sure I hadn't changed my language.

So we looked deeper. None of this code had changed. Not for a long long long time. The only things that changed were the upgrade to 2.2 and the switch to mod_cgi. We haven't been able to reproduce this in any test environment.

The OS is Linux, RHEL 4, update 2, with the 2.6.9-22.ELsmp kernel, NPTL enabled, and the Worker MPM.

I believe the cause is a race condition with fork. In a highly threaded process, with perhaps nearly all of the threads calling fork() at once, I suspect there is a bug somewhere low level in glibc/nptl. Sometimes you get the wrong data back, or the execv run later in the child has the wrong environ.
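One way to go looking for a race like this is a small stress test. The sketch below is only an assumption about how such a test could be written, not something taken from mod_cgi: every thread forks repeatedly, gives each child its own TAG environment variable through execve(), and lets /bin/sh verify that it sees the value it was supposed to get. The names run_fork_env_check(), fork_worker() and TAG are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_THREADS 64

struct worker {
    int id;          /* thread index, used to build a unique TAG value */
    int iterations;  /* number of fork+exec rounds for this thread */
    int mismatches;  /* children that saw the wrong TAG or failed to run */
};

static void *fork_worker(void *arg)
{
    struct worker *w = arg;

    for (int i = 0; i < w->iterations; i++) {
        char expected[32], tagvar[48];
        snprintf(expected, sizeof expected, "t%d-i%d", w->id, i);
        snprintf(tagvar, sizeof tagvar, "TAG=%s", expected);

        /* The shell exits 0 only if $TAG equals the value passed as $1. */
        char *argv[] = { "sh", "-c", "[ \"$TAG\" = \"$1\" ]",
                         "sh", expected, NULL };
        char *envp[] = { tagvar, NULL };

        pid_t pid = fork();
        if (pid < 0) {
            w->mismatches++;
            continue;
        }
        if (pid == 0) {
            execve("/bin/sh", argv, envp);
            _exit(127);             /* exec failed */
        }

        int status = 0;
        pid_t r;
        do {
            r = waitpid(pid, &status, 0);
        } while (r < 0 && errno == EINTR);

        if (r != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            w->mismatches++;
    }
    return NULL;
}

/* Runs the workers and returns the total number of bad children. */
static int run_fork_env_check(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker workers[MAX_THREADS];
    int started = 0, total = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    for (int t = 0; t < nthreads; t++) {
        workers[t] = (struct worker){ t, iterations, 0 };
        if (pthread_create(&tids[t], NULL, fork_worker, &workers[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += workers[t].mismatches;
    }
    return total;
}
```

Calling run_fork_env_check() with a large thread count and many iterations should always return 0 on a healthy system. A non-zero result would point at the kind of mixed-up environment described above, although a clean run proves nothing, given that the problem has not shown up in any test environment so far.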

So, we are in the process of switching back to mod_cgid for now. Does anyone have any ideas, or has anyone seen this type of problem before?

-Paul
