On Jun 22, 2006, at 10:33 AM, Mark Woodward wrote:

One fine day, Thu, 2006-06-22 at 09:59, Mark Woodward wrote:
After a long battle with technology, [EMAIL PROTECTED] ("Mark
Woodward"), an earthling, wrote:
Clinging to sanity, [EMAIL PROTECTED] ("Mark Woodward") mumbled
into
It pointed to *ALL* the versions.

Hmm, OK, then the problem is more serious than I suspected.
This means that every index on the table has to be updated by every
transaction that modifies that row. Is that correct?

Yes.

I am attaching some code that shows the problem with regard to
applications such as web server session management. When run, each
second the system can handle fewer and fewer connections. Here is a
brief sample of the output:

[EMAIL PROTECTED]:~/pgfoo$ ./footest
1307 sessions per second, elapsed: 1
1292 sessions per second, elapsed: 2
1287 sessions per second, elapsed: 3
....
1216 sessions per second, elapsed: 25
1213 sessions per second, elapsed: 26
1208 sessions per second, elapsed: 27
....
1192 sessions per second, elapsed: 36
1184 sessions per second, elapsed: 37
1183 sessions per second, elapsed: 38
....
1164 sessions per second, elapsed: 58
1170 sessions per second, elapsed: 59
1168 sessions per second, elapsed: 60

As you can see, in about a minute at high load, this very simple table lost about 10% of its performance, and I've seen worse depending on update frequency. Before you say this is an obscure problem, I can tell you it isn't. I have worked with more than a few projects that had to switch away from PostgreSQL because of this behavior.

You mean systems that are designed so tightly that they can't take a 10%
performance change?

No, that's not really the point. Performance degrades over time; in one
minute it degraded 10%.

The update-to-session ratio has a HUGE impact on PostgreSQL. If you have a
thousand active sessions, it may take about a minute to degrade 10%,
depending on how active the sessions are and how many operations each
action performs per session.

How big are your sessions? With about 1000 sessions, running vacuum on just the session table is so fast it is barely noticeable. Vacuuming my session table every 5 minutes keeps it very, very small and easy to vacuum, and performance degradation is not an issue. I could probably do it every minute if I had to and it would be fine. But my sessions are only about 5k on average.
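
For what it's worth, the maintenance job is tiny. Here is a rough sketch of it, run from cron every five minutes; the "sessions" table name and the connection string are just made up for illustration, not my actual setup:

<?php
// vacuum_sessions.php -- run from cron, e.g. */5 * * * * php vacuum_sessions.php
// The table name and connection string below are assumptions for illustration.
$db = pg_connect('dbname=myapp user=web');
if (!$db) {
    exit(1);
}
// Plain VACUUM takes no exclusive lock, so the web app keeps serving requests
// while the dead session row versions are reclaimed.
pg_query($db, 'VACUUM sessions');
pg_close($db);
?>

On a table that small it finishes almost instantly, which is why I barely notice it.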

What is a bigger concern for me is the massive amount of writes to the disk that happen in postgres to make sure the data is safe. It's just a waste of disk bandwidth for data that is transient anyway.

To me, postgres (and RDBMSs in general) is just not good at handling session data for web apps. Once again, isn't that why you wrote mcache?


If an active user causes a session update once a second, that is not too bad, but if an active user updates a session more often, then it is worse.

Generally speaking, sessions aren't updated only when they change; they are
usually updated on every HTTP request. The data in a session may not change, but the session handling code doesn't know this and simply updates anyway.

This problem is more or less specific to PHP, no? It reads the whole session on session open and writes the whole thing back on close. Because of this, I am looking into smarter ways of handling sessions. The session data RARELY changes, so I am looking into ways of only doing updates when the data actually changes. In fact, for a very similar problem, where I had tons of tiny requests coming in that would NEVER alter the session, I skipped session_start() and used my own session_touch() function to update just the timestamp on the session, because that's all I needed to do. It saved TONS of wasted overhead.
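
To give an idea of what I mean, here is a rough sketch of that session_touch() approach; the table layout (sessions(id, data, updated_at)), the column names, and the connection handling are assumptions for illustration, not my actual code:

<?php
// session_touch(): bump the session timestamp without calling session_start(),
// so we skip reading, deserializing and rewriting the whole session blob.
// The sessions(id, data, updated_at) table layout is assumed for illustration.
function session_touch($db, $session_id)
{
    pg_query_params($db,
        'UPDATE sessions SET updated_at = now() WHERE id = $1',
        array($session_id));
}

// In a request that never alters session data:
$db = pg_connect('dbname=myapp user=web');
session_touch($db, $_COOKIE['PHPSESSID']);
pg_close($db);
?>

It is still an UPDATE, so it still leaves a dead row version behind, but it avoids reading and re-serializing the entire session on every one of those tiny requests, which is where the wasted overhead was.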

I don't mean to get off topic, but it seems like these sorts of problems are better solved outside of postgres. I think your session daemon is in fact the right approach here. If you have other tables with similar problems, that is one thing, but if it is just PHP session tables, then I think we need a better use case before looking into this.


On a heavily AJAX site, you may have many smaller HTTP requests returning items in a page, so a single page may consist of multiple HTTP requests. Worse yet, as a user drags an image around, there are lots of background requests being made. Each request typically means a session lookup and a session update. This is compounded by the number of active users, and since the goal of a site is to have many active users, this is always a problem. It is less intrusive now that non-locking vacuum is there, but that doesn't mean it isn't a problem.

Once again, I think that to run an "Enterprise" app (and by that I mean one that scales well) you need "Enterprise" class session management. The PHP model is not good for this, and using postgres is not good for this. It's just not the right tool for this job, in my opinion. I would think you could gain WAY more by using a more intelligent session handling model than you could ever eke out of postgres for this problem.



Or is it just that they did not vacuum for so long that performance
eventually fell below what was needed?

In an active site or application, vacuuming often enough to prevent this
is, itself, a load on the system.

Once again, this is just anecdotal evidence, but for me that load is so small it is virtually unnoticeable. The key here is that session tables are SMALL, that is, less than a few thousand rows. Maybe you are storing a lot more data in there than I am, but for me a vacuum every 5 minutes solves the performance degradation problem. It is still too much disk bandwidth, though, so I am just going to stop storing sessions in postgres.



btw, what did they switch to ?

One switched to Oracle and one is using a session handler I wrote for PHP.

Just switching the session handler sounds like a pretty good solution to this problem to me.

just my $0.02

Rick