On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe <scott.marl...@gmail.com>wrote:
> Let's review: > 1: No test or staging system used before production > no, I do not have a full ubuntu machine replicating the exact memory and application load of the production server. this was changing one configuration parameter. something I was advised to do, read about quite a bit, tested on my development server (mac) and then proceeded to do at 6 am on Sunday morning, our slowest time. 2: DB left in an unknown state (trying to shut down, not able) > what ? I checked the site, everything was normal. I went in via psql and tried some queries for about half an hour and continued to monitor the site. then I went to bed at 7am (EU time). Why did it shutdown so much later ? I have never restarted postgres before, so this was all new to me. I apologize that I wasn't born innately with such knowledge. So is it normal for postgres to report that it failed to shut down, operate for an hour and then go ahead and restart itself ? 3: No monitoring software to tell you when the site is down > of course I have monitoring software. both external and internal. but it doesn't come and kick me out of bed. yes, I need an automated cel phone call. that was the first thing I saw to afterwards. 4: I'm gonna just go ahead and guess no backups were taken either, or > are regularly taken. > WTF ? of course I have backups. I just went through a very harsh down period event. I fail to see why it is now necessary for you to launch such an attack on me. Perhaps the tone of my post sounded like I was blaming you, or at least you felt that way. Why do you feel that way ? Why not respond with: "ouch ! did you check this ... that...." say something nice and helpful. correct my mistakes > This website can't be very important, if that's the way you treat it. > just to let you know, that is straight up offensive This is high traffic real estate site. Downtime is unacceptable. I had less downtime than this when I migrated to the new platform. I spent rather a large amount of time reading and questioning here. I asked many questions for clarification and didn't do ANYTHING until I was sure it was the correct solution. I didn't just pull some shit off a blog and start changing settings at random. I double checked opinions against different people and I searched for more docs on that param. Amazingly none of the ones I found commented on the shared memory issue and I didn't even understand the docs discussing shared memory because it didn't seem to apply to what I was doing. that's my misunderstanding. I come her to share my misunderstanding. > And my phone starts complaining a minute after the site stops > responding if something does go wrong the rest of the time. Do not > lay this at anyone else's feet. > I didn't. There is not even the slightest hint of that in my post. I came here and posted the details of where I went wrong and what confused me about the documentation that I followed. That's so other people can follow it and so somebody here can comment on it. > changing that default > > is brutally difficult and can only really be done by adjusting something > in > > the kernel. > > Please, that's a gross exaggeration. The sum totoal to changing them is: > > run sysctl -a|grep shm > copy out proper lines to cahnge > edit sysctl.conf > put new lines in there with changes > sudo sysctl -p # applies changes > edit the appropriate postgresql.conf, make changes > sudo /etc/init.d/postgresql-8.3 stop > sudo /etc/init.d/postgresql-8.3 start > Considering how splendidly the experiment with changing fsm_max_pages went, I think you can understand that I have no desire to experiment with kernel settings. It is easy for you because you ALREADY KNOW everything involved. I am not a sysadmin and we don't have one. My apologies for that. so does the above mean that I don't have to restart the entire server, just postgres ? I assumed that changing kernel settings means rebooting the server. > I have clustered that table, its still unbelievably slow. > > Did you actually delete the old entries before clustering it? if it's > still got 4G of old sessions or whatever in it, clustering ain't gonna > help. > its a different table. the problem one has only 300k rows the problem is not the size, the problem is the speed is catastrophic > I still don't know if this bloat due to the small free space map has > > anything to do with why the table is performing like this. > > Since you haven't show us what changes, if any, have happened to the > table, neither do we :) > sorry, it didn't seem to be the most important topic when I got out of bed