On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:

> Let's review:
>
> 1: No test or staging system used before production
>

No, I do not have a full Ubuntu machine replicating the exact memory and
application load of the production server.

This was a change to one configuration parameter. It was something I was
advised to do, read about quite a bit, tested on my development server (a
Mac) and then applied at 6 am on Sunday morning, our slowest time.


> 2: DB left in an unknown state (trying to shut down, not able)
>

What?

I checked the site and everything was normal.  I went in via psql and tried
some queries for about half an hour and continued to monitor the site.  Then
I went to bed at 7 am (EU time).

Why did it shut down so much later?

I have never restarted postgres before, so this was all new to me.  I
apologize that I wasn't born innately with such knowledge.

So is it normal for postgres to report that it failed to shut down, keep
running for an hour and then go ahead and restart itself?

> 3: No monitoring software to tell you when the site is down
>

Of course I have monitoring software, both external and internal.  But it
doesn't come and kick me out of bed.  Yes, I need an automated cell phone
call.  That was the first thing I saw to afterwards.
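
For anyone in the same position, here is a rough sketch of the kind of check
I mean, meant to run from cron every minute. The function names, the 10
second timeout and the alert command are all my own placeholders, not part of
any particular monitoring product:

```shell
#!/bin/sh
# Sketch only: URL, timeout and alert hook are placeholders.

# Treat 2xx and 3xx HTTP status codes as "site is up".
status_ok() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch only the HTTP status code; curl reports 000 when it gets no response.
site_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Check one URL and hand a message to the alert command if it looks down.
# alert_cmd is whatever actually pages the phone (hypothetical here).
check_site() {
    url=$1
    alert_cmd=$2
    code=$(site_status "$url")
    if ! status_ok "$code"; then
        $alert_cmd "site check failed for $url (HTTP $code)"
    fi
}
```

Calling `check_site http://www.example.com/ echo` from cron would at least
print the failure; swapping `echo` for a script that triggers a call or SMS
is the part that gets somebody out of bed.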


> 4: I'm gonna just go ahead and guess no backups were taken either, or
> are regularly taken.
>

WTF?  Of course I have backups.  I just went through a very harsh downtime
event.  I fail to see why it is now necessary for you to launch such
an attack on me.

Perhaps the tone of my post sounded like I was blaming you, or at least you
felt that way.  Why do you feel that way?

Why not respond with:  "Ouch!  Did you check this ... that...."  Say
something nice and helpful.  Correct my mistakes.




> This website can't be very important, if that's the way you treat it.
>

Just to let you know, that is straight up offensive.

This is a high-traffic real estate site.  Downtime is unacceptable.  I had
less downtime than this when I migrated to the new platform.

I spent rather a large amount of time reading and questioning here.  I asked
many questions for clarification and didn't do ANYTHING until I was sure it
was the correct solution.  I didn't just pull some shit off a blog and start
changing settings at random.

I double-checked opinions against different people and I searched for more
docs on that param.  Amazingly, none of the ones I found commented on the
shared memory issue, and I didn't even understand the docs discussing shared
memory because it didn't seem to apply to what I was doing.  That's my
misunderstanding.  I came here to share my misunderstanding.




> And my phone starts complaining a minute after the site stops
> responding if something does go wrong the rest of the time.  Do not
> lay this at anyone else's feet.
>

I didn't.  There is not even the slightest hint of that in my post.

I came here and posted the details of where I went wrong and what confused
me about the documentation that I followed.  That's so other people can
follow it and so somebody here can comment on it.



> > changing that default
> > is brutally difficult and can only really be done by adjusting something in
> > the kernel.
>
> Please, that's a gross exaggeration.  The sum total of changing them is:
>
> run sysctl -a|grep shm
> copy out proper lines to change
> edit sysctl.conf
> put new lines in there with changes
> sudo sysctl -p  # applies changes
> edit the appropriate postgresql.conf, make changes
> sudo /etc/init.d/postgresql-8.3 stop
> sudo /etc/init.d/postgresql-8.3 start
>

Considering how splendidly the experiment with changing max_fsm_pages went,
I think you can understand that I have no desire to experiment with kernel
settings.

It is easy for you because you ALREADY KNOW everything involved.  I am not a
sysadmin and we don't have one.  My apologies for that.

So does the above mean that I don't have to restart the entire server, just
postgres?  I assumed that changing kernel settings means rebooting the
server.
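
As far as I can tell from the sysctl documentation, `sysctl -p` loads the
values in /etc/sysctl.conf into the running kernel, so no reboot should be
involved. For anyone else following along, here is a rough sketch of how I
read the quoted steps. The helper names and the 1 GB figure are my own
illustrative assumptions, not recommended values:

```shell
#!/bin/sh
# Sketch only: helper names and sizes are illustrative assumptions.

# kernel.shmmax is given in bytes, kernel.shmall in pages.
# Round a byte count up to a whole number of pages.
shm_pages() {
    bytes=$1
    page_size=$2
    echo $(( ($bytes + $page_size - 1) / $page_size ))
}

# Print the two lines that would be added to /etc/sysctl.conf.
shm_conf_lines() {
    bytes=$1
    page_size=$2
    echo "kernel.shmmax = $bytes"
    echo "kernel.shmall = $(shm_pages "$bytes" "$page_size")"
}

# Example: a 1 GB segment with 4096-byte pages
# (on a real machine the page size comes from `getconf PAGE_SIZE`).
shm_conf_lines 1073741824 4096

# The rest is done by hand, as in the quoted steps:
#   sysctl -a | grep shm                    # look at the current values
#   sudo sysctl -p                          # apply /etc/sysctl.conf
#   sudo /etc/init.d/postgresql-8.3 stop
#   sudo /etc/init.d/postgresql-8.3 start
```

Only the last two commands touch postgres; nothing in the list restarts the
machine itself.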



> > I have clustered that table, it's still unbelievably slow.
>
> Did you actually delete the old entries before clustering it?  if it's
> still got 4G of old sessions or whatever in it, clustering ain't gonna
> help.
>

It's a different table.  The problem one has only 300k rows.

The problem is not the size; the problem is that the speed is catastrophic.



> > I still don't know if this bloat due to the small free space map has
> > anything to do with why the table is performing like this.
>
> Since you haven't shown us what changes, if any, have happened to the
> table, neither do we :)
>

Sorry, it didn't seem to be the most important topic when I got out of bed.
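
For the record, this is roughly what I would run to collect that
information. It is only a sketch: `mydb` and `problem_table` are placeholder
names, the helper function is my own, and I am assuming the
`pg_stat_user_tables` view with its `n_live_tup` and `n_dead_tup` columns as
documented for 8.3:

```shell
#!/bin/sh
# Sketch only: database and table names are placeholders.

# Build a query reporting live and dead row counts plus the last
# vacuum times for a single table.
table_stats_sql() {
    printf "SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables WHERE relname = '%s';\n" "$1"
}

# Run by hand against the real database:
#   psql -d mydb -c "$(table_stats_sql problem_table)"
#   psql -d mydb -c "VACUUM VERBOSE problem_table;"
```

A large `n_dead_tup` compared to `n_live_tup` would point at bloat; the
`VACUUM VERBOSE` output is what I would paste into a follow-up.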
