Re: [GENERAL] Upgrading from 9.0.11 to 9.3.5 on CentOS 6 (64 bit)

2014-08-02 Thread Phoenix Kiula
Thanks...comments below.



 assuming you installed 9.0 from the yum.postgresql.com respositories, then,
 `yum update postgresql90-server`   and restart the postgresql-9.0 service
 should do nicely.



This worked. Took me to 9.0.17 for some reason. I'm OK with this.

But the vacuum full was a terrible idea. I just spent 2 tranches of
5 hours each waiting for it to work. Many websites/blogs mention NOT
to run vacuum full at all. Instead, run cluster. Is this better then?

My table is huge. Over a billion rows. The idea of pg_dump and then
pg_restore of a table might work, followed by reindexing. But that
would also cause serious downtime.

Anything else I can do? Just adjust the autovacuum information for
example? My current settings are as follows:

autovacuum  = on
autovacuum_max_workers  = 5
autovacuum_vacuum_cost_delay= 20ms
autovacuum_vacuum_cost_limit= 350


The table in question has over a billon rows.

HOW do I know if this table is causing the issues? This is the only
table that's heavily queried. Recently many times the PG server has
been locked, and the pending queries have led to server outage. When I
manually vacuumdb, this is the table where the process has stuck for
hours.

So I need to tune this table back to its usual performance.

Appreciate any ideas! I'm sure there are much larger tables in the
world than mine. What do they do? (Apart from replication etc)

Thanks.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Reindex taking forever, and 99% CPU

2014-08-02 Thread Phoenix Kiula
Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.

One of my large tables (101 GB on disk, about 1.1 billion rows) used
to take too long to vacuum. Not sure if it's an index corruption
issue. But I tried VACUUM FULL ANALYZE as recommended in another
thread yesterday, which took 5 hours on the two times I tried, without
finishing.

Now the REINDEX TABLE has taken over 6 hours as I decided to be
patient and just let something finish. Not sure this is normal though!
How do production level DBAs do this if it takes so long?

If I open another SSH window to my server and try select * from
pg_stats_activity it just hangs there, as the REINDEX I presume is
taking up all the memory? I basically can't do anything else on this
server.

Just in case it helps, a segment of my postgresql.conf is below. Would
appreciate any tips on what I can do.

(I did a pg_dump of just this table, which also took about 2 hours,
then I renamed the original table in the database, and tried to
pg_restore just the table, but it gave me an error message about the
archive being in the wrong format !!! So REINDEX or something like it
seems to be the only idea?)

Thanks for any help!

PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
and TOP output during the running of the REINDEX are below..


POSTGRESQL.CONF-

max_connections = 180
superuser_reserved_connections  = 5
shared_buffers  = 512MB
effective_cache_size= 1200MB
temp_buffers= 32MB
maintenance_work_mem= 320MB
work_mem= 128MB
wal_buffers = 20MB
fsync   = on
checkpoint_segments = 128
checkpoint_timeout  = 1000
enable_indexscan= on

# AUTOVAC
autovacuum  = on
autovacuum_max_workers  = 5  # max number of autovacuum subprocesses
  #autovacuum_vacuum_scale_factor  = 0.2# fraction of table size
before vacuum
autovacuum_vacuum_cost_delay= 20ms
autovacuum_vacuum_cost_limit= 350
...





--TOP OUTPUT (db name changed for privacy, with the word MYDOMAIN) ---

top - 21:18:51 up 22 days,  7:43,  2 users,  load average: 1.20, 1.17, 1.18
Tasks: 214 total,   3 running, 211 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.1%us,  1.6%sy,  0.0%ni, 71.9%id,  1.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4046644k total,  4022324k used,24320k free, 9880k buffers
Swap:  2096440k total,   177144k used,  1919296k free,  2526536k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
21044 postgres  25   0 1102m 513m  76m R 97.7 13.0 432:03.46 postgres:
MYDOMAIN_MYDOMAIN MYDOMAIN [local] REINDEX
 8812 root  18   0 1403m  53m 3344 S  2.3  1.4 377:33.38
./jre/bin/java -Djava.compiler=NONE -cp /usr/StorMan/RaidMan.jar com.
 8319 named 24   0  317m  37m 1860 S  1.3  0.9 319:11.26
/usr/sbin/named -u named -4 -t /var/named/chroot
14184 nobody15   0  266m  15m 5156 S  1.0  0.4   4:13.43 nginx:
worker process
14181 nobody15   0  279m  34m 5160 S  0.7  0.9   4:13.93 nginx:
worker process
30285 root  15   0 12760 1188  820 R  0.7  0.0   0:00.03 top
  282 root  10  -5 000 S  0.3  0.0 184:37.48 [kswapd0]
25093 nobody16   0  334m  15m 5124 S  0.3  0.4   0:01.00
/usr/local/apache/bin/httpd -k restart -DSSL
25095 nobody15   0  334m  15m 5256 S  0.3  0.4   0:00.94
/usr/local/apache/bin/httpd -k restart -DSSL
25102 nobody15   0  334m  15m 5120 S  0.3  0.4   0:00.93
/usr/local/apache/bin/httpd -k restart -DSSL
25106 nobody15   0  334m  15m 5416 S  0.3  0.4   0:00.99
/usr/local/apache/bin/httpd -k restart -DSSL
25109 nobody15   0  334m  15m 5424 S  0.3  0.4   0:00.94
/usr/local/apache/bin/httpd -k restart -DSSL
25113 nobody16   0  334m  15m 4980 S  0.3  0.4   0:00.93
/usr/local/apache/bin/httpd -k restart -DSSL
25115 nobody16   0  334m  15m 5192 S  0.3  0.4   0:00.95
/usr/local/apache/bin/httpd -k restart -DSSL
25117 nobody16   0  334m  15m 4988 S  0.3  0.4   0:00.97
/usr/local/apache/bin/httpd -k restart -DSSL
25119 nobody16   0  334m  15m 5028 S  0.3  0.4   0:00.96
/usr/local/apache/bin/httpd -k restart -DSSL
31759 root  15   0 000 S  0.3  0.0   0:35.37 [pdflush]
1 root  15   0 10368  592  556 S  0.0  0.0   0:04.29 init [3]
2 root  RT  -5 000 S  0.0  0.0   0:06.24
[migration/0]
3 root  34  19 000 S  0.0  0.0   0:08.72
[ksoftirqd/0]
4 root  RT  -5 000 S  0.0  0.0   0:00.00
[watchdog/0]
5 root  RT  -5 000 S  0.0  0.0   0:05.27
[migration/1]
6 root  34  19 000 S  0.0  0.0   3:49.89
[ksoftirqd/1]
7 root  RT  -5 000 S  0.0  0.0   0:00.00 [watchdog/1]


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-02 Thread Adrian Klaver

On 08/02/2014 06:20 PM, Phoenix Kiula wrote:

Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.

One of my large tables (101 GB on disk, about 1.1 billion rows) used
to take too long to vacuum. Not sure if it's an index corruption
issue. But I tried VACUUM FULL ANALYZE as recommended in another
thread yesterday, which took 5 hours on the two times I tried, without
finishing.

Now the REINDEX TABLE has taken over 6 hours as I decided to be
patient and just let something finish. Not sure this is normal though!
How do production level DBAs do this if it takes so long?


So why the REINDEX?



If I open another SSH window to my server and try select * from
pg_stats_activity it just hangs there, as the REINDEX I presume is
taking up all the memory? I basically can't do anything else on this
server.

Just in case it helps, a segment of my postgresql.conf is below. Would
appreciate any tips on what I can do.

(I did a pg_dump of just this table, which also took about 2 hours,
then I renamed the original table in the database, and tried to
pg_restore just the table, but it gave me an error message about the
archive being in the wrong format !!! So REINDEX or something like it
seems to be the only idea?)


Sounds to me like you did a plain text dump and then tried to use 
pg_restore to restore. One of the quirks of pg_dump/pg_restore is that 
if you do a plain text dump you need to feed it to psql not pg_restore. 
That being said I am not sure that increasing the size of your database 
by another 101 GB on what seems to be an overloaded machine is the answer.




Thanks for any help!


Still not sure what the problem is that you are trying to solve?

There was reference to VACUUM issues, but not a lot of detail. Some more 
information on what specifically you where having issues with might lead 
to some clarity on where to go from here.




PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
and TOP output during the running of the REINDEX are below..




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-02 Thread John R Pierce

On 8/2/2014 6:20 PM, Phoenix Kiula wrote:

PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
and TOP output during the running of the REINDEX are below..


POSTGRESQL.CONF-

max_connections = 180
superuser_reserved_connections  = 5
shared_buffers  = 512MB
effective_cache_size= 1200MB
temp_buffers= 32MB
maintenance_work_mem= 320MB
work_mem= 128MB


with 4GB of ram, and 180 connections, if you actually had all 180 
connections busy at once, you could use over 180 times work_mem, 
180*128MB in 4GB would be fatal.




--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-02 Thread Phoenix Kiula
Thanks John.

So what're the right settings? Anyway, right now Postgresql is
servicing only one main connection, which is the REINDEX. All other
stuff is switched off, no one else is connecting to the DB.

My issue with this table was the vaccum process would stop at this
table, and take hours. So I thought something was wrong with this
table. My version of PG was 9.0.11, and googling for similar issues
brought up an old post by Tom Lane that suggested to the poster of
that thread to upgrade. So now I have, and am at 9.0.17 -- I recognize
this is not 9.3.5, but not sure we have the appetite right now for a
massive upgrade.

So what I'm trying to do is reindex this specific table.



 iostat

Linux 2.6.18-238.9.1.el5 (coco.MYDOMAIN.com) 08/02/2014

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  10.630.103.11   13.420.00   72.74

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 284.10 43828.59  5729.27 84628259628 11062603842
sda1  0.00 0.00 0.00   2272 10
sda2  3.0844.97   989.21   86838949 1910058506
sda3  7.65   193.0484.34  372745356  162860428
sda4  0.00 0.00 0.00  6  0
sda5  1.5831.15 6.84   60140845   13208874
sda6  0.8220.28 0.88   391611381693104
sda7  1.1015.5924.32   30101692   46962204
sda8  2.7744.8820.07   86661146   38754800
sda9267.11 43478.67  4603.61 83952607992 8889065916






On Sun, Aug 3, 2014 at 9:56 AM, John R Pierce pie...@hogranch.com wrote:
 On 8/2/2014 6:20 PM, Phoenix Kiula wrote:

 PS: CentOS 6 64 bit, 4 GB of RAM, Raid 1 Raptor disks. Postgresql.conf
 and TOP output during the running of the REINDEX are below..


 POSTGRESQL.CONF-

 max_connections = 180
 superuser_reserved_connections  = 5
 shared_buffers  = 512MB
 effective_cache_size= 1200MB
 temp_buffers= 32MB
 maintenance_work_mem= 320MB
 work_mem= 128MB


 with 4GB of ram, and 180 connections, if you actually had all 180
 connections busy at once, you could use over 180 times work_mem, 180*128MB
 in 4GB would be fatal.



 --
 john r pierce  37N 122W
 somewhere on the middle of the left coast




 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-02 Thread Adrian Klaver

On 08/02/2014 07:02 PM, Phoenix Kiula wrote:

Thanks John.

So what're the right settings? Anyway, right now Postgresql is
servicing only one main connection, which is the REINDEX. All other
stuff is switched off, no one else is connecting to the DB.

My issue with this table was the vaccum process would stop at this
table, and take hours.


In your original post you said it was stopping on pg_class so now I am 
confused.




So I thought something was wrong with this
table. My version of PG was 9.0.11, and googling for similar issues
brought up an old post by Tom Lane that suggested to the poster of
that thread to upgrade. So now I have, and am at 9.0.17 -- I recognize
this is not 9.3.5, but not sure we have the appetite right now for a
massive upgrade.

So what I'm trying to do is reindex this specific table.





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-02 Thread Phoenix Kiula
 In your original post you said it was stopping on pg_class so now I am
 confused.



No need to be confused. The vacuum thing is a bit tricky for laymen
like myself. The pg_class seemed to be associated to this table.
Anyway, even before the upgrade, the vacuum was stopping at this table
and taking forever.

The question is: what now. Where can I give you information from?
IOSTAT I've already shared.

Will the work_mem settings affect the manual REINDEX that's still
running? What can I do to speed up the REINDEX? Should I change my
autovacuum settings for this table specifcally (it's the only mammoth
table in the DB, and our main one)?

Thanks.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-02 Thread Adrian Klaver

On 08/02/2014 07:37 PM, Phoenix Kiula wrote:

In your original post you said it was stopping on pg_class so now I am
confused.




No need to be confused. The vacuum thing is a bit tricky for laymen
like myself. The pg_class seemed to be associated to this table.
Anyway, even before the upgrade, the vacuum was stopping at this table
and taking forever.


Well pg_class is associated with all tables, it is the system catalog 
that holds information on tables, among other things.


So what made you think pg_class is involved in your issue?

I suspect you did not just pull that name out of thin air, that it came 
from some log or message.


Is that the case?



The question is: what now. Where can I give you information from?
IOSTAT I've already shared.

Will the work_mem settings affect the manual REINDEX that's still
running? What can I do to speed up the REINDEX? Should I change my
autovacuum settings for this table specifcally (it's the only mammoth
table in the DB, and our main one)?


What would be helpful would be some information on the table itself.

What is the schema definition?

What are you storing in the table?

What is its usage pattern, SELECT only, mixed INSERT UPDATE SELECT, etc?

At this point semi-randomly changing settings and operations on this 
table would seem to be counter productive.




Thanks.





--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general