Re: [GENERAL] Waiting on ExclusiveLock on extension

2015-04-19 Thread Andomar

To put the top question first:

How can table extension locks explain a massive spike in CPU usage?

I can imagine 400 connections waiting on disk I/O, but then, wouldn't 
they all be sleeping?


 Ok, that's a MAJOR hint, because relation 1249 is a system catalog;
 namely pg_attribute. So I think what's happening here is that your
 catalog has become horrifically bloated. I'm 99% certain that VACUUM ALL
 will not vacuum the catalog tables.

 Do you by chance have autovacuum turned off?

 A manual VACUUM VERBOSE pg_attribute might provide some immediate relief.

Autovacuum is turned on.  In addition, we do a manual VACUUM ALL at 
night.  VACUUM VERBOSE pg_attribute ran in 0 seconds and processed a few 
hundred rows.
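(As a quick sanity check, the catalog's on-disk size can be inspected 
directly with standard functions, e.g.

  SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));

so a few hundred rows and a sub-second VACUUM do look like a healthy, 
unbloated pg_attribute.)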


 Are you using a connection pool? Establishing 50 new database
 connections per second won't do anything to help performance...

As I understand it, a pool reduces network and CPU load.  We have never 
seen any issues with those.  So the extra monitoring and maintenance 
cost of a pool seems hard to justify.


 I think what that means is that there was suddenly a big spike in memory
 demand at the OS level, so now the OS is frantically dumping cached
 pages. That in itself won't explain this, but it may be a clue.

We monitor memory usage with Cacti.  It's a dedicated server and nearly 
all memory is used as cache.  If a script runs and demands memory, that 
becomes visible as cache is cleared out.  There is no change in the 
amount of memory used as cache around the outage.


 In order to extend a relation we need to ask the filesystem to actually
 extend the file (which presumably means at least writing some metadata
 to disk), and then I think we create a WAL record. Creating the WAL
 record won't by itself write to disk... *unless* the wal_buffers are all
 already full.

I have a question here, we have synchronous_commit = off.  So when 
Postgres extends a page, would it do that just in memory, or does part 
of the extend operation require synchronous I/O?


 So if you also see an I/O spike when this happens you could well
 just be starved from the I/O system (though obviously it'd be
 better if we handled that situation more elegantly than this).

The SAR data shows no increase in pgpgin/s and pgpgout/s, which if I 
understand it correctly, means that there is no I/O spike.  There is 
however an enormous increase in CPU usage.


 I do suspect your pgfree/s is very high though; putting 200k pages/s on
 the free list seems like something's broken.

The system has a constant and considerable load of small writes.  The 
pg_activity tool shows 300 IOPS sustained (it claims a maximum of over 
11000 IOPS).  Postgres 9.3 had a comparable pgfree/s.
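For reference, the pgfree/s figure comes from sar's paging report; 
something like the following reproduces it with sysstat (the exact 
column set varies a little between versions):

  sar -B 1 5
  # pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff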


Would you know a good resource to get more knowledgeable about pgfree, 
pgpgin, pgsteal?


Kind regards,
Andomar


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Waiting on ExclusiveLock on extension

2015-04-19 Thread Jim Nasby

On 4/19/15 4:24 AM, Andomar wrote:

To put the top question first:

How can table extension locks explain a massive spike in CPU usage?

I can imagine 400 connections waiting on disk I/O, but then, wouldn't
they all be sleeping?


Not necessarily. Spinlocks don't put the process to sleep, but they're 
also supposed to be very short lived.



  Ok, that's a MAJOR hint, because relation 1249 is a system catalog;
  namely pg_attribute. So I think what's happening here is that your
  catalog has become horrifically bloated. I'm 99% certain that VACUUM ALL
  will not vacuum the catalog tables.
 
  Do you by chance have autovacuum turned off?
 
  A manual VACUUM VERBOSE pg_attribute might provide some immediate
relief.
 
Autovacuum is turned on.  In addition, we do a manual VACUUM ALL at
night.  VACUUM VERBOSE pg_attribute ran in 0 seconds and processed a few
hundred rows.

  Are you using a connection pool? Establishing 50 new database
  connections per second won't do anything to help performance...
 
As I understand it, a pool reduces network and CPU load.  We have never
seen any issues with those.  So the extra monitoring and maintenance
cost of a pool seems hard to justify.


Well, it sounds like you are CPU bound here... :P I don't know if this 
is related or not, but it wouldn't hurt. If you install PgBouncer on 
the database server itself (which it's designed for) it shouldn't add 
much maintenance cost.
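A minimal PgBouncer setup, purely as a sketch -- the paths, port and 
database name below are placeholders, not taken from this thread:

  ; /etc/pgbouncer/pgbouncer.ini
  [databases]
  mydb = host=127.0.0.1 port=5432 dbname=mydb

  [pgbouncer]
  listen_addr = 127.0.0.1
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  pool_mode = transaction
  default_pool_size = 20

Applications would then connect to port 6432 instead of 5432.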



  I think what that means is that there was suddenly a big spike in memory
  demand at the OS level, so now the OS is frantically dumping cached
  pages. That in itself won't explain this, but it may be a clue.
 
We monitor memory usage with Cacti.  It's a dedicated server and nearly
all memory is used as cache.  If a script runs and demands memory, that
becomes visible as cache is cleared out.  There is no change in the
amount of memory used as cache around the outage.

  In order to extend a relation we need to ask the filesystem to actually
  extend the file (which presumably means at least writing some metadata
  to disk), and then I think we create a WAL record. Creating the WAL
  record won't by itself write to disk... *unless* the wal_buffers are all
  already full.
 
I have a question here, we have synchronous_commit = off.  So when
Postgres extends a page, would it do that just in memory, or does part
of the extend operation require synchronous I/O?


Turning that off doesn't mean there will never be an fsync, it just 
means that we don't wait for one before returning from COMMIT. I don't 
think relation extension itself requires an fsync, but see below.
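For completeness, the settings in play here can be checked with an 
ordinary catalog query:

  SELECT name, setting, unit
    FROM pg_settings
   WHERE name IN ('synchronous_commit', 'wal_buffers', 'fsync');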



  So if you also see an I/O spike when this happens you could well
  just be starved from the I/O system (though obviously it'd be
  better if we handled that situation more elegantly than this).

The SAR data shows no increase in pgpgin/s and pgpgout/s, which if I
understand it correctly, means that there is no I/O spike.  There is
however an enormous increase in CPU usage.


I'm not familiar enough with SAR to know if that's correct or not; 
iostat would be a good way to confirm it.
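Something along these lines would confirm or rule out device 
saturation (plain sysstat iostat, nothing Postgres-specific):

  iostat -x 1 10
  # watch await and %util for the device holding the data directory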



  I do suspect your pgfree/s is very high though; putting 200k pages/s on
  the free list seems like something's broken.
 
The system has a constant and considerable load of small writes.  The
pg_activity tool shows 300 IOPS sustained (it claims a maximum of over
11000 IOPS).  Postgres 9.3 had a comparable pgfree/s.


That leads me to a new theory... you may be running into problems 
finding free buffers in the buffer pool. We need to have a buffer before 
we can extend a relation, and if you have a lot of pressure on shared 
buffers it can take quite a bit of CPU to find one. To make matters 
worse, that search for a buffer takes place while holding the extension 
lock.
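One way to get a feel for that pressure, if the contrib module is 
available, is pg_buffercache; a rough sketch:

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  -- distribution of buffer usage counts: many buffers stuck at the
  -- maximum (5) means the clock sweep works hard to find a victim
  SELECT usagecount, count(*)
    FROM pg_buffercache
   GROUP BY usagecount
   ORDER BY usagecount;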


Would you be able to get a stack trace of a backend that's holding an 
extension lock? Or maybe perf would provide some insight.
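A rough recipe, assuming gdb and perf are available on the box (the 
pg_locks query itself is standard):

  -- backends holding or waiting on a relation extension lock
  SELECT pid, relation::regclass, granted
    FROM pg_locks
   WHERE locktype = 'extend';

and then, against one of the reported PIDs on the server:

  gdb -p <pid> --batch -ex bt
  perf top -p <pid>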



Would you know a good resource to get more knowledgeable about pgfree,
pgpgin, pgsteal?


Unfortunately I'm not strong on the system tools.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Waiting on ExclusiveLock on extension

2015-04-19 Thread Jim Nasby

On 4/19/15 3:08 PM, Jim Nasby wrote:

  I do suspect your pgfree/s is very high though; putting 200k
pages/s on
  the free list seems like something's broken.
 
The system has a constant and considerable load of small writes.  The
pg_activity tool shows 300 IOPS sustained (it claims a maximum of over
11000 IOPS).  Postgres 9.3 had a comparable pgfree/s.


That leads me to a new theory... you may be running into problems
finding free buffers in the buffer pool. We need to have a buffer before
we can extend a relation, and if you have a lot of pressure on shared
buffers it can take quite a bit of CPU to find one. To make matters
worse, that search for a buffer takes place while holding the extension
lock.

Would you be able to get a stack trace of a backend that's holding an
extension lock? Or maybe perf would provide some insight.


BTW, 
http://postgresql.org/message-id/flat/20150329185619.ga29...@alap3.anarazel.de 
has some useful info about this.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with slow table update

2015-04-19 Thread Tim Uckun
On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov pawel.vese...@gmail.com
wrote:

 I found some dangling prepared transactions



How do you find and remove these?


Re: [GENERAL] [SQL] function to send email with query results

2015-04-19 Thread Anil Menon
Using Apache Camel:
LISTEN/NOTIFY -> PGEvent component (http://camel.apache.org/pgevent.html) ->
Mail component (http://camel.apache.org/mail.html).
You can then also handle all of the email server's exceptions easily.
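On the database side, the LISTEN/NOTIFY half that the PGEvent 
component hooks into is just a queue table plus a trigger; a minimal 
sketch, with made-up table, function and channel names:

  CREATE TABLE email_queue (
      id        serial PRIMARY KEY,
      recipient text NOT NULL,
      subject   text,
      body      text
  );

  CREATE FUNCTION email_queue_notify() RETURNS trigger AS $$
  BEGIN
      -- tell listeners which row to pick up
      PERFORM pg_notify('email_queue', NEW.id::text);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER email_queue_notify
      AFTER INSERT ON email_queue
      FOR EACH ROW EXECUTE PROCEDURE email_queue_notify();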



On Sun, Apr 19, 2015 at 6:42 AM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/18/15 12:52 AM, David G. Johnston wrote:

 On Friday, April 17, 2015, Suresh Raja suresh.raja...@gmail.com wrote:

 Hi all:

 I'm looking to write a function to send email with the result of a
 query.  Is it possible to send email from within a function?  Any help
 is appreciated.


 Yes...though neither the sql nor the plpgsql languages have the
 necessary language features to do so - you will need to use
 something like plperlu.

 An (not mutually exclusive) alternative is to create an email queue and
 write an email sending client application to process that queue.  The
 main advantages being loose coupling and the ability to send the emails
 asynchronously.


 If you go that route you might find LISTEN/NOTIFY useful:
 http://www.postgresql.org/docs/9.4/static/sql-listen.html
 --
 Jim Nasby, Data Architect, Blue Treble Consulting
 Data in Trouble? Get it in Treble! http://BlueTreble.com






Re: [GENERAL] ORDER BY for jsonb

2015-04-19 Thread Pai-Hung Chen
Thanks for the insight.

Pai-Hung


Re: [GENERAL] Help with slow table update

2015-04-19 Thread Jim Nasby

On 4/19/15 9:53 PM, Tim Uckun wrote:


On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov pawel.vese...@gmail.com wrote:

I found some dangling prepared transactions



How do you find and remove these?


SELECT * FROM pg_prepared_xacts;
ROLLBACK PREPARED 'gid';
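ROLLBACK PREPARED takes the gid string from the gid column of 
pg_prepared_xacts; one way to generate the statements in bulk, as a 
sketch:

  SELECT 'ROLLBACK PREPARED ' || quote_literal(gid) || ';'
    FROM pg_prepared_xacts;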
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Running pg_upgrade under Debian

2015-04-19 Thread Jeff Janes
On Sun, Apr 19, 2015 at 9:34 AM, rob stone floripa...@gmail.com wrote:




 On Fri, 2015-04-17 at 16:16 -0700, Adrian Klaver wrote:
  On 04/17/2015 03:09 PM, rob stone wrote:
   Hello,
  
   I'm trying to upgrade from 9.3 to 9.4 on my laptop and encountering
 this
   error:-
  
  
   postgres@roblaptop:/usr/lib/postgresql/9.4/bin$ ./pg_upgrade
 ...



 So, on the command line I'm putting listen_addresses=localhost, which
 pg_upgrade picks up, but the following option sets it to ''.
 It is starting pg_ctl from the 9.3/bin directory but puts the socket
 directory in 9.4/bin??


Note that 9.4/bin seems to be the current working directory from which you
are running the command.  That is probably not such a good idea.


 If pg_upgrade is actually reading the conf files it should not be
 necessary to supply any of these options, as port numbers, PID file
 locations, etc. are specified there.


If pg_upgrade used the values from the config files, then anyone who is
accustomed to connecting to the database with those settings could
unthinkingly connect to it during the upgrade process and screw up the
upgrade.  It intentionally chooses methods to prevent that from happening,
by putting the socket somewhere private, or at least unpredictable.

waiting for server to start
2596 5533cb8c.a24 2015-04-20 01:36:44 AEST LOG:  redirecting log output to logging collector process
2596 5533cb8c.a24 2015-04-20 01:36:44 AEST HINT:  Future log output will appear in directory "pg_log".


However, it appears that pg_upgrade does honor the logging-collector
related settings from the config file (should it?  I can see arguments both
ways), so you have to look in the pg_log directory to figure out
what the actual problem is.

Cheers,

Jeff


Re: [GENERAL] Running pg_upgrade under Debian

2015-04-19 Thread rob stone



On Fri, 2015-04-17 at 16:16 -0700, Adrian Klaver wrote:
 On 04/17/2015 03:09 PM, rob stone wrote:
  Hello,
 
  I'm trying to upgrade from 9.3 to 9.4 on my laptop and encountering this
  error:-
 
 
  postgres@roblaptop:/usr/lib/postgresql/9.4/bin$ ./pg_upgrade
  -b /usr/lib/postgresql/9.3/bin -B /usr/lib/postgresql/9.4/bin
  -d /home/postgres/data93/userqueries
  -D /home/postgres/data94/userqueries -U pguserqueries
 
  check for /home/postgres/data93/userqueries/base failed: No such file
  or directory
 
  Failure, exiting
  postgres@roblaptop:/usr/lib/postgresql/9.4/bin$
 
  postgres@roblaptop:/usr/lib/postgresql/9.4/bin$ ./pg_upgrade --version
  pg_upgrade (PostgreSQL) 9.4.1
 
 
  I have two tablespaces defined for this data -- one to hold the tables
  and another for the indices.
 
  There is no base file or directory.
 
 Guessing /home/postgres/data*/userqueries/ is the location of your 
 tablespace, not the cluster directory.
 

Correct.

 The -d and -D need to point at the cluster directory, which will be 
 something like :
 
 /var/lib/postgresql/9.3/main
 
 
  Do I just create a dummy directory named base?
 
 No, you need to point to the correct directory.
 
 See here for more information:
 
 http://www.postgresql.org/docs/9.4/interactive/pgupgrade.html
 

Actually, the wiki pages on upgrading gave me some more information,
particularly about specifying where to find postgresql.conf. I adopted
the symlink suggestion.
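For reference, on Debian the config lives under /etc, so the symlink 
amounts to something like this (paths assume the stock Debian layout):

  ln -s /etc/postgresql/9.3/main/postgresql.conf /var/lib/postgresql/9.3/main/postgresql.conf
  ln -s /etc/postgresql/9.4/main/postgresql.conf /var/lib/postgresql/9.4/main/postgresql.conf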

However, running the following still gives errors:-

postgres@roblaptop:/usr/lib/postgresql/9.4/bin$ ./pg_upgrade
-b /usr/lib/postgresql/9.3/bin -B /usr/lib/postgresql/9.4/bin
-d /var/lib/postgresql/9.3/main -D /var/lib/postgresql/9.4/main -p 5432
-P 5433 -o ' -c listen_addresses=localhost ' -O ' -c
listen_addresses=localhost ' --check --verbose
Running in verbose mode
Performing Consistency Checks
-
Checking cluster versions   ok
pg_control values:

First log segment after reset:   00010002
pg_control version number:   937
Catalog version number:   201306121
Database system identifier:   6026352641161422830
Latest checkpoint's TimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/684
Latest checkpoint's NextOID:  12036
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:674
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value
Data page checksum version:   0
Current pg_control values:

pg_control version number:   942
Catalog version number:   201409291
Database system identifier:   6108663141646003887
Latest checkpoint's TimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/719
Latest checkpoint's NextOID:  12142
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:675
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 2048
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value
Data page checksum version:   0


Values to be changed:

First log segment after reset:   00010002
/usr/lib/postgresql/9.3/bin/pg_ctl -w -l pg_upgrade_server.log -D
/var/lib/postgresql/9.3/main -o "-p 5432 -b -c
listen_addresses=localhost -c listen_addresses='' -c
unix_socket_permissions=0700 -c
unix_socket_directories='/usr/lib/postgresql/9.4/bin'" start >>
pg_upgrade_server.log 2>&1

*failure*
There were problems executing /usr/lib/postgresql/9.3/bin/pg_ctl -w
-l pg_upgrade_server.log -D /var/lib/postgresql/9.3/main -o -p 5432
-b  -c listen_addresses=localhost  -c listen_addresses='' -c
unix_socket_permissions=0700 -c