Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6
2017-03-09 18:28 GMT+01:00 Robert Haas <robertmh...@gmail.com>:
> On Thu, Mar 9, 2017 at 7:47 AM, Naytro Naytro <nay...@googlemail.com> wrote:
> > We are having some performance issues after we upgraded to the newest
> > version of PostgreSQL; before that, everything was fast and smooth.
> >
> > The upgrade was done with pg_upgrade from 9.4 directly to 9.6.1. We
> > have since upgraded to 9.6.2 with no improvement.
> >
> > Some information about our setup: FreeBSD, Solaris (SmartOS), simple
> > master-slave using streaming replication.
> >
> > Problem:
> > Very high system CPU when the master is streaming replication data;
> > CPU goes up to 77%. Only one process is generating this load: the
> > PostgreSQL startup process. When I attached truss to this process I
> > saw a lot of read calls with almost the same number of errors (EAGAIN).
> >
> > read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> >
> > Descriptor 6 is a pipe.
> >
> > The read call tries to read one byte over and over. I looked at the
> > source code and I think this file is responsible for the behavior:
> > src/backend/storage/ipc/latch.c. There was no such file in 9.4.
>
> Our latch implementation did get overhauled pretty thoroughly in 9.6;
> see primarily commit 98a64d0bd713cb89e61bef6432befc4b7b5da59e. But I
> can't guess what is going wrong here based on this information. It
> might help if you can pull some stack backtraces from the startup
> process.

Dtrace from Solaris: http://pastebin.com/u03uVKbr
Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6
2017-03-09 20:19 GMT+01:00 Andres Freund <and...@anarazel.de>:
> Hi,
>
> On 2017-03-09 13:47:35 +0100, Naytro Naytro wrote:
> > We are having some performance issues after we upgraded to the newest
> > version of PostgreSQL; before that, everything was fast and smooth.
> >
> > The upgrade was done with pg_upgrade from 9.4 directly to 9.6.1. We
> > have since upgraded to 9.6.2 with no improvement.
> >
> > Some information about our setup: FreeBSD, Solaris (SmartOS), simple
> > master-slave using streaming replication.
>
> Which node is on which of those, and where is the high load?

The high load is only on the slaves: FreeBSD (master + slave) and
Solaris (slaves only).

> > Problem:
> > Very high system CPU when the master is streaming replication data;
> > CPU goes up to 77%. Only one process is generating this load: the
> > PostgreSQL startup process. When I attached truss to this process I
> > saw a lot of read calls with almost the same number of errors (EAGAIN).
>
> Hm. Just to clarify: The load is on the *receiving* side, in the startup
> process? Because the load doesn't quite look that way...

Yes.

> > read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> >
> > Descriptor 6 is a pipe.
>
> That's presumably a latch's internal pipe. Could you redo that
> truss/strace with timestamps attached? Does truss show signals
> received? The above profile would e.g. make a lot more sense if not. Is
> the wal receiver sending signals?

Truss from Solaris: http://pastebin.com/WajedZ8Y and from FreeBSD:
http://pastebin.com/DB5iT8na (FreeBSD truss should show signals by
default). Dtrace from Solaris: http://pastebin.com/u03uVKbr

> > The read call tries to read one byte over and over. I looked at the
> > source code and I think this file is responsible for the behavior:
> > src/backend/storage/ipc/latch.c. There was no such file in 9.4.
>
> It was "just" moved (and expanded); it used to be at
> src/backend/port/unix_latch.c.
>
> There normally shouldn't be that much "latch traffic" in the startup
> process; we'd expect it to block from within WaitForWALToBecomeAvailable().
>
> Hm. Any chance you've configured a recovery_min_apply_delay? Although
> I'd expect more timestamp calls in that case.

No, we don't have that option configured.

> Greetings,
>
> Andres Freund
[HACKERS] Performance issue after upgrading from 9.4 to 9.6
Hi,

We are having some performance issues after we upgraded to the newest
version of PostgreSQL; before that, everything was fast and smooth.

The upgrade was done with pg_upgrade from 9.4 directly to 9.6.1. We have
since upgraded to 9.6.2 with no improvement.

Some information about our setup: FreeBSD, Solaris (SmartOS), simple
master-slave using streaming replication.

Problem:
Very high system CPU when the master is streaming replication data; CPU
goes up to 77%. Only one process is generating this load: the PostgreSQL
startup process. When I attached truss to this process I saw a lot of
read calls with almost the same number of errors (EAGAIN).

root@d8:~ # truss -c -p 38091
^C
syscall        seconds     calls   errors
semop      0.001611782       198        0
write      0.000074404         2        0
read       2.281535100     17266    12375
openat     0.000683532        48        0
lseek      0.177612479     20443        0
close      0.000395549        48        0
        -------------  --------  -------
           2.461912846     38005    12375

read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f10000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f10000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
lseek(778,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
lseek(818,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'

Descriptor 6 is a pipe.

The read call tries to read one byte over and over. I looked at the
source code and I think this file is responsible for the behavior:
src/backend/storage/ipc/latch.c. There was no such file in 9.4.