Re: [BUGS] BUG #8315: GRANTS allowed on extension functions, but not dumped by pg_dump

2013-07-18 Thread Jeff Frost

On Jul 18, 2013, at 11:47 AM, Tom Lane  wrote:

> j...@pgexperts.com writes:
>> permtest=# create extension dblink;
>> CREATE EXTENSION
>> permtest=# grant EXECUTE on FUNCTION dblink(text) to permtestuser;
>> GRANT
> 
> I see no bug here.  This is not different from any other
> property-alteration you might do on an extension member object.
> We allow that (if you have privileges), but it's up to you to keep it
> in sync with the extension definition file.

So, perhaps we could warn users when they alter properties of extension member objects?

Definitely seems like a gotcha.
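
In the meantime, something along these lines should list any non-default ACLs on
extension member functions so they can be re-granted by hand after a dump/restore
(a rough sketch, using dblink as the example extension):

-- list extension member functions that carry explicit grants
SELECT p.oid::regprocedure AS function, p.proacl
FROM pg_depend d
JOIN pg_proc p ON p.oid = d.objid
JOIN pg_extension e ON e.oid = d.refobjid
WHERE d.classid = 'pg_proc'::regclass
  AND d.refclassid = 'pg_extension'::regclass
  AND d.deptype = 'e'
  AND e.extname = 'dblink'
  AND p.proacl IS NOT NULL;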




-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #8225: logging options don't change after reload

2013-06-14 Thread Jeff Frost
On 06/13/13 20:44, Jeff Frost wrote:
> These are definitely busy systems, but usually not running close to the edge. 
> I'll see if I can make a test case using pgbench on 9.2.4.


I'm afraid my attempts to reproduce were again unsuccessful.


:-(


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #8225: logging options don't change after reload

2013-06-13 Thread Jeff Frost

On Jun 13, 2013, at 7:48 PM, Tom Lane  wrote:

> Jeff Frost  writes:
>> What I don't understand is the new log file being created from the new
>> log_filename setting but then nothing being logged into it.  Is it the
>> postmaster which creates that file?  I would've thought it would be
>> the logger process?
> 
> Hm, I hadn't focused on that --- that *is* pretty bizarre.  The
> postmaster creates the log file initially before forking the syslogger,
> but subsequent rotations are handled by the syslogger process.
> 
> Is it possible that your systems are running on the hairy edge of ENFILE
> limits?  I notice that the syslogger will silently fail to rotate if it
> gets ENFILE while trying to open the new log file.  That doesn't look
> like it'd explain the lack of log_checkpoint activity, though.  Also,
> usually people notice this state because everything else on the box
> starts to fall over ...

These are definitely busy systems, but usually not running close to the edge. 
I'll see if I can make a test case using pgbench on 9.2.4.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #8225: logging options don't change after reload

2013-06-13 Thread Jeff Frost

On Jun 13, 2013, at 5:16 PM, Tom Lane  wrote:

> Jeff Frost  writes:
>> On Jun 13, 2013, at 4:50 PM, Tom Lane  wrote:
>>> ...  So one theory about this would be that those processes
>>> aren't absorbing the GUC updates, perhaps because the SIGHUP signals the
>>> postmaster should be sending them are getting lost.
> 
>> Interestingly, it will often pick them up if you wait a few seconds and send 
>> it another reload.
> 
> Hmm, that definitely lends some credence to the lost-signal theory,
> since another reload would cause the postmaster to again signal all
> its children, and this time the signal might go through.
> 
> But I still have no idea how we might debug further.  You could possibly
> try something like strace'ing the processes, but it seems fairly likely
> that the Heisenberg principle would apply if you did.

What I don't understand is the new log file being created from the new 
log_filename setting but then nothing being logged into it.  Is it the 
postmaster which creates that file?  I would've thought it would be the logger 
process?

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #8225: logging options don't change after reload

2013-06-13 Thread Jeff Frost

On Jun 13, 2013, at 4:50 PM, Tom Lane  wrote:

> j...@pgexperts.com writes:
>> What happens is that we change various logging options in postgresql.conf,
>> then reload, and every so often, the settings don't seem to take effect even
>> though they are logged as being changed.
> 
> FWIW, the "parameter changed" messages are logged when the postmaster
> process updates its values of the GUCs.  The particular options you're
> complaining of here, though, are not actually checked in the postmaster
> --- they're used in the checkpointer or syslogger processes
> respectively.  So one theory about this would be that those processes
> aren't absorbing the GUC updates, perhaps because the SIGHUP signals the
> postmaster should be sending them are getting lost.  I'm not sure how we
> might track down the cause though.  How "various" are the platforms
> you're seeing this on?

I've seen it on recent 9.0, 9.1 and 9.2 versions running on Ubuntu 10.04/12.04, 
CentOS 5/6 and Scientific Linux 6.

I've not tried on Windows.

Interestingly, it will often pick them up if you wait a few seconds and send it 
another reload.

I've been seeing it for a while, but haven't reported it since I couldn't come 
up with a reproducible test case.
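
One quick sanity check after a reload is something like this (a sketch; note that
pg_settings only reflects the connected backend, so it confirms the file parsed
correctly but says nothing about whether the checkpointer or syslogger actually
absorbed the change):

-- force a reload, then inspect the values this backend picked up
SELECT pg_reload_conf();

SELECT name, setting, source, sourcefile, sourceline
FROM pg_settings
WHERE name IN ('log_checkpoints', 'log_filename');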

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #8225: logging options don't change after reload

2013-06-11 Thread Jeff Frost
On 06/11/13 15:27, j...@pgexperts.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:  8225
> Logged by:      Jeff Frost
> Email address:  j...@pgexperts.com
> PostgreSQL version: 9.1.8
> Operating system:   various
> Description:
>
> I've seen this a few times on client servers but still can't seem to boil it
> down to a nice reproducible test case.  
>
> What happens is that we change various logging options in postgresql.conf,
> then reload, and every so often, the settings don't seem to take effect even
> though they are logged as being changed.
>
> Here's an example from a recent logging run:
>
> grep -hi checkpoint *.csv
> 2013-05-31 12:58:33.051 MDT,,,5398,,5189c940.1516,8,,2013-05-07 21:40:48
> MDT,,0,LOG,0,"parameter ""log_checkpoints"" changed to
> ""on""",""
> 2013-05-31 15:18:36.131 MDT,,,5398,,5189c940.1516,22,,2013-05-07 21:40:48
> MDT,,0,LOG,0,"parameter ""log_checkpoints"" removed from configuration
> file, reset to default",""
>
> So, note that log_checkpoints was logged as being changed, then 2hrs 20mins
> later it's logged as being set back to default, but there's not a single
> checkpoint logged in that time period.  I checked and checkpoint_timeout is
> indeed set to 300s.  
>
> In addition, we change the log_filename to datacollection-%H and while
> that also shows up in the log during the reload:
>
> 2013-05-31 12:58:33.051 MDT,,,5398,,5189c940.1516,9,,2013-05-07 21:40:48
> MDT,,0,LOG,0,"parameter ""log_filename"" changed to
> ""datacollection-%H""",""
>
> and the new logfile is created, PostgreSQL continues to log to the old filename
> for the duration of the logging run.
>
> I've seen this on recent 9.0, 9.1 and 9.2 versions.  This particular run was
> on 9.1.8.
>
>
>

Oh, sorry, that should specify "various Linux versions", meaning: Ubuntu 12.04,
Ubuntu 10.04, CentOS 5/6, Scientific Linux 6.

-- 
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues

2013-02-26 Thread Jeff Frost
On 02/26/13 09:08, Jeff Janes wrote:
>
> They're all xlog-based (triggered by the number of xlog files) and the first one
> doesn't remove any; that's not surprising,
>
>
> What was the state of the pg_xlog directory before you started the test?  If
> it was not full up to checkpoint_segments * 2 (which it likely was not,
> since populating pgbench_accounts avoids WAL when not archiving), then I
> agree it is not surprising.

It always starts at around 258 files.

The total WAL volume during the 10min pgbench run, as evidenced by enabling an
archive command that just touches the filename, is 1,972 WAL files archived, or
31G.  So it is cleaning up or recycling about 9G; unfortunately, that's just
2G too few for a 20G filesystem.
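
For reference, the docs' steady-state ceiling for pg_xlog in this era is roughly
(2 + checkpoint_completion_target) * checkpoint_segments + 1 segments, or
checkpoint_segments + wal_keep_segments + 1 if that is larger.  A sketch of the
arithmetic for the running config (assumes the default 16MB segment size):

-- expected pg_xlog ceiling under normal operation, per the formula above
SELECT pg_size_pretty(
         (((2 + current_setting('checkpoint_completion_target')::numeric)
            * current_setting('checkpoint_segments')::int + 1)
          * 16 * 1024 * 1024)::bigint
       ) AS expected_pg_xlog_ceiling;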



-- 
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 



Re: [BUGS] BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues

2013-02-24 Thread Jeff Frost

On Feb 24, 2013, at 1:05 PM, Jeff Frost  wrote:

> 
> On Feb 24, 2013, at 7:16 AM, Rafael Martinez Guerrero 
>  wrote:
> 
>> We reported this back in 2011, but we did not get to any conclusion:
>> http://www.postgresql.org/message-id/4de89072.7070...@usit.uio.no
>> 
>> In our case, we had this problem when creating a GIN index.
>> 
>> I think the problem has something to do with checkpoints: I think the
>> number of WAL files will grow beyond the defined limit because the
>> system cannot finish its checkpoints. A good start toward explaining
>> what is happening would be to try to identify or describe the
>> situations where checkpoints can take very long to complete or fail
>> altogether.
>> 
>> We are interested in this subject because we have had this problem a few
>> times. But we have not been able to identify the reason that triggers this.
> 
> Rafael, did you do a followup post on pgsql-performance, or did the thread 
> die at the end of the one you posted above?

This is how the log_checkpoint output looks during the run:

2013-02-24 21:06:31.156 UTC,,,1624,,51282598.658,114,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint starting: immediate force wait",""
2013-02-24 21:06:31.216 UTC,,,1624,,51282598.658,115,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint complete: wrote 108 buffers (0.0%); 0 transaction 
log file(s) added, 0 removed, 4 recycled; write=0.054 s, sync=0.002 s, 
total=0.059 s; sync files=10, longest=0.000 s, average=0.000 s",""

So, this ^ is the manual checkpoint prior to the pgbench run and everything 
else is part of the pgbench run.

2013-02-24 21:06:45.227 UTC,,,1624,,51282598.658,116,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoints are occurring too frequently (14 seconds 
apart)",,"Consider increasing the configuration parameter 
""checkpoint_segments"".",,,""
2013-02-24 21:06:45.227 UTC,,,1624,,51282598.658,117,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint starting: xlog",""
2013-02-24 21:08:35.239 UTC,,,1624,,51282598.658,118,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint complete: wrote 266935 buffers (12.7%); 0 
transaction log file(s) added, 0 removed, 1 recycled; write=10.889 s, 
sync=99.079 s, total=110.011 s; sync files=29, longest=12.888 s, average=3.416 
s",""
2013-02-24 21:08:35.241 UTC,,,1624,,51282598.658,119,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint starting: xlog",""
2013-02-24 21:11:19.506 UTC,,,1624,,51282598.658,120,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint complete: wrote 801517 buffers (38.2%); 0 
transaction log file(s) added, 0 removed, 128 recycled; write=15.484 s, 
sync=148.739 s, total=164.266 s; sync files=32, longest=11.773 s, average=4.648 
s",""
2013-02-24 21:11:19.506 UTC,,,1624,,51282598.658,121,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint starting: xlog",""
2013-02-24 21:14:20.486 UTC,,,1624,,51282598.658,122,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint complete: wrote 1009466 buffers (48.1%); 0 
transaction log file(s) added, 140 removed, 257 recycled; write=13.690 s, 
sync=167.245 s, total=180.980 s; sync files=33, longest=14.442 s, average=5.067 
s",""
2013-02-24 21:14:20.487 UTC,,,1624,,51282598.658,123,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint starting: xlog",""
2013-02-24 21:17:31.507 UTC,,,1624,,51282598.658,124,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint complete: wrote 1055059 buffers (50.3%); 0 
transaction log file(s) added, 259 removed, 257 recycled; write=22.731 s, 
sync=168.239 s, total=191.020 s; sync files=32, longest=13.650 s, average=5.257 
s",""
2013-02-24 21:17:31.507 UTC,,,1624,,51282598.658,125,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint starting: xlog",""
2013-02-24 21:23:36.201 UTC,,,1624,,51282598.658,126,,2013-02-23 02:12:40 
UTC,,0,LOG,0,"checkpoint complete: wrote 814767 buffers (38.9%); 0 
transaction log file(s) added, 546 removed, 0 recycled; write=269.553 s, 
sync=95.042 s, total=364.693 s; sync files=31, longest=11.211 s, average=3.065 
s",""

They're all xlog-based (triggered by the number of xlog files) and the first one 
doesn't remove any; that's not surprising, but I'm surprised that the second one 
doesn't remove any either.  It does recycle quite a few.  I guess the issue is that 
the workload can generate WAL files way faster than the server can checkpoint and 
clean up.  I think I need to write a script to output the number of WAL files each 
minute so we can better correlate what's going on.
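
A sketch of one way to sample that from inside the database (pg_ls_dir requires
superuser); run it every minute from cron alongside the pgbench run:

-- count the WAL segment files currently in pg_xlog; the pattern skips
-- archive_status and anything else that isn't a 24-hex-digit segment name
SELECT now() AS sampled_at, count(*) AS wal_files
FROM pg_ls_dir('pg_xlog') AS f
WHERE f ~ '^[0-9A-F]{24}$';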

Re: [BUGS] BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues

2013-02-24 Thread Jeff Frost

On Feb 24, 2013, at 7:16 AM, Rafael Martinez Guerrero 
 wrote:

> We reported this back in 2011, but we did not get to any conclusion:
> http://www.postgresql.org/message-id/4de89072.7070...@usit.uio.no
> 
> In our case, we had this problem when creating a GIN index.
> 
> I think the problem has something to do with checkpoints: I think the
> number of WAL files will grow beyond the defined limit because the
> system cannot finish its checkpoints. A good start toward explaining
> what is happening would be to try to identify or describe the
> situations where checkpoints can take very long to complete or fail
> altogether.
> 
> We are interested in this subject because we have had this problem a few
> times. But we have not been able to identify the reason that triggers this.

Rafael, did you do a followup post on pgsql-performance, or did the thread die 
at the end of the one you posted above?


---
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 








Re: [BUGS] BUG #6704: ALTER EXTENSION postgis SET SCHEMA leaves dangling relations

2012-06-22 Thread Jeff Frost
On Jun 22, 2012, at 7:37 PM, Tom Lane  wrote:

> j...@pgexperts.com writes:
>> DROP and CREATE extension appear to work fine, but if you ALTER EXTENSION
>> postgis SET SCHEMA foo, it leaves a few relations behind.
> 
> What it seems to be leaving behind is indexes ... also relation rowtypes.
> 
> A bit of looking shows that ALTER EXTENSION SET SCHEMA calls
> AlterObjectNamespace_oid on the table.  AlterObjectNamespace_oid
> calls AlterRelationNamespaceInternal, and nothing else.  In comparison,
> ALTER TABLE SET SCHEMA (AlterTableNamespace) calls
> AlterRelationNamespaceInternal and about four other things.  I'm not
> sure if this was broken before the last round of refactoring in this
> area, but for sure it's broken now.
> 
>regards, tom lane

Forgot to mention: I initially saw this on 9.1.2 and tested 9.1.4 to see if it 
was resolved, but both exhibit the same behavior.
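
In case it helps anyone else hitting this, a query along these lines shows what
got left behind in the old schema (a sketch; assumes the extension previously
lived in public):

-- relations still sitting in the old schema after ALTER EXTENSION ... SET SCHEMA;
-- per Tom's analysis, expect indexes here, and the tables' rowtypes can be
-- checked the same way against pg_type.typnamespace
SELECT c.relkind, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'   -- the old schema; adjust as needed
ORDER BY c.relkind, c.relname;
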
-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-05-24 Thread Jeff Frost

On May 24, 2012, at 3:35 PM, Tom Lane wrote:

> Jeff Frost  writes:
>> On May 24, 2012, at 3:13 PM, Tom Lane wrote:
>>> Huh.  A bit bigger, but not by that much.  It doesn't seem like this
>>> would be enough to make seqscan performance fall off a cliff, as it
>>> apparently did.  Unless maybe the slightly-bloated catalogs were just a
>>> bit too large to fit in RAM, and so the repeated seqscans went from
>>> finding all their data in kernel disk buffers to finding none of it?
> 
>> Seems unlikely.
>> Server has 128GB of RAM.
> 
> Hm ... sure seems like that ought to be enough ;-), even with a near-2GB
> pg_attribute table.  What's your shared_buffers setting?

It's 8GB.

> 
>> BTW, when I connected to it this time, it took a really long time before my 
>> psql was able to send a query, so it seems to still be broken at least.
> 
> Yeah, I was afraid that re-initdb would be at best a temporary solution.

Oh, sorry, I wasn't clear on that.  The currently running system is still 
happy, but the old data directory got stuck in 'startup' for a while when I 
connected via psql.

> 
> It would probably be useful to confirm the theory that relcache rebuild
> is what makes it stall.  You should be able to manually remove the
> pg_internal.init file in the database's directory; then connect with
> psql, and time how long it takes before the pg_internal.init file
> reappears.


So, you're thinking autovac invalidates the cache and causes it to be rebuilt, 
and then a bunch of new connections get stalled as they all wait for the rebuild?

I'll see if I can get the customer to move the data directory to a test system 
so I can futz with it on a non-production system.
-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-05-24 Thread Jeff Frost

On May 24, 2012, at 3:13 PM, Tom Lane wrote:

> Jeff Frost  writes:
>> On 05/24/12 12:21, Tom Lane wrote:
> 
> Huh.  A bit bigger, but not by that much.  It doesn't seem like this
> would be enough to make seqscan performance fall off a cliff, as it
> apparently did.  Unless maybe the slightly-bloated catalogs were just a
> bit too large to fit in RAM, and so the repeated seqscans went from
> finding all their data in kernel disk buffers to finding none of it?

Seems unlikely.

> 
>> So, interestingly, they're both quite large, but the old broken system is
>> quite a bit larger.  Would any other data points be helpful?
> 
> I think it would be interesting to get the pg_relation_size for pg_class
> plus pg_attribute plus pg_index (which I think is everything that gets
> seqscanned in this way) on both systems, and see how those numbers
> match up to total RAM on the box.

Server has 128GB of RAM.

Currently running system:

select pg_size_pretty(pg_relation_size('pg_class'));
 pg_size_pretty 
----------------
 181 MB
(1 row)

select pg_size_pretty(pg_relation_size('pg_index'));
 pg_size_pretty 
----------------
 23 MB
(1 row)


Old broken system:

select pg_size_pretty(pg_relation_size('pg_class'));
 pg_size_pretty 
----------------
 221 MB
(1 row)

select pg_size_pretty(pg_relation_size('pg_index'));
 pg_size_pretty 
----------------
 44 MB
(1 row)

BTW, when I connected to it this time, it took a really long time before my psql 
was able to send a query, so it seems to still be broken at least.
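
The combined figure Tom asked about can be had in one query (a sketch;
pg_attribute was shown earlier in the thread):

-- total size of the catalogs that get seqscanned while rebuilding the relcache
SELECT pg_size_pretty( pg_relation_size('pg_catalog.pg_class')
                     + pg_relation_size('pg_catalog.pg_attribute')
                     + pg_relation_size('pg_catalog.pg_index') ) AS seqscanned_catalogs;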



Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-05-24 Thread Jeff Frost
On 05/24/12 12:21, Tom Lane wrote:
> Jeff Frost  writes:
>> A few times today, we've seen postgresql 9.1.3 backends on Ubuntu 11.10 
>> x86_64
>> get stuck in 'startup' mode.  By that I mean the set_ps_output mode. Postgres
>> is installed via Martin Pitt's packages.
> I took another look at your original report here, and noticed something
> that had escaped me the first time: the backtrace shows that what that
> backend is doing is loading critical relcache entries the hard way, that
> is, via seqscans on the system catalogs.  This happens when the relcache
> cache file (pg_internal.init) is missing or stale.  A backend that has
> to go through that will write a new pg_internal.init file when it's
> done; but if many sessions are connecting at the same time, they'll each
> have to do it until someone succeeds in creating a new cache file.
>
> Another piece of evidence is here:
>
>> #5  0x7f7dd3d91ca4 in ReadBufferExtended (reln=0x7f7dd3b17540,
>> forkNum=MAIN_FORKNUM, blockNum=201362, mode=RBM_NORMAL, strategy=> out>) at bufmgr.c:246
> It's possible that gdb is lying to us, but it sure looks like this read
> request is for block 201362 of pg_attribute (I know it's pg_attribute
> from the location of the heap_getnext call).  That's 1.5Gb into the
> relation, and who knows if it's anywhere near the end yet.
>
> So in other words, we've got a boatload of backends all seqscanning a
> pretty darn large relation, and doing so multiple times.  That's
> probably enough to explain why it took a long time to finish
> initializing; the locking business is likely a red herring.
>
> How big is pg_attribute normally in this database?  I'm suspicious that
> what triggered this is some major bloating of pg_attribute (and maybe
> some of the other catalogs too).  Once you had that, relcache cache file
> invalidations are a fairly routine occurrence --- I think an autovacuum
> on one of the system catalogs would trigger that, for example.
>
>   regards, tom lane

Interesting!  Fortunately, that data directory is still lying around even
though we initdb'd to get a working system.

So, the current working system's pg_attribute looks like:

select pg_size_pretty(pg_relation_size('pg_catalog.pg_attribute'));
 pg_size_pretty
----------------
 1671 MB
(1 row)

select pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
 pg_size_pretty
----------------
 2439 MB


and on the old broken system:

select pg_size_pretty(pg_relation_size('pg_catalog.pg_attribute'));
 pg_size_pretty
----------------
 1858 MB
(1 row)

select pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));
 pg_size_pretty
----------------
 1907 MB
(1 row)

So, interestingly, they're both quite large, but the old broken system is
quite a bit larger.  Would any other data points be helpful?
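
If catalog bloat is the culprit, the stats views give a rough picture of how far
ahead of autovacuum the churn is getting (a sketch; the timestamps will be NULL
if autovacuum never touched a table):

-- live vs dead tuples and last autovacuum for the catalogs of interest
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_sys_tables
WHERE relname IN ('pg_class', 'pg_attribute', 'pg_index', 'pg_type')
ORDER BY relname;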

-- 
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-30 Thread Jeff Frost
On 04/28/12 17:17, Jeff Frost wrote:
> Since I had a theory that it's probably stalling on pg_catalog access, one of 
> the guys wrote a test harness that makes several connections and creates and 
> drops lots of temp tables. That does seem to allow us to reproduce the issue 
> in the production DB, but not in a newly created DB.  So, we're about to 
> initdb a new data directory and slony replicate the data over to it on the 
> original server.

After slony replicating over to the freshly initdb'd original server, the
problem has not recurred.  Maybe the problem was catalog corruption?


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-28 Thread Jeff Frost

On Apr 27, 2012, at 6:15 PM, Tom Lane wrote:

> Jeff Frost  writes:
>> Oh, good idea!  Looks like pg_buffercache is installed in this DB.  Customer
>> reports that it has been installed since the server has existed (and on the
>> previous server) but is not currently being used, though the issue with the
>> hanging startups did not start until this morning. Could it still cause
>> contention even if it's not being executed?
> 
> No, it shouldn't.  Perhaps we should start asking questions around the
> idea of "so what changed this morning?".  There has to have been some
> triggering cause, a configuration change or application change or
> something.


Since I had a theory that it's probably stalling on pg_catalog access, one of 
the guys wrote a test harness that makes several connections and creates and 
drops lots of temp tables. That does seem to allow us to reproduce the issue in 
the production DB, but not in a newly created DB.  So, we're about to initdb a 
new data directory and slony replicate the data over to it on the original 
server.
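
The harness boils down to a loop like this run from several concurrent
connections (my sketch, not the actual code; the table name is made up).  Each
create/drop cycle leaves dead rows behind in pg_class, pg_attribute and pg_type,
which is exactly the kind of catalog churn being discussed:

DO $$
BEGIN
  FOR i IN 1..1000 LOOP
    -- create and immediately drop a temp table; each cycle churns the catalogs
    EXECUTE 'CREATE TEMP TABLE churn_tmp (id int, payload text)';
    EXECUTE 'DROP TABLE churn_tmp';
  END LOOP;
END
$$;
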
-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost

On Apr 27, 2012, at 6:34 PM, Jeff Frost wrote:

> On 04/27/12 18:27, Jeff Frost wrote:
>> To make it more interesting, today is a
>> slow day.
> 
> And since it's a slow day... one more question: is there any further logging we
> could do to help find the culprit?




FYI, reindexed the system tables and reduced shared_buffers to 2GB.  Still 
seeing the issue with regularity.


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 18:27, Jeff Frost wrote:
> To make it more interesting, today is a
> slow day.

And since it's a slow day... one more question: is there any further logging we
could do to help find the culprit?

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 18:15, Tom Lane wrote:
> No, it shouldn't.  Perhaps we should start asking questions around the
> idea of "so what changed this morning?".  There has to have been some
> triggering cause, a configuration change or application change or
> something.
Oh, we've been asking those questions all day long!

No packages were installed, postgresql.conf last edited on 3/29/2012.  Last
application update 4/24/2012.  There were some DDL changes, but those were
reverted and the problem persists.  To make it more interesting, today is a
slow day.

And we just saw the issue recur with pg_buffercache uninstalled. :-/

-- 
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 17:45, Jeff Frost wrote:
> Oh, good idea!  Looks like pg_buffercache is installed in this DB.  Customer
> reports that it has been installed since the server has existed (and on the
> previous server) but is not currently being used, though the issue with the
> hanging startups did not start until this morning. Could it still cause
> contention even if it's not being executed?

BTW, just to be safe, it is now removed.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 17:30, Tom Lane wrote:
> Jeff Frost  writes:
>> and I've got 81 more that do not contain bufmgr.c and are also not block on
>> LWLockAcquire.
> Hm ... no smoking gun in what you showed so far.  I also took another
> look through 9.1 bufmgr.c, and I'm darned if I can see any code path
> there that holds a buffer mapping lock for any long interval.
>
> One possible theory is that you're using pg_buffercache_pages(), which
> does take all those locks.  It tries its best to not hold them long,
> but with a sufficiently large buffer cache there could be an issue.
> (What is the shared_buffers setting on this installation, anyway?)
>

Tom, what does the startup code path read?  The list of relations or
something?  Just wondering what the contention would be.

Oh, good idea!  Looks like pg_buffercache is installed in this DB.  Customer
reports that it has been installed since the server has existed (and on the
previous server) but is not currently being used, though the issue with the
hanging startups did not start until this morning. Could it still cause
contention even if it's not being executed?

shared_buffers is 8GB on this machine which has 128GB of RAM.

It seems the pooling software that's being used has a max connection lifetime
of 1hr which is what makes this so painful.
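
For context on why the size matters: pg_buffercache_pages() walks every buffer
header while taking the buffer mapping partition locks (per Tom's note above),
and pg_settings shows how many headers that is, since shared_buffers is stored
in 8kB blocks (8GB works out to 1048576 buffers):

-- number of shared buffers a single pg_buffercache scan has to walk
SELECT name, setting AS buffers, unit
FROM pg_settings
WHERE name = 'shared_buffers';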



-- 
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
PinBuffer (buf=0x7f60ac8748f0, strategy=0x0)
at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:1060
1060   
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:
No such file or directory.
in
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c
#0  PinBuffer (buf=0x7f60ac8748f0, strategy=0x0) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:1060
#1  0x7f62b81808d2 in BufferAlloc (smgr=0x7f62b961be00,
relpersistence=, forkNum=,
blockNum=, mode=, strategy=, hit=0x7fff9045d98f "") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:742
#2  ReadBuffer_common (smgr=0x7f62b961be00, relpersistence=, forkNum=, blockNum=,
mode=, strategy=, hit=0x7fff9045d98f
"") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:325
#3  0x7f62b81811e4 in ReadBufferExtended (reln=0x7f62b9787d78,
forkNum=MAIN_FORKNUM, blockNum=865, mode=RBM_NORMAL, strategy=0xa01962db) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:246
#4  0x7f62b7fc97bf in heapgetpage (scan=0x7f62b9874730, page=865) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:223
#5  0x7f62b7fc9cea in heapgettup_pagemode (scan=0x7f62b9874730, dir=, nkeys=, key=0x0) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:817
#6  0x7f62b7fca786 in heap_getnext (scan=0x7f62b9874730,
direction=NoMovementScanDirection) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:1342
#7  0x7f62b80e7f8b in SeqNext (node=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/nodeSeqscan.c:66
#8  0x7f62b80d769a in ExecScanFetch (node=0x7f62b9873800, accessMtd=, recheckMtd=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/execScan.c:82
#9  ExecScan (node=0x7f62b9873800, accessMtd=,
recheckMtd=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/execScan.c:167
#10 0x7f62b80d0078 in ExecProcNode (node=0x7f62b9873800) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/execProcnode.c:394
#11 0x7f62b80e8a69 in ExecSort (node=0x7f62b9873560) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/nodeSort.c:103
#12 0x7f62b80cff98 in ExecProcNode (node=0x7f62b9873560) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/execProcnode.c:458
#13 0x7f62b80ceed2 in ExecutePlan (queryDesc=0x7f62b9871970,
direction=NoMovementScanDirection, count=0) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/execMain.c:1439
#14 standard_ExecutorRun (queryDesc=0x7f62b9871970,
direction=NoMovementScanDirection, count=0) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/executor/execMain.c:313
#15 0x7f62b81a45c7 in PortalRunSelect (portal=0x7f62b94afdf0,
forward=, count=0, dest=0x7f62b9543080) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/tcop/pquery.c:943
#16 0x7f62b81a5a30 in PortalRun (portal=,
count=, isTopLevel=, dest=, altdest=, completionTag=) at 
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/tcop/pquery.c:787
#17 0x7f62b81a310c in exec_execute_message (argc=,
argv=, username=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/tcop/postgres.c:1963
#18 PostgresMain (argc=, argv=,
username=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/tcop/postgres.c:3983
#19 0x7f62b8161ad3 in BackendRun () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:3606
#20 BackendStartup () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:3291
#21 ServerLoop () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:1455
#22 0x7f62b816457c in PostmasterMain (argc=-1186270896,
argv=0x7f62b9471190) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:1116
#23 0x7f62b8102ec3 in main (argc=5, argv=0x7f62b9471170) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/main/main.c:199

-- 
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
> I think you can probably skip all that are blocked in LWLockAcquire
> called from bufmgr.c:531, at least for a first pass.  Calls from
> elsewhere in bufmgr.c might be more interesting, and anything that's not
> blocked at an LWLockAcquire at all might be even more interesting.

A few more that include bufmgr.c:



#0  0x7f62b6150583 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:82
#1  0x7f62b82948ca in pg_usleep (microsec=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/port/pgsleep.c:43
#2  0x7f62b8159678 in CheckpointWriteDelay (flags=,
progress=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/bgwriter.c:722
#3  0x7f62b81806ab in BufferSync (flags=64) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:1283
#4  CheckPointBuffers (flags=64) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:1764
#5  0x7f62b7fea0e3 in CheckPointGuts (checkPointRedo=..., flags=64) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/transam/xlog.c:7989
#6  0x7f62b7ff5a3c in CreateCheckPoint (flags=64) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/transam/xlog.c:7797
#7  0x7f62b8159a46 in BackgroundWriterMain () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/bgwriter.c:502
#8  0x7f62b80115c2 in AuxiliaryProcessMain (argc=,
argv=0x7fff9045db00) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/bootstrap/bootstrap.c:422
#9  0x7f62b815fe8b in StartChildProcess (type=BgWriterProcess) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:4504
#10 0x7f62b8162ec2 in reaper (postgres_signal_arg=)
at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:2424
#11 
#12 0x7f62b6150583 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:82
#13 0x7f62b81614d3 in ServerLoop () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:1415
#14 0x7f62b816457c in PostmasterMain (argc=-1186270896,
argv=0x7f62b9471190) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:1116
#15 0x7f62b8102ec3 in main (argc=5, argv=0x7f62b9471170) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/main/main.c:199

#0  0x7f62b6149a50 in __read_nocancel () at
../sysdeps/unix/syscall-template.S:82
#1  0x7f62b8183ced in read (file=,
buffer=0x7f613ff51fa0 ":\n", amount=) at
/usr/include/bits/unistd.h:45
#2  FileRead (file=, buffer=0x7f613ff51fa0 ":\n",
amount=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/file/fd.c:1191
#3  0x7f62b819c7a6 in mdread (reln=, forknum=, blocknum=39946, buffer=0x7f613ff51fa0 ":\n") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/smgr/md.c:640
#4  0x7f62b81807f1 in ReadBuffer_common (smgr=0x7f62b955d090,
relpersistence=, forkNum=,
blockNum=, mode=, strategy=, hit=0x7fff9045d5cf "") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:438
#5  0x7f62b81811e4 in ReadBufferExtended (reln=0x7f62b7eef540,
forkNum=MAIN_FORKNUM, blockNum=39946, mode=RBM_NORMAL, strategy=0x4ff74188) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:246
#6  0x7f62b7fc97bf in heapgetpage (scan=0x7f62b9558350, page=39946) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:223
#7  0x7f62b7fca1ba in heapgettup (scan=0x7f62b9558350, dir=, nkeys=, key=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:556
#8  0x7f62b7fca7bc in heap_getnext (scan=0x7f62b9558350,
direction=1073029024) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:1345
#9  0x7f62b8259570 in RelationBuildTupleDesc (targetRelId=, insertIt=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:468
#10 RelationBuildDesc (targetRelId=, insertIt=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:882
#11 0x7f62b825af26 in RelationIdGetRelation (relationId=2601) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:1568
#12 0x7f62b7fcdaab in relation_open (relationId=2601, lockmode=1073029024)
at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:907
#13 0x7f62b7fcdb33 in heap_open (relationId=8, lockmode=1073029024) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:1077
#14 0x7f62b8252d82 in SearchCatCache (cache=0x7f62b94c9e70, v1=403,
v2=, v3=0, v4=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/catcache.c:1184
#15 0x7f62b8258a30 in RelationInitIndexAccessInfo
(relation=0x7f62b7efa828) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:1010
#16 0x7f62b8259d91 in RelationBuildDesc (targetRelId=, insertIt=) at

Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 12:27, Tom Lane wrote:
> Jeff Frost  writes:
>> Any idea what I should be looking for in the backtraces?
>> I would imagine I can ignore any that are in InitPostgres, but that still
>> leaves quite a few to look through.
> I think you can probably skip all that are blocked in LWLockAcquire
> called from bufmgr.c:531, at least for a first pass.  Calls from
> elsewhere in bufmgr.c might be more interesting, and anything that's not
> blocked at an LWLockAcquire at all might be even more interesting.
>
>   regards, tom lane

Here we go.  One that's not blocked on acquiring the lock, but does have
bufmgr.c in it:

#0  0x7f252f601a50 in __read_nocancel () at
../sysdeps/unix/syscall-template.S:82
#1  0x7f253163bced in read (file=,
buffer=0x7f251f3a1fa0 ":\n", amount=) at
/usr/include/bits/unistd.h:45
#2  FileRead (file=, buffer=0x7f251f3a1fa0 ":\n",
amount=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/file/fd.c:1191
#3  0x7f25316547a6 in mdread (reln=, forknum=, blocknum=5825, buffer=0x7f251f3a1fa0 ":\n") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/smgr/md.c:640
#4  0x7f25316387f1 in ReadBuffer_common (smgr=0x7f2532d38310,
relpersistence=, forkNum=,
blockNum=, mode=, strategy=, hit=0x7fffea5fb47f "") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:438
#5  0x7f25316391e4 in ReadBufferExtended (reln=0x7f2532cf6990,
forkNum=MAIN_FORKNUM, blockNum=5825, mode=RBM_NORMAL, strategy=0x4ff74188) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:246
#6  0x7f25314817bf in heapgetpage (scan=0x7f2532d335d0, page=5825) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:223
#7  0x7f25314821ba in heapgettup (scan=0x7f2532d335d0, dir=, nkeys=, key=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:556
#8  0x7f25314827bc in heap_getnext (scan=0x7f2532d335d0,
direction=523902880) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:1345
#9  0x7f2531711570 in RelationBuildTupleDesc (targetRelId=, insertIt=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:468
#10 RelationBuildDesc (targetRelId=, insertIt=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:882
#11 0x7f2531712f26 in RelationIdGetRelation (relationId=2616) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:1568
#12 0x7f2531485aab in relation_open (relationId=2616, lockmode=523902880)
at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:907
#13 0x7f2531485b33 in heap_open (relationId=6, lockmode=523902880) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/heap/heapam.c:1077
#14 0x7f25317106d7 in LookupOpclassInfo (operatorClassOid=10035,
numSupport=1) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:1278
#15 0x7f2531710c48 in IndexSupportInitialize (relation=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:1159
#16 RelationInitIndexAccessInfo (relation=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:1102
#17 0x7f2531711d91 in RelationBuildDesc (targetRelId=, insertIt=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:904
#18 0x7f253171201d in load_critical_index (indexoid=2693, heapoid=2618) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:3080
#19 0x7f25317124d2 in RelationCacheInitializePhase3 () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/cache/relcache.c:2892
#20 0x7f25317293e9 in InitPostgres (in_dbname=,
dboid=, username=, out_dbname=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/init/postinit.c:816
#21 0x7f253165a908 in PostgresMain (argc=828737552, argv=, username=0x7f2532b90820 "monitoring") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/tcop/postgres.c:3650
#22 0x7f2531619ad3 in BackendRun () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:3606
#23 BackendStartup () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:3291
#24 ServerLoop () at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:1455
#25 0x7f253161c57c in PostmasterMain (argc=851231024, argv=0x7f2532b8c8f0)
at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/postmaster/postmaster.c:1116
#26 0x7f25315baec3 in main (argc=3, argv=0x7f2532b8c8e0) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/main/main.c:199


-- 
Jeff Frost 
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 

Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 12:17, Tom Lane wrote:
> Jeff Frost  writes:
>> Alright, found one that's a little different (at least it wasn't in 
>> InitPostgres):
> It's still blocking at bufmgr.c:531 though ... so AFAICS this is another
> victim of somebody monopolizing a buffer mapping lock.
>
>
Any idea what I should be looking for in the backtraces?

I would imagine I can ignore any that are in InitPostgres, but that still
leaves quite a few to look through.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 11:54, Jeff Frost wrote:
> On 04/27/12 10:14, Tom Lane wrote:
>> Jeff Frost  writes:
>>> A few times today, we've seen postgresql 9.1.3 backends on Ubuntu 11.10 
>>> x86_64
>>> get stuck in 'startup' mode.
>> Well, the one you backtraced seems to be waiting for somebody else to
>> release one of the buffer mapping locks ... which is not a lock I'd
>> expect to get held long, ever.  Could you check all the Postgres
>> processes and see which of them have backtraces different from that?
>> We need to figure out what's sitting on that lock and why.
>>
> Thanks Tom.  We've failed over to the replica which seems to have the issue
> happen less often, so I'll try and get the backtraces as soon as it's 
> reproduced.
Alright, found one that's a little different (at least it wasn't in 
InitPostgres):

#0  0x7f252f612297 in semop () at ../sysdeps/unix/syscall-template.S:82
#1  0x7f253160ccc0 in PGSemaphoreLock (sema=0x7f252ba65c00, interruptOK=0
'\000') at pg_sema.c:418
#2  0x7f253164cea5 in LWLockAcquire (lockid=,
mode=LW_SHARED) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/lmgr/lwlock.c:464
#3  0x7f253163889a in BufferAlloc (smgr=0x7f2532c79f60,
relpersistence=, forkNum=,
blockNum=, mode=, strategy=, hit=0x7fffea5f9ecf "") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:531
#4  ReadBuffer_common (smgr=0x7f2532c79f60, relpersistence=, forkNum=, blockNum=,
mode=, strategy=, hit=0x7fffea5f9ecf
"") at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:325
#5  0x7f25316391e4 in ReadBufferExtended (reln=0x7f231ee95438,
forkNum=MAIN_FORKNUM, blockNum=166364, mode=RBM_NORMAL, strategy=0xecaf9628)
at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/storage/buffer/bufmgr.c:246
#6  0x7f2531490fa3 in _bt_relandgetbuf (rel=0x7f231ee95438, obuf=77863,
blkno=1, access=1) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/nbtree/nbtpage.c:651
#7  0x7f2531494cda in _bt_search (rel=0x7f231ee95438, keysz=1,
scankey=0x7fffea5f9ff0, nextkey=, bufP=0x7fffea5fa9fc,
access=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/nbtree/nbtsearch.c:121
#8  0x7f25314951bf in _bt_first (scan=, dir=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/nbtree/nbtsearch.c:841
#9  0x7f2531493954 in btgettuple (fcinfo=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/nbtree/nbtree.c:315
#10 0x7f2531721003 in FunctionCall2Coll (flinfo=0x0, collation=3932134720,
arg1=1, arg2=18446744073709551615) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/fmgr/fmgr.c:1319
#11 0x7f253148d0d5 in index_getnext (scan=0x7f2532eff820, direction=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/access/index/indexam.c:499
#12 0x7f25316c392e in get_actual_variable_range (root=, vardata=, sortop=, min=, max=0x7f2532efc8e0) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/adt/selfuncs.c:4748
#13 0x7f25316c7969 in ineq_histogram_selectivity (root=, vardata=0x7fffea5fb250, opproc=, isgt=0 '\000',
constval=, consttype=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/adt/selfuncs.c:805
#14 0x7f25316c810c in scalarineqsel (root=0x7f2532d18c00, operator=, isgt=-1 '\377', vardata=0x7fffea5fb250, constval=47095179,
consttype=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/adt/selfuncs.c:535
#15 0x7f25316c85fc in mergejoinscansel (root=0x7f2532d18c00, clause=, opfamily=, strategy=, nulls_first=0 '\000', leftstart=0x7fffea5fb468, leftend=0x7fffea5fb460,
rightstart=0x7fffea5fb458, rightend=0x7fffea5fb450) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/utils/adt/selfuncs.c:2843
#16 0x7f25315e11d3 in cached_scansel (path=0x7f2532efbac0,
root=0x7f2532d18c00, sjinfo=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/optimizer/path/costsize.c:2099
#17 cost_mergejoin (path=0x7f2532efbac0, root=0x7f2532d18c00, sjinfo=) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/optimizer/path/costsize.c:1847
#18 0x7f2531605e40 in create_mergejoin_path (root=0x7f2532d18c00,
joinrel=0x7f2532efbbf0, jointype=JOIN_LEFT, sjinfo=0x7f2532ef3180,
outer_path=0x7f2532ef6e50, inner_path=,
restrict_clauses=0x7f2532efc310, pathkeys=0x0, mergeclauses=0x7f2532efc540,
outersortkeys=0x7f2532efc4f0, innersortkeys=0x7f2532efc590) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/optimizer/util/pathnode.c:1554
#19 0x7f25315e8559 in sort_inner_and_outer (root=0x7f2532d18c00,
joinrel=0x7f2532efbbf0, outerrel=, innerrel=, jointype=, sjinfo=,
restrictlist=0x7f2532efc310) at
/build/buildd/postgresql-9.1-9.1.3/build/../src/backend/optimizer/path/joinpath.c:305
#20 add_paths_to_joinrel

Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 10:14, Tom Lane wrote:
> Jeff Frost  writes:
>> A few times today, we've seen postgresql 9.1.3 backends on Ubuntu 11.10 
>> x86_64
>> get stuck in 'startup' mode.
> Well, the one you backtraced seems to be waiting for somebody else to
> release one of the buffer mapping locks ... which is not a lock I'd
> expect to get held long, ever.  Could you check all the Postgres
> processes and see which of them have backtraces different from that?
> We need to figure out what's sitting on that lock and why.
>

Thanks Tom.  We've failed over to the replica which seems to have the issue
happen less often, so I'll try and get the backtraces as soon as it's 
reproduced.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
On 04/27/12 09:07, Jeff Frost wrote:
> A few times today, we've seen postgresql 9.1.3 backends on Ubuntu 11.10 x86_64
> get stuck in 'startup' mode.  By that I mean the set_ps_output mode. Postgres
> is installed via Martin Pitt's packages.

Quick followup on this: when it happens, you can connect successfully to the
postgres db, but as soon as you \c consprod, you get stuck.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] 9.1.3 backends getting stuck in 'startup'

2012-04-27 Thread Jeff Frost
A few times today, we've seen postgresql 9.1.3 backends on Ubuntu 11.10 x86_64
get stuck in 'startup' mode.  By that I mean the set_ps_output mode. Postgres
is installed via Martin Pitt's packages.

It manifests like this:

Server has been humming along fine, then suddenly many backends get stuck in
'startup' mode and the ps output looks like so:

postgres  2425  3.5  0.3 8695020 483868 ?  Ss   08:56   0:41 postgres: 
consprod consprod 192.168.1.150(41606) startup
postgres  2427  4.1  0.3 8695020 511756 ?  Ss   08:56   0:48 postgres: 
consprod consprod 192.168.1.152(60876) startup
postgres  2429  3.5  0.3 8695020 497252 ?  Ss   08:56   0:40 postgres: 
consprod consprod 192.168.1.151(41930) startup
postgres  2431  3.7  0.3 8695020 502004 ?  Ss   08:56   0:43 postgres: 
consprod consprod 192.168.1.151(41933) startup
postgres  2435  3.8  0.3 8695020 439080 ?  Ss   08:56   0:44 postgres: 
consprod consprod 192.168.1.157(37656) startup
postgres  2437  3.5  0.3 8695020 482304 ?  Ss   08:56   0:41 postgres: 
consprod consprod 192.168.1.153(45071) startup
postgres  2438  3.4  0.3 8695020 506776 ?  Ss   08:56   0:39 postgres: 
consprod consprod 192.168.1.153(45072) startup
postgres  2441  3.7  0.3 8695020 495652 ?  Ss   08:56   0:43 postgres: 
consprod consprod 192.168.1.153(45075) startup
postgres  2442  3.7  0.3 8695020 503336 ?  Ss   08:56   0:43 postgres: 
consprod consprod 192.168.1.153(45076) startup
postgres  2443  4.0  0.3 8695020 511760 ?  Ss   08:56   0:46 postgres: 
consprod consprod 192.168.1.153(45077) startup
postgres  2445  3.4  0.3 8695020 478632 ?  Ss   08:56   0:39 postgres: 
consprod consprod 192.168.1.155(54413) startup
postgres  2446  3.7  0.3 8695020 464380 ?  Ss   08:56   0:43 postgres: 
consprod consprod 192.168.1.154(46186) startup
postgres  2448  3.8  0.3 8695024 474400 ?  Ss   08:56   0:44 postgres: 
consprod consprod 192.168.1.154(46187) startup
postgres  2449  3.8  0.3 8695020 464932 ?  Rs   08:56   0:45 postgres: 
consprod consprod 192.168.1.154(46216) startup
postgres  2450  3.3  0.3 8695020 515200 ?  Ss   08:56   0:38 postgres: 
consprod consprod 192.168.1.152(60890) startup
postgres  2452  3.3  0.3 8695020 493384 ?  Ss   08:56   0:38 postgres: 
consprod consprod 192.168.1.152(60892) startup
postgres  2453  3.7  0.3 8695020 490168 ?  Ss   08:56   0:44 postgres: 
consprod consprod 192.168.1.152(60893) startup
postgres  2456  3.4  0.3 8695020 449348 ?  Ss   08:56   0:40 postgres: 
consprod consprod 192.168.1.156(50829) startup


Restarting required '-m immediate'.

Stracing reveals lots of semops:

semop(589837, {{3, 1, 0}}, 1)   = 0
semop(524299, {{11, 1, 0}}, 1)  = 0
semop(557068, {{6, 1, 0}}, 1)   = 0
semop(458761, {{2, 1, 0}}, 1)   = 0
semop(557068, {{9, 1, 0}}, 1)   = 0
read(8, ":\n\0\0\210\261\224\316\4\0\1\0\364\0\0 \0 \4
\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
read(8, ":\n\0\0\320\262\224\316\4\0\1\0\364\0\0 \0 \4
\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192


backtrace looks like this:
0x7f7dd1d1f497 in semop () at ../sysdeps/unix/syscall-template.S:82
82  in ../sysdeps/unix/syscall-template.S
#0  0x7f7dd1d1f497 in semop () at ../sysdeps/unix/syscall-template.S:82
#1  0x7f7dd3d67b18 in PGSemaphoreLock (sema=0x7f7dce0e8210, interruptOK=0
'\000') at pg_sema.c:418
#2  0x7f7dd3da67c5 in LWLockAcquire (lockid=FirstBufMappingLock,
mode=LW_SHARED) at lwlock.c:464
#3  0x7f7dd3d91289 in BufferAlloc (foundPtr=0x7fff84d3b45e "",
strategy=0x7f7dd5f359e0, blockNum=201362, forkNum=MAIN_FORKNUM,
relpersistence=112 'p', smgr=) at bufmgr.c:531
#4  ReadBuffer_common (smgr=0x7f7dd5f39f00, relpersistence=112 'p',
forkNum=MAIN_FORKNUM, blockNum=201362, mode=RBM_NORMAL,
strategy=0x7f7dd5f359e0, hit=0x7fff84d3b4cf "") at bufmgr.c:325
#5  0x7f7dd3d91ca4 in ReadBufferExtended (reln=0x7f7dd3b17540,
forkNum=MAIN_FORKNUM, blockNum=201362, mode=RBM_NORMAL, strategy=) at bufmgr.c:246
#6  0x7f7dd3be3d46 in heapgetpage (scan=0x7f7dd5f351c0, page=201362) at
heapam.c:223
#7  0x7f7dd3be4732 in heapgettup (scan=0x7f7dd5f351c0, dir=, nkeys=2, key=0x7f7dd5f358a0) at heapam.c:556
#8  0x7f7dd3be52f5 in heap_getnext (scan=0x7f7dd5f351c0,
direction=) at heapam.c:1345
#9  0x7f7dd3e66167 in RelationBuildTupleDesc (relation=0x7f7dd3b23ee8) at
relcache.c:468
#10 RelationBuildDesc (targetRelId=, insertIt=1 '\001') at
relcache.c:882
#11 0x7f7dd3e66bdd in load_critical_index (indexoid=2655, heapoid=2603) at
relcache.c:3080
#12 0x7f7dd3e6839b in RelationCacheInitializePhase3 () at relcache.c:2890
#13 0x7f7dd3e7f931 in InitPostgres (in_dbname=,
dboid=2228467824, username=, out_dbname=0x0) at postinit.c:816
#14 0x7f7dd3db26d8 in PostgresMain (argc=, argv=, username=0x7f7dd5e4bc90 "consprod") at postgres.c:3650
#15 0x7f7dd3d75923 in BackendRun (port=0x7f7dd5e96ea0) at postmaster.c:3606
#16 BackendStartup (port=0x7f7dd5e96ea0) at postmaster.c:3291
#17 Serv

Re: [BUGS] BUG #6092: specific casting required for gist indexing of bigint

2011-07-06 Thread Jeff Frost
On 07/05/11 17:06, Tom Lane wrote:
> "Jeff Frost"  writes:
>> Ran into a situation with a customer who is using the btree_gist contrib
>> module to allow combined index of some tsearch data and two other columns.
>> One of these other columns is a bigint field.  I noticed that the combined
>> index won't be used by the planner unless you specifically cast the bare
>> number to a bigint.
> If memory serves, the btree_gist opclasses don't include any cross-type
> operators, so "int8 = int4" doesn't work here.
>
Ah! And if you look at the contrib/btree_gist/sql/int8.sql test file, you'll
see this:

SELECT count(*) FROM int8tmp WHERE a <  464571291354841::int8;

So, I'd say it's expected behavior even though it's slightly counterintuitive
if you're used to implicit casting handling this for you.
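
You can see that directly in the catalogs too; something like this lists the
operators the opclass provides (a sketch, assuming the contrib opclass is named
gist_int8_ops):

-- every operator in the family is (bigint,bigint); there is no cross-type
-- int8 = int4 entry, hence the explicit cast being required
SELECT amopopr::regoperator AS operator
FROM pg_amop
WHERE amopfamily = (SELECT opcfamily FROM pg_opclass
                    WHERE opcname = 'gist_int8_ops');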


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #6092: specific casting required for gist indexing of bigint

2011-07-05 Thread Jeff Frost

The following bug has been logged online:

Bug reference:  6092
Logged by:  Jeff Frost
Email address:  j...@pgexperts.com
PostgreSQL version: 9.0.4
Operating system:   CentOS 5.5
Description:        specific casting required for gist indexing of bigint
Details: 

Ran into a situation with a customer who is using the btree_gist contrib
module to allow combined index of some tsearch data and two other columns.
One of these other columns is a bigint field.  I noticed that the combined
index won't be used by the planner unless you specifically cast the bare
number to a bigint.  Here's a quick test case:

createdb jefftest
psql -f /usr/pgsql-9.0/share/contrib/btree_gist.sql jefftest

jefftest=# create table test (id bigint);
CREATE TABLE
Time: 28.541 ms
jefftest=# insert into test select generate_series(1,10);
INSERT 0 10
Time: 179.768 ms
jefftest=# CREATE INDEX test__id ON test using gist(id) ;
CREATE INDEX
Time: 1603.811 ms
jefftest=# ANALYZE test;
ANALYZE
Time: 21.854 ms
jefftest=# explain analyze select id from test WHERE id = 587;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=500 width=8) (actual time=0.097..14.698 rows=1 loops=1)
   Filter: (id = 587)
 Total runtime: 14.739 ms
(3 rows)

Time: 32.965 ms
jefftest=# explain analyze select id from test WHERE id = 587::bigint;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=20.16..490.49 rows=500 width=8) (actual time=0.037..0.038 rows=1 loops=1)
   Recheck Cond: (id = 587::bigint)
   ->  Bitmap Index Scan on test__id  (cost=0.00..20.03 rows=500 width=0) (actual time=0.027..0.027 rows=1 loops=1)
         Index Cond: (id = 587::bigint)
 Total runtime: 0.080 ms
(5 rows)

Time: 0.592 ms

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #4491: regression in gist indexes

2008-10-25 Thread Jeff Frost

Looks like this is a dup of #4479:

http://archives.postgresql.org/pgsql-bugs/2008-10/msg00094.php

-- Forwarded message --
Date: Wed, 22 Oct 2008 19:11:51 -0300
From: [EMAIL PROTECTED]
To: Jeff Frost <[EMAIL PROTECTED]>
Subject: Stalled post to pgsql-bugs

Your message to pgsql-bugs has been delayed, and requires the approval
of the moderators, for the following reason(s):

The author ("Jeff Frost" <[EMAIL PROTECTED]>)
  is not a member of any of the restrict_post groups.

If you do not wish the message to be posted, or have other concerns,
please send a message to the list owners at the following address:
  [EMAIL PROTECTED]

--- Begin Message ---

The following bug has been logged online:

Bug reference:  4491
Logged by:  Jeff Frost
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.3.4
Operating system:   Fedora 9/Gentoo/Mac OS X
Description:        regression in gist indexes
Details: 

It seems that 8.3.4 has a regression in the updating of gist indexes.  After
updating an indexed column in a row with a postgis gist index, the row is no
longer found in an index scan.  I have verified that this works on 8.2.7,
8.3.0, 8.3.1 and 8.3.3 but not 8.3.4.
Verified this is a problem on Fedora 9/Gentoo/Mac OS X.

You can see the test case here:

https://gist.github.com/d6c9b183196717d73b6a

Interestingly, the first index scan after the update does find the row, but
subsequent scans do not.

Here you can see it working on 8.3.0:

https://gist.github.com/d0dbbf29606822b8ceb9

You can grab the test table data dump here:

http://www.frostconsultingllc.com/coordinate_test.dmp

Please contact me if there's anything else you need to reproduce the bug. 
Note that if you're using the coordinate_test data, you'll have to set
enable_seqscan = 0 to force an index scan as the table only contains 100
rows.
--- End Message ---

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #4491: regression in gist indexes

2008-10-25 Thread Jeff Frost

The following bug has been logged online:

Bug reference:  4491
Logged by:  Jeff Frost
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.3.4
Operating system:   Fedora 9/Gentoo/Mac OS X
Description:        regression in gist indexes
Details: 

It seems that 8.3.4 has a regression in the updating of gist indexes.  After
updating an indexed column in a row with a postgis gist index, the row is no
longer found in an index scan.  I have verified that this works on 8.2.7,
8.3.0, 8.3.1 and 8.3.3 but not 8.3.4.
Verified this is a problem on Fedora 9/Gentoo/Mac OS X.

You can see the test case here:

https://gist.github.com/d6c9b183196717d73b6a

Interestingly, the first index scan after the update does find the row, but
subsequent scans do not.

Here you can see it working on 8.3.0:

https://gist.github.com/d0dbbf29606822b8ceb9

You can grab the test table data dump here:

http://www.frostconsultingllc.com/coordinate_test.dmp

Please contact me if there's anything else you need to reproduce the bug. 
Note that if you're using the coordinate_test data, you'll have to set
enable_seqscan = 0 to force an index scan as the table only contains 100
rows.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs