Re: [HACKERS] shared memory release following failed lock acquirement.

2004-10-04 Thread Simon Riggs
 Merlin Moncure
  The name max_locks_per_transaction indicates a limit of some kind. The
  documentation doesn't mention anything about whether that limit is
  enforced or not.

  I suggest the additional wording:
  This parameter is not a hard limit: No limit is enforced on the number of
  locks in each transaction. System-wide, the total number of locks is
  limited by the size of the lock table.


 I think it's worse than that.  First of all, user locks persist outside
 of transactions, but they still count against this limit.

I was really thinking of the standard locking case. Yes, user locks make it
worse.

 A more appropriate name
 for the GUC variable would be 'estimated_lock_table_size_per_backend',
 or something like that.  I've been putting some thought into reworking
 the userlock contrib module into something acceptable for inclusion in
 the main project, a substantial part of that being documentation changes.


I agree a renamed parameter would be more appropriate, though I suspect a
more accurate name will be about 5 yards long.

A documentation change would be worthwhile here... but I'll wait for your
changes before doing anything there.

Best Regards, Simon Riggs




Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-30 Thread Simon Riggs
Tom Lane
 Simon Riggs [EMAIL PROTECTED] writes:
  Does this mean that the parameter max_locks_per_transaction isn't
  honoured at all, it is just used to size the lock table

 Yes, and that's how it's documented.


The name max_locks_per_transaction indicates a limit of some kind. The
documentation doesn't mention anything about whether that limit is enforced
or not.

I suggest the additional wording:
This parameter is not a hard limit: No limit is enforced on the number of
locks in each transaction. System-wide, the total number of locks is limited
by the size of the lock table.
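
To make the suggested wording concrete, a worked example (this is my
reading of the sizing, so treat the numbers as illustrative): with the
stock defaults of max_locks_per_transaction = 64 and max_connections =
100, the shared lock table is sized for about 64 * 100 = 6400 entries,
so a single transaction can happily take many hundreds of locks as long
as the system-wide total stays under ~6400.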

The recent patch stops the system from crashing with an out-of-memory
condition, though it probably slightly hastens the point at which no
locks are available.  It would be good to clarify what behaviour the
system exhibits when we run out of locks.

I'm not sure myself now what that behaviour is: my understanding is that
we do not perform lock escalation (as DB2 does), so presumably we just
grind to a halt? I take it that there is no automated way of getting out
of this situation? i.e. the deadlock detector doesn't start killing
transactions that hold lots of locks to free up space? So we would
basically just start to build up lots of people waiting on locks, though
without any mechanism for diagnosing that this is happening? What does
happen, and where does it end (now)?

Best Regards, Simon Riggs




Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-30 Thread Merlin Moncure
 The name max_locks_per_transaction indicates a limit of some kind. The
 documentation doesn't mention anything about whether that limit is
 enforced or not.

 I suggest the additional wording:
 This parameter is not a hard limit: No limit is enforced on the number of
 locks in each transaction. System-wide, the total number of locks is
 limited by the size of the lock table.


I think it's worse than that.  First of all, user locks persist outside
of transactions, but they still count against this limit.  A more
appropriate name for the GUC variable would be
'estimated_lock_table_size_per_backend', or something like that.  I've
been putting some thought into reworking the userlock contrib module
into something acceptable for inclusion in the main project, a
substantial part of that being documentation changes.

Merlin



Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-29 Thread Merlin Moncure
Tgl wrote:
  As I see it, this means the user-locks (and perhaps all
  locks...?) eat around ~ 6k bytes memory each.

 They're allocated in groups of 32, which would work out to close to 6k;
 maybe you were measuring the incremental cost of allocating the first one?

I got my 6k figure by dividing 10,000 into 64M, 10,000 being roughly the
number of locks that crashed the server.  That's reasonable because
doubling shared buffers slightly more than doubled the crash value.

I was wondering how ~ 10k locks ran me out of shared memory when each
lock takes ~ 260b (or ~184, as you say) and I am running 8k buffers = 64M.

260 * 100 backends * 64 maxlocks = 1.7 M.  Sure, the hash table and
other stuff adds some... but this is nowhere near what it should take to
run me out.
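
Just so we're comparing the same arithmetic, here it is as a standalone C
snippet (the 260b figure is only the old postgresql.conf estimate, and
the 10k/64M numbers come from my crash tests, so everything here is
approximate):

#include <stdio.h>

int main(void)
{
    /* nominal sizing, per the old postgresql.conf estimate */
    long nominal = 260L * 100 * 64;   /* bytes/lock * backends * maxlocks */

    /* observed: ~10k user locks exhausted my 64M of shared buffers */
    long shmem_bytes = 8192L * 8192;  /* 8192 buffers x 8k pages = 64M */
    long per_lock    = shmem_bytes / 10000;

    printf("nominal lock table: %ld bytes (~1.7M)\n", nominal);
    printf("apparent cost/lock: %ld bytes (~6.5k)\n", per_lock);
    return 0;
}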

Am I just totally misunderstanding how to estimate locks memory
consumption?

Merlin



Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-29 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 I was wondering how ~ 10k locks ran me out of shared memory when each
 lock takes ~ 260b (or ~184, as you say) and I am running 8k buffers = 64M.

The number of buffers you have doesn't have anything to do with this.
The question is how much shared memory space is there for the lock
table, above and beyond what's used for everything else (such as
buffers).

I just went through and corrected some minor errors in the calculation
of shared memory block size (mostly stuff where the estimation code had
gotten out of sync with the actual work over time).  I now find that
with all-default configuration parameters I can create 7808 locks before
running out of shared memory, rather than the promised 6400.  (YMMV due
to platform-specific differences in MAXALIGN, sizeof(pointer), etc.)
This is coming from two places: LockShmemSize deliberately adds on 10%
slop factor to its calculation of the lock table size, and then
CreateSharedMemoryAndSemaphores adds on 100KB for safety margin.  Both
of those numbers are kinda pulled from the air, but I don't see a strong
reason to change them.  The other space calculations seem to be pretty
nearly dead-on.
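
If anyone wants to check my arithmetic on where the extra headroom comes
from, here is the rough shape of it (this ignores hash-table overhead,
the PROCLOCK table, and MAXALIGN padding, so it lands near but not
exactly on the observed 7808):

#include <stdio.h>

int main(void)
{
    long promised = 64L * 100;  /* max_locks * max_connections = 6400 */
    long per_lock = 184;        /* approx bytes/lock observed in tests */

    long slop   = (promised * per_lock) / 10; /* LockShmemSize's 10% slop */
    long safety = 100 * 1024;   /* CreateSharedMemoryAndSemaphores' 100KB */

    long extra  = (slop + safety) / per_lock;
    printf("~%ld extra locks of headroom, ~%ld total\n",
           extra, promised + extra);
    return 0;
}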

regards, tom lane



Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-29 Thread Simon Riggs
Tom Lane
 Merlin Moncure [EMAIL PROTECTED] writes:
  According to postgresql.conf, using these settings the lock table eats
  64*260*100 bytes = ~2M.  Well, if it's running my server out of shared
  memory, it's eating much, much more shmem than previously thought.

 Hmm, the 260 is out of date I think.  I was seeing about 184 bytes/lock
 in my tests just now.

  Also, I was able to acquire around 10k locks before the server borked.
  This is obviously a lot more than 64*100.

 Sure, because there's about 100K of deliberate slop in the shared memory
 size allocation, and you are probably also testing a scenario where the
 buffer and FSM hash tables haven't ramped to full size yet, so the lock
 table is able to eat more than the nominal amount of space.

  As I see it, this means the user-locks (and perhaps all
  locks...?) eat around ~ 6k bytes memory each.

 They're allocated in groups of 32, which would work out to close to 6k;
 maybe you were measuring the incremental cost of allocating the first one?

 I did some digging, and as far as I can see the only shared memory
 allocations that occur after postmaster startup are for the four shmem
 hash tables: buffers, FSM relations, locks, and proclocks.  Of these,
 the buffer and FSM hashtables have predetermined maximum sizes.  So
 arranging for the space in those tables to be fully preallocated should
 prevent any continuing problems from lock table overflow.  I've
 committed a fix that does this.  I verified that after running the thing
 out of shared memory via creating a lot of user locks and then releasing
 same, I could run the regression tests.


A few questions:

Is that fix in 8.0?

Does this mean that the parameter max_locks_per_transaction isn't honoured
at all, it is just used to size the lock table, which itself can expand
beyond that max limit in various circumstances? (Though with the bug fix,
not THAT much more than the max limit.)
Should we rename and redocument the parameter? If that is so, the current
name is so far away from its real meaning as to constitute a bug in
itself.

Best Regards, Simon Riggs





Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-29 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Does this mean that the parameter max_locks_per_transaction isn't honoured
 at all, it is just used to size the lock table

Yes, and that's how it's documented.

regards, tom lane



[HACKERS] shared memory release following failed lock acquirement.

2004-09-28 Thread Merlin Moncure
Tom,

I noticed your recent corrections to lock.c regarding the releasing of
locks in an out-of-shared-memory condition.  This may or may not be
relevant, but when I purposefully use up all the lock space with user
locks, the server runs out of shared memory and stays that way until it
is restarted (rather than recovering when the backend shuts down, as it
is supposed to).

In other words, after doing a select user_write_lock_oid(t.oid) from
big_table t;

It's server restart time.

What's really interesting about this is that the pg_locks view (after
the offending backend disconnects) reports nothing out of the ordinary
even though no backends can acquire locks after that point.

Merlin






Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-28 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 In other words, after doing a select user_write_lock_oid(t.oid) from
 big_table t;
 It's server restart time.

User locks are not released at transaction failure.  Quitting that
backend should have got you out of it, however.

 What's really interesting about this is that the pg_locks view (after
 the offending backend disconnects) reports nothing out of the ordinary
 even though no backends can acquire locks after that point.

User locks are not shown in pg_locks, either.

There is a secondary issue here, which is that we don't have provision
to recycle hash table entries back into the general shared memory pool
(mainly because there *is* no shared memory pool, only never-yet-
allocated space).  So when you do release these locks, the freed space
only goes back to the lock hash table's freelist.  That means there
won't be any space for expansion of the buffer hash table, nor any other
shared data structures.  This could lead to problems if you hadn't been
running the server long enough to expand the buffer table to full size.

I don't think it's practical to introduce a real shared memory
allocator, but maybe we could alleviate the worst risks by forcing the
buffer hash table up to full size immediately at startup.  I'll look at
this.
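
If it helps to visualize the problem, here is a toy model of the behavior
(not the actual shmem code, just an illustration of the never-yet-allocated
pool plus per-table freelists):

#include <stdio.h>

#define SEGMENT_SIZE (64 * 1024)

typedef struct Entry
{
    struct Entry *next;
    char payload[120];
} Entry;

static char   segment[SEGMENT_SIZE]; /* stand-in for the shmem segment */
static size_t next_free = 0;         /* edge of never-yet-allocated space */

static Entry *
alloc_entry(Entry **freelist)
{
    Entry *e;

    if (*freelist)                   /* reuse from this table's own freelist */
    {
        e = *freelist;
        *freelist = e->next;
        return e;
    }
    if (next_free + sizeof(Entry) > SEGMENT_SIZE)
        return NULL;                 /* "out of shared memory" */
    e = (Entry *) (segment + next_free);
    next_free += sizeof(Entry);      /* carve off fresh space */
    return e;
}

static void
free_entry(Entry **freelist, Entry *e)
{
    e->next = *freelist;             /* goes back to the owning table only */
    *freelist = e;
}

int
main(void)
{
    Entry *lock_freelist = NULL, *buffer_freelist = NULL;
    Entry *held = NULL, *e;

    /* the lock table gobbles up the whole segment... */
    while ((e = alloc_entry(&lock_freelist)) != NULL)
    {
        e->next = held;
        held = e;
    }
    /* ...then releases every entry it took... */
    while (held)
    {
        e = held;
        held = held->next;
        free_entry(&lock_freelist, e);
    }
    /* ...but the buffer table still cannot expand: all the freed space
       sits on the lock table's freelist, not in a common pool */
    printf("buffer table allocation: %s\n",
           alloc_entry(&buffer_freelist) ? "ok" : "out of shared memory");
    return 0;
}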

regards, tom lane



Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-28 Thread Merlin Moncure
 Merlin Moncure [EMAIL PROTECTED] writes:
  In other words, after doing a select user_write_lock_oid(t.oid) from
  big_table t;
  It's server restart time.
 
 User locks are not released at transaction failure.  Quitting that
 backend should have got you out of it, however.

Right, my point being, it doesn't.
 
  What's really interesting about this is that the pg_locks view (after
  the offending backend disconnects) reports nothing out of the ordinary
  even though no backends can acquire locks after that point.
 
 User locks are not shown in pg_locks, either.

Well, actually, they are.  The lock tag values are not shown, but they
do show up as mostly blank entries in the view.  
 
 There is a secondary issue here, which is that we don't have provision
 to recycle hash table entries back into the general shared memory pool
 (mainly because there *is* no shared memory pool, only never-yet-
 allocated space).  So when you do release these locks, the freed space
 only goes back to the lock hash table's freelist.  That means there
 won't be any space for expansion of the buffer hash table, nor any
 other shared data structures.  This could lead to problems if you
 hadn't been running the server long enough to expand the buffer table
 to full size.

OK, this perhaps explains it.  You are saying then that I am running the
server out of shared memory, not necessarily space in the lock table.  I
jumped to the conclusion that the memory associated with the locks might
not have been getting freed.
 
 I don't think it's practical to introduce a real shared memory
 allocator, but maybe we could alleviate the worst risks by forcing the
 buffer hash table up to full size immediately at startup.  I'll look
 at this.

This still doesn't fix the problem (albeit a low-priority problem,
currently just a contrib module) of user locks eating up all the space
in the lock table.  There are a couple of different ways to look at
fixing this.  My first thought is to bump up the error level of the
out-of-lock-table-space error to 'fatal', along the lines of the sketch
below.
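
Roughly what I have in mind, as a fragment (the surrounding code is from
memory, not verbatim from lock.c, and I haven't checked the exact message
text, so treat this as a sketch of the idea rather than a patch):

/* in LockAcquire(), when the lock hash table is full */
if (!lock)
{
    LWLockRelease(masterLock);
    ereport(FATAL,   /* was ERROR: killing the backend releases its
                      * user locks along with everything else */
            (errcode(ERRCODE_OUT_OF_MEMORY),
             errmsg("out of shared memory"),
             errhint("You may need to increase max_locks_per_transaction.")));
}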

Merlin



Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-28 Thread Merlin Moncure
tgl wrote:
 There is a secondary issue here, which is that we don't have provision
 to recycle hash table entries back into the general shared memory pool
 (mainly because there *is* no shared memory pool, only never-yet-
 allocated space).  So when you do release these locks, the freed space
 only goes back to the lock hash table's freelist.  That means there
 won't be any space for expansion of the buffer hash table, nor any
 other shared data structures.  This could lead to problems if you
 hadn't been running the server long enough to expand the buffer table
 to full size.

OK, I confirmed that I'm running the server out of shared memory space,
not necessarily the lock table.  My server settings were:
max_connections: 100
shared bufs: 8192 buffers
max_locks: 64 (stock).

According to postgresql.conf, using these settings the lock table eats
64*260*100 bytes = ~2M.  Well, if it's running my server out of shared
memory, it's eating much, much more shmem than previously thought.

Also, I was able to acquire around 10k locks before the server borked.
This is obviously a lot more than 64*100.  However, I set the
max_locks down to 10 and this did affect how many locks could be
acquired (and in this case, a server restart was not required).

Doubling shared buffers to 16k bumped my limit to over 20k locks, but
less than 25k.  As I see it, this means the user-locks (and perhaps all
locks...?) eat around ~ 6k bytes memory each.
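
To show my work (rough numbers, since the shared memory segment holds
plenty besides the lock table):

  8192 buffers * 8k = 64M;   64M / ~10,000 locks = ~6.4k per lock
  16384 buffers * 8k = 128M; 128M / ~21,000 locks = ~6k per lock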

This is not really a big deal; 10k locks is way more than a lock-heavy
application would be expected to use.  I'll look into this a bit more...

Merlin





Re: [HACKERS] shared memory release following failed lock acquirement.

2004-09-28 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 According to postgresql.conf, using these settings the lock table eats
 64*260*100 bytes = ~2M.  Well, if it's running my server out of shared
 memory, it's eating much, much more shmem than previously thought.

Hmm, the 260 is out of date I think.  I was seeing about 184 bytes/lock
in my tests just now.

 Also, I was able to acquire around 10k locks before the server borked.
 This is obviously a lot more than 64*100.

Sure, because there's about 100K of deliberate slop in the shared memory
size allocation, and you are probably also testing a scenario where the
buffer and FSM hash tables haven't ramped to full size yet, so the lock
table is able to eat more than the nominal amount of space.

 As I see it, this means the user-locks (and perhaps all
 locks...?) eat around ~ 6k bytes memory each.

They're allocated in groups of 32, which would work out to close to 6k
(32 * ~184 bytes = ~5.9k); maybe you were measuring the incremental cost
of allocating the first one?

I did some digging, and as far as I can see the only shared memory
allocations that occur after postmaster startup are for the four shmem
hash tables: buffers, FSM relations, locks, and proclocks.  Of these,
the buffer and FSM hashtables have predetermined maximum sizes.  So
arranging for the space in those tables to be fully preallocated should
prevent any continuing problems from lock table overflow.  I've
committed a fix that does this.  I verified that after running the thing
out of shared memory via creating a lot of user locks and then releasing
same, I could run the regression tests.
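
For the archives, the shape of the fix (a sketch of the idea, not the
committed code): push each fixed-size shmem hash table to its maximum
size at startup, for instance by entering and then removing dummy keys so
that every entry lands on that table's own freelist before anything else
can claim the space.  Something like:

/* sketch only -- "DummyKey" is a hypothetical key type for illustration */
static void
prealloc_hash_to_max(HTAB *hashp, long max_size)
{
    long     i;
    bool     found;
    DummyKey key;

    for (i = 0; i < max_size; i++)
    {
        key.id = i;
        (void) hash_search(hashp, &key, HASH_ENTER, &found);
    }
    for (i = 0; i < max_size; i++)
    {
        key.id = i;
        (void) hash_search(hashp, &key, HASH_REMOVE, &found);
    }
    /* all max_size entries now sit on this table's freelist */
}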

regards, tom lane
