Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Heikki Linnakangas

Tom Lane wrote:

The only alternative I can see is the one Heikki suggested: don't
truncate clog until the freeze horizon.  That's safe (given the planned
change to WAL-log tuple freezing) and clean and simple, but a permanent
requirement of 250MB+ for pg_clog would put the final nail in the coffin
of PG's usability in small-disk-footprint environments.  So I don't like
it much.  I suppose it could be made more tolerable by reducing the
freeze horizon, say to 100M instead of 1G transactions.  Anyone for a
GUC parameter?  In a high-volume DB you'd want the larger setting to
minimize the amount of tuple freezing work.  OTOH it seems like making
this configurable creates a nasty risk for PITR situations: a slave
that's configured with a smaller freeze window than the master is
probably not safe.


If we go down that route, we really should make it a GUC parameter, and 
reduce the default at least for 8_1_STABLE.


I got another idea. If we make sure that vacuum removes any aborted xid 
older than OldestXmin from the table, we can safely assume that any xid 
 the current clog truncation point we are going to be interested in is 
committed. Vacuum already removes any tuple with an aborted xmin. If we 
also set any aborted xmax (and xvac) to InvalidXid, and WAL logged that, 
we would know that after vacuum commits, any xid  relvacuumxid in the 
vacuumed table was committed, regardless of the hint bits. We could then 
safely truncate the clog without flushing anything. This also seems safe 
for PITR.


The only performance hit would be the clearing of xmax values of aborted 
transactions, but that doesn't seem too bad to me because most 
transactions commit.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] --single-transaction doc clarification

2006-10-31 Thread Jim C. Nasby
On Mon, Oct 30, 2006 at 07:18:04PM -0500, Neil Conway wrote:
 On Mon, 2006-10-30 at 22:56 +, Simon Riggs wrote:
  The man pages for VACUUM, CREATE TABLESPACE, CLUSTER and
  REINDEX DATABASE don't mention they are not allowed inside a transaction
  block at all
 
 That should be fixed, I think. Once that is done, I think it's
 sufficient to just say that --single-transaction won't work for commands
 that can't be executed in a transaction block.

What commands aren't allowed in a transaction? is still a logical
question to ask though, so it would be nice if we had such a list
hanging around.

Is there a standard way these commands test to see if they're in a
transaction block? If there is, perhaps something could be created that
would pull that info out of the code so that we didn't have to maintain
the list by hand. It might also be possible to do this with some SGML
magic.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 I got another idea. If we make sure that vacuum removes any aborted xid 
 older than OldestXmin from the table, we can safely assume that any xid 
  the current clog truncation point we are going to be interested in is 
 committed. Vacuum already removes any tuple with an aborted xmin. If we 
 also set any aborted xmax (and xvac) to InvalidXid, and WAL logged that, 

The problem with that is all the extra WAL log volume it creates.  I'm
also concerned about the loss of forensic information --- xmax values
are frequently useful for inferring what's been going on in a database.
(This is another reason for not wanting a very short freeze interval BTW.)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Simon Riggs wrote:
 Ouch! We did discuss that also. Flushing the buffercache is nasty with
 very large caches, so this makes autovacuum much less friendly - and
 could take a seriously long time if you enforce the vacuum delay
 costings.

 Hmm, isn't the buffer cache aware of a vacuum operation?

Yeah.  What would probably happen is that we'd dump off most of the
dirtied pages to the kernel, which would likely still have a lot of them
in kernel buffers pending write.  But then we'd have to fsync the table
--- so a physical write storm would ensue, which we have no way to
throttle.

I think the don't-truncate-clog approach is a much better answer.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Tom Lane
It seems that we're converging on the conclusion that not truncating
clog early is the least bad alternative.  This has the advantage of
making things a lot simpler --- we won't need to track minxid at all.
Allow me to summarize what I think has to happen:

* VACUUM will determine a freeze cutoff XID the same way it does now,
except that instead of using a hard-wired freeze window of 1G
transactions, we'll either reduce the window to (say) 100M transactions
or provide a GUC variable that can be adjusted over some reasonable
range.

* All XIDs present in the table that are older than the cutoff XID will
be replaced by FrozenXid or InvalidXid as required, and such actions
will be WAL-logged.  (I think we need to consider all 3 of xmin, xmax,
and xvac here.)

* On successful completion, the cutoff XID is stored in
pg_class.relvacuumxid, and pg_database.datvacuumxid is updated
if appropriate.  (The minxid columns are now useless, but unless there
is another reason to force initdb before 8.2, I'm inclined to leave them
there and unused.  We can remove 'em in 8.3.)

* pg_clog is truncated according to the oldest pg_database.datvacuumxid.
We should WAL-log this action, because replaying such an entry will
allow a PITR slave to truncate its own clog and thereby avoid wraparound
conflicts.  Note that we no longer need a checkpoint before truncating
--- what we need is XLogFlush, instead.  (WAL before data)

These changes get us back into the regime where the hint bits truly are
hints, because the underlying pg_clog data is still there, both in a
master database and in a PITR slave.  So we don't need to worry about
WAL-logging hint bits.  We also avoid needing any flushes/fsyncs or
extra checkpoints.  The added WAL volume should be pretty minimal,
because only tuples that have gone untouched for a long time incur extra
work.  The added storage space for pg_clog could be annoying for a small
database, but reducing the freeze window ameliorates that objection.

Comments?  Anyone see any remaining holes?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Simon Riggs
On Mon, 2006-10-30 at 20:40 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  That was understood; in the above example I agree you need to flush. If
  you don't pass a truncation point, you don't need to flush whether or
  not you actually truncate. So we don't need to flush *every* time,
 
 OK, but does that actually do much of anything for your performance
 complaint?  Just after GlobalXmin has passed a truncation point, *every*
 vacuum the system does will start performing a flush-n-fsync, which
 seems like exactly what you didn't like.  If the syncs were spread out
 in time for different rels then maybe this idea would help, but AFAICS
 they won't be.

Makes sense, so we shouldn't do it that way after all.

Are you OK with the other patches I've submitted? My understanding was
that you're gonna have a look at those and this general area? I don't
want to hold up the release because of a PITR patch.

Feedback welcome ;-)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 The added WAL volume should be pretty minimal, because only tuples that have
 gone untouched for a long time incur extra work.

That seems like a weak point in the logic. It seems like it would make VACUUM
which is already an i/o hog even more so. Perhaps something clever can be done
with vacuum_cost_delay and commit_siblings.

Something like inserting the delay between WAL logging and syncing the log and
writing to the heap. So if another transaction commits in the meantime we can
skip the extra fsync and continue.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Alvaro Herrera
Gregory Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  The added WAL volume should be pretty minimal, because only tuples that have
  gone untouched for a long time incur extra work.
 
 That seems like a weak point in the logic. It seems like it would make VACUUM
 which is already an i/o hog even more so. Perhaps something clever can be done
 with vacuum_cost_delay and commit_siblings.
 
 Something like inserting the delay between WAL logging and syncing the log and
 writing to the heap. So if another transaction commits in the meantime we can
 skip the extra fsync and continue.

Huh, but the log would not be flushed for each operation that the vacuum
logs.  Only when it's going to commit.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PATCHES] Alternative patches for the btree deletion issue

2006-10-31 Thread Tom Lane
I've worked up two alternative patches for the btree deletion bug
discussed here:
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01373.php

The first one doesn't try to do anything about the underlying problem of
index keys becoming out-of-order; it just hacks up _bt_pagedel() to be
able to recover from the failed to re-find parent key condition.

The second patch instead attacks the underlying problem, by guaranteeing
that childless half-dead parent pages will be deleted before we allow
any further insertions into the transferred key space.  This requires
two nontrivial changes: (1) a precheck step to refuse deletion of a page
if it would leave a childless but not immediately deletable parent at
any upper tree level; (2) additions to the WAL recovery code to complete
an incomplete series of deletions.

While I think the second patch is logically cleaner, it's certainly a
lot bigger and riskier.  And as far as we know, the keys-out-of-order
condition does not have any other consequences that would justify this
much work to prevent it.

I am thinking of applying the bigger patch to HEAD (8.2) and using the
smaller patch for 7.4-8.1 branches.  The bigger patch adds a new WAL
record type, so if we applied it to the back branches we'd be creating
an incompatibility, eg 8.1.6 WAL wouldn't load into an 8.1.5 postmaster.
I'm disinclined to do that when we don't know that it's fixing any real
bug.  But one could argue that we should use the smaller patch for 8.2
as well, and hold the bigger patch for 8.3 ... or even not use it at all
in the absence of any demonstrated bug.

Thoughts?

BTW, here's a reproducer for the problem, on machines with MAXALIGN 8.
Changing the constants a little would probably make it fail on MAXALIGN 4
too, but I haven't bothered trying.

create table foo(f1 int, f2 text);
insert into foo select x, repeat('xyzzy',100) from generate_series(1,1) x;
create index fooi on foo(f1,f2);
delete from foo where f1 between 3000 and 3150;
vacuum foo;
insert into foo select 3010, repeat('xyzzy',100) from generate_series(1,2000) x;
vacuum foo;
delete from foo where f1  3000;
vacuum foo;


regards, tom lane



bint6hcEYurWv.bin
Description: btree-del-1.patch.gz


binMxeH0zDJ0Q.bin
Description: btree-del-2.patch.gz

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Shared Memory Hooks Documentation (was Re: [PATCHES] New shared memory hooks proposal)

2006-10-31 Thread Marc Munro
On Sun, 2006-10-15 at 16:36 -0400, Tom Lane wrote:

 I'm not entirely sure where to put it, either.  Subsection 32.9.1 (Dynamic
 Loading) is pretty basic stuff for first-time C-function authors ---
 this seems off-topic for that section.  Maybe a new subsection down near
 the end of 32.9?  Another possibility is somewhere in the Internals
 chapters, although I don't offhand see an obvious place there either.
 
 
This patch, against xfunc.sgml, adds a new subsection 33.9.12, Shared
Memory and LWLocks in C-Language Functions, describing how shared memory
and lwlocks may be requested by C add-in functions.

I have, btw, verified that this works in the forthcoming release of
Veil.

__
Marc

*** xfunc.sgml	2006-10-31 12:52:01.0 -0800
--- xfunc.sgml.new	2006-10-31 12:51:51.0 -0800
***
*** 2909,2912 
--- 2909,2960 
  /programlisting
  /para
 /sect2
+sect2
+ titleShared Memory and LWLocks in C-Language Functions/title
+ 
+ para
+  Add-ins may reserve LWLocks and an allocation of shared memory on server
+  startup.  The add-in's shared library must be preloaded, by specifying
+  it in 
+  xref linkend=guc-shared-preload-librariesindextermprimaryshared-preload-libraries//,
+  and the shared memory must be reserved by calling:
+ programlisting
+ void RequestAddinShmemSpace(int size)
+ /programlisting
+  from your literal_PG_init/ function.
+ /para
+ para
+   LWLocks are reserved by calling:
+ programlisting
+ void RequestAddinLWLocks(int n)
+ /programlisting
+   from literal_PG_init/. 
+ /para	
+ para
+   To avoid possible race-conditions, each backend should use the LWLock
+   literalAddinShmemInitLock/ when connecting to and intializing 
+   its allocation of shared memory, as shown here:
+   
+ programlisting
+ static mystruct *ptr = NULL;
+ 
+ if (!ptr)
+ {
+ boolfound;
+ 
+ LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+ ptr = ShmemInitStruct(my struct name, size, amp;found);
+ if (!ptr)
+ elog(ERROR, out of shared memory);
+ if (!found)
+ {
+ initialize contents of shmem area;
+ acquire any requested LWLocks using:
+ ptr-mylockid = LWLockAssign();
+ }
+ LWLockRelease(AddinShmemInitLock);
+ }
+ /programlisting
+ /para
+/sect2
/sect1


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [PATCHES] WAL logging freezing

2006-10-31 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Huh, but the log would not be flushed for each operation that the vacuum
 logs.  Only when it's going to commit.

It strikes me that the vacuum cost delay feature omits to consider
generation of WAL records as a cost factor.  It may not be a big problem
though, as long as we can limit the number of records created to one or
two per page --- then you can see it as just a component of the dirtied
a page cost.  If we made a separate WAL record for each tuple then it
could be important to account for.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster