Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-17 Thread Fujii Masao
On Thu, Mar 17, 2011 at 2:35 AM, Robert Haas robertmh...@gmail.com wrote:
 1. If a die interrupt is received (pg_terminate_backend or fast
 shutdown), then terminate the sync rep wait and arrange for the
 connection to be closed without acknowledging the commit (but do send
 a warning message back).  The commit still happened, though, so other
 transactions will see its effects.  This is unavoidable unless we're
 willing to either ignore attempts to terminate a backend waiting for
 sync rep, or panic the system when it happens, and I don't think
 either of those is appropriate.

OK.

 2. If a query cancel interrupt is received (pg_cancel_backend or ^C),
 then cancel the sync rep wait and issue a warning before acknowledging
 the commit.  Again, the alternative is to either ignore the cancel or
 panic, neither of which I believe to be what users will want.

OK.

 3. If synchronous_standby_names is changed to '' by editing
 postgresql.conf and issuing pg_ctl reload, then cancel all waits in
 progress and wake everybody up.  As I mentioned before, reloading the
 config file from within the waiting backend (which can't safely throw
 an error) seems risky,

AFAIR, ProcessConfigFile() doesn't throw an error, so I don't think
that's so risky. But, as you said in another thread, reading the config
file at that point is inconsistent; I agree. And it seems better to let
a background process wake up the backends.

 so what I did instead is made WAL writer
 responsible for handling this.  Nobody's allowed to wait for sync rep
 unless a global shared memory flag is set, and the WAL writer process
 is responsible for setting and clearing this flag when the config file
 is reloaded.  This has basically no performance cost; WAL writer only
 ever does any extra work at all with this code when it receives a
 SIGHUP, and even then the work is trivial except in the case where
 synchronous_standby_names has changed from empty to non-empty or vice
 versa.  The advantage of putting this in WAL writer rather than, say,
 bgwriter is that WAL writer doesn't have nearly as many jobs to do and
 they don't involve nearly as much I/O, so the chances of a long delay
 due to the process being busy are much less.
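
A minimal sketch of the gate being described (WalSndCtl and
sync_standbys_defined follow the patch's naming; the surrounding code is
simplified illustration, not the patch itself):

	/* Backend, before queueing itself to wait for sync rep: */
	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
	if (!WalSndCtl->sync_standbys_defined)
	{
		/* sync rep is administratively disabled: don't wait at all */
		LWLockRelease(SyncRepLock);
		return;
	}
	/* ...otherwise add MyProc to the wait queue, release, and wait... */

	/* WAL writer, after reloading the config file on SIGHUP: */
	bool		defined = SyncRepStandbyNames != NULL &&
						  SyncRepStandbyNames[0] != '\0';

	if (WalSndCtl->sync_standbys_defined != defined)
	{
		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
		if (!defined)
			SyncRepWakeQueue(true);		/* release everyone still waiting */
		WalSndCtl->sync_standbys_defined = defined;
		LWLockRelease(SyncRepLock);
	}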

This occurs to me: shouldn't we ensure that, in the shutdown case,
walwriter exits only after all the backends have gone? I'm not sure it's
worth thinking about, but what if synchronous_standby_names is unset and
the config file is reloaded after a smart shutdown has been requested? In
that case, the reload cannot wake up the waiting backends, since walwriter
has already exited. This behavior looks a bit inconsistent.

 4. Remove the SYNC_REP_MUST_DISCONNECT state, which actually does
 absolutely nothing right now, despite what the name would seem to
 imply.  In particular, it doesn't arrange for any sort of disconnect.
 This patch does arrange for that, but not using this mechanism.

OK.

 5. The existing code relies on being able to read MyProc->syncRepState
 without holding the lock, even while a WAL sender must be updating it
 in another process.  I'm not 100% sure this is safe on a
 multi-processor machine with weak memory ordering.  In practice, the
 chances of something going wrong here seem extremely small.  You'd
 need something like this: a WAL sender updates MyProc->syncRepState
 just after the wait timeout expires and before the latch is reset, but
 the regular backend fails to see the state change due to
 memory-ordering effects and drops through the loop, waiting another 60
 s, and then finally wakes up and completes the wait (but a minute
 later than expected).  That seems vanishingly unlikely but it's also
 simple to protect against, so I did.

In the patch, you take a lightweight lock in order to read the latest
value. But I wonder how taking a lock ensures that the value is up-to-date.

 Review appreciated.

Thanks! Here are some comments:

+ * WAL writer calls this as needed to update the shared sync_standbys_needed

Typo: s/sync_standbys_needed/sync_standbys_defined

+ * we exit normally, or SYNC_REP_MUST_DISCONNECT in abnormal cases.

Typo: the reference to SYNC_REP_MUST_DISCONNECT is not required.

+* So in this case we issue a NOTICE (which some clients may

Typo: s/NOTICE/WARNING

+   if (ProcDiePending)
+   {
+       ereport(WARNING,
+               (errcode(ERRCODE_ADMIN_SHUTDOWN),
+                errmsg("canceling the wait for replication and terminating connection due to administrator command"),
+                errdetail("The transaction has already been committed locally but might have not been replicated to the standby.")));
+       whereToSendOutput = DestNone;
+       LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+       MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
+       SHMQueueDelete(&(MyProc->syncRepLinks));


Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-17 Thread Fujii Masao
On Wed, Mar 16, 2011 at 11:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 How should recovery work when pause_at_recovery_target is
 enabled but hot standby is disabled? We have three choices:

 1. Forbid those settings, i.e., throw FATAL error. Tom dislikes this
     idea.

 No, I didn't say that.  I said not to write elog(FATAL).

Oh, sorry.

  If the
 combination is nonsensical then it's fine to forbid it, but you don't
 need FATAL for that.  In particular, attempting to change to a
 disallowed setting after system startup should not result in crashing
 the postmaster.  And it won't, if you just use the normal error level
 for complaining about an invalid GUC setting.

Sorry, I've not understood the point well yet. Should we just use
elog(ERROR) instead? But since an ERROR in the startup process is treated
as FATAL, I'm not sure it's worth using ERROR instead. Or did you mean
something else?

Only the startup process is able to notice such nonsensical settings,
since pause_at_recovery_target is a recovery.conf parameter. So I'm not
sure there is any way to forbid that other than elog(ERROR) or
elog(FATAL).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] volatile markings to silence compilers

2011-03-17 Thread Martijn van Oosterhout
On Thu, Mar 17, 2011 at 12:36:59AM -0400, Bruce Momjian wrote:
 Looking over the release notes, we have added a few 'volatile' storage
 specifications to variables which are near longjmp/TRY blocks to
 silence compilers.  I am worried that these specifications don't clearly
 identify their purpose.  Can we rename these to use a macro for
 'volatile' that will make their purpose clearer and perhaps their
 removal one day easier?

The question is, are they wrong? The longjmp manpage says:

   The values of automatic variables are unspecified after a call
   to longjmp() if they meet all the following criteria:

   ·  they are local to the function that made the corresponding
  setjmp(3) call;

   ·  their values are changed between the calls to setjmp(3) and
  longjmp(); and

   ·  they are not declared as volatile.

It appears the issue is mostly that the compiler is unable to prove
that the variables aren't changed. It's hard because the buffer created
by setjmp() doesn't expire. We know that after PG_END_TRY() the buffer
won't be used, but apparently the compiler doesn't.
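
For concreteness, here is a minimal standalone example of those criteria
(plain setjmp/longjmp rather than PG_TRY itself):

	#include <setjmp.h>

	static jmp_buf env;

	int
	example(void)
	{
		volatile int n = 0;		/* without volatile, n's value would be
								 * unspecified after the longjmp */

		if (setjmp(env) == 0)
		{
			n = 1;				/* changed between setjmp() and longjmp() */
			longjmp(env, 1);
		}

		return n;				/* well-defined only because n is volatile */
	}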

My point is, are we hopeful this problem will ever go away?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] really lazy vacuums?

2011-03-17 Thread Jesper Krogh

Robert Haas wrote:

Right.  Really-lazy vacuum could freeze tuples.  Unlike regular
vacuum, it can also sensibly be done incrementally.  One thing I was
thinking about is counting the number of times that we fetched a tuple
that was older than RecentGlobalXmin and had a committed xmin and an
invalid xmax, but where the page was not PD_ALL_VISIBLE.  If that's
happening a lot, it probably means that some vacuuming would speed
things up, by getting those PD_ALL_VISIBLE bits set.  Perhaps you
could work out some formula where you do a variable amount of
super-lazy vacuuming depending on the number of such tuple fetches.
The trick would be to avoid overdoing it (so that you swamp the I/O
system) or underdoing it (so that the system never converges).  It
would be really nice (for this and for other things) if we had some
way of measuring the I/O saturation of the system, so that we could
automatically adjust the aggressiveness of background processes
accordingly.

Note also that if and when we get index-only scans, making sure the
PD_ALL_VISIBLE bits (and thus the visibility map bits) actually get
set is going to be a lot more important.
  


Is it obvious that the visibility map bits should track complete
pages and not individual tuples? If the visibility map tracks at
page level, the benefit falls off on slim tables where you squeeze
200 tuples into each page; an update rate of just 1% lowers the
likelihood of all-visible pages even more. (It may be that index-only
scans are not as beneficial for slim tables as for wide tables.)

In connection with the vacuuming discussion, I don't know if this
exists already, but how about opportunistic vacuuming? Say a page is
being written out due to changes in one of its tuples: will the other
tuples on the page get vacuumed while it is being written out anyway?

It doesn't actually have to hook into the process directly to benefit
I/O usage; if it just gets the opportunity to do the work before the
page is evicted from the OS cache, it would save a second read of that
page. But it seems much harder to do something sane around that
assumption.

Wouldn't really lazy vacuums only benefit really static tables, where
vacuuming is not that big a problem in the first place?


--
Jesper - demonstrating a total lack of insight, I would assume.




Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-17 Thread Heikki Linnakangas

On 16.03.2011 19:35, Robert Haas wrote:

3. If synchronous_standby_names is changed to '' by editing
postgresql.conf and issuing pg_ctl reload, then cancel all waits in
progress and wake everybody up.  As I mentioned before, reloading the
config file from within the waiting backend (which can't safely throw
an error) seems risky, so what I did instead is made WAL writer
responsible for handling this.  Nobody's allowed to wait for sync rep
unless a global shared memory flag is set, and the WAL writer process
is responsible for setting and clearing this flag when the config file
is reloaded.  This has basically no performance cost; WAL writer only
ever does any extra work at all with this code when it receives a
SIGHUP, and even then the work is trivial except in the case where
synchronous_standby_names has changed from empty to non-empty or vice
versa.  The advantage of putting this in WAL writer rather than, say,
bgwriter is that WAL writer doesn't have nearly as many jobs to do and
they don't involve nearly as much I/O, so the chances of a long delay
due to the process being busy are much less.


Hmm, so setting synchronous_standby_names to '' takes effect 
immediately, but other changes to it don't apply to already-blocked 
commits. That seems a bit inconsistent. Perhaps walwriter should store 
the parsed list of standby-names in shared memory, not just a boolean.


+1 otherwise.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] really lazy vacuums?

2011-03-17 Thread Cédric Villemain
2011/3/17 Robert Haas robertmh...@gmail.com:
 On Wed, Mar 16, 2011 at 6:36 PM, Jim Nasby j...@nasby.net wrote:
 One way to look at this is that any system will have a limit on how quickly 
 it can vacuum everything. If it's having trouble dedicating enough IO to 
 vacuum, then autovac is going to have a long list of tables that it wants to 
 vacuum. When you're in that situation, you want to get to the next table 
 that needs vacuuming as quickly as possible, so if you've run through the 
 first heap scan and found only a limited number of dead tuples, it doesn't 
 make sense to spend a bunch of time scanning indexes and making a second 
 heap scan (though, IIRC the second scan doesn't hit the entire heap; it only 
 hits the tuples that were remembered as being dead).

 I mostly agree with this, but you also can't postpone vacuuming
 indefinitely just because you're too busy; that's going to blow up in
 your face.

 Of course, going along the lines of an autovac-based tuning mechanism, you 
 have to question how a table would show up for autovac if there's not 
 actually a number of dead tuples. One scenario is freezing (though I'm not 
 sure if your super-lazy vacuum could freeze tuples or not). Another is 
 inserts. That might become a big win; you might want to aggressively scan a 
 table that gets data loaded into it in order to set hint/all visible bits.

 Right.  Really-lazy vacuum could freeze tuples.  Unlike regular
 vacuum, it can also sensibly be done incrementally.  One thing I was
 thinking about is counting the number of times that we fetched a tuple
 that was older than RecentGlobalXmin and had a committed xmin and an
 invalid xmax, but where the page was not PD_ALL_VISIBLE.  If that's
 happening a lot, it probably means that some vacuuming would speed
 things up, by getting those PD_ALL_VISIBLE bits set.  Perhaps you
 could work out some formula where you do a variable amount of
 super-lazy vacuuming depending on the number of such tuple fetches.
 The trick would be to avoid overdoing it (so that you swamp the I/O
 system) or underdoing it (so that the system never converges).  It
 would be really nice (for this and for other things) if we had some
 way of measuring the I/O saturation of the system, so that we could
 automatically adjust the aggressiveness of background processes


Yes. I am thinking of something like that (the I/O saturation
measurement) to let the background writer try to work on hint bits when
it does not have much else to do, if I/O resources are OK.


 Note also that if and when we get index-only scans, making sure the
 PD_ALL_VISIBLE bits (and thus the visibility map bits) actually get
 set is going to be a lot more important.

 From a manual standpoint, ISTM that super-lazy vac would be extremely useful 
 for dealing with hint bits after a bulk insert to a table that also has some 
 update activity. Using a regular vacuum in that case would result in a lot 
 of extra work to deal with the small number of dead tuples.

 I can see that.

 Perhaps it would be useful to write a script that analyzed the output of 
 vacuum verbose looking for tables where a super-lazy vacuum would have made 
 sense (assuming vacuum verbose provides the needed info). If we had such a 
 script we could ask folks to run it and see how much super-lazy vacuuming 
 would help in the real world.

 I'm a bit doubtful about this part.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 2:08 AM, Fujii Masao masao.fu...@gmail.com wrote:
 This occurs to me: shouldn't we ensure that, in the shutdown case,
 walwriter exits only after all the backends have gone? I'm not sure it's
 worth thinking about, but what if synchronous_standby_names is unset and
 the config file is reloaded after a smart shutdown has been requested? In
 that case, the reload cannot wake up the waiting backends, since walwriter
 has already exited. This behavior looks a bit inconsistent.

I agree we need to fix smart shutdown.  I think that's going to have
to be a separate patch, however; it's got more problems than this
patch can fix without expanding into a monster.

 In the patch, you take a lightweight lock in order to read the latest
 value. But I wonder how taking a lock ensures that the value is up-to-date.

A lock acquisition acts as a memory sequence point.
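
That is, a sketch of the pattern (the barrier implied by the lock
acquisition is what makes the freshly read value trustworthy):

	LWLockAcquire(SyncRepLock, LW_SHARED);
	syncRepState = MyProc->syncRepState;	/* cannot be a stale cached
											 * copy, because acquiring the
											 * lock acts as a memory barrier */
	LWLockRelease(SyncRepLock);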

 + * WAL writer calls this as needed to update the shared sync_standbys_needed

 Typo: s/sync_standbys_needed/sync_standbys_defined

Fixed.

 + * we exit normally, or SYNC_REP_MUST_DISCONNECT in abnormal cases.

 Typo: the reference to SYNC_REP_MUST_DISCONNECT is not required.

Fixed.

 +                * So in this case we issue a NOTICE (which some clients may

 Typo: s/NOTICE/WARNING

Fixed.

 +               if (ProcDiePending)
 +               {
 +                       ereport(WARNING,
 +                                       (errcode(ERRCODE_ADMIN_SHUTDOWN),
 +                                        errmsg("canceling the wait for replication and terminating connection due to administrator command"),
 +                                        errdetail("The transaction has already been committed locally but might have not been replicated to the standby.")));
 +                       whereToSendOutput = DestNone;
 +                       LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 +                       MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
 +                       SHMQueueDelete(&(MyProc->syncRepLinks));

 SHMQueueIsDetached() should be checked before calling SHMQueueDelete().
 Walsender may have already deleted the backend from the queue before we reach here.

Fixed.  But come to think of it, doesn't this mean
SyncRepCleanupAtProcExit() needs to repeat the test after acquiring
the lock?

 +               if (QueryCancelPending)
 +               {
 +                       QueryCancelPending = false;
 +                       ereport(WARNING,
 +                                       (errmsg("canceling wait for synchronous replication due to user request"),
 +                                        errdetail("The transaction has committed locally, but may not have replicated to the standby.")));
 +                       LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 +                       MyProc->syncRepState = SYNC_REP_WAIT_COMPLETE;
 +                       SHMQueueDelete(&(MyProc->syncRepLinks));
 +                       LWLockRelease(SyncRepLock);

 Same as above.

Fixed.

 +               if (!PostmasterIsAlive(true))
 +               {
 +                       whereToSendOutput = DestNone;
 +                       proc_exit(1);

 proc_exit() should not be called at that point because it leads to a PANIC.

Fixed, although I'm not sure it matters.

 I think that it's better to check ProcDiePending, QueryCancelPending
 and PostmasterIsAlive *before* waiting on the latch, not after. Because
 those events can occur before reaching there, and it's not worth waiting
 for 60 seconds to detect them.

Not necessary.  Inspired by one of your earlier patches, I made die()
and StatementCancelHandler() set the latch.  We could still wait up to
60 s to detect postmaster death, but that's a very rare situation and
it's not worth slowing down the common case for it, especially since
there is a race condition no matter what.

Thanks for the review!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


sync-rep-wait-fixes-v2.patch
Description: Binary data



Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 8:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Hmm, so setting synchronous_standby_names to '' takes effect immediately,
 but other changes to it don't apply to already-blocked commits. That seems a
 bit inconsistent. Perhaps walwriter should store the parsed list of
 standby-names in shared memory, not just a boolean.

I don't think this is necessary.  In general, the current or potential
WAL sender processes are responsible for working out among themselves
whose job it is to release waiters, and doing it.  As long as
synchronous_standby_names is non-empty, then either (1) there are one
or more standbys connected that can take on the role of synchronous
standby, and whoever does will release waiters or (2) there are no
standbys connected that can take on the role of synchronous standbys,
in which case no waiters should be released until one connects.  But
when synchronous_standby_names becomes completely empty, that doesn't
mean "wait until a standby connects whose application name is in the
empty set and make him the synchronous standby" but rather
"synchronous replication is administratively disabled; don't wait in
the first place".  So we just need a Boolean flag.

 +1 otherwise.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Alvaro Herrera
Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:
 Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
 Advanced Server.
 Thanks Bruce for the clarification.
 
 For the 1-digit, 2-digit & 3-digit year inputs, as I said, I didn't see any
 documentation in PG explaining what century is assumed when it is not
 given. If I missed it somewhere, please let me know.

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges.  Some of it has been fixed,
but a lot of bugs remain.  Looks like this is one of those places and it
seems appropriate to spend some time fixing it.  Since it would involve
a behavior change, it should only go to 9.2, of course.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] really lazy vacuums?

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 4:17 AM, Jesper Krogh jes...@krogh.cc wrote:
 Is it obvious that the visibility map bits should track complete
 pages and not individual tuples? If the visibility map tracks at
 page level, the benefit falls off on slim tables where you squeeze
 200 tuples into each page; an update rate of just 1% lowers the
 likelihood of all-visible pages even more. (It may be that index-only
 scans are not as beneficial for slim tables as for wide tables.)

I'm not sure exactly what MaxHeapTuplesPerPage works out to be, but
say it's 200.  If you track visibility info per tuple rather than per
page, then the size of the visibility map is going to expand by a
factor of 200.  That might decrease contention, but otherwise it's a
bad thing - the whole point of having the visibility map in the first
place is that it's much, much smaller than the heap.  If it were the
same size as the heap, we could just read the heap.  What the map
attempts to accomplish is to allow us, by reading a small number of
pages, to check whether the tuples we're thinking of reading are
likely to be all-visible without actually looking at them.
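
To put rough numbers on that (assuming the default 8 KB block size): the
map stores one bit per heap page, so a 1 GB heap (131,072 pages) needs
only 16 KB of visibility map, a 65,536:1 reduction.  Tracking 200 tuples
per page individually would inflate that to about 3.2 MB.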

 In connection with the vacuuming discussion, I don't know if this
 exists already, but how about opportunistic vacuuming? Say a page is
 being written out due to changes in one of its tuples: will the other
 tuples on the page get vacuumed while it is being written out anyway?

The really lazy kind of vacuuming I'm talking about could be done this
way.  Regular vacuuming cannot, because you can't actually prune a
dead tuple until you've scanned all the indexes for references to that
CTID.  The obvious place to do this would be the background writer: if
the page is dirty anyway and we're about to evict it, we could decide
to (1) set hint bits, (2) set visibility map bits, (3) freeze tuples
that need freezing, and (4) identify dead tuples and reclaim the space
they use, but not the associated line pointer.
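
In rough pseudocode, that idea might look like this (the helper names are
invented for illustration; they are not actual PostgreSQL functions):

	/* hypothetical eviction-time cleanup in the background writer */
	if (BufferIsDirty(buf))		/* piggyback only on writes we must do anyway */
	{
		SetHintBits(page);					/* (1) */
		SetVisibilityMapBit(rel, blkno);	/* (2) */
		FreezeOldTuples(page);				/* (3) */
		ReclaimDeadTupleSpace(page);		/* (4): keep the line pointers,
											 * since indexes may still
											 * reference their CTIDs */
	}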

Sadly, I'm not sure this would help much.  If we have, say, a 4MB
relation, you're not even going to notice it when vacuum comes along
and does its thing.  Even a vacuum freeze is chump change.  The
problem is with big relations, like say 1GB+.  Processing the whole
table at once can have a material impact on system performance, so
it'd be nice to do some work incrementally.  But it's likely that
doing it opportunistically as you evict things from shared buffers is
only going to help here and there.  Even if you optimistically assumed
that we could opportunistically do 10% of the vacuuming that way,
that's still not much of a dent.  And I think it'd probably be less
than that in most real-world cases.  A further problem is that the
background writer is already a pretty busy process, and giving it more
things to do isn't very appealing from the standpoint of overall
system performance.

 It doesn't actually have to hook into the process directly to benefit
 I/O usage; if it just gets the opportunity to do the work before the
 page is evicted from the OS cache, it would save a second read of that
 page. But it seems much harder to do something sane around that
 assumption.

Yeah.

 Wouldn't really lazy vacuums only benefit really static tables, where
 vacuuming is not that big a problem in the first place?

I think the benefit would be tables that are either quite large (where
the ability to do this incrementally would be an advantage over
regular VACUUM) or insert-only (where we currently have no way to get
PD_ALL_VISIBLE bits set without user intervention).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] volatile markings to silence compilers

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 12:36 AM, Bruce Momjian br...@momjian.us wrote:
 Looking over the release notes, we have added a few 'volatile' storage
 specifications to variables which are near longjmp/TRY blocks to
 silence compilers.  I am worried that these specifications don't clearly
 identify their purpose.  Can we rename these to use a macro for
 'volatile' that will make their purpose clearer and perhaps their
 removal one day easier?

I don't particularly see the point of this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:
 Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
 Advanced Server.
 Thanks Bruce for the clarification.

 For the 1-digit, 2-digit & 3-digit year inputs, as I said, I didn't see any
 documentation in PG explaining what century is assumed when it is not
 given. If I missed it somewhere, please let me know.

 Keep in mind that the datetime stuff was abandoned by the maintainer
 some years ago with quite some rough edges.  Some of it has been fixed,
 but a lot of bugs remain.  Looks like this is one of those places and it
 seems appropriate to spend some time fixing it.  Since it would involve
 a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits > # of Ys in
9.1, if the fix is simple and clear-cut.  I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday.  It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Keep in mind that the datetime stuff was abandoned by the maintainer
 some years ago with quite some rough edges.  Some of it has been fixed,
 but a lot of bugs remain.  Looks like this is one of those places and it
 seems appropriate to spend some time fixing it.  Since it would involve
 a behavior change, it should only go to 9.2, of course.

 I wouldn't object to fixing the problem with # of digits > # of Ys in
 9.1, if the fix is simple and clear-cut.  I think we are still
 accepting patches to make minor tweaks, like the tab-completion patch
 I committed yesterday.  It also doesn't bother me tremendously if we
 push it off, but I don't think that anyone's going to be too sad if
 TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
 sensible than 3010-01-01.

Agreed, it's certainly not too late for bug fixes in 9.1.  I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.

Having said that, it's not entirely clear to me what sane behavior is
here.  Personally I would expect that an n-Ys format spec would consume
at most n digits from the input.  Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?  So
I think the problem is actually upstream of the behavior complained of
here.  However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.

regards, tom lane



Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Alvaro Herrera
Excerpts from Robert Haas's message of jue mar 17 11:09:56 -0300 2011:
 On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

  Keep in mind that the datetime stuff was abandoned by the maintainer
  some years ago with quite some rough edges.  Some of it has been fixed,
  but a lot of bugs remain.  Looks like this is one of those places and it
  seems appropriate to spend some time fixing it.  Since it would involve
  a behavior change, it should only go to 9.2, of course.
 
 I wouldn't object to fixing the problem with # of digits > # of Ys in
 9.1, if the fix is simple and clear-cut.  I think we are still
 accepting patches to make minor tweaks, like the tab-completion patch
 I committed yesterday.  It also doesn't bother me tremendously if we
 push it off, but I don't think that anyone's going to be too sad if
 TO_DATE('01-jan-2010',  'DD-MON-YYY') starts returning something more
 sensible than 3010-01-01.

If it can be delivered quickly and it is simple, sure.  But anything
more involved should respect the release schedule.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Rectifying wrong Date outputs

2011-03-17 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 what we should first do is see what Oracle does in such cases,
 because the main driving factor for these functions is Oracle
 compatibility not what might seem sane in a vacuum.
 
+1
 
-Kevin



Re: [HACKERS] volatile markings to silence compilers

2011-03-17 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 It appears the issue is mostly that the compiler is unable to prove
 that the variables aren't changed.

IME, older versions of gcc will warn about any variable that's assigned
more than once, even if those assignments are before the setjmp call.
Presumably this is stricter than necessary, but I don't know enough
details of gcc's register usage to be sure.

 My point is, are we hopeful this problem will ever go away?

Since we're trying to silence the warning in existing and even obsolete
compilers, whether it gets improved in future compilers is not terribly
relevant.

regards, tom lane



[HACKERS] I am confused after reading codes of PostgreSQL three week

2011-03-17 Thread hom
Hi,

  I am trying to learn how a database is implemented, and I have been
reading the PG source code for a month.

Now, I only know a little about how PG works.  :(

I just know that PG works like this, but I don't know why it works like this.  :(  :(

Even worse, I feel I can't really understand the source code. It may be
that I couldn't split the large modules into small pieces, which might
have helped me understand them.

Is there any article or other resource that could help me understand the
source code?

Thanks for help ~

-- 
Best Wishes!

                                     hom



Re: [HACKERS] Re: [BUGS] BUG #5842: Memory leak in PL/Python when taking slices of results

2011-03-17 Thread Marko Kreen
On Thu, Mar 17, 2011 at 4:40 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Mar 11, 2011 at 6:02 AM, Bruce Momjian br...@momjian.us wrote:
 What has been done with this report/fix?

 AFAIK, nothing.  Added to 9.1 open items list.

The patch seems to do the right thing.

-- 
marko



Re: [HACKERS] Re: [BUGS] BUG #5842: Memory leak in PL/Python when taking slices of results

2011-03-17 Thread Alvaro Herrera
Excerpts from Marko Kreen's message of jue mar 17 12:01:13 -0300 2011:
 On Thu, Mar 17, 2011 at 4:40 AM, Robert Haas robertmh...@gmail.com wrote:
  On Fri, Mar 11, 2011 at 6:02 AM, Bruce Momjian br...@momjian.us wrote:
  What has been done with this report/fix?
 
  AFAIK, nothing.  Added to 9.1 open items list.
 
 The patch seems to do the right thing.

Looking into this.  AFAICT this needs to be backported.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] I am confused after reading codes of PostgreSQL three week

2011-03-17 Thread Bruce Momjian
hom wrote:
 Hi,
 
    I am trying to learn how a database is implemented, and I have been
  reading the PG source code for a month.
  
  Now, I only know a little about how PG works.  :(
  
  I just know that PG works like this, but I don't know why it works like this.  :(  :(
  
  Even worse, I feel I can't really understand the source code. It may be
  that I couldn't split the large modules into small pieces, which might
  have helped me understand them.
  
  Is there any article or other resource that could help me understand the source code?

I assume you have looked at these places:

http://wiki.postgresql.org/wiki/Developer_FAQ
http://www.postgresql.org/developer/coding

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] I am confused after reading codes of PostgreSQL three week

2011-03-17 Thread Kevin Grittner
hom obsidian...@gmail.com wrote:
 
 I am trying to learn how a database is implemented and I have been
 reading the PG source code for a month.
 
That's ambitious.
 
find -name '*.h' -or -name '*.c' \
  | egrep -v '^\./src/test/.+/tmp_check/' \
  | xargs cat | wc -l
1059144
 
Depending on how you do the math, that's about 50,000 lines of code
per day to get through it in the time you mention.
 
 Is there any article or other resource that could help me understand
 the source code?
 
Your best bet would be to follow links from the Developers tab on
the main PostgreSQL web site:
 
http://www.postgresql.org/developer/
 
In particular the Developer FAQ page:
 
http://wiki.postgresql.org/wiki/Developer_FAQ
 
And the Coding links:
 
http://www.postgresql.org/developer/coding
 
may help.
 
Before reading code in a directory, be sure to read any README
file(s) in that directory carefully.
 
It helps to read this list.
 
In spite of reviewing all of that myself, it was rather intimidating
when I went to work on a major patch 14 months ago.  Robert Haas
offered some good advice which served me well in that effort --
divide the effort into a series of incremental steps, each of which
deals with a small enough portion of the code to get your head
around.  As you work in any one narrow area, it becomes increasingly
clear; with that as a base you can expand your scope.
 
When you're working in the code, it is tremendously helpful to use
an editor with ctags support (or similar IDE functionality).
 
I hope this is helpful.  Good luck.
 
-Kevin



Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 2:47 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 16, 2011 at 11:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 How should recovery work when pause_at_recovery_target is
 enabled but hot standby is disabled? We have three choices:

 1. Forbid those settings, i.e., throw FATAL error. Tom dislikes this
      idea.

 No, I didn't say that.  I said not to write elog(FATAL).

 Oh, sorry.

  If the
 combination is nonsensical then it's fine to forbid it, but you don't
 need FATAL for that.  In particular, attempting to change to a
 disallowed setting after system startup should not result in crashing
 the postmaster.  And it won't, if you just use the normal error level
 for complaining about an invalid GUC setting.

 Sorry, I've not understood the point well yet. Should we just use
 elog(ERROR) instead? But since an ERROR in the startup process is treated
 as FATAL, I'm not sure it's worth using ERROR instead. Or did you mean
 something else?

Yeah, I think he's saying that an ERROR in the startup process is
better than a FATAL, even though the effect is the same.

On the substantive issue, I don't think we have any consensus that
forbidding this combination of parameters is the right thing to do
anyway.  Both Simon and I voted against that, and Tom's point has to
do only with style.  Similarly, I voted for flipping the default for
pause_at_recovery_target to off, rather than on, but no one else has
bought into that suggestion either.  Unless we get some more votes in
favor of doing one of those things, I think we should focus on the
actual must-fix issue here, which is properly documenting the way it
works now (i.e. adding the parameter to recovery.conf.sample with
appropriate documentation of the current behavior).

One thing I'm not quite clear on is what happens if we reach the
recovery target before we reach consistency, i.e. create a restore
point, flush xlog, shut down abnormally, then try to recover to the
named restore point.  Is there any possibility that we can end up paused
before Hot Standby has actually started up?  Because that would be
fairly useless and annoying.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-17 Thread Robert Haas
On Fri, Mar 11, 2011 at 5:46 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Mar 11, 2011 at 5:04 AM, Robert Haas robertmh...@gmail.com wrote:
        if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
            SyncRepRequested())

 Whenever synchronous_replication is TRUE, we disable synchronous_commit.
 But, before disabling that, shouldn't we also check max_wal_senders and
 synchronous_standby_names? Otherwise, synchronous_commit can
 be disabled unexpectedly even in the non-replication case.

 Yeah, that's bad.  At the risk of repeating myself, I don't think this
 code should be checking SyncRepRequested() in the first place.  If the
 user has turned off synchronous_commit, then we should just commit
 asynchronously, even if sync rep is otherwise in force.  Otherwise,
 this if statement is going to get really complicated.   The logic is
 already at least mildly wrong here anyway: clearly we do NOT need to
 commit synchronously if the transaction has not written xlog, even if
 sync rep is enabled.

 Yeah, not waiting for replication when synchronous_commit is disabled
 seems more reasonable.

On further review, I've changed my mind.  Making synchronous_commit
trump synchronous_replication is appealing conceptually, but it's
going to lead to some weird corner cases.  For example, a transaction
that drops a non-temporary relation always commits synchronously; and
2PC also ignores synchronous_commit.  In the case where
synchronous_commit=off and synchronous_replication=on, we'd either
have to decide that these sorts of transactions aren't going to
replicate synchronously (which would give synchronous_commit a rather
long reach into areas it doesn't currently touch) or else that it's OK
for CREATE TABLE foo () to be totally asynchronous but that DROP TABLE
foo requires sync commit AND sync rep.  That's pretty weird.

What makes more sense to me after having thought about this more
carefully is to simply make a blanket rule that when
synchronous_replication=on, synchronous_commit has no effect.  That is
easy to understand and document.  I'm inclined to think it's OK to let
synchronous_replication have this effect even if max_wal_senders=0 or
synchronous_standby_names=''; you shouldn't turn
synchronous_replication on just for kicks, and I don't think we want
to complicate the test in RecordTransactionCommit() more than
necessary.  We should, however, adjust the logic so that a transaction
which has not written WAL can still commit asynchronously, because
such a transaction has only touched temp or unlogged tables and so
it's not important for it to make it to the standby, where that data
doesn't exist anyway.
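
In terms of the test quoted above, that amounts to something like (a
sketch of the proposed logic, not committed code):

	if ((wrote_xlog && (XactSyncCommit || SyncRepRequested())) ||
		forceSyncCommit || nrels > 0)
	{
		/* flush WAL synchronously; also wait for sync rep if requested */
	}
	else
	{
		/* asynchronous commit: no WAL written (only temp/unlogged tables
		 * touched), or async commit allowed */
	}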

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-17 Thread Robert Haas
On Thu, Mar 10, 2011 at 3:04 PM, Robert Haas robertmh...@gmail.com wrote:
 -                       /* Let the master know that we received some data. */
 -                       XLogWalRcvSendReply();
 -                       XLogWalRcvSendHSFeedback();

 This change completely eliminates the difference between write_location
 and flush_location in pg_stat_replication. If this change is reasonable, we
 should get rid of write_location from pg_stat_replication since it's useless.
 If not, this change should be reverted. I'm not sure whether monitoring
 the difference between write and flush locations is useful. But I guess that
 someone thought so and that code was added.

 I could go either way on this but clearly we need to do one or the other.

I'm not really sure why this was part of the synchronous replication
patch, but after mulling it over I think it's probably right to rip
out write_location completely.  There shouldn't ordinarily be much of
a gap between write location and flush location, so it's probably not
worth the extra network overhead to keep track of it.  We might need
to re-add some form of this in the future if we have a version of
synchronous replication that only waits for confirmation of receipt
rather than for confirmation of flush, but we don't have that in 9.1,
so why bother?

Barring objections, I'll go do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Re: [COMMITTERS] pgsql: Fix various possible problems with synchronous replication.

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 1:24 PM, Thom Brown t...@linux.com wrote:
 errmsg("canceling the wait for replication and terminating connection
 due to administrator command")
 errmsg("canceling wait for synchronous replication due to user request")

 Should that first one then also say synchronous replication?

I could go either way.  Clearly if it's asynchronous replication, we
wouldn't be waiting.  But you're certainly right that we should be
consistent.

 errdetail("The transaction has already been committed locally but
 might have not been replicated to the standby.")));
 errdetail("The transaction has committed locally, but may not have
 replicated to the standby.")));

 Could we have these saying precisely the same thing?

Yeah.  Which is better?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-17 Thread Simon Riggs
On Thu, 2011-03-17 at 13:46 -0400, Robert Haas wrote:
 On Fri, Mar 11, 2011 at 5:46 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Fri, Mar 11, 2011 at 5:04 AM, Robert Haas robertmh...@gmail.com wrote:
 if ((wrote_xlog && XactSyncCommit) || forceSyncCommit || nrels > 0 ||
  SyncRepRequested())
 
  Whenever synchronous_replication is TRUE, we disable synchronous_commit.
  But, before disabling that, shouldn't we also check max_wal_senders and
  synchronous_standby_names? Otherwise, synchronous_commit can
  be disabled unexpectedly even in the non-replication case.
 
  Yeah, that's bad.  At the risk of repeating myself, I don't think this
  code should be checking SyncRepRequested() in the first place.  If the
  user has turned off synchronous_commit, then we should just commit
  asynchronously, even if sync rep is otherwise in force.  Otherwise,
  this if statement is going to get really complicated.   The logic is
  already at least mildly wrong here anyway: clearly we do NOT need to
  commit synchronously if the transaction has not written xlog, even if
  sync rep is enabled.
 
  Yeah, not waiting for replication when synchronous_commit is disabled
  seems more reasonable.
 
 On further review, I've changed my mind.  Making synchronous_commit
 trump synchronous_replication is appealing conceptually, but it's
 going to lead to some weird corner cases.  For example, a transaction
 that drops a non-temporary relation always commits synchronously; and
 2PC also ignores synchronous_commit.  In the case where
 synchronous_commit=off and synchronous_replication=on, we'd either
 have to decide that these sorts of transactions aren't going to
 replicate synchronously (which would give synchronous_commit a rather
 long reach into areas it doesn't currently touch) or else that it's OK
 for CREATE TABLE foo () to be totally asynchronous but that DROP TABLE
 foo requires sync commit AND sync rep.  That's pretty weird.
 
 What makes more sense to me after having thought about this more
 carefully is to simply make a blanket rule that when
 synchronous_replication=on, synchronous_commit has no effect.  That is
 easy to understand and document.  I'm inclined to think it's OK to let
 synchronous_replication have this effect even if max_wal_senders=0 or
 synchronous_standby_names=''; you shouldn't turn
 synchronous_replication on just for kicks, and I don't think we want
 to complicate the test in RecordTransactionCommit() more than
 necessary.  We should, however, adjust the logic so that a transaction
 which has not written WAL can still commit asynchronously, because
 such a transaction has only touched temp or unlogged tables and so
 it's not important for it to make it to the standby, where that data
 doesn't exist anyway.

Agree to that.

Not read your other stuff yet, will do that later.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




[HACKERS] Re: [COMMITTERS] pgsql: Fix various possible problems with synchronous replication.

2011-03-17 Thread Thom Brown
On 17 March 2011 17:55, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Mar 17, 2011 at 1:24 PM, Thom Brown t...@linux.com wrote:
  errdetail("The transaction has already been committed locally but
  might have not been replicated to the standby.")));
  errdetail("The transaction has committed locally, but may not have
  replicated to the standby.")));

 Could we have these saying precisely the same thing?

 Yeah.  Which is better?

Personally I prefer the 2nd.  It reads better somehow.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why is max standby delay only 35 minutes?

2011-03-17 Thread Peter Eisentraut
On tis, 2011-03-15 at 20:44 +0200, Peter Eisentraut wrote:
 Consequently, I propose the attached patch.  I didn't find any
 relevant documentation references that would need updating.

Applied.




Re: [HACKERS] Flex output missing from 9.1a4 tarballs?

2011-03-17 Thread Robert Haas
2011/3/16 Devrim GÜNDÜZ dev...@gunduz.org:
 On Tue, 2011-03-15 at 22:00 -0400, Tom Lane wrote:

  My only hesitation about this is that it seems like sync rep and
  collation support are both still pretty broken.  Should we just not
  worry about that for alpha?

 FWIW, collations are probably still several days away from being
 noticeably less broken than they were in alpha4.  I have mixed feelings
 about whether an alpha5 right now is useful.

 Fair enough. Looks like we will skip next week, too.

Yeah, probably best.  I think we're making good progress, though, so I
propose we wrap alpha5 on Monday, March 28.  I don't think we'll be
all the way there yet by then, but I think we will have made enough
progress that it makes sense to get another snapshot out the door and
into the hands of testers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Allowing multiple concurrent base backups

2011-03-17 Thread Robert Haas
On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Hmm, good point. It's harmless, but creating the history file in the first
 place sure seems like a waste of time.

 The attached patch changes pg_stop_backup so that it doesn't create
 the backup history file if archiving is not enabled.

 When I tested the multiple backups, I found that they can have the same
 checkpoint location and the same history file name.

 
 $ for ((i=0; i<4; i++)); do
 pg_basebackup -D test$i -c fast -x -l test$i &
 done

 $ cat test0/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test0

 $ cat test1/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test1

 $ cat test2/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test2

 $ cat test3/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test3

 $ ls archive/*.backup
 archive/00010002.00B0.backup
 

 This would cause a serious problem, because the backup-end record
 indicating the same START WAL LOCATION can be written by the first
 backup before the others finish. So we might wrongly think that we've
 already reached a consistent state by reading the backup-end record
 (written by the first backup) before reading the last required WAL
 file.

 		/*
 		 * Force a CHECKPOINT.  Aside from being necessary to prevent torn
 		 * page problems, this guarantees that two successive backup runs will
 		 * have different checkpoint positions and hence different history
 		 * file names, even if nothing happened in between.
 		 *
 		 * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
 		 * fast = true).  Otherwise this can take awhile.
 		 */
 		RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
 						  (fast ? CHECKPOINT_IMMEDIATE : 0));

 This problem happens because the above code (in do_pg_start_backup)
 actually doesn't ensure that concurrent backups have different
 checkpoint locations. ISTM that we should change the above, or something
 elsewhere, to ensure that. Or we should include the backup label name in
 the backup-end record, to prevent a recovery from reading a backup-end
 record that is not its own.

 Thought?

This patch is on the 9.1 open items list, but I don't understand it
well enough to know whether it's correct.  Can someone else pick it
up?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] 2nd Level Buffer Cache

2011-03-17 Thread Radosław Smogura
Hi,

I have implemented an initial concept of a 2nd level cache. The idea is
to keep some segments of shared memory for special buffers (e.g.
indices), to prevent them from being overwritten by other operations. I
added this functionality to the nbtree index scan.

I tested this by doing an index scan, a sequential read, dropping the
system buffers, and then doing the index scan again. In a few places I
saw performance improvements, but honestly I'm not sure whether those
were just noise or real improvement.

There are a few places left to optimize in the code, and the patch
needs a lot of work, but could you look at it and give opinions?

Regards,
Radek
diff --git a/.gitignore b/.gitignore
index 3f11f2e..6542e35 100644
--- a/.gitignore
+++ b/.gitignore
@@ -22,3 +22,4 @@ lcov.info
 /GNUmakefile
 /config.log
 /config.status
+/nbproject/private/
\ No newline at end of file
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 2796445..0229f5a 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -508,7 +508,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 	if (blkno != P_NEW)
 	{
 		/* Read an existing block of the relation */
-		buf = ReadBuffer(rel, blkno);
+		buf = ReadBufferLevel(rel, blkno, BUFFER_LEVEL_2ND);
 		LockBuffer(buf, access);
 		_bt_checkpage(rel, buf);
 	}
@@ -548,7 +548,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 			blkno = GetFreeIndexPage(rel);
 			if (blkno == InvalidBlockNumber)
 				break;
-			buf = ReadBuffer(rel, blkno);
+			buf = ReadBufferLevel(rel, blkno, BUFFER_LEVEL_2ND);
 			if (ConditionalLockBuffer(buf))
 			{
 				page = BufferGetPage(buf);
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index dadb49d..2922711 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -22,6 +22,7 @@ BufferDesc *BufferDescriptors;
 char	   *BufferBlocks;
 int32	   *PrivateRefCount;
 
+BufferLevelDesc *bufferLevels;
 
 /*
  * Data Structures:
@@ -72,8 +73,7 @@ int32	   *PrivateRefCount;
 void
 InitBufferPool(void)
 {
-	bool		foundBufs,
-				foundDescs;
+	bool		foundBufs, foundDescs, foundBufferLevels = false;
 
 	BufferDescriptors = (BufferDesc *)
 		ShmemInitStruct("Buffer Descriptors",
@@ -83,19 +83,38 @@ InitBufferPool(void)
 		ShmemInitStruct("Buffer Blocks",
 		NBuffers * (Size) BLCKSZ, &foundBufs);
 
-	if (foundDescs || foundBufs)
+	bufferLevels = (BufferLevelDesc *)
+		ShmemInitStruct("Buffer Levels Descriptors Table",
+						sizeof(BufferLevelDesc) * BUFFER_LEVEL_SIZE,
+						&foundBufferLevels);
+	if (foundDescs || foundBufs || foundBufferLevels)
 	{
 		/* both should be present or neither */
-		Assert(foundDescs && foundBufs);
+		Assert(foundDescs && foundBufs && foundBufferLevels);
 		/* note: this path is only taken in EXEC_BACKEND case */
 	}
 	else
 	{
 		BufferDesc *buf;
+		BufferLevelDesc *bufferLevelDesc;
+
 		int			i;
-
+
 		buf = BufferDescriptors;
 
+		/* Initialize buffer levels. */
+		/* 1st level - default */
+		bufferLevelDesc = bufferLevels;
+		bufferLevelDesc->index = 0;
+		bufferLevelDesc->super = BUFFER_LEVEL_END_OF_LIST;
+		bufferLevelDesc->lower = BUFFER_LEVEL_END_OF_LIST;
+
+		/* 2nd level - for indices */
+		bufferLevelDesc++;
+		bufferLevelDesc->index = 1;
+		bufferLevelDesc->super = BUFFER_LEVEL_END_OF_LIST;
+		bufferLevelDesc->lower = 0;
+
 		/*
 		 * Initialize all the buffer headers.
 		 */
@@ -117,6 +136,10 @@ InitBufferPool(void)
 			 */
 			buf->freeNext = i + 1;
 
+			/* Assign buffer level.  TODO: currently hardcoded. */
+			buf->buf_level = (0.3 * NBuffers < i) ? BUFFER_LEVEL_DEFAULT : BUFFER_LEVEL_2ND;
+
 			buf->io_in_progress_lock = LWLockAssign();
 			buf->content_lock = LWLockAssign();
 		}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1f89e52..867bae0 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -47,7 +47,8 @@
 #include "storage/standby.h"
 #include "utils/rel.h"
 #include "utils/resowner.h"
-
+#include "catalog/pg_type.h"
+#include "funcapi.h"
 
 /* Note: these two macros only work on shared buffers, not local ones! */
 #define BufHdrGetBlock(bufHdr)	((Block) (BufferBlocks + ((Size) (bufHdr)->buf_id) * BLCKSZ))
@@ -85,7 +86,7 @@ static volatile BufferDesc *PinCountWaitBuf = NULL;
 static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
   ForkNumber forkNum, BlockNumber blockNum,
   ReadBufferMode mode, BufferAccessStrategy strategy,
-  bool *hit);
+  bool *hit, BufferLevel bufferLevel);
 static bool PinBuffer(volatile BufferDesc *buf, 
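
The patch excerpt is truncated above, but the core eviction idea can be
sketched. This is a guess at the intent under the patch's buf_level
tagging, not the patch's actual code: ClockSweepNext() is a
hypothetical stand-in for the real victim-selection loop, and locking,
pinning, and free-list handling are omitted.

/*
 * Sketch: level-aware clock sweep.  Buffers tagged for another level
 * are never chosen as victims, so index pages (BUFFER_LEVEL_2ND) can
 * only be evicted by other index reads.
 */
static volatile BufferDesc *
StrategyGetBufferForLevel(BufferLevel level)
{
	for (;;)
	{
		volatile BufferDesc *buf = ClockSweepNext();	/* hypothetical */

		if (buf->buf_level != level)
			continue;			/* reserved for another level: skip */

		if (buf->refcount == 0 && buf->usage_count == 0)
			return buf;			/* evictable: reuse this buffer */

		if (buf->usage_count > 0)
			buf->usage_count--;	/* standard clock-sweep aging */
	}
}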

Re: [HACKERS] really lazy vacuums?

2011-03-17 Thread Jesper Krogh

On 2011-03-17 15:02, Robert Haas wrote:

On Thu, Mar 17, 2011 at 4:17 AM, Jesper Krogh jes...@krogh.cc wrote:

 Is it obvious that the visibility map bits should track complete
 pages and not individual tuples? If the visibility map tracks at
 page level, the benefit would drop on slim tables, where you squeeze
 200 tuples into each page and an update rate of 1% lowers the
 likelihood even more. (It may be that for slim tables index-only
 scans are not as beneficial as for wide tables.)

I'm not sure exactly what MaxHeapTuplesPerPage works out to be, but
say it's 200.  If you track visibility info per tuple rather than per
page, then the size of the visibility map is going to expand by a
factor of 200.  That might decrease contention, but otherwise it's a
bad thing - the whole point of having the visibility map in the first
place is that it's much, much smaller than the heap.  If it were the
same size as the heap, we could just read the heap.  What the map
attempts to accomplish is to allow us, by reading a small number of
pages, to check whether the tuples we're thinking of reading are
likely to be all-visible without actually looking at them.

Yes, that was sort of the math I was trying to make. I do believe,
though, that you have a much better feeling for it than I do. But
according to this:
http://wiki.postgresql.org/wiki/FAQ#How_much_database_disk_space_is_required_to_store_data_from_a_typical_text_file.3F

the bulk row overhead is around 24 bytes, which with 1 bit per row
gives a size reduction of 1:(24x8) ~ 1:192, worst case... that gives
at best 341 tuples/page (where each tuple contains no data at all).
With that ratio, the visibility map of a 10GB relation would fill
52MB on disk (still worst case), and that by itself would by all
means be awesome. (With tuples that small, a 10GB relation would
have around 420 million tuples.)

With 1 bit per page, the best case would be 341 times better than the
above, reducing the size of the visibility map on a 10GB table to
around 152KB, which is extremely small (and thus also awesome). But
the consequence of a single update would be that you lose the
visibility map benefit on 341 tuples in one shot.

The worst case is where we approach 4 tuples per page, just before we
hit TOAST, where the space reduction for 1 bit per tuple would be
1:(2048x8) ~ 1:16384, and 1 bit per page is 4 times better than that.
With 1 bit per tuple, the visibility map of a 10GB relation would be
around 610KB; 1 bit per page would drop it to around 160KB.
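
As a sanity check on these figures, a short back-of-the-envelope
program, assuming an 8 kB page, 24-byte minimal tuples, a ~2 kB TOAST
threshold, and a 10 GB heap; it only restates the estimates above,
nothing is measured:

#include <stdio.h>

int
main(void)
{
	const double page_bytes = 8192.0;	/* 8 kB heap page */
	const double rel_bytes = 10e9;		/* 10 GB relation */

	/* slim tuples: ~24 bytes each, i.e. ~341 per page */
	double slim_tuple_map = rel_bytes / 24.0 / 8.0 / 1024.0;	/* kB */
	double page_map = rel_bytes / page_bytes / 8.0 / 1024.0;	/* kB */

	printf("slim: per-tuple map %.0f kB, per-page map %.0f kB\n",
		   slim_tuple_map, page_map);

	/* wide tuples: ~2048 bytes each (just under TOAST), 4 per page */
	double wide_tuple_map = rel_bytes / 2048.0 / 8.0 / 1024.0;	/* kB */

	printf("wide: per-tuple map %.0f kB, per-page map %.0f kB\n",
		   wide_tuple_map, page_map);
	return 0;
}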


Can we drag out some average-case numbers on row size in the heap
from some real production systems?

I may have gotten something hugely wrong in the above calculations
and/or missed some important points.

--
Jesper



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-17 Thread Kevin Grittner
Radosław Smogura rsmog...@softperience.eu wrote:
 
 I have implemented an initial concept of a 2nd level cache. The
 idea is to keep some segments of shared memory for special
 buffers (e.g. indices), to prevent them from being overwritten
 by other operations. I added this functionality to the nbtree
 index scan.
 
 I tested this by doing an index scan, a sequential read, dropping
 the system buffers, and then doing the index scan again. In a few
 places I saw performance improvements, but I'm not sure whether
 those were just noise or real improvement.
 
I've often wondered about this.  In a database I developed back in
the '80s it was clearly a win to have a special cache for index
entries and other special pages closer to the database than the
general cache.  A couple things have changed since the '80s (I mean,
besides my waistline and hair color), and PostgreSQL has many
differences from that other database, so I haven't been sure it
would help as much, but I have wondered.
 
I can't really look at this for a couple weeks, but I'm definitely
interested.  I suggest that you add this to the next CommitFest as a
WIP patch, under the Performance category.
 
https://commitfest.postgresql.org/action/commitfest_view/open
 
 There are a few places left to optimize in the code, and the patch
 needs a lot of work, but could you look at it and give opinions?
 
For something like this it makes perfect sense to show proof of
concept before trying to cover everything.
 
-Kevin



[HACKERS] FK constraints NOT VALID by default?

2011-03-17 Thread Alvaro Herrera
I just ran this quick test in HEAD:

alvherre=# create table study (id int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "study_pkey"
for table "study"
CREATE TABLE
alvherre=# insert into study select a from generate_series(1, 100) as a;
INSERT 0 100
alvherre=# create table studyform (id int primary key, study_id int not null 
references study);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "studyform_pkey"
for table "studyform"
CREATE TABLE
alvherre=# insert into studyform select a, 1 + a * random() from 
generate_series(1, 10) a;
INSERT 0 10

and was very surprised to see that the foreign key is marked as NOT
VALID:

alvherre=# \d studyform
     Table "public.studyform"
  Column  |  Type   | Modifiers
----------+---------+-----------
 id       | integer | not null
 study_id | integer | not null
Indexes:
    "studyform_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "studyform_study_id_fkey" FOREIGN KEY (study_id) REFERENCES study(id) NOT VALID


Is this really intended?

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] really lazy vacuums?

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 4:02 PM, Jesper Krogh jes...@krogh.cc wrote:
 With 1 bit per page, the best case would be 341 times better than the
 above, reducing the size of the visibility map on a 10GB table to
 around 152KB, which is extremely small (and thus also awesome). But
 the consequence of a single update would be that you lose the
 visibility map benefit on 341 tuples in one shot.

True, but I'm not really sure that matters very much.  Keep in mind
also that this would increase the frequency with which visibility map
bits would need to be flipped, which would carry its own costs.

 The worst case is where we approach 4 tuples per page, just before we
 hit TOAST, where the space reduction for 1 bit per tuple would be
 1:(2048x8) ~ 1:16384, and 1 bit per page is 4 times better than that.
 With 1 bit per tuple, the visibility map of a 10GB relation would be
 around 610KB; 1 bit per page would drop it to around 160KB.


 Can we drag out some average-case numbers on row size in the heap
 from some real production systems?

Well, unless you went to some significantly more complicated data
structure, you'd need to allow room for the maximum number of tuples
per page on every page, whether the slots were all in use or not.
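
That constraint follows from the map needing to be directly
addressable; a hedged sketch of the addressing a per-tuple bitmap
implies (names illustrative):

/*
 * Sketch: with one bit per tuple and no extra indirection, the bit for
 * a tuple must be computable from (block, offset) alone, so every heap
 * page must reserve MaxHeapTuplesPerPage bits whether used or not.
 */
#define VM_BIT_FOR(block, offnum) \
	((uint64) (block) * MaxHeapTuplesPerPage + ((offnum) - 1))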

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] FK constraints NOT VALID by default?

2011-03-17 Thread Andrew Dunstan



On 03/17/2011 05:22 PM, Alvaro Herrera wrote:

I just ran this quick test in HEAD:



and was very surprised to see that the foreign key is marked as NOT
VALID:




Is this really intended?


I sure hope not.

cheers

andrew



Re: [HACKERS] FK constraints NOT VALID by default?

2011-03-17 Thread Robert Haas
On Thu, Mar 17, 2011 at 5:29 PM, Andrew Dunstan and...@dunslane.net wrote:
 Is this really intended?

 I sure hope not.

That's a bug.  Not sure if it's a psql bug or a backend bug, but it's
definitely a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Sync Rep and shutdown Re: [HACKERS] Sync Rep v19

2011-03-17 Thread Jeff Davis
On Wed, 2011-03-16 at 13:35 -0400, Robert Haas wrote:
 2. If a query cancel interrupt is received (pg_cancel_backend or ^C),
 then cancel the sync rep wait and issue a warning before acknowledging
 the commit.

When I saw this commit, I noticed that the WARNING doesn't have an
errcode(). It seems like it should -- this is the kind of thing that the
client is likely to care about, and may want to handle specially. 
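
Something along these lines, say (a minimal sketch; ERRCODE_QUERY_CANCELED
is just one plausible choice for the cancel path, not necessarily the
right one):

	/*
	 * Sketch: attach an errcode so clients can distinguish this WARNING
	 * programmatically.  The errcode chosen here is an assumption.
	 */
	ereport(WARNING,
			(errcode(ERRCODE_QUERY_CANCELED),
			 errmsg("canceling wait for synchronous replication due to user request"),
			 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));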

Regards,
Jeff Davis




Re: [HACKERS] Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

2011-03-17 Thread Fujii Masao
On Fri, Mar 18, 2011 at 1:17 AM, Robert Haas robertmh...@gmail.com wrote:
 Sorry, I've not been able to understand the point well yet. Should we
 just use elog(ERROR) instead? But since ERROR in the startup process
 is treated as FATAL, I'm not sure it's worth using ERROR instead. Or
 did you mean something else?

 Yeah, I think he's saying that an ERROR in the startup process is
 better than a FATAL, even though the effect is the same.

We've already been using FATAL all over the recovery code. Should we
s/FATAL/ERROR/g there (at least in readRecoveryCommandFile)?

 On the substantive issue, I don't think we have any consensus that
 forbidding this combination of parameters is the right thing to do
 anyway.  Both Simon and I voted against that, and Tom's point has to
 do only with style.  Similarly, I voted for flipping the default for
 pause_at_recovery_target to off, rather than on, but no one else has
 bought into that suggestion either.  Unless we get some more votes in
 favor of doing one of those things, I think we should focus on the
 actual must-fix issue here, which is properly documenting the way it
 works now (i.e. adding the parameter to recovery.conf.sample with
 appropriate documentation of the current behavior).

I agree with flipping the default to false, whether or not we forbid
that combination of settings.

 One thing I'm not quite clear on is what happens if we reach the
 recovery target before we reach consistency.  i.e. create restore
 point, flush xlog, abnormal shutdown, try to recover to named restore
 point.  Is there any possibility that we can end up paused before Hot
 Standby has actually started up?  Because that would be fairly useless
 and annoying.

Good catch! In that case, the same situation as (3) would happen.
I think that recovery should ignore pause_at_recovery_target until
it reaches the consistent point. If we do that, then when the recovery
target is ahead of the consistent point, recovery simply ends at an
inconsistent point and throws a FATAL error.
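
A minimal sketch of that guard in the startup process, assuming
xlog.c's reachedConsistency flag is the right signal (names
illustrative):

	/*
	 * Sketch: honor pause_at_recovery_target only once recovery has
	 * reached a consistent state; otherwise fall through and end
	 * recovery, which raises FATAL if the target precedes consistency.
	 */
	if (recoveryPauseAtTarget && reachedConsistency)
	{
		SetRecoveryPause(true);
		recoveryPausesHere();
	}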

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.

2011-03-17 Thread Fujii Masao
On Fri, Mar 18, 2011 at 2:52 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Mar 10, 2011 at 3:04 PM, Robert Haas robertmh...@gmail.com wrote:
 -                       /* Let the master know that we received some data. */
 -                       XLogWalRcvSendReply();
 -                       XLogWalRcvSendHSFeedback();

 This change completely eliminates the difference between write_location
 and flush_location in pg_stat_replication. If this change is reasonable,
 we should get rid of write_location from pg_stat_replication, since it's
 useless. If not, this change should be reverted. I'm not sure whether
 monitoring the difference between write and flush locations is useful,
 but I guess someone thought so when that code was added.

 I could go either way on this but clearly we need to do one or the other.

 I'm not really sure why this was part of the synchronous replication
 patch, but after mulling it over I think it's probably right to rip
 out write_location completely.  There shouldn't ordinarily be much of
 a gap between write location and flush location, so it's probably not
 worth the extra network overhead to keep track of it.  We might need
 to re-add some form of this in the future if we have a version of
 synchronous replication that only waits for confirmation of receipt
 rather than for confirmation of flush, but we don't have that in 9.1,
 so why bother?

 Barring objections, I'll go do that.

I agree we should get rid of write_location.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
