Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-13 Thread Greg Smith

Florian Pflug wrote:

glibc defines O_DSYNC as an alias for O_SYNC, and justifies that by
noting that most Linux filesystems don't actually implement the POSIX
O_SYNC semantics (which require all metadata updates of a write to be on
disk on returning to userspace) but only the O_DSYNC semantics, which
require only the actual file data, and the metadata necessary to
retrieve it, to be on disk by the time the system call returns.

If that is true, I believe we should default to open_sync, not
fdatasync, if open_datasync isn't available, at least on Linux.
  


It's not true in practice, because Linux's O_SYNC has basically never 
worked reliably on ext3.  See 
http://archives.postgresql.org/pgsql-hackers/2007-10/msg01310.php for an 
example of how terrible the situation would be if O_SYNC were the 
default on Linux.


We just got a report that a better O_DSYNC is now properly exposed 
starting on kernel 2.6.33+glibc 2.12:  
http://archives.postgresql.org/message-id/201006041539.03868.cousinm...@gmail.com 
and it's possible they may have finally fixed it so it works like it's 
supposed to.  PostgreSQL versions compiled against the right 
prerequisites will default to O_DSYNC by themselves.  Whether or not 
this is a good thing has yet to be determined.  The last thing we'd want 
to do at this point is make the old and usually broken O_SYNC behavior 
suddenly preferred, when the new and possibly fixed O_DSYNC one will be 
automatically selected when available without any code changes on the 
database side.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PERFORM] Large (almost 50%!) performance drop after upgrading to 8.4.4?

2010-06-13 Thread Bruce Momjian
Bruce Momjian wrote:
 Alvaro Herrera wrote:
  Excerpts from Tom Lane's message of jue jun 10 11:46:25 -0400 2010:
  
   Yes, the folks at commandprompt need to be told about this.  Loudly.
   It's a serious packaging error.
  
  Just notified Lacey, the packager (not so loudly, though); she's working
  on new packages, and apologizes for the inconvenience.
 
 [ Thread moved to hackers.  8.4.4 RPMs were built with debug flags. ]
 
 Uh, where are we on this?  Has it been completed?  How are people
 informed about this?  Do we need to post to the announce email list? 
 Does Yum just update them?  How did this mistake happen?  How many days
 did it take to detect the problem?
 
 Why has no news been posted here?
 
   https://public.commandprompt.com/projects/pgcore/news

Why have I received no reply to this email?  Do people think this is not
a serious issue?  I know it is a weekend but the problem was identified
on Thursday, meaning there was a full workday for someone from
CommandPrompt to reply to the issue and report a status:

http://archives.postgresql.org/pgsql-performance/2010-06/msg00165.php

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +



Re: [HACKERS] PG 9.1 tentative timeline

2010-06-13 Thread Robert Haas
On Sat, Jun 12, 2010 at 1:45 AM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-06-11 at 08:15 -0400, Stephen Frost wrote:
 * Pr, Solaiyappan (NSN - IN/Bangalore) (solaiyappan...@nsn.com) wrote:
  I understand this is very early to ask this, but has any tentative 
  timeline been planned or made available for the PostgreSQL 9.1 release, 
  like for the alpha or beta releases before the general release?

 The tentative timeline is more-or-less the same as 9.0 went.  That is
 to say, we're hopeful to start a commitfest July 15th, finish by August
 15th, cut alpha1, then switch to GIT for the main repo, and start the
 next commitfest September 15th.

 I wrote it down now:
 http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Development_Plan

Thanks!  Looks good, except I thought our plan was to cut alpha1
before the git switch.  Or maybe it doesn't matter?  Not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] PG 9.1 tentative timeline

2010-06-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jun 12, 2010 at 1:45 AM, Peter Eisentraut pete...@gmx.net wrote:
 I wrote it down now:
 http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Development_Plan

 Thanks!  Looks good, except I thought our plan was to cut alpha1
 before the git switch.

Yes, I thought so too.  The page lists those events in the right order,
but now I see the associated dates don't agree.

regards, tom lane



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Robert Haas
On Thu, Jun 10, 2010 at 4:09 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Here's the code.

I haven't more than glanced at this, but +1 for committing it if
you're confident it DTRT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Robert Haas
On Sat, Jun 12, 2010 at 4:51 PM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Dimitri Fontaine dfonta...@hi-media.com writes:
 Also, should I try to send a patch implementing my proposal (internal
 command exposed as a function at the SQL level, and while at it, maybe
 the internal command pg_archive_bypass to mimic /usr/bin/true as an
 archive_command)?

 I had to have a try at it, even if quick and dirty. I've not tried to
 code the pg_archive_bypass internal command for lack of discussion, but
 I still think it would be great to have it.

 So here's a "see my idea in code" patch that puts the previous code by
 Simon into a backend function. As the goal was not to adapt the existing
 code, intended as external, to use the internal APIs, you'll find it
 quite ugly, I'm sure.

 For example, this #define XLOG_DATA_FNAME_LEN has to go away, but that
 has no bearing on whether the idea gets accepted, and as I'm only
 warming up, I didn't tackle the problem. If you want me to do it, I'd
 appreciate some guidance as to how, though.

 It goes like this:

 dim=# select pg_switch_xlog();
  pg_switch_xlog
 ----------------
  0/198
 (1 row)

 dim=# select pg_archive_cleanup('0/198');
 DEBUG:  removing pg_xlog/0001
 DEBUG:  removing pg_xlog/00010001
  pg_archive_cleanup
 --------------------
  t
 (1 row)

 I hope you too will find that this way of interfacing is easier for
 everybody to deal with (from code maintenance to user settings).

I'm a bit perplexed here.  The archive cleanup has to run on the
standby, not the master, right?  Whereas pg_switch_xlog() can only run
on the master.  The purpose of making this a standalone executable is
so that people who have, for example, multiple standbys, can customize
the logic without having to hack the backend.  Pushing this into the
backend would defeat that goal; plus, it wouldn't be usable at all for
people who aren't running Hot Standby.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] 9.0beta2 - server crash when using HS + SR

2010-06-13 Thread Robert Haas
On Fri, Jun 11, 2010 at 9:29 PM, Rafael Martinez
r.m.guerr...@usit.uio.no wrote:
 I am testing HS + SR in a system running 9.0beta2. What I am doing is
 just trying all kinds of crazy combinations and seeing how the system
 handles them.

Thanks!

 One of the tests I knew was going to fail was to create a tablespace on
 the master node with the directory used by the tablespace existing on
 the master and not on the standby node.

 What I didn't expect was such a serious consequence. Postgres crashed in
 the standby node and it refused to start until the directory needed by
 the tablespace was created also in the standby.

 I suppose there is not an easy way of fixing this, but at least it would
 be a good idea to update the documentation with some information about
 how to fix this error situation (hot-standby.html#HOT-STANDBY-CAVEATS
 would be a nice place to have this information).

 Another thing is that the HINT message in the logs was a little
 misleading. The server is down and it will not start without fixing the
 cause of the problem.
 - 
 FATAL:  directory /var/pgsql/ts_test does not exist
 CONTEXT:  xlog redo create ts: 20177 /var/pgsql/ts_test
 LOG:  startup process (PID 10147) exited with exit code 1
 LOG:  terminating any other active server processes
 WARNING:  terminating connection because of crash of another server process
 DETAIL:  The postmaster has commanded this server process to roll back
 the current transaction and exit, because another server process exited
 abnormally and possibly corrupted shared memory.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.

I think the behavior is correct (what else would we do? we must be
able to replay the subsequent WAL records that use the new
tablespace) but I agree that the hint is a little misleading.
Ideally, it seems like we'd like to issue that hint if we're planning
to restart, but not otherwise.  You get that same message, for
example, if the DBA performs an immediate shutdown.

I'm somewhat disinclined to try to address this for 9.0.  We've had
this problem for a long time, and I'm not sure that the fact that it
can now happen in a slightly wider set of circumstances is enough
reason to engineer a solution so close to release time, nor am I sure
what that other solution would look like.  But I'm open to other
opinions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I'm a bit perplexed here.  The archive cleanup has to run on the
 standby, not the master, right?  Whereas pg_switch_xlog() can only run
 on the master.

I used it just to show a possible use case, easy to grasp. Sorry if
that's confusing instead.

  The purpose of making this a standalone executable is
 so that people who have, for example, multiple standbys, can customize
 the logic without having to hack the backend.  Pushing this into the
 backend would defeat that goal; plus, it wouldn't be usable at all for
 people who aren't running Hot Standby.

In the simple cases, what you want to be able to easily choose is just
the first XLOG file you're NOT cleaning. And this is the only argument
you give the function. 

So you can either use the backend function as your internal command for
archive cleanup, or use a script that chooses where to stop cleaning and
then calls it with that as an argument (it's SQL-callable).

What it does is unlink the file. If that behavior doesn't suit you, it's
still possible to use an external command and tune some already-proposed
scripts. I just don't see what an external binary has to offer over a
backend function here. It's more code to maintain, it's harder to set up
for people, and if it does not suit you, you still have to write your
own script but cannot easily reuse what we ship (you have to get the
sources and code in C for that).

What I'm after is being able to tell people to just set a GUC to a
given value, not to copy/paste a (Perl or bash) script from the docs,
make it executable on their system, then test it and run it in
production. We can do better than that, and it's not even hard.

Regards,
-- 
dim



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Robert Haas
On Sun, Jun 13, 2010 at 1:04 PM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm a bit perplexed here.  The archive cleanup has to run on the
 standby, not the master, right?  Whereas pg_switch_xlog() can only run
 on the master.

 I used it just to show a possible use case, easy to grasp. Sorry if
 that's confusing instead.

  The purpose of making this a standalone executable is
 so that people who have, for example, multiple standbys, can customize
 the logic without having to hack the backend.  Pushing this into the
 backend would defeat that goal; plus, it wouldn't be usable at all for
 people who aren't running Hot Standby.

 In the simple cases, what you want to be able to easily choose is just
 the first XLOG file you're NOT cleaning. And this is the only argument
 you give the function.

 So you can either use the backend function as your internal command for
 archive cleanup, or use a script that choose where to stop cleaning then
 call it with that as an argument (it's SQL callable).

 What it does is unlink the file. If that behavior doesn't suit you, it's
 still possible to use an external command and tune some already proposed
 scripts. I just don't see how an external binary has more to offer than
 a backend function here. It's more code to maintain, it's harder to
 setup for people, and if it does not suit you, you still have to make
 you own script but you can not use what we ship easily (you have to get
 the sources and code in C for that).

 What I'm after is being able to tell people to just setup a GUC to a
 given value, not to copy/paste a (perl or bash) script from the docs,
 make it executable under their system, then test it and run it in
 production. We can do better than that, and it's not even hard.

We're not going to make them cut/paste anything from the docs.  We're
going to provide a production-ready executable they can just use,
which should be installed (presumably, already with the correct
permissions) by their packaging system if they install
postgresql-contrib or the equivalent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Robert Haas robertmh...@gmail.com writes:
  The purpose of making this a standalone executable is
 so that people who have, for example, multiple standbys, can customize
 the logic without having to hack the backend.  Pushing this into the
 backend would defeat that goal; plus, it wouldn't be usable at all for
 people who aren't running Hot Standby.

 We're not going to make them cut/paste anything from the docs.  We're
 going to provide a production-ready executable they can just use,
 which should be installed (presumably, already with the correct
 permissions) by their packaging system if they install
 postgresql-contrib or the equivalent.

I still run up against people who don't want to trust contrib. I still
read here from time to time that contrib's charter is maintaining
working examples of extensibility, not maintaining production-ready
add-ons.

Other than that, you proposed something flexible and easy to customize,
and you end up with an executable binary that offers only one behavior
(unlink); the only option is where to stop (%r).

The backend function I'm proposing uses the same option, but is easier
to call from a script, should you need to customize. You don't even have
to run the script locally or remember where the XLOG directory of that
instance is. You could operate over a JDBC connection, for example.

I now realize that my proposal doesn't help if Streaming Replication is
filling the standby's pg_xlog and hot_standby = off. I don't remember
whether SR rebuilds pg_xlog on the standby, though; does it?

In fact, the proposed script will only clean up XLOGDIR, so if you use a
common archive elsewhere then you still need some external command not
provided by the project. So we still need the script example in the
docs.

I think that the pg_archivecleanup binary is a good solution, all the
more so if it's not shipped in contrib, but that the SQL-callable
function is better.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Andrew Dunstan



Dimitri Fontaine wrote:

I still read
here from time to time that contrib's chapter is maintaining working
examples of extensibility, not maintaining production ready add-ons.

  


Even if this were true, and I don't believe it is, ISTM the solution 
would be to have a utility command alongside the other utility commands 
like pg_controldata.


cheers

andrew



Re: [HACKERS] 9.0beta2 - server crash when using HS + SR

2010-06-13 Thread Martijn van Oosterhout
On Sun, Jun 13, 2010 at 12:42:49PM -0400, Robert Haas wrote:
 I think the behavior is correct (what else would we do? we must be
 able to replay the subsequent WAL records that use the new
 tablespace) but I agree that the hint is a little misleading.
 Ideally, it seems like we'd like to issue that hint if we're planning
 to restart, but not otherwise.  You get that same message, for
 example, if the DBA performs an immediate shutdown.

A bit of a comment from the sidelines: there's no particular reason why
the tablespaces on the master would need to match the tablespaces on
the slave. For a first cut it would seem to me that you should just be
able to ignore the tablespace commands on the slave. Not sure whether
that's easy or not though.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] 9.0beta2 - server crash when using HS + SR

2010-06-13 Thread Robert Haas
On Sun, Jun 13, 2010 at 4:52 PM, Martijn van Oosterhout
klep...@svana.org wrote:
 On Sun, Jun 13, 2010 at 12:42:49PM -0400, Robert Haas wrote:
 I think the behavior is correct (what else would we do? we must be
 able to replay the subsequent WAL records that use the new
 tablespace) but I agree that the hint is a little misleading.
 Ideally, it seems like we'd like to issue that hint if we're planning
 to restart, but not otherwise.  You get that same message, for
 example, if the DBA performs an immediate shutdown.

 A bit of a comment from the sidelines: there's no particular reason why
 the tablespaces on the master would need to match the tablespaces on
 the slave. For a first cut it would seem to me that you should just be
 able to ignore the tablespace commands on the slave. Not sure whether
 that's easy or not though.

It's not particularly easy, and it might also not be what you want.

Perhaps in an ideal world we would have some system for mapping
tablespaces on the master to tablespaces on the slave, but I doubt
it's worth the effort: the existing system is not terribly onerous.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.0beta2 - server crash when using HS + SR

2010-06-13 Thread Rafael Martinez

Robert Haas wrote:
 On Fri, Jun 11, 2010 at 9:29 PM, Rafael Martinez

 I'm somewhat disinclined to try to address this for 9.0.  We've had
 this problem for a long time, and I'm not sure that the fact that it
 can now happen in a slightly wider set of circumstances is enough
 reason to engineer a solution so close to release time, nor am I sure
 what that other solution would look like.  But I'm open to other
 opinions.
 

A minimum, and probably the only feasible, thing for 9.0 will be to
update the documentation. We need an entry in the hot-standby caveats
section explaining that if you create a tablespace and the needed
directory does not exist on the standby, the standby will shut itself
down and will not be able to start until the directory is also created
on the standby.

From a DBA's point of view, two possible solutions could be:

1) PostgreSQL creates the directory needed for the tablespace if the
user running postgres has privileges to do so at the OS level.

2) The standby discovers that the directory needed does not exist and
pauses recovery (without shutting down the server) at the WAL record
that creates the tablespace. The standby then checks periodically
whether the directory has been created before resuming recovery.

With this, users would still be able to run queries on the standby node.
On very busy systems with many changes, the standby would fall behind
quite a lot if the error is not discovered and fixed quickly. But on
many other systems the delay would not be as serious a problem as the
loss of access to the standby.

regards,
--
 Rafael Martinez, r.m.guerr...@usit.uio.no
 Center for Information Technology Services
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/



Re: [HACKERS] 9.0b1: ERROR: btree index keys must be ordered by attribute

2010-06-13 Thread Ian Barwick
2010/6/1 Bruce Momjian br...@momjian.us:
 Tom Lane wrote:
(...)
 The index-based-max code is throwing in the IS NOT NULL condition
 without thought for where it has to go in the index condition order.
 Will look into fixing this tomorrow.

 FYI, this no longer throws an error in current CVS so was fixed by Tom.

Thanks for the update, I can confirm the issue no longer occurs in
beta2.

Regards

Ian Barwick



[HACKERS] Command Prompt 8.4.4 RPMs compiled with debug/assert enabled

2010-06-13 Thread Bruce Momjian
Bruce Momjian wrote:
 Bruce Momjian wrote:
  Alvaro Herrera wrote:
   Excerpts from Tom Lane's message of jue jun 10 11:46:25 -0400 2010:
   
Yes, the folks at commandprompt need to be told about this.  Loudly.
It's a serious packaging error.
   
   Just notified Lacey, the packager (not so loudly, though); she's working
   on new packages, and apologizes for the inconvenience.
  
  [ Thread moved to hackers.  8.4.4 RPMs were built with debug flags. ]
  
  Uh, where are we on this?  Has it been completed?  How are people
  informed about this?  Do we need to post to the announce email list? 
  Does Yum just update them?  How did this mistake happen?  How many days
  did it take to detect the problem?
  
  Why has no news been posted here?
  
  https://public.commandprompt.com/projects/pgcore/news
 
 Why have I received no reply to this email?  Do people think this is not
 a serious issue?  I know it is a weekend but the problem was identified
 on Thursday, meaning there was a full workday for someone from
 CommandPrompt to reply to the issue and report a status:
 
   http://archives.postgresql.org/pgsql-performance/2010-06/msg00165.php

[ Updated subject line.]

I am on IM with Joshua Drake right now and am working to get answers to
the questions above.  He or I will report in the next few hours.

FYI, only Command Prompt-produced RPMs are affected.  Devrim's RPMs are
not:

http://yum.postgresqlrpms.org/

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +



Re: [HACKERS] 9.0beta2 - server crash when using HS + SR

2010-06-13 Thread Greg Smith

Rafael Martinez wrote:

A minimum, and probably the only feasible, thing for 9.0 will be to
update the documentation. We need an entry in the hot-standby caveats
section explaining that if you create a tablespace and the needed
directory does not exist on the standby, the standby will shut itself
down and will not be able to start until the directory is also created
on the standby.
  


This is not a Hot Standby problem, and it's been documented since at 
least http://www.postgresql.org/docs/8.2/static/warm-standby.html ; read 
"25.2.1 Planning" in the current 
http://developer.postgresql.org/pgdocs/postgres/warm-standby.html where 
it's spelled out quite clearly.


It's a mixed blessing that it's now possible to get a replicated server 
up so much more easily that people don't have to read that particular 
document quite as carefully now and can still get something going.  But 
if there's a documentation change to be made, it should be highlighting 
the warning already in that section better; it's not something 
appropriate for the Hot Standby caveats.  Since this is clearly 
documented already, and there are bigger problems to worry about for the 
current release, the real minimum action to perform here (and the only 
one I would consider reasonable) is to change nothing for 9.0 at this 
point.  I'm sorry you missed where this was covered, but adding 
redundant documentation for basics like this invariably leads to the 
multiple copies becoming out of sync with one another as changes are 
made in the future.



1) PostgreSQL creates the directory needed for the tablespace if the
user running postgres has privileges to do so at the OS level.
2) The standby discovers that the directory needed does not exist and
pauses recovery (without shutting down the server) at the WAL record
that creates the tablespace. The standby then checks periodically
whether the directory has been created before resuming recovery.
  


Given that the idea behind a tablespace is that you want to relocate it 
to a specific storage path, which may not map in the same way on the 
standby, your first idea will never get implemented; it's not something 
you want the server to guess about.  As for the second, I would rather 
see the standby go down--and hopefully set off some serious alarms for 
the DBA who has screwed up here--than to stay up in a dysfunctional 
polling state.  The very serious mistake made is far more likely to be 
discovered the way it's built right now.


I wouldn't be averse to improving the error messages the server emits 
when this happens, to make it more obvious what's gone wrong, in 9.1.  
That's the only genuine improvement I'd see value in here, to cut down 
on other people running into what you did and being as confused by it.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] Patch to show individual statement latencies in pgbench output

2010-06-13 Thread Greg Smith

Florian Pflug wrote:

To be able to assess the performance characteristics of the different 
WAL-related options, I patched pgbench to show the average latency of 
each individual statement. The idea is to be able to compare the latency 
of the COMMIT with the ones of the other statements.
  


That's an interesting idea, particularly given that people don't really 
understand where the time is going in the standard pgbench test.  Your 
first bit of review feedback is that this would have to be something you 
could toggle on and off; there's no way most people want to pay this 
penalty.  If you submit a new patch with a command line option to enable 
this alternate logging format and add the result to 
https://commitfest.postgresql.org/action/commitfest_view?id=6 , you can 
put my name down as a reviewer and I'll take a deeper look at it as part 
of that.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




[HACKERS] pg_dump(all) --quote-all-identifiers

2010-06-13 Thread Robert Haas
In response to a complaint from Hartmut Goebel:

http://archives.postgresql.org/pgsql-bugs/2010-06/msg00018.php

And per a design proposed by Tom Lane:

http://archives.postgresql.org/pgsql-bugs/2010-06/msg00211.php

PFA a patch to implement $SUBJECT.  One interesting aspect of this
patch is that types like "integer" and "double precision" don't get
quoted in the output, whereas types like "text" do.  But it turns out
that types like "integer" and "double precision" don't *work* if
they're quoted, so this is not a bad thing.  It might possibly be
judged to require documentation somewhere, however.

Suggestions welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


quote_all_identifiers.patch
Description: Binary data



Re: [HACKERS] SR slaves and .pgpass

2010-06-13 Thread Fujii Masao
On Fri, Jun 11, 2010 at 7:14 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 09/06/10 08:24, Fujii Masao wrote:

 On Wed, Jun 9, 2010 at 12:52 PM, Andrew Dunstanand...@dunslane.net
  wrote:

 There is precedent for .pgpass being a bit ambiguous. See the way
 localhost is used.

 OK. The attached patch allows us to use "replication" in the database
 field of the .pgpass file, for the replication connection.

 Thanks, committed with some rewording of the docs and comments. I hope I
 made them better, not worse.

I could confirm via git log that this has been committed, but could not
find the related post on pgsql-committers. Is there a problem with the
mailing list?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] visibility map

2010-06-13 Thread Robert Haas
visibilitymap.c begins with a long and useful comment - but this part
seems to have a bit of split personality disorder.

 * Currently, the visibility map is not 100% correct all the time.
 * During updates, the bit in the visibility map is cleared after releasing
 * the lock on the heap page. During the window between releasing the lock
 * and clearing the bit in the visibility map, the bit in the visibility map
 * is set, but the new insertion or deletion is not yet visible to other
 * backends.
 *
 * That might actually be OK for the index scans, though. The newly inserted
 * tuple wouldn't have an index pointer yet, so all tuples reachable from an
 * index would still be visible to all other backends, and deletions wouldn't
 * be visible to other backends yet.  (But HOT breaks that argument, no?)

I believe that the answer to the parenthesized question here is yes
(in which case we might want to just delete this paragraph).

 * There's another hole in the way the PD_ALL_VISIBLE flag is set. When
 * vacuum observes that all tuples are visible to all, it sets the flag on
 * the heap page, and also sets the bit in the visibility map. If we then
 * crash, and only the visibility map page was flushed to disk, we'll have
 * a bit set in the visibility map, but the corresponding flag on the heap
 * page is not set. If the heap page is then updated, the updater won't
 * know to clear the bit in the visibility map.  (Isn't that prevented by
 * the LSN interlock?)

I *think* that the answer to this parenthesized question is no.
When we vacuum a page, we set the LSN on both the heap page and the
visibility map page.  Therefore, neither of them can get written to
disk until the WAL record is flushed, but they could get flushed in
either order.  So the visibility map page could get flushed before the
heap page, as the non-parenthesized portion of the comment indicates.
However, at least in theory, it seems like we could fix this up during
redo.
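To make the flush-ordering argument concrete, here is a toy model (plain Python with illustrative names, not PostgreSQL code): once the WAL record is flushed, the per-page LSN interlock is satisfied for both pages, so the heap page and the visibility map page can each reach disk at any later moment, in either order, and a crash in between can leave the map bit set on disk while the heap flag is not:

```python
import itertools

# Toy model: after the WAL record for the vacuum is flushed, the per-page
# LSN interlock is satisfied, and the heap page and the visibility map
# (VM) page may each be written to disk in either order.  Enumerate every
# crash point and collect the possible on-disk states as tuples of
# (heap PD_ALL_VISIBLE flag, VM bit).
def crash_states():
    states = set()
    for order in itertools.permutations(["heap", "vm"]):
        on_disk = {"heap": False, "vm": False}
        for page in order:
            states.add((on_disk["heap"], on_disk["vm"]))  # crash before this write
            on_disk[page] = True
        states.add((on_disk["heap"], on_disk["vm"]))      # crash after both writes
    return states

# The problematic state from the comment -- VM bit set while the heap
# page's flag is not -- is reachable when the VM page is flushed first:
print((False, True) in crash_states())  # True
```

If redo of a (to-be-added) WAL record re-checked and repaired the heap flag, that state would stop being dangerous, which is the fix-up during recovery suggested above.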

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GSoC - Materialized Views - is stale or fresh?

2010-06-13 Thread Pavel Baros

I am curious how I could solve the following problem:

During refreshing I would like to know whether the MV is stale or fresh,
and I had an idea:


In fact, the MV needs to know whether its last refresh (transaction id) is
older than any INSERT, UPDATE, or DELETE transaction launched against the
source tables. So if the MV has information about the last (highest) xmin
in the source tables, it could simply compare its own xmin to the xmins
(xmax for deleted rows) from the source tables and decide whether it is
stale or fresh.


The whole implementation could look like this:
1. Add a new column in pg_class (or somewhere in pg_stat* ?):
   pg_class.rellastxid (of type xid)

2. After each INSERT, UPDATE, or DELETE statement (transaction),
pg_class.rellastxid would be updated. That should not be too time- or
memory-consuming, since pg_class is cached, I guess.


3. When refreshing, as described above, comparing the MV's rellastxid to
the source tables' rellastxid could answer whether the MV is stale or
still fresh. The decision whether to run a refresh would then be as
simple as possible.
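A minimal sketch of the staleness test in step 3 (hypothetical names; note that real transaction ids wrap around, so PostgreSQL would need TransactionIdPrecedes()-style modular comparison rather than the plain > used here):

```python
def mv_is_stale(mv_last_refresh_xid, source_rellastxids):
    """Stale if any source table was modified after the MV's last refresh."""
    return any(xid > mv_last_refresh_xid for xid in source_rellastxids)

# MV refreshed at xid 1000; second source table last modified at xid 1005:
print(mv_is_stale(1000, [990, 1005]))  # True  -> refresh needed
print(mv_is_stale(1000, [990, 998]))   # False -> still fresh
```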



a) Is the idea right?

b) Could there be cases where it does not hold? (except xid wraparound)

c) I was looking for help with this in pg_stat*, but there is no
information there about transactions or last changes in relations.


d) Or are there other mechanisms or ideas for checking whether the MV's
source tables have changed since the last refresh?



Thanks for replies

Pavel Baros

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] [PERFORM] No hash join across partitioned tables?

2010-06-13 Thread Robert Haas
On Thu, Jun 10, 2010 at 9:29 AM, Robert Haas robertmh...@gmail.com wrote:
 (moving to -hackers)

 On Wed, Jun 9, 2010 at 4:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 In going back through emails I had marked as possibly needing another
 look before 9.0 is released, I came across this issue again.  As I
 understand it, analyze (or analyse) now collects statistics for both
 the parent individually, and for the parent and its children together.
  However, as I further understand it, autovacuum won't actually fire
 off an analyze unless there's enough activity on the parent table
 considered individually to warrant it.  So if you have an empty parent
 and a bunch of children with data in them, your stats will still stink,
 unless you analyze by hand.

 Check.

 Assuming my understanding of the problem is correct, we could:

 (a) fix it,
 (b) document that you should consider periodic manual analyze commands
 in this situation, or
 (c) do nothing.

 Thoughts?

 The objections to (a) are that it might result in excessive ANALYZE work
 if not done intelligently, and that we haven't got a patch ready anyway.
 I would have liked to get to this for 9.0 but I feel it's a bit late
 now.

 I guess I can't really disagree with that.  Should we try to document
 this in some way?

Proposed patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


analyze-inherit-docs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command to prune archive at restartpoints

2010-06-13 Thread Fujii Masao
On Mon, Jun 14, 2010 at 3:51 AM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 I now realize that my proposal ain't helping if Streaming Replication is
 filling the standby's pg_xlog and hot_standby = off. I don't remember
 that SR rebuilds pg_xlog on the standby though, does it?

In SR, WAL files in the pg_xlog directory on the standby are recycled
at every restartpoint. So your proposed function seems unlikely to be
helpful even if hot_standby = on.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map

2010-06-13 Thread Heikki Linnakangas

On 14/06/10 06:08, Robert Haas wrote:

visibilitymap.c begins with a long and useful comment - but this part
seems to have a bit of split personality disorder.

  * Currently, the visibility map is not 100% correct all the time.
  * During updates, the bit in the visibility map is cleared after releasing
  * the lock on the heap page. During the window between releasing the lock
  * and clearing the bit in the visibility map, the bit in the visibility map
  * is set, but the new insertion or deletion is not yet visible to other
  * backends.
  *
  * That might actually be OK for the index scans, though. The newly inserted
  * tuple wouldn't have an index pointer yet, so all tuples reachable from an
  * index would still be visible to all other backends, and deletions wouldn't
  * be visible to other backends yet.  (But HOT breaks that argument, no?)

I believe that the answer to the parenthesized question here is yes
(in which case we might want to just delete this paragraph).


A HOT update can only update non-indexed columns, so I think we're still 
OK with HOT. To an index-only scan, it doesn't matter which tuple in a 
HOT update chain you consider live, because they must all have the 
same value in the indexed columns. Subtle...



  * There's another hole in the way the PD_ALL_VISIBLE flag is set. When
  * vacuum observes that all tuples are visible to all, it sets the flag on
  * the heap page, and also sets the bit in the visibility map. If we then
  * crash, and only the visibility map page was flushed to disk, we'll have
  * a bit set in the visibility map, but the corresponding flag on the heap
  * page is not set. If the heap page is then updated, the updater won't
  * know to clear the bit in the visibility map.  (Isn't that prevented by
  * the LSN interlock?)

I *think* that the answer to this parenthesized question is no.
When we vacuum a page, we set the LSN on both the heap page and the
visibility map page.  Therefore, neither of them can get written to
disk until the WAL record is flushed, but they could get flushed in
either order.  So the visibility map page could get flushed before the
heap page, as the non-parenthesized portion of the comment indicates.


Right.


However, at least in theory, it seems like we could fix this up during
redo.


Setting a bit in the visibility map is currently not WAL-logged, but yes 
once we add WAL-logging, that's straightforward to fix.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - Materialized Views - is stale or fresh?

2010-06-13 Thread Heikki Linnakangas

On 12/06/10 17:18, Pavel Baros wrote:

I am curious how I could solve the following problem:

During refreshing I would like to know whether the MV is stale or fresh,
and I had an idea:

In fact, the MV needs to know whether its last refresh (transaction id) is
older than any INSERT, UPDATE, or DELETE transaction launched against the
source tables. So if the MV has information about the last (highest) xmin
in the source tables, it could simply compare its own xmin to the xmins
(xmax for deleted rows) from the source tables and decide whether it is
stale or fresh.

The whole implementation could look like this:
1. Add a new column in pg_class (or somewhere in pg_stat* ?):
pg_class.rellastxid (of type xid)

2. After each INSERT, UPDATE, or DELETE statement (transaction),
pg_class.rellastxid would be updated. That should not be too time- or
memory-consuming, since pg_class is cached, I guess.


rellastxid would have to be updated at every insert/update/delete. It 
would become a big bottleneck. That's not going to work.


Why do you need to know whether an MV is stale?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC - Materialized Views - is stale or fresh?

2010-06-13 Thread Greg Smith

Pavel Baros wrote:
After each INSERT, UPDATE, or DELETE statement (transaction), 
pg_class.rellastxid would be updated. That should not be too time- or 
memory-consuming, since pg_class is cached, I guess.


An update in PostgreSQL is essentially an INSERT followed by a later DELETE 
once VACUUM gets to the dead row that is no longer visible.  The problem with 
this approach is that it will leave behind so many dead rows in pg_class 
due to the heavy updates that the whole database could grind to a halt, 
as so many operations will have to sort through all that garbage.  It 
could potentially double the total write volume on the system, and 
you'll completely kill people who don't have autovacuum running during 
some periods of the day.


The basic idea of saving the last update time for each relation is not 
unreasonable, but you can't store the results by updating pg_class.  My 
first thought would be to send this information as a message to the 
statistics collector.  It's already being sent updates at the point 
you're interested in for the counters of how many INSERT/UPDATE/DELETE 
statements are executing against the table.  You might bundle your last 
update information into that existing message with minimal overhead.
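A toy sketch of that bundling idea (all names here are hypothetical stand-ins, not the real pgstat message layout, which is a binary C struct): the per-table counters a backend already batches up for the collector gain one extra field recording the newest xid that modified the table:

```python
from dataclasses import dataclass

# Hypothetical model of a per-table stats entry that backends batch up for
# the statistics collector; last_change_xid is the proposed extra field,
# piggybacking on a message that is sent anyway.
@dataclass
class TabStatEntry:
    table_oid: int
    tuples_inserted: int = 0
    tuples_updated: int = 0
    tuples_deleted: int = 0
    last_change_xid: int = 0   # proposed: newest xid that wrote to the table

def record_insert(entry, xid, ntuples=1):
    """Count an insert and remember the most recent modifying xid."""
    entry.tuples_inserted += ntuples
    entry.last_change_xid = max(entry.last_change_xid, xid)

e = TabStatEntry(table_oid=16384)
record_insert(e, xid=1005)
print(e.last_change_xid)  # 1005
```

Compared with updating pg_class rows, this adds no extra writes of its own: the staleness information rides along with counter updates that already flow to the collector.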


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers