[HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Mark Kirkwood

Well, it is none of the things I considered.

The problem seems to be due to use of "--delete" in the base backup 
rsync (see diff attached).  In fact I can now reproduce the 
uninitialized pages using the "bare bones" method:


primary:
$ grep archive_command postgresql.conf
 archive_command = 'rsync %p standby:/var/lib/postgresql/archive'
$ pgbench -c 4 -t 20 bench
(wait for approx 1 transactions)

standby:
$ psql -h primary -c "SELECT pg_start_backup('backup');"
$ rsync --exclude pg_xlog/\* --exclude postmaster.pid --delete \
    --exclude=backup_label \
    primary:/var/lib/postgresql/8.3/main/* \
    /var/lib/postgresql/8.3/main
$ psql -h primary -c "SELECT pg_stop_backup();"

$ grep restore_command recovery.conf
restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -t 
/tmp/trigger.5432 /var/lib/postgresql/archive %f %p %r'

$ /etc/init.d/postgresql-8.3 start
(wait for approx 14 transactions)
$ touch /tmp/trigger.5432

Removing the offending

--delete --exclude=backup_label

options from the base backup step makes everything work properly again.
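To make the failure mode concrete, here is a minimal sketch of the guard a backup wrapper (such as the patched cmd_standby) might apply before running the base backup rsync. `safe_for_base_backup` and the flag strings are hypothetical, not Pitrtools API; the point is only that `--delete` and `--exclude=backup_label` must not appear in the base backup step.

```python
# Hypothetical guard: reject rsync flag sets that delete files on the
# standby or skip backup_label during the base backup.
def safe_for_base_backup(rsync_flags):
    tokens = rsync_flags.split()
    return ("--delete" not in tokens
            and not any("backup_label" in t for t in tokens))

print(safe_for_base_backup("-azl"))                                   # True
print(safe_for_base_backup("-avzl --delete --exclude=backup_label"))  # False
```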

I'd be interested to know if the other folks getting these warnings were 
using unusual rsync options either during backup or for archiving.


regards

Mark

On 30/12/10 13:32, Mark Kirkwood wrote:


I'm frankly puzzled about what Pitrtools is doing that is different - 
I only noticed it using rsync compression (-z) and doing rsync backups 
via pulling from the standby rather than pushing from the primary (I'm 
in the process of trying these variations out in the bare bones case). 
Just as I'm writing this I see Pitrtools rsyncs pg_xlog - I wonder if 
there are timing issues which mean that recovery might use some 
(corrupted) logs from there before the (clean) archived ones arrive 
(will check).




*** cmd_standby.orig	Tue Dec 28 21:10:31 2010
--- cmd_standby	Thu Dec 30 05:20:04 2010
***************
*** 175,181 ****
  
  if debug == "on":
 ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
! rsync_flags = "-avzl --delete --stats --exclude=backup_label"
 pg_standby_flags = "-s5 -w0 -d -c"
 if pgversion == '8.2':
   pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
--- 175,181 ----
  
  if debug == "on":
 ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
! rsync_flags = "-a "
 pg_standby_flags = "-s5 -w0 -d -c"
 if pgversion == '8.2':
   pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
***************
*** 184,190 ****

  else:
 ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
! rsync_flags = "-azl --delete --exclude=backup_label"
 pg_standby_flags = "-s5 -w0 -c"
 if pgversion == '8.2':
   pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
--- 184,190 ----

  else:
 ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
! rsync_flags = "-azl "
 pg_standby_flags = "-s5 -w0 -c"
 if pgversion == '8.2':
   pg_standby_args = "%%f %%p -k%s" % (float(numarchives))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 22:30, Dimitri Fontaine  wrote:
> Magnus Hagander  writes:
>> Would people be interested in putting pg_streamrecv
>> (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
>> 9.1? I think it would make sense to do so.
>
> +1 for having that in core, only available for the roles WITH
> REPLICATION I suppose?

Yes.

Well, anybody who wants can run it, but they need those permissions on
the server to make it work. pg_streamrecv is entirely a client app.


>>> I think that the base backup feature is more important than simple streaming
>>> chunks of the WAL (SR already does this). Talking about the base backup over
>>> libpq, it is something we should implement to fulfill people's desire that
>>> claim an easy replication setup.
>>
>> Yes, definitely. But that also needs server side support.
>
> Yeah, but it's already in core for 9.1, we have pg_read_binary_file()
> there. We could propose a contrib module for previous version
> implementing the function in C, that should be pretty easy to code.

Oh. I didn't actually think about that one. So yeah, we could use that
- making it easy to code. However, I wonder how much less efficient it
would be than being able to stream the base backup. It's going to be a
*lot* more roundtrips across the network, and we're also going to
open/close the files a lot more.

Also, I haven't tested it, but a quick look at the code makes me
wonder how it will actually work with tablespaces - it seems to only
allow files under PGDATA? That could of course be changed..


>  The only reason I didn't do that for pg_basebackup is that I wanted a
>  self-contained python script, so that offering a public git repo is
>  all I needed as far as distributing the tool goes.

Right, there's an advantage with that when it comes to being able to
work on old versions.


>> Yeah, the WIP patch Heikki posted is similar, except it uses tar
>> format and is implemented natively in the backend with no need for
>> pl/pythonu to be installed.
>
> As of HEAD the dependency on pl/whatever is easily removed.
>
> The included C tool would need to have a parallel option from the get-go
> if at all possible, but if you look at the pg_basebackup prototype, it
> would be good to drop the wrong pg_xlog support in there and rely on a
> proper archiving setup on the master.
>
> Do you want to work on internal archive and restore commands over libpq
> in the same effort too?  I think this tool should be either a one time
> client or a daemon with support for:

Definitely a one-time client. If you want it to be a daemon, you write
a small wrapper that makes it one :)


>  - running a base backup when receiving a signal
>  - continuous WAL streaming from a master

Yes.

>  - accepting standby connections and streaming to them

I see that as a separate tool, I think. But still a useful one, sure.

>  - one-time libpq "streaming" of a WAL file, either way

Hmm. That might be interesting, yes.


> Maybe we don't need to daemonize the tool from the get-go, but if you're
> going parallel for the base-backup case you're almost there, aren't you?
> Also having internal commands for archive and restore commands that rely
> on this daemon running would be great too.

I don't want anything *relying* on this tool. I want to keep the
current way where you can choose whatever you prefer - I just want us
to ship a good default tool.


> I'd offer more help if it wasn't for finishing the extension patches,

:-) Yeah, focus on that, please - don't want to get it stalled.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 20:19, Gurjeet Singh  wrote:
> On Wed, Dec 29, 2010 at 1:42 PM, Robert Haas  wrote:
>>
>> On Dec 29, 2010, at 1:01 PM, Tom Lane  wrote:
>> > Is it really stable enough for bin/?  My impression of the state of
>> > affairs is that there is nothing whatsoever about replication that
>> > is really stable yet.
>>
>> Well, that's not stopping us from shipping a core feature called
>> "replication".  I'll defer to others on how mature pg_streamrecv is, but if
>> it's no worse than replication in general I think putting it in bin/ is the
>> right thing to do.
>
> As the README says, it is not self-contained (through no fault of its own) and
> one should typically set archive_command to guarantee zero WAL loss.

Yes. Though you can combine it fine with wal_keep_segments if you
think that's safe - but archive_command is push and this tool is pull,
so if your backup server goes down for a while, pg_streamrecv will get
a gap and fail. Whereas if you configure an archive_command, it will
queue up the log on the master if it stops working, up to the point of
shutting it down because of out-of-disk. Which you *want*, if you want
to be really sure about the backups.


> 
> TODO: Document some ways of setting up an archive_command that works well
> together with pg_streamrecv.
> 
>
>     I think implementing just that TODO might make it a candidate.

Well, yes, that's obviously a requirement.

>     I have neither used it nor read the code, but if it works as advertised
> then it is definitely a +1 from me; no preference of bin/ or contrib/, since
> the community will have to maintain it anyway.

It's not that much code, but some more eyes on it would always be good!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 19:42, Robert Haas  wrote:
> On Dec 29, 2010, at 1:01 PM, Tom Lane  wrote:
>> Is it really stable enough for bin/?  My impression of the state of
>> affairs is that there is nothing whatsoever about replication that
>> is really stable yet.
>
> Well, that's not stopping us from shipping a core feature called 
> "replication".  I'll defer to others on how mature pg_streamrecv is, but if 
> it's no worse than replication in general I think putting it in bin/ is the 
> right thing to do.

It has had fewer eyes on it, which puts it worse off than general
replication. OTOH, it's a lot simpler code, which puts it better.

Either way, as long as it gets those eyes before release if we put it
in, it shouldn't be worse off than general replication.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: new patch of MERGE (merge_204) & a question about duplicated ctid

2010-12-30 Thread Marko Tiikkaja

On 2010-12-30 9:02 AM +0200, Greg Smith wrote:

Marko Tiikkaja wrote:

I have no idea why it worked in the past, but the patch was never
designed to work for UPSERT.  This has been discussed in the past and
some people thought that that's not a huge deal.


It takes an excessively large lock when doing UPSERT, which means its
performance under a heavy concurrent load can't be good.  The idea is
that if the syntax and general implementation issues can get sorted out,
fixing the locking can be a separate performance improvement to be
implemented later.  Using MERGE for UPSERT is the #1 use case for this
feature by a gigantic margin.  If that doesn't do what's expected, the
whole implementation doesn't provide the community anything really worth
talking about.  That's why I keep hammering on this particular area in
all my testing.


I'm confused.  Are you saying that the patch is supposed to lock the 
table against concurrent INSERT/UPDATE/DELETE/MERGE?  Because I don't 
see it in the patch, and the symptoms you're having are a clear 
indication of the fact that it's not happening.  I also seem to recall 
that people thought locking the table would be excessive.



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 20:12, Alvaro Herrera
 wrote:
> Excerpts from Magnus Hagander's message of Wed Dec 29 11:40:34 -0300 2010:
>> On Wed, Dec 29, 2010 at 15:05, Gurjeet Singh  wrote:
>
>> > Any specific reason NOREPLICATION_P and REPLICATION_P use the _P suffix?
>>
>> Um, I just copied it off a similar entry elsewhere. I saw no comment
>> about what _P actually means, and I can't say I know. I know very
>> little about the bison files :-)
>
> Some lexer keywords have a _P prefix because otherwise they'd collide
> with some symbol in Windows header files or something like that.  It's
> old stuff, but I think you, Magnus, were around at that time.

Heh. That doesn't mean I *remember* it :-)

But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the
comment "convert some keywords.c symbols to KEYWORD_P to prevent
conflict".

Based on that, I should probably change it back, right? I just tried a
patch for it and it compiles and checks just fine with the _P parts
removed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Aidan Van Dyk
On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander  wrote:

>> As the README says, it is not self-contained (through no fault of its own) and
>> one should typically set archive_command to guarantee zero WAL loss.
>
> Yes. Though you can combine it fine with wal_keep_segments if you
> think that's safe - but archive_command is push and this tool is pull,
> so if your backup server goes down for a while, pg_streamrecv will get
> a gap and fail. Whereas if you configure an archive_command, it will
> queue up the log on the master if it stops working, up to the point of
> shutting it down because of out-of-disk. Which you *want*, if you want
> to be really sure about the backups.

I was thinking I'd like to use pg_streamrecv to "make" my archive, and
the archive script on the master would just "verify" that the archive
has that complete segment.

This gets you an archive synced as it's made (as long as streamrecv
is running), and my "verify" archive command would make sure that if
for some reason the backup archive went "down", the wal segments
would be blocked on the master until it's up again and current.

a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
The snapshot synchronization discussion from the parallel pg_dump
patch somehow ended without a clear way to go forward.

Let me sum up what has been brought up and propose a short- and
longterm solution.

Summary:

Passing snapshot sync information can be done either:

a) by returning complete snapshot information from the backend to the
client so that the client can pass it along to a different backend
b) or by returning only a unique identifier to the client and storing
the actual snapshot data somewhere on the server side

Advantage of a: no memory is used in the backend and no memory needs
to get cleaned up, it is also theoretically possible that we could
forward that data to a hot standby server and do e.g. a dump partially
on the master server and partially on the hot standby server or among
several hot standby servers.
Disadvantage of a: The snapshot must be validated to make sure that
its information is still current, it might be difficult to cover all
cases of this validation. A client can not only access exactly a
published snapshot, but just about any snapshot that fits and passes
the validation checks (this is more a disadvantage than an advantage
because it allows to see a database state that never existed in
reality).

Advantage of b: No validation necessary, as soon as the transaction
that publishes the snapshot loses that snapshot, it will also revoke
the snapshot information (either by removing a temp file or deleting
it from shared memory)
Disadvantage of b: It doesn't allow a snapshot to be installed on a
different server. It requires a serializable open transaction to hold
the snapshot.

What I am proposing now is the following:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.

This only costs us a few bytes for the checksum times max_connections in
shared memory and apart from resetting the checksum it does not have
cleanup and verification issues. Note that we are also free to change
the internal format of the chunk of data we return whenever we like,
so we are free to enhance this feature in the future, transparently to
the client.
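The publish/verify cycle above can be sketched as follows. This is a toy model under stated assumptions: a dict stands in for the per-backend checksum slots in shared memory, and all function names are hypothetical, not actual PostgreSQL API.

```python
import hashlib

_slots = {}  # backend id -> checksum of its currently published snapshot

def publish_snapshot(backend_id, snapshot_bytes):
    # Store only a checksum server-side; hand the full data to the client.
    _slots[backend_id] = hashlib.sha256(snapshot_bytes).hexdigest()
    return snapshot_bytes

def revoke_snapshot(backend_id):
    # E.g. when the publishing transaction ends and loses the snapshot.
    _slots.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    # Accept only if the publisher still advertises this exact snapshot.
    return _slots.get(backend_id) == hashlib.sha256(snapshot_bytes).hexdigest()
```

A tampered or stale chunk fails the checksum comparison, so no structural validation of the snapshot contents is needed.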


Thoughts?


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Thu, Dec 30, 2010 at 13:30, Aidan Van Dyk  wrote:
> On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander  wrote:
>
>>> As the README says, it is not self-contained (through no fault of its own) and
>>> one should typically set archive_command to guarantee zero WAL loss.
>>
>> Yes. Though you can combine it fine with wal_keep_segments if you
>> think that's safe - but archive_command is push and this tool is pull,
>> so if your backup server goes down for a while, pg_streamrecv will get
>> a gap and fail. Whereas if you configure an archive_command, it will
>> queue up the log on the master if it stops working, up to the point of
>> shutting it down because of out-of-disk. Which you *want*, if you want
>> to be really sure about the backups.
>
> I was thinking I'd like to use pg_streamrecv to "make" my archive, and
> the archive script on the master would just "verify" that the archive
> has that complete segment.
>
> This gets you an archive synced as it's made (as long as streamrecv
> is running), and my "verify" archive command would make sure that if
> for some reason the backup archive went "down", the wal segments
> would be blocked on the master until it's up again and current.

That's exactly the method I was envisioning, and in fact the one I am
using in a couple of cases - just haven't documented it properly :)

Since pg_streamrecv only moves a segment into the correct archive
location when it's completed, the archive_command only needs to check
if the file *exists* - if it does, it's transferred, if not, it
returns an error to make sure the wal segments don't get cleaned out.
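A sketch of that "check the file exists" archive_command, exercised here against a temporary directory. The paths and function names are hypothetical; in postgresql.conf the equivalent would be roughly `archive_command = 'test -f /path/to/archive/%f'`.

```python
import os
import tempfile

archive = tempfile.mkdtemp()  # stands in for the pg_streamrecv archive dir

def archive_command(wal_name):
    # Succeed only if pg_streamrecv has already moved the completed
    # segment into the archive; fail so the master keeps the segment.
    return os.path.exists(os.path.join(archive, wal_name))

# Simulate a segment pg_streamrecv has completed:
open(os.path.join(archive, "000000010000000000000001"), "w").close()

print(archive_command("000000010000000000000001"))  # True: safe to recycle
print(archive_command("000000010000000000000002"))  # False: keep on master
```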

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 3:55 AM, Mark Kirkwood
 wrote:
> Well, it is none of the things I considered.
>
> The problem seems to be due to use of "--delete" in the base backup rsync
> (see diff attached).  In fact I can now reproduce the uninitialized pages
> using the "bare bones" method:

Any time a relation is extended, we end up with a page of all zeros at
the end until the updated page is written out, which often doesn't
happen until the next checkpoint.  So it doesn't seem too mysterious
that you could end up with all zeroes pages on the standby initially,
but WAL replay ought to fix that.  I suppose the reason it isn't is
because you've excluded the backup label, so recovery will begin from
the wrong place.  Unless I'm missing something, that seems like a
really bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Old git repo

2010-12-30 Thread Magnus Hagander
Hi!

Are we ready to drop the old git mirror? The one that's still around
(as postgresql-old.git) from before we migrated the main repository to
git, and thus has the old hashes around.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Alvaro Herrera
Excerpts from Magnus Hagander's message of Thu Dec 30 08:57:09 -0300 2010:
> On Wed, Dec 29, 2010 at 20:12, Alvaro Herrera
>  wrote:

> > Some lexer keywords have a _P prefix because otherwise they'd collide
> > with some symbol in Windows header files or something like that.  It's
> > old stuff, but I think you, Magnus, were around at that time.
> 
> Heh. That doesn't mean I *remember* it :-)

:-)

> But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the
> comment "convert some keywords.c symbols to KEYWORD_P to prevent
> conflict".

Wow, what a mess of a patch ... nowadays this would be like 10 commits
(or so I hope) ... hey, did Bruce sabotage the qnx4 port surreptitiously?

> Based on that, I should probably change it back, right? I just tried a
> patch for it and it compiles and checks just fine with the _P parts
> removed.

Hmm, I wouldn't bother really.  It's not that important anyway, IMHO.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function for dealing with xlog data

2010-12-30 Thread Magnus Hagander
On Tue, Dec 28, 2010 at 16:30, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Excerpts from Magnus Hagander's message of Tue Dec 28 10:46:31 -0300 2010:
>>> Well, yeah, that was obvious ;) The question is, how much do we prefer
>>> the more elegant method? ;)
>
>> If we go the new type route, do we need it to have an implicit cast to
>> text, for backwards compatibility?
>
> I'd argue not.  Probably all existing uses are just selecting the
> function value.  What comes back to the client will just be the text
> form anyway.

That's certainly the only thing I've seen.


> I'm of the opinion that a new type isn't worth the work, myself,
> but it would mostly be up to whoever was doing the work.

Fair enough - at least enough people have said it won't be rejected
because it's done as a function rather than a datatype - so that seems
like the easiest way to proceed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Old git repo

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander  wrote:
> Are we ready to drop the old git mirror? The one that's still around
> (as postgresql-old.git) from before we migrated the main repository to
> git, and thus has the old hashes around.

I see no reason to drop that ever, or at least not any time soon.
What is it costing us?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Old git repo

2010-12-30 Thread Magnus Hagander
On Thu, Dec 30, 2010 at 15:28, Robert Haas  wrote:
> On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander  wrote:
>> Are we ready to drop the old git mirror? The one that's still around
>> (as postgresql-old.git) from before we migrated the main repository to
>> git, and thus has the old hashes around.
>
> I see no reason to drop that ever, or at least not any time soon.
> What is it costing us?

Some disk space, so almost nothing. And the potential that people grab
it by mistake - it adds a bit of confusion.

Looking at it from the other side, what's the use-case for keeping it?
If you want to "diff" against it or something like that, you can just
do that against your local clone (that you already had - if you
didn't, you shouldn't be using it at all)...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: new patch of MERGE (merge_204) & a question about duplicated ctid

2010-12-30 Thread Andrew Dunstan



On 12/30/2010 02:02 AM, Greg Smith wrote:

Marko Tiikkaja wrote:
I have no idea why it worked in the past, but the patch was never 
designed to work for UPSERT.  This has been discussed in the past and 
some people thought that that's not a huge deal.


It takes an excessively large lock when doing UPSERT, which means its 
performance under a heavy concurrent load can't be good.  The idea is 
that if the syntax and general implementation issues can get sorted 
out, fixing the locking can be a separate performance improvement to 
be implemented later.  Using MERGE for UPSERT is the #1 use case for 
this feature by a gigantic margin.  If that doesn't do what's 
expected, the whole implementation doesn't provide the community 
anything really worth talking about.  That's why I keep hammering on 
this particular area in all my testing.


One of the reflexive "I can't switch to PostgreSQL easily" stopping 
points for MySQL users is "I can't convert my ON DUPLICATE KEY UPDATE 
code".  Every other use for MERGE is a helpful side-effect of adding 
the implementation in my mind, but not the primary driver of why this 
is important.  My hints in this direction before didn't get adopted, 
so I'm saying it outright now:  this patch must have an UPSERT 
implementation in its regression tests.  And the first thing I'm going 
to do every time a new rev comes in is try and break it with the 
pgbench test I attached.  If Boxuan can start doing that as part of 
his own testing, I think development here might start moving forward 
faster.  I don't care so much about the rate at which concurrent 
UPSERT-style MERGE happens, so long as it doesn't crash.  But that's 
where this has been stuck for a while now.


I strongly agree. It *is* a huge deal.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Alvaro Herrera
Excerpts from Joachim Wieland's message of Thu Dec 30 09:31:47 -0300 2010:

> Advantage of b: No validation necessary, as soon as the transaction
> that publishes the snapshot loses that snapshot, it will also revoke
> the snapshot information (either by removing a temp file or deleting
> it from shared memory)
> Disadvantage of b: It doesn't allow a snapshot to be installed on a
> different server. It requires a serializable open transaction to hold
> the snapshot.

Why does it require a serializable transaction?  You could simply
register the snapshot in any transaction.  (Of course, the net effect
would be pretty similar to a serializable transaction).

> We return snapshot information as a chunk of data to the client. At
> the same time however, we set a checksum in shared memory to protect
> against modification of the snapshot. A publishing backend can revoke
> its snapshot by deleting the checksum and a backend that is asked to
> install a snapshot can verify that the snapshot is correct and current
> by calculating the checksum and comparing it with the one in shared
> memory.
> 
> This only costs us a few bytes for the checksum * max_connection in
> shared memory and apart from resetting the checksum it does not have
> cleanup and verification issues.

So one registered snapshot per transaction?  Sounds a reasonable
limitation (I doubt there's a use case for more than that, anyway).

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Florian Pflug
On Dec27, 2010, at 23:49 , Kevin Grittner wrote:
> Robert Haas  wrote:
> 
>> With respect to (b), I think I'd need to see a much more detailed
>> design for how you intend to make this work.  Off the top of my
>> head there seems to be some pretty serious feasibility problems.
> 
> I had one random thought on that -- it seemed like a large concern
> was that there would need to be at least an occasional scan of the
> entire table to rebuild the distinct value information.

I believe we could actually avoid that.

First, the paper "An Optimal Algorithm for the Distinct Elements Problem"
actually contains an algorithm which *does* handle deletes - it's called
the "L_0" estimate there.

Second, as Tomas pointed out, the stream-based estimator is essentially a
simplified version of a bloom filter. It starts out with a field of
N zero bits, and sets K of them to 1 for each value v in the stream.
Which bits are set to 1 depends on some hash function(s) H_i(v). It's
then easy to compute how many 1-bits you'd expect to find in the bit
field after seeing D distinct values, and by reversing that you can
estimate D from the number of 1-bits in the bit field.

To avoid having to rescan large tables, instead of storing one such
bit field, we'd store one per B pages of data. We'd then only need
to scan a range of B pages around every updated or deleted tuple,
and could afterwards compute a new global estimate of D by combining
the individual bit fields with bitwise or.

Since the need to regularly VACUUM tables hit by updates or deletes
won't go away any time soon, we could piggy-back the bit field
rebuilding onto VACUUM to avoid a second scan.

A good value for B would probably be around
1000 * <size of bit field> / <page size>. If the bit field needs ~100k,
that'd make B ~= 12000 pages ~= 100MB.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Florian Pflug
On Dec30, 2010, at 13:31 , Joachim Wieland wrote:
> We return snapshot information as a chunk of data to the client. At
> the same time however, we set a checksum in shared memory to protect
> against modification of the snapshot. A publishing backend can revoke
> its snapshot by deleting the checksum and a backend that is asked to
> install a snapshot can verify that the snapshot is correct and current
> by calculating the checksum and comparing it with the one in shared
> memory.

We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?

best regards,
Florian Pflug





Re: [HACKERS] Old git repo

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander  wrote:
> On Thu, Dec 30, 2010 at 15:28, Robert Haas  wrote:
>> On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander  wrote:
>>> Are we ready to drop the old git mirror? The one that's still around
>>> (as postgresql-old.git) from before we migrated the main repository to
>>> git, and thus has the old hashes around.
>>
>> I see no reason to drop that ever, or at least not any time soon.
>> What is it costing us?
>
> Some disk space, so almost nothing. And the potential that people grab
> it by mistake - it adds a bit to confusion.

Well if it's clearly labeled "old" I don't think it should confuse
anyone much.  You could even tack one more commit on there adding a
README file with a big ol' warning.

> Looking at it from the other side, what's the use-case for keeping it?
> If you want to "diff" against it or something like that, you can just
> do that against your local clone (that you already had - if you
> didn't, you shouldn't be using it at all)...

I realize it's not as "official" as the CVS repository was, but I
still think we ought to hold onto it for a year or two.  Maybe no one
will ever look at it again, but I'm not prepared to bet on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Peter Eisentraut
On Thu, 2010-12-23 at 17:29 -0500, Tom Lane wrote:
> Josh Berkus  writes:
> > On 12/23/10 2:21 PM, Tom Lane wrote:
> >> Well, that's one laudable goal here, but "secure by default" is another
> >> one that ought to be taken into consideration.
> 
> > I don't see how *not* granting the superuser replication permissions
> > makes things more secure.  The superuser can grant replication
> > permissions to itself, so why is suspending them by default beneficial?
> >  I'm not following your logic here.
> 
> Well, the reverse of that is just as true: if we ship it without
> replication permissions on the postgres user, people can change that if
> they'd rather not create a separate role for replication.  But I think
> we should encourage people to NOT do it that way.  Setting it up that
> way by default hardly encourages use of a more secure arrangement.

I think this argument is a bit inconsistent in the extreme.  You might
as well argue that a superuser shouldn't have any permissions by
default, to discourage users from using it.  They can always grant
permissions back to it.  I don't see why this particular one is so
different.

If we go down this road, we'll end up with a mess of permissions that a
superuser has and doesn't have.




Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Peter Eisentraut
On Wed, 2010-12-29 at 11:09 +0100, Magnus Hagander wrote:
> I've applied this version (with some minor typo-fixes).

This page is now somewhat invalidated:

http://developer.postgresql.org/pgdocs/postgres/role-attributes.html

First, it doesn't mention the replication privilege, and second it
continues to claim that superuser status bypasses all permission checks.




Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 9:54 AM, Peter Eisentraut  wrote:
> On Thu, 2010-12-23 at 17:29 -0500, Tom Lane wrote:
>> Josh Berkus  writes:
>> > On 12/23/10 2:21 PM, Tom Lane wrote:
>> >> Well, that's one laudable goal here, but "secure by default" is another
>> >> one that ought to be taken into consideration.
>>
>> > I don't see how *not* granting the superuser replication permissions
>> > makes things more secure.  The superuser can grant replication
>> > permissions to itself, so why is suspending them by default beneficial?
>> >  I'm not following your logic here.
>>
>> Well, the reverse of that is just as true: if we ship it without
>> replication permissions on the postgres user, people can change that if
>> they'd rather not create a separate role for replication.  But I think
>> we should encourage people to NOT do it that way.  Setting it up that
>> way by default hardly encourages use of a more secure arrangement.
>
> I think this argument is a bit inconsistent in the extreme.  You might
> as well argue that a superuser shouldn't have any permissions by
> default, to discourage users from using it.  They can always grant
> permissions back to it.  I don't see why this particular one is so
> different.
>
> If we go down this road, we'll end up with a mess of permissions that a
> superuser has and doesn't have.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SLRU API tweak

2010-12-30 Thread Alvaro Herrera
Excerpts from Kevin Grittner's message of Wed Dec 29 20:46:55 -0300 2010:
> Attached is a small patch to avoid putting an opaque structure into
> the slru.h file and using it in an external function call where
> external callers must always specify NULL.

Thanks, committed.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Tom Lane
I had an epiphany about this topic, or actually two of them.

1. Whether or not you think there's a significant performance reason
to support hash right joins, there's a functionality reason.  The
infrastructure for right join could just as easily do full joins.
And AFAICS, a hash full join would only require one hashable join
clause --- the other FULL JOIN ON conditions could be anything at all.
This is unlike the situation for merge join, where all the JOIN ON
conditions have to be mergeable or it doesn't work right.  So we could
greatly reduce the scope of the dreaded "FULL JOIN is only supported
with merge-joinable join conditions" error.  (Well, okay, it's not
*that* dreaded, but people complain about it occasionally.)

2. The obvious way to implement this would involve adding an extra bool
field to struct HashJoinTupleData.  The difficulty with that, and the
reason I'd been resistant to the whole idea, is that it'd eat up a full
word per hashtable entry because of alignment considerations.  (On
64-bit machines it'd be free because of alignment considerations, but
that's cold comfort when 32-bit machines are the ones pressed for
address space.)  But we only need one bit, so what about commandeering
an infomask bit in the tuple itself?  For the initial implementation
I'd be inclined to take one of the free bits in t_infomask2.  We could
actually get away with overlaying the flag bit with one of the tuple
visibility bits, since it will only be used in tuples that are in the
in-memory hash table, which don't need visibility info anymore.  But
that seems like a kluge that could wait until we really need the flag
space.
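The mechanics of point 2 can be sketched as a toy Python model (invented names; the real executor would work on HashJoinTupleData in C). Each hash-table entry carries one matched bit; probe rows without a partner are emitted null-extended, and a final sweep emits the never-matched build rows, which is what turns a right join into a full join. Only the key-equality clause needs to be hashable; extra_cond can be anything:

```python
def hash_full_join(build_rows, probe_rows, key, extra_cond=lambda b, p: True):
    """FULL JOIN with one hashable clause (key equality); extra_cond may be
    an arbitrary predicate. Each entry carries a matched flag: the one bit."""
    table = {}
    for b in build_rows:
        table.setdefault(key(b), []).append([b, False])   # [tuple, matched?]
    result = []
    for p in probe_rows:
        found = False
        for entry in table.get(key(p), []):
            if extra_cond(entry[0], p):
                result.append((entry[0], p))
                entry[1] = True
                found = True
        if not found:
            result.append((None, p))       # probe-side row with no partner
    for bucket in table.values():          # final sweep over the hash table
        for b, matched in bucket:
            if not matched:
                result.append((b, None))   # unmatched build rows: right side
    return result
```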

Comments?

regards, tom lane



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Wed, Dec 29, 2010 at 5:14 PM, David Fetter  wrote:
> On Wed, Dec 29, 2010 at 04:53:47PM -0500, Robert Haas wrote:
>> On Wed, Dec 29, 2010 at 4:09 AM, Heikki Linnakangas
>>  wrote:
>> > On 29.12.2010 06:54, Robert Haas wrote:
>> >>
>> >>  With the patch:
>> >>
>> >> rhaas=# cluster v;
>> >> ERROR:  views do not support CLUSTER
>> >
>> > "do not support" sounds like a missing feature, rather than a nonsensical
>> > command. How about something like "CLUSTER cannot be used on views"
>>
>> In the latest version of this patch, I created four translatable
>> strings per object type:
>>
>>  do not support %s (where %s is an SQL command)
>>  do not support constraints
>>  do not support rules
>>  do not support triggers
>>
>> It's reasonable enough to write "CLUSTER cannot be used on views", but
>> "constraints cannot be used on views" seems more awkward to me.
>> Or do we think that's OK?
>
> That particular one looks good insofar as it describes reality.  With
> predicate locks, etc., it may become untrue later, though :)

After further thought, I think it makes sense to change this around a
bit and create a family of functions that can be invoked like this:

void check_relation_for_FEATURE_support(Relation rel);

...where FEATURE is constraint, trigger, rule, index, etc.  The
function will be defined to throw an error if the relation isn't of a
type that can support the named feature.  The error message will be of
the form:

constraints can only be used on tables
triggers can be used only on tables and views
etc.

This avoids the need to define a separate error message for each
unsupported relkind, and I think it's just as informative as, e.g.,
"constraints cannot be used on <the relkind you tried to invoke it
on>".  We can adopt the same language for commands, e.g.:
"CLUSTER can only be used on tables".
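As a sketch, the proposed family of checks could look like this (a hypothetical Python model; the names and support table are invented, not the actual C API):

```python
# Which relation kinds can carry which feature (illustrative subset only).
SUPPORTED_RELKINDS = {
    "constraint": {"table"},
    "trigger": {"table", "view"},
    "rule": {"table", "view"},
    "index": {"table"},
}

def check_relation_for_feature_support(feature, rel_kind):
    """Raise if the relation's kind cannot support the named feature,
    with one message per feature rather than one per unsupported relkind."""
    allowed = SUPPORTED_RELKINDS[feature]
    if rel_kind not in allowed:
        kinds = " and ".join(k + "s" for k in sorted(allowed))
        raise ValueError(f"{feature}s can only be used on {kinds}")
```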

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane  wrote:
> I had an epiphany about this topic, or actually two of them.
>
> 1. Whether or not you think there's a significant performance reason
> to support hash right joins, there's a functionality reason.  The
> infrastructure for right join could just as easily do full joins.
> And AFAICS, a hash full join would only require one hashable join
> clause --- the other FULL JOIN ON conditions could be anything at all.
> This is unlike the situation for merge join, where all the JOIN ON
> conditions have to be mergeable or it doesn't work right.  So we could
> greatly reduce the scope of the dreaded "FULL JOIN is only supported
> with merge-joinable join conditions" error.  (Well, okay, it's not
> *that* dreaded, but people complain about it occasionally.)

Yeah, that would be neat.  It might be a lot faster in some cases, too.

> 2. The obvious way to implement this would involve adding an extra bool
> field to struct HashJoinTupleData.  The difficulty with that, and the
> reason I'd been resistant to the whole idea, is that it'd eat up a full
> word per hashtable entry because of alignment considerations.  (On
> 64-bit machines it'd be free because of alignment considerations, but
> that's cold comfort when 32-bit machines are the ones pressed for
> address space.)  But we only need one bit, so what about commandeering
> an infomask bit in the tuple itself?  For the initial implementation
> I'd be inclined to take one of the free bits in t_infomask2.  We could
> actually get away with overlaying the flag bit with one of the tuple
> visibility bits, since it will only be used in tuples that are in the
> in-memory hash table, which don't need visibility info anymore.  But
> that seems like a kluge that could wait until we really need the flag
> space.

I think that's a reasonable approach, although I might be inclined to
do the overlay sooner rather than later if it doesn't add too much
complexity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Tom Lane
Magnus Hagander  writes:
> But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the
> comment "convert some keywords.c symbols to KEYWORD_P to prevent
> conflict".

> Based on that, I should probably change it back, right? I just tried a
> patch for it and it compiles and checks just fine with the _P parts
> removed.

I'd leave it be, it's fine as-is.

regards, tom lane



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Thu Dec 30 12:47:42 -0300 2010:

> After further thought, I think it makes sense to change this around a
> bit and create a family of functions that can be invoked like this:
> 
> void check_relation_for_FEATURE_support(Relation rel);
> 
> ...where FEATURE is constraint, trigger, rule, index, etc.  The
> function will be defined to throw an error if the relation isn't of a
> type that can support the named feature.  The error message will be of
> the form:
> 
> constraints can only be used on tables
> triggers can be used only on tables and views
> etc.

So this will create a combinatorial explosion of strings to translate?
I liked the other idea because the number of translatable strings was
kept within reasonable bounds.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Old git repo

2010-12-30 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander  wrote:
>> On Thu, Dec 30, 2010 at 15:28, Robert Haas  wrote:
>>> I see no reason to drop that ever, or at least not any time soon.
>>> What is it costing us?

>> Some disk space, so almost nothing. And the potential that people grab
>> it by mistake - it adds a bit to confusion.

> I realize it's not as "official" as the CVS repository was, but I
> still think we ought to hold onto it for a year or two.  Maybe no one
> will ever look at it again, but I'm not prepared to bet on that.

I'm with Magnus on this: the risk of confusion seems to greatly
outweigh any possible benefit from keeping it.  There is no reason for
anyone to use that old repo unless they are still working with a local
clone of it, and even if they do have a local clone, such a clone is
self-sufficient.  And more to the point, it seems quite unlikely that
anyone is still working with such a clone rather than having rebased
by now.

We should wait a week or so to see if anyone does pipe up and say they
still use that repo; but in the absence of such feedback, it should go.

regards, tom lane



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 11:00 AM, Alvaro Herrera
 wrote:
> Excerpts from Robert Haas's message of Thu Dec 30 12:47:42 -0300 2010:
>
>> After further thought, I think it makes sense to change this around a
>> bit and create a family of functions that can be invoked like this:
>>
>> void check_relation_for_FEATURE_support(Relation rel);
>>
>> ...where FEATURE is constraint, trigger, rule, index, etc.  The
>> function will be defined to throw an error if the relation isn't of a
>> type that can support the named feature.  The error message will be of
>> the form:
>>
>> constraints can only be used on tables
>> triggers can be used only on tables and views
>> etc.
>
> So this will create a combinatorial explosion of strings to translate?
> I liked the other idea because the number of translatable strings was
> kept within reasonable bounds.

No, quite the opposite.  With the other approach, you needed:

constraints cannot be used on views
constraints cannot be used on composite types
constraints cannot be used on TOAST tables
constraints cannot be used on indexes
constraints cannot be used on foreign tables

With this, you just need:

constraints can only be used on tables

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Old git repo

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 11:02 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander  wrote:
>>> On Thu, Dec 30, 2010 at 15:28, Robert Haas  wrote:
 I see no reason to drop that ever, or at least not any time soon.
 What is it costing us?
>
>>> Some disk space, so almost nothing. And the potential that people grab
>>> it by mistake - it adds a bit to confusion.
>
>> I realize it's not as "official" as the CVS repository was, but I
>> still think we ought to hold onto it for a year or two.  Maybe no one
>> will ever look at it again, but I'm not prepared to bet on that.
>
> I'm with Magnus on this: the risk of confusion seems to greatly
> outweigh any possible benefit from keeping it.  There is no reason for
> anyone to use that old repo unless they are still working with a local
> clone of it, and even if they do have a local clone, such a clone is
> self-sufficient.  And more to the point, it seems quite unlikely that
> anyone is still working with such a clone rather than having rebased
> by now.
>
> We should wait a week or so to see if anyone does pipe up and say they
> still use that repo; but in the absence of such feedback, it should go.

Well, I still have at least one repo against the old repository, which
is why I mentioned it.  Maybe there's nothing valuable in there and
maybe I don't need the origin anyway, but I haven't bothered to check
it over carefully yet because, well, there's no rush to clean up my
old repositories, and there is a rush to finish 9.1 development real
soon now.  I can, of course, carve out time to deal with it, but I
think that it's a poor use of time and that the risk of confusion that
you and Magnus are postulating is mostly hypothetical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Tom Lane
Robert Haas  writes:
> On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane  wrote:
>> ... But we only need one bit, so what about commandeering
>> an infomask bit in the tuple itself?  For the initial implementation
>> I'd be inclined to take one of the free bits in t_infomask2.  We could
>> actually get away with overlaying the flag bit with one of the tuple
>> visibility bits, since it will only be used in tuples that are in the
>> in-memory hash table, which don't need visibility info anymore.  But
>> that seems like a kluge that could wait until we really need the flag
>> space.

> I think that's a reasonable approach, although I might be inclined to
> do the overlay sooner rather than later if it doesn't add too much
> complexity.

Well, there's no "complexity" involved, it's just which bit do we define
the macro as.  Any complexity is conceptual.

regards, tom lane



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Tom Lane
Robert Haas  writes:
> After further thought, I think it makes sense to change this around a
> bit and create a family of functions that can be invoked like this:
> void check_relation_for_FEATURE_support(Relation rel);

That seems like a reasonable idea, but ...

> ... The error message will be of the form:

> constraints can only be used on tables
> triggers can be used only on tables and views
> etc.

> This avoids the need to define a separate error message for each
> unsupported relkind, and I think it's just as informative as, e.g.,
> "constraints cannot be used on <the relkind you tried to
> invoke it on>".  We can adopt the same language for commands, e.g.:
> "CLUSTER can only be used on tables".

ISTM there are four things we might potentially want to state in the
error message: the feature/operation you tried to apply, the name of the
object you tried to apply it to, the type of that object, and the set of
object types that the feature/operation will actually work for.  Our
current wording ("foo is not a table or view") covers the second and
fourth of these, though the fourth is stated rather awkwardly.  Your
proposal above covers the first and fourth.  I'm not happy about leaving
out the object name, because there are going to be cases where people
get this type of error out of a long sequence or nest of operations and
it's not clear what it's talking about.  It'd probably be okay to leave
out the actual object type as long as you include its name, though.

One possibility is to break it down like this:

ERROR: foo is a sequence
DETAIL: Triggers can only be used on tables and views.

This could still be emitted by a function such as you suggest, and
indeed that would be the most practical way from both a consistency
and code-size standpoint.

regards, tom lane



Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Jie Li
On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas  wrote:

> On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane  wrote:
> > I had an epiphany about this topic, or actually two of them.
> >
> > 1. Whether or not you think there's a significant performance reason
> > to support hash right joins, there's a functionality reason.  The
> > infrastructure for right join could just as easily do full joins.
> > And AFAICS, a hash full join would only require one hashable join
> > clause --- the other FULL JOIN ON conditions could be anything at all.
> > This is unlike the situation for merge join, where all the JOIN ON
> > conditions have to be mergeable or it doesn't work right.  So we could
> > greatly reduce the scope of the dreaded "FULL JOIN is only supported
> > with merge-joinable join conditions" error.  (Well, okay, it's not
> > *that* dreaded, but people complain about it occasionally.)
>
> Yeah, that would be neat.  It might be a lot faster in some cases, too.
>

Yeah, PostgreSQL should have this great feature.

Actually, Oracle 10g already has the right hash join:
http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html

And Oracle 11g has the full hash join:
http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm

I haven't checked whether other DBMSs have this feature.

Thanks,
Li Jie


[HACKERS] pl/python do not delete function arguments

2010-12-30 Thread Jan Urbański
(continuing the flurry of patches)

Here's a patch that stops PL/Python from removing the function's
arguments from its globals dict after calling it. It's
an incremental patch on top of the plpython-refactor patch sent in
http://archives.postgresql.org/message-id/4d135170.3080...@wulczer.org.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/dont-remove-arguments

Apart from being useless, as the whole dict is unreffed and thus freed
in PLy_procedure_delete, removing args actively breaks things for
recursive invocation of the same function. The recursive callee after
returning will remove the args from globals, and subsequent access to
the arguments in the caller will cause a NameError (see new regression
test in patch).
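The breakage is easy to reproduce in a toy Python model of the calling convention (invented names, not the actual PL/Python internals): one shared globals dict per function, with the pre-patch variant deleting the argument after every call:

```python
def broken_call(g, func, n):
    """Mimics pre-patch PL/Python: the argument is installed in the
    function's shared globals dict and deleted again after the call."""
    g["n"] = n
    try:
        return func(g)
    finally:
        del g["n"]          # a recursive callee already deleted it!

def fixed_call(g, func, n):
    g["n"] = n              # leave it; the whole dict is freed later anyway
    return func(g)

def run_factorial(call):
    g = {}                  # one globals dict per function, shared by all calls
    def body(gd):
        if gd["n"] <= 1:
            return 1
        # left operand is read before the recursive call clobbers gd["n"]
        return gd["n"] * call(g, body, gd["n"] - 1)
    return call(g, body, 3)
```

The fixed variant computes 3! = 6; the broken variant raises a KeyError because each recursive callee removes the shared argument that the caller's cleanup (or any later access) still expects to find.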

Cheers,
Jan
diff --git a/src/pl/plpython/expected/plpython_spi.out b/src/pl/plpython/expected/plpython_spi.out
index 7f4ae5c..cb11f60 100644
*** a/src/pl/plpython/expected/plpython_spi.out
--- b/src/pl/plpython/expected/plpython_spi.out
*** CONTEXT:  PL/Python function "result_nro
*** 133,135 
--- 133,163 
   2
  (1 row)
  
+ --
+ -- check recursion with same argument does not clobber globals
+ --
+ CREATE FUNCTION recursion_test(n integer) RETURNS integer
+ AS $$
+ if n in (0, 1):
+     return 1
+ 
+ return n * plpy.execute("select recursion_test(%d) as result" % (n - 1))[0]["result"]
+ $$ LANGUAGE plpythonu;
+ SELECT recursion_test(5);
+  recursion_test 
+ ----------------
+             120
+ (1 row)
+ 
+ SELECT recursion_test(4);
+  recursion_test 
+ ----------------
+              24
+ (1 row)
+ 
+ SELECT recursion_test(1);
+  recursion_test 
+ ----------------
+               1
+ (1 row)
+ 
diff --git a/src/pl/plpython/plpython.c b/src/pl/plpython/plpython.c
index 67eb0f3..1827fc9 100644
*** a/src/pl/plpython/plpython.c
--- b/src/pl/plpython/plpython.c
*** static Datum PLy_function_handler(Functi
*** 307,313 
  static HeapTuple PLy_trigger_handler(FunctionCallInfo fcinfo, PLyProcedure *);
  
  static PyObject *PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *);
- static void PLy_function_delete_args(PLyProcedure *);
  static PyObject *PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *,
  	   HeapTuple *);
  static HeapTuple PLy_modify_tuple(PLyProcedure *, PyObject *,
--- 307,312 
*** PLy_function_handler(FunctionCallInfo fc
*** 988,1001 
  			 */
  			plargs = PLy_function_build_args(fcinfo, proc);
  			plrv = PLy_procedure_call(proc, "args", plargs);
- 			if (!proc->is_setof)
- 			{
- /*
-  * SETOF function parameters will be deleted when last row is
-  * returned
-  */
- PLy_function_delete_args(proc);
- 			}
  			Assert(plrv != NULL);
  		}
  
--- 987,992 
*** PLy_function_handler(FunctionCallInfo fc
*** 1053,1060 
  Py_XDECREF(plargs);
  Py_XDECREF(plrv);
  
- PLy_function_delete_args(proc);
- 
  if (has_error)
  	ereport(ERROR,
  			(errcode(ERRCODE_DATA_EXCEPTION),
--- 1044,1049 
*** PLy_function_build_args(FunctionCallInfo
*** 1267,1287 
  	return args;
  }
  
- 
- static void
- PLy_function_delete_args(PLyProcedure *proc)
- {
- 	int			i;
- 
- 	if (!proc->argnames)
- 		return;
- 
- 	for (i = 0; i < proc->nargs; i++)
- 		if (proc->argnames[i])
- 			PyDict_DelItemString(proc->globals, proc->argnames[i]);
- }
- 
- 
  /* Decide if a cached PLyProcedure struct is still valid */
  static bool
  PLy_procedure_valid(PLyProcedure *proc, HeapTuple procTup)
--- 1256,1261 
diff --git a/src/pl/plpython/sql/plpython_spi.sql b/src/pl/plpython/sql/plpython_spi.sql
index 7f8f6a3..3b65f95 100644
*** a/src/pl/plpython/sql/plpython_spi.sql
--- b/src/pl/plpython/sql/plpython_spi.sql
*** else:
*** 105,107 
--- 105,123 
  $$ LANGUAGE plpythonu;
  
  SELECT result_nrows_test();
+ 
+ 
+ --
+ -- check recursion with same argument does not clobber globals
+ --
+ CREATE FUNCTION recursion_test(n integer) RETURNS integer
+ AS $$
+ if n in (0, 1):
+     return 1
+ 
+ return n * plpy.execute("select recursion_test(%d) as result" % (n - 1))[0]["result"]
+ $$ LANGUAGE plpythonu;
+ 
+ SELECT recursion_test(5);
+ SELECT recursion_test(4);
+ SELECT recursion_test(1);



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Thu Dec 30 13:49:20 -0300 2010:

> One possibility is to break it down like this:
> 
> ERROR: foo is a sequence
> DETAIL: Triggers can only be used on tables and views.
> 
> This could still be emitted by a function such as you suggest, and
> indeed that would be the most practical way from both a consistency
> and code-size standpoint.

This seems good to me.  There will only be as many messages as relkinds
we have, plus as many as "features" there are.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 11:49 AM, Tom Lane  wrote:
> One possibility is to break it down like this:
>
>        ERROR: foo is a sequence
>        DETAIL: Triggers can only be used on tables and views.
>
> This could still be emitted by a function such as you suggest, and
> indeed that would be the most practical way from both a consistency
> and code-size standpoint.

Great idea.  I should have thought of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Stefan Kaltenbrunner

On 12/30/2010 06:26 PM, Simon Riggs wrote:


> I've mulled over the design for sync rep for awhile now, and have come
> up with a feature set that includes the final detailed feedback from
> Fujii Masao, Aidan Van Dyk, Josh Berkus and others.
> 
> The design also draws from MySQL concepts to make the two interfaces as
> similar and as simple as possible. It should be noted that the design
> presented here has many features that the MySQL design does not.
> 
> I am currently finishing up my patch to offer these features, so it's
> time to begin final discussions.
> 
> As an interim step, I enclose a PDF version of relevant excerpts from
> the doc patch. The patch will follow on a later post in the near future.
> 
> I would like to separate discussions on "user interface" from that of
> internal design, to make it easier for more people to get involved.
> Please read the following and post your comments. Thank you.


It would help if this were just a simple text-only description of
the design that people can actually comment on inline. I don't think
sending technical design proposals as a PDF (which seems to be written
in doc style as well) is a good way to encourage discussion on -hackers :(



Stefan



[HACKERS] Problems with autovacuum and vacuum

2010-12-30 Thread JotaComm
Hello,

Last week I had a serious problem with my PostgreSQL database. My autovacuum
is OFF, but in September it started up to prevent transaction wraparound;
however, last week the following message appeared continuously in my log:

WARNING: database "production" must be vacuumed within 4827083 transactions
HINT: To avoid a database shutdown, execute a full-database VACUUM in
"production".

This message appeared for five to six hours; after that, it disappeared
from the log. Any idea what could have happened?

The vacuum is executed on some tables every day, and on all tables on
Sundays. But since the autovacuum has been running since September, and it
runs for a long time, the manual vacuum was blocked because autovacuum was
already working on the same table. How should I proceed in this case?

The table where the autovacuum is running and where the vacuum was blocked
has billion of rows.

I'm using the PostgreSQL 8.3.8

The configuration of the vacuum parameters are:

vacuum_cost_limit = 200
vacuum_cost_delay = 0
vacuum_freeze_min_age = 1
autovacuum = off
autovacuum_freeze_max_age = 2
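For readers trying to relate the warning to these settings, the arithmetic behind the "must be vacuumed within N transactions" number can be sketched as below. This is a hypothetical illustration; the constants mirror the safety margins PostgreSQL applies around XID wraparound as I understand the 8.3-era behavior, and exact values may differ by version:

```python
# Sketch of how far a database is from wraparound shutdown, given
# age(datfrozenxid).  Margins are assumptions based on 8.3-era behavior.
XID_SPAN    = 2**31        # XIDs available before wraparound (~2.1 billion)
STOP_MARGIN = 1_000_000    # new commands are refused this close to wraparound
WARN_MARGIN = 10_000_000   # warnings begin this close to the stop point

def xids_until_shutdown(oldest_frozen_age):
    """Transactions left before the forced shutdown point."""
    return (XID_SPAN - STOP_MARGIN) - oldest_frozen_age

def in_warning_zone(oldest_frozen_age):
    """True once the server starts emitting 'must be vacuumed' warnings."""
    return xids_until_shutdown(oldest_frozen_age) <= WARN_MARGIN
```

Under these assumptions, a warning of "within 4827083 transactions" corresponds to an age(datfrozenxid) of roughly 2**31 minus the stop margin minus 4,827,083.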

Regards,

João Paulo

-- 
JotaComm
http://jotacomm.wordpress.com


Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Jim Nasby
On Dec 29, 2010, at 10:14 PM, Robert Haas wrote:
> +1 for trying to optimize these cases (but maybe after we optimize the
> varchar -> text and varchar(less) -> varchar(more) cases to skip the
> scan altogether).

+1 on getting the obvious cases of varchar and numeric done first; we run into 
those a lot at work and would be willing to sponsor work on a patch that makes 
those operations as fast as just adding a new column.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:

> it would help if this would just be a simple text-only description of 
> the design that people can actually comment on inline. I don't think 
> sending technical design proposals as a pdf (which seems to be written 
> in doc-style as well) is a good idea to encourage discussion on -hackers :(

25.2.6. Synchronous Replication
Streaming replication is by default asynchronous. Transactions on the
primary server write commit records to WAL, yet do not know whether or
when a standby has received and processed those changes. So with
asynchronous replication, if the primary crashes, transactions committed
on the primary might not have been received by any standby. As a result,
failover from primary to standby could cause data loss because
transaction completions are absent, relative to the primary. The amount
of data loss is proportional to the replication delay at the time of
failover. 

Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication. 

When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby. 

Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only final top-level commits. Long
running actions such as data loading or index building do not wait until
the very final commit message. 


25.2.6.1. Basic Configuration
Synchronous replication must be enabled on both the primary and at least
one standby server. If synchronous replication is disabled on the
master, or enabled on the primary but not enabled on any slaves, the
primary will use asynchronous replication by default. 

We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers: 

synchronous_replication = off (default) | on

On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by application programs. 
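For concreteness, the per-user, per-database, and dynamic settings mentioned above would presumably use the standard GUC mechanisms. A hypothetical sketch follows; synchronous_replication is the parameter proposed here, not one in any released PostgreSQL, and the role and database names are invented:

```sql
-- postgresql.conf on both primary and standby (from the proposal):
--   synchronous_replication = on

ALTER ROLE billing_app SET synchronous_replication = on;   -- per-user
ALTER DATABASE orders SET synchronous_replication = on;    -- per-database
SET synchronous_replication = on;                          -- per-session/application
```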

If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits. 

Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start. 


25.2.6.2. Planning for Performance
Synchronous replication usually requires carefully planned and placed
standby servers to ensure applications perform acceptably. Waiting
doesn't utilise system resources, but transaction locks continue to be
held until the transfer is confirmed. As a result, incautious use of
synchronous replication will reduce performance for database
applications because of increased response times and higher contention. 

PostgreSQL allows the application developer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for specific users or
connections, or even individual transactions. 

For example, an application workload might consist of: 10% of changes
are important customer details, while 90% of changes are less important
data that the business can more easily survive if it is lost, such as
chat messages between users. 

With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL. 


25.2.6.3. Planning for High Availability
The easiest and safest method of gaining High Availability using
synchronous replication is to configure at least two standby servers. To
understand why, we need to examine what can happen when you lose all
standby servers. 

Commits made when synchronous_replication is set will wait until at
least one standby responds. The response may never occur if the last, or
only, standby should crash or the network drops. What should we do in
that situation? 

Sitting and waiting will typically cause operational problems because it
is an effective outage of the pr

Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 12:32 PM, Robert Haas  wrote:
> On Thu, Dec 30, 2010 at 11:49 AM, Tom Lane  wrote:
>> One possibility is to break it down like this:
>>
>>        ERROR: foo is a sequence
>>        DETAIL: Triggers can only be used on tables and views.
>>
>> This could still be emitted by a function such as you suggest, and
>> indeed that would be the most practical way from both a consistency
>> and code-size standpoint.
>
> Great idea.  I should have thought of that.

On further reflection, this can still turn into a laundry list in certain cases.

DETAIL: You can only comment on columns of tables, views, and composite types.

seems less helpful than:

DETAIL: Comments on relations with system-generated column names are
not supported.

I think that for rules, triggers, constraints, and anything that only
works on a single relkind, we can't do much better than to list the
specific object types.  But where there's some sort of guiding
principle involved I think we'd do well to articulate it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Tomas Vondra
Dne 30.12.2010 15:43, Florian Pflug napsal(a):
> On Dec27, 2010, at 23:49 , Kevin Grittner wrote:
>> Robert Haas  wrote:
>>
>>> With respect to (b), I think I'd need to see a much more detailed
>>> design for how you intend to make this work.  Off the top of my
>>> head there seems to be some pretty serious feasibility problems.
>>
>> I had one random thought on that -- it seemed like a large concern
>> was that there would need to be at least an occasional scan of the
>> entire table to rebuild the distinct value information.
> 
> I believe we could actually avoid that.
> 
> First, the paper "An Optimal Algorithm for the Distinct Elements Problem"
> actually contains an algorithm which *does* handle deletes - it's called
> the "L_0" estimate there.

Hmmm, that's interesting. I know there's a part about L_0 estimation,
but that's about estimating the Hamming norm of a vector - so I ignored
it, as I thought we couldn't use it to estimate the number of distinct
values. But if it really handles deletions and we can use it, then it's
really interesting.

> Second, as Tomas pointed out, the stream-based estimator is essentially a
> simplified version of a bloom filter. It starts out with a field of
> N zero bits, and sets K of them to 1 for each value v in the stream.
> Which bits are set to 1 depends on some hash function(s) H_i(v). It's
> then easy to compute how many 1-bits you'd expect to find in the bit
> field after seeing D distinct values, and by reversing that you can
> estimate D from the number of 1-bits in the bit field.

No, I didn't say the stream-based estimators are simplified versions
of a Bloom filter. I said the approach is very similar - all the
algorithms use bitmaps and hash functions - but the algorithms themselves
(Bloom filter vs. probabilistic counting and adaptive sampling) are very
different.

The Bloom filter is much more straightforward. The other algorithms are
much more sophisticated, which allows them to use less space.
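To make the bitmap idea concrete, here is a minimal sketch of one of the simpler bitmap-based estimators ("linear counting"). The hash choice and bitmap size are illustrative assumptions, and this is not the algorithm from the paper cited above:

```python
import hashlib
import math

M = 1 << 14  # bitmap size (illustrative)

def _bit(value):
    """Map a value to one bitmap position via a hash (SHA-1, arbitrarily)."""
    return int(hashlib.sha1(str(value).encode()).hexdigest(), 16) % M

def estimate_distinct(stream):
    """Linear-counting estimate of the number of distinct values in a stream."""
    bits = bytearray(M)
    for v in stream:
        bits[_bit(v)] = 1
    zero_fraction = (M - sum(bits)) / M
    # With D distinct values, the expected zero fraction is exp(-D/M),
    # so invert that to estimate D.
    return -M * math.log(zero_fraction)
```

Note that such an estimator is insert-only: clearing a bit on delete is unsafe because other values may hash to the same position, which is exactly why the thread discusses per-segment bitmaps and periodic rebuilds.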

> To avoid having to rescan large tables, instead of storing one such
> bit field, we'd store one per B pages of data. We'd then only need
> to scan a range of B pages around every updated or deleted tuple,
> and could afterwards compute a new global estimate of D by combining
> the individual bit fields with bitwise and.

I don't think this could help.

1) This works just with the Bloom filters, not with the other
   algorithms (you can't combine the segments using bitmap OR).

2) With heavily modified tables the updates are usually 'spread'
   through the whole table, so you'll have to rebuild all the
   segments anyway.

> Since the need to regularly VACUUM tables hit by updates or deletes
> won't go away any time soon, we could piggy-back the bit field
> rebuilding onto VACUUM to avoid a second scan.

Well, I guess it's a bit more complicated. First of all, there's a local
VACUUM when doing HOT updates. Second, you need to handle inserts too
(what if the table just grows?).

But I'm not a VACUUM expert, so maybe I'm wrong and this is the right
place to handle rebuilds of distinct stats.

I was thinking about something else - we could 'attach' the rebuild to
an actual seq scan if the amount of changes since the last rebuild
reaches some threshold. Only if the amount of changes reaches a higher
threshold would we rebuild the stats on our own.

Something like

IF (# of updates * deletes > 5%) THEN wait for seq scan
IF (# of updates * deletes > 10%) THEN rebuild the stats
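In code form, the two-threshold policy sketched above might look like this; the thresholds and the notion of "changed fraction" are illustrative assumptions, not a worked-out design:

```python
def distinct_stats_action(changed_rows, total_rows):
    """Decide how to refresh ndistinct stats, per the two-threshold idea."""
    frac = changed_rows / total_rows if total_rows else 0.0
    if frac > 0.10:
        return "rebuild"        # too stale: trigger our own scan
    if frac > 0.05:
        return "piggyback"      # attach the rebuild to the next seq scan
    return "keep"               # current stats are still good enough
```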

I've found a nice ppt describing how Oracle does this:

   http://www.oraclegeek.net/downloads/One_Pass_Distinct_Sampling.ppt

and there's PDF version

   http://www.oraclegeek.net/downloads/OnePassDistinctSampling.pdf

According to this, Oracle uses the probabilistic counting approach
(see slide 26). And once they find out there were too many changes in the
table, they rebuild the whole thing.

I'm not saying we should do exactly the same, just that this might be a
good direction.

regards
Tomas



Re: [HACKERS] C++ keywords in headers

2010-12-30 Thread Chris Browne
pete...@gmx.net (Peter Eisentraut) writes:

> On mån, 2010-12-27 at 12:33 -0500, Andrew Dunstan wrote:
>> On a more general point, it would be useful to have some
>> infrastructure for running quality checks like this and publishing
>> the results. We should be way beyond the point where we rely on
>> individuals doing this sort of stuff.
>
> I had a Hudson service set up for things like this, but the hosting
> was unreliable and then the thing faded away.  I could try to revive
> it.

Careful, Oracle has been trying to claim proprietary ownership of
that...
  
-- 
``God decided to take the  devil to court and settle their differences
once and for all.  When Satan heard of this, he grinned and said, "And
just where do you think you're going to find a lawyer?"''



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Treat
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs  wrote:

> On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:
>
> > it would help if this would just be a simple text-only description of
> > the design that people can actually comment on inline. I don't think
> > sending technical design proposals as a pdf (which seems to be written
> > in doc-style as well) is a good idea to encourage discussion on -hackers
> :(
>
> 25.2.6. Synchronous Replication
> Streaming replication is by default asynchronous. Transactions on the
> primary server write commit records to WAL, yet do not know whether or
> when a standby has received and processed those changes. So with
> asynchronous replication, if the primary crashes, transactions committed
> on the primary might not have been received by any standby. As a result,
> failover from primary to standby could cause data loss because
> transaction completions are absent, relative to the primary. The amount
> of data loss is proportional to the replication delay at the time of
> failover.
>
> Synchronous replication offers the ability to guarantee that all changes
> made by a transaction have been transferred to at least one remote
> standby server. This is an extension to the standard level of durability
> offered by a transaction commit. This is referred to as semi-synchronous
> replication.
>
> When synchronous replication is requested, the commit of a write
> transaction will wait until confirmation that the commit record has been
> transferred successfully to at least one standby server. Waiting for
> confirmation increases the user's confidence that the changes will not
> be lost in the event of server crashes but it also necessarily increases
> the response time for the requesting transaction. The minimum wait time
> is the roundtrip time from primary to standby.
>
> Read only transactions and transaction rollbacks need not wait for
> replies from standby servers. Subtransaction commits do not wait for
> responses from standby servers, only final top-level commits. Long
> running actions such as data loading or index building do not wait until
> the very final commit message.
>
>
> 25.2.6.1. Basic Configuration
> Synchronous replication must be enabled on both the primary and at least
> one standby server. If synchronous replication is disabled on the
> master, or enabled on the primary but not enabled on any slaves, the
> primary will use asynchronous replication by default.
>
> We use a single parameter to enable synchronous replication, set in
> postgresql.conf on both primary and standby servers:
>
> synchronous_replication = off (default) | on
>
> On the primary, synchronous_replication can be set for particular users
> or databases, or dynamically by application programs.
>
>
This seems like a potential issue, where I start a server with this off, and
then I start turning it on for specific transactions; it isn't exactly clear
what happens, since there may or may not be a running synchronous rep slave
available.  (I love the idea though)


> If more than one standby server specifies synchronous_replication, then
> whichever standby replies first will release waiting commits.
>
>
I don't want you to think I am setting an expectation, but I'm curious about
the possibility of requiring more than 1 server to reply?


> Turning this setting off for a standby allows the administrator to
> exclude certain standby servers from releasing waiting transactions.
> This is useful if not all standby servers are designated as potential
> future primary servers. On the standby, this parameter only takes effect
> at server start.
>
>
> 25.2.6.2. Planning for Performance
> Synchronous replication usually requires carefully planned and placed
> standby servers to ensure applications perform acceptably. Waiting
> doesn't utilise system resources, but transaction locks continue to be
> held until the transfer is confirmed. As a result, incautious use of
> synchronous replication will reduce performance for database
> applications because of increased response times and higher contention.
>
> PostgreSQL allows the application developer to specify the durability
> level required via replication. This can be specified for the system
> overall, though it can also be specified for specific users or
> connections, or even individual transactions.
>
> For example, an application workload might consist of: 10% of changes
> are important customer details, while 90% of changes are less important
> data that the business can more easily survive if it is lost, such as
> chat messages between users.
>
> With synchronous replication options specified at the application level
> (on the master) we can offer sync rep for the most important changes,
> without slowing down the bulk of the total workload. Application level
> options are an important and practical tool for allowing the benefits of
> synchronous replication for high performance applications. This feature
> is unique to PostgreS

Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Marti Raudsepp
Most of your doc uses the terms "primary" and "standby", but a few
instances of "master" and "slave" have slipped in. I think it's better
to stick to consistent terminology.

On Thu, Dec 30, 2010 at 21:04, Simon Riggs  wrote:
> With synchronous replication options specified at the application level
> (on the master) we can offer sync rep for the most important changes,
> without slowing down the bulk of the total workload. Application level
> options are an important and practical tool for allowing the benefits of
> synchronous replication for high performance applications. This feature
> is unique to PostgreSQL.

I think a comment about the "head-of-line blocking" nature of
streaming replication is in order. If you execute massive writes in
async mode and then run a transaction in sync mode, its commit will be
delayed until all the async transactions before it have been applied
on the slave.

> synchronous_replication_timeout (boolean)

Doesn't look like a boolean to me :)

Regards,
Marti



Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Alvaro Herrera
Excerpts from Tomas Vondra's message of Thu Dec 30 16:38:03 -0300 2010:

> > Since the need to regularly VACUUM tables hit by updates or deletes
> > won't go away any time soon, we could piggy-back the bit field
> > rebuilding onto VACUUM to avoid a second scan.
> 
> Well, I guess it's a bit more complicated. First of all, there's a local
> VACUUM when doing HOT updates. Second, you need to handle inserts too
> (what if the table just grows?).
> 
> But I'm not a VACUUM expert, so maybe I'm wrong and this is the right
> place to handle rebuilds of distinct stats.

I was thinking that we could have two different ANALYZE modes, one
"full" and one "incremental"; autovacuum could be modified to use one or
the other depending on how many changes there are (of course, the user
could request one or the other, too; not sure what should be the default
behavior).  So the incremental one wouldn't worry about deletes, only
inserts, and could be called very frequently.  The other one would
trigger a full table scan (or nearly so) to produce a better estimate in
the face of many deletions.

I haven't followed this discussion closely so I'm not sure that this
would be workable.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_dump --split patch

2010-12-30 Thread Robert Treat
On Thu, Dec 30, 2010 at 2:13 AM, Joel Jacobson  wrote:

> 2010/12/29 Dimitri Fontaine 
>
> Please have a look at getddl:
>>
>>  https://github.com/dimitri/getddl
>>
>>
> Nice! Looks like a nifty tool.
> When I tried it, "./getddl.py -f -F /crypt/funcs -d glue", I got the error
> "No such file or directory: 'sql/schemas.sql'".
>
> While the task of splitting objects into separate files could be solved by
> an external "wrapper tool" like yours around pg_dump,
> I argue it makes more sense to put the (minimally required) logic into
> pg_dump, for a number of reasons, most importantly because it's simpler
> and less complex, thus less error prone.
>
> My patch is only a few lines of code and doesn't add any logic to pg_dump;
> it merely reroutes the fwrite() calls based on the TOC entries.
>
> Just the fact that you and others had to create your own tools to do the
> splitting shows the feature is important, and I think it should be
> included in the normal pg_dump tool.
>

As someone whose own version of "getddl" helped inspire Dimitri to create
his version, I've enjoyed both reading this thread and seeing this wheel
reinvented yet again, and I wholeheartedly +1 the idea of building this
directly into pg_dump. (The only thing better would be to make everything
SQL callable, but that's a problem for another day.)


Robert Treat
http://www.xzilla.net


Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
> > If more than one standby server specifies synchronous_replication,
> then
> > whichever standby replies first will release waiting commits.

> I don't want you to think I am setting an expectation, but I'm curious
> about the possibility of requiring more than 1 server to reply?

I was initially interested in this myself, but after a long discussion
on "quorum commit" it was decided to go with "first past post".

That is easier to manage, requires one less parameter, performs better
and doesn't really add that much additional confidence.

It was also discussed that we would have a plugin API, but I'm less sure
about that now. Perhaps we can add that option in the future, but it's
not high on my list of things for this release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
> > We use a single parameter to enable synchronous replication, set in
> > postgresql.conf on both primary and standby servers:
> >
> > synchronous_replication = off (default) | on
> >
> > On the primary, synchronous_replication can be set for particular
> users
> > or databases, or dynamically by applications programs.
> >
> >
> This seems like a potential issue, where I start a server with this
> off, and then I start turning it on for specific transactions; it
> isn't exactly clear what happens, since there may or may not be a
> running synchronous rep slave available.  (I love the idea though)

Not really an issue. Even if there was a standby there a moment ago, the
standby can go away at any time. So we must cope gracefully with what
happens if you do this. By default, the parameters specify that in the
case you mention we will just use async replication (no wait!).
Options exist to change that, since some people want to wait until the
sysadmin adds a standby.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Stefan Kaltenbrunner

On 12/30/2010 08:04 PM, Simon Riggs wrote:

On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:


it would help if this would just be a simple text-only description of
the design that people can actually comment on inline. I don't think
sending technical design proposals as a pdf (which seems to be written
in doc-style as well) is a good idea to encourage discussion on -hackers :(


25.2.6. Synchronous Replication
Streaming replication is by default asynchronous. Transactions on the
primary server write commit records to WAL, yet do not know whether or
when a standby has received and processed those changes. So with
asynchronous replication, if the primary crashes, transactions committed
on the primary might not have been received by any standby. As a result,
failover from primary to standby could cause data loss because
transaction completions are absent, relative to the primary. The amount
of data loss is proportional to the replication delay at the time of
failover.

Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.

When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby.


hmm this is one of the main problems I see with the proposed "master is 
sometimes aware of the standby" (as in the feedback mode) concept in this 
proposal. If it waits for only one of the standbys, there is an issue 
with the terminology. As a DBA I would expect the master to return only 
once ALL of the nodes declared as "sync replication" have replied ok.





Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only final top-level commits. Long
running actions such as data loading or index building do not wait until
the very final commit message.


25.2.6.1. Basic Configuration
Synchronous replication must be enabled on both the primary and at least
one standby server. If synchronous replication is disabled on the
master, or enabled on the primary but not enabled on any slaves, the
primary will use asynchronous replication by default.

We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers:


this reads as if you can only set it there



synchronous_replication = off (default) | on

On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by application programs.


this says otherwise



If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits.


see above for why I think this violates the configuration promise - if I 
say "this is a sync standby" I better expect it to be...




Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start.


25.2.6.2. Planning for Performance
Synchronous replication usually requires carefully planned and placed
standby servers to ensure applications perform acceptably. Waiting
doesn't utilise system resources, but transaction locks continue to be
held until the transfer is confirmed. As a result, incautious use of
synchronous replication will reduce performance for database
applications because of increased response times and higher contention.

PostgreSQL allows the application developer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for specific users or
connections, or even individual transactions.

For example, an application workload might consist of: 10% of changes
are important customer details, while 90% of changes are less important
data that the business can more easily survive if it is lost, such as
chat messages between users.

With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL.


that seems t

Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Aidan Van Dyk
On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat  wrote:

>> If primary crashes while commits are waiting for acknowledgement, those
>> transactions will be marked fully committed if the primary database
>> recovers, no matter how allow_standalone_primary is set.
>
> This seems backwards; if you are waiting for acknowledgement, wouldn't the
> normal assumption be that the transactions *didnt* make it to any standby,
> and should be rolled back ?

This is the standard 2-phase commit problem.  The primary server *has*
committed it, its fsync has returned, and the only thing keeping it
from returning the commit to the client is that it's waiting on a
synchronous "ack" from a slave.

You've got 2 options:
1) initiate fsync on the slave first
   - In this case, the slave is slightly ahead of the primary, and if
the primary fails, you're *forced* to fail over. The standby is
ahead of the primary, so the primary recovering can cause divergence.
And you'll likely have to do a base-backup-style sync to get a new
primary/standby setup.
2) initiate fsync on the primary first
   - In this case, the slave is always slightly behind. If your
primary falls over, you don't give commit messages to the clients, but
if it recovers, it might have committed data, and slaves will still be
able to catch up.

The thing is that currently, even without replication, #2 can happen.
If your db falls over before it gets the commit packet out onto the
network, you're in the same boat. The data might be committed, even
though you didn't get the commit packet, and when your DB recovers,
it has committed data that you never "knew" was committed.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Treat
On Thu, Dec 30, 2010 at 3:36 PM, Simon Riggs  wrote:
>
> On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
> > > If more than one standby server specifies synchronous_replication, then
> > > whichever standby replies first will release waiting commits.
>
> > I don't want you to think I am setting an expectation, but I'm curious
> > about the possibility of requiring more than 1 server to reply?
>
> I was initially interested in this myself, but after a long discussion
> on "quorum commit" it was decided to go with "first past post".
>
> That is easier to manage, requires one less parameter, performs better
> and doesn't really add that much additional confidence.
>

Yes, I think with a single master, you are probably right (I've been
dealing with more than my fair share of multi-master-based NoSQL
solutions lately).

Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date if you
need to fail over. It'd be nice if you could just guarantee they both
are, but in lieu of that, I guess whatever decision tree is being
used needs to look at the current xlog location of any potential
failover targets.
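Such a decision tree could compare the WAL positions each candidate standby reports (for instance via pg_last_xlog_receive_location() in the 9.x releases). A hypothetical helper, assuming LSNs in the usual "xlogid/xrecoff" hex text form:

```python
# Hypothetical failover helper: pick the standby whose last received
# WAL position is furthest along. LSN text form is "xlogid/xrecoff",
# both fields in hex.

def lsn_to_int(lsn):
    """Convert 'xlogid/xrecoff' text form into a comparable integer."""
    xlogid, xrecoff = lsn.split("/")
    return (int(xlogid, 16) << 32) | int(xrecoff, 16)

def most_up_to_date(standbys):
    """standbys: dict of name -> last received LSN text; returns the
    name of the furthest-ahead standby."""
    return max(standbys, key=lambda name: lsn_to_int(standbys[name]))

candidates = {"standby1": "16/B374D848", "standby2": "16/B3751220"}
assert most_up_to_date(candidates) == "standby2"
```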

> It was also discussed that we would have a plugin API, but I'm less sure
> about that now. Perhaps we can add that option in the future, but its
> not high on my list of things for this release.
>

Agreed.

Robert Treat
http://www.xzilla.net



Re: [HACKERS] Old git repo

2010-12-30 Thread Jeff Davis
On Thu, 2010-12-30 at 11:02 -0500, Tom Lane wrote:
> I'm with Magnus on this: the risk of confusion seems to greatly
> outweigh any possible benefit from keeping it.  There is no reason for
> anyone to use that old repo unless they are still working with a local
> clone of it, and even if they do have a local clone, such a clone is
> self-sufficient.

The reason I originally asked for it to be kept around was not because
it's hard to rebase, but because there might be references to SHA1s from
that repo floating around.

I don't think these would be very common, nor critical, but I know I
wrote a few emails that included things like "look at this commit".
Personally, my utility for the old repo is not much (if it was anything
important, I wouldn't have relied on the unofficial repo). But we should
probably give a little bit of warning for folks that might want to
rebase or translate some old notes.

Regards,
Jeff Davis




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
> > When allow_standalone_primary is set, a user will stop waiting once the
> > replication_timeout has been reached for their specific session. Users
> > are not waiting for a specific standby to reply, they are waiting for a
> > reply from any standby, so the unavailability of any one standby is not
> > significant to a user. It is possible for user sessions to hit timeout
> > even though standbys are communicating normally. In that case, the
> > setting of replication_timeout is probably too low.
> >
> will a notice or warning be thrown in these cases? I'm thinking something
> like the checkpoint timeout warning, but could be something else; it just
> seems to me you need some way to know you're timing out.

We can do that, yes.

> > The standby sends regular status messages to the primary. If no status
> > messages have been received for replication_timeout the primary server
> > will assume the connection is dead and terminate it. This happens
> > whatever the setting of allow_standalone_primary.
> >
> Does the standby attempt to reconnect in these scenarios?

Yes it would, but the reason we terminated the connection was that it
wasn't talking any more, so it is probably dead.

> > If primary crashes while commits are waiting for acknowledgement, those
> > transactions will be marked fully committed if the primary database
> > recovers, no matter how allow_standalone_primary is set.
>
> This seems backwards; if you are waiting for acknowledgement, wouldn't the
> normal assumption be that the transactions *didn't* make it to any standby,
> and should be rolled back?

Well, we can't roll it back. We have already written the commit record
to WAL.

> > There is no way to be certain that all standbys have received all
> > outstanding WAL data at the time of the crash of the primary. Some
> > transactions may not show as committed on the standby, even though they
> > show as committed on the primary. The guarantee we offer is that the
> > application will not receive explicit acknowledgement of the successful
> > commit of a transaction until the WAL data is known to be safely received
> > by the standby. Hence this mechanism is technically "semi synchronous"
> > rather than "fully synchronous" replication. Note that replication will
> > still not be fully synchronous even if we wait for all standby servers,
> > though this would reduce availability, as described previously.
> >
> I think we ought to have an example of the best configuration for "cannot
> afford to lose any data" scenarios, where we would prefer an overall
> service interruption over the chance of having the primary / secondary out
> of sync.

I say "use two or more standbys" more than once...

> somewhat concerned that we seem to need to use double negatives to
> describe what's going on here. It makes me think we ought to rename this
> to require_synchronous_standby or similar.

Don't see why we can't use double negatives. ;-)

The parameter is named directly from Fujii Masao's suggestion.

> > 18.5.6. Standby Servers
> > These settings control the behavior of a standby server that is to
> > receive replication data.
> >

...

> I was expecting this section to mention the synchronous_replication (bool)
> somewhere, to control if the standby will participate synchronously or
> asynchronously; granted it's the same config as listed in 18.5.5, right?
> Just that the heading of that section specifically targets the primary.

OK, good idea.

> HTH, looks pretty good at first glance. 

Thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Stefan Kaltenbrunner

On 12/29/2010 07:42 PM, Robert Haas wrote:

On Dec 29, 2010, at 1:01 PM, Tom Lane  wrote:

Is it really stable enough for bin/?  My impression of the state of
affairs is that there is nothing whatsoever about replication that
is really stable yet.


Well, that's not stopping us from shipping a core feature called "replication". 
 I'll defer to others on how mature pg_streamrecv is, but if it's no worse than 
replication in general I think putting it in bin/ is the right thing to do.


well, I have not looked at how good pg_streamrecv really is, but we 
desperately need to fix the basic usability issues in our current 
replication implementation, and pg_streamrecv seems to be a useful tool 
to help with some of them.
All the people I talked to about SR were surprised by how 
complex and fragile the initial setup procedure is - what is missing is 
a simple and reliable tool to do a base backup over libpq, and 
also a simple way to have that tool tell the master "keep the WAL 
segments I need for starting the standby". I do realize we need to keep 
the ability to do the base backup out-of-line, but for 99% of the users it 
is too complex, scary and failure-prone (I know nobody who got the 
procedure right the first time - which is a strong hint that we need to 
work on that).




Stefan



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:

> Still, one thing that has me concerned is that in the case of two
> slaves, you don't know which one is the more up-to-date one if you
> need to failover. It'd be nice if you could just guarantee they both
> are...

Regrettably, nobody can know that, without checking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] small table left outer join big table

2010-12-30 Thread Dimitri Fontaine
Tom Lane  writes:
> I can't get all *that* excited about complicating hash joins as
> proposed.  The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table.  The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.

That sounds somewhat like Loose Indexscan as described in the following
wiki page, right?

  http://wiki.postgresql.org/wiki/Loose_indexscan

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Stefan Kaltenbrunner

On 12/30/2010 10:01 PM, Simon Riggs wrote:

On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:


Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date one if you
need to failover. It'd be nice if you could just guarantee they both
are...


Regrettably, nobody can know that, without checking.


how exactly would you check? - this seems like something that needs to 
be done from the SQL and the CLI level and also very well documented 
(which I cannot see in your proposal).




Stefan




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 22:11 +0200, Marti Raudsepp wrote:

> I think a comment about the "head-of-line blocking" nature of
> streaming replication is in order. If you execute massive writes in
> async mode and then run a transaction in sync mode, its commit will be
> delayed until all the async transactions before it have been applied
> on the slave.

Not really sure I understand what you want me to add there. The case you
mention is identical whether we use the word "async" or "sync" where you
mention "in async mode".

Replication doesn't wait until a sync commit is requested, it is
continuously active.

Sync rep's only addition is the reply messages.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 21:42 +0100, Stefan Kaltenbrunner wrote:
> >
> > Synchronous replication offers the ability to guarantee that all changes
> > made by a transaction have been transferred to at least one remote
> > standby server. This is an extension to the standard level of durability
> > offered by a transaction commit. This is referred to as semi-synchronous
> > replication.
> >
> > When synchronous replication is requested, the commit of a write
> > transaction will wait until confirmation that the commit record has been
> > transferred successfully to at least one standby server. Waiting for
> > confirmation increases the user's confidence that the changes will not
> > be lost in the event of server crashes but it also necessarily increases
> > the response time for the requesting transaction. The minimum wait time
> > is the roundtrip time from primary to standby.
> 
> hmm, this is one of the main problems I see with the proposed "master is
> sometimes aware of the standby" (as in the feedback mode) concept this
> proposal has. If it waits for only one of the standbys, there is some
> issue with the terminology. As a DBA I would expect the master to only
> return if ALL of the "sync replication" declared nodes replied ok.

Well, as a DBA, I expect it to work with just one. That's how MySQL and
Oracle work, at least. If ALL standbys must reply, it takes longer, makes
the code harder, and how do you robustly determine what "all" is, etc.?
Plus it's been discussed already.

> What I'm really missing with that proposal is how people expect that 
> solution to be managed - 

What aspect do you wish to monitor? I'm happy to consider your
suggestions.

> given there is only sometimes a feedback 
> channel into the master you can't do the monitoring.

Not sure what you mean. Please explain more.

> Even if you could (which we really need!) there is nothing in the 
> proposal yet that will help to determine on what the most recent standby 
> (in the case of more >1 sync standby) might be.

Functions to determine that already exist.

>  - but it would require a real standby 
> registration or at least standby management possibility on the master 
> not a halfway done one - so do we really need hot_standby_feedback as 
> part of the inital sync-rep patch?

It is a Hot Standby feature, but so tightly integrated with this code
that it isn't possible for me to submit it as two separate patches.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs  wrote:
> We use a single parameter to enable synchronous replication, set in
> postgresql.conf on both primary and standby servers:
>
> synchronous_replication = off (default) | on
>
> On the primary, synchronous_replication can be set for particular users
> or databases, or dynamically by applications programs.
>
> If more than one standby server specifies synchronous_replication, then
> whichever standby replies first will release waiting commits.
>
> Turning this setting off for a standby allows the administrator to
> exclude certain standby servers from releasing waiting transactions.
> This is useful if not all standby servers are designated as potential
> future primary servers. On the standby, this parameter only takes effect
> at server start.

I think it's a bad idea to use the same parameter to mean different
things on the master and standby.  You proposed this kind of double
meaning for the hot_standby parameter (possibly back when it was
called standby_connections, or something like that) and we (rightly, I
think) did not adopt that, instead ending up with wal_level to control
the master's behavior and hot_standby to control the slave's behavior.

> synchronous_replication (boolean)
>        Specifies whether transaction commit will wait for WAL records
>        to be replicated before the command returns a "success"
>        indication to the client.

The word "replicated" here could be taken to mean different things,
most obviously:

- slave has received the WAL
- slave has fsync'd the WAL
- slave has applied the WAL

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 22:08 +0100, Stefan Kaltenbrunner wrote:
> On 12/30/2010 10:01 PM, Simon Riggs wrote:
> > On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:
> >
> >> Still, one thing that has me concerned is that in the case of two
> >> slaves, you don't know which one is the more up-to-date one if you
> >> need to failover. It'd be nice if you could just guarantee they both
> >> are...
> >
> > Regrettably, nobody can know that, without checking.
> 
> how exactly would you check? - this seems like something that needs to 
> be done from the SQL and the CLI level and also very well documented 
> (which I cannot see in your proposal).

This is a proposal for sync rep, not multi-node failover. I'm definitely
not going to widen the scope of this project.

Functions already exist to check the thing you're asking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Heikki Linnakangas

On 30.12.2010 16:49, Florian Pflug wrote:

On Dec30, 2010, at 13:31 , Joachim Wieland wrote:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.
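The publish/revoke/verify scheme quoted above can be sketched like this. It is an illustrative model only: a Python dict stands in for shared memory, SHA-1 stands in for whatever checksum would actually be used, and all names are invented.

```python
# Sketch of checksum-protected snapshot export. A dict stands in for
# shared memory; the snapshot itself travels to the client as opaque
# bytes, and only its checksum stays server-side.
import hashlib

shared_memory = {}   # backend_id -> checksum of its published snapshot

def publish_snapshot(backend_id, snapshot_bytes):
    # Publishing backend records a checksum in "shared memory" and hands
    # the snapshot data to the client.
    shared_memory[backend_id] = hashlib.sha1(snapshot_bytes).hexdigest()
    return snapshot_bytes

def revoke_snapshot(backend_id):
    # Deleting the checksum revokes the snapshot.
    shared_memory.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    # A backend asked to adopt a snapshot recomputes the checksum and
    # compares it against the one still advertised in shared memory,
    # proving the snapshot is both unmodified and still current.
    expected = shared_memory.get(backend_id)
    return (expected is not None and
            hashlib.sha1(snapshot_bytes).hexdigest() == expected)

snap = publish_snapshot(1, b"xmin=100 xmax=105 xip=101,103")
assert install_snapshot(1, snap)              # current and unmodified
assert not install_snapshot(1, b"tampered")   # modified snapshot rejected
revoke_snapshot(1)
assert not install_snapshot(1, snap)          # revoked snapshot rejected
```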


We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?


Hmm, our definition of "read-only" is a bit fuzzy. While a transaction 
doesn't modify the database itself, it could still send NOTIFYs or call 
a PL function to do all sorts of things outside the database. Imagine 
that you're paranoid about data integrity, and have a security definer 
function that runs cross checks on the data. If it finds any 
anomalies, it wakes up the operator or forces a shutdown or similar.


Now a malicious user could set a snapshot that passes the basic validity 
checks, ie. xmin >= GlobalXmin, but contains a combination of in-progress 
transactions that never existed in reality. If he then calls the 
paranoia function, it would see an inconsistent state of committed 
tuples and get upset.


Maybe that's a bit far-fetched, but it's not entirely clear that 
running with an inconsistent snapshot is harmless.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 3:42 PM, Stefan Kaltenbrunner
 wrote:
>> synchronous replication for high performance applications. This feature
>> is unique to PostgreSQL.
>
> that seems to be a bit too much marketing for a reference level document

+1.

> It also does not address the more general (not sync rep specific) problem of
> how to deal with wal_keep_segments, which is a wart I was hoping we could
> get rid of in 9.1 - but it would require a real standby registration or at
> least standby management possibility on the master not a halfway done one -
> so do we really need hot_standby_feedback as part of the inital sync-rep
> patch?

And this is really the key point on which previous discussions of sync
rep stalled.  Simon is clearly of the opinion that any system where
the slaves have individual identities (aka "standby registration")
is a bad idea, but the only justification he's offered for that
position is the assertion that it doesn't allow any added
functionality.  As you point out, and as has been pointed out before,
this is not true, but unless Simon has changed his position since the
last time we discussed this, he will not only refuse to include any
kind of standby identifier in any of his proposals, but will also
argue against including any such code even if it is written by someone
else.  I don't understand why, but that's how it is.

Synchronous replication would probably be done and committed by now if
it weren't for this issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Heikki Linnakangas

On 30.12.2010 10:55, Mark Kirkwood wrote:

Removing the offending

--delete --exclude=backup_label

options from the base backup step makes everything work properly again.


I don't see why --delete would make any difference, but you shouldn't 
exclude backup_label from the base backup. The backup label file is an 
important part of the online backup, it cannot be recovered safely 
without it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Mark Kirkwood

On 31/12/10 11:01, Heikki Linnakangas wrote:

On 30.12.2010 10:55, Mark Kirkwood wrote:

Removing the offending

--delete --exclude=backup_label

options from the base backup step makes everything work properly again.


I don't see why --delete would make any difference, but you shouldn't 
exclude backup_label from the base backup. The backup label file is an 
important part of the online backup, it cannot be recovered safely 
without it.




Yes, you (and Robert) are entirely correct. I was confused in my 
understanding of "--delete --exclude=backup_label" and took it to 
mean "exclude the backup label from the delete". The --delete is 
harmless; it is excluding backup_label that is causing the problem.


Note to all current Pitrtools users, this impacts you! We need to get a 
corrected version out soon I would think.


I note that this issue of uninitialized pages on standbys has cropped up from 
time to time - I wonder if in most/all of the cases folk were using Pitrtools?


regards

Mark





Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Mark Kirkwood

On 31/12/10 11:11, Mark Kirkwood wrote:


Yes, you (and Robert) are entirely correct, I was confused in my 
understanding of the "--delete --exclude=backup_label" and thought it 
to mean "exclude the backup label from the delete". Yeah the --delete 
is harmless, it is the exclude backup_label that is causing the problem.


Note to all current Pitrtools users, this impacts you! We need to get 
a corrected version out soon I would think.




Also (not surprisingly) I can confirm that data corruption is possible:

1/ Perform approx 14 transactions against the primary
2/ Cancel Pgbench
3/ Issue "SELECT pg_switch_xlog()" on primary
4/ Bring up standby after checking it has applied last log

The resulting primary and standby should be identical, but:

primary:

bench=# SELECT count(*) FROM branches;
 count
---
   100

bench=# SELECT count(*) FROM accounts;
  count
--
 1000

standby:

bench=# SELECT count(*) FROM branches;
 count
---
   132

bench=# SELECT count(*) FROM accounts;
  count
-
 9998269

The other counts are the same. We have lost some accounts records, but 
have gained duplicates in branches:


bench=# REINDEX TABLE branches;
ERROR:  could not create unique index "branches_pkey"
DETAIL:  Table contains duplicated values.

regards

Mark




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Jim Nasby
On Dec 30, 2010, at 3:27 PM, Robert Haas wrote:
>> synchronous_replication (boolean)
>>Specifies whether transaction commit will wait for WAL records
>>to be replicated before the command returns a "success"
>>indication to the client.
> 
> The word "replicated" here could be taken to mean different things,
> most obviously:
> 
> - slave has received the WAL
> - slave has fsync'd the WAL
> - slave has applied the WAL

I think that comment is valid for the entire set of docs, actually. The 
document goes out of its way to avoid simple phrases like "replicated", but 
doesn't spell out exactly what is happening, ie:

"Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication."

Reading that, I'm left with the sense that this isn't a simple matter of "Oh, 
the data has been replicated to the slave before commit returns", but nothing 
does a good job of clearly explaining what the distinction is and what it 
means. This section:

"The guarantee we offer is that the application will not receive
explicit acknowledgement of the successful commit of a transaction until
the WAL data is known to be safely received by the standby. Hence this
mechanism is technically "semi synchronous" rather than "fully
synchronous" replication."

does provide some enlightenment, but it's at the end of the section. I think it 
would be best if there was a section right at the beginning that talked about 
the data quality issue of sync replication and how we're avoiding it with our 
semi-sync solution.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 16:27 -0500, Robert Haas wrote:

> I think it's a bad idea to use the same parameter to mean different
> things on the master and standby.  

Obviously if you phrase it like that, nobody would disagree. I would say
I have used the same parameter on both sides in a balanced way to
simplify the configuration, which had been an important factor in the
debate.

"You need to set parameter X on both primary and standby" seems simple
and clear. It certainly works OK for MySQL.

It's no bother to change, whichever way we decide and I'm happy to do
so.

My previous patch had two parameters:

primary: synchronous_replication = ...
standby: synchronous_replication_service = on | off

Which do people prefer?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 16:47 -0500, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 3:42 PM, Stefan Kaltenbrunner
>  wrote:
> >> synchronous replication for high performance applications. This feature
> >> is unique to PostgreSQL.
> >
> > that seems to be a bit too much marketing for a reference level document
> 
> +1.

I've removed the "This feature is unique to PostgreSQL", which I agree
belongs in a press release, not docs. The explanation of a use case that
would benefit from the feature seems valid and I've left that in.

PostgreSQL docs are more technical and precise than any other DBMS, even
DB2. Having read everybody else's docs, I'm inclined to say it would be
easier to explain if I left out the details, as they do. You won't find
a detailed explanation of commit guarantees in MySQL docs, for example.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Noah Misch
On Thu, Dec 30, 2010 at 12:57:45AM -0500, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 12:24 AM, Noah Misch  wrote:
> > On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote:
> >> I think for any pair of types (T1, T2) we should first determine
> >> whether we can skip the scan altogether.  If yes, we're done.  If no,
> >> then we should have a way of determining whether a verify-only scan is
> >> guaranteed to be sufficient (in your terminology, the verification
> >> scan is guaranteed to return either positive or error, not negative).
> >> If yes, then we do a verification scan.  If no, we do a rewrite.
> >
> > How would we answer the second question in general?
> 
> I am not sure - I guess we'd need to design some sort of mechanism for that.

Okay, here goes.  Given a Var "varexpr" representing the column we're changing
and an expression tree "expr" we need to answer two questions (argument lists
simplified -- assume the same RTEs in all cases):

always-noop: "Will datumIsEqual(ExecEvalExpr(varexpr), ExecEvalExpr(expr))
return true or yield an error for all possible tuples?"
never-error: "Will ExecEvalExpr(expr) never throw an error?"

Currently we're only interested in the second question when the first is also
true; I'm not sure if there's something fundamental there, or just an artifact
of current needs.  To support answering these questions, extend the CREATE CAST
changes from my earlier proposal, modifying the exemptor signature to return an
int, a bitmask containing one bit for each of these two questions.  Call the
function in find_typmod_coercion_function.  If its return value answers "yes" to
both questions, return COERCION_PATH_NONE, resulting in omission of the length
coercion node.  For other verdicts, generate the FuncExpr as normal and insert
the verdict in a new FuncExpr field "funcexempt".  (That need not increase the
size of FuncExpr, if that's a concern.)

ATPrepAlterColumnType, having generated its transformation expression, will call
a new function that recursively walks the tree to answer the two questions.  The
walker will apply these rules:

1. For a Var with the varno/varattno in question, intrinsically "yes" to both.
2. A RelabelType node inherits the answers of its sole argument.
3. A CoerceToDomain node inherits the always-noop answer of its sole argument.
When GetDomainConstraints() == NIL, it also inherits the never-error answer.
Otherwise, never-error becomes "no".
4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field
and the answers from its first argument.
5. Any other node answers "no" to both questions.

If the transformation expression root has "yes" to both questions, we're done
with no scan.  If only always-noop is true, we do a verification scan only.
Otherwise, we optimize nothing and do a rewrite.
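The five rules above can be sketched as a recursive walk. This is only an illustrative model of the proposal: dicts stand in for the PostgreSQL node types named in the rules, and the example exemptor verdicts (widening vs. narrowing varchar) are assumed, not taken from any real catalog.

```python
# Sketch of the proposed expression-tree walk. Each node answers the
# pair (always_noop, never_error); "funcexempt" carries the exemptor's
# two-bit verdict (bit 1 = always-noop, bit 2 = never-error).

def walk(node, varno, varattno):
    kind = node["kind"]
    if kind == "Var":                          # rule 1: the column itself
        ok = node["varno"] == varno and node["varattno"] == varattno
        return (ok, ok)
    if kind == "RelabelType":                  # rule 2: inherit both answers
        return walk(node["arg"], varno, varattno)
    if kind == "CoerceToDomain":               # rule 3: constraints may error
        noop, noerr = walk(node["arg"], varno, varattno)
        return (noop, noerr and not node["constraints"])
    if kind == "FuncExpr":                     # rule 4: AND with funcexempt
        noop, noerr = walk(node["args"][0], varno, varattno)
        return (noop and bool(node["funcexempt"] & 1),
                noerr and bool(node["funcexempt"] & 2))
    return (False, False)                      # rule 5: anything else

var = {"kind": "Var", "varno": 1, "varattno": 2}
# e.g. widening a varchar: exemptor says noop + no-error -> no scan at all
widen = {"kind": "FuncExpr", "funcexempt": 0b11, "args": [var]}
assert walk(widen, 1, 2) == (True, True)       # skip the scan entirely
# e.g. narrowing a varchar: noop when the value fits, but it may error
narrow = {"kind": "FuncExpr", "funcexempt": 0b01, "args": [var]}
assert walk(narrow, 1, 2) == (True, False)     # verification scan only
```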

Thoughts?

Thanks,
nm



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 18:47 -0600, Jim Nasby wrote:
> On Dec 30, 2010, at 3:27 PM, Robert Haas wrote:
> >> synchronous_replication (boolean)
> >>Specifies whether transaction commit will wait for WAL records
> >>to be replicated before the command returns a "success"
> >>indication to the client.
> > 
> > The word "replicated" here could be taken to mean different things,
> > most obviously:
> > 
> > - slave has received the WAL
> > - slave has fsync'd the WAL
> > - slave has applied the WAL
> 
> I think that comment is valid for the entire set of docs, actually. The 
> document goes out of its way to avoid simple phrases like "replicated", but 
> doesn't spell out exactly what is happening, ie:
> 
> "Synchronous replication offers the ability to guarantee that all changes
> made by a transaction have been transferred to at least one remote
> standby server. This is an extension to the standard level of durability
> offered by a transaction commit. This is referred to as semi-synchronous
> replication."
> 
> Reading that, I'm left with the sense that this isn't a simple matter of "Oh, 
> the data has been replicated to the slave before commit returns", but nothing 
> does a good job of clearly explaining what the distinction is and what it 
> means. This section:
> 
> "The guarantee we offer is that the application will not receive
> explicit acknowledgement of the successful commit of a transaction until
> the WAL data is known to be safely received by the standby. Hence this
> mechanism is technically "semi synchronous" rather than "fully
> synchronous" replication."
> 
> does provide some enlightenment, but it's at the end of the section. I think 
> it would be best if there was a section right at the beginning that talked 
> about the data quality issue of sync replication and how we're avoiding it 
> with our semi-sync solution.

I'm happy to change the docs. It's the first draft...

If that's the only problem you've got, then I'm feeling good.

Any problems with the user interface itself?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 16:47 -0500, Robert Haas wrote:

> > It also does not address the more general (not sync rep specific) problem of
> > how to deal with max_keep_segments which is a wart and I was hoping we could
> > get rid of in 9.1 - but it would require a real standby registration or at
> > least standby management possibility on the master not a halfway done one -
> > so do we really need hot_standby_feedback as part of the inital sync-rep
> > patch?
> 
> And this is really the key point on which previous discussions of sync
> rep stalled.  Simon is clearly of the opinion that any system where
> the slaves have an individual identities (aka "standby registration")
> is a bad idea, but the only justification he's offered for that
> position is the assertion that it doesn't allow any added
> functionality.  As you point out, and as has been pointed out before,
> this is not true, but unless Simon has changed his position since the
> last time we discussed this, he will not only refuse to include any
> kind of standby identifier in any of his proposals, but will also
> argue against including any such code even if it is written by someone
> else.  I don't understand why, but that's how it is.
> 
> Synchronous replication would probably be done and committed by now if
> it weren't for this issue.

I'm not very clear what your response has to do with Stefan's comments.

My general perspective is that MySQL released a simple design a year
ahead of us, which should be to our collective shame. I will be working
towards delivering something useful in this release.

Standby registration is complicated and not necessary. If anybody needs
to justify anything, it is the people that claim it is somehow
essential. If you want increased complexity and features, you can have
it, one day, but don't prevent everybody else from benefiting from
simplicity, now. What we do need is performance; otherwise the feature
is mostly unusable for production systems without splitting your
application into pieces.

I would rather concentrate on a minimal set of functionality that we can
all agree on. To show that, I have gone out of my way to include
features specified by others, including exact names and behaviours of
parameters.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Tom Lane
Robert Haas  writes:
> On further reflection, this can still turn into a laundry list in certain 
> cases.

> DETAIL: You can only comment on columns of tables, views, and composite types.

> seems less helpful than:

> DETAIL: Comments on relations with system-generated column names are
> not supported.

> I think that for rules, triggers, constraints, and anything that only
> works on a single relkind, we can't do much better than to list the
> specific object types.  But where there's some sort of guiding
> principle involved I think we'd do well to articulate it.

I'm unconvinced, because the "guiding principle" is likely to be an
implementation detail that won't actually mean much to users.  Your
example above is a case in point --- I do *not* think the average
user will see that as an improvement.

regards, tom lane



Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Tom Lane
Alvaro Herrera  writes:
> I was thinking that we could have two different ANALYZE modes, one
> "full" and one "incremental"; autovacuum could be modified to use one or
> the other depending on how many changes there are (of course, the user
> could request one or the other, too; not sure what should be the default
> behavior).

How is an incremental ANALYZE going to work at all?  It has no way to
find out the recent changes in the table, for *either* inserts or
deletes.  Unless you want to seqscan the whole table looking for tuples
with xmin later than something-or-other ... which more or less defeats
the purpose.

regards, tom lane



Re: [HACKERS] Old git repo

2010-12-30 Thread Tom Lane
Jeff Davis  writes:
> Personally, my utility for the old repo is not much (if it was anything
> important, I wouldn't have relied on the unofficial repo). But we should
> probably give a little bit of warning for folks that might want to
> rebase or translate some old notes.

Well, I guess the question is how much warning.  I suggested O(1 week)
but Robert seems to want O(1 year).  As long as there's some agreed
deadline, I'm not very picky about what it is.

regards, tom lane




Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:58 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On further reflection, this can still turn into a laundry list in certain 
>> cases.
>
>> DETAIL: You can only comment on columns of tables, views, and composite 
>> types.
>
>> seems less helpful than:
>
>> DETAIL: Comments on relations with system-generated column names are
>> not supported.
>
>> I think that for rules, triggers, constraints, and anything that only
>> works on a single relkind, we can't do much better than to list the
>> specific object types.  But where there's some sort of guiding
>> principle involved I think we'd do well to articulate it.
>
> I'm unconvinced, because the "guiding principle" is likely to be an
> implementation detail that won't actually mean much to users.  Your
> example above is a case in point --- I do *not* think the average
> user will see that as an improvement.

I think this thread has worked itself around to where it's entirely
pointless.  My original complaint was about error messages like this:

"%s" is not a table, view, composite type, or index

which, once we have foreign tables, needs to be changed to read:

"%s" is not a table, view, composite type, index, or foreign table

I think that message is the epitome of worthless, and several other
people agreed.  After various proposals of greater and lesser merit,
we've somehow worked around to the suggestion that this should be
reworded to:

ERROR: "%s" is a sequence
DETAIL: Only attributes of tables, views, composite types, indexes, or
foreign tables can be renamed.

While that may be a marginal improvement in clarity, it does
absolutely nothing to address my original complaint, which is that
adding a relkind forces trivial revisions of messages all over the
system, some of which are already excessively long-winded.  This
message also does nothing to help the user understand WHY we don't
allow renaming the attributes of his sequence or TOAST table, whereas
the proposed revision does.

The absolute worst offenders are messages of the form:

<feature> is not supported on X, Y, Z, or T.

which now have to be revised to read:

<feature> is not supported on X, Y, Z, T, or W.

This problem could be avoided by writing:

<feature> is supported on A and B

Or:

<feature> is supported only for relation types which quack

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Tom Lane
Robert Haas  writes:
> I think this thread has worked itself around to where it's entirely
> pointless.

I understand your frustration, but it's not clear to me that there *is*
any simple solution to this problem.  Fundamentally, adding new relkinds
to the system is always going to require running around and looking at a
lot of code to see what's affected; and that goes for the error messages
too.  I put no stock at all in the idea that writing a "guiding
principle" in the error messages will avoid anything, because as often
as not, adding a fundamentally new relkind is going to involve some
tweaking of what those principles are.

> ... This message also does nothing to help the user understand WHY we don't
> allow renaming the attributes of his sequence or TOAST table, whereas
> the proposed revision does.

I remain unconvinced that the average user cares, or will be able to
extrapolate the message to understand what's supported or not, even
if he does care about the reason for the restriction.

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
On Thu, Dec 30, 2010 at 9:40 AM, Alvaro Herrera
 wrote:
>> Disadvantage of b: It doesn't allow a snapshot to be installed on a
>> different server. It requires a serializable open transaction to hold
>> the snapshot.
>
> Why does it require a serializable transaction?  You could simply
> register the snapshot in any transaction.  (Of course, the net effect
> would be pretty similar to a serializable transaction).

I am not assuming that the publishing transaction blocks until its
snapshot is being picked up. A read committed transaction would get a
new snapshot for every other query, so the published snapshot is no
longer represented by an actual backend until it is being picked up by
one. Since nobody is holding off xmin/GlobalXmin, eventually vacuum
would remove tuples that the published-but-not-yet-picked-up snapshot
should still be able to see, no?

Joachim



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
On Thu, Dec 30, 2010 at 9:49 AM, Florian Pflug  wrote:
> On Dec30, 2010, at 13:31 , Joachim Wieland wrote:
>> We return snapshot information as a chunk of data to the client. At
>> the same time however, we set a checksum in shared memory to protect
>> against modification of the snapshot. A publishing backend can revoke
>> its snapshot by deleting the checksum and a backend that is asked to
>> install a snapshot can verify that the snapshot is correct and current
>> by calculating the checksum and comparing it with the one in shared
>> memory.
>
> We'd still have to stream these checksums to the standbys though,
> or would they be exempt from the checksum checks?

I am not talking about having synchronized snapshots among standby
servers at all.

I am only proposing a client API that will work for this future idea as well.


> I still wonder whether these checks are worth the complexity. I
> believe we'd only allow snapshot modifications for read-only queries
> anyway, so what point is there in preventing clients from setting
> broken snapshots?

What's the use case for it? As long as nobody comes up with a
reasonable use case for it, let's aim for the robust version.


Joachim



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Joshua Tolley
On Thu, Dec 30, 2010 at 03:24:09PM -0500, Aidan Van Dyk wrote:
> On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat  wrote:
> 
> >> If primary crashes while commits are waiting for acknowledgement, those
> >> transactions will be marked fully committed if the primary database
> >> recovers, no matter how allow_standalone_primary is set.
> >
> > This seems backwards; if you are waiting for acknowledgement, wouldn't the
> > normal assumption be that the transactions *didnt* make it to any standby,
> > and should be rolled back ?
> 
> This is the standard 2-phase commit problem.  The primary server *has*
> committed it, it's fsync has returned, and the only thing keeping it
> from returning the commit to the client is that it's waiting on a
> synchronous "ack" from a slave.



> 2) initiate fsync on the primary first
>- In this case, the slave is always slightly behind.  If if your
> primary falls over, you don't give commit messages to the clients, but
> if it recovers, it might have committed data, and slaves will still be
> able to catch up.
> 
> The thing is that currently, even without replication, #2 can happen.

For what little it's worth, I vote for this option, because it's a problem
that can already happen (as opposed to adding an entirely new type of problem
to the mix).

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:57 PM, Simon Riggs  wrote:
> I'm not very clear what your response has to do with Stefan's comments.
>
> My general perspective is that MySQL released a simple design a year
> ahead of us, which should be to our collective shame. I will be working
> towards delivering something useful in this release.

I don't feel ashamed of our feature set and I am not out to beat MySQL
or anyone else, just to deliver the best product that we can.  Our
community has different interests than the MySQL community and that is
fine.  Still, I don't disagree that we should be aiming at feature
parity.



I see now that you've tried to design this feature in a way that is
similar to MySQL's offering, which does have some value.  But it
appears to me that the documentation you've written here is
substantially similar to the MySQL 5.5 reference documentation.  That
could get us into a world of legal trouble - that documentation is not
even open source, let alone BSD.

http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

> I would rather concentrate on a minimal set of functionality that we can
> all agree on.

Me too; and perhaps your proposal is it.  But I think it's a shame we
didn't put more work into standby registration when we had time to get
that done.  It might not be necessary, but it would have delivered
some nice functionality that we are now not going to have for 9.1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 9:30 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think this thread has worked itself around to where it's entirely
>> pointless.
>
> I understand your frustration, but it's not clear to me that there *is*
> any simple solution to this problem.  Fundamentally, adding new relkinds
> to the system is always going to require running around and looking at a
> lot of code to see what's affected; and that goes for the error messages
> too.  I put no stock at all in the idea that writing a "guiding
> principle" in the error messages will avoid anything, because as often
> as not, adding a fundamentally new relkind is going to involve some
> tweaking of what those principles are.

I think that's true in some cases but not all.  The system-generated
attribute names thing actually applies in several cases, and I think
it's pretty cut-and-dried.  When you get into something like which
kinds of relations support triggers, that's a lot more arbitrary.

>> ... This message also does nothing to help the user understand WHY we don't
>> allow renaming the attributes of his sequence or TOAST table, whereas
>> the proposed revision does.
>
> I remain unconvinced that the average user cares, or will be able to
> extrapolate the message to understand what's supported or not, even
> if he does care about the reason for the restriction.

I'm convinced, but that only makes one of us.  I think for now what I
had better do is try to get this SQL/MED patch finished up by
soldiering through this mess rather than trying to fix it.  I think
it's going to be kind of ugly, but if we haven't got another plan then
we're just going to have to live with the ugliness.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Problems with autovacuum and vacuum

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 12:56 PM, JotaComm  wrote:
> Last week I had a serious problem with my PostgreSQL database. My autovacuum
> is OFF, but in September it started to prevent the transaction wraparound;
> however last week the following message appeared continuously in my log:
>
> WARNING: database "production" must be vacuumed within 4827083 transactions
> HINT: To avoid a database shutdown, execute a full-database VACUUM in
> "production".
>
> This message appeared for five to six hours; after that, the message
> disappeared from log. Any idea about what could have happened?

I'm thinking that autovacuum kicked into gear to prevent transaction
wraparound.  Once it did enough work to stave off disaster, the
warning messages stopped appearing in the log.

> Every day the vacuum is executed on some tables; and on Sundays it's
> executed on all tables. But as the autovacuum is running since September,
> and it runs for a long time, the vacuum was blocked because autovacuum had
> been running on the same table. How should I procede in this case?

I guess the obvious thing to do would be to turn on autovacuum and
forget about manual vacuums.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch  wrote:
> 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt 
> field
> and the answers from its first argument.

Why its first argument?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Noah Misch
On Fri, Dec 31, 2010 at 12:34:50AM -0500, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch  wrote:
> > 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt 
> > field
> > and the answers from its first argument.
> 
> Why its first argument?

funcexempt would only be nonzero for FuncExpr of length coercion casts.  Those
have the subject datum as a first argument, typmod as second, and is-explicit
boolean as third.  The other arguments are effectively already validated.

That brings up a point -- the exemptor function also needs an is-explicit
argument, as that affects the decision for some types.



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Hannu Krosing

On 30.12.2010 22:27, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs  wrote:
>> synchronous_replication (boolean)
>> Specifies whether transaction commit will wait for WAL records
>> to be replicated before the command returns a "success"
>> indication to the client.
>
> The word "replicated" here could be taken to mean different things,
> most obviously:
>
> - slave has received the WAL
> - slave has fsync'd the WAL
> - slave has applied the WAL

Perhaps the level of "replication guarantee" should be decided on the
slave side, by having a configuration parameter there:

report_as_replicated = received|written_to_disk|fsynced|applied

Different types of hosts may have wildly different guarantees and
performance parameters for these. One could envision a WAL-archive type
"standby" which is there for data persistence only and will never
"apply" WAL.

Of course we could put a bitmap in the status update messages from the
slave and have some quorum options on the master for when the data is
"in sync", say "need 5 received or (1 applied and 1 fsynced)", but I am
pretty sure that trying to get anywhere with this before applying the
basic sync rep patch would push sync rep back to at least 9.2, if not
9.5.
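To make the idea above concrete, the standby-side setting might be sketched like this. Note that report_as_replicated is purely hypothetical -- a parameter proposed in this thread, not one that exists in PostgreSQL:

```
# Hypothetical standby-side parameter (proposed above, not implemented):
# the point at which this standby reports a commit as "replicated"
# back to the master.
report_as_replicated = 'fsynced'   # received | written_to_disk | fsynced | applied
```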


-
Hannu Krosing




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Hannu Krosing

On 31.12.2010 6:02, Robert Haas wrote:
> On Thu, Dec 30, 2010 at 8:57 PM, Simon Riggs  wrote:
>> I'm not very clear what your response has to do with Stefan's comments.
>>
>> My general perspective is that MySQL released a simple design a year
>> ahead of us, which should be to our collective shame. I will be working
>> towards delivering something useful in this release.
>
> I don't feel ashamed of our feature set and I am not out to beat MySQL
> or anyone else, just to deliver the best product that we can.

The key word here is "deliver". The aim is to deliver sync rep, not to
specify it while leaving out controversial details. The registration
part has been left out for a reason: while the registration itself is
easy, deciding all its interactions with already-running replication is
not. Doing just the minimal support for sync rep (needing an
acknowledgement from at least one standby) and leaving the management of
standbys to the user lets us get to actual working code instead of a
pie-in-the-sky wishlist.

> Our community has different interests than the MySQL community and that
> is fine.  Still, I don't disagree that we should be aiming at feature
> parity.
>
> I see now that you've tried to design this feature in a way that is
> similar to MySQL's offering, which does have some value.  But it
> appears to me that the documentation you've written here is
> substantially similar to the MySQL 5.5 reference documentation.  That
> could get us into a world of legal trouble - that documentation is not
> even open source, let alone BSD.
>
> http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

Maybe we should get someone who has not read the MySQL docs to rewrite
the spec in a "clean room" fashion, by just inspecting the code and
asking Simon et al.

>> I would rather concentrate on a minimal set of functionality that we can
>> all agree on.
>
> Me too; and perhaps your proposal is it.  But I think it's a shame we
> didn't put more work into standby registration when we had time to get
> that done.

When you need _just_ the registration, make a table and two functions,
pg_standby_register(name) and pg_standby_unregister(name). For a little
more functionality, add a third one, pg_standby_last_seen(name), to
update a last-seen timestamp, plus a script that polls all standbys and
calls it.

> It might not be necessary, but it would have delivered
> some nice functionality that we are now not going to have for 9.1.

There are tons of "nice functionality we are not going to have for
9.1"; let's not make this cause even more nice functionality to be
left out!

-
Hannu Krosing

