Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 23:51, Jeff Davis wrote:

On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:

   1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
   2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?


It won't change the situation for pg_start_backup(), but with the patch
the base backups done via streaming won't have those issues, because
backup_label is not created (with that name) in the master.


Do you think we should change the backup protocol for normal base
backups to try to fix this? Or do you think that the simplicity of
unrestricted file copy is worth these problems?

We could probably make some fairly minor changes, like making a file on
the primary and telling users to exclude it from any base backup. The
danger, of course, is that they do copy it, and their backup is
compromised.


Yeah, I don't think it's a good idea to provide such a foot-gun.

Last time we discussed this there was the idea of creating a file in 
$PGDATA, and removing read-permissions from it, so that it couldn't be 
accidentally included in the backup. Even with that safeguard, it 
doesn't feel very safe - a backup running as root or with some other special 
privileges might still be able to read it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Range Types

2011-01-11 Thread Martijn van Oosterhout
On Tue, Jan 11, 2011 at 09:24:08PM -0500, Robert Haas wrote:
> commit 6c412f0605afeb809014553ff7ad28cf9ed5526b
> Author: Tom Lane 
> Date:   Sun May 1 18:56:19 2005 +
> 
> Change CREATE TYPE to require datatype output and send functions to have
> only one argument.  (Per recent discussion, the option to accept multiple
> arguments is pretty useless for user-defined types, and would be a likely
> source of security holes if it was used.)  Simplify call sites of
> output/send functions to not bother passing more than one argument.
> 
> ...but I don't understand the motivation behind it.

IIRC, the issue is that a type output function has to interpret a
Datum. Type output functions can also be called by users, so it is not
guaranteed that the given OID would actually match the type of the
Datum given. Hence the decision that the output function must be able
to determine itself what kind of Datum it is dealing with.

Thought experiment: the datum is an integer, but the OID says it's a
pass-by-ref datum. The code might then use the integer to dereference
an arbitrary place in memory.
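
For what it's worth, output functions are plain SQL-callable functions, so
nothing stops a user from invoking one directly, e.g.:

SELECT int4out(42);   -- legal; returns the cstring '42'

With the old multi-argument form, any extra arguments in a call like that
were entirely under the caller's control, which is the exposure described
above.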

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] multiset patch review

2011-01-11 Thread Peter Eisentraut
On ons, 2011-01-12 at 13:52 +0900, Itagaki Takahiro wrote:
> I added a short description about MULTISET and example of operators
> in "Arrays > 8.14.7. Multiset Support" section in the docs.
> Is it enough? or what kind of information do you want?
> 
> Separate patches for src and doc attached. It includes a few bug fixes
> and cleanup. I changed the error code in trim_array() to
> ERRCODE_ARRAY_ELEMENT_ERROR according to the spec.
> 

You may want to read this thread about the cardinality function you are
trying to add:

http://archives.postgresql.org/pgsql-hackers/2009-02/msg01388.php

Also, what happened to the idea of a separate MULTISET type?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-11 Thread Andrew Dunstan



On 01/11/2011 09:06 PM, Robert Haas wrote:

On Tue, Jan 11, 2011 at 7:55 PM, Andrew Dunstan  wrote:

On 01/11/2011 07:17 PM, David E. Wheeler wrote:

On Jan 11, 2011, at 3:44 PM, Andrew Dunstan wrote:


I think there's at least a danger of breaking legacy code doing that. Say
you have some code that does a ref test on the argument, for example. The
behavior would now be changed.

I think that'd be pretty rare.

Possibly it would. But we usually try pretty hard to avoid that sort of
breakage.

By the same token, I'm not convinced it's a good idea for this
behavior to be off by default.  Surely many people will altogether
fail to notice that it's an option?  If we're going to have a
backward-compatibility GUC at all, ISTM that you ought to get the good
stuff unless you ask for the old way.



Sure, that seems reasonable.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-01-11 Thread Fujii Masao
On Sat, Dec 25, 2010 at 2:09 PM, Maxim Boguk  wrote:
> While I was trying to create a reproducible test case for BUG #5798 I
> encountered a very strange effect on two of my servers (both servers
> have the same hardware platform/OS (FreeBSD 7.2) and PostgreSQL 8.4.4).
>
> Very simple test table created as:
> CREATE TABLE test (id integer);
> INSERT INTO test select generate_series(0,1);
>
> And I'm running repeated vacuums of that table with this script:
>  perl -e "foreach (1..10) {system \"psql -d test -h -c 'vacuum test'\";}"
>
> And about once a minute (at really random intervals; it can go 5 minutes
> without problems, or show the same error on 3 vacuums in a row) I get
> errors like:
> WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 1
> ...
> WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test"
> page 30, and so on for all pages of the relation.

The same problem happened on my customer's PostgreSQL 8.4.2
environment. Here are the error messages:

Dec 13 13:11:59 test postgres[28249]: [2-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 38
Dec 13 13:11:59 test postgres[28249]: [2-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [3-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 61
Dec 13 13:11:59 test postgres[28249]: [3-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [4-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 88
Dec 13 13:11:59 test postgres[28249]: [4-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [5-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 96
Dec 13 13:11:59 test postgres[28249]: [5-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [6-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 98
Dec 13 13:11:59 test postgres[28249]: [6-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [7-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 107
Dec 13 13:11:59 test postgres[28249]: [7-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
...
Dec 13 13:15:59 test postgres[9640]: [2-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 38
Dec 13 13:15:59 test postgres[9640]: [2-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [3-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 61
Dec 13 13:15:59 test postgres[9640]: [3-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [4-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 88
Dec 13 13:15:59 test postgres[9640]: [4-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [5-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 93
Dec 13 13:15:59 test postgres[9640]: [5-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [6-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 96
Dec 13 13:15:59 test postgres[9640]: [6-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [7-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 107
Dec 13 13:15:59 test postgres[9640]: [7-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
...

This problem has been reported several times, but has not been resolved yet.
http://archives.postgresql.org/message-id/4C23A3CF.4080506%40frolix.muddywaters.org
http://archives.postgresql.org/message-id/g2o4b46b5f01004010610ib8625426uae6ee90ac1435ba1%40mail.gmail.com

Though I investigated the source code around the PD_ALL_VISIBLE flag,
I could not identify the cause.

Does anyone have any ideas what the cause is?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Andrew Dunstan



On 01/11/2011 10:24 PM, Shigeru HANADA wrote:

On Tue, 11 Jan 2011 13:24:33 +0100
Magnus Hagander  wrote:

Aha. Thanks for the pointers, pfa a new version.


Changing the pg_stat_replication view would require fixing the "rules"
regression test. Please find the attached patch.




I have just committed a fix for this.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Shigeru HANADA
On Tue, 11 Jan 2011 13:24:33 +0100
Magnus Hagander  wrote:
> Aha. Thanks for the pointers, pfa a new version.

Changing the pg_stat_replication view would require fixing the "rules"
regression test. Please find the attached patch.

Regards,
--
Shigeru Hanada


rule_test.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints

2011-01-11 Thread Robert Haas
On Sun, Jan 9, 2011 at 5:00 PM, Noah Misch  wrote:
> When ALTER TABLE rewrites a table, it reindexes, but the reindex does not
> revalidate UNIQUE/EXCLUDE constraints.  This behaves badly in cases like this,
> neglecting to throw an error on the new UNIQUE violation:
>
> CREATE TABLE t (c numeric UNIQUE);
> INSERT INTO t VALUES (1.1),(1.2);
> ALTER TABLE t ALTER c TYPE int;
>
> The comment gave a reason for skipping the checks: it would cause deadlocks
> when we rewrite a system catalog.  So, this patch changes things to only
> skip the check for system catalogs.

It looks like this behavior was introduced by Tom's commit
1ddc2703a936d03953657f43345460b9242bbed1 on 2010-02-07, and it appears
to be quite broken.  The behavior is reasonable in the case of VACUUM
FULL and CLUSTER, but your example demonstrates that it's completely
broken in the case of ALTER TABLE.  This strikes me as something we
need to fix in REL9_0_STABLE as well as master, and my gut feeling is
that we ought to fix it not by excluding system relations but by
making ALTER TABLE work differently from VACUUM FULL and CLUSTER.
There's an efficiency benefit to skipping this even on normal
relations in cases where it isn't necessary, and it shouldn't be
necessary if we're rewriting the rows unchanged.

Also, you need to start adding these patches to the CF app.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 7:25 AM, Noah Misch  wrote:
> On Tue, Jan 11, 2011 at 06:37:33AM -0500, Robert Haas wrote:
>> On Sun, Jan 9, 2011 at 4:59 PM, Noah Misch  wrote:
>> > This begins the patch series for the design I recently proposed[1] for
>> > avoiding some table rewrites in ALTER TABLE ... ALTER COLUMN ... TYPE.
>> > I'm posting these patches today:
>> >
>> > 0 - new test cases
>>
>> This doesn't look right.  You might be building it, but you sure
>> aren't rebuilding it.
>>
>> +CREATE TABLE parent (keycol numeric PRIMARY KEY);
>> +NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "parent_pkey" for table "parent"
>> +DEBUG:  Rebuilding index "parent_pkey"
>
> Yes.  I considered saying "Building" unconditionally.  Differentiating the
> debug message by passing down the fact that the index recently existed
> seemed like overkill.  What do you think?

I'm wondering if we should consider moving this call to index_build()
so that it appears everywhere that we build an index rather than just
in ALTER TABLE, and saying something like:

building index "%s" on table "%s"

> The theoretical basis is that users can do little to directly change the
> need to rebuild a TOAST index.  We'll rebuild the TOAST index if and only
> if we rewrote the table.  The practical basis is that the TOAST relation
> names contain parent relation OIDs, so we don't want them mentioned in
> regression test output.

Perhaps in this case we could write:

building TOAST index for table "%s"

There's limited need for users to know the actual name of the TOAST
table, but I think it's better to log a line for all the actions we're
performing or none of them, rather than some but not others.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 23:55 , Joel Jacobson wrote:
> 2011/1/11 Florian Pflug :
>> Could you give an example of the kind of trouble you're experiencing trying
>> to use a topological sort?
> 
> Let's say you have a table t and a view v.
> The view v is defined as select * from t;
> If we put all objects in a tree, with the public schema as the root,
> both v and t will be directly under the root, but in reality, v cannot be
> created before t.

AFAICS, you get the following dependencies (apart from the obvious
NORMAL dependencies from the pg_class entries of t and v on public)

t (pg_class) <--[INTERNAL]-- t (pg_type)
     /°\
      |
  [NORMAL]
      |
_RETURN (pg_rewrite)
      |           |
  [NORMAL]    [INTERNAL]
      |           |
     \./         \./
v (pg_class) <--[INTERNAL]-- v (pg_type)

INTERNAL dependencies mark objects which spring into existence once the
referenced (target in my diagram) object is created. You can thus fold a
node I (the INTERNALly-depending object) into a node O (the object created
by the user) if there is an INTERNAL dependency from I to O.
The diagram then becomes

v (pg_class) --[NORMAL]--> t (pg_class)

Which correctly states that t must be created before v.
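
If it helps, the raw edges are easy to inspect directly. Something along
these lines (assuming t and v exist in the current database) lists them,
with deptype 'i' for INTERNAL and 'n' for NORMAL:

SELECT classid::regclass    AS dep_catalog,
       objid,
       refclassid::regclass AS ref_catalog,
       refobjid,
       deptype
FROM pg_depend
WHERE refobjid IN ('t'::regclass, 'v'::regclass);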

> This is the reason why a normal topological sort doesn't work.
> You have to look at the deptype and sort nodes having "internal" edges
> between them differently.

I suggest you try the node-folding strategy and see how far it gets you.

> I guess it's time for plan B, sorting based on oid, no biggie, it will
> work for my purpose, but it's damn ugly.

That will probably crash and burn once the OIDs have wrapped around once.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Range Types

2011-01-11 Thread Robert Haas
On Sun, Jan 9, 2011 at 4:03 AM, Jeff Davis  wrote:
> It also might be worth figuring out why input functions get the type oid
> and output functions do not. I see this comment above getTypeIOParam():
>
>  * As of PostgreSQL 8.1, output functions receive only the value itself
>  * and not any auxiliary parameters, so the name of this routine is now
>  * a bit of a misnomer ... it should be getTypeInputParam.
>
> So, why was it eliminated?

Good question.  The relevant commit is here:

commit 6c412f0605afeb809014553ff7ad28cf9ed5526b
Author: Tom Lane 
Date:   Sun May 1 18:56:19 2005 +

Change CREATE TYPE to require datatype output and send functions to have
only one argument.  (Per recent discussion, the option to accept multiple
arguments is pretty useless for user-defined types, and would be a likely
source of security holes if it was used.)  Simplify call sites of
output/send functions to not bother passing more than one argument.

...but I don't understand the motivation behind it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] reviewers needed!

2011-01-11 Thread Robert Haas
We have 46 patches in this CommitFest so far and I know that there are
quite a few patches that have been posted but not added to the
CommitFest application yet (please fix this, if you are a patch author
who has failed to do this) and that there will be lots more patches
posted over the next few days.  I expect that we will have at least 60
patches in the CommitFest, maybe 75 or more.  These patches *will not*
review themselves.  If you're available to review for this final
CommitFest (and if you've submitted any patches to it, you should make
yourself available to review as well), then please either sign up for
some patches, or (better) email me off-list and I will assign you a
patch.  You don't need to be an experienced patch reviewer (although
if you are, your help is even-more-appreciated) or even an experienced
C programmer, but you do need to be willing to:

(1) test the patch and comment on what you think is good and bad about it,
(2) read the code to the extent you are able and comment on what you
think is good and bad about that, even if it's just "the formatting is
all wrong",
(3) do (1) and (2) in a timely fashion so that the patch author has
time to update it,
(4) prod the patch author if s/he does not respond to your review comments, and
(5) keep the commitfest application up to date with respect to your
chosen/assigned patch.

See also http://wiki.postgresql.org/wiki/Reviewing_a_Patch

As I'm sure everyone is aware, this is the LAST CommitFest for 9.1 and
we have a TON of patches, many of them large and important, to review.
 Please help however you can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 7:55 PM, Andrew Dunstan  wrote:
> On 01/11/2011 07:17 PM, David E. Wheeler wrote:
>> On Jan 11, 2011, at 3:44 PM, Andrew Dunstan wrote:
>>
>>> I think there's at least a danger of breaking legacy code doing that. Say
>>> you have some code that does a ref test on the argument, for example. The
>>> behavior would now be changed.
>>
>> I think that'd be pretty rare.
>
> Possibly it would. But we usually try pretty hard to avoid that sort of
> breakage.

By the same token, I'm not convinced it's a good idea for this
behavior to be off by default.  Surely many people will altogether
fail to notice that it's an option?  If we're going to have a
backward-compatibility GUC at all, ISTM that you ought to get the good
stuff unless you ask for the old way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 7:24 AM, Magnus Hagander  wrote:
>> No, do this at top
>>
>> if (walsnd->state == state)
>>  return;
>>
>> Keep spinlocks when actually setting it.

I think this is safe...

> Aha. Thanks for the pointers, pfa a new version.

...but I think you also need to take the spinlock when reading the value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Cédric Villemain
2011/1/11 Heikki Linnakangas :
> On 11.01.2011 21:50, Dimitri Fontaine wrote:
>>
>> Heikki Linnakangas  writes:
>>
>>> Now that we have a basic over-the-wire base backup capability in
>>> walsender,
>>> it would be nice to allow taking multiple base backups at the same time.
>>
>> I would prefer to be able to take a base backup from a standby, or is
>> that already possible?  What about cascading the wal shipping?  Maybe
>> those ideas are much bigger projects, but if I don't ask, I don't get to
>> know :)
>
> Yeah, those are bigger projects..

Way more interesting, IMHO.
A feature to allow multiple backups at the same time looks useless to me.
If the DB is small, then it is not that hard to take the backups one at a
time, before or after a possible nightly backup.
If the DB is large, do I even need to comment?

The only case I find interesting is if you can use the bandwidth of
more than one ethernet device; otherwise I would use rsync between nodes
after the initial base backup, pretty sure.

>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 7:14 PM, Tom Lane  wrote:
> Andreas Karlsson  writes:
>> On Tue, 2011-01-11 at 14:01 -0500, Tom Lane wrote:
>>> It really shouldn't be useful to include those.  Attend what it says in
>>> the fine manual for CREATE OPERATOR CLASS:
>
>> Hm, that is not what I see when reading the source.
>
>> There can exist several entries in pg_amproc for one operator family
>> with the same short_number and function (both name and types).
>
> We're cheating in a small number of places by using a binary-compatible
> hash function to implement hashing for a datatype other than the one
> it's declared to work on.  I don't think that the existence of that hack
> means that getObjectDescription should bloat the descriptions of every
> amproc entry with generally-useless information.

I don't see how you can claim that it's remotely sane for different
objects to have the same description.  The whole point is that someone
is going to say "DROP something" and the system is going to say "no,
there's an object that depends on it", and the description it gives
won't uniquely identify which one.  If the information is needed to
distinguish which object is implicated, it's not useless.  The fact
that someone with an expert-level knowledge of PostgreSQL and a
divining rod may be able to determine which object they should be
worried about given only half of its primary key fields is not a good
reason to omit the other half.  In fact, there isn't any such reason,
and you're the only one arguing otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 10:35 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jan 11, 2011 at 4:46 AM, Tatsuo Ishii  wrote:
>>> For query-based replication tools like pgpool-II (I don't know whether any
>>> other tools, for example Postgres XC, fall into this category...), we need
>>> to be able to lock sequences. Fortunately it is allowed to:
>>>
>>> SELECT 1 FROM foo_sequence FOR UPDATE;
>>>
>>> but LOCK foo_sequence looks like more appropriate syntax to me.
>
>> Those aren't doing the same thing.  The first is locking the one and
>> only tuple that is contained within the sequence, while the second is
>> locking the sequence object itself.
>
>> At this point, I'm inclined to think that the pg_dump comment is just
>> wrong, and we ought to fix it to say that we don't really want to be
>> able to lock other relations after all, and call it good.
>
> The reason that pg_dump tries to acquire locks at all is to ensure that
> it dumps a consistent view of the database.  The general excuse for not
> locking non-table objects is that (at least in most cases) they are
> defined by single catalog entries and so there's no way to see a
> non-self-consistent view of them.  Tables, being defined by a collection
> of rows in different catalogs, are *very* risky to dump without any
> lock.  This doesn't get noticeably better for non-table relation types.
>
> An example of the sort of risk I'm thinking about is dumping a view
> without any lock while someone else does a CREATE OR REPLACE VIEW on
> it.  You could very easily see a set of attributes (in pg_attribute)
> that don't agree with the view rules you pulled from pg_rewrite.  The
> risk is minimal now since we don't allow C.O.R.V. to change the column
> set, but as soon as somebody creates a patch that allows that, pg_dump
> will have a problem.

Actually, we do allow C.O.R.V. to do just that - I believe since 8.4.

rhaas=# create view v(a) as select 1;
CREATE VIEW
rhaas=# create or replace view v(a,b) as select 1, 2;
CREATE VIEW

> Note that using a serializable transaction (with or without "true"
> serializability) doesn't fix this issue, since pg_dump depends so
> heavily on backend-side support functions that work in SnapshotNow mode.
> It really needs locks to ensure that the support functions see a view
> consistent with its own catalog reads.

In that case, can I have some comments on approaches mentioned in my OP?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2011-01-11 Thread Robert Haas
On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith  wrote:
> Having the pg_stat_bgwriter.buffers_backend_fsync patch available all the
> time now has made me reconsider how important one potential bit of
> refactoring here would be.  I managed to catch one of the situations where
> really popular relations were being heavily updated in a way that was
> competing with the checkpoint on my test system (which I can happily share
> the logs of), with the instrumentation patch applied but not the spread sync
> one:
>
> LOG:  checkpoint starting: xlog
> DEBUG:  could not forward fsync request because request queue is full
> CONTEXT:  writing block 7747 of relation base/16424/16442
> DEBUG:  could not forward fsync request because request queue is full
> CONTEXT:  writing block 42688 of relation base/16424/16437
> DEBUG:  could not forward fsync request because request queue is full
> CONTEXT:  writing block 9723 of relation base/16424/16442
> DEBUG:  could not forward fsync request because request queue is full
> CONTEXT:  writing block 58117 of relation base/16424/16437
> DEBUG:  could not forward fsync request because request queue is full
> CONTEXT:  writing block 165128 of relation base/16424/16437
> [330 of these total, all referring to the same two relations]
>
> DEBUG:  checkpoint sync: number=1 file=base/16424/16448_fsm
> time=10132.83 msec
> DEBUG:  checkpoint sync: number=2 file=base/16424/11645 time=0.001000 msec
> DEBUG:  checkpoint sync: number=3 file=base/16424/16437 time=7.796000 msec
> DEBUG:  checkpoint sync: number=4 file=base/16424/16448 time=4.679000 msec
> DEBUG:  checkpoint sync: number=5 file=base/16424/11607 time=0.001000 msec
> DEBUG:  checkpoint sync: number=6 file=base/16424/16437.1 time=3.101000 msec
> DEBUG:  checkpoint sync: number=7 file=base/16424/16442 time=4.172000 msec
> DEBUG:  checkpoint sync: number=8 file=base/16424/16428_vm time=0.001000
> msec
> DEBUG:  checkpoint sync: number=9 file=base/16424/16437_fsm time=0.001000
> msec
> DEBUG:  checkpoint sync: number=10 file=base/16424/16428 time=0.001000 msec
> DEBUG:  checkpoint sync: number=11 file=base/16424/16425 time=0.00 msec
> DEBUG:  checkpoint sync: number=12 file=base/16424/16437_vm time=0.001000
> msec
> DEBUG:  checkpoint sync: number=13 file=base/16424/16425_vm time=0.001000
> msec
> LOG:  checkpoint complete: wrote 3032 buffers (74.0%); 0 transaction log
> file(s) added, 0 removed, 0 recycled; write=1.742 s, sync=10.153 s,
> total=37.654 s; sync files=13, longest=10.132 s, average=0.779 s
>
> Note here how the checkpoint was hung on trying to get 16448_fsm written
> out, but the backends were issuing constant competing fsync calls to these
> other relations.  This is very similar to the production case this patch was
> written to address, which I hadn't been able to share a good example of yet.
>  That's essentially what it looks like, except with the contention going on
> for minutes instead of seconds.
>
> One of the ideas Simon and I had been considering at one point was adding
> some better de-duplication logic to the fsync absorb code, which I'm
> reminded by the pattern here might be helpful independently of other
> improvements.

Hopefully I'm not stepping on any toes here, but I thought this was an
awfully good idea and had a chance to take a look at how hard it would
be today while en route from point A to point B.  The answer turned
out to be "not very", so PFA a patch that seems to work.  I tested it
by attaching gdb to the background writer while running pgbench, and
it eliminated the backend fsyncs without even breaking a sweat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 4457df6..f6cd8dc 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -182,6 +182,7 @@ static void CheckArchiveTimeout(void);
 static void BgWriterNap(void);
 static bool IsCheckpointOnSchedule(double progress);
 static bool ImmediateCheckpointRequested(void);
+static bool CompactBgwriterRequestQueue(void);
 
 /* Signal handlers */
 
@@ -1090,10 +1091,20 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
 	/* Count all backend writes regardless of if they fit in the queue */
 	BgWriterShmem->num_backend_writes++;
 
+	/*
+	 * If the background writer isn't running or the request queue is full,
+	 * the backend will have to perform its own fsync request.  But before
+	 * forcing that to happen, we can try to compact the background writer
+	 * request queue.
+	 */
 	if (BgWriterShmem->bgwriter_pid == 0 ||
-		BgWriterShmem->num_requests >= BgWriterShmem->max_requests)
+		(BgWriterShmem->num_requests >= BgWriterShmem->max_requests
+		&& !CompactBgwriterRequestQueue()))
 	{
-		/* Also count the subset where backends have to do their own fsync */
+		/*
+		 * Count the subset of writes where backends have to do their own
+		 * fsync
+		 */
 		BgWriterShm

Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-11 Thread Andrew Dunstan



On 01/11/2011 07:17 PM, David E. Wheeler wrote:

On Jan 11, 2011, at 3:44 PM, Andrew Dunstan wrote:


I think there's at least a danger of breaking legacy code doing that. Say you 
have some code that does a ref test on the argument, for example. The behavior 
would now be changed.

I think that'd be pretty rare.




Possibly it would. But we usually try pretty hard to avoid that sort of 
breakage.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Tatsuo Ishii
> In the SEQUENCE example above, SELECT ... FOR UPDATE is certainly not
> adequate to protect the sequence against DDL-level changes.  Fortunately
> sequences don't have too many DDL commands, but still an ALTER RENAME
> might be enough to confuse pg_dump.
> 
> (By the way, does that SELECT ... FOR UPDATE actually accomplish
> anything at all?  nextval() doesn't go through heap_update, and neither
> does ALTER SEQUENCE, so I'd be a bit surprised if it really manages to
> block changes to the sequence.)

Of course "SELECT ... FOR UPDATE" does not block nextval().  It just
blocks a concurrent "SELECT ... FOR UPDATE" in another session. This is
enough for pgpool's use case.
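
A minimal sketch of the pattern (foo_sequence is just a placeholder name):

-- session 1
BEGIN;
SELECT 1 FROM foo_sequence FOR UPDATE;  -- row-locks the sequence's single tuple

-- session 2: the same statement now waits until session 1 commits or
-- aborts, while nextval('foo_sequence') keeps working in every session
SELECT 1 FROM foo_sequence FOR UPDATE;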
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something fishy about the current Makefiles

2011-01-11 Thread Jeff Davis
On Thu, 2011-01-06 at 12:27 -0500, Robert Haas wrote:
> I've noticed something like this as well, but haven't been able to
> figure out exactly what is going wrong.

I've been having build problems ever since:

http://archives.postgresql.org/message-id/1291256879.25389.6.ca...@jdavis-ux.asterdata.local

It's gotten much worse recently, where changing a c file seems to
require multiple top-level "make install" commands to take effect (as
Tom describes). I don't think it's limited to the access/ subdirectory,
though.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-11 Thread David E. Wheeler
On Jan 11, 2011, at 3:44 PM, Andrew Dunstan wrote:

> I think there's at least a danger of breaking legacy code doing that. Say you 
> have some code that does a ref test on the argument, for example. The 
> behavior would now be changed.

I think that'd be pretty rare.

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Andreas Karlsson  writes:
> On Tue, 2011-01-11 at 14:01 -0500, Tom Lane wrote:
>> It really shouldn't be useful to include those.  Attend what it says in
>> the fine manual for CREATE OPERATOR CLASS:

> Hm, that is not what I see when reading the source.

> There can exist several entries in pg_amproc for one operator family
> with the same short_number and function (both name and types).

We're cheating in a small number of places by using a binary-compatible
hash function to implement hashing for a datatype other than the one
it's declared to work on.  I don't think that the existence of that hack
means that getObjectDescription should bloat the descriptions of every
amproc entry with generally-useless information.
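
A rough way to spot such entries, for what it's worth (it simply compares
each amproc entry's declared operand type against the registered function's
first argument type, so legitimate cross-type btree entries show up too):

SELECT opf.opfname, am.amname,
       ap.amproclefttype::regtype AS operand_type,
       ap.amproc
FROM pg_amproc ap
JOIN pg_opfamily opf ON opf.oid = ap.amprocfamily
JOIN pg_am am ON am.oid = opf.opfmethod
JOIN pg_proc p ON p.oid = ap.amproc
WHERE ap.amproclefttype <> p.proargtypes[0];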

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something fishy about the current Makefiles

2011-01-11 Thread Tom Lane
Peter Eisentraut  writes:
> On tor, 2011-01-06 at 11:57 -0500, Tom Lane wrote:
>> Whilst fooling around with GIN, I have repeatedly observed that doing
>> "make" in src/backend/access/gin, followed by "make install-bin" in
>> src/backend, fails to rebuild the postgres executable --- it just
>> installs the existing one.  A second execution of "make install-bin"
>> does notice that postgres is out of date and rebuilds it.  This
>> procedure for rebuilding after changing one or two .c files has always
>> worked for me before.  I can't avoid the suspicion that the recent
>> changes to make things more parallel-friendly broke something.

> This fixes it, but it's beyond me why.

Further experimentation suggests that *only* subdirectories of access/
are vulnerable to the issue.  I wonder whether this syntax:

> -$(SUBDIROBJS): | $(SUBDIRS:%=%-recursive) ;

is weird with respect to the first SUBDIRS value.  Or maybe it's a plain
old bug in gmake.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Andreas Karlsson
On Tue, 2011-01-11 at 14:01 -0500, Tom Lane wrote:
> It really shouldn't be useful to include those.  Attend what it says in
> the fine manual for CREATE OPERATOR CLASS:
> 
>   In a FUNCTION clause, the operand data type(s) the function is
>   intended to support, if different from the input data type(s) of
>   the function (for B-tree and hash indexes) or the class's data
>   type (for GIN and GiST indexes). These defaults are always
>   correct, so there is no point in specifying op_type in a
>   FUNCTION clause in CREATE OPERATOR CLASS, but the option is
>   provided for consistency with the comparable syntax in ALTER
>   OPERATOR FAMILY.
> 
> The reason the ALTER OPERATOR FAMILY DROP syntax has to include operand
> types is that it lacks the full name/types of the referenced function.
> Since getObjectDescription *does* provide those, it doesn't serve any
> real purpose to repeat the information.
> 
>   regards, tom lane

Hm, that is not what I see when reading the source.

There can exist several entries in pg_amproc for one operator family
with the same short_number and function (both name and types). The only
difference is in lefttype and righttype. For example, these two in
array_ops:

SELECT *, amproc::oid FROM pg_amproc WHERE oid IN (10608,10612);
 amprocfamily | amproclefttype | amprocrighttype | amprocnum |  amproc   | amproc
--------------+----------------+-----------------+-----------+-----------+--------
         2745 |           1009 |            1009 |         1 | bttextcmp |    360
         2745 |           1015 |            1015 |         1 | bttextcmp |    360
(2 rows)

The reason you must specify the types in ALTER OPERATOR FAMILY DROP is
that otherwise it would not know which row of these two to drop. And
that information is not included in the current string returned by
getObjectDescription.
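
For example, dropping just one of the two rows above has to spell out the
operand types; a sketch (for illustration only, actually running it would
cripple the opclass):

ALTER OPERATOR FAMILY array_ops USING gin
    DROP FUNCTION 1 (text[], text[]);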

As I interpret it, the pg_amproc entries belonging to the array_ops
family really belong to one of the opclasses (_int2_ops, _text_ops, ...),
and the lefttype and righttype are used to look up the amproc entries
based on the opcintype of the opclass of the index class in
IndexSupportInitialize. The comment in pg_amproc.h also seems to confirm
my view.[1]

So instead of

function 1 bttextcmp(text,text) of operator family array_ops for access method 
gin

or in a improved version of my stab at a patch

function 1 (text[],text[]) bttextcmp(text,text) of operator family array_ops 
for access method gin

it really is

function 1 bttextcmp(text,text) of operator class _text_ops for access method 
gin

The "imporved version" might be simpler to implement and more future
proof though if we start using pg_amproc for other lookups.

--

[1] From pg_amproc.h about lefttype and righttype.

> The primary key for this table is <amprocfamily, amproclefttype,
> amprocrighttype, amprocnum>.  The "default" support functions for a
> particular opclass within the family are those with amproclefttype =
> amprocrighttype = opclass's opcintype.  These are the ones loaded into the
> relcache for an index and typically used for internal index operations.
> Other support functions are typically used to handle cross-type indexable
> operators with oprleft/oprright matching the entry's amproclefttype and
> amprocrighttype. The exact behavior depends on the index AM, however, and
> some don't pay attention to non-default functions at all.

Regards,
Andreas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-11 Thread Andrew Dunstan



On 01/11/2011 06:07 PM, David E. Wheeler wrote:



To maintain
compatibility with existing pl/perl code a new variable,
plperl.convert_array_arguments (better name?), is introduced. Its default
value is false; when set to true it triggers the new behavior, i.e.

Have you considered instead passing an array-based object whose string 
overloading is designed to return the pg array string format? That would make for 
nice, transparent compatibility without the need for a GUC.

Not my idea, BTW, but suggested by Tim Bunce.




I think there's at least a danger of breaking legacy code doing that. 
Say you have some code that does a ref test on the argument, for 
example. The behavior would now be changed.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 23:07 +0100, Magnus Hagander wrote: 
> I think keeping the flexibility is important. If it does add an extra
> step I think that's ok once we have pg_basebackup, but it must be
> reasonably *safe*. Corrupt backups from forgetting to exclude a file
> seems not so.

Agreed.

> But if the problem is you forgot to exclude it, can't you just remove
> it at a later time?

If you think you are recovering the primary, and it's really the backup,
then you get corruption. It's too late to remove a file after that
(unless you have a backup of your backup ;) ).

If you think you are restoring a backup, and it's really a primary that
crashed, then you run into one of the two problems that I mentioned
(which are less severe than corruption, but very annoying).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-11 Thread David E. Wheeler
On Jan 11, 2011, at 2:25 PM, Alexey Klyukin wrote:

> Hello,
> 
> Here's the patch that improves handling of arrays as pl/perl function input
> arguments, converting postgres arrays of arbitrary dimensions into perl array
> references.

Awesome! I've wanted this for *years*.

> It includes regression tests and documentation changes, and it
> builds and runs successfully on my Mac OS X and Linux boxes. To maintain
> compatibility with existing pl/perl code a new variable,
> plperl.convert_array_arguments (better name?), is introduced. Its default
> value is false; when set to true it triggers the new behavior, i.e.

Have you considered instead passing an array-based object whose string 
overloading is designed to return the pg array string format? That would make for 
nice, transparent compatibility without the need for a GUC.

Not my idea, BTW, but suggested by Tim Bunce.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_regress multibyte setting

2011-01-11 Thread Peter Eisentraut
Is it a good idea that we run make check with MULTIBYTE = SQL_ASCII by
default?  We run it with the user's locale by default, so shouldn't we
use the encoding that belongs to the locale by default?  Otherwise we
are testing a fairly unrepresentative environment.  If you really want
to test SQL_ASCII you could of course choose it explicitly or set the
locale to C.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
2011/1/11 Florian Pflug :
> Could you give an example of the kind of trouble you're experiencing trying
> to use a topological sort?

Let's say you have a table t and a view v.
The view v is defined as select * from t;
If we put all objects in a tree, with the public schema as the root,
both v and t will be directly under the root, but in reality, v cannot be
created before t.
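
For concreteness, that scenario is nothing more than:

CREATE TABLE t (x int);               -- column chosen arbitrarily for the example
CREATE VIEW v AS SELECT * FROM t;
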
This is the reason why a normal topological sort doesn't work.
You have to look at the deptype and sort nodes having "internal" edges
between them differently.
The pg_dump source code of course contains all the logic necessary to
do the trick, but it's not that easy to follow.

I guess it's time for plan B, sorting based on oid, no biggie, it will
work for my purpose, but it's damn ugly.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] arrays as pl/perl input arguments [PATCH]

2011-01-11 Thread Alexey Klyukin
Hello,

Here's the patch that improves handling of arrays as pl/perl function input
arguments, converting postgres arrays of arbitrary dimensions into perl array
references. It includes regression tests and documentation changes, and it
builds and runs successfully on my Mac OS X and Linux boxes. To maintain
compatibility with existing pl/perl code a new variable,
plperl.convert_array_arguments (better name?), is introduced. Its default
value is false; when set to true it triggers the new behavior, i.e.

SET plperl.convert_array_arguments = true;
CREATE OR REPLACE FUNCTION test_array(text[]) RETURNS TEXT AS $$
my $arg = shift;
if (ref $arg eq 'ARRAY') {
return "array conversion is enabled";
} else {
return "array conversion is disabled";
}
$$ LANGUAGE plperl;

test=# select test_array('{1,2,3}');
         test_array          
-----------------------------
 array conversion is enabled
(1 row)

You can find other, less trivial examples, by examining plperl_array
regression test.

The implementation detects whether the input argument of a perl function is an
array, and calls plperl_ref_from_pg_array if it is. The latter obtains a flat
array of Datums from deconstruct_array and, using information about array
dimensions, recursively creates perl array references in split_array. To pass
array information between recursive calls a new 'plperl_array_info' struct was
added. Arrays as members of composite types are also handled in
plperl_hash_from_tuple.

/A
--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.




pg_to_perl_arrays.diff
Description: Binary data


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread David Fetter
On Tue, Jan 11, 2011 at 05:06:34PM -0500, Robert Haas wrote:
> On Jan 11, 2011, at 2:07 PM, Tom Lane  wrote:
> > The whole thing just seems too fragile and dangerous to be worth dealing
> > with given that actual usage will be a corner case.  *I* sure wouldn't
> > trust it to work when the chips were down.
> 
> I hope this assessment proves to be incorrect, because like Magnus and 
> Heikki, I think this will be a major usability improvement. It takes us from 
> "there's a way to do that" to "it just works".

(It just works)++ :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something fishy about the current Makefiles

2011-01-11 Thread Peter Eisentraut
On tor, 2011-01-06 at 11:57 -0500, Tom Lane wrote:
> Whilst fooling around with GIN, I have repeatedly observed that doing
> "make" in src/backend/access/gin, followed by "make install-bin" in
> src/backend, fails to rebuild the postgres executable --- it just
> installs the existing one.  A second execution of "make install-bin"
> does notice that postgres is out of date and rebuilds it.  This
> procedure for rebuilding after changing one or two .c files has always
> worked for me before.  I can't avoid the suspicion that the recent
> changes to make things more parallel-friendly broke something.

This fixes it, but it's beyond me why.

diff --git i/src/backend/common.mk w/src/backend/common.mk
index 4e0a5da..5d599db 100644
--- i/src/backend/common.mk
+++ w/src/backend/common.mk
@@ -34,7 +34,7 @@ objfiles.txt: Makefile $(SUBDIROBJS) $(OBJS)
 expand_subsys = $(foreach file,$(1),$(if $(filter %/objfiles.txt,$(file)),$(patsubst ../../src/backend/%,%,$(addprefix $(top_builddir)/,$(shell cat $(file)))),$(file)))
 
 # Parallel make trickery
-$(SUBDIROBJS): | $(SUBDIRS:%=%-recursive) ;
+$(SUBDIROBJS): $(SUBDIRS:%=%-recursive) ;
 
 .PHONY: $(SUBDIRS:%=%-recursive)
 $(SUBDIRS:%=%-recursive):



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Robert Haas
On Jan 11, 2011, at 2:07 PM, Tom Lane  wrote:
> The whole thing just seems too fragile and dangerous to be worth dealing
> with given that actual usage will be a corner case.  *I* sure wouldn't
> trust it to work when the chips were down.

I hope this assessment proves to be incorrect, because like Magnus and Heikki, 
I think this will be a major usability improvement. It takes us from "there's a 
way to do that" to "it just works".

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 22:51, Jeff Davis  wrote:
> On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
>> >   1. If it's a primary recovering from a crash, and there is a
>> > backup_label file, and the WAL referenced in the backup_label exists,
>> > then it does a bunch of extra work during recovery; and
>> >   2. In the same situation, if the WAL referenced in the backup_label
>> > does not exist, then it PANICs with a HINT to tell you to remove the
>> > backup_label.
>> >
>> > Is this an opportunity to solve these problems and simplify the code?
>>
>> It won't change the situation for pg_start_backup(), but with the patch
>> the base backups done via streaming won't have those issues, because
>> backup_label is not created (with that name) in the master.
>
> Do you think we should change the backup protocol for normal base
> backups to try to fix this? Or do you think that the simplicity of
> unrestricted file copy is worth these problems?
>
> We could probably make some fairly minor changes, like making a file on
> the primary and telling users to exclude it from any base backup. The
> danger, of course, is that they do copy it, and their backup is
> compromised.


I think keeping the flexibility is important. If it does add an extra
step I think that's ok once we have pg_basebackup, but it must be
reasonably *safe*. Corrupt backups from forgetting to exclude a file
seems not so.

But if the problem is you forgot to exclude it, can't you just remove
it at a later time?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
> >   1. If it's a primary recovering from a crash, and there is a
> > backup_label file, and the WAL referenced in the backup_label exists,
> > then it does a bunch of extra work during recovery; and
> >   2. In the same situation, if the WAL referenced in the backup_label
> > does not exist, then it PANICs with a HINT to tell you to remove the
> > backup_label.
> >
> > Is this an opportunity to solve these problems and simplify the code?
> 
> It won't change the situation for pg_start_backup(), but with the patch 
> the base backups done via streaming won't have those issues, because 
> backup_label is not created (with that name) in the master.

Do you think we should change the backup protocol for normal base
backups to try to fix this? Or do you think that the simplicity of
unrestricted file copy is worth these problems?

We could probably make some fairly minor changes, like making a file on
the primary and telling users to exclude it from any base backup. The
danger, of course, is that they do copy it, and their backup is
compromised.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-01-11 Thread Euler Taveira de Oliveira

On 10-01-2011 05:25, Greg Smith wrote:

Euler Taveira de Oliveira wrote:

On 07-01-2011 22:59, Greg Smith wrote:

setrandom: invalid maximum number -2147467296


It is failing at atoi() circa pgbench.c:1036. But it is just the first
one. There are some variables and constants that need to be converted
to int64 and some functions that must speak 64-bit such as getrand().
Are you working on a patch?


http://archives.postgresql.org/pgsql-hackers/2010-01/msg02868.php
http://archives.postgresql.org/message-id/4c326f46.4050...@2ndquadrant.com

Greg, I just improved your patch. I tried to work around the problems pointed 
out in the above threads. Also, I want to raise some points:


(i) If we want to support a scale factor greater than 21474 we have to
convert some columns to bigint; it will change the test. From the portability
point of view it is a pity, but as we have never supported it I'm not too
worried about it. Why? Because it will use bigint columns only if the scale
factor is greater than 21474. Is it a problem? I don't think so, because
generally people compare tests with the same scale factor.
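
Just to show where 21474 comes from: pgbench populates scale * 100,000 rows
in pgbench_accounts, and aid is an int4, so:

SELECT 21474 * 100000::bigint AS still_fits,      -- 2147400000 < 2^31 - 1
       21475 * 100000::bigint AS overflows_int4;  -- 2147500000 > 2^31 - 1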


(ii) From the performance perspective, we need to test whether the
modifications impact performance. I didn't create another code path for the
64-bit modifications (it is too ugly) and I'm afraid some of them could affect
32-bit performance. I'm not in a position to test this though, because I don't
have a big machine ATM. Greg, could you lead these tests?


(iii) I decided to copy scanint8() (called strtoint64 there) from the backend 
(Robert's suggestion [1]) because Tom pointed out that strtoll() has portability 
issues. I replaced atoi() with strtoint64() but didn't do any performance tests.


Comments?


[1] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00173.php


--
  Euler Taveira de Oliveira
  http://www.timbira.com/
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 7c2ca6e..e9eb720 100644
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***
*** 60,65 
--- 60,67 
  #define INT64_MAX	INT64CONST(0x7FFFFFFFFFFFFFFF)
  #endif
  
+ #define MAX_RANDOM_VALUE64	INT64_MAX
+ 
  /*
   * Multi-platform pthread implementations
   */
*** usage(const char *progname)
*** 364,378 
  		   progname, progname);
  }
  
  /* random number generator: uniform distribution from min to max inclusive */
! static int
! getrand(int min, int max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
--- 366,451 
  		   progname, progname);
  }
  
+ /*
+  * strtoint64 -- convert a string to 64-bit integer
+  *
+  * this function is a modified version of scanint8() from
+  * src/backend/utils/adt/int8.c.
+  *
+  * XXX should it have a return value?
+  *
+  */
+ static int64
+ strtoint64(const char *str)
+ {
+ 	const char *ptr = str;
+ 	int64		result = 0;
+ 	int			sign = 1;
+ 
+ 	/*
+ 	 * Do our own scan, rather than relying on sscanf which might be broken
+ 	 * for long long.
+ 	 */
+ 
+ 	/* skip leading spaces */
+ 	while (*ptr && isspace((unsigned char) *ptr))
+ 		ptr++;
+ 
+ 	/* handle sign */
+ 	if (*ptr == '-')
+ 	{
+ 		ptr++;
+ 
+ 		/*
+ 		 * Do an explicit check for INT64_MIN.	Ugly though this is, it's
+ 		 * cleaner than trying to get the loop below to handle it portably.
+ 		 */
+ 		if (strncmp(ptr, "9223372036854775808", 19) == 0)
+ 		{
+ 			result = -INT64CONST(0x7fff) - 1;
+ 			ptr += 19;
+ 			goto gotdigits;
+ 		}
+ 		sign = -1;
+ 	}
+ 	else if (*ptr == '+')
+ 		ptr++;
+ 
+ 	/* require at least one digit */
+ 	if (!isdigit((unsigned char) *ptr))
+ 		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ 
+ 	/* process digits */
+ 	while (*ptr && isdigit((unsigned char) *ptr))
+ 	{
+ 		int64		tmp = result * 10 + (*ptr++ - '0');
+ 
+ 		if ((tmp / 10) != result)		/* overflow? */
+ 			fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+ 		result = tmp;
+ 	}
+ 
+ gotdigits:
+ 
+ 	/* allow trailing whitespace, but not other trailing chars */
+ 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+ 		ptr++;
+ 
+ 	if (*ptr != '\0')
+ 		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ 
+ 	return ((sign < 0) ? -result : result);
+ }
+ 
  /* random number generator: uniform distribution from min to max inclusive */
! static int64
! getrand(int64 min, int64 max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int64) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE64 + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
*** top:
*** 887,893 
  		if (commands[st->state] == NULL)
  		{
  			st->state = 0;
! 			st->u

Re: [HACKERS] Add function dependencies

2011-01-11 Thread Peter Eisentraut
On tis, 2011-01-11 at 16:57 +0100, Dimitri Fontaine wrote:
> Peter Eisentraut  writes:
> > Making it work for language SQL would be nice, though.
> 
> Please consider a new DEPENDENCY_XXX constant for that though, because
> otherwise I think it could cause problems in the extension's dependency
> tracking.  Even with a new DEPENDENCY_FUNCALL or other constant, the
> extension code would need to be aware of this new kind of not-to-follow
> dependencies somehow, I guess.

What's a "not-to-follow dependency"?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 22:16, Jeff Davis wrote:

On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:

So, this patch modifies the internal do_pg_start/stop_backup functions
so that in addition to the traditional mode of operation, where a
backup_label file is created in the data directory where it's backed up
along with all other files, the backup label file is returned to the 
caller, and the caller is responsible for including it in the backup. 
The code in replication/basebackup.c includes it in the tar file that's 
streamed to the client, as "backup_label".


Perhaps we can use this more intelligent form of base backup to
differentiate between:
  a. a primary that has crashed while a backup was in progress; and
  b. an online backup that is being restored.

Allowing the user to do an unrestricted file copy as a base backup
doesn't allow us to make that differentiation. That led to the two bugs
that we fixed in StartupXLOG(). And right now there are still two
problems remaining (albeit less severe):

  1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
  2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?


It won't change the situation for pg_start_backup(), but with the patch 
the base backups done via streaming won't have those issues, because 
backup_label is not created (with that name) in the master.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Jan Urbański
On 11/01/11 18:59, Tom Lane wrote:
> Jan Urbański  writes:
>> On 11/01/11 17:11, Tom Lane wrote:
>>> Huh?  Why in the world would the specific location of the #include have
>>> anything to do with the problem?
> 
> > I'm having a hard time convincing make to generate errcodes.h before
> > compiling any .c file that includes postgres.h. The only way I found was
> > to make src/include/errcodes.h a dependency of the all target.
> 
> Peter would probably be a better person than me to answer that, but I
> imagine that what you want is similar to what src/backend/Makefile does
> for parser/gram.h, only applied at the src/ level or maybe even the
> root.

OK, that was a nudge in the right direction. Basing the rules from
src/backend/Makefile I changed src/Makefile to rebuild
src/include/errcodes.h before building the subdirectories and... it
failed miserably. There's some trickery beyond my understanding here.
There's a rule like this in src/backend/Makefile:

$(SUBDIRS:%=%-recursive): $(top_builddir)/src/include/parser/gram.h

that triggers checking whether gram.h needs to be rebuilt before
recursing into each SUBDIR.

A similar trick in src/Makefile doesn't work, because it's
src/backend/common.mk that is responsible for the SUBDIR-recursive
calls, and src/Makefile does not include it.

From what I gathered by reading Makefile.global, the $(recurse) call
enters each SUBDIR and builds a target called <target>-<subdir>-recurse.
And actually, if I change my rule to read:

$(SUBDIRS:%=all-%-recurse): $(top_builddir)/src/include/utils/errcodes.h

it works. Now whether that's acceptable or not is another thing entirely...

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:
> So, this patch modifies the internal do_pg_start/stop_backup functions 
> so that in addition to the traditional mode of operation, where a 
> backup_label file is created in the data directory where it's backed up 
> along with all other files, the backup label file is returned to the 
> caller, and the caller is responsible for including it in the backup. 
> The code in replication/basebackup.c includes it in the tar file that's 
> streamed to the client, as "backup_label".

Perhaps we can use this more intelligent form of base backup to
differentiate between:
 a. a primary that has crashed while a backup was in progress; and
 b. an online backup that is being restored.

Allowing the user to do an unrestricted file copy as a base backup
doesn't allow us to make that differentiation. That led to the two bugs
that we fixed in StartupXLOG(). And right now there are still two
problems remaining (albeit less severe):

 1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
 2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 21:50, Dimitri Fontaine wrote:

Heikki Linnakangas  writes:


Now that we have a basic over-the-wire base backup capability in walsender,
it would be nice to allow taking multiple base backups at the same time.


I would prefer to be able to take a base backup from a standby, or is
that already possible?  What about cascading the wal shipping?  Maybe
those ideas are much bigger projects, but if I don't ask, I don't get to
know :)


Yeah, those are bigger projects..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL ; stored procedures

2011-01-11 Thread Alvaro Herrera
Excerpts from Stephen Frost's message of Fri Jan 07 15:29:52 -0300 2011:
> * Stephen Frost (sfr...@snowman.net) wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> > > > Making it part of DISCARD PLANS; and back-patching it to 8.3 where
> > > > DISCARD was introduced would be awesome for me. :)
> > > 
> > > I'd need to look at this more closely before committing anything, but
> > > at first blush I think that's reasonable.  Have a patch?
> > 
> > To be honest, I was fully expecting a response of "that's hard to do."
> 
> Soo, yeah, I found the "this is hard" part.  Basically, the plan
> cacheing, etc, is all handled by the stored procedures themselves, and
> we havn't got any way to tell a PL "destroy all your plans."  We might
> be able to hack up fmgr to throw away all references to functions, but
> that wouldn't release the memory they use up, making 'discard plans;'
> leak like a sieve.

What this discussion suggests to me is that cached plans need to be
tracked by a resource owner that's linked to the function.  The problem
I see with this idea is figuring out what module would keep track of
resowners that need to be reset ...  Other than that I think it should
be straightforward :-)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Dimitri Fontaine
Heikki Linnakangas  writes:

> Now that we have a basic over-the-wire base backup capability in walsender,
> it would be nice to allow taking multiple base backups at the same time.

I would prefer to be able to take a base backup from a standby, or is
that already possible?  What about cascading the wal shipping?  Maybe
those ideas are much bigger projects, but if I don't ask, I don't get to
know :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Range Types

2011-01-11 Thread David Fetter
On Tue, Jan 11, 2011 at 01:16:47AM -0800, Jeff Davis wrote:
> Ok, I have made some progress. This is still a proof-of-concept patch,
> but the important pieces are working together.
> 
> Synopsis:
> 
>   CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, 
> SUBTYPE_CMP=numeric_cmp);
> 
>   SELECT range_eq('[1,2.2)'::numrange,'[1,2.2]');
>   SELECT range_lbound('(3.7,9]'::numrange);
>   SELECT range(6.7);
>   SELECT '-'::numrange; -- empty
>   SELECT '[1, NULL]'::numrange; -- ] will become )
>   SELECT '(INF, 3)'::numrange;
> 
> I haven't completed many of the other generic functions, because I'd
> like to make sure I'm on the right track first. The important thing
> about the functions above is that they show ANYRANGE working in
> conjunction with ANYELEMENT in various combinations, which was a
> significant part of this patch.
> 
> Here are the open items:
> 
> 1. Generic functions -- most of which are fairly obvious. However, I
> want to make sure I'm on the right track first.
> 
> 2. GiST -- I'll need a mechanism to implement the "penalty" function,
> and perhaps I'll also need additional support for the picksplit
> function. For the "penalty" function, I think I'll need to require a
> function to convert the subtype into a float, and I can use that to find
> a distance (which can be the penalty). That should also satisfy anything
> that picksplit might need.
> 
> 3. Typmod -- There is still one annoyance about typmod remaining. I need
> to treat it like an array in find_typmod_coercion_function(), and then
> create a coercion expression. Is it worth it? Would typmod on a range be
> confusing, or should I just finish this item up?

Probably not worth it for the first round.

> 4. Docs

Happy to help evenings this week :)

> 5. Tests

Same.  What do you have so far?

> 6. pg_dump -- should be pretty easy; I just want to settle some of the
> other stuff first.
> 
> 7. Right now the parse function is quite dumb. Is there some example
> code I should follow to make sure I get this right?

KISS is a fine principle.  Do you really need it smart on the first
round? :)

> 8. In order to properly support the various combinations of ANYRANGE and
> ANYELEMENT in a function definition (which are all important), we need
> to be able to determine the range type given a subtype. That means that
> each subtype can only have one associated range, which sounds somewhat
> limiting, but it can be worked around by using domains. I don't think
> this is a major limitation. Comments?

As we get a more nuanced type system, this is one of the things that
will need to get reworked, so I'd say it's better not to put too much
effort into things that a refactor of the type system would make much
better, at least right now.

> Also related to representation:
> 
>   * Right now I always align the subtypes within the range according to
> typalign. I could avoid that by packing the bytes tightly, and then
> copying them around later. Suggestions? And what should the overall
> alignment of the range type be?

For the first cut, the simplest possible.

>   * If it's a fixed-length type, we can save the varlena header byte on
> the overall range; but we lose the ability to save space when one of the
> boundaries of the range is missing (NULL or INF), and it would
> complicate the code a little. Thoughts?

Probably not worth complicating the code at this stage.  KISS again :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:17, Heikki Linnakangas wrote:

Patches for both approaches attached. They're also available in my
github repository at g...@github.com:hlinnaka/postgres.git.


Just so people won't report the same issues again, a couple of bugs have 
already cropped up (thanks Magnus):


* a backup_label file in the data directory should now not be tarred up 
- we're injecting a different backup_label file there.


* pg_start_backup_callback needs to be updated to just decrement 
forcePageWrites, not reset it to 'false'


* pg_stop_backup() decrements forcePageWrites twice. oops.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Jan 11, 2011 at 19:51, Tom Lane  wrote:
>> Seems like either one of these is fairly problematic in that you have to
>> have some monstrous kluge to get the backup_label file to appear with
>> the right name in the tarfile.  How badly do we actually need this?
>> I don't think the use-case for concurrent base backups is all that large
>> in practice given the I/O hit it's going to involve.

> I think it can be done cleaner in the tar file injection - I've been
> chatting with Heikki offlist about that. Not sure, but I have a
> feeling it does.

One point that I'm particularly interested to see how you'll kluge is
ensuring that the tarball contains only the desired temp data and not
also the "real" $PGDATA/backup_label, should there be a normal base
backup being done concurrently with the streamed one.

The whole thing just seems too fragile and dangerous to be worth dealing
with given that actual usage will be a corner case.  *I* sure wouldn't
trust it to work when the chips were down.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Josh Berkus

> It makes it very convenient to set up standbys, without having to worry
> that you'll conflict e.g. with a nightly backup. I don't imagine people
> will use streaming base backups for very large databases anyway.

Also, imagine that you're provisioning a 10-node replication cluster on
EC2.  This would make that worlds easier.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:51, Tom Lane wrote:

Heikki Linnakangas  writes:

I implemented this in two ways, and can't decide which I like better:



1. The contents of the backup label file are returned to the caller of
do_pg_start_backup() as a palloc'd string.



2. do_pg_start_backup() creates a temporary file that the backup label
is written to (instead of "backup_label").



Implementation 1 changes more code, as pg_start/stop_backup() need to be
changed to write/read from memory instead of file, but the result isn't
any more complicated. Nevertheless, I somehow feel more comfortable with 2.


Seems like either one of these is fairly problematic in that you have to
have some monstrous kluge to get the backup_label file to appear with
the right name in the tarfile.


Oh. I'm surprised you feel that way - that part didn't feel ugly or 
kludgey at all to me.



 How badly do we actually need this?
I don't think the use-case for concurrent base backups is all that large
in practice given the I/O hit it's going to involve.


It makes it very convenient to set up standbys, without having to worry 
that you'll conflict e.g. with a nightly backup. I don't imagine people 
will use streaming base backups for very large databases anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Andreas Karlsson  writes:
> So would anyone be confused by a description of pg_amproc not including
> the types?

It really shouldn't be useful to include those.  Attend what it says in
the fine manual for CREATE OPERATOR CLASS:

In a FUNCTION clause, the operand data type(s) the function is
intended to support, if different from the input data type(s) of
the function (for B-tree and hash indexes) or the class's data
type (for GIN and GiST indexes). These defaults are always
correct, so there is no point in specifying op_type in a
FUNCTION clause in CREATE OPERATOR CLASS, but the option is
provided for consistency with the comparable syntax in ALTER
OPERATOR FAMILY.

The reason the ALTER OPERATOR FAMILY DROP syntax has to include operand
types is that it lacks the full name/types of the referenced function.
Since getObjectDescription *does* provide those, it doesn't serve any
real purpose to repeat the information.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 19:51, Tom Lane  wrote:
> Heikki Linnakangas  writes:
>> I implemented this in two ways, and can't decide which I like better:
>
>> 1. The contents of the backup label file are returned to the caller of
>> do_pg_start_backup() as a palloc'd string.
>
>> 2. do_pg_start_backup() creates a temporary file that the backup label
>> is written to (instead of "backup_label").
>
>> Implementation 1 changes more code, as pg_start/stop_backup() need to be
>> changed to write/read from memory instead of file, but the result isn't
>> any more complicated. Nevertheless, I somehow feel more comfortable with 2.
>
> Seems like either one of these is fairly problematic in that you have to
> have some monstrous kluge to get the backup_label file to appear with
> the right name in the tarfile.  How badly do we actually need this?
> I don't think the use-case for concurrent base backups is all that large
> in practice given the I/O hit it's going to involve.

I think it can be done cleaner in the tar file injection - I've been
chatting with Heikki offlist about that. Not sure, but I have a
feeling it does.

As for the use-case, it doesn't necessarily involve a huge I/O hit if
either the entire DB is in RAM (not all that uncommon - though then
the backup finishes quickly as well), or if the *client* is slow, thus
making the server backlogged on sending the base backup.

Having it possible to do it concurrently also makes for *much* easier
use of the feature. It might be just about overlapping with a few
seconds, for example - which would probably have no major effect, but
takes a lot more code on the other end to work around.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Tom Lane
Heikki Linnakangas  writes:
> I implemented this in two ways, and can't decide which I like better:

> 1. The contents of the backup label file are returned to the caller of 
> do_pg_start_backup() as a palloc'd string.

> 2. do_pg_start_backup() creates a temporary file that the backup label 
> is written to (instead of "backup_label").

> Implementation 1 changes more code, as pg_start/stop_backup() need to be 
> changed to write/read from memory instead of file, but the result isn't 
> any more complicated. Nevertheless, I somehow feel more comfortable with 2.

Seems like either one of these is fairly problematic in that you have to
have some monstrous kluge to get the backup_label file to appear with
the right name in the tarfile.  How badly do we actually need this?
I don't think the use-case for concurrent base backups is all that large
in practice given the I/O hit it's going to involve.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 19:41 , Heikki Linnakangas wrote:
> On 11.01.2011 20:08, Florian Pflug wrote:
>> Unfortunately, it seems that doing things this way will undermine the 
>> guarantee
>> that retrying a failed SSI transaction won't fail due to the same conflict as
>> it did originally. Consider
>> 
>> T1>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T1>  SELECT * FROM T
>> T1>  UPDATE T ...
>> T1>  PREPARE TRANSACTION
>> 
>> T2>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T2>  SELECT * FROM T
>> T2>  UPDATE T ...
>> ->  Serialization Error
>> 
>> Retrying T2 won't help as long as T1 isn't COMMITTED.
> 
> T2 should block until T1 commits.

The serialization error will occur even if T1 and T2 update *different* rows.
This is due to the SELECTs in the interleaved schedule above returning the
state of T prior to both T1 and T2, which of course is never the case for a
serial schedule.
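
To make the different-rows point concrete, here is a sketch of such a
schedule against a made-up two-row table (two sessions, and
max_prepared_transactions > 0 is assumed); since T1 is already prepared and
can no longer be aborted, T2 has to be the victim:

  CREATE TABLE t (id int PRIMARY KEY, val int);
  INSERT INTO t VALUES (1, 10), (2, 20);

  T1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  T1> SELECT sum(val) FROM t;              -- reads rows 1 and 2
  T1> UPDATE t SET val = 0 WHERE id = 1;
  T1> PREPARE TRANSACTION 't1';

  T2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  T2> SELECT sum(val) FROM t;              -- still sees the pre-T1 state
  T2> UPDATE t SET val = 0 WHERE id = 2;   -- a different row than T1 touched
  ->  serialization failure, and retrying T2 keeps failing until T1 is
      committed or rolled back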

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Andreas Karlsson
On Tue, 2011-01-11 at 11:43 -0500, Tom Lane wrote:
> If that's what you're after, getObjectDescription is entirely
> unsuitable, because of the fact that its results are dependent
> on search path and language settings.
> 
>   regards, tom lane

Agreed, and as long as the additional information added to the
description by my patch is not useful for any other purpose I see no
reason for applying it.

So would anyone be confused by a description of pg_amproc not including
the types? I personally have no idea since I have not had to work with
indexes enough to say.

Andreas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Kevin Grittner
Heikki Linnakangas  wrote:
> On 11.01.2011 20:08, Florian Pflug wrote:
>> Unfortunately, it seems that doing things this way will undermine
>> the guarantee that retrying a failed SSI transaction won't fail
>> due to the same conflict as it did originally. Consider
>>
>> T1>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T1>  SELECT * FROM T
>> T1>  UPDATE T ...
>> T1>  PREPARE TRANSACTION
>>
>> T2>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
>> T2>  SELECT * FROM T
>> T2>  UPDATE T ...
>>  ->  Serialization Error
>>
>> Retrying T2 won't help as long as T1 isn't COMMITTED.
> 
> T2 should block until T1 commits. I would be very surprised if it 
> doesn't behave like that already. In general, a prepared
> transaction should be treated like an in-progress transaction - it
> might still abort too.
 
It shouldn't block if the updates were to different rows, which is
what I took Florian to mean; otherwise this would be handled by SI
and would have nothing to do with the SSI patch.  SSI doesn't
introduce any new blocking (with the one exception of the READ ONLY
DEFERRABLE style we invented to support long-running reports and
backups, and all blocking there is at the front -- once it's
running, it's going full speed ahead).
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:08, Florian Pflug wrote:

Unfortunately, it seems that doing things this way will undermine the guarantee
that retrying a failed SSI transaction won't fail due to the same conflict as
it did originally. Consider

T1>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
T1>  SELECT * FROM T
T1>  UPDATE T ...
T1>  PREPARE TRANSACTION

T2>  BEGIN TRANSACTION ISOLATION SERIALIZABLE
T2>  SELECT * FROM T
T2>  UPDATE T ...
 ->  Serialization Error

Retrying T2 won't help as long as T1 isn't COMMITTED.


T2 should block until T1 commits. I would be very surprised if it 
doesn't behave like that already. In general, a prepared transaction 
should be treated like an in-progress transaction - it might still abort 
too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Kevin Grittner
Jeff Davis  wrote:
 
> I don't expect this to be a huge roadblock for the feature though.
> It seems fairly contained. I haven't read the 2PC code either, but
> I don't expect that you'll need to change the rest of your
> algorithm just to support it.
 
Agreed; but I am starting to get concerned about whether this
particular area can be completed by the start of the CF.  I might
run a few days over on 2PC support.  Unless ... Dan?  Could you look
into this while I chase down the issue Anssi raised?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Kevin Grittner
Florian Pflug  wrote:
> On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
>> I'm trying not to panic here, but I haven't looked at 2PC before
>> yesterday and am just dipping into the code to support it, and
>> time is short.  Can anyone give me a pointer to anything I should
>> read before I dig through the 2PC code, which might accelerate
>> this?
> 
> 
> It roughly works as follows
> 
> Upon PREPARE, the locks previously held by the transaction are
> transferred to a kind of virtual backend which only consists of a
> special proc array entry. The transaction will thus still appear
> to be running, and will still be holding its locks, even after the
> original backend is gone. The information necessary to reconstruct
> that proc array entry is also written to the 2PC state, and used
> to recreate the "virtual backend" after a restart or crash.
> 
> There are also some additional pieces of transaction state which
> are stored in the 2PC state file like the full list of
> subtransaction xids (The proc array entry may not contain all of
> them if it overflowed). 
> 
> Upon COMMIT PREPARED, the information in the 2PC state file is
> used to write a COMMIT wal record and to update the clog. The
> transaction is then committed, and the special proc array entry is
> removed and all lockmgr locks it held are released.
> 
> For 2PC to work for SSI transactions, I guess you must check for
> conflicts during PREPARE - at any later point the COMMIT may only
> fail transiently, not permanently. Any transaction that adds a
> conflict with an already prepared transaction must check if that
> conflict completes a dangerous structure, and abort if this is the
> case, since the already PREPAREd transaction can no longer be
> aborted. COMMIT PREPARED then probably doesn't need to do anything
> special for SSI transactions, apart from some cleanup actions
> maybe.
 
Thanks; that all makes sense.  The devil, as they say, is in the
details.  As far as I've worked it out, the PREPARE must persist
both the predicate locks and any conflict pointers which are to
other prepared transactions.  That leaves some fussy work around the
coming and going of prepared transactions, because on recovery you
need to be prepared to ignore conflict pointers with prepared
transactions which committed or rolled back.
 
What I haven't found yet is the right place and means to persist and
recover this stuff, but that's just a matter of digging through
enough source code.  Any tips regarding that may save time.  I'm
also not clear on what, if anything, needs to be written to WAL. I'm
really fuzzy on that, still.
 
> Unfortunately, it seems that doing things this way will undermine
> the guarantee that retrying a failed SSI transaction won't fail
> due to the same conflict as it did originally.
 
I hadn't thought of that, but you're right.  Of course, I can't
enforce that guarantee, anyway, without some other patch first being
there to allow me to cancel other transactions with
serialization_failure, even if they are "idle in transaction".
 
> There doesn't seem to be a way around that, however - any correct
> implementation of 2PC for SSI will have to behave that way, I fear
> :-(
 
I think you're right.
 
> Hope this helps & best regards,
 
It does.  Even the parts which just confirm my tentative conclusions
save me time in not feeling like I need to cross-check so much.  I
can move forward with more confidence.  Thanks.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Garick Hamlin
On Tue, Jan 11, 2011 at 12:45:02PM -0500, Tom Lane wrote:
> Florian Pflug  writes:
> > On Jan11, 2011, at 18:09 , Garick Hamlin wrote:
> >> My gut was that direct io would likely work right on Linux
> >> and Solaris, at least.
> 
> > Didn't we discover recently that O_DIRECT fails for ext4 on linux
> > if ordered=data, or something like that?
> 
> Quite.  Blithe assertions that something like this "should work" aren't
> worth the electrons they're written on.

Indeed.  I wasn't making such a claim in case that wasn't clear.  I believe,
in fact, there is no single way that will work everywhere.  This isn't
needed for correctness of course, it is merely a tweak for performance as
long as the 'not working' case on platform + filesystem X degrades to
something close to what would have happened if we didn't try.  I expected
POSIX_FADV_NOREUSE not to work on Linux, but haven't looked at it recently
and not all systems are Linux so I mentioned it.  This was why I thought
direct io might be more realistic.

I did not have a chance to test before I wrote this email so I attempted to 
make my uncertainty clear.  I _know_ it will not work in some environments,
but I thought it was worth looking at if it worked on more than one sane 
common setup, but I can understand if you feel differently about that.

Garick

> 
>   regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas
Now that we have a basic over-the-wire base backup capability in 
walsender, it would be nice to allow taking multiple base backups at the 
same time. It might not seem very useful at first, but it makes it 
easier to set up standbys for small databases. At the moment, if you 
want to set up two standbys, you have to either take a single base 
backup and distribute it to both standbys, or somehow coordinate that 
they don't try to take the base backup at the same time. Also, you don't 
want initializing a standby to conflict with a nightly backup cron script.


So, this patch modifies the internal do_pg_start/stop_backup functions 
so that in addition to the traditional mode of operation, where a 
backup_label file is created in the data directory where it's backed up 
along with all other files, the backup label file is returned to the 
caller, and the caller is responsible for including it in the backup. 
The code in replication/basebackup.c includes it in the tar file that's 
streamed to the client, as "backup_label".


To make that safe, I've changed forcePageWrites into an integer. 
Whenever a backup is started, it's incremented, and when one ends, it's 
decremented. When forcePageWrites == 0, there's no backup in progress.


The user-visible pg_start_backup() function is not changed. You can only 
have one backup started that way in progress at a time. But you can run 
streaming base backups at the same time as a traditional pg_start_backup().
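
(For comparison, the traditional exclusive sequence, which stays
one-at-a-time, is the usual one; label and copy method are up to the user:

  SELECT pg_start_backup('nightly', true);
  -- copy the data directory with tar/rsync/..., backup_label included
  SELECT pg_stop_backup();

and a second pg_start_backup() attempted while that is in progress still
fails with "a backup is already in progress".)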


I implemented this in two ways, and can't decide which I like better:

1. The contents of the backup label file are returned to the caller of 
do_pg_start_backup() as a palloc'd string.


2. do_pg_start_backup() creates a temporary file that the backup label 
is written to (instead of "backup_label").


Implementation 1 changes more code, as pg_start/stop_backup() need to be 
changed to write/read from memory instead of file, but the result isn't 
any more complicated. Nevertheless, I somehow feel more comfortable with 2.


Patches for both approaches attached. They're also available in my 
github repository at g...@github.com:hlinnaka/postgres.git.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5b6a230..cc4d46e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -338,7 +338,8 @@ typedef struct XLogCtlInsert
 	XLogPageHeader currpage;	/* points to header of block in cache */
 	char	   *currpos;		/* current insertion point in cache */
 	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
-	bool		forcePageWrites;	/* forcing full-page writes for PITR? */
+	int			forcePageWrites;	/* forcing full-page writes for PITR? */
+	bool		exclusiveBackup;	/* a backup was started with pg_start_backup() */
 } XLogCtlInsert;
 
 /*
@@ -8313,7 +8314,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
 
 	backupidstr = text_to_cstring(backupid);
 
-	startpoint = do_pg_start_backup(backupidstr, fast);
+	startpoint = do_pg_start_backup(backupidstr, fast, true, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
 			 startpoint.xlogid, startpoint.xrecoff);
@@ -8321,7 +8322,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
 }
 
 XLogRecPtr
-do_pg_start_backup(const char *backupidstr, bool fast)
+do_pg_start_backup(const char *backupidstr, bool fast, bool exclusive, char **labelfile)
 {
 	XLogRecPtr	checkpointloc;
 	XLogRecPtr	startpoint;
@@ -8332,6 +8333,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	uint32		_logSeg;
 	struct stat stat_buf;
 	FILE	   *fp;
+	StringInfoData labelfbuf;
 
 	if (!superuser() && !is_authenticated_user_replication_role())
 		ereport(ERROR,
@@ -8368,15 +8370,19 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	 * ensure adequate interlocking against XLogInsert().
 	 */
 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
-	if (XLogCtl->Insert.forcePageWrites)
+	if (exclusive)
 	{
-		LWLockRelease(WALInsertLock);
-		ereport(ERROR,
-(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("Run pg_stop_backup() and try again.")));
+		if (XLogCtl->Insert.exclusiveBackup)
+		{
+			LWLockRelease(WALInsertLock);
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("a backup is already in progress"),
+	 errhint("Run pg_stop_backup() and try again.")));
+		}
+		XLogCtl->Insert.exclusiveBackup = true;
 	}
-	XLogCtl->Insert.forcePageWrites = true;
+	XLogCtl->Insert.forcePageWrites++;
 	LWLockRelease(WALInsertLock);
 
 	/*
@@ -8393,7 +8399,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	RequestXLogSwitch();
 
 	/* Ensure we release forcePageWrites if fail below */
-	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
 	{
 		/*
 		 * Force a CHECKPOINT.	Aside from being necessary to prevent torn

Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Florian Pflug
On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
> I'm trying not to panic here, but I haven't looked at 2PC before
> yesterday and am just dipping into the code to support it, and time
> is short.  Can anyone give me a pointer to anything I should read
> before I dig through the 2PC code, which might accelerate this?


It roughly works as follows

Upon PREPARE, the locks previously held by the transaction are transferred
to a kind of virtual backend which only consists of a special proc array
entry. The transaction will thus still appear to be running, and will still
be holding its locks, even after the original backend is gone. The information
necessary to reconstruct that proc array entry is also written to the 2PC state,
and used to recreate the "virtual backend" after a restart or crash.

There are also some additional pieces of transaction state which are stored
in the 2PC state file like the full list of subtransaction xids (The proc array
entry may not contain all of them if it overflowed). 

Upon COMMIT PREPARED, the information in the 2PC state file is used to write
a COMMIT wal record and to update the clog. The transaction is then committed,
and the special proc array entry is removed and all lockmgr locks it held are
released.
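
The user-visible command flow being described is just the following (table
and GID names made up; max_prepared_transactions must be > 0):

  -- session 1
  BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  PREPARE TRANSACTION 'xfer-42';  -- state is persisted; the session may disconnect

  -- any later session, even after a crash and restart
  COMMIT PREPARED 'xfer-42';      -- or ROLLBACK PREPARED 'xfer-42'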

For 2PC to work for SSI transactions, I guess you must check for conflicts
during PREPARE - at any later point the COMMIT may only fail transiently,
not permanently. Any transaction that adds a conflict with an already
prepared transaction must check if that conflict completes a dangerous
structure, and abort if this is the case, since the already PREPAREd transaction
can no longer be aborted. COMMIT PREPARED then probably doesn't need to do
anything special for SSI transactions, apart from some cleanup actions maybe.

Unfortunately, it seems that doing things this way will undermine the guarantee
that retrying a failed SSI transaction won't fail due to the same conflict as
it did originally. Consider

T1> BEGIN TRANSACTION ISOLATION SERIALIZABLE
T1> SELECT * FROM T
T1> UPDATE T ...
T1> PREPARE TRANSACTION

T2> BEGIN TRANSACTION ISOLATION SERIALIZABLE
T2> SELECT * FROM T
T2> UPDATE T ...
-> Serialization Error

Retrying T2 won't help as long as T1 isn't COMMITTED.

There doesn't seem to be a way around that, however - any correct implementation
of 2PC for SSI will have to behave that way, I fear :-(

Hope this helps & best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Tom Lane
Jan Urbański  writes:
> On 11/01/11 17:11, Tom Lane wrote:
>> Huh?  Why in the world would the specific location of the #include have
>> anything to do with the problem?

> I'm having a hard time convincing make to generate errcodes.h before
> compiling any .c file that includes postgres.h. The only way I found was
> to make src/include/errcodes.h a dependency of the all target.

Peter would probably be a better person than me to answer that, but I
imagine that what you want is similar to what src/backend/Makefile does
for parser/gram.h, only applied at the src/ level or maybe even the
root.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Tom Lane
Florian Pflug  writes:
> On Jan11, 2011, at 18:09 , Garick Hamlin wrote:
>> My gut was that direct io would likely work right on Linux
>> and Solaris, at least.

> Didn't we discover recently that O_DIRECT fails for ext4 on linux
> if ordered=data, or something like that?

Quite.  Blithe assertions that something like this "should work" aren't
worth the electrons they're written on.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Jan Urbański
On 11/01/11 17:11, Tom Lane wrote:
> Jan Urbański  writes:
>>> I tried wiring it into the build system, but failed, I can't figure out
>>> which Makefiles should be updated in order to make errcodes.h and
>>> plerrcodes.h generated headers. Could someone help with that?
> 
>> Trying a bit harder to make src/include/utils/errcodes.h a generated
>> file I found that it's included so early it makes the task not trivial
>> at all. postgres.h includes elog.h, which includes errcodes.h (after
>> defining the MAKE_SQLSTATE macro). I'm not sure how to proceed:
> 
> Huh?  Why in the world would the specific location of the #include have
> anything to do with the problem?

I'm having a hard time convincing make to generate errcodes.h before
compiling any .c file that includes postgres.h. The only way I found was
to make src/include/errcodes.h a dependency of the all target.

For instance, I tried to copy the way we generate fmgroids.h and it
turned out that it doesn't work (it tries to compile things in src/port
before entering src/include, and things in src/port include postgres.h).

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Jeff Davis
On Mon, 2011-01-10 at 11:50 -0600, Kevin Grittner wrote:
> I'm trying not to panic here, but I haven't looked at 2PC before
> yesterday and am just dipping into the code to support it, and time
> is short.  Can anyone give me a pointer to anything I should read
> before I dig through the 2PC code, which might accelerate this?

I don't see much about 2PC outside of twophase.c.

Regarding the original post, I agree that we should have two-phase-commit
support for SSI. We opted not to support it for
notifications, but there was a fairly reasonable argument why users
wouldn't value the combination of 2PC and NOTIFY.

I don't expect this to be a huge roadblock for the feature though. It
seems fairly contained. I haven't read the 2PC code either, but I don't
expect that you'll need to change the rest of your algorithm just to
support it.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Cédric Villemain
2011/1/11 Garick Hamlin :
> On Tue, Jan 11, 2011 at 11:39:20AM -0500, Cédric Villemain wrote:
>> 2011/1/11 Garick Hamlin :
>> > On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
>> >> On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
>> >>  wrote:
>> >> > 2011/1/7 Magnus Hagander :
>> >> >> On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
>> >> >>  wrote:
>> >> >>> 2011/1/5 Magnus Hagander :
>> >>  On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine 
>> >>   wrote:
>> >> > Magnus Hagander  writes:
>> >> >> * Stefan mentiond it might be useful to put some
>> >> >> posix_fadvise(POSIX_FADV_DONTNEED)
>> >> >>   in the process that streams all the files out. Seems useful, as 
>> >> >> long as that
>> >> >>   doesn't kick them out of the cache *completely*, for other 
>> >> >> backends as well.
>> >> >>   Do we know if that is the case?
>> >> >
>> >> > Maybe have a look at pgfincore to only tag DONTNEED for blocks that 
>> >> > are
>> >> > not already in SHM?
>> >> 
>> >>  I think that's way more complex than we want to go here.
>> >> 
>> >> >>>
>> >> >>> DONTNEED will remove the block from OS buffer everytime.
>> >> >>
>> >> >> Then we definitely don't want to use it - because some other backend
>> >> >> might well want the file. Better leave it up to the standard logic in
>> >> >> the kernel.
>> >> >
>> >> > Looking at the patch, it is (very) easy to add the support for that in
>> >> > basebackup.c
>> >> > That supposed allowing mincore(), so mmap(), and so probably switch
>> >> > the fopen() to an open() (or add an open() just for mmap
>> >> > requirement...)
>> >> >
>> >> > Let's go ?
>> >>
>> >> Per above, I still don't think we *should* do this. We don't want to
>> >> kick things out of the cache underneath other backends, and since we
>> >> can't control that. Either way, it shouldn't happen in the beginning,
>> >> and if it does, should be backed with proper benchmarks.
>> >
>> > Another option that occurs to me is an option to use direct IO (or another
>> > means as needed) to bypass the cache.  So rather than kicking it out of
>> > the cache, we attempt just not to pollute the cache by bypassing it for 
>> > cold
>> > pages and use either normal io for 'hot pages', or use a 'read()' to "heat"
>> > the cache afterward.
>>
>> AFAIR, even Linus rejected the idea of using it seriously, unless I'm
>> misremembering.
>
> Direct IO is generally a pain.
>
> POSIX_FADV_NOREUSE is an alternative (I think).  Realistically I wasn't
> sure which way(s) actually worked.  My gut was that direct io would likely
> work right on Linux and Solaris, at least.  If POSIX_FADV_NOREUSE works
> then maybe that is the answer instead, but I haven't tested either.

Yes, it should be the best option; unfortunately it is a ghost flag, it
doesn't do anything.
At some point there were a libprefetch library and a Linux fincore()
syscall in the air. Unfortunately the people behind those items stopped
communicating with the open-source community, as far as I can see. (I didn't
get answers myself, and neither did the Linux ML.)


>
> Garick
>
>
>>
>> >
>> > Garick
>> >
>> >>
>> >> I've committed the backend side of this, without that. Still working
>> >> on the client, and on cleaning up Heikki's patch for grammar/parser
>> >> support.
>> >>
>> >> --
>> >>  Magnus Hagander
>> >>  Me: http://www.hagander.net/
>> >>  Work: http://www.redpill-linpro.com/
>> >>
>> >> --
>> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> >> To make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-hackers
>> >
>>
>>
>>
>> --
>> Cédric Villemain               2ndQuadrant
>> http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
>



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 18:09 , Garick Hamlin wrote:
> My gut was that direct io would likely work right on Linux
> and Solaris, at least.

Didn't we discover recently that O_DIRECT fails for ext4 on linux
if ordered=data, or something like that?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 16:54 , Joel Jacobson wrote:
> Has anyone written an in-depth description on how to traverse the pg_depend 
> tree?
> The 'a' and 'i' deptype really makes it hard to figure out the
> dependency order, a topological sort does not work.

Could you give an example of the kind of trouble you're experiencing trying
to use a topological sort?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Garick Hamlin
On Tue, Jan 11, 2011 at 11:39:20AM -0500, Cédric Villemain wrote:
> 2011/1/11 Garick Hamlin :
> > On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
> >> On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
> >>  wrote:
> >> > 2011/1/7 Magnus Hagander :
> >> >> On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
> >> >>  wrote:
> >> >>> 2011/1/5 Magnus Hagander :
> >>  On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine 
> >>   wrote:
> >> > Magnus Hagander  writes:
> >> >> * Stefan mentiond it might be useful to put some
> >> >> posix_fadvise(POSIX_FADV_DONTNEED)
> >> >>   in the process that streams all the files out. Seems useful, as 
> >> >> long as that
> >> >>   doesn't kick them out of the cache *completely*, for other 
> >> >> backends as well.
> >> >>   Do we know if that is the case?
> >> >
> >> > Maybe have a look at pgfincore to only tag DONTNEED for blocks that 
> >> > are
> >> > not already in SHM?
> >> 
> >>  I think that's way more complex than we want to go here.
> >> 
> >> >>>
> >> >>> DONTNEED will remove the block from OS buffer everytime.
> >> >>
> >> >> Then we definitely don't want to use it - because some other backend
> >> >> might well want the file. Better leave it up to the standard logic in
> >> >> the kernel.
> >> >
> >> > Looking at the patch, it is (very) easy to add the support for that in
> >> > basebackup.c
> >> > That supposed allowing mincore(), so mmap(), and so probably switch
> >> > the fopen() to an open() (or add an open() just for mmap
> >> > requirement...)
> >> >
> >> > Let's go ?
> >>
> >> Per above, I still don't think we *should* do this. We don't want to
> >> kick things out of the cache underneath other backends, and since we
> >> can't control that. Either way, it shouldn't happen in the beginning,
> >> and if it does, should be backed with proper benchmarks.
> >
> > Another option that occurs to me is an option to use direct IO (or another
> > means as needed) to bypass the cache.  So rather than kicking it out of
> > the cache, we attempt just not to pollute the cache by bypassing it for cold
> > pages and use either normal io for 'hot pages', or use a 'read()' to "heat"
> > the cache afterward.
> 
> AFAIR, even Linus rejected the idea of using it seriously, unless I'm
> misremembering.

Direct IO is generally a pain.

POSIX_FADV_NOREUSE is an alternative (I think).  Realistically I wasn't sure
which way(s) actually worked.  My gut was that direct io would likely work
right on Linux and Solaris, at least.  If POSIX_FADV_NOREUSE works then maybe
that is the answer instead, but I haven't tested either.

Garick


> 
> >
> > Garick
> >
> >>
> >> I've committed the backend side of this, without that. Still working
> >> on the client, and on cleaning up Heikki's patch for grammar/parser
> >> support.
> >>
> >> --
> >>  Magnus Hagander
> >>  Me: http://www.hagander.net/
> >>  Work: http://www.redpill-linpro.com/
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >
> 
> 
> 
> -- 
> Cédric Villemain               2ndQuadrant
> http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [TESTERS] [TEST REPORT] 9.1Alpha3 Feature E.1.4.7.2 in release notes.

2011-01-11 Thread Peter Eisentraut
On tis, 2011-01-11 at 11:03 -0500, Tom Lane wrote:
> We explicitly
> rejected the idea of providing direct casts to/from floating point
> types, on the grounds of not wanting any roundoff error; so I don't
> think this is a point that should be revisited.

We also explicitly chose floating point as the result of the money/money
operator over numeric.  Seems a bit inconsistent.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Tom Lane
Joel Jacobson  writes:
> I need to figure out the order of creation of all objects, not just
> the dependencies for a single object.

In that case try pg_dump's pg_dump_sort.c.  You will never get "the"
order of creation of objects, because that isn't tracked; but you can
find out what a safe order would be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Joel Jacobson  writes:
> 2011/1/11 Tom Lane :
>> Seems like concatenating the OIDs would accomplish that.  (If you
>> think not, well, you still haven't explained what problem you're trying
>> to solve...)

> The OIDs can be different in two databases sharing the same original
> schema but at two different versions.
> In this case it is better to compare textual strings describing the
> objects than to compare based on oids.

If that's what you're after, getObjectDescription is entirely
unsuitable, because of the fact that its results are dependent
on search path and language settings.
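
A rough illustration of the search-path dependence (schema and table
names are made up here, and the exact wording of the descriptions may
differ):

CREATE SCHEMA s1;
CREATE TABLE s1.t (c int);
SET search_path = public;
SELECT pg_describe_object('pg_class'::regclass, 's1.t'::regclass, 0);
-- e.g. "table s1.t"
SET search_path = s1, public;
SELECT pg_describe_object('pg_class'::regclass, 's1.t'::regclass, 0);
-- e.g. "table t"  (same object, different string)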

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
2011/1/11 Tom Lane :
> Try reading the code in src/backend/catalog/dependency.c.

I've tried but failed to figure it out anyway. The focus in
dependency.c is to find out dependencies of a given object.
What I want to do is something slightly different.
I need to figure out the order of creation of all objects, not just
the dependencies for a single object.

Basically, I want to do "order by oid", but since you cannot trust the
oid order (like we did in pre-7.3), I need to get the sorted list
using pg_depend somehow.

I guess I could make findDependentObjects() callable from SQL and call
it for each and every object, but that's quite a big project. I was
hoping it was possible to do it in plain SQL, or at least only by
relying on plpgsql/plperl.

I've implemented tsort() in plperl, but that didn't really help me
much, since you need to jump around in the tree when you encounter
'internal' and 'auto' nodes.

My last resort is to sort by oid, but I really don't want to do that
since it would render the entire solution unreliable and I would never
feel comfortable using it in the production environment.

This is the last component I need in order to complete the work on the
pov-project http://www.github.com/gluefinance/pov

I would highly appreciate your help, I feel a bit lame since I've
spent over two days working on this. It's not difficult if you are
allowed to build specialized queries for each class, but my goal is a
general query only relying on pg_depend.
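
To make that concrete, here is the rough kind of query I have in mind
(an unvalidated sketch: it walks only the 'n' edges to get a creation
rank, assumes there are no cycles, and still does nothing about folding
the 'a'/'i' members onto their owning object, which is exactly the part
I'm stuck on):

WITH RECURSIVE normal_deps AS (
    SELECT DISTINCT classid, objid, refclassid, refobjid
    FROM pg_depend
    WHERE deptype = 'n'
),
depth AS (
    -- referenced objects that do not themselves depend on anything
    SELECT nd.refclassid AS classid, nd.refobjid AS objid, 0 AS level
    FROM normal_deps nd
    WHERE NOT EXISTS (SELECT 1 FROM normal_deps d2
                      WHERE d2.classid = nd.refclassid
                        AND d2.objid   = nd.refobjid)
    UNION ALL
    SELECT nd.classid, nd.objid, depth.level + 1
    FROM normal_deps nd
    JOIN depth ON nd.refclassid = depth.classid
              AND nd.refobjid  = depth.objid
)
SELECT classid::regclass AS catalog, objid, max(level) AS create_rank
FROM depth
GROUP BY classid, objid
ORDER BY create_rank;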

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Cédric Villemain
2011/1/11 Garick Hamlin :
> On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
>> On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
>>  wrote:
>> > 2011/1/7 Magnus Hagander :
>> >> On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
>> >>  wrote:
>> >>> 2011/1/5 Magnus Hagander :
>>  On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine  
>>  wrote:
>> > Magnus Hagander  writes:
>> >> * Stefan mentiond it might be useful to put some
>> >> posix_fadvise(POSIX_FADV_DONTNEED)
>> >>   in the process that streams all the files out. Seems useful, as 
>> >> long as that
>> >>   doesn't kick them out of the cache *completely*, for other backends 
>> >> as well.
>> >>   Do we know if that is the case?
>> >
>> > Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
>> > not already in SHM?
>> 
>>  I think that's way more complex than we want to go here.
>> 
>> >>>
> >>> DONTNEED will remove the block from the OS buffer every time.
>> >>
>> >> Then we definitely don't want to use it - because some other backend
>> >> might well want the file. Better leave it up to the standard logic in
>> >> the kernel.
>> >
>> > Looking at the patch, it is (very) easy to add the support for that in
>> > basebackup.c
>> > That supposed allowing mincore(), so mmap(), and so probably switch
>> > the fopen() to an open() (or add an open() just for mmap
>> > requirement...)
>> >
>> > Let's go ?
>>
>> Per above, I still don't think we *should* do this. We don't want to
>> kick things out of the cache underneath other backends, and we can't
>> control that. Either way, it shouldn't happen in the beginning, and if
>> it does, it should be backed with proper benchmarks.
>
> Another option that occurs to me is an option to use direct IO (or another
> means as needed) to bypass the cache.  So rather than kicking it out of
> the cache, we attempt just not to pollute the cache by bypassing it for cold
> pages and use either normal io for 'hot pages', or use a 'read()' to "heat"
> the cache afterward.

AFAIR, even Linus has rejected the idea of using it seriously, unless
I'm misremembering.

>
> Garick
>
>>
>> I've committed the backend side of this, without that. Still working
>> on the client, and on cleaning up Heikki's patch for grammar/parser
>> support.
>>
>> --
>>  Magnus Hagander
>>  Me: http://www.hagander.net/
>>  Work: http://www.redpill-linpro.com/
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Tom Lane
Joel Jacobson  writes:
> Has anyone written a in-depth description on how to traverse the pg_depend 
> tree?

Try reading the code in src/backend/catalog/dependency.c.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Florian Pflug  writes:
> @OP: Wouldn't it be sufficient to provide such a thing for structure
> objects that people are actually going to modify on a regular basis?

Yeah, I still don't see the need to argue over whether the elements of
an operator class are uniquely identifiable or not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Tom Lane
Jan Urbański  writes:
>> I tried wiring it into the build system, but failed, I can't figure out
>> which Makefiles should be updated in order to make errcodes.h and
>> plerrcodes.h generated headers. Could someone help with that?

> Trying a bit harder to make src/include/utils/errcodes.h a generated
> file I found that it's included so early it makes the task not trivial
> at all. postgres.h includes elog.h, which includes errcodes.h (after
> defining the MAKE_SQLSTATE macro). I'm not sure how to proceed:

Huh?  Why in the world would the specific location of the #include have
anything to do with the problem?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [TESTERS] [TEST REPORT] 9.1Alpha3 Feature E.1.4.7.2 in release notes.

2011-01-11 Thread Tom Lane
Peter Eisentraut  writes:
> On tis, 2011-01-11 at 12:30 +0900, Itagaki Takahiro wrote:
>> If we want to accept integers for money, we should fix the docs.
>> | integer and floating-point string literals
>> |~~~
>> Will it get better?

> I think adding a cast from integer to money is probably quite
> reasonable.  The other way around, maybe not, or only an explicit cast.

As near as I can tell, this entire thread started because someone
thought that the reference to "numeric" in the release notes implied
"any numerical type", not "the type named numeric".  We explicitly
rejected the idea of providing direct casts to/from floating point
types, on the grounds of not wanting any roundoff error; so I don't
think this is a point that should be revisited.  Perhaps it would be
sufficient to clarify the release-note item.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 10, 2011 at 8:56 PM, Tom Lane  wrote:
>> Not really.  AFAIR, there are two cases that exist in practice,
>> depending on which AM you're talking about:
>> 
>> 1. The recorded types match the input types of the operator/function
>>   (btree & hash).
>> 2. The recorded types are always the same as the opclass's input type
>>   (gist & gin).
>> 
>> In neither case does printing those types really add much information.
>> That's why it's not there now.

> I don't get it.  If two different items that exist in the system out
> of the box have the same description, it seems clear that relevant
> piece of disambiguating information exists nowhere in the description
> string.

The "relevant piece of disambiguating information" is the function
name+parameters in the first case, or the opclass name in the second.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add function dependencies

2011-01-11 Thread Dimitri Fontaine
Peter Eisentraut  writes:
> Making it work for language SQL would be nice, though.

Please consider a new DEPENDENCY_XXX constant for that though, because
otherwise I think it could cause problems in the extension's dependency
tracking.  Even with a new DEPENDENCY_FUNCALL or other constant, the
extension code would need to be aware of this new kind of not-to-follow
dependencies somehow, I guess.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Garick Hamlin
On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
> On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
>  wrote:
> > 2011/1/7 Magnus Hagander :
> >> On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
> >>  wrote:
> >>> 2011/1/5 Magnus Hagander :
>  On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine  
>  wrote:
> > Magnus Hagander  writes:
> >> * Stefan mentiond it might be useful to put some
> >> posix_fadvise(POSIX_FADV_DONTNEED)
> >>   in the process that streams all the files out. Seems useful, as long 
> >> as that
> >>   doesn't kick them out of the cache *completely*, for other backends 
> >> as well.
> >>   Do we know if that is the case?
> >
> > Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
> > not already in SHM?
> 
>  I think that's way more complex than we want to go here.
> 
> >>>
> >>> DONTNEED will remove the block from the OS buffer every time.
> >>
> >> Then we definitely don't want to use it - because some other backend
> >> might well want the file. Better leave it up to the standard logic in
> >> the kernel.
> >
> > Looking at the patch, it is (very) easy to add the support for that in
> > basebackup.c
> > That supposed allowing mincore(), so mmap(), and so probably switch
> > the fopen() to an open() (or add an open() just for mmap
> > requirement...)
> >
> > Let's go ?
> 
> Per above, I still don't think we *should* do this. We don't want to
> kick things out of the cache underneath other backends, and we can't
> control that. Either way, it shouldn't happen in the beginning, and if
> it does, it should be backed with proper benchmarks.

Another option that occurs to me is an option to use direct IO (or another
means as needed) to bypass the cache.  So rather than kicking it out of 
the cache, we attempt just not to pollute the cache by bypassing it for cold
pages and use either normal io for 'hot pages', or use a 'read()' to "heat" 
the cache afterward.

Garick

> 
> I've committed the backend side of this, without that. Still working
> on the client, and on cleaning up Heikki's patch for grammar/parser
> support.
> 
> -- 
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
Has anyone written an in-depth description of how to traverse the pg_depend tree?
The 'a' and 'i' deptype really makes it hard to figure out the
dependency order, a topological sort does not work.

My latest attempt involved trying to group by all objects connected to
each other via deptype 'a' or 'i', and replacing all such nodes in the
tree with the "source node", i.e. the node which according to the
topological order could be created first.

Am I on the right path, trying to "fuse" the internal/auto objects
together, replacing them with the topmost object in the tree?
Or is there a simpler way to figure out the order in which objects
can be created?

I need a general solution, not a custom-made query for each regclass;
that is quite trivial but feels far from bulletproof. I want
something relying only on pg_depend, since that should be the safest
method of them all.
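
To show what I mean, these are the kinds of rows that make a plain
topological sort break down:

SELECT classid::regclass AS catalog, objid, objsubid,
       refclassid::regclass AS refcatalog, refobjid, deptype
FROM pg_depend
WHERE deptype IN ('a', 'i')
LIMIT 10;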

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] casts: max double precision > text > double precision fails with out or range error

2011-01-11 Thread Alvaro Herrera
Excerpts from Maciej Sakrejda's message of mar ene 11 03:28:13 -0300 2011:
> Tried asking this in pgsql-general but I got no response, so I thought
> I'd give hackers a shot:
> 
> postgres=# select (((1.7976931348623157081e+308)::double
> precision)::text)::double precision;
> ERROR:  "1.79769313486232e+308" is out of range for type double precision
> 
> I'm working on a pg driver and in my float data decoder functional
> tests, I ran into some errors that I eventually traced back to this
> behavior. Essentially, postgres seems to cast the max normal double
> (i.e., the bits of ~(1ULL<<52 | 1ULL<<63)) to text in such a manner
> that it's rounded up, and the reverse cast, text-to-double-precision,
> does not recognize it as being in range. Curiously, pg_dump seems to
> print doubles with more precision (in both COPY and INSERT modes),
> avoiding this issue.

Yeah, it sets the extra_float_digits parameter.

alvherre=# set extra_float_digits to 3;
SET
alvherre=# select (((1.7976931348623157081e+308)::double 
precision)::text)::double precision;
  float8  
──
 1.79769313486231571e+308
(1 row)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 16:12 , Tom Lane wrote:
> Joel Jacobson  writes:
>> I instead propose we introduce a new function named
>> pg_get_object_unique_identifier( classid oid, objid oid, objsubid
>> integer ) returns text.
> 
> Seems like concatenating the OIDs would accomplish that.  (If you
> think not, well, you still haven't explained what problem you're trying
> to solve...)

I think the OP wants a unique identifier which changes neither on dump &
reload nor on major version upgrades. I can see the value in that if you
e.g. want to compare the structures of two different postgres databases.

It seems impossible to guarantee that the identifier won't change between
major versions, though: if the structure of that catalog changes, so will
the identifier.

@OP: Wouldn't it be sufficient to provide such a thing for structure
objects that people are actually going to modify on a regular basis?
I suggest restricting this to

tables (including SQL/MED foreign tables)
views
sequences
types
servers (SQL/MED)
tsearch configurations
tsearch dictionaries

for which <schema>.<name> suffices,

triggers
constraints
columns

for which <schema>.<table>.<name> should work, and

functions
operators

which need to include the argument types.

The reg* types already solve this for

tables, views, sequences (regclass)
tsearch configurations (regconfig)
tsearch dictionaries (regdictionary)
types (regtype)
functions (regprocedure)
operators (regoperator)

which leaves

servers
triggers
constraints
columns
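
For the classes that the reg* types do cover, the textual identity is
just a cast, for example (output is only indicative):

SELECT 'pg_class'::regclass;          -- tables, views, sequences
SELECT 'lower(text)'::regprocedure;   -- functions, with argument types
SELECT '||(text,text)'::regoperator;  -- operators, with argument types
SELECT 'english'::regconfig;          -- text search configurations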

best regards,
Florian Pflug






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Anssi Kääriäinen

On 01/11/2011 04:53 PM, Kevin Grittner wrote:

Thanks much for testing.  You're managing to exercise some code
paths I didn't think to test, which is great!  I guess this is the
up side of having posted yesterday.  :-)


Glad that I can help. This feature is something that is very important
to our usage of PostgreSQL.

Just to add some positive feedback: I tried this patch with our in-house,
in-development EAV database. Everything was working perfectly, and the time
to import current and history data for 7000 employees (total of 95000
attributes) to the DB using stored procedures went from 4 minutes 20
seconds to 4 minutes 30 seconds (the procedures are doing a lot of
validation...). So in this case the real-world performance difference
was barely noticeable. SSI was also able to block serialization
anomalies and I did not have any false rollbacks in my (very limited)
testing. So, thank you for the work on this feature. And, of course, I
will try to break it some more. :-)
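
For reference, the kind of anomaly I mean is the classic write-skew
case; a minimal sketch with a made-up table (under SSI one of the two
commits should fail with a serialization failure, where plain snapshot
isolation would let both succeed):

CREATE TABLE doctors (id int PRIMARY KEY, on_call bool);
INSERT INTO doctors VALUES (1, true), (2, true);

-- session A
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;        -- sees 2
UPDATE doctors SET on_call = false WHERE id = 1;

-- session B, concurrently
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;        -- also sees 2
UPDATE doctors SET on_call = false WHERE id = 2;

-- session A
COMMIT;
-- session B
COMMIT;   -- expected to be rejected by SSI rather than leaving no doctor on call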

 - Anssi



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 11, 2011 at 4:46 AM, Tatsuo Ishii  wrote:
>> For query based replication tools like pgpool-II (I don't know any
>> other tools, for example Postgres XC falls in this category or
>> not...), we need to be able to lock sequences. Fortunately it is allowed to:
>> 
>> SELECT 1 FROM foo_sequence FOR UPDATE;
>>
>> but LOCK foo_sequence looks like more appropriate syntax to me.

> Those aren't doing the same thing.  The first is locking the one and
> only tuple that is contained within the sequence, while the second is
> locking the sequence object itself.

> At this point, I'm inclined to think that the pg_dump comment is just
> wrong, and we ought to fix it to say that we don't really want to be
> able to lock other relations after all, and call it good.

The reason that pg_dump tries to acquire locks at all is to ensure that
it dumps a consistent view of the database.  The general excuse for not
locking non-table objects is that (at least in most cases) they are
defined by single catalog entries and so there's no way to see a
non-self-consistent view of them.  Tables, being defined by a collection
of rows in different catalogs, are *very* risky to dump without any
lock.  This doesn't get noticeably better for non-table relation types.

An example of the sort of risk I'm thinking about is dumping a view
without any lock while someone else does a CREATE OR REPLACE VIEW on
it.  You could very easily see a set of attributes (in pg_attribute)
that don't agree with the view rules you pulled from pg_rewrite.  The
risk is minimal now since we don't allow C.O.R.V. to change the column
set, but as soon as somebody creates a patch that allows that, pg_dump
will have a problem.

Note that using a serializable transaction (with or without "true"
serializability) doesn't fix this issue, since pg_dump depends so
heavily on backend-side support functions that work in SnapshotNow mode.
It really needs locks to ensure that the support functions see a view
consistent with its own catalog reads.

In the SEQUENCE example above, SELECT ... FOR UPDATE is certainly not
adequate to protect the sequence against DDL-level changes.  Fortunately
sequences don't have too many DDL commands, but still an ALTER RENAME
might be enough to confuse pg_dump.

(By the way, does that SELECT ... FOR UPDATE actually accomplish
anything at all?  nextval() doesn't go through heap_update, and neither
does ALTER SEQUENCE, so I'd be a bit surprised if it really manages to
block changes to the sequence.)
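
One way to check is to look at pg_locks from inside such a transaction
(sequence name made up; the relation-level entries are the part that
matters for blocking DDL, since row-level locks never show up there):

BEGIN;
SELECT 1 FROM foo_sequence FOR UPDATE;
SELECT locktype, relation::regclass AS rel, mode, granted
FROM pg_locks
WHERE pid = pg_backend_pid();
ROLLBACK;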

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Joel Jacobson
2011/1/11 Tom Lane :
> Seems like concatenating the OIDs would accomplish that.  (If you
> think not, well, you still haven't explained what problem you're trying
> to solve...)

The OIDs can be different in two databases sharing the same original
schema but at two different versions.
In this case it is better to compare textual strings describing the
objects than to compare based on oids.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Alvaro Herrera
Excerpts from Robert Haas's message of mar ene 11 10:52:12 -0300 2011:

> Well, we shouldn't change them randomly or arbitrarily, but improving them is 
> another thing altogether.  I think the contention that any user or 
> application anywhere is depending on the exact textual representation of a 
> pg_amproc entry is exceedingly dubious.  And I think the current messages are 
> flat-out confusing.

Tom also mentioned that the descriptions are translated, so an
application built to parse anything out of the strings is already
broken.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Joel Jacobson  writes:
> I instead propose we introduce a new function named
> pg_get_object_unique_identifier( classid oid, objid oid, objsubid
> integer ) returns text.

Seems like concatenating the OIDs would accomplish that.  (If you
think not, well, you still haven't explained what problem you're trying
to solve...)

> The name would make sense since we already have a
> pg_get_function_identity_arguments( func_oid ), for a similar purpose
> but solely for functions.

No, that does not exist for the purpose you claim it does.  And it's far
from obvious what the extension of that to all object classes would
look like.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Jan Urbański
On 28/12/10 12:25, Jan Urbański wrote:
> Here's the basic errcodes.txt file and three scripts to generate
> errcodes.h, plerrcodes.h and part of errcodes.sgml.
> 
> I tried wiring it into the build system, but failed, I can't figure out
> which Makefiles should be updated in order to make errcodes.h and
> plerrcodes.h generated headers. Could someone help with that?

Trying a bit harder to make src/include/utils/errcodes.h a generated
file I found that it's included so early it makes the task not trivial
at all. postgres.h includes elog.h, which includes errcodes.h (after
defining the MAKE_SQLSTATE macro). I'm not sure how to proceed:

 1) just give up on it, and keep on maintaining 3 places where new error
codes have to go
 2) do not include errcodes.h in elog.h, move the MAKE_SQLSTATE
definition to the top of errcodes.h, guarded by a #ifndef MAKE_SQLSTATE,
fix every place that included elog.h and assumed it has the sqlstate
values to explicitly include errcodes.h. This means that you can still
define your own MAKE_SQLSTATE macro and then include errcodes.h if you
want to redefine what it does.
 3) try to wire generating errcodes.h somewhere very early in the makefiles

My preference is 2) followed by 1), because 3) seems too horrible. But I
can understand if people will want to keep things as they are and just
forget about generating errcodes.h

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Kevin Grittner
Anssi Kääriäinen wrote:
 
> I think I found a problem. This is using SSI v8.
 
Thanks much for testing.  You're managing to exercise some code
paths I didn't think to test, which is great!  I guess this is the
up side of having posted yesterday.  :-)
 
> So, something seems to be broken when using partial indexes.
 
Yes there is.  The predicate locks against the heap tuples should
have prevented that, but obviously we're missing a call to
PredicateLockTuple from the code path for the partial index which is
present on the code path for complete indexes.  I'll go looking for
the spot to add that line of code.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Kevin Grittner
Anssi Kääriäinen wrote:
 
> A speed test showing a significant drop in performance when using
SSI:
> 
> hot2=> create table test_t2 as (select generate_series(0, 100));
> hot2=> \timing
> begin transaction isolation level repeatable read;
> Time: 0.185 ms
> hot2=> select count(*) from test_t2;
> Time: 134.521 ms
> hot2=> select count(*) from test_t2;
> Time: 142.132 ms
> hot2=> select count(*) from test_t2;
> Time: 147.469 ms
> hot2=> commit;
> hot2=> begin transaction isolation level serializable;
> Time: 0.165 ms
> hot2=> select count(*) from test_t2;
> Time: 349.219 ms
> hot2=> select count(*) from test_t2;
> Time: 354.010 ms
> hot2=> select count(*) from test_t2;
> Time: 355.960 ms
> hot2=> commit;
> 
> So, count(*) queries are more than twice as slow compared to the old

> serializable transaction isolation level.
 
Thanks for the report.  I'll profile that and see what can be done about
it.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Csaba Nagy
On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:
> On Tue, Jan 11, 2011 at 09:24:46AM +, Simon Riggs wrote:
> > I have a concern that by making the ALTER TABLE more complex that we
> > might not be able to easily tell if a rewrite happens, or not.

What about adding EXPLAIN support to it? Then whoever wants to know what it
does could just run EXPLAIN on it. I have no idea how hard that would
be to implement or whether it makes sense at all, but I sure would like to
see a plan for DDL too, to estimate how long it would take.

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Anssi Kääriäinen

On 01/10/2011 06:03 PM, Kevin Grittner wrote:

Due to popular request (Hey, David's popular, right?), I'm posting a
patch for Serializable Snapshot Isolation (SSI), although I don't
yet have everything in it that I was planning on submitting before
the CF.  I will probably be submitting another version before the
deadline with the following, but there should be plenty here for
people to test and benchmark.  We're done with the major refactoring
needed to address concerns raised in earlier reviews, and I don't
expect the remaining work to destabilize what's there or to have a
significant impact on performance.

A speed test showing a significant drop in performance when using SSI:

hot2=> create table test_t2 as (select generate_series(0, 100));
hot2=> \timing
begin transaction isolation level repeatable read;
Time: 0.185 ms
hot2=> select count(*) from test_t2;
Time: 134.521 ms
hot2=> select count(*) from test_t2;
Time: 142.132 ms
hot2=> select count(*) from test_t2;
Time: 147.469 ms
hot2=> commit;
hot2=> begin transaction isolation level serializable;
Time: 0.165 ms
hot2=> select count(*) from test_t2;
Time: 349.219 ms
hot2=> select count(*) from test_t2;
Time: 354.010 ms
hot2=> select count(*) from test_t2;
Time: 355.960 ms
hot2=> commit;

So, count(*) queries are more than twice as slow compared to the old 
serializable transaction isolation level. The relative speed difference 
stays the same if testing with more rows. Also, the same speed 
difference is there if testing with more columns:


create table test_t4 as (select g g1, g g2, g g3, g g4, g g5, g g6 from 
(select generate_series as g from generate_series(0, 100)) as t1);


repeatable read times:
140.113 ms 134.628 ms 140.113 ms 156.166 ms

serializable:
353.257 ms 366.558 ms 373.914 ms 359.682 ms

 - Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Robert Haas
On Jan 11, 2011, at 8:50 AM, Noah Misch  wrote:
> Okay; I'll see what I can come up with.  The other part I was going to try to
> finish before the last commitfest begins is avoiding unnecessary rebuilds of
> indexes involving changed columns.  Is that more or less important than having
> an EXPLAIN ALTER TABLE?

I think something like EXPLAIN ALTER TABLE would be #%^* awesome (and kudos to 
Simon for thinking of it), but your chances of getting it into 9.1 are not very 
good.  So I'd focus on the index rebuild issue, which is more clear-cut.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Robert Haas
On Jan 11, 2011, at 8:25 AM, Joel Jacobson  wrote:
> 2011/1/11 Robert Haas :
>> I don't get it.  If two different items that exist in the system out
>> of the box have the same description, it seems clear that relevant
>> piece of disambiguating information exists nowhere in the description
>> string.
> 
> I guess it is a question of prioritization.
> If backwards compatibility is to be guaranteed, even for functions
> returning text intended to be read by humans, then the function cannot
> be modified, without violating that golden rule, if such a rule exists
> within the PostgreSQL development project?
> 
> If it's not a golden rule, then it's a totally different story and
> there is no excuse why it should return the same descriptions for the
> same objects.
> Any other reasoning is just silly.

Well, we shouldn't change them randomly or arbitrarily, but improving them is 
another thing altogether.  I think the contention that any user or application 
anywhere is depending on the exact textual representation of a pg_amproc entry 
is exceedingly dubious.  And I think the current messages are flat-out 
confusing.
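
For what it's worth, a query along these lines (untested sketch) should
be enough to list the descriptions that currently collide:

SELECT pg_describe_object(classid, objid, objsubid) AS description,
       count(*) AS objects
FROM (SELECT DISTINCT classid, objid, objsubid FROM pg_depend) AS d
GROUP BY 1
HAVING count(*) > 1
ORDER BY 2 DESC;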

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Noah Misch
On Tue, Jan 11, 2011 at 01:17:23PM +, Simon Riggs wrote:
> On Tue, 2011-01-11 at 08:06 -0500, Noah Misch wrote:
> > On Tue, Jan 11, 2011 at 12:37:28PM +, Simon Riggs wrote:
> > > Given your thoughts above, my preference would be for 
> > > EXPLAIN ALTER TABLE to describe the actions that will take place.
> > 
> > That does seem like the best UI.  Offhand, I would guess that's a project 
> > larger
> > than the patch series I have here.  We'd need to restructure ALTER TABLE 
> > into
> > clear planning and execution stages, if not use the actual planner and 
> > executor.
> 
> Please do something that works in this release, whatever that is. I will
> follow your lead in putting a similar mechanism in for judging lock
> levels.

Okay; I'll see what I can come up with.  The other part I was going to try to
finish before the last commitfest begins is avoiding unnecessary rebuilds of
indexes involving changed columns.  Is that more or less important than having
an EXPLAIN ALTER TABLE?

> I don't want to be looking through the docs each time I run this,
> sweating between it taking 5 secs and 5 hours on a big server.
> We need to be able to run stuff overnight, with certainty that we know
> what will happen.
> 
> And I want a clear answer when the "but how can you be certain?"
> question gets asked.

Just to clarify: does Robert's suggestion of starting the command in a
transaction block and cancelling it after messages appear (on any other DB with
the same schema, if need be) give too little certainty?  You could check
pg_locks to see what lock was taken, too.  It's surely not the ideal user
experience, but it seems dependable at least.
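
Roughly like this, I mean (table and column names made up; with patch 0
the DEBUG1 messages show the scan/rewrite decisions, and pg_locks shows
the lock level actually taken):

SET client_min_messages = debug1;
BEGIN;
ALTER TABLE my_table ALTER COLUMN c TYPE bigint;
SELECT mode
FROM pg_locks
WHERE pid = pg_backend_pid() AND relation = 'my_table'::regclass;
ROLLBACK;  -- nothing is kept; the messages and lock mode tell you what would have happened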

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Joel Jacobson
2011/1/11 Robert Haas :
> I don't get it.  If two different items that exist in the system out
> of the box have the same description, it seems clear that relevant
> piece of disambiguating information exists nowhere in the description
> string.

I guess it is a question of prioritization.
If backwards compatibility is to be guaranteed, even for functions
returning text intended to be read by humans, then the function cannot
be modified, without violating that golden rule, if such a rule exists
within the PostgreSQL development project?

If it's not a golden rule, then it's a totally different story and
there is no excuse why it should return the same descriptions for the
same objects.
Any other reasoning is just silly.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Simon Riggs
On Tue, 2011-01-11 at 08:06 -0500, Noah Misch wrote:
> On Tue, Jan 11, 2011 at 12:37:28PM +, Simon Riggs wrote:
> > On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:
> > 
> > > These changes do make it harder to guess how much work the ALTER TABLE
> > > will do. Indeed, about 1/4 of my own guesses prior to writing were
> > > wrong.  Something like WITHOUT REWRITE might be the way to go, though
> > > there are more questions: if it does not rewrite, does it scan the
> > > table?  Which indexes, if any, does it rebuild?  Which foreign key
> > > constraints, if any, does it recheck?  With patch 0, you can answer
> > > all these questions by enabling DEBUG1 messages and trying the command
> > > on your test system.  For this reason, I did consider adding a VERBOSE
> > > clause to show those messages at DETAIL, rather than unconditionally
> > > showing them at DEBUG1.  In any case, if a WITHOUT REWRITE like you
> > > describe covers the important question, it's certainly easy enough to
> > > implement.
> > 
> > Trouble is, only superusers can set DEBUG1.
> 
> Setting client_min_messages in one's session works, as does "ALTER ROLE myself
> SET client_min_messages = debug1".  Still, indeed, DEBUG1 is not the usual 
> place
> to send a user for information.
> 
> > You're right, it's more complex than I made out, though that strengthens
> > the feeling that we need a way to check what it does before it does it,
> > or a way to limit your expectations. Ideally that would be a
> > programmatic way, so that pgAdmin et al can issue a warning.
> > 
> > Given your thoughts above, my preference would be for 
> > EXPLAIN ALTER TABLE to describe the actions that will take place.
> 
> That does seem like the best UI.  Offhand, I would guess that's a project 
> larger
> than the patch series I have here.  We'd need to restructure ALTER TABLE into
> clear planning and execution stages, if not use the actual planner and 
> executor.

Please do something that works in this release, whatever that is. I will
follow your lead in putting a similar mechanism in for judging lock
levels.

I don't want to be looking through the docs each time I run this,
sweating between it taking 5 secs and 5 hours on a big server.
We need to be able to run stuff overnight, with certainty that we know
what will happen.

And I want a clear answer when the "but how can you be certain?"
question gets asked.

Thank you for the rest of the patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Noah Misch
On Tue, Jan 11, 2011 at 07:27:46AM -0500, Robert Haas wrote:
> On Tue, Jan 11, 2011 at 7:14 AM, Noah Misch  wrote:
> > True.  At least we could completely document the lock choices on the ALTER
> > TABLE reference page.  The no-rewrite cases are defined at arm's length from
> > ALTER TABLE, and users can define their own, so it's a tougher fit.
> 
> I don't think it's prohibitively, tough, though, and I think it would
> be very valuable.  We may have to scratch our heads over exactly where
> to put the information, but we can figure out something that works.

Agreed.  I don't mean to suggest giving up on documenting anything, just that
some behaviors are clear enough by documentation alone, while others benefit
from additional syntax/messages to constrain/see what actually happens.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

