Re: [HACKERS] docs: WITH queries and VALUES

2012-04-13 Thread Peter Eisentraut
On Thu, 2012-04-12 at 11:59 +0200, Magnus Hagander wrote:
> The SELECT manpage has:
> 
> and with_query is:
> 
> with_query_name [ ( column_name [, ...] ) ] AS ( select | insert |
> update | delete )
> 
> 
> Should that list that you can use values as well? Or is it something
> we generally consider "wherever select works you can use values"?

Added.
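
For reference, the synopsis with the addition would presumably read
along these lines (my rendering; the exact docs markup may differ):

    with_query_name [ ( column_name [, ...] ) ] AS ( select | values | insert | update | delete )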


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 10:43 PM, Pavel Stehule  wrote:
>> Yeah.  I think it would be a good idea for UPDATE and DELETE to expose
>> a LIMIT option, but I can't really see the virtue in making that
>> functionality available only through SPI.
>
> I don't agree - LIMIT after UPDATE or DELETE makes no sense. A clean
> solution should be based on using an updatable CTE.

It makes a lot of sense.  Without it, it's very difficult to do logical
replication on a table with no primary key.

(Whether or not people should create such tables in the first place
is, of course, beside the point.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 10:01 PM, Tom Lane  wrote:
> Greg Stark  writes:
>> On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas  wrote:
>>> That's probably true, but I'm not sure it's worth worrying about -
>>> one-in-four-billion is a pretty small probability.
>
>> Is this not subject to the birthday paradox? If you have a given hash
>> you're worried about a collision with then you have a
>> one-in-four-billion chance. But if you have a collection of hashes and
>> you're worried about any collisions then it only takes about 64k
>> before there's likely a collision.
>
> ... so, if pg_stat_statements.max were set as high as 64k, you would
> need to worry.

Well... at 64k, you'd be very likely to have a collision.  But the
whole birthday paradox thing means that there's a non-trivial
collision probability even at lower numbers of entries.  Seems like
maybe we ought to be using 64 bits here...

> Realistically, I'm more worried about collisions due to inadequacies in
> the jumble calculation logic (Peter already pointed out some risk
> factors in that regard).

...especially if collisions are even more frequent than random chance
would suggest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Peter Geoghegan
On 14 April 2012 03:01, Tom Lane  wrote:
> Realistically, I'm more worried about collisions due to inadequacies in
> the jumble calculation logic (Peter already pointed out some risk
> factors in that regard).

It's important to have a sense of proportion about the problem. The
worst thing that a collision can do is lead the DBA on a bit of a wild
goose chase. Importantly, collisions across databases and users are
impossible, since entries are keyed on user and database as well as
the query hash. I've always taken the view that aggregating query
statistics is a lossy process, and these kinds of problems seem like a
more than acceptable price to pay for low-overhead dynamic query
statistics.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus

2012-04-13 Thread Pavel Stehule
>
> Yeah.  I think it would be a good idea for UPDATE and DELETE to expose
> a LIMIT option, but I can't really see the virtue in making that
> functionality available only through SPI.
>

I don't agree - LIMIT after UPDATE or DELETE makes no sense. A clean
solution should be based on using an updatable CTE.

Regards

Pavel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Tom Lane
Greg Stark  writes:
> On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas  wrote:
>> That's probably true, but I'm not sure it's worth worrying about -
>> one-in-four-billion is a pretty small probability.

> Is this not subject to the birthday paradox? If you have a given hash
> you're worried about a collision with then you have a
> one-in-four-billion chance. But if you have a collection of hashes and
> you're worried about any collisions then it only takes about 64k
> before there's likely a collision.

... so, if pg_stat_statements.max were set as high as 64k, you would
need to worry.

Realistically, I'm more worried about collisions due to inadequacies in
the jumble calculation logic (Peter already pointed out some risk
factors in that regard).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Greg Stark
On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas  wrote:
> That's probably true, but I'm not sure it's worth worrying about -
> one-in-four-billion is a pretty small probability.

Is this not subject to the birthday paradox? If you have a given hash
you're worried about a collision with then you have a
one-in-four-billion chance. But if you have a collection of hashes and
you're worried about any collisions then it only takes about 64k
before there's likely a collision.
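
To put rough numbers on that: for n items hashed uniformly into b bits,
the collision probability is approximately 1 - exp(-n(n-1)/2^(b+1)),
which crosses 50% near n = 77000 for b = 32. A minimal sketch to check
the arithmetic (my own illustration, not anything in the tree):

    #include <math.h>
    #include <stdio.h>

    /* Birthday approximation: P(collision) ~= 1 - e^(-n(n-1)/2^(b+1)) */
    static double
    collision_prob(double n, int bits)
    {
        return 1.0 - exp(-n * (n - 1.0) / ldexp(1.0, bits + 1));
    }

    int
    main(void)
    {
        printf("64k entries, 32-bit hash: %.2f\n", collision_prob(65536.0, 32));
        printf("64k entries, 64-bit hash: %.2e\n", collision_prob(65536.0, 64));
        return 0;   /* prints roughly 0.39 and 1.16e-10 */
    }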

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command counter increment vs updating an active snapshot

2012-04-13 Thread Tom Lane
Ozgun Erdogan  writes:
> (1) What's the difference between advancing the command counter and
> updating an active snapshot? For example, I see that DefineRelation()
> increments the command counter, but explain.c / copy.c explicitly
> calls UpdateActiveSnapshotCommandId(). Is that because the latter call
> intends to make its changes visible to other concurrent processes?

No, that's all local.  CommandCounterIncrement automatically propagates
the new command counter into a couple of "standard" snapshots that are
meant to be always up-to-date, but if you want a stacked ActiveSnapshot
to follow suit you need to call UpdateActiveSnapshotCommandId.
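
Schematically, the pattern in places that need this looks about like so
(a sketch of the existing APIs, not a complete command implementation):

    #include "access/xact.h"        /* CommandCounterIncrement() */
    #include "utils/snapmgr.h"      /* active-snapshot stack management */

    static void
    run_two_steps(void)
    {
        PushActiveSnapshot(GetTransactionSnapshot());

        /* ... execute the first statement ... */

        CommandCounterIncrement();          /* advance the local command counter */
        UpdateActiveSnapshotCommandId();    /* let our pushed snapshot see it */

        /* ... execute the next statement, which now sees the first one's work ... */

        PopActiveSnapshot();
    }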

> (2) The following change in pquery.c asserts that, if more than one
> utility statement exists in portal statements, they all have to be
> Notify statements.

That's because the only way that can happen is expansion of a rule, and
the only type of utility command allowed in rules is Notify.

> When I modify the code so that one statement in portal->stmts gets
> translated into four separate statements that all depend on each other
> (one planned, two utility, and another planned statement) and remove
> the assertion, all four statements still run fine.
> Looking into the code, I understand this isn't the expected behavior
> in terms of snapshot handling. In what scenarios can I expect our code
> to break?

[ shrug... ]  You expect an answer to that when you haven't shown us
the code?  But in general I'd be really wary of putting most utility
statements into a portal along with DML.  The kindest thing to be said
about that is that it's utterly untested.  Notify is pretty safe because
it doesn't affect the state of the database in any way that DML
statements would care about; this isn't the case for most others.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus

2012-04-13 Thread Robert Haas
On Thu, Apr 5, 2012 at 2:39 AM, Hitoshi Harada  wrote:
> On Wed, Apr 4, 2012 at 8:00 AM, Tom Lane  wrote:
>> umi.tan...@gmail.com writes:
>>> http://www.postgresql.org/docs/9.1/static/spi-spi-execute.html
>>
>>> ===
>>> SPI_execute("INSERT INTO foo SELECT * FROM bar", false, 5);
>>> will allow at most 5 rows to be inserted into the table.
>>> ===
>>
>>> This seems not true unless I'm missing something.
>>
>> Hmm ... that did work as described, until we broke it :-(.  This is an
>> oversight in the 9.0 changes that added separate ModifyTuple nodes to
>> plan trees.  ModifyTuple doesn't return after each updated row, unless
>> there's a RETURNING clause; which means that the current_tuple_count
>> check logic in ExecutePlan() no longer stops execution as intended.
>>
>> Given the lack of complaints since 9.0, maybe we should not fix this
>> but just redefine the new behavior as being correct?  But it seems
>> mighty inconsistent that the tuple limit would apply if you have
>> RETURNING but not when you don't.  In any case, the ramifications
>> are wider than one example in the SPI docs.
>>
>> Thoughts?
>
> To be honest, I was surprised when I found the tcount parameter is
> said to apply even to INSERT.  I believe people think that parameter
> is there to limit memory consumption when returning tuples, and thus
> that it applies only to SELECT or DML with RETURNING.  So I'm +1 for
> not fixing it, but redefining the behavior.  Who wants to limit the
> number of rows processed inside the backend, from SPI?

Yeah.  I think it would be a good idea for UPDATE and DELETE to expose
a LIMIT option, but I can't really see the virtue in making that
functionality available only through SPI.
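
To restate the asymmetry in code form (a sketch; both calls use the
real SPI_execute signature, and the behavior is the 9.0+ one discussed
upthread):

    /* tcount is silently ignored: every matching row gets updated */
    SPI_execute("UPDATE foo SET x = x + 1", false, 5);

    /* tcount applies: execution stops after 5 returned rows */
    SPI_execute("UPDATE foo SET x = x + 1 RETURNING x", false, 5);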

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Tom Lane
Jim Nasby  writes:
> I agree that ms is on its way out... and frankly it wouldn't surprise me if 
> at some point we actually had need of ns resolution.

> Given that, I don't think ms or us definitions make sense... float8 seconds 
> seems much more logical to me.

Well, the important point is that it be float8, so that fractions of
whatever units you choose can be represented.  I do not feel a strong
need to change the units in all the existing pg_stat_ columns from msec
to sec; that strikes me as just breaking things to little gain.  If the
majority of them were in sec then I'd want to converge on that, but
since the majority are in msec it seems like the path of least breakage
is to converge on that.

> Though, if we're going to end up seriously breaking things anyway,
> perhaps it would make sense to switch everything over to interval... I
> realize that there's more overhead there, but I don't think selecting
> from the stats views is exactly performance critical.

But (a) I *don't* want to seriously break things, and don't see a need
to; (b) interval is expensive and has got its own problems, notably an
internal limitation to usec resolution that we would not be able to get
rid of easily.  It's not even apparent to me that interval is
semantically The Right Thing for values that represent an accumulation
of a lot of different measurements.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] typo fix

2012-04-13 Thread Peter Eisentraut
On Fri, 2012-04-13 at 17:27 +0900, Etsuro Fujita wrote:
> This is a little patch to fix a typo in file-fdw.sgml

Fixed.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Peter Eisentraut
On Tue, 2012-04-10 at 09:33 -0400, Robert Haas wrote:
> So, should we make the new columns exposed by pg_stat_statements use
> milliseconds, so that the block read/write timings are everywhere in
> milliseconds, or should we keep them as a float8, so that all the
> times exposed by pg_stat_statements use float8?

Wouldn't interval be the proper type to represent elapsed time?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-13 Thread Peter Eisentraut
On Thu, 2012-04-12 at 18:19 -0400, Christopher Browne wrote:
> On Thu, Apr 12, 2012 at 6:11 PM, Jay Levitt  wrote:
> > Rather than extend the CF app into a trivial-patch workflow app, it might be
> > worth looking at integrating it with github.
> 
> There's a reluctance to require a proprietary component that could
> disappear on us without notice.

Yeah, I think initially this was more of a matter-of-principle decision,
but seeing the current spam problem on github.com, I'm glad we didn't do
it.  For me it has disqualified itself as a platform for intense
collaborative development.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-13 Thread Peter Eisentraut
On Thu, 2012-04-12 at 07:59 +0200, Magnus Hagander wrote:
> It might be helpful (if the CF app had a trivial API) with a small
> tool that could run from a git hook (or manual script or alias) that
> would prompt for "which cf entry, if any, did this commit close"?

An API for the CF app would actually be nice in general, because then I
could write a small command-line tool and just type 'cf close 123' or
something instead of having to click around a bunch of times.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-13 Thread Peter Eisentraut
On Wed, 2012-04-11 at 23:30 -0400, Robert Haas wrote:
> Now what would be sort of neat is if we had a way to keep all the
> versions of patch X plus author and reviewer information, links to
> reviews and discussion, etc. in some sort of centralized place.

Well, a properly linked email thread contains all this.  I have seen a
couple of anti-patterns evolving, though, including patch authors
starting a new email thread for each patch version, and reviewers
starting a new email thread for each review.  When that happens, the
existence of the commitfest app makes things worse, in a way.  (Of
course, any discussion here about bug trackers emphasizes the need for
email to be the primary communications method for this very reason.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Jim Nasby

On 4/10/12 5:07 PM, Greg Smith wrote:
> I'd prefer to see at least usec resolution and 8 bytes of "dynamic range" for
> query related statistics.  Any of these would be fine from a UI perspective to me:
>
> -float8 seconds
> -float8 msec
> -float8 usec
> -int8 usec
>
> I don't think int8 msec will be enough resolution to time queries for very
> long, if it's not already obsolete.  The committed example for pg_test_timing
> on good hardware already clocks trivial events at a single usec.  Even I/O is
> getting there.  I've measured my Fusion-io loaner card peaking at 8GB/s, which
> works out to 1 usec per 8K page.  None of that is even price-no-object
> hardware today; it's the stuff sitting in my office.
>
> If anything, I'd expect more timing code in the database that only has ms
> resolution right now will start looking fat in a year or two, and more things
> might need to be shifted to usec instead.  Checkpoint timing can survive having
> less resolution because its primary drumbeat is very unlikely to drop below the
> minutes range.

I agree that ms is on its way out... and frankly it wouldn't surprise me if at
some point we actually had need of ns resolution.

Given that, I don't think ms or us definitions make sense... float8 seconds
seems much more logical to me.

Though, if we're going to end up seriously breaking things anyway, perhaps it
would make sense to switch everything over to interval... I realize that
there's more overhead there, but I don't think selecting from the stats views
is exactly performance critical.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 20:15, Robert Haas  wrote:
> On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu  wrote:
>> By using all 64-bits of the hash that we currently calculate, instead
>> of the current use of 32-bits only, the collision probabilities are
>> very low.
>
> That's probably true, but I'm not sure it's worth worrying about -
> one-in-four-billion is a pretty small probability.

That assumes that our hashing of query trees will exhibit uniform
distribution. While it's easy enough to prove that hash_any does that,
it would seem to me that that does not imply that we will exhibit
perfectly uniform distribution within pg_stat_statements. The reason
that I invented the jumble nomenclature, to distinguish it from a
straight serialisation, is that, apart from the fact that many fields
are simply ignored, we still couldn't deserialize what we do store
from the jumble, because the correct way to serialise a tree entails
serialising NULL pointers too: two non-equivalent query trees could
actually produce bitwise-identical jumbles. In practice, I think that
adding NodeTags ought to be sufficient here, but I wouldn't like to
bet my life on it. Actually, that is a point that perhaps should have
been touched on in the comments at the top of the file.
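
Schematically, what I mean by adding NodeTags is something like this
(a simplified sketch only; AppendJumble is the real append routine in
pg_stat_statements.c, but this helper and its exact shape are
hypothetical):

    static void
    jumble_node(pgssJumbleState *jstate, const Node *node)
    {
        NodeTag     tag;

        if (node == NULL)
            return;             /* NULL children contribute nothing today */

        /* folding in the tag distinguishes otherwise-ambiguous tree shapes */
        tag = nodeTag(node);
        AppendJumble(jstate, (const unsigned char *) &tag, sizeof(tag));

        /* ... then jumble the fields significant to query identity ... */
    }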

You may wish to take a look at my original analysis in the
pg_stat_statements normalisation thread, which attempts to quantify
the odds by drawing upon the mathematics of the birthday problem.

> On the broader issue, Peter's argument here seems to boil down to
> "there is probably a reason why this is useful" and Tom's argument
> seems to boil down to "i don't want to expose it without knowing what
> that reason is". Peter may well be right, but that doesn't make Tom
> wrong.  If we are going to expose this, we ought to be able to
> document why we have it, and right now we can't, because we don't
> *really* know what it's good for, other than raising awareness of the
> theoretical possibility of collisions, which doesn't seem like quite
> enough.

Well, it's important to realise that the query string isn't really
stable, to the extent that it could differ as the query is evicted and
re-enters pg_stat_statements over time. The hash value is necessarily
a stable identifier, as it is the very identifier used by
pg_stat_statements. People are going to want to aggregate this
information over long periods and across database clusters, and I
would really like to facilitate that.

To be honest, given the plans that we have for replication, with the
addition of things like cascading replication, I think it will
increasingly be the case that people prefer built-in replication. The
fact that the actual hash value is affected by factors like the
endianness of the architecture in question seems mostly irrelevant.

If we were to expose this, I'd suggest that the documentation in the
table describing each column say of the value:

query_hash | stable identifier used by pg_stat_statements for the query

There'd then be additional clarification after the existing reference
to the hash value, giving the required caveats about the hash value
being subject to various implementation artefacts, which effectively
mean that the values only persistently identify queries referencing
particular objects in the same database (that is, the object OID
values are used), or across databases that are binary clones, such as
between a streaming replica master and its standby. The OID
restriction alone effectively shadows all others, so there's no need
to mention endianness or anything like that.

Anyone who jumped to the conclusion that their aggregation tool would
work fine with Slony or something based on the query_hash description
alone would probably have bigger problems.

> On another note, I think this is a sufficiently major change that we
> ought to add Peter's name to the "Author" section of the
> pg_stat_statements documentation.

Thanks. I actually thought this myself, but didn't want to mention it
because I didn't think that it was up to me to decide, or to attempt
to influence that decision.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog location arithmetic

2012-04-13 Thread Robert Haas
On Mon, Mar 12, 2012 at 10:34 PM, Fujii Masao  wrote:
>>> I can see a usecase for having a pg_size_pretty(numeric) as an option.
>>> Not necessarily a very big one, but a >0 one.
>>
>> +1.
>
> +1, too.

I did some beautification of this patch.  I think the attached version
is cleaner and easier to read.  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


sizepretty_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 4:01 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On another note, I think this is a sufficiently major change that we
>> ought to add Peter's name to the "Author" section of the
>> pg_stat_statements documentation.
>
> +1 ... as long as we have those at all, they probably ought to credit
> anybody who did substantial work on the module.
>
> I think that eventually we'll have to give them up, just the way that
> we don't credit anybody in particular as author of the core code; but
> for now this is a good change.

Yeah.  I'd actually be in favor of removing them, and similar
references in the comments, because they do lead and have led to
disputes about who deserves to be mentioned.  However, a change of
this magnitude certainly deserves mention; it's not really the same
module any more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Tom Lane
Robert Haas  writes:
> On another note, I think this is a sufficiently major change that we
> ought to add Peter's name to the "Author" section of the
> pg_stat_statements documentation.

+1 ... as long as we have those at all, they probably ought to credit
anybody who did substantial work on the module.

I think that eventually we'll have to give them up, just the way that
we don't credit anybody in particular as author of the core code; but
for now this is a good change.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-13 Thread Guillaume Lelarge

On 04/13/2012 08:15 PM, Kevin Grittner wrote:
> Robert Haas  wrote:
>> In my view, remote_write seems a lot more clear than write
>
> +1
>
> I sure didn't understand it to mean remote_write when I read the
> subject line.

Neither did I. So definitely +1.


--
Guillaume
 http://www.postgresql.fr
 http://dalibo.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PREPARE TRANSACTION compatibility?

2012-04-13 Thread Heikki Linnakangas

On 13.04.2012 22:02, Tom Lane wrote:
> Heikki Linnakangas  writes:
>> On 13.04.2012 21:43, Peter Eisentraut wrote:
>>> The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
>>> are the only ones that do not have a Compatibility section on their
>>> reference page.  Does anyone remember whether they were our invention or
>>> copied from or inspired by some other implementation?
>
>> They are our invention.
>
> Entirely?  I'm quite sure I remember some discussion about compatibility
> of the prepared-transaction GIDs.  There must be at least part of that
> API that's standard.

There's the X/Open standard, and the Java Transaction API (JTA) standard
that's based on it. There's probably other specs for other languages,
but those two are what I've looked at. The specs define the client-side
interface, but they don't say anything about the client-server protocol.

X/Open and JTA have the concept of GIDs, called Xids in the JTA
nomenclature. IIRC they have a certain width, and consist of two parts,
a transaction id and a branch id. But from the server's point of view,
the GIDs are just unique identifiers, assigned by the client (=
transaction manager), with no further special meaning.
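
For reference, the XA spec's identifier is roughly the following C
structure (reproduced from memory of the X/Open spec's xa.h; this is
not PostgreSQL code):

    #define XIDDATASIZE  128    /* size in bytes */
    #define MAXGTRIDSIZE  64    /* maximum size of the global transaction id */
    #define MAXBQUALSIZE  64    /* maximum size of the branch qualifier */

    struct xid_t
    {
        long formatID;          /* format identifier; -1 means null XID */
        long gtrid_length;      /* value 1-64 */
        long bqual_length;      /* value 1-64 */
        char data[XIDDATASIZE]; /* gtrid bytes immediately followed by bqual */
    };
    typedef struct xid_t XID;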


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-13 Thread Thom Brown
On 13 April 2012 19:15, Kevin Grittner  wrote:
> Robert Haas  wrote:
>
>> In my view, remote_write seems a lot more clear than write
>
> +1
>
> I sure didn't understand it to mean remote_write when I read the
> subject line.

Whatever this option value is named, it needs to be referenced in the
postgresql.conf comment for this option, as it isn't currently.

I have a question though.  What happens when this is set to "write"
(or "remote_write" as proposed) but it's being used on a standalone
primary?  At the moment it's not documented what level of guarantee
this would provide.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-13 Thread Robert Haas
On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu  wrote:
> By using all 64-bits of the hash that we currently calculate, instead
> of the current use of 32-bits only, the collision probabilities are
> very low.

That's probably true, but I'm not sure it's worth worrying about -
one-in-four-billion is a pretty small probability.

On the broader issue, Peter's argument here seems to boil down to
"there is probably a reason why this is useful" and Tom's argument
seems to boil down to "i don't want to expose it without knowing what
that reason is". Peter may well be right, but that doesn't make Tom
wrong.  If we are going to expose this, we ought to be able to
document why we have it, and right now we can't, because we don't
*really* know what it's good for, other than raising awareness of the
theoretical possibility of collisions, which doesn't seem like quite
enough.

On another note, I think this is a sufficiently major change that we
ought to add Peter's name to the "Author" section of the
pg_stat_statements documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PREPARE TRANSACTION compatibility?

2012-04-13 Thread Tom Lane
Heikki Linnakangas  writes:
> On 13.04.2012 21:43, Peter Eisentraut wrote:
>> The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
>> are the only ones that do not have a Compatibility section on their
>> reference page.  Does anyone remember whether they were our invention or
>> copied from or inspired by some other implementation?

> They are our invention.

Entirely?  I'm quite sure I remember some discussion about compatibility
of the prepared-transaction GIDs.  There must be at least part of that
API that's standard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PREPARE TRANSACTION compatibility?

2012-04-13 Thread Heikki Linnakangas

On 13.04.2012 21:43, Peter Eisentraut wrote:
> The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
> are the only ones that do not have a Compatibility section on their
> reference page.  Does anyone remember whether they were our invention or
> copied from or inspired by some other implementation?

They are our invention.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PREPARE TRANSACTION compatibility?

2012-04-13 Thread Peter Eisentraut
The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
are the only ones that do not have a Compatibility section on their
reference page.  Does anyone remember whether they were our invention or
copied from or inspired by some other implementation?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-13 Thread Kevin Grittner
Robert Haas  wrote:
 
> In my view, remote_write seems a lot more clear than write
 
+1
 
I sure didn't understand it to mean remote_write when I read the
subject line.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 18:33, Robert Haas  wrote:
> We do use insertion sort for partitions of size < 7.  I assume you are
> referring to something else.

I stand corrected. My confusion came from the removal of a later
*switch* to insertion sort in
a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649, which also added the
pre-sorted check where you'd expect to see the insertion switch. Of
course, the n < 7 insertion sort thing is right beside the added
check. So another, redundant copy of insertion sort was removed, and
not the one that almost every real-world quicksort implementation has.
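
For reference, the small-n switch that survives in src/port/qsort.c is
the classic Bentley & McIlroy fragment, roughly this (quoting from
memory, so details may differ):

    /* insertion sort for small partitions */
    if (n < 7)
    {
        for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es)
            for (pl = pm; pl > (char *) a && cmp(pl - es, pl) > 0; pl -= es)
                swap(pl, pl - es);
        return;
    }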

> Heap-sorting could also benefit from some optimization in this area.
> Right now, if you heap-sort data that's already in order, it gets
> progressively slower as you crank up work_mem, because the heap gets
> deeper and so extraction and siftup get more expensive, for no real
> benefit.  We could do something like check, every time we use up our
> available memory, whether the heap is already entirely in order.  If
> so, we could dump all but one tuple to the current run and the start
> reading tuples again.  Or maybe dump some percentage of the array,
> though that might require a bit more bookkeeping.  Or maybe there's a
> smarter algorithm that would also cater to mostly-sorted data.

Well, timsort is specifically designed to take advantage of pre-sorted
data. It does appear to have a lot of traction, as Wikipedia points
out:

"Timsort has been Python's standard sorting algorithm since version
2.3. It is now also used to sort arrays in Java SE 7,[2] and on the
Android platform.[3] "

Somebody has produced a BSD-licensed C implementation that draws
heavily on the Python/Java one here:

http://code.google.com/p/timsort/source/browse/trunk/timSort.c?spec=svn17&r=17

It looks like it has exactly the same interface as the C stdlib's
qsort.  So it'd be fairly easy to produce a timsort_arg() based on
this, if anyone cares to.
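
Presumably it would mirror qsort_arg()'s interface, i.e. something
like this (a hypothetical signature; no such function exists yet):

    #include <stddef.h>

    /* companion to qsort_arg() in src/port/qsort_arg.c */
    void timsort_arg(void *base, size_t nel, size_t elsize,
                     int (*cmp)(const void *a, const void *b, void *arg),
                     void *arg);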

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-04-13 Thread Tom Lane
Alvaro Herrera  writes:
> Excerpts from Andrew Dunstan's message of Wed Apr 11 15:51:51 -0300 2012:
>> On 04/11/2012 02:45 PM, Tom Lane wrote:
>>> I don't really care for the idea that the ONLY goes in a different place
>>> for this operation than for every other kind of ALTER TABLE, but it does
>>> make sense if you subscribe to the quoted theory that ONLY is a property
>>> of the constraint and not the ALTER command as such.

>> I think I rather dislike it. ONLY should be followed by the name of the 
>> parent table whose children it causes us to exclude, IMNSHO. Moving it 
>> elsewhere doesn't seem to me to be a blow for clarity at all.

> If that's the only objection, maybe we could use a different keyword
> then, perhaps NOINHERIT:

> ALTER TABLE constraint_rename_test ADD CONSTRAINT con2 CHECK NOINHERIT (b > 0);

I could live with that.  "CHECK ONLY" isn't particularly transparent as
to what it means, anyway.  "CHECK NOINHERIT" seems a lot clearer.

I'd propose "CHECK NO INHERIT", though, as (a) it seems better English
and (b) it avoids creating any new keyword.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Command counter increment vs updating an active snapshot

2012-04-13 Thread Ozgun Erdogan
Hi all (re-posting from pgsql-general),

I'm looking into Postgres' internals, and had two quick questions that
relate to each other.

(1) What's the difference between advancing the command counter and
updating an active snapshot? For example, I see that DefineRelation()
increments the command counter, but explain.c / copy.c explicitly
calls UpdateActiveSnapshotCommandId(). Is that because the latter call
intends to make its changes visible to other concurrent processes?

(2) The following change in pquery.c asserts that, if more than one
utility statement exists in portal statements, they all have to be
Notify statements.

https://github.com/postgres/postgres/commit/c0b00760365c74308e9e0719c993eadfbcd090c2#diff-6

When I modify the code so that one statement in portal->stmts gets
translated into four separate statements that all depend on each other
(one planned, two utility, and another planned statement) and remove
the assertion, all four statements still run fine.

Looking into the code, I understand this isn't the expected behavior
in terms of snapshot handling. In what scenarios can I expect our code
to break?

Thanks,

Ozgun.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 1:15 PM, Peter Geoghegan  wrote:
> On 13 April 2012 17:42, Peter Geoghegan  wrote:
>> One insight that I had at the time was that text comparisons where the
>> C locale isn't used are really rather expensive, and I doubt that
>> there is too much that can be done to address that directly.  However,
>> if we were to support timsort, that could substantially cut down on
>> the number of comparisons for text columns, which could be a real win.
>> Maybe that would happen through some explicit mechanism, or maybe the
>> planner could actually infer that it was likely to be optimal to use
>> timsort based on a high statistical correlation between physical row
>> ordering and logical ordering of a key column's values.
>
> Further thoughts:
>
> At the time we committed our own quicksort implementation, based on
> the NetBSD one, we eschewed the optimisation of using insertion sort
> when n is fairly low. This happens to be a very common optimisation,
> so I'm not really super-confident that that was a good decision.
> However, we also added our own optimisation, which is to attempt,
> regardless of the size of n, to ascertain that the array is
> pre-sorted, in which case we don't quicksort at all.

We do use insertion sort for partitions of size < 7.  I assume you are
referring to something else.

One thing that has struck me as odd is that we check for presorted
arrays at every level of recursion.  That is, if we start out with 100
elements, we'll check whether the input is presorted.  Then, if we
partition it into a block of 60 elements and a block of 40 elements,
we'll check each of those partitions over again to see whether they
are presorted.  There is definitely some potential upside here,
because if the input is mostly sorted it may contain runs where
everything is in order, and we'll detect that and avoid some work.
But it's also kind of expensive; I've wondered whether we should, for
example, move the test for presorted input down a bit, so that it only
happens in the n > 40 case.  Another idea is to arrange things so that
we remember the offset at which the last presort check failed.  When
we tail-recurse into the left half of the array, there's no need to
recheck - if the presort check fell out after our partition boundary,
it's presorted; if it fell out before the partition boundary, it
isn't.

> So if we attempt to quicksort an array which is almost pre-sorted, but
> say only has its very last element out-of-order, we'll do fairly
> horribly, because we waste the effort of almost an entire "bubble sort
> iteration". So almost-sorted data would seem to be a compelling thing
> to optimise for that reason, as well as for the simple observation
> that it isn't exactly uncommon in a relational database.

Heap-sorting could also benefit from some optimization in this area.
Right now, if you heap-sort data that's already in order, it gets
progressively slower as you crank up work_mem, because the heap gets
deeper and so extraction and siftup get more expensive, for no real
benefit.  We could do something like check, every time we use up our
available memory, whether the heap is already entirely in order.  If
so, we could dump all but one tuple to the current run and the start
reading tuples again.  Or maybe dump some percentage of the array,
though that might require a bit more bookkeeping.  Or maybe there's a
smarter algorithm that would also cater to mostly-sorted data.
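
The check itself would be cheap, since a binary heap laid out in an
array is entirely in order exactly when the array is sorted, which one
linear pass decides. A sketch using tuplesort.c-style names (COMPARETUP
is the real macro there; the function itself is hypothetical):

    static bool
    heap_entirely_ordered(Tuplesortstate *state)
    {
        int     i;

        for (i = 1; i < state->memtupcount; i++)
            if (COMPARETUP(state, &state->memtuples[i - 1],
                           &state->memtuples[i]) > 0)
                return false;
        return true;
    }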

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 release notes, beta time?

2012-04-13 Thread Bruce Momjian
On Fri, Apr 13, 2012 at 12:40:09PM -0400, Robert Haas wrote:
> On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian  wrote:
> > I just talked to Tom about the 9.2 release notes.  Do people want me to
> > write the 9.2 release notes?
> 
> +1.
> 
> > When do you think we will be ready for 9.2
> > beta?
> 
> Well, we've got a bunch of open issues, but most of them don't look
> *too* serious.  If everyone dropped what they're doing and worked on
> them, I think we could be done in 2 weeks, but realistically I think
> it's likely to take 1-2 months.

I was hoping for at least two weeks because I am away Monday to Thursday
of next week for EnterpriseDB.  I can do it by April 27, or April 30 at
the latest; that is 2-2.5 weeks from now.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 17:42, Peter Geoghegan  wrote:
> One insight that I had at the time was that text comparisons where the
> C locale isn't used are really rather expensive, and I doubt that
> there is too much that can be done to address that directly.  However,
> if we were to support timsort, that could substantially cut down on
> the number of comparisons for text columns, which could be a real win.
> Maybe that would happen through some explicit mechanism, or maybe the
> planner could actually infer that it was likely to be optimal to use
> timsort based on a high statistical correlation between physical row
> ordering and logical ordering of a key column's values.

Further thoughts:

At the time we committed our own quicksort implementation, based on
the NetBSD one, we eschewed the optimisation of using insertion sort
when n is fairly low. This happens to be a very common optimisation,
so I'm not really super-confident that that was a good decision.
However, we also added our own optimisation, which is to attempt,
regardless of the size of n, to ascertain that the array is
pre-sorted, in which case we don't quicksort at all.

So if we attempt to quicksort an array which is almost pre-sorted, but
say only has its very last element out-of-order, we'll do fairly
horribly, because we waste the effort of almost an entire "bubble sort
iteration". So almost-sorted data would seem to be a compelling thing
to optimise for that reason, as well as for the simple observation
that it isn't exactly uncommon in a relational database.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-13 Thread Robert Haas
On Thu, Jan 26, 2012 at 1:21 AM, Fujii Masao  wrote:
> On Wed, Jan 25, 2012 at 11:12 PM, Simon Riggs  wrote:
>> On Wed, Jan 25, 2012 at 1:57 PM, Robert Haas  wrote:
>>
>>> I think that this would be a lot more clear if we described this as
>>> synchronous_commit = remote_write rather than just synchronous_commit
>>> = write.  Actually, the internal constant is named that way already,
>>> but it's not exposed as such to the user.
>>
>> That's something to discuss at the end of the CF when people are less
>> busy and we get more input.
>>
>> It's an easy change whatever we decide to do.
>
> Added this to 9.2 Open Items.

OK, so I think it's time to decide what we want to do here.  In my
view, remote_write seems a lot more clear than write (since someone
might otherwise think we were talking about a local write) and it has
the additional advantage of matching the internal naming convention -
surely, if it's write to call it SYNCHRONOUS_COMMIT_REMOTE_WRITE
inside the system, then there's not really much reason to drop the
word "remote" on the user-visible side of things.  However, I just
work here.  Does anyone want to make a counter-argument?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 release notes, beta time?

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 12:51 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian  wrote:
>>> When do you think we will be ready for 9.2
>>> beta?
>
>> Well, we've got a bunch of open issues, but most of them don't look
>> *too* serious.  If everyone dropped what they're doing and worked on
>> them, I think we could be done in 2 weeks, but realistically I think
>> it's likely to take 1-2 months.
>
> Well, I would sure like to see a beta out sooner than that.  I don't
> think we necessarily have to resolve every known open issue before
> we put out beta1.  I believe we do need to resolve anything that would
> force initdb (such as Peter's bytea_agg change), and of course we need
> to write release notes so that beta testers know what to test.

Fair enough.

> At the same time, I'd like to light a fire under people to deal with
> the open issues.  I agree that it will take some time for all of
> them to get dealt with, but it doesn't help if the responsible hackers
> are procrastinating.

The joys of an all-volunteer project.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 release notes, beta time?

2012-04-13 Thread Tom Lane
Robert Haas  writes:
> On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian  wrote:
>> When do you think we will be ready for 9.2
>> beta?

> Well, we've got a bunch of open issues, but most of them don't look
> *too* serious.  If everyone dropped what they're doing and worked on
> them, I think we could be done in 2 weeks, but realistically I think
> it's likely to take 1-2 months.

Well, I would sure like to see a beta out sooner than that.  I don't
think we necessarily have to resolve every known open issue before
we put out beta1.  I believe we do need to resolve anything that would
force initdb (such as Peter's bytea_agg change), and of course we need
to write release notes so that beta testers know what to test.

At the same time, I'd like to light a fire under people to deal with
the open issues.  I agree that it will take some time for all of
them to get dealt with, but it doesn't help if the responsible hackers
are procrastinating.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [trivial patch] grammar fixes in doc/src/sgml/high-availability.sgml

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 8:12 AM, Christoph Berg  wrote:
> diff --git a/doc/src/sgml/high-availability.sgml 
> b/doc/src/sgml/high-availability.sgml

Thanks for the patch; however, please attach patches rather than
including them inlined; they don't extract cleanly when included
inline, at least not for me.  I recreated this one (I think) and
committed it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 16:03, Greg Stark  wrote:
> Fwiw it also only holds for comparison sorts. If you can sort your
> items without actually comparing each item with the others then you
> aren't bound by it at all. Notably algorithms like radix sort and
> direct sort aren't bound by it and are O(n). I'm hoping to look at
> trying some of these for integer sorts when they apply using the new
> sort specialization code Peter added.

Actually, though a later revision of the patch did nominally allow for
user-defined sorting functions through the SortSupport infrastructure
(I didn't intend that it would be complete/useful enough to be really
worth documenting), the version that was finally committed did not.
However, there is a fairly obvious entry point for a radix sort, which
is here:

    /*
     * We were able to accumulate all the tuples within the allowed
     * amount of memory.  Just qsort 'em and we're done.
     */
    if (state->memtupcount > 1)
    {
        /* Can we use the single-key sort function? */
        if (state->onlyKey != NULL)
            qsort_ssup(state->memtuples, state->memtupcount,
                       state->onlyKey);
        else
            qsort_tuple(state->memtuples,
                        state->memtupcount,
                        state->comparetup,
                        state);
    }

The only specialisation that occurs here is between tuple sorts with a
single key and those with multiple keys (type-specific specialisations
were ultimately not judged to be worth the increase in binary size
relative to their diminishing benefits).  You'd probably set up
type-specific positional-notation machinery within the state's
SortSupport struct (the type-specific abstraction through which we
elide the SQL function call machinery for types that support it).
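
For the single-key integer case, the body of such an entry point could
look schematically like this LSD radix sort (a standalone sketch on
bare uint32 keys, ignoring SortTuple wrappers and NULL handling
entirely):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* O(n) LSD radix sort, one byte per pass, at the cost of a second
     * n-sized buffer and key-type specificity. */
    static void
    radix_sort_u32(uint32_t *a, size_t n)
    {
        uint32_t   *buf = malloc(n * sizeof(uint32_t));
        int         shift;

        if (buf == NULL)
            return;             /* real code would fall back to qsort */

        for (shift = 0; shift < 32; shift += 8)
        {
            size_t      count[256] = {0};
            size_t      pos[256];
            size_t      i,
                        total = 0;

            for (i = 0; i < n; i++)
                count[(a[i] >> shift) & 0xFF]++;
            for (i = 0; i < 256; i++)
            {
                pos[i] = total;
                total += count[i];
            }
            for (i = 0; i < n; i++)     /* stable scatter by current byte */
                buf[pos[(a[i] >> shift) & 0xFF]++] = a[i];
            memcpy(a, buf, n * sizeof(uint32_t));
        }
        free(buf);
    }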

One insight that I had at the time was that text comparisons where the
C locale isn't used are really rather expensive, and I doubt that
there is too much that can be done to address that directly.  However,
if we were to support timsort, that could substantially cut down on
the number of comparisons for text columns, which could be a real win.
Maybe that would happen through some explicit mechanism, or maybe the
planner could actually infer that it was likely to be optimal to use
timsort based on a high statistical correlation between physical row
ordering and logical ordering of a key column's values.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 10:51 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane  wrote:
>>> After some reflection I think that the blame should be pinned on
>>> have_relevant_joinclause(), which is essentially defined as "is there
>>> any join clause that can be evaluated at the join of these two
>>> relations?".  I think it would work better to define it as "is there any
>>> join clause that both these relations participate in?".
>
>> I think it's getting a little late in the day to be whacking the
>> planner around too much, but I have to admit that seems like a pretty
>> good and safe change to me, so maybe we should go ahead and do it.
>> I'm a bit worried, though, that with all the planner changes this
>> release we are going to spend a lot of time tracking down regressions
>> either in planning time or in plan quality.
>
> Could be.  I think though that this fits in pretty naturally with the
> parameterized-path changes, since both of them are really directed
> towards being able to apply inner indexscans in cases where we could
> not before.

Yeah, I had the same thought.  I am more worried that either the
changes we made for index-only scans or the parameterized paths stuff
is going to turn out to introduce nasty, as-yet-unforeseen
regressions.  But at this point we're in for that pain whatever we do
about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 release notes, beta time?

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian  wrote:
> I just talked to Tom about the 9.2 release notes.  Do people want me to
> write the 9.2 release notes?

+1.

> When do you think we will be ready for 9.2
> beta?

Well, we've got a bunch of open issues, but most of them don't look
*too* serious.  If everyone dropped what they're doing and worked on
them, I think we could be done in 2 weeks, but realistically I think
it's likely to take 1-2 months.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] index-only scans vs. Hot Standby, round two

2012-04-13 Thread Robert Haas
Currently, we have a problem with index-only scans in Hot Standby
mode: the xmin horizon on the standby might lag the master, and thus
an index-only scan might mistakenly conclude that no heap fetch is
needed when in fact it is.  I suggested that we handle this by
suppressing generation of index-only scan plans in Hot Standby mode,
but Simon, Noah, and Dimitri were arguing that we should instead do
the following, which is now on the open items list:

* Make XLOG_HEAP2_VISIBLE records generate recovery snapshot conflicts
so that IndexOnlyScans work on Hot Standby

Ideally, this would also allow us to remove the following hack from
heapgetpage(), which would improve sequential-scan performance for Hot
Standby.

/*
 * If the all-visible flag indicates that all tuples on the page are
 * visible to everyone, we can skip the per-tuple visibility tests. But
 * not in hot standby mode. A tuple that's already visible to all
 * transactions in the master might still be invisible to a read-only
 * transaction in the standby.
 */
all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;

As far as I can see, to make this work, we'd need to have
lazy_scan_heap() compute the latest xmin on any potentially
all-visible page and treat that as a cutoff XID, cancelling queries
with snapshots whose xmins equal or precede that value, just as we
currently do when freezing tuples (cf xlog_heap_freeze).  There are
two things to worry about: correctness, and more query cancellations.
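
In code terms, the bookkeeping in lazy_scan_heap() might look roughly
like this (a hypothetical sketch; the variable name is mine, though the
xmin and xid-comparison macros are the real ones):

    /* while scanning tuples on a page that may be marked all-visible */
    TransactionId visibility_cutoff_xid = InvalidTransactionId;

    if (all_visible)
    {
        TransactionId xmin = HeapTupleHeaderGetXmin(tuple.t_data);

        /* track the newest committed xmin on the page */
        if (TransactionIdFollows(xmin, visibility_cutoff_xid))
            visibility_cutoff_xid = xmin;
    }

    /* later, include visibility_cutoff_xid in the XLOG_HEAP2_VISIBLE
     * record so the standby can cancel snapshots that precede it */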

In the department of query cancellations, I believe Noah argued
previously that this wasn't really going to cause a problem.  And,
indeed, if the master has a mix of inserts, updates, and deletes, then
it seems likely that any recovery conflicts generated this way would
be hitting about the same place in the XID space as the ones caused by
pruning away dead tuples.  What will be different, if we go this
route, is that an insert-only master, which right now only generates
conflicts at freezing time, will instead generate conflicts much
sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
enough to know whether this is a problem, and I'm not trying to block
this approach, but I do want to make sure that everyone agrees that
it's acceptable before we go do it, because inevitably somebody is
going to get bit.  In making our decision, I think we should keep in
mind that we are currently pretty laid-back about marking pages
all-visible, and that's probably something we're going to want to
change over time: for example, we recently discussed the fact that a
page with one dead tuple in it currently needs *two* vacuums to become
all-visible, because only the first vacuum pass is smart enough to
mark things all-visible, and the first heap pass only truncates the
dead tuple to a dead line pointer, so the page isn't all-visible at
that point.  When we fix that, which I think we're almost certainly
going to want to do at some point, then that means these conflicts
will occur that much sooner.  I think we will likely also want to
consider marking things all-visible on the fly at some point in the
future; for example, if we pull a buffer in for an index-only scan and
dirty it by setting a hint bit, we might want to take the plunge and
mark it all-visible before evicting it, to save effort the next time.
Again, not trying to dissuade anyone, just making sure we've thought
through it enough to avoid being unhappy later.  It's worth noting
that none of this will normally matter if hot_standby_feedback=on, so
part of the analysis here may depend on how many people we think have
flipped that switch.

As to correctness, after further review of lazy_scan_heap(), I think
there are some residual bugs here that need to be fixed whatever we
decide to do about the Hot Standby case:

1. I noticed that when we do PageSetAllVisible() today we follow it up
with SetBufferCommitInfoNeedsSave().  PD_ALL_VISIBLE is not merely a
hint, so I think these should be changed to MarkBufferDirty(), which
shouldn't make much difference given current code, but proposals to
handle hint changes differently than non-hint changes abound, so it's
probably not good to rely on those meaning the same thing forever.

2. More seriously, lazy_scan_heap() releases the buffer lock before
setting the all-visible bit on the heap.  This looks just plain wrong
(and it's my fault, to be clear).  Somebody else can come along after
we've released the buffer lock and mark the page not-all-visible.
That concurrent process will check the visibility map bit, find it
already clear, and go on its merry way.  Then, we set the visibility
map bit and go on our merry way.  Now we have a not-all-visible page
with the all-visible bit set in the visibility map.   Oops.  I think
this probably needs to be rewritten so that lazy_scan_heap() acquires
a pin on the visibility map page before locking the buffer for cleanup
and holds onto that pin until either we exit the ran

Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-13 Thread Guillaume Lelarge
On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:
> On 08.04.2012 11:59, Guillaume Lelarge wrote:
> > Hi,
> >
> > I recently wrote a plpgsql plugin. I wanted to enable the use of pgxs,
> > to make it easier to compile the plugin, but I eventually found that I
> > can't do that because the plpgsql.h file is not available in the include
> > directory.
> >
> > I'm wondering if we shouldn't put the header files of plpgsql source
> > code in the include directory. It would help compiling the PL/pgsql
> > debugger, and profiler (and of course my own plugin).
> 
> Yep, I just bumped into this myself, while trying to make pldebugger 
> module compilable with pgxs.
> 
> > There could be a good reason which would explain why we can't (or don't
> > want to) do this, but I don't see it right now.
> 
> Me neither, except a general desire to keep internals hidden. I propose 
> the attached.
> 

Sounds good to me. I would love to see this happening in 9.2.

Thanks, Heikki.


-- 
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.2 release notes, beta time?

2012-04-13 Thread Bruce Momjian
I just talked to Tom about the 9.2 release notes.  Do people want me to
write the 9.2 release notes?  When do you think we will be ready for 9.2
beta?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Greg Stark
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas  wrote:
> On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane  wrote:
>> So has somebody found a hole in the n log n lower bound on the cost of
>> comparison-based sorting?  I thought that had been proven pretty
>> rigorously.
>
> There's not much danger of anyone finding a way around that bound
> since the proof is quite trivial, but keep in mind that it's a
> worst-case bound.

Fwiw it also only holds for comparison sorts. If you can sort your
items without actually comparing each item with the others then you
aren't bound by it at all. Notably algorithms like radix sort and
direct sort aren't bound by it and are O(n). I'm hoping to look at
trying some of these for integer sorts when they apply using the new
sort specialization code Peter added.
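
For reference, here is a minimal standalone sketch of an LSD radix sort
over 32-bit keys, which makes a constant number of O(n) passes; it is
illustrative only and not tied to the tuplesort code:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static void
radix_sort_u32(uint32_t *a, size_t n)
{
    uint32_t   *tmp = malloc(n * sizeof(uint32_t));
    int         shift;

    if (tmp == NULL)
        return;                 /* out of memory; a real caller must cope */

    /* one counting pass per byte of the key, least significant first */
    for (shift = 0; shift < 32; shift += 8)
    {
        size_t      count[256] = {0};
        size_t      pos[256];
        size_t      i;

        for (i = 0; i < n; i++)
            count[(a[i] >> shift) & 0xFF]++;
        pos[0] = 0;
        for (i = 1; i < 256; i++)
            pos[i] = pos[i - 1] + count[i - 1];
        for (i = 0; i < n; i++)
            tmp[pos[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof(uint32_t));
    }
    free(tmp);
}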

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-13 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> I'm a bit worried, though, that with all the planner changes this
> release we are going to spend a lot of time tracking down regressions
> either in planning time or in plan quality.

I dunno, I agree that there will likely be more regressions, but I
also think people might be happier with more changes in release 'X' than
some changes in X and then some more in Y, both of which end up
destabilizing their environment, particularly if we advertise that 'X'
has lots of changes and that people should test it thoroughly.

My feeling is that this would be good to include in 9.2, but I could be
swayed either way- I certainly agree with the sentiment that it's
getting to a pretty late date.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-13 Thread Tom Lane
Robert Haas  writes:
> On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane  wrote:
>> After some reflection I think that the blame should be pinned on
>> have_relevant_joinclause(), which is essentially defined as "is there
>> any join clause that can be evaluated at the join of these two
>> relations?".  I think it would work better to define it as "is there any
>> join clause that both these relations participate in?".

> I think it's getting a little late in the day to be whacking the
> planner around too much, but I have to admit that seems like a pretty
> good and safe change to me, so maybe we should go ahead and do it.
> I'm a bit worried, though, that with all the planner changes this
> release we are going to spend a lot of time tracking down regressions
> either in planning time or in plan quality.

Could be.  I think though that this fits in pretty naturally with the
parameterized-path changes, since both of them are really directed
towards being able to apply inner indexscans in cases where we could
not before.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane  wrote:
> I looked into the behavior complained of here:
> http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php
>
> The problem query can be abstracted to
>
>        select * from a, b, c, d
>        where a.x = b.y and (a.z = c.c or a.z = d.d)
>
> Table a is much larger than the others (in fact, in the given example
> c and d are known to be single rows), and there are indexes on the
> mentioned columns of a.  In this situation, the best plan is to
> cross-join c and d, then use a BitmapOr indexscan to pick out the rows
> of a that satisfy the OR condition, and finally join that small number
> of rows to b.  The planner will use a cross-join-first plan if we omit
> b and the first WHERE clause from the query; but in the query as given,
> it fails to discover that plan and falls back on a vastly inferior plan
> that involves forming the a/b join first.
>
> The reason for this behavior is the anti-clauseless-join heuristics in
> join_search_one_level().  Without b, there are no join clauses available
> at join level 2, so the planner is forced to form all three 2-way cross
> joins; and then at level 3 it finds out that joining a to c/d works
> well.  With b, we find the a/b join has a usable join clause so we form
> that join, and then we decide not to make any 2-way clauseless joins.
> So the c/d join is never constructed and there is no way to exploit the
> desirable indexscan at higher levels.
>
> After some reflection I think that the blame should be pinned on
> have_relevant_joinclause(), which is essentially defined as "is there
> any join clause that can be evaluated at the join of these two
> relations?".  I think it would work better to define it as "is there any
> join clause that both these relations participate in?".  In the majority
> of real-world queries, join clauses relate exactly two relations, so
> that these two definitions are equivalent.  However, when we do have
> join clauses involving 3 or more relations, such as the OR clause in
> this example, it's evidently useful to consider cross-product joins of
> the smaller relations so that the join clause can be applied during the
> scan of the largest table.
>
> It would probably not be a good idea to back-patch such a change, since
> it might have consequences I can't foresee at the moment.  But I'm
> strongly tempted to squeeze it into 9.2.  Thoughts?

I think it's getting a little late in the day to be whacking the
planner around too much, but I have to admit that seems like a pretty
good and safe change to me, so maybe we should go ahead and do it.
I'm a bit worried, though, that with all the planner changes this
release we are going to spend a lot of time tracking down regressions
either in planning time or in plan quality.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Robert Haas
On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes  wrote:
> On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas  wrote:
> ...
>>
>> One thing that seems inefficient to me about our current algorithm is
>> the use of the run number as a leading column in the sort key.
>> There's no real reason why the tuples destined for the next run need
>> to be maintained in heap order; we could just store them unordered and
>> heapify the whole lot of them when it's time to start the next run.
>> That ought to save comparison cycles for the current run, since the
>> heap will be less deep, and there are other possible savings as well -
>> in particular, an unordered array of tuples can be heapify'd in linear
>> time, whereas our current algorithm is O(n lg n).  However, when I
>> actually implemented this, I found that doing it this way was a loser.
>>  In fact, excluding the next-run tuples from the heap for the current
>> run was a BIG loser even before you add in the time required to
>> re-heapify.  This confused the daylights out of me for a while,
>> because it's hard to understand how insert and siftup can be slower on
>> a small heap than a large one.
>>
>> My working theory is that, even though we must necessarily do more
>> comparisons when manipulating a larger heap, many of those comparisons
>> are resolved by comparing run numbers, and that's a lot cheaper than
>> invoking the real comparator.  For example, if we insert into a heap
>> whose rightmost child is in the next run, and the new tuple goes into
>> the current run, and the current size of the heap is such that the
>> initial position of the new element is a descendent of the right node,
>> it will very quickly crash through all of the next-run tuples and only
>> need one REAL comparison - against the root.  Either the new element
>> ends up as the root, or it ends up as the direct child of the root;
>> now we remove the root and, perhaps, replace it with its rightmost
>> child, meaning that the next element we read in can do the exact same
>> thing all over again.  If we exclude the next-run elements from the
>> heap, then the average depth of the heap when inserting a new element
>> is smaller, but all the elements are in the same-run, and we have to
>> invoke the real comparator every time.  In other words, those next-run
>> tuples act as dummies which effectively create a heap of uneven depth,
>> and the average depth encountered when inserting tuples turns out to
>> be less than what we get by pulling out the dummies and making the
>> depth uniform.
>>
>> This makes me kind of nervous, because I don't understand why things
>> should work out so well as all that.  If throwing some dummy elements
>> into the heap makes things work better, then maybe we should throw in
>> some more.  Or maybe it would work better to take some but not all of
>> them out.  There certainly doesn't seem to be any indication in the
>> code that this is an anticipated effect, and it seems an awfully
>> providential accident.
>
> Is there a patch or a git repo available for this change?  I'd like to
> see if I can analyze that if I get some spare time.

Sorry for the slow response to this.  Patch is attached
(crummy-external-sort.patch).

> Clever.  Rather than doing a binary search of the path, it might make
> sense to do a reverse search of it.  The tuple is likely to send up
> somewhere at the bottom of the heap, both because that is where most
> of the data is, and because the tuple being reinserted just came from
> the bottom so it is likely to be biased that way.

Patches for these approaches also attached (siftup-using-bsearch.patch
and siftup-reverse-linear.patch).  Neither approach seemed terribly
promising in my testing, IIRC.
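
For reference, the linear-time heapify mentioned upthread is bottom-up
(Floyd) construction; the following is a generic standalone sketch, not
the tuplesort code:

#include <stddef.h>

static void
sift_down(int *heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t      left = 2 * i + 1;
        size_t      right = left + 1;
        size_t      smallest = i;

        if (left < n && heap[left] < heap[smallest])
            smallest = left;
        if (right < n && heap[right] < heap[smallest])
            smallest = right;
        if (smallest == i)
            break;
        {
            int         t = heap[i];

            heap[i] = heap[smallest];
            heap[smallest] = t;
        }
        i = smallest;
    }
}

static void
heapify(int *heap, size_t n)
{
    size_t      i;

    /* sift down every internal node, from the last parent to the root */
    for (i = n / 2; i-- > 0;)
        sift_down(heap, n, i);
}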

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


crummy-external-sort.patch
Description: Binary data


siftup-reverse-linear.patch
Description: Binary data


siftup-using-bsearch.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improving our clauseless-join heuristics

2012-04-13 Thread Tom Lane
I looked into the behavior complained of here:
http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php

The problem query can be abstracted to

select * from a, b, c, d
where a.x = b.y and (a.z = c.c or a.z = d.d)

Table a is much larger than the others (in fact, in the given example
c and d are known to be single rows), and there are indexes on the
mentioned columns of a.  In this situation, the best plan is to
cross-join c and d, then use a BitmapOr indexscan to pick out the rows
of a that satisfy the OR condition, and finally join that small number
of rows to b.  The planner will use a cross-join-first plan if we omit
b and the first WHERE clause from the query; but in the query as given,
it fails to discover that plan and falls back on a vastly inferior plan
that involves forming the a/b join first.

The reason for this behavior is the anti-clauseless-join heuristics in
join_search_one_level().  Without b, there are no join clauses available
at join level 2, so the planner is forced to form all three 2-way cross
joins; and then at level 3 it finds out that joining a to c/d works
well.  With b, we find the a/b join has a usable join clause so we form
that join, and then we decide not to make any 2-way clauseless joins.
So the c/d join is never constructed and there is no way to exploit the
desirable indexscan at higher levels.

After some reflection I think that the blame should be pinned on
have_relevant_joinclause(), which is essentially defined as "is there
any join clause that can be evaluated at the join of these two
relations?".  I think it would work better to define it as "is there any
join clause that both these relations participate in?".  In the majority
of real-world queries, join clauses relate exactly two relations, so
that these two definitions are equivalent.  However, when we do have
join clauses involving 3 or more relations, such as the OR clause in
this example, it's evidently useful to consider cross-product joins of
the smaller relations so that the join clause can be applied during the
scan of the largest table.
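
In code terms, the redefined test could look roughly like this (a sketch
of the idea only, not the actual change):

/*
 * Sketch: is there any join clause that mentions both rel1 and rel2,
 * even if it also needs relations beyond these two?
 */
static bool
have_relevant_joinclause(PlannerInfo *root,
                         RelOptInfo *rel1, RelOptInfo *rel2)
{
    ListCell   *l;

    foreach(l, rel1->joininfo)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);

        if (bms_overlap(rinfo->clause_relids, rel2->relids))
            return true;
    }
    return false;
}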

It would probably not be a good idea to back-patch such a change, since
it might have consequences I can't foresee at the moment.  But I'm
strongly tempted to squeeze it into 9.2.  Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-04-13 Thread Robert Haas
On Thu, Apr 12, 2012 at 3:27 PM, Tom Lane  wrote:
> 1. Lobotomize add_path_precheck so it always returns true for a
> parameterized path.  This sounds horrid, but in the test cases I'm using
> it seems that this only results in doing the full path construction for
> a very small number of additional paths.

Query planner engineering is hard, because it's hard to predict what
kind of queries people will write, but this seems basically sane to
me.  Given that (if I recall our previous discussion on this point
correctly) we avoid generating parameterized paths in situations where
we could have simply revised the join order instead, we should only
really be getting any parameterized paths at all in situations where
they are likely to help.  Queries involving only inner joins, for
example, should never need a parameterized path covering more than a
single baserel; and even if you've got outer joins in the mix, most of
my queries tend to look like A IJ B IJ C IJ D LJ E LJ F LJ G, rather
than A IJ (B LJ C) which is where we actually need this machinery.  If
we spend a little more effort in that case it's quite likely to be
worth it; the trick is just to keep the extra work from bleeding into
the cases where it won't help.

> 3. Rearrange plan generation so that a parameterized path always uses
> all join clauses available from the specified outer rels.  (Any that
> don't work as indexquals would have to be applied as filter conditions.)
> If we did that, then we would be back to a situation where all paths
> with the same parameterization should yield the same rowcount, thus
> justifying letting add_path_precheck work as it does now.
>
> #3 would amount to pushing quals that would otherwise be checked at the
> nestloop join node down to the lowest inner-relation level where they
> could be checked.  This is something I'd suspected would be a good idea
> to start with, but hadn't gotten around to implementing for non-index
> quals.  It had not occurred to me that it might simplify cost estimation
> to always do that.

This seems like it could be quite a significant win.  It doesn't
really matter in <= 9.1 because in an old-style parameterized nestloop
the join filter is going to get applied immediately after the index
filter anyway, though I guess it's possible that you might save a
little bit by optimizing the order in which multiple filter conditions
are applied.  But if there can be intermediate joins in there then
it's a big deal; and the fact that it makes it easier to compare paths
and prune away bad ones earlier seems like a major benefit as well.
So I would be in favor of doing this if possible, but...

> I'm going to take a closer look at #3, but it may not be practical to
> try to squeeze it into 9.2; if not, I think #1 will do as a stopgap.

I agree with this, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Tom Lane
Jeff Davis  writes:
> On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote:
>> Yeah, that was me, and it came out of actual user complaints ten or more
>> years back.  (It's actually not 2X growth but more like 4X growth
>> according to the comments in logtape.c, though I no longer remember the
>> exact reasons why.)  We knew when we put in the logtape logic that we
>> were trading off speed for space, and we accepted that.

> I skimmed through TAOCP, and I didn't find the 4X number you are
> referring to, and I can't think what would cause that, either. The exact
> wording in the comment in logtape.c is "4X the actual data volume", so
> maybe that's just referring to per-tuple overhead?

My recollection is that that was an empirical measurement using the
previous generation of code.  It's got nothing to do with per-tuple
overhead IIRC, but with the fact that the same tuple can be sitting on
multiple "tapes" during a polyphase merge, because some of the tapes can
be lying fallow waiting for future use --- but data on them is still
taking up space, if you do nothing to recycle it.  The argument in the
comment shows why 2X is the minimum space growth for a plain merge
algorithm, but that's only a minimum.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-13 Thread Peter Eisentraut
On tor, 2012-04-12 at 10:12 -0500, Joshua Berkus wrote:
> Well actually, the other advantage of using branches is that it would
> encourage committers to bounce a patch back to the submitter for
> modification *instead of* doing it themselves.  This would both have
> the advantage of saving time for the committer, and doing a better job
> of teaching submitters how to craft patches which don't need to be
> modified.

I don't see how using branches changes that.  You could (or should) be
doing that already anyway.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] not null validation option in contrib/file_fdw

2012-04-13 Thread Andrew Dunstan


On 04/13/2012 07:21 AM, Shigeru HANADA wrote:
> (2012/04/13 16:59), Etsuro Fujita wrote:
>> I updated the patch added to CF 2012-Next [1].  Attached is the updated
>> version of the patch.
> I applied the patch and ran regression tests of file_fdw, and I got
> SIGSEGV X-(
>
> The failure occurs in fileGetOptions, and it is caused by
> list_delete_cell being used inside a foreach loop; the ListCell pointing
> at the deleted target has already been freed by list_delete_cell, but
> foreach still accesses it to get the next element.
>
> Some backend functions that use list_delete_cell in a loop use a "for"
> loop instead of foreach, and other functions exit the loop right after
> calling list_delete_cell.  Since we can't stop searching for non-COPY
> options until we reach the end of the options list, we would need to
> choose the former (a "for" loop), or build a separate list containing
> only the valid COPY options and return it via the other_options
> parameter.
>

Yes, the code in fileGetOptions() appears to be bogus.

Also, "validate" is a terrible name for the option (and in the code)
IMNSHO. It's far too generic. "validate_not_null" or some such would
surely be better.

cheers

andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [trivial patch] grammar fixes in doc/src/sgml/high-availability.sgml

2012-04-13 Thread Christoph Berg
diff --git a/doc/src/sgml/high-availability.sgml 
b/doc/src/sgml/high-availability.sgml
new file mode 100644
index ed34dac..c5f3ff9
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
*** primary_conninfo = 'host=192.168.1.50 po
*** 1022,1028 
  
 
  Users will stop waiting if a fast shutdown is requested.  However, as
! when using asynchronous replication, the server will does not fully
  shutdown until all outstanding WAL records are transferred to the 
currently
  connected standby servers.
 
--- 1022,1028 
  
 
  Users will stop waiting if a fast shutdown is requested.  However, as
! when using asynchronous replication, the server will not fully
  shutdown until all outstanding WAL records are transferred to the 
currently
  connected standby servers.
 
*** primary_conninfo = 'host=192.168.1.50 po
*** 1126,1132 
  
 
  If you need to re-create a standby server while transactions are
! waiting, make sure that the commands to run pg_start_backup() and
  pg_stop_backup() are run in a session with
  synchronous_commit = off, otherwise those
  requests will wait forever for the standby to appear.
--- 1126,1132 
  
 
  If you need to re-create a standby server while transactions are
! waiting, make sure that the commands pg_start_backup() and
  pg_stop_backup() are run in a session with
  synchronous_commit = off, otherwise those
  requests will wait forever for the standby to appear.


Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


signature.asc
Description: Digital signature


Re: [HACKERS] not null validation option in contrib/file_fdw

2012-04-13 Thread Shigeru HANADA
(2012/04/13 16:59), Etsuro Fujita wrote:
> I updated the patch added to CF 2012-Next [1].  Attached is the updated
> version of the patch.

I applied the patch and ran regression tests of file_fdw, and I got
SIGSEGV X-(

The failure occurs in fileGetOptions, and it is caused by
list_delete_cell being used inside a foreach loop; the ListCell pointing
at the deleted target has already been freed by list_delete_cell, but
foreach still accesses it to get the next element.

Some backend functions that use list_delete_cell in a loop use a "for"
loop instead of foreach, and other functions exit the loop right after
calling list_delete_cell.  Since we can't stop searching for non-COPY
options until we reach the end of the options list, we would need to
choose the former (a "for" loop), or build a separate list containing
only the valid COPY options and return it via the other_options
parameter.
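
For reference, the usual shape of the "for"-loop pattern is something
like this (a generic sketch; is_copy_option() is a hypothetical
predicate, not a real function):

List       *copy_options = NIL;     /* assumed result list */
ListCell   *cell;
ListCell   *prev = NULL;
ListCell   *next;

for (cell = list_head(options); cell != NULL; cell = next)
{
    DefElem    *def = (DefElem *) lfirst(cell);

    /* grab the successor before the current cell can be freed */
    next = lnext(cell);

    if (is_copy_option(def))            /* hypothetical predicate */
    {
        copy_options = lappend(copy_options, def);
        options = list_delete_cell(options, cell, prev);
    }
    else
        prev = cell;
}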

Regards,
-- 
Shigeru HANADA

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-13 Thread Dimitri Fontaine
Robert Haas  writes:
> The real problem with the command triggers patch is that we got a
> blizzard of code.  It's unrealistic to expect anyone to devote serious
> review time to a patch that's under constant development.  It also
> strikes me that a tremendous amount of pain could have been avoided by
> posting a clear and detailed design sketch for that patch before
> beginning to code.  Dimitri contended that without code, no one will
> read design sketches, but that doesn't for the most part jive with my
> experience, and I think that the strategy he actually chose backfired,
> because it was clear that any review would be hitting a moving target.

In my mind at least it's been more subtle. I had an agreed-on design
months before I started to code anything, at the Cluster Hackers Meeting
in Ottawa, where several committers and long-term contributors
participated in the discussion.

The big mistake seems to have been starting to code from that rather
than spending another couple of months (or easily more than that)
rehashing it on-list.

About the design sketches, IME, what you can agree on on-list is a very
high-level view of how to implement a feature; anything detailed enough
to have practical impact on the code you're writing happens while
reviewing code. Again, since I thought the high-level view was OK as of
the Cluster Hackers Meeting, I skipped that part and went directly to
the code-level review.

Greg Smith will certainly stamp that email with a big red “Lesson
Learned” stamp here: it never happened if you don't have a link to a
pgsql-hackers thread in the archives.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-13 Thread Dimitri Fontaine
Tom Lane  writes:
> Andres Freund  writes:
>> They might have been half-baked. But several of those didn't get design-level
>> review for several weeks which makes it rather hard to fully bake them in 
>> time...
>
> But if they didn't already have design-level review, that means they
> were not seen in any previous CF, which means they were not following
> the expectation that nontrivial patches should be submitted earlier than
> the last CF.

That's not true, unfortunately.

I had several rounds of design review in the previous CF, then some more
in the last CF, then another one that basically meant baking another
patch: same feature but spelled out differently (renaming catalog, files
and APIs is no fun in a big patch) to be able to have a new way to
organize the code and docs. It's for the best, and it's better to do
that before commit, no question. I appreciate having had that review,
really.

The problem with the command triggers patch was a different one. First,
if we want the last CF to be only about commits and not about feedback,
we should simply bite the bullet and say so out loud. Second, we should
have been talking about the hard decision rather than pretending we
could still do something in this release. That's mainly on me and I know
it, but not being a committer I have no power other than influence, and
I've been optimistic (and playing by the rules).

> I think the key point here is that people have to expect that it's going
> to take more than one round of review to land most nontrivial patches.
> And we have to discourage the expectation that that can happen within
> the last CF of a release cycle.  If anything, the last CF has to be
> tighter not looser than others on what we will accept, because there is
> no time to recover if something proves wrong with a patch after a
> month or three.

The more I think about that, the more I think we should declare the last
CF to be about preparing a release, not providing another round of
feedback. That could well be all we need here.

> I keep coming back to the thought that we need more and shorter CFs,
> and/or ReviewFests that are meant to help push WIP patches further
> down the track.  We need to make it easier to get those early reviews
> done while there's still development time left.

Separating commit fests meant for commits from those meant for
feedback is a good idea too. I'm not sure how much benefit we would get
from that for non-last CFs, though.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-13 Thread Dimitri Fontaine
Robert Haas  writes:
> Even before this CommitFest, it's felt to me like this hasn't been a
> great cycle for reviewing.  I think we have generally had fewer people
> doing reviews than we did during the 9.0 and 9.1 cycles.  I think we
> had a lot of momentum with the CommitFest process when it was new, but
> three years on I think there's been some ebbing of the relative
> enthusiastic volunteerism that got off the ground.  I don't have a

I have another analysis here. Tom once said that committers won't grow
on trees, and that is true for submitters too, obviously. I think the CF
process has been good at growing more submitters and at growing existing
ones' abilities too.

So it may not be that fewer reviewers are available; it may be that
they're just heavily involved in being submitters too.

Tom Lane  writes:
> Yeah, this is something I was thinking about yesterday.  In the first
> couple of release cycles with the CommitFest process, we were willing to
> let the last fest of a release cycle go on for "as long as it takes",
> or at least that was what I felt the policy to be.  This time we
> eventually gave up and declared closure, but in hindsight we should
> likely have done that a month earlier.  The fact of the matter is that
> quite a few of the patches we were dealing with were *not* ready to
> commit, or even close to that, at the start of the fest.  If it weren't
> the last fest they would have gotten marked Returned With Feedback a
> lot sooner.

This and other posts in this thread are all hinting at the same thing
to me: the last commit fest is *not* about feedback at all. If you still
need reviewers rather than committers, you're out of luck for this
release; see you next time.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] typo fix

2012-04-13 Thread Etsuro Fujita
This is a little patch to fix a typo in file-fdw.sgml

Best regards,
Etsuro Fujita



file_fdw_typo_fix_20120413.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] not null validation option in contrib/file_fdw

2012-04-13 Thread Etsuro Fujita
I updated the patch added to CF 2012-Next [1].  Attached is the updated
version of the patch.

For the discussion in [2], I've introduced a new generic option, validate
for file_fdw foreign tables, which specifies if file_fdw verifies that
tuples meet NOT NULL constraints.  The default value for the option is
'false', and if the value is set to 'true', then file_fdw verifies NOT NULL
constraints.  For example, a user can issue the following to let file_fdw
verify the constraint of the msg column.

CREATE FOREIGN TABLE ft (id INTEGER, msg TEXT NOT NULL)
  SERVER fs
 OPTIONS (filename '/path/to/file', format 'csv',
  delimiter ',', validate 'true');

At SELECT time, the scan is aborted when a NOT NULL violation is found,
as with COPY FROM.  Once the validation has been done successfully using
e.g. SELECT COUNT(*), the user can set the option back to 'false' using
ALTER FOREIGN TABLE.  I think this option is needed for flat files
because they have no way to enforce such a constraint themselves.

(I added NOT NULL checking to fileIterateForeignScan(), but not to
file_acquire_sample_rows().  Should we do it there as well?  I think it
is good enough to just recommend that users run ANALYZE on a foreign
table after the validation.)

For the discussion in [3], I've added a new external function,
ExecNotNullCheck(), and call it from fileIterateForeignScan().
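
To sketch the shape of such a check (hypothetical code, not the patch
itself; it assumes a per-tuple, slot-based test):

static void
check_not_null(TupleTableSlot *slot)
{
    TupleDesc   tupdesc = slot->tts_tupleDescriptor;
    int         attnum;

    for (attnum = 1; attnum <= tupdesc->natts; attnum++)
    {
        Form_pg_attribute att = tupdesc->attrs[attnum - 1];

        if (att->attnotnull && slot_attisnull(slot, attnum))
            ereport(ERROR,
                    (errcode(ERRCODE_NOT_NULL_VIOLATION),
                     errmsg("null value in column \"%s\" violates not-null constraint",
                            NameStr(att->attname))));
    }
}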

Any comments are welcome.

Best regards,
Etsuro Fujita

[1] https://commitfest.postgresql.org/action/patch_view?id=822
[2] http://archives.postgresql.org/message-id/1038.1331738...@sss.pgh.pa.us
[3] http://archives.postgresql.org/pgsql-hackers/2012-03/msg00809.php



file_fdw_notnull_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers