[HACKERS] proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement

2013-02-02 Thread Pavel Stehule
Hello

I propose enhancing the GET DIAGNOSTICS statement with a new field,
PG_CONTEXT. It is similar to GET STACKED DIAGNOSTICS'
PG_EXCEPTION_CONTEXT.

The motivation for this proposal is the possibility of getting the call
stack for debugging without raising an exception.

This code is based on cleaned-up code from Orafce, where it has been used
for four years without any error reports.

CREATE OR REPLACE FUNCTION public.inner(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
declare _context text;
begin
  get diagnostics _context = pg_context;
  raise notice '***%***', _context;
  return 2 * $1;
end;
$function$
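
The outer() and outer_outer() wrapper functions used in the example below
are not included in the mail; presumably they are simple pass-through
functions along these lines (a sketch only, not part of the patch):

CREATE OR REPLACE FUNCTION public.outer(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
begin
  return inner($1);
end;
$function$;

CREATE OR REPLACE FUNCTION public.outer_outer(integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
begin
  return outer($1);
end;
$function$;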

postgres=# select outer_outer(10);
NOTICE:  ***PL/pgSQL function inner(integer) line 4 at GET DIAGNOSTICS
PL/pgSQL function outer(integer) line 3 at RETURN
PL/pgSQL function outer_outer(integer) line 3 at RETURN***
CONTEXT:  PL/pgSQL function outer(integer) line 3 at RETURN
PL/pgSQL function outer_outer(integer) line 3 at RETURN
 outer_outer
─────────────
  20
(1 row)

Ideas, comments?

Regards

Pavel Stehule


get_diagnostics_context_initial.patch
Description: Binary data



[HACKERS] Using indexes for partial index builds

2013-02-02 Thread Paul Norman
Hello,
After a discussion on IRC in #postgresql, I had a feature suggestion and it
was suggested I write it up here.

I have a large (200GB, 1.7b rows) table with a number of columns, but the
two of interest here are a hstore column, tags and a postgis geometry
column, geom. 

There is a GIN index on tags and a gist index on geom. These took about
36-48 hours to build in total. Obviously index building on a table this size
is not trivial.

Periodically I want to do a number of specialized queries on objects with a
particular tag or in a particular area. To do this I often want to create a
partial index. For example, I created the index btree ((tags ->
'name_1'::text) text_pattern_ops) WHERE tags ? 'name_1'::text. My
understanding is that to create this index PostgreSQL scans the entire
table, even though the GIN index on tags could be used to identify which
rows could belong in the index. Where the WHERE condition selects only a
small portion of the table, this scans a lot more data than is
necessary.
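
Spelled out as a full statement, that index would presumably be created
like this (the table name is a placeholder):

CREATE INDEX bigtable_name_1_idx
    ON bigtable USING btree ((tags -> 'name_1'::text) text_pattern_ops)
    WHERE tags ? 'name_1'::text;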

Another case where it would be useful is when I am conducting a detailed
analysis of some aspect of the rows in a particular city. This leads to all
the queries being of the form SELECT ... FROM ... WHERE
is_in_my_area(geom)[1]. 

My current project is doing analysis involving addresses. The ability to
create an index like btree((tags -> 'addr:housenumber'), (tags ->
'addr:street'), (tags -> 'addr:city')) WHERE is_in_my_area(geom) in a
reasonable time would allow me to use a view instead of copying the local
area to a temporary table and indexing that table. The local area is about
350k rows, or about 0.02% of the database.

[1] The actual function for determining if it's in my area is long and not
really essential to the point here.
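
Again written out in full, with the same placeholder table name and the
user's own is_in_my_area() function (which would need to be IMMUTABLE to be
usable in an index predicate), that composite partial index would look
roughly like:

CREATE INDEX bigtable_addr_idx
    ON bigtable ((tags -> 'addr:housenumber'),
                 (tags -> 'addr:street'),
                 (tags -> 'addr:city'))
    WHERE is_in_my_area(geom);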






Re: [HACKERS] proposal - assign result of query to psql variable

2013-02-02 Thread Pavel Stehule
Hello

2013/2/1 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 here is patch related to your proposal

 I looked at this a bit.  It's getting there, though I still don't trust
 the places where you've chosen to clear the prefix setting.  (Looking at
 it, I'm now not sure that I trust the implementation of \g either.)

 However, what I wanted to ask about was this:

 + if (PQgetisnull(result, 0, i))
 + value = pset.popt.nullPrint ? pset.popt.nullPrint : "";
 + else
 + value = PQgetvalue(result, 0, i);

 What's the argument for using nullPrint here?  ISTM that's effectively a
 form of escaping, and I'd not expect that to get applied to values going
 into variables, any more than any other formatting we do when printing
 results.

 Admittedly, if we just take the PQgetvalue result blindly, there'll
 be no way to tell the difference between empty-string and NULL results.
 But I'm not convinced that this approach is better.  It would certainly
 need more than no documentation.


I have no strong opinion about storing NULL values - nullPrint seemed the
best of the simple variants.

possible variants

a) don't store NULL  values - and remove existing variable when NULL
be assigned - it is probably best, but should be confusing for users
b) implement flag IS NULL - for variables
c) use nullPrint
d) use empty String

@d is a subset of @c, and I think @c offers somewhat better possibilities
with only two more lines of code.

@a is probably best - a more significant change, but not hard to implement.

SELECT NULL AS x \g pref_
SELECT :'pref_' IS NULL;

can be nice

but it may be premature optimization too - nullPrint is enough for
typical use cases

Regards

Pavel





 regards, tom lane




Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 1 February 2013 23:56, Tom Lane t...@sss.pgh.pa.us wrote:

 Well, if we were tracking the latest value in shared memory, we could
 certainly clamp to that to ensure it didn't go backwards.  The problem
 is where to find storage for a per-DB value.

Adding new data columns to catalogs in backbranches seems like a great
reason to have an hstore column on every catalog table.

That way we can just add anything we need without causing other problems.

Obviously something for the future.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 1 February 2013 23:56, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Feb 1, 2013 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In any case, I no longer have much faith in the idea that letting
 GetOldestXmin go backwards is really safe.

 That is admittedly kind of weird behavior, but I think one could
 equally blame this on CLUSTER.  This is hardly the first time we've
 had to patch CLUSTER's handling of TOAST tables (cf commits
 21b446dd0927f8f2a187d9461a0d3f11db836f77,
 7b0d0e9356963d5c3e4d329a917f5fbb82a2ef05,
 83b7584944b3a9df064cccac06822093f1a83793) and it doesn't seem unlikely
 that we might go the full ten rounds.

 Yeah, but I'm not sure whether CLUSTER is the appropriate blamee or
 whether it's more like the canary in the coal mine, first to expose
 problematic behaviors elsewhere.  The general problem here is really
 that we're cleaning out toast tuples while the referencing main-heap
 tuple still physically exists.  How safe do you think that is?

Agree that CLUSTER is just first to hit the problem. As long as the
problem exists, we have issues.

 That
 did not ever happen before we decoupled autovacuuming of main and toast
 tables, either --- so a good case could be made that that idea is
 fundamentally broken.

It's broken, but not fundamentally. We can decouple the autovacuuming,
as long as we don't decouple the xid used.

Fixing GetOldestXmin() seems like the easiest thing to do for
backbranches, but seems a little heavy handed for a general fix -
though doing that will probably close any future thinkos of similar
nature.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal - assign result of query to psql variable

2013-02-02 Thread Shigeru Hanada
On Sat, Feb 2, 2013 at 7:30 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 possible variants

 a) don't store NULL  values - and remove existing variable when NULL
 be assigned - it is probably best, but should be confusing for users
 b) implement flag IS NULL - for variables
 c) use nullPrint
 d) use empty String

+1 for a).  If users want to determine whether the result was NULL, or
want to use substitute string for NULL result, they can use coalesce
in SELECT clause.  Anyway the feature should be documented clearly.
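
A sketch of that workaround, using the proposed \g-with-prefix syntax from
upthread (the resulting variable name pref_x is an assumption about how the
prefix combines with the output column name):

SELECT coalesce(NULL, '(null)') AS x \g pref_
\echo :pref_x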

-- 
Shigeru HANADA




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Andres Freund
On 2013-02-01 15:09:34 -0800, Jeff Janes wrote:
 On Fri, Feb 1, 2013 at 2:34 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-02-01 14:05:46 -0800, Jeff Janes wrote:
 
  As far as I can tell this bug kicks in when your cluster gets to be
  older than freeze_min_age, and then lasts forever after.  After that
  point pretty much every auto-vacuum inspired by update/deletion
  activity will get promoted to a full table scan.  (Which makes me
  wonder how much field-testing the vm-only vacuum has received, if it
  was rarely happening in practice due to this bug.)
 
  I think you're misreading the code. freezeTableLimit is calculated by
 
   limit = ReadNewTransactionId() - freezetable;
 
  which is always relative to the current xid. The bug was that
  freezetable had the wrong value in autovac due to freeze_min_age being
  used instead of freeze_table_age.
 
 Right.  Since freeze_min_age was mistakenly being used, the limit
 would be 50 million in the past (rather than 150 million) under
 defaults.  But since the last full-table vacuum, whenever that was,
 used freeze_min_age for its intended purpose, that means the 50
 million in the past *at the time of that last vacuum* is the highest
 that relfrozenxid can be.  And that is going to be further back than
 50 million from right now, so the vacuum will always be promoted to a
 full scan.

Oh, wow. You're right. I shouldn't answer emails after sport with cramped
fingers on a Friday night... And I should have thought about this
scenario, because I essentially already explained it upthread, just with
a different set of variables.

This is rather scary. How come nobody noticed that this major
performance improvement was effectively disabled for that long?

I wonder if Kevin's observations about the price of autovacuum during
OLTP-ish workloads aren't at least partially caused by this. It will cause
lots of I/O prematurely because it scans far more than it should, and a
VACUUM FREEZE will only push it off.

 As an aside, it does seem like log_autovacuum_min_duration=0 should
 log whether a scan_all was done, and if so what relfrozenxid got set
 to.  But looking at where the log message is generated, I don't know
 where to retrieve that info.

Yes, I agree; I had already been thinking about that because it's really
hard to get that information right now.
It seems easy enough to include it in the ereport() at the bottom of
lazy_vacuum_rel; we determine scan_all in that function, so that seems
OK?
For head I would actually vote for two data points, full_table_scan:
yes/no and skipped_percentage; both are already available, so it seems
like it should be an easy thing to do.
I'd like to do this for 9.3, agreed? I would even like to add it to the
back branches, but I guess I cannot convince people of that...

 [1] I don't know why it is that a scan_all vacuum with a
 freeze_min_age of 50m (or a freezeLimit of 50 million ago) will not
 set relfrozenxid to a higher value than that if it discovers that it
 can, but it doesn't seem to.

There currently is no code to track what the oldest observed xid is, so
it's a simple implementation limitation. Making that code better might be
rather worthwhile if you're loading your table in a batch and don't touch
it later anymore...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Andres Freund
On 2013-02-01 16:59:52 -0500, Robert Haas wrote:
 I don't think I really understand the origin of the formula, so
 perhaps if someone would try to characterize why it seems to behave
 reasonably that would be helpful (at least to me).
 
  f(deadtuples, relpages, age) =
 deadtuples/relpages + e ^ (age*ln(relpages)/2^32)
 
 To maybe make that discussion go more quickly let me kvetch about a
 few things to kick things off:

I am not too happy with the formula yet either, but it seems you've
started the discussion in the right direction...
 
 - It's probably important to have a formula where we can be sure that
 the wrap-around term will eventually dominate the dead-tuple term,
 with enough time to spare to make sure nothing really bad happens; on
 the other hand, it's also desirable to avoid the case where a table
 that has just crossed the threshold for wraparound vacuuming doesn't
 immediately shoot to the top of the list even if it isn't truly
 urgent.  It's unclear to me just from looking at this formula how well
 the second term meets those goals.

I just wanted to mention that if everything goes well, we won't *ever*
get to an anti-wraparound-vacuum. Normally the table should cross the
vacuum_table_age barrier earlier and promote a normal vacuum to a
full-table vacuum which will set relfrozenxid to a new and lower value
and thus prevent anti-wraparound vacuums from occurring.
So prioritizing anti-wraparound vacuums immediately and heavily doesn't
seem to be too bad.
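
(For reference, the settings involved and their defaults at the time - the
"vacuum_table_age barrier" above refers to vacuum_freeze_table_age:)

vacuum_freeze_min_age = 50000000
vacuum_freeze_table_age = 150000000
autovacuum_freeze_max_age = 200000000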

 - More generally, it seems to me that we ought to be trying to think
 about the units in which these various quantities are measured.  Each
 term ought to be unit-less.  So perhaps the first term ought to divide
 dead tuples by total tuples, which has the nice property that the
 result is a dimensionless quantity that never exceeds 1.0.  Then the
 second term can be scaled somehow based on that value.

I think we also need to be careful not to try to get too elaborate on
this end. Once the general code for prioritization is in, the exact
prioritization formula can easily be tweaked incrementally. Just about any
half-way sensible prioritization is better than what we have right now, and
we might discover new effects once we do marginally better.

IMO the browne_strength field should be called 'priority' and the
prioritization calculation formula should be moved into an extra
function.

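As a rough illustration only (not from any patch), the proposed ranking
could be previewed against the existing statistics views with a query of
roughly this shape:

-- dead tuples per page plus the exponential wraparound term discussed above
SELECT c.relname,
       s.n_dead_tup::float8 / greatest(c.relpages, 1)
         + exp(age(c.relfrozenxid) * ln(greatest(c.relpages, 2)) / 2^32)
         AS browne_strength
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relkind = 'r'
ORDER BY browne_strength DESC;
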
Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Andres Freund
On 2013-02-01 19:24:02 -0500, Tom Lane wrote:
  * There are also replication-related effects: a walsender
  * process can set its xmin based on transactions that are no longer running
  * in the master but are still being replayed on the standby, thus possibly
  * making the GetOldestXmin reading go backwards.  In this case there is a
  * possibility that we lose data that the standby would like to have, but
  * there is little we can do about that --- data is only protected if the
  * walsender runs continuously while queries are executed on the standby.
  * (The Hot Standby code deals with such cases by failing standby queries
  * that needed to access already-removed data, so there's no integrity bug.)
 
 This is just bogus.  Why don't we make it a requirement on walsenders
 that they never move their advertised xmin backwards (or initially set
 it to less than the prevailing global xmin)?  There's no real benefit to
 allowing them to try to move the global xmin backwards, because any data
 that they might hope to protect that way could be gone already.

The problem imo has multiple sides.

a) We currently don't have a way to assign the minimal valid xmin to the
current backend. I already need that for logical replication, so we
might just as well add it now:
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=commitdiff;h=84f6ed9550b65b0584589977877e9a997daeb61d

With that it's just a matter of taking ProcArrayLock exclusively before
calling GetOldestXmin and assigning it to MyPgXact->xmin before
releasing it.

b) We don't assign the xmin early enough, we only set it when the first
feedback message arrives, but we should set it when walsender starts
streaming.

c) After a disconnect the feedback message will rather likely ask for an
xmin horizon thats not valid anymore on the primary. If the disconnect
was short enough often enough that doesn't matter because nothing has
been cleared out, but it doesn't really work all that well.
Thats still better than setting it to the currently valid minimal xmin
horizon because it prevents cleanup from that moment on.
I don't see how this can be significantly improved without persistent
knowledge about standbys.

  * The return value is also adjusted with vacuum_defer_cleanup_age, so
  * increasing that setting on the fly is another easy way to make
  * GetOldestXmin() move backwards, with no consequences for data integrity.
 
 And as for that, it's been pretty clear for awhile that allowing
 vacuum_defer_cleanup_age to change on the fly was a bad idea we'd
 eventually have to undo.  The day of reckoning has arrived: it needs
 to be PGC_POSTMASTER.

Not sure if that would really solve anything; similar problems seem to
exist even if the postmaster has been restarted in between.


Btw, I wonder if that comment in GetOldestXmin still has good
justification, now that we allow cascading standbys:
 Also note that we intentionally don't apply vacuum_defer_cleanup_age
 on standby servers.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal - assign result of query to psql variable

2013-02-02 Thread Pavel Stehule
2013/2/2 Shigeru Hanada shigeru.han...@gmail.com:
 On Sat, Feb 2, 2013 at 7:30 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 possible variants

 a) don't store NULL  values - and remove existing variable when NULL
 be assigned - it is probably best, but should be confusing for users
 b) implement flag IS NULL - for variables
 c) use nullPrint
 d) use empty String

 +1 for a).  If users want to determine whether the result was NULL, or
 want to use substitute string for NULL result, they can use coalesce
 in SELECT clause.  Anyway the feature should be documented clearly.


yes, this has one other advantage - it doesn't block possibly enhancing
variables with NULL support in the future. And another - it doesn't
depend on psql settings

Regards

Pavel

 --
 Shigeru HANADA




Re: [HACKERS] COPY FREEZE has no warning

2013-02-02 Thread Noah Misch
On Fri, Feb 01, 2013 at 12:57:18PM -0500, Bruce Momjian wrote:
 On Tue, Jan 29, 2013 at 08:34:24PM -0500, Noah Misch wrote:
  On Fri, Jan 25, 2013 at 11:28:58PM -0500, Bruce Momjian wrote:
 BEGIN;
 TRUNCATE vistest;
 SAVEPOINT s1;
 COPY vistest FROM stdin CSV FREEZE;
 ERROR:  cannot perform FREEZE because of previous table activity in the 
   current transaction
 COMMIT;
   
   Clearly it was truncated in the same transaction, but the savepoint
   somehow invalidates the freeze.  There is a C comment about it:
  
  The savepoint prevents the COPY FREEZE, because COPY FREEZE needs the table 
  to
  have been created or truncated in the current *sub*transaction.  Issuing
  RELEASE s1 before the COPY makes it work again, for example.
  
   
* BEGIN;
* TRUNCATE t;
* SAVEPOINT save;
* TRUNCATE t;
* ROLLBACK TO save;
* COPY ...
  
  This is different.  The table was truncated in the current subtransaction, 
  and
  it's safe in principle to apply the optimization.  Due to an implementation
  artifact, we'll reject it anyway.
 
 OK, so, should we change the error message:
 
   cannot perform FREEZE because of transaction activity after table
   creation or truncation
 
 to
 
   cannot perform FREEZE because the table was not created or
   truncated in the current subtransaction
 
 or do we need to keep the transaction activity weasel wording because
 of the second case you listed above?  I am suspecting the later.

Let's touch on the exception in passing by using the phrase last truncated,
giving this wording for both the second and the third COPY FREEZE error sites:

cannot perform FREEZE because the table was not created or last
truncated in the current subtransaction
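
For reference, the workaround mentioned upthread - releasing the savepoint
before the COPY, so that the current subtransaction is again the one that
truncated the table - looks like this:

BEGIN;
TRUNCATE vistest;
SAVEPOINT s1;
RELEASE s1;
COPY vistest FROM stdin CSV FREEZE;   -- now accepted
COMMIT;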




Re: [HACKERS] COPY FREEZE has no warning

2013-02-02 Thread Bruce Momjian
On Sat, Feb  2, 2013 at 09:51:13AM -0500, Noah Misch wrote:
  OK, so, should we change the error message:
  
  cannot perform FREEZE because of transaction activity after table
  creation or truncation
  
  to
  
  cannot perform FREEZE because the table was not created or
  truncated in the current subtransaction
  
  or do we need to keep the transaction activity weasel wording because
  of the second case you listed above?  I am suspecting the later.
 
 Let's touch on the exception in passing by using the phrase last truncated,
 giving this wording for both the second and the third COPY FREEZE error sites:
 
   cannot perform FREEZE because the table was not created or last
   truncated in the current subtransaction

Well, so you are saying that there really isn't any user-visible logic
for those messages to be different, i.e. that the transaction id can be
set to invalid even if we created/truncated in the same transaction, but
not the same subtransaction?  

The comparisons that trigger the two messages are:

if (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
cstate->rel->rd_newRelfilenodeSubid != InvalidSubTransactionId)

and
if (cstate->rel->rd_createSubid != GetCurrentSubTransactionId() ||
cstate->rel->rd_newRelfilenodeSubid != GetCurrentSubTransactionId())

The first comparison is creation, the second, truncation.

Please confirm and I will make the change, or you can.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)

2013-02-02 Thread Vlad Bailescu
On Fri, Feb 1, 2013 at 5:54 PM, Pavan Deolasee pavan.deola...@gmail.com wrote:

 There is another problem that I noticed while looking at this case.
 The analyze took close to 500sec on a fairly good hardware (40GB RAM,
 10K rpm disks on RAID10) because many large child tables were scanned
 at once.


Just a small correction to get a good benchmark:

The ~500 sec analyze time was on a test VM running on an i5 2.4 GHz with 2
dedicated cores, 4 GB of RAM, and storage on a notebook HDD. The partitioned
data was about 80GB.

On our production environment (described by Pavan) it took ~90 seconds for
~55GB of data in 8 child/partition tables (we stopped the import of
partitioned data when we discovered this issue - we COPYed and TRUNCATEd
partitions to speed up the upgrade procedure from 8.4 to 9.1 via
pg_dump/pg_restore).

Vlad


Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Robert Haas
On Fri, Feb 1, 2013 at 7:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Having said that, I agree that a fix in GetOldestXmin() would be nice
 if we could find one, but since the comment describes at least three
 different ways the value can move backwards, I'm not sure that there's
 really a practical solution there, especially if you want something we
 can back-patch.

 Actually, wait a second.  As you say, the comment describes three known
 ways to make it go backwards.  It strikes me that all three are fixable:

  * if allDbs is FALSE and there are no transactions running in the current
  * database, GetOldestXmin() returns latestCompletedXid. If a transaction
  * begins after that, its xmin will include in-progress transactions in other
  * databases that started earlier, so another call will return a lower value.

 The reason this is a problem is that GetOldestXmin ignores XIDs of
 processes that are connected to other DBs.  It now seems to me that this
 is a flat-out bug.  It can ignore their xmins, but it should include
 their XIDs, because the point of considering those XIDs is that they may
 contribute to the xmins of snapshots computed in the future by processes
 in our own DB.  And snapshots never exclude any XIDs on the basis of
 which DB they're in.  (They can't really, since we can't know when the
 snap is taken whether it might be used to examine shared catalogs.)

This idea is superficially appealing, but I can't get excited about
it, because it will cause GetOldestXmin() to compute older (that is,
more conservative) XIDs than it does today, which means we'll vacuum
away less.  Considering the amount of pain we have around vacuum
today, that's surely got to be going the wrong direction.  I'm
inclined to think that the root of the problem is that the computation
is too conservative already.  I mean, what do we really need to
include in the computation in order not to vacuum away any data that's
still needed?  There are basically two concerns:

- You can't remove data that any snapshot in the current database
might try to read.  So you have to include the xmins of those
snapshots in your oldest-xmin value.  But, not really, because those
xmins were computed by looking at all running XIDs across the whole
database, and if their oldest xmin value came from some XID in another
database, it's not relevant.  What you want (but of course, can't get)
is the db-specific xmin horizon for each other snapshot.

- You have to worry about new snapshots that are about to appear but
haven't yet.  But, really, how big of a concern is this?  No
transactions can exit the running set while we hold ProcArrayLock, so
it seems to me that there ought to be a pretty tight limit on what can
show up here.  For example, if we've already taken a snapshot, surely
a subsequently taken snapshot can't really need an older xmin horizon
than the one we computed for our own snapshot.

In theory, it seems like we might be able to solve these problems by
having each backend advertise TWO xmin horizons: one representing the
oldest xmin it might be able to see in a system catalog, and the other
representing the oldest xmin it might be able to see in its own
database.  This would allow significantly more precise tracking of
what is really needed in the current database, but there's an obvious
implementation concern, which is that GetSnapshotData is hot enough
that individual instructions matter.

  * There are also replication-related effects: a walsender
  * process can set its xmin based on transactions that are no longer running
  * in the master but are still being replayed on the standby, thus possibly
  * making the GetOldestXmin reading go backwards.  In this case there is a
  * possibility that we lose data that the standby would like to have, but
  * there is little we can do about that --- data is only protected if the
  * walsender runs continuously while queries are executed on the standby.
  * (The Hot Standby code deals with such cases by failing standby queries
  * that needed to access already-removed data, so there's no integrity bug.)

 This is just bogus.  Why don't we make it a requirement on walsenders
 that they never move their advertised xmin backwards (or initially set
 it to less than the prevailing global xmin)?  There's no real benefit to
 allowing them to try to move the global xmin backwards, because any data
 that they might hope to protect that way could be gone already.

I think there is already some code that aims to do this, but I might
be misremembering.  IOW, this comment might already be incorrect.

  * The return value is also adjusted with vacuum_defer_cleanup_age, so
  * increasing that setting on the fly is another easy way to make
  * GetOldestXmin() move backwards, with no consequences for data integrity.

 And as for that, it's been pretty clear for awhile that allowing
 vacuum_defer_cleanup_age to change on the fly was a bad idea we'd
 eventually 

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Robert Haas
On Fri, Feb 1, 2013 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That is admittedly kind of weird behavior, but I think one could
 equally blame this on CLUSTER.  This is hardly the first time we've
 had to patch CLUSTER's handling of TOAST tables (cf commits
 21b446dd0927f8f2a187d9461a0d3f11db836f77,
 7b0d0e9356963d5c3e4d329a917f5fbb82a2ef05,
 83b7584944b3a9df064cccac06822093f1a83793) and it doesn't seem unlikely
 that we might go the full ten rounds.

 Yeah, but I'm not sure whether CLUSTER is the appropriate blamee or
 whether it's more like the canary in the coal mine, first to expose
 problematic behaviors elsewhere.  The general problem here is really
 that we're cleaning out toast tuples while the referencing main-heap
 tuple still physically exists.  How safe do you think that is?  That
 did not ever happen before we decoupled autovacuuming of main and toast
 tables, either --- so a good case could be made that that idea is
 fundamentally broken.

I confess I'm not entirely sanguine about that.  A similar issue
arises with index tuples referencing heap tuples, and we go to great
lengths to make sure that the last vestiges of the heap tuple can't be
removed until the index tuples are well and truly dead.  If we were to
adopt the same approach here, it would mean that, instead of
consulting XIDs at all, we'd remove TOAST pointers only when there
were provably no main-table tuples pointing to them.  I'm not really
proposing that, but it's an interesting thought-experiment.  What
we're actually doing is relying on values to match exactly that were
never intended as more than an approximation.

Can you point me to the commit that decoupled this?  What was the
motivation for it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-02-02 Thread Robert Haas
On Fri, Feb 1, 2013 at 12:04 AM, Amit Kapila amit.kap...@huawei.com wrote:
 I think user should be aware of effect before using SET commands, as these 
 are used at various levels (TRANSACTION, SESSION, ...).

Ideally, sure.  But these kinds of mistakes are easy to make.  That's
why LOCK and DECLARE CURSOR already emit errors in this case - why
should this one be any different?
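
For illustration - outside a transaction block the SET silently has no
lasting effect (it only applies to the implicit single-statement
transaction), whereas LOCK already errors out:

-- no BEGIN issued:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- currently accepted, does nothing lasting
LOCK TABLE pg_class;
-- ERROR:  LOCK TABLE can only be used in transaction blocks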

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Robert Haas
On Fri, Feb 1, 2013 at 6:09 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 As an aside, it does seem like log_autovacuum_min_duration=0 should
 log whether a scan_all was done, and if so what relfrozenxid got set
 to.

That would be nifty.

 [1] I don't know why it is that a scan_all vacuum with a
 freeze_min_age of 50m (or a freezeLimit of 50 million ago) will not
 set relfrozenxid to a higher value than that if it discovers that it
 can, but it doesn't seem to.

That also seems very much worth fixing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Robert Haas
On Sat, Feb 2, 2013 at 8:41 AM, Andres Freund and...@2ndquadrant.com wrote:
 - It's probably important to have a formula where we can be sure that
 the wrap-around term will eventually dominate the dead-tuple term,
 with enough time to spare to make sure nothing really bad happens; on
 the other hand, it's also desirable to avoid the case where a table
 that has just crossed the threshold for wraparound vacuuming doesn't
 immediately shoot to the top of the list even if it isn't truly
 urgent.  It's unclear to me just from looking at this formula how well
 the second term meets those goals.

 I just wanted to mention that if everything goes well, we won't *ever*
 get to an anti-wraparound-vacuum. Normally the table should cross the
 vacuum_table_age barrier earlier and promote a normal vacuum to a
 full-table vacuum which will set relfrozenxid to a new and lower value
 and thus prevent anti-wraparound vacuums from occurring.
 So prioritizing anti-wraparound vacuums immediately and heavily doesn't
 seem to be too bad.

IMHO, this is hopelessly optimistic.  Yes, it's intended to work that
way.  But INSERT-only or INSERT-mostly tables are far from an uncommon
use case; and in fact they're probably the most common cause of pain
in this area.  You insert a gajillion tuples, and vacuum never kicks
off, and then eventually you either update some tuples or hit
autovacuum_freeze_max_age and suddenly, BAM, you get this gigantic
vacuum that rewrites the entire table.  And then you open a support
ticket with your preferred PostgreSQL support provider and say
something like "WTF?".

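As a rough illustration of how close tables are getting to that forced
full-table vacuum (plain catalog query, not part of any patch):

SELECT relname,
       age(relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
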
 - More generally, it seems to me that we ought to be trying to think
 about the units in which these various quantities are measured.  Each
 term ought to be unit-less.  So perhaps the first term ought to divide
 dead tuples by total tuples, which has the nice property that the
 result is a dimensionless quantity that never exceeds 1.0.  Then the
 second term can be scaled somehow based on that value.

 I think we also need to be careful not to try to get too elaborate on
 this end. Once the general code for prioritization is in, the exact
 prioritization formula can easily be tweaked incrementally. Just about any
 half-way sensible prioritization is better than what we have right now, and
 we might discover new effects once we do marginally better.

I agree.  It would be nice to have some way of measuring the positive
or negative impact of what we introduce, too, but I don't have a good
idea what that would be.

 IMO the browne_strength field should be called 'priority' and the
 prioritization calculation formula should be moved into an extra
 function.

Yeah, or maybe vacuum_priority, since that would be easier to grep for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] COPY FREEZE has no warning

2013-02-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Well, so you are saying that there really isn't any user-visible logic
 for those messages to be different,

No, and in fact the whole block of code is badly written because it
conflates two unrelated tests.  I guess somebody was trying to save
a couple of nanoseconds by not calling GetCurrentSubTransactionId
if a previous test had failed, but really why should we care about
that number of cycles in COPY preliminaries?  The code ought to be
more like this:

/* comment about skipping FSM or WAL here */
if (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
    cstate->rel->rd_newRelfilenodeSubid != InvalidSubTransactionId)
{
    hi_options |= HEAP_INSERT_SKIP_FSM;
    if (!XLogIsNeeded())
        hi_options |= HEAP_INSERT_SKIP_WAL;
}
/* comment about when we can perform FREEZE here */
if (cstate->freeze)
{
    if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
        ereport(ERROR,
                (ERRCODE_INVALID_TRANSACTION_STATE,
                 errmsg("cannot perform FREEZE because of prior transaction activity")));

    if (cstate->rel->rd_createSubid != GetCurrentSubTransactionId() &&
        cstate->rel->rd_newRelfilenodeSubid != GetCurrentSubTransactionId())
        ereport(ERROR,
                (ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE,
                 errmsg("cannot perform FREEZE because the table was not "
                        "created or truncated in the current subtransaction")));
    hi_options |= HEAP_INSERT_FROZEN;
}


regards, tom lane




Re: [HACKERS] proposal - assign result of query to psql variable

2013-02-02 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 On Sat, Feb 2, 2013 at 7:30 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 possible variants
 
 a) don't store NULL  values - and remove existing variable when NULL
 be assigned - it is probably best, but should be confusing for users
 b) implement flag IS NULL - for variables
 c) use nullPrint
 d) use empty String

 +1 for a).  If users want to determine whether the result was NULL, or
 want to use substitute string for NULL result, they can use coalesce
 in SELECT clause.  Anyway the feature should be documented clearly.

Yeah, I was considering that one too.  Let's do it that way.

regards, tom lane




Re: [HACKERS] COPY FREEZE has no warning

2013-02-02 Thread Bruce Momjian
On Sat, Feb  2, 2013 at 12:09:05PM -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Well, so you are saying that there really isn't any user-visible logic
  for those messages to be different,
 
 No, and in fact the whole block of code is badly written because it
 conflates two unrelated tests.  I guess somebody was trying to save
 a couple of nanoseconds by not calling GetCurrentSubTransactionId
 if a previous test had failed, but really why should we care about
 that number of cycles in COPY preliminaries?  The code ought to be
 more like this:
 
 /* comment about skipping FSM or WAL here */
 if (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
     cstate->rel->rd_newRelfilenodeSubid != InvalidSubTransactionId)
 {
     hi_options |= HEAP_INSERT_SKIP_FSM;
     if (!XLogIsNeeded())
         hi_options |= HEAP_INSERT_SKIP_WAL;
 }
 /* comment about when we can perform FREEZE here */
 if (cstate->freeze)
 {
     if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
         ereport(ERROR,
                 (ERRCODE_INVALID_TRANSACTION_STATE,
                  errmsg("cannot perform FREEZE because of prior transaction activity")));

     if (cstate->rel->rd_createSubid != GetCurrentSubTransactionId() &&
         cstate->rel->rd_newRelfilenodeSubid != GetCurrentSubTransactionId())
         ereport(ERROR,
                 (ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE,
                  errmsg("cannot perform FREEZE because the table was not "
                         "created or truncated in the current subtransaction")));
     hi_options |= HEAP_INSERT_FROZEN;
 }

Yes, I found the blocking odd too --- the test for
InvalidSubTransactionId is used by hi_options, and for freeze checking. 
I assumed != InvalidSubTransactionId and !=
GetCurrentSubTransactionId() had different meanings for freeze
checking.

I compounded the problem because originally there was no FREEZE failure
so no action was taken if != InvalidSubTransactionId.

Applied patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 49cc8dd..523c1e0
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** CopyFrom(CopyState cstate)
*** 1996,2031 
  		hi_options |= HEAP_INSERT_SKIP_FSM;
  		if (!XLogIsNeeded())
  			hi_options |= HEAP_INSERT_SKIP_WAL;
  
! 		/*
! 		 * Optimize if new relfilenode was created in this subxact or
! 		 * one of its committed children and we won't see those rows later
! 		 * as part of an earlier scan or command. This ensures that if this
! 		 * subtransaction aborts then the frozen rows won't be visible
! 		 * after xact cleanup. Note that the stronger test of exactly
! 		 * which subtransaction created it is crucial for correctness
! 		 * of this optimisation.
! 		 */
! 		if (cstate->freeze)
! 		{
! 			if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
! 				ereport(ERROR,
! 						(ERRCODE_INVALID_TRANSACTION_STATE,
! 						errmsg("cannot perform FREEZE because of prior transaction activity")));
  
! 			if (cstate->rel->rd_createSubid == GetCurrentSubTransactionId() ||
! 				cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId())
! 				hi_options |= HEAP_INSERT_FROZEN;
! 			else
! 				ereport(ERROR,
! 						(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE,
! 						errmsg("cannot perform FREEZE because of transaction activity after table creation or truncation")));
! 		}
  	}
- 	else if (cstate->freeze)
- 		ereport(ERROR,
- 				(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE,
- 				 errmsg("cannot perform FREEZE because the table was not created or truncated in the current transaction")));
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
--- 1996,2027 
  		hi_options |= HEAP_INSERT_SKIP_FSM;
  		if (!XLogIsNeeded())
  			hi_options |= HEAP_INSERT_SKIP_WAL;
+ 	}
  
! 	/*
! 	 * Optimize if new relfilenode was created in this subxact or
! 	 * one of its committed children and we won't see those rows later
! 	 * as part of an earlier scan or command. This ensures that if this
! 	 * subtransaction aborts then the frozen rows won't be visible
! 	 * after xact cleanup. Note that the stronger test of exactly
! 	 * which subtransaction created it is crucial for correctness
! 	 * of this optimisation.
! 	 */
! 	if (cstate->freeze)
! 	{
! 		if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
! 			ereport(ERROR,
! 					(ERRCODE_INVALID_TRANSACTION_STATE,
! 					errmsg("cannot perform FREEZE because of prior transaction activity")));
  
! 		if (cstate->rel->rd_createSubid != GetCurrentSubTransactionId() &&
! 			cstate->rel->rd_newRelfilenodeSubid != GetCurrentSubTransactionId())
! 			ereport(ERROR,
! 	

Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 14:24, Andres Freund and...@2ndquadrant.com wrote:

 b) We don't assign the xmin early enough, we only set it when the first
 feedback message arrives, but we should set it when walsender starts
 streaming.

That's easy to fix.

 c) After a disconnect the feedback message will rather likely ask for an
 xmin horizon thats not valid anymore on the primary. If the disconnect
 was short enough often enough that doesn't matter because nothing has
 been cleared out, but it doesn't really work all that well.
 Thats still better than setting it to the currently valid minimal xmin
 horizon because it prevents cleanup from that moment on.
 I don't see how this can be significantly improved without persistent
 knowledge about standbys.

We could delay startup of the standby until the xmin on the standby
reaches the xmin on the master.

So when the standby has hot_standby_feedback = on,  at standby
connection we set the xmin of the walsender to be the current value on
the master, then we disallow connections on standby until we have
replayed up to that xmin on the standby. That way the xmin of the
walsender never goes backwards nor do we get cancelations on the
standby.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 00:24, Tom Lane t...@sss.pgh.pa.us wrote:

  * The return value is also adjusted with vacuum_defer_cleanup_age, so
  * increasing that setting on the fly is another easy way to make
  * GetOldestXmin() move backwards, with no consequences for data integrity.

 And as for that, it's been pretty clear for awhile that allowing
 vacuum_defer_cleanup_age to change on the fly was a bad idea we'd
 eventually have to undo.  The day of reckoning has arrived: it needs
 to be PGC_POSTMASTER.

Agreed, will fix.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 00:24, Tom Lane t...@sss.pgh.pa.us wrote:

  * if allDbs is FALSE and there are no transactions running in the current
  * database, GetOldestXmin() returns latestCompletedXid. If a transaction
  * begins after that, its xmin will include in-progress transactions in other
  * databases that started earlier, so another call will return a lower value.

 The reason this is a problem is that GetOldestXmin ignores XIDs of
 processes that are connected to other DBs.  It now seems to me that this
 is a flat-out bug.  It can ignore their xmins, but it should include
 their XIDs, because the point of considering those XIDs is that they may
 contribute to the xmins of snapshots computed in the future by processes
 in our own DB.  And snapshots never exclude any XIDs on the basis of
 which DB they're in.  (They can't really, since we can't know when the
 snap is taken whether it might be used to examine shared catalogs.)

Agree thats a bug.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Andres Freund
On 2013-02-02 18:32:44 +, Simon Riggs wrote:
 On 2 February 2013 14:24, Andres Freund and...@2ndquadrant.com wrote:
 
  b) We don't assign the xmin early enough, we only set it when the first
  feedback message arrives, but we should set it when walsender starts
  streaming.
 
 That's easy to fix.

Not trivial, but not too hard, yes. When the standby initially connects
we don't yet know which xid will be required because consistency hasn't
yet been achieved.

  c) After a disconnect the feedback message will rather likely ask for an
  xmin horizon thats not valid anymore on the primary. If the disconnect
  was short enough often enough that doesn't matter because nothing has
  been cleared out, but it doesn't really work all that well.
  Thats still better than setting it to the currently valid minimal xmin
  horizon because it prevents cleanup from that moment on.
  I don't see how this can be significantly improved without persistent
  knowledge about standbys.
 
 We could delay startup of the standby until the xmin on the standby
 reaches the xmin on the master.
 
 So when the standby has hot_standby_feedback = on,  at standby
 connection we set the xmin of the walsender to be the current value on
 the master, then we disallow connections on standby until we have
 replayed up to that xmin on the standby. That way the xmin of the
 walsender never goes backwards nor do we get cancelations on the
 standby.

That's easy enough when the standby is initially started, but doesn't work
that well if just the connection between both failed (or the master was
restarted) and a reconnect worked. To make it work similarly in that
case we would have to throw everyone out which would kind of counteract
the whole idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cascading replication: should we detect/prevent cycles?

2013-02-02 Thread Robert Haas
On Thu, Jan 31, 2013 at 9:48 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/01/2013 12:01 PM, Josh Berkus wrote:
 If we're going to start installing safeguards against doing stupid
 things, there's a long list of scenarios that happen far more
 regularly than this ever will and cause far more damage.

 What's wrong with making it easier for sysadmins to troubleshoot things?
  Again, I'm not talking about erroring out, I'm talking about logging a
 warning.

 Or to put it another way:  Robert, you just did a "nobody wants that" to
 me.  I thought you were opposed to such things on this list.

I respectfully disagree.  I'm saying that *I* don't want that, which I
think is different.  To interpret my opposition against saying "nobody
wants that" to mean you can never oppose anything someone else thinks
is a good idea would preclude meaningful dialogue on most of what we
talk about here.  And clearly there is at least some demand for this
feature, because you and Craig Ringer both want it.  So let me try to
restate my objection to this specific feature more clearly.

I think that we should be careful about warning the user about things
that might not actually be mistakes.  I'm not aware that we currently
issue ANY warnings of that type.  When we emit error messages, we
sometimes suggest one possible cause of the error, and such messages
are clearly labelled as HINT.  But we don't, for example, emit an
error or a WARNING or ERROR about a DELETE or UPDATE statement that
lacks a WHERE clause, even though many people might like to have such
a feature.  We don't warn a user "hey, float8 is imprecise, consider
using numeric" or "hey, numeric is slow, consider using float8" or
"setting autovacuum_naptime to an hour is probably dumber than pouring
sugar in your gas tank", even though all of those things are true and
some people might like to be warned.  We only warn or error out when
something happens that we are 100% sure is bad.  And, in this
particular case, it has been suggested that there are legitimate
reasons why a replication topology might temporarily involve loops, so
I believe this fails that criterion.

Second, we have often discussed the importance of avoiding log spam.
Warnings that are likely to be repeated a large number of times when
they occur have repeatedly been voted down on those grounds.  I
believe that objection also applies to this case.  It is more
appropriate to make information about the status of the system
available via some status-inquiry function; for example, if you were
to recast this as adding a slave-side function that attempts to return
the IP of the current master, or NULL if no master, that would answer
this objection (but not necessarily all of the other ones).
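
A sketch of the kind of status-inquiry interface described above - the
second function name is purely hypothetical, nothing like it exists today:

-- on a standby:
SELECT pg_is_in_recovery();            -- true when running as a standby
SELECT pg_current_master_address();    -- hypothetical: master's address, or NULL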

Third, we usually apply a criterion that warnings or errors must
represent conditions that we can reliably detect; in other words, we
typically do not add checks for situations that we will only sometimes
be able to identify.  And, in this case, it's a little unclear how we
would actually identify loops.  Presumably, we'd do it by sending a
chain of unique per-node identifiers along with the WAL, and looking
for your own identifier in the path, but we don't have any sort of
unique per-node identifier right now, and how would you create one?
If someone shuts down the cluster, duplicates it, and starts up both
copies, we want that to work.  Any identifier embedded in the cluster
by such a process would be duplicated.  You could use something like
the node IP and port number, which wouldn't have that pitfall, but as
we all know, IPs can be duplicated (e.g. due to NAT) so this isn't
necessarily reliable either.  If you do come up with a suitable unique
per-node identifier, then this is fairly simple to make work for
streaming replication, but it's tricky to see how to make it work with
archiving.

Is that more clear?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Andres Freund
On 2013-02-02 11:25:01 -0500, Robert Haas wrote:
 On Sat, Feb 2, 2013 at 8:41 AM, Andres Freund and...@2ndquadrant.com wrote:
  - It's probably important to have a formula where we can be sure that
  the wrap-around term will eventually dominate the dead-tuple term,
  with enough time to spare to make sure nothing really bad happens; on
  the other hand, it's also desirable to avoid the case where a table
  that has just crossed the threshold for wraparound vacuuming doesn't
  immediately shoot to the top of the list even if it isn't truly
  urgent.  It's unclear to me just from looking at this formula how well
  the second term meets those goals.
 
  I just wanted to mention that if everything goes well, we won't *ever*
  get to an anti-wraparound-vacuum. Normally the table should cross the
  vacuum_table_age barrier earlier and promote a normal vacuum to a
  full-table vacuum which will set relfrozenxid to a new and lower value
  and thus prevent anti-wraparound vacuums from occurring.
  So prioritizing anti-wraparound vacuums immediately and heavily doesn't
  seem to be too bad.
 
 IMHO, this is hopelessly optimistic.  Yes, it's intended to work that
 way.  But INSERT-only or INSERT-mostly tables are far from an uncommon
 use case; and in fact they're probably the most common cause of pain
 in this area.  You insert a gajillion tuples, and vacuum never kicks
 off, and then eventually you either update some tuples or hit
 autovacuum_freeze_max_age and suddenly, BAM, you get this gigantic
 vacuum that rewrites the entire table.  And then you open a support
 ticket with your preferred PostgreSQL support provider and say
 something like WTF?.

You're right, this doesn't work superbly well, especially for
insert-only tables... But IMO the place to fix it is not the
prioritization logic but relation_needs_vacanalyze, since fixing it in
prioritization won't prevent the BAM, just change its timing.

I think scheduling a table for a partial vacuum every min_freeze * 2
xids, even if it's insert-only, would go a long way toward reducing the
impact of full-table vacuums. Obviously that would require retaining the
last xid a vacuum was executed in...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] COPY FREEZE has no warning

2013-02-02 Thread Noah Misch
On Sat, Feb 02, 2013 at 10:12:54AM -0500, Bruce Momjian wrote:
 On Sat, Feb  2, 2013 at 09:51:13AM -0500, Noah Misch wrote:
  Let's touch on the exception in passing by using the phrase last 
  truncated,
  giving this wording for both the second and the third COPY FREEZE error 
  sites:
  
  cannot perform FREEZE because the table was not created or last
  truncated in the current subtransaction
 
 Well, so you are saying that there really isn't any user-visible logic
 for those messages to be different, i.e. that the transaction id can be
 set to invalid even if we created/truncated in the same transaction, but
 not the same subtransaction?  

Right.  The latest committed code makes sense to me.
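
(To make the wording concrete, here is a small illustrative sketch of when
the FREEZE option is accepted and when this error is raised; the table name
and file path below are made up.)

BEGIN;
CREATE TABLE freeze_demo (id int);                       -- created in this subtransaction
COPY freeze_demo FROM '/tmp/demo.csv' WITH (FREEZE);     -- FREEZE is accepted
COMMIT;

BEGIN;
TRUNCATE freeze_demo;                                    -- last truncated in this subtransaction
COPY freeze_demo FROM '/tmp/demo.csv' WITH (FREEZE);     -- FREEZE is accepted
COMMIT;

BEGIN;
COPY freeze_demo FROM '/tmp/demo.csv' WITH (FREEZE);     -- rejected with the error quoted
                                                         -- above: the table was neither created
                                                         -- nor truncated in this subtransaction
ROLLBACK;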




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Robert Haas
On Sat, Feb 2, 2013 at 1:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 You're right, this doesn't work superbly well, especially for
 insert-only tables... But IMO the place to fix it is not the
 prioritization logic but relation_needs_vacanalyze, since fixing it in
 the prioritization won't prevent the BAM, only change its timing.

Agreed.

 I think scheduling a table for a partial vacuum every min_freeze * 2
 xids, even if it's insert-only, would go a long way toward reducing the
 impact of full-table vacuums. Obviously that would require retaining the
 last xid a vacuum was executed in...

I'm not sure that min_freeze * 2 is the right value, but otherwise agreed.

I keep coming back to the idea that vacuum should have a high-priority
queue and a low-priority queue.  When stuff meets the current
thresholds, it goes into the high-priority queue.  But then there
should be a low-priority queue where we do partial vacuums of things
that meet some lower threshold - like the unfrozen portions of
insert-only tables.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GetOldestXmin going backwards is dangerous after all

2013-02-02 Thread Simon Riggs
On 2 February 2013 18:38, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-02-02 18:32:44 +, Simon Riggs wrote:
 On 2 February 2013 14:24, Andres Freund and...@2ndquadrant.com wrote:

  b) We don't assign the xmin early enough, we only set it when the first
  feedback message arrives, but we should set it when walsender starts
  streaming.

 That's easy to fix.

 Not trivial, but not too hard, yes. When the standby initially connects
 we don't yet know which xid will be required because consistency hasn't
 yet been achieved.

We don't need to change the protocol, we just need to issue a
hot_standby_feedback message immediately after connection.

That looks trivial to me.

  c) After a disconnect the feedback message will rather likely ask for an
  xmin horizon that's not valid anymore on the primary. If the disconnect
  was short enough, that often doesn't matter because nothing has
  been cleared out, but it doesn't really work all that well.
  That's still better than setting it to the currently valid minimal xmin
  horizon because it prevents cleanup from that moment on.
  I don't see how this can be significantly improved without persistent
  knowledge about standbys.

 We could delay startup of the standby until the xmin on the standby
 reaches the xmin on the master.

 So when the standby has hot_standby_feedback = on,  at standby
 connection we set the xmin of the walsender to be the current value on
 the master, then we disallow connections on standby until we have
 replayed up to that xmin on the standby. That way the xmin of the
 walsender never goes backwards nor do we get cancelations on the
 standby.

 That's easy enough when the standby is initially started but doesn't work
 that well if just the connection between both failed (or the master was
 restarted) and a reconnect worked. To make it work similarly in that
 case we would have to throw everyone out which would kind of counteract
 the whole idea.

Agreed. It was worth considering, at least.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal - assign result of query to psql variable

2013-02-02 Thread Pavel Stehule
Hello

2013/2/2 Tom Lane t...@sss.pgh.pa.us:
 Shigeru Hanada shigeru.han...@gmail.com writes:
 On Sat, Feb 2, 2013 at 7:30 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 possible variants

 a) don't store NULL values - and remove the existing variable when NULL
 is assigned - it is probably best, but could be confusing for users
 b) implement an IS NULL flag for variables
 c) use nullPrint
 d) use an empty string

 +1 for a).  If users want to determine whether the result was NULL, or
 want to use a substitute string for a NULL result, they can use coalesce
 in the SELECT clause.  In any case, the feature should be documented clearly.

 Yeah, I was considering that one too.  Let's do it that way.

updated version

Regards

Pavel


 regards, tom lane


gset_20130202.patch
Description: Binary data



Re: [HACKERS] PL/Python result object str handler

2013-02-02 Thread Steve Singer

On 13-01-07 09:58 PM, Peter Eisentraut wrote:

By implementing a str handler for the result object, it now prints
something like

PLyResult status=5 nrows=2 rows=[{'foo': 1, 'bar': '11'}, {'foo': 2, 'bar': 
'22'}]

Patch attached for review.



Here is a review:

This patch adds a str handler that PL/Python code can use to convert
a query result object into a string suitable for debugging purposes. The use
case for this feature is primarily debugging and logging.
I feel that this is useful since a lot of debugging of stored functions
is usually done with print/elog style debugging.
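
As a purely illustrative sketch of that debugging style with the patch
applied (the function name, query, and use of plpythonu here are made up
for the example):

CREATE OR REPLACE FUNCTION debug_str_demo() RETURNS void
LANGUAGE plpythonu
AS $$
rv = plpy.execute("SELECT 1 AS foo, '11' AS bar")
# with the str handler from this patch, this emits the PLyResult summary
# shown in Peter's original message
plpy.notice(str(rv))
$$;

SELECT debug_str_demo();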


There was already some discussion on the thread as to whether the number of rows
printed should be limited; the consensus seemed to be 'no', since someone
would be unhappy with any limit, and printing everything is the same
behaviour you get with the standard Python print.


I've tested this with python2.6 and 3.1 and it seems to work as described.

I've looked through the code and everything looks fine.

The patch includes no documentation.  Adding a few lines to the
"Utility Functions" section of the plpython documentation so people know
about this feature would be good.


Other than that I think it is fine to commit.  I am setting this as
ready for committer; I assume you'll commit this yourself and that you
can add a paragraph to the docs as you commit it.



Steve


Re: [HACKERS] Setting visibility map in VACUUM's second phase

2013-02-02 Thread Jeff Janes
On Sat, Jan 26, 2013 at 11:25 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 On Thu, Jan 24, 2013 at 9:31 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Jan 24, 2013 at 1:28 AM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:


 Good idea. Even though the cost of pinning/unpinning may not be high
 with respect to the vacuum cost itself, it seems to be a good idea
 because we already do that in other places. Do you have any other
 review comments on the patch? Otherwise I'll fix this and send an updated
 patch soon.

 That was the only thing that stood out to me.

 The attached patch gets that improvement. Also rebased on the latest head.

Hi Pavan,

I get this warning:
vacuumlazy.c:890: warning: passing argument 6 of 'lazy_vacuum_page'
makes pointer from integer without a cast

and make check then fails.

I've added '&' to that line, and it now passes make check with --enable-cassert.



At line 1096, when you release the vmbuffer, you don't set it to
InvalidBuffer like the other places in the code do.  It seems like
this would lead to a crash or assertion failure, but it does not
seem to do so.

other places:
if (BufferIsValid(vmbuffer))
{
    ReleaseBuffer(vmbuffer);
    vmbuffer = InvalidBuffer;
}

Also, the "Note: If you change anything below, also look at" should
probably say "Note: If you change anything in the for loop below, also
look at".  Otherwise I'd be wondering how far below the caveat
applies.


I've attached a patch with these changes made.  Does this look OK?

Thanks,

Jeff


vacuum-secondphase-setvm-v4.patch
Description: Binary data



Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Andres Freund
On 2013-02-02 14:54:10 -0500, Robert Haas wrote:
 On Sat, Feb 2, 2013 at 1:49 PM, Andres Freund and...@2ndquadrant.com wrote:
  I think scheduling a table for a partial vacuum every min_freeze * 2
  xids, even if it's insert-only, would go a long way toward reducing the
  impact of full-table vacuums. Obviously that would require retaining the
  last xid a vacuum was executed in...

 I'm not sure that min_freeze * 2 is the right value, but otherwise agreed.

Yes, min_freeze * 2 was purely a very first quick guess. My reasoning
was basically that values smaller than that won't really do all that
much work, since barely any of the tuples are old enough, and values much
bigger won't help much in reducing huge amounts of writes being
submitted at the same time, and they might miss the window before the
next anti-freeze vacuum.
I guess something like 5 or so would theoretically be better, because it
would reduce the number of tuples that are too young for freezing which
will be marked all-visible nonetheless and thus not scanned again. But
it would not work with the current relevant default settings:

vacuum_freeze_table_age = 150,000,000
vacuum_freeze_min_age = 50,000,000
autovacuum_freeze_max_age = 200,000,000

I guess we will have to think about the defaults for those values. Ad hoc,
I am thinking of something like:

vacuum_freeze_table_age = 300,000,000
vacuum_freeze_min_age = 20,000,000
autovacuum_freeze_max_age = 800,000,000

and scheduling a vacuum independently of n_dead_tuples every
(freeze_table_age - freeze_max_age) / 5 or so xids. That would mean
that approx. 4/5 (or more on a busy system) of the tuples would get
frozen before a full-table vacuum.
I don't think the disk size argument for freeze_max_age = 200,000,000 is
particularly relevant anymore, especially as we would normally keep the
size at vacuum_freeze_table_age.

To finally fix the issue, ISTM that we need an 'age map' to know which
parts to scan again and which parts never need to be scanned again, but
that's a separate, not too small, feature.

 I keep coming back to the idea that vacuum should have a high-priority
 queue and a low-priority queue.  When stuff meets the current
 thresholds, it goes into the high-priority queue.  But then there
 should be a low-priority queue where we do partial vacuums of things
 that meet some lower threshold - like the unfrozen portions of
 insert-only tables.

I don't think that's the most crucial part at the moment. Such a queue doesn't
solve the problem that we don't want to do unnecessary, repetitive work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-02-02 Thread Andres Freund
On 2013-01-28 16:55:52 -0500, Steve Singer wrote:
 If your using non-surragate /natural primary keys this tends to come up
 occasionally due to data-entry errors or renames.  I'm  looking at this from
 the point of view of what do I need to use this as a source for a production
 replication system with fewer sharp-edges compared to trigger source slony.
 My standard is a bit higher than 'first' version because I intent to use it
 in the version 3.0 of slony not 1.0.  If others feel I'm asking for too much
 they should speak up, maybe I am. Also the way things will fail if someone
 were to try and update a primary key value is pretty nasty (it will leave
 them with inconsistent databases).We could  install UPDATE triggers to
 try and detect this type of thing but I'd rather see us just log the old
 values so we can use them during replay.

I pushed support for this. I am not yet 100% happy with it due to a few
issues:

* it increases the xlog size logged by heap_update by 2 bytes even with
  wal_level < logical, as it uses a variant of xl_heap_header that
  includes its length. Conditionally using xl_heap_header would make the
  code even harder to read. Is that acceptable?
* multi_insert should be converted to use xl_heap_header_len as well,
  instead of using xl_multi_insert_tuple; that would also reduce the
  amount of multi-insert specific code in decode.c
* both for update and delete we should denote more explicitly that
  ->oldtuple points to an index tuple, not to a full tuple

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal - assign result of query to psql variable

2013-02-02 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 2013/2/2 Tom Lane t...@sss.pgh.pa.us:
 Shigeru Hanada shigeru.han...@gmail.com writes:
  +1 for a).  If users want to determine whether the result was NULL, or
  want to use a substitute string for a NULL result, they can use coalesce
  in the SELECT clause.  In any case, the feature should be documented clearly.

 Yeah, I was considering that one too.  Let's do it that way.

 updated version

Applied with corrections.

regards, tom lane
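
(For the archive, a minimal psql sketch of the behavior that was settled on:
a NULL result column removes the target variable instead of storing anything.
The variable and column names below are made up.)

\set flag t
SELECT 42 AS answer, NULL::text AS flag \gset
-- :answer is now 42; :flag has been removed rather than set to an empty
-- string, so a later reference to :flag is left unexpanded
\echo :answer
\echo :flag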




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-02 Thread Jeff Janes
On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I plan to do some performance testing, but that will take a while so I
wanted to post this before I get distracted.

Cheers,

Jeff




Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-02 Thread Tomáš Vondra
Ok, thanks for the info. I'll look into that and I'll also post info from some 
of our production systems (we've deployed a 9.1-backpatched version about 2 
weeks ago).

T.

 Original message 
From: Jeff Janes jeff.ja...@gmail.com 
Date:  
To: Tomas Vondra t...@fuzzy.cz 
Cc: pgsql-hackers@postgresql.org 
Subject: Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum
  stress-testing our system 
 
On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra t...@fuzzy.cz wrote:
 On 3.1.2013 20:33, Magnus Hagander wrote:

 Yeah, +1 for a separate directory not in global.

 OK, I moved the files from global/stat to stat.

This has a warning:

pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
no prototype before its definition

I plan to do some performance testing, but that will take a while so I
wanted to post this before I get distracted.

Cheers,

Jeff


Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Andres Freund
On 2013-02-01 15:09:34 -0800, Jeff Janes wrote:
 As an aside, it does seem like log_autovacuum_min_duration=0 should
 log whether a scan_all was done, and if so what relfrozenxid got set
 to.  But looking at where the log message is generated, I don't know
 where to retrieve that info.

What about the following, very rough and quick, patch:

LOG:  automatic vacuum of table postgres.public.data_1: index scans: 1
  pages: 2703 removed, 2702 remain, 5405 (100.00%) scanned
  tuples: 49 removed, 51 remain
  full-scan: 1, freeze-limit: 28824, new-frozen-xid: 28824
  buffer usage: 29957 hits, 2 misses, 4 dirtied
  avg read rate: 0.020 MB/s, avg write rate: 0.040 MB/s
  system usage: CPU 0.01s/0.67u sec elapsed 0.77 sec
...
LOG:  automatic vacuum of table postgres.public.data_1: index scans: 1
  pages: 2703 removed, 5404 remain, 5411 (66.74%) scanned
  tuples: 49 removed, 171 remain
  full-scan: 0, freeze-limit: 28828, new-frozen-xid: -
  buffer usage: 34085 hits, 3 misses, 4 dirtied
  avg read rate: 0.027 MB/s, avg write rate: 0.036 MB/s
  system usage: CPU 0.01s/0.73u sec elapsed 0.86 sec

It obviously needs more polish:

- I opted for using the 64-bit representation of xids; that seems better
  in a log which very well might be looked at only after some
  wraparounds
- exporting 'txid' from adt/txid.c is pretty ugly. I don't like the
  invention of the type in general, and making it visible outside of
  txid.c is even uglier, but using both plain uint64 and txid inside
  txid.c isn't nice either.
- txid_from_xid should be renamed; I don't have a good idea what to
  rename it to right now.
- is there agreement on the additionally logged information?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Andres Freund
On 2013-02-03 02:40:04 +0100, Andres Freund wrote:
 On 2013-02-01 15:09:34 -0800, Jeff Janes wrote:
  As an aside, it does seem like log_autovacuum_min_duration=0 should
  log whether a scan_all was done, and if so what relfrozenxid got set
  to.  But looking at where the log message is generated, I don't know
  where to retrieve that info.
 
 What about the following, very rough and quick, patch:

-EINTR

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 5ec65ea..881d8d6 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -304,6 +304,11 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 			TimestampDifferenceExceeds(starttime, endtime,
 	   Log_autovacuum_min_duration))
 		{
+			double		scanned_percent;
+			txid		freeze_txid;
+			txid		relfrozen_txid;
+			char		freeze_txid_buf[80];
+			char		relfrozen_txid_buf[80];
 			TimestampDifference(starttime, endtime, &secs, &usecs);
 
 			read_rate = 0;
@@ -315,10 +320,29 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 write_rate = (double) BLCKSZ *VacuumPageDirty / (1024 * 1024) /
 			(secs + usecs / 1000000.0);
 			}
+
+			scanned_percent = ((double) vacrelstats->scanned_pages /
+			   (vacrelstats->rel_pages +
+				vacrelstats->pages_removed)) * 100;
+
+			freeze_txid = txid_from_xid(FreezeLimit);
+			relfrozen_txid = txid_from_xid(new_frozen_xid);
+
+			/*
+			 * print to buffer, so we don't have to include UINT64_FORMAT in a
+			 * translatable string.
+			 */
+			snprintf(freeze_txid_buf, sizeof(freeze_txid_buf),
+	 UINT64_FORMAT, freeze_txid);
+
+			snprintf(relfrozen_txid_buf, sizeof(relfrozen_txid_buf),
+	 UINT64_FORMAT, relfrozen_txid);
+
 			ereport(LOG,
 	(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
-			"pages: %d removed, %d remain\n"
+			"pages: %d removed, %d remain, %d (%.02f%%) scanned\n"
 			"tuples: %.0f removed, %.0f remain\n"
+			"full-scan: %d, freeze-limit: %s, new-frozen-xid: %s\n"
 			"buffer usage: %d hits, %d misses, %d dirtied\n"
 	"avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"
 			"system usage: %s",
@@ -328,8 +352,13 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 			vacrelstats->num_index_scans,
 			vacrelstats->pages_removed,
 			vacrelstats->rel_pages,
+			vacrelstats->scanned_pages,
+			scanned_percent,
 			vacrelstats->tuples_deleted,
 			vacrelstats->new_rel_tuples,
+			scan_all,
+			freeze_txid_buf,
+			scan_all ? relfrozen_txid_buf : "-",
 			VacuumPageHit,
 			VacuumPageMiss,
 			VacuumPageDirty,
diff --git a/src/backend/utils/adt/txid.c b/src/backend/utils/adt/txid.c
index de86756..78edec5 100644
--- a/src/backend/utils/adt/txid.c
+++ b/src/backend/utils/adt/txid.c
@@ -33,9 +33,6 @@
 /* txid will be signed int8 in database, so must limit to 63 bits */
 #define MAX_TXID   UINT64CONST(0x7FFFFFFFFFFFFFFF)
 
-/* Use unsigned variant internally */
-typedef uint64 txid;
-
 /* sprintf format code for uint64 */
 #define TXID_FMT UINT64_FORMAT
 
@@ -98,11 +95,17 @@ convert_xid(TransactionId xid, const TxidEpoch *state)
 	if (!TransactionIdIsNormal(xid))
 		return (txid) xid;
 
-	/* xid can be on either side when near wrap-around */
 	epoch = (uint64) state->epoch;
+	/* xid is too old to be real */
 	if (xid > state->last_xid &&
+		TransactionIdPrecedes(xid, state->last_xid) &&
+		epoch == 0)
+		xid = InvalidTransactionId;
+	/* xid in last epoch */
+	else if (xid > state->last_xid &&
 		TransactionIdPrecedes(xid, state->last_xid))
 		epoch--;
+	/* xid after wraparound */
 	else if (xid < state->last_xid &&
 			 TransactionIdFollows(xid, state->last_xid))
 		epoch++;
@@ -317,6 +320,16 @@ bad_format:
  * communicate with core xid machinery.  All the others work on data
  * returned by them.
  */
+txid
+txid_from_xid(TransactionId xid)
+{
+	txid		val;
+	TxidEpoch	state;
+
+	load_xid_epoch(&state);
+	val = convert_xid(xid, &state);
+	return val;
+}
 
 /*
  * txid_current() returns int8
@@ -327,9 +340,6 @@ bad_format:
 Datum
 txid_current(PG_FUNCTION_ARGS)
 {
-	txid		val;
-	TxidEpoch	state;
-
 	/*
 	 * Must prevent during recovery because if an xid is not assigned we try
 	 * to assign one, which would fail. Programs already rely on this function
@@ -338,11 +348,7 @@ txid_current(PG_FUNCTION_ARGS)
 	 */
 	PreventCommandDuringRecovery("txid_current()");
 
-	load_xid_epoch(&state);
-
-	val = convert_xid(GetTopTransactionId(), &state);
-
-	PG_RETURN_INT64(val);
+	return txid_from_xid(GetTopTransactionId());
 }
 
 /*
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 23a41fd..5d99f4d 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -127,6 +127,13 @@ typedef struct VariableCacheData
 
 typedef VariableCacheData *VariableCache;
 
+/*
+ * 64bit representation of xids 

Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Christopher Browne
On Sat, Feb 2, 2013 at 2:54 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 2, 2013 at 1:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 You're right, this doesn't work superbly well, especially for
 insert-only tables... But IMO the place to fix it is not the
 prioritization logic but relation_needs_vacanalyze, since fixing it in
 the prioritization won't prevent the BAM, only change its timing.

 Agreed.

 I think scheduling a table for a partial vacuum every min_freeze * 2
 xids, even if it's insert-only, would go a long way toward reducing the
 impact of full-table vacuums. Obviously that would require retaining the
 last xid a vacuum was executed in...

 I'm not sure that min_freeze * 2 is the right value, but otherwise agreed.

 I keep coming back to the idea that vacuum should have a high-priority
 queue and a low-priority queue.  When stuff meets the current
 thresholds, it goes into the high-priority queue.  But then there
 should be a low-priority queue where we do partial vacuums of things
 that meet some lower threshold - like the unfrozen portions of
 insert-only tables.

When I was thinking about your desire for unitless values, I found
myself uncomfortable about that, and I think I've mentioned that.

On further reflection, there's good reason.  The need to vacuum
tables with lots of dead tuples has very different characteristics
from the need to vacuum tables to avoid XID rollover.  Trying to
force them onto the same units seems unlikely to turn out
happily.

On the other hand, I always thought that there was use for having
multiple autovacuum queues, and giving queues different
shaped policies, one for each purpose, seems like a mighty
fine idea.  That way we don't need to worry about mixing the
policies.  There can be two best policies.

I'd go further, and have 3 queues:

a) A queue devoted to vacuuming small tables.  Anything
with more than [some number of relpages] need not apply.

b) A queue devoted to vacuuming tables with a lot of dead
tuples.

c) A queue devoted to vacuuming tables before their XID
rollover.

The appropriate strength functions for b) and c) can be
pretty simple, possibly the relevant bits of the functions that
Nasby and I have suggested.  And any time b) and c) find
small tables, throw them to queue a), essentially doing
the quick & easy vacuums.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?




Re: [HACKERS] autovacuum not prioritising for-wraparound tables

2013-02-02 Thread Gavin Flower

On 03/02/13 15:08, Christopher Browne wrote:

On Sat, Feb 2, 2013 at 2:54 PM, Robert Haas robertmh...@gmail.com wrote:

On Sat, Feb 2, 2013 at 1:49 PM, Andres Freund and...@2ndquadrant.com wrote:

You're right, this doesn't work superbly well, especially for
insert-only tables... But IMO the place to fix it is not the
prioritization logic but relation_needs_vacanalyze, since fixing it in
the prioritization won't prevent the BAM, only change its timing.

Agreed.


I think scheduling a table for a partial vacuum every min_freeze * 2
xids, even if it's insert-only, would go a long way toward reducing the
impact of full-table vacuums. Obviously that would require retaining the
last xid a vacuum was executed in...

I'm not sure that min_freeze * 2 is the right value, but otherwise agreed.

I keep coming back to the idea that vacuum should have a high-priority
queue and a low-priority queue.  When stuff meets the current
thresholds, it goes into the high-priority queue.  But then there
should be a low-priority queue where we do partial vacuums of things
that meet some lower threshold - like the unfrozen portions of
insert-only tables.

When I was thinking about your desire for unitless values, I found
myself uncomfortable about that, and I think I've mentioned that.

On further reflection, there's good reason.  The need to vacuum
tables with lots of dead tuples has very different characteristics
from the need to vacuum tables to avoid XID rollover.  Trying to
force them onto the same units seems unlikely to turn out
happily.

On the other hand, I always thought that there was use for having
multiple autovacuum queues, and giving queues different
shaped policies, one for each purpose, seems like a mighty
fine idea.  That way we don't need to worry about mixing the
policies.  There can be two best policies.

I'd go further, and have 3 queues:

a) A queue devoted to vacuuming small tables.  Anything
with more than [some number of relpages] need not apply.

b) A queue devoted to vacuuming tables with a lot of dead
tuples.

c) A queue devoted to vacuuming tables before their XID
rollover.

The appropriate strength functions for b) and c) can be
pretty simple, possibly the relevant bits of the functions that
Nasby and I have suggested.  And any time b) and c) find
small tables, throw them to queue a), essentially doing
the quick & easy vacuums.

Hmm...

Could there be some measure of bloatedness?

A table with 10 live rows and 100 dead tuples should surely have a 
higher priority for being vacuumed than a table with 1000 rows and 100 
dead tuples?  Especially for tables with hundreds of millions of rows!
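
(The statistics collector already exposes the raw numbers for such a
measure; the query below is only a rough illustrative sketch of a per-table
dead-tuple fraction, not a concrete proposal.)

-- dead-tuple fraction per table, as one possible "bloatedness" measure
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0), 3) AS dead_fraction
FROM pg_stat_user_tables
ORDER BY dead_fraction DESC NULLS LAST
LIMIT 10;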




Cheers,
Gavin


Re: [HACKERS] [PATCH] pg_isready (was: [WIP] pg_ping utility)

2013-02-02 Thread Phil Sorber
On Tue, Jan 29, 2013 at 11:43 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jan 29, 2013 at 3:12 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Phil Sorber wrote:
 On Mon, Jan 28, 2013 at 11:20 AM, Fujii Masao masao.fu...@gmail.com wrote:
  Maybe. But I'm not inclined to add a new libpq interface at this stage,
  because we are in the last CommitFest and I'm not sure whether
  we have enough time to implement that. Instead, how about using
  both PQconninfoParse() and PQconndefaults() like uri-regress.c does?
  Or just remove that output message? At least I don't think that the
  information about host and port needs to be output, which would make
  the code much simpler.

 I think that output is important, as do others upthread. I think it'd
 be simpler to just disable dbname's ability to double as conninfo. If
 we are looking for easy, that is.

 I don't mind duplicating the behavior from conninfo_array_parse() or
 uri-regress.c either. I can just put a comment that at some point in
 the future we should add a libpq interface for it.

 I suggest duplicating the code for 9.3, and submitting a patch to refactor
 into a new libpq function for CF2013-next.  If the patch is simple
 enough, we can consider putting it into 9.3.

 Agreed.

 Regards,

 --
 Fujii Masao

OK, here is the patch that handles the connection string in dbname.
I'll post the other patch under a different posting because I am sure
it will get plenty of debate on its own.


pg_isready_con_str.diff
Description: Binary data



[HACKERS] [PATCH] Add PQconninfoParseParams and PQconninfodefaultsMerge to libpq

2013-02-02 Thread Phil Sorber
This patch came up from discussion about pg_isready.

PQconninfoParseParams is similar to PQconninfoParse but takes two
arrays like PQconnectdbParams. It essentially exposes
conninfo_array_parse().

PQconninfodefaultsMerge essentially exposes conninfo_add_defaults().
It allows you to pass a PQconninfoOption struct and it adds defaults
for all NULL values.

There are no docs yet. I assumed I would let bikeshedding ensue, and
also debate on whether we even want these first.




Re: [HACKERS] PL/Python result object str handler

2013-02-02 Thread Peter Eisentraut
On Sat, 2013-02-02 at 15:43 -0500, Steve Singer wrote:
 I've looked through the code and everything looks fine.
 
 The patch includes no documentation.   Adding a few lines to the 
 Utility Functions section of the plpython documentation so people know 
 about this feature would be good.

Added some documentation and committed.  Thanks.






Re: [HACKERS] Setting visibility map in VACUUM's second phase

2013-02-02 Thread Pavan Deolasee
On Sun, Feb 3, 2013 at 2:31 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 Hi Pavan,

 I get this warning:
 vacuumlazy.c:890: warning: passing argument 6 of 'lazy_vacuum_page'
 makes pointer from integer without a cast

 and make check then fails.

  I've added '&' to that line, and it now passes make check with 
 --enable-cassert.


Stupid me. Obviously I did not run make check before submitting the
patch, but I'm surprised my short pgbench test did not catch this.
Thanks a lot for finding and fixing this.


 At line 1096, when you release the vmbuffer, you don't set it to
 InvalidBuffer like the other places in the code do.  It seems like
 this does would lead to a crash or assertion failure, but it does not
 seem to do so.


That's harmless because vmbuffer is just a local variable in that
function and we are at the end of the function and that variable is
not used again. But it helps to just be consistent. So I'm OK with
your change.


 Also, the Note: If you change anything below, also look at should
 probably say Note: If you change anything in the for loop below, also
 look at.  Otherwise I'd be wondering how far below the caveat
 applies.

Ok.


 I've attached a patch with these changes made.  Does this look OK?


Looks good to me. I also repeated pgbench and make check and they work
as expected. I'll add it to the CF and also mark the patch ready for
committer.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] [PATCH] Add PQconninfoParseParams and PQconninfodefaultsMerge to libpq

2013-02-02 Thread Magnus Hagander
On Feb 3, 2013 4:16 AM, Phil Sorber p...@omniti.com wrote:

 This patch came up from discussion about pg_isready.

 PQconninfoParseParams is similar to PQconninfoParse but takes two
 arrays like PQconnectdbParams. It essentially exposes
 conninfo_array_parse().

 PQconninfodefaultsMerge essentially exposes conninfo_add_defaults().
 It allows you to pass a PQconninfoOption struct and it adds defaults
 for all NULL values.

 There are no docs yet. I assumed I would let bikeshedding ensue, and
 also debate on whether we even want these first.

I think you forgot to attach the patch.

/Magnus


Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-02-02 Thread Amit kapila

On Saturday, February 02, 2013 9:08 PM Robert Haas wrote:
On Fri, Feb 1, 2013 at 12:04 AM, Amit Kapila amit.kap...@huawei.com wrote:
 I think the user should be aware of the effect before using SET commands, as these 
 are used at various levels (TRANSACTION, SESSION, ...).

 Ideally, sure.  But these kinds of mistakes are easy to make.  

   You mean to say that after using SET TRANSACTION, the user may think the
following statements will use the modified transaction property. But I think
that if the user doesn't understand that by default the transaction is
committed after each statement's execution, he might expect that after a
few statements he can still roll back.

 That's why LOCK and DECLARE CURSOR already emit errors in this case - why
 should this one be any different?

IMO, an error should be given when it is not possible to execute the current 
statement.

Not only LOCK and DECLARE CURSOR, but SAVEPOINT and some other statements also
give the same error, so if we want to throw an error for such behavior, we can
find all similar statements (SET TRANSACTION, SET LOCAL, etc.) and do it for
all of them.

This can be helpful to some users, but I am not sure whether such behavior (a
statement that can be executed but has no real effect) is always detectable
and maintainable.
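
To make the comparison concrete, a small illustrative sketch of the two
behaviors being contrasted (to the best of my understanding of the current
code; the table name is made up):

-- outside any transaction block:
LOCK TABLE my_table;
-- fails with: ERROR:  LOCK TABLE can only be used in transaction blocks

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- currently accepted without any message, even though it has no lasting
-- effect; the suggestion in this thread is to warn (or error) here as well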

With Regards,
Amit Kapila.




Re: [HACKERS] [PATCH] Add PQconninfoParseParams and PQconninfodefaultsMerge to libpq

2013-02-02 Thread Phil Sorber
On Sun, Feb 3, 2013 at 1:37 AM, Magnus Hagander mag...@hagander.net wrote:

 On Feb 3, 2013 4:16 AM, Phil Sorber p...@omniti.com wrote:

 This patch came up from discussion about pg_isready.

 PQconninfoParseParams is similar to PQconninfoParse but takes two
 arrays like PQconnectdbParams. It essentially exposes
 conninfo_array_parse().

 PQconninfodefaultsMerge essentially exposes conninfo_add_defaults().
 It allows you to pass a PQconninfoOption struct and it adds defaults
 for all NULL values.

 There are no docs yet. I assumed I would let bikeshedding ensue, and
 also debate on whether we even want these first.

 I think you forgot to attach the patch.

 /Magnus

/me hangs head in shame :-(

Here it is...


libpq_params_parse_merge.diff
Description: Binary data
