Re: [HACKERS] PostgreSQL doesn't stop properly when --slot option is specified with pg_receivexlog.

2014-11-15 Thread Michael Paquier
On Sat, Nov 15, 2014 at 3:42 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
 On Fri, Nov 14, 2014 at 7:22 PM,  furu...@pm.nttdata.co.jp wrote:
  pg_ctl stop doesn't work properly if the --slot option is specified, because
  WAL is flushed only when the WAL file is switched.
  These processes still continue even after the postmaster failed:
  pg_receivexlog, walsender and logger.

 I could reproduce this problem. At normal shutdown, walsender keeps waiting
 for the last WAL record to be replicated and flushed in pg_receivexlog. But
 pg_receivexlog issues sync command only when WAL file is switched. Thus,
 since pg_receivexlog may never flush the last WAL record, walsender may have
 to keep waiting infinitely.

 Right.
It is surprising that nobody complained about that before, given that
pg_receivexlog was released two years ago.

 pg_recvlogical handles this problem by calling fsync() when it receives the
 request of immediate reply from the server. That is, at shutdown, walsender
 sends the request, pg_receivexlog receives it, flushes the last WAL record,
 and sends the flush location back to the server. Since walsender can see that
 the last WAL record is successfully flushed in pg_receivexlog, it can
 exit cleanly.

 One idea to the problem is to introduce the same logic as pg_recvlogical has,
 to pg_receivexlog. Thought?

 Sounds sane to me. Are you looking into doing that?
Yep, it sounds like a good thing to do when the master requests an answer
from the client in the keepalive message. Something like the attached patch
would do the trick.
-- 
Michael
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index c6c90fb..b93d52b 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -149,6 +149,34 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir,
 }
 
 /*
+ * Flush the current WAL file to disk and update the last WAL flush
+ * positions as well as the last fsync timestamp. On failure, print
+ * a message to stderr and return false, otherwise return true.
+ */
+static bool
+fsync_walfile(XLogRecPtr blockpos, int64 timestamp)
+{
+	/* nothing to do if no WAL file open */
+	if (walfile == -1)
+		return true;
+
+	/* nothing to do if data has already been flushed */
+	if (blockpos <= lastFlushPosition)
+		return true;
+
+	if (fsync(walfile) != 0)
+	{
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+				progname, current_walfile_name, strerror(errno));
+		return false;
+	}
+
+	lastFlushPosition = blockpos;
+	last_fsync = timestamp;
+	return true;
+}
+
+/*
  * Close the current WAL file (if open), and rename it to the correct
  * filename if it's complete. On failure, prints an error message to stderr
  * and returns false, otherwise returns true.
@@ -787,21 +815,12 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 		 * If fsync_interval has elapsed since last WAL flush and we've written
 		 * some WAL data, flush them to disk.
 		 */
-		if (lastFlushPosition < blockpos &&
-			walfile != -1 &&
-			((fsync_interval > 0 &&
-			  feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
-			 fsync_interval < 0))
+		if ((fsync_interval > 0 &&
+			 feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
+			 fsync_interval < 0)
 		{
-			if (fsync(walfile) != 0)
-			{
-				fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
-						progname, current_walfile_name, strerror(errno));
+			if (!fsync_walfile(blockpos, now))
 				goto error;
-			}
-
-			lastFlushPosition = blockpos;
-			last_fsync = now;
 		}
 
 		/*
@@ -999,7 +1018,6 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 {
 	int			pos;
 	bool		replyRequested;
-	int64		now;
 
 	/*
 	 * Parse the keepalive message, enclosed in the CopyData message.
@@ -1021,7 +1039,12 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 	/* If the server requested an immediate reply, send one. */
	if (replyRequested && still_sending)
 	{
-		now = feGetCurrentTimestamp();
+		int64		now = feGetCurrentTimestamp();
+
+		/* fsync data, so a fresh flush position is sent */
+		if (!fsync_walfile(blockpos, now))
+			return false;
+
 		if (!sendFeedback(conn, blockpos, now, false))
 			return false;
 		*last_status = now;



Re: [HACKERS] tracking commit timestamps

2014-11-15 Thread Simon Riggs
On 15 November 2014 04:32, Steve Singer st...@ssinger.info wrote:

 The use cases I'm talking about aren't really replication related. Often I
 have come across systems that want to do something such as 'select * from
 orders where X > the_last_row_I_saw order by X' and then do further
 processing on the order.

Yes, existing facilities provide mechanisms for different types of
application change queues.

If you want to write a processing queue in SQL, that isn't the best
way. You'll need some way to keep track of whether or not it's been
successfully processed. That's either a column in the table, or a
column in a queue table maintained by triggers, with the row write
locked on read. You can then have multiple readers from this queue
using the new SKIP LOCKED feature, which was specifically designed to
facilitate that.
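
For illustration, a minimal sketch of that pattern (table and column names
are invented for the example, not taken from the original use case):

CREATE TABLE order_queue (
    order_id   bigint PRIMARY KEY,            -- filled by triggers on orders
    processed  boolean NOT NULL DEFAULT false
);

-- each worker claims one unprocessed row, skipping rows locked by other workers
BEGIN;
SELECT order_id
  FROM order_queue
 WHERE NOT processed
 ORDER BY order_id
 LIMIT 1
   FOR UPDATE SKIP LOCKED;
-- ... process the order, then mark it done using the id just returned ...
UPDATE order_queue SET processed = true WHERE order_id = 42;  -- 42 = claimed id
COMMIT;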

Logical decoding was intended for much more than just replication. It
provides commit order access to changed data in a form that is both
usable and efficient for high volume application needs.

I don't see any reason to add LSN into a SLRU updated at commit to
support those application needs.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add CREATE support to event triggers

2014-11-15 Thread Robert Haas
On Fri, Nov 14, 2014 at 1:18 PM, Andres Freund and...@2ndquadrant.com wrote:
 I think it's a good idea to structure independent features in a way that
 other solutions can reuse them. But I sure as hell can't force them to
 use it - especially as there's unfortunately not too much development
 going on in the existing logical replication solutions for postgres.

 Btw, I really think there's quite legitimate use cases for this besides
 logical replication solutions - e.g. schema tracking is quite a sensible
 use case.

Well, as I already said, despite my doubts about the general utility
of this feature, I'm willing to see us take it IF we have a testing
framework that will reliably catch bugs, including bugs of omission.
Without that, I'm very confident it's going to be a maintenance
nightmare, and I believe you admitted yourself that that concern was
reasonable.

 Can you please hint at some other workable design? I don't think there
 really is anything else.

I think this really depends on what you mean by "anything else".  Any
DDL replication solution is necessarily going to involve the following
steps:

1. Define some set of primitives such that any imaginable DDL
operation can be expressed as a series of those primitives.
2. Find a way to capture those events as they happen.
3. Serialize them into some wire format and transport that format to
the replica.
4. Apply them, possibly coordinating in some way with the master so
that the user's original request fails if the apply fails.

There are meaningful choices at every step.  You're proposing that the
primitives should be anything that can be expressed as a complete SQL
command against a single object (I think - what are you going to emit
for an ALL IN SCHEMA op - that thing itself, or a similar operation
against each object in the schema?); that the capture mechanism should
be an event trigger that inserts into a queue table; that the
serialization format should be a JSON language designed to allow
reassembly of the corresponding SQL statement; and that the apply
coordination mechanism should be 2PC.  But none of that is written in
stone.

As far as deciding what primitives to use, I think the principal
decision to be made here is as to the appropriate level of
granularity.  For example, CREATE TABLE foo (a int DEFAULT 1, b int,
CHECK (b > 42)) could be emitted as a single event saying that a table
was created.  But it could also be emitted as create-table (foo),
add-column (foo, a, int), add-column (foo, b, int), add-column-default
(a, 1), add-check-constraint (foo, b > 42).  The appeal of a more
fine-grained set of primitives is that there might be fewer of them,
and that each of them might be simpler; one of the things that makes
physical replication so reliable is that its primitives are very
simple and thus easy to verify.
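
To make the granularity trade-off concrete, here is the same example written
out both ways as plain SQL (purely illustrative; it says nothing about the
representation any actual patch emits):

-- coarse-grained: one event for the whole statement
CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b > 42));

-- fine-grained: the same end state as a series of simpler primitives
CREATE TABLE foo ();
ALTER TABLE foo ADD COLUMN a int;
ALTER TABLE foo ADD COLUMN b int;
ALTER TABLE foo ALTER COLUMN a SET DEFAULT 1;
ALTER TABLE foo ADD CHECK (b > 42);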

The possible event-capture mechanisms seem to be either (a) an
event trigger, or (b) a hook function in some yet-to-be-defined place,
or (c) core code which will either (i) write each event to a table,
(ii) write each event directly into WAL, or perhaps (iii) write it
someplace else (file? in-memory queue?  network socket?).

There are lots of possible serialization formats.

Coordinating with the master could involve 2PC, as you propose; or
trying to somehow validate that a given series of events is a valid
state transformation based on the starting state on the standby before
doing the operation on the master; or the use of a distributed
transaction coordinated by something like PG-XC's global transaction
manager; or you can skip it and hope for the best.

In addition to the decisions above, you can try to prevent failures by
restricting certain changes from happening, or you can let users
change what they like and hope for the best.  Different solutions can
have different mechanisms for controlling which objects are under
replication and which changes are not; or even allowing some
individual DDL statements to opt out of replication while forcing
others to participate.  Administratively, solutions can be built to
facilitate easy replication of an entire database to another node, or
more specific applications like sharding, where creating a table on a
master node creates child tables on a bunch of slave nodes, but
they're not all identical, because we're partitioning the data so that
only some of it will be on each node - thus the constraints and so on
will be different.

BDR has one set of answers to all of these questions, and despite my
worries about a few points here and there, they are not stupid
answers.  But they are not the ONLY answers either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Failback to old master

2014-11-15 Thread Maeldron T.

On 12/11/14 14:28, Ants Aasma wrote:

On Tue, Nov 11, 2014 at 11:52 PM, Maeldron T. maeld...@gmail.com wrote:

As far as I remember (I can’t test it right now but I am 99% sure) promoting 
the slave makes it impossible to connect the old master to the new one without 
making a base_backup. The reason is the timeline change. It complains.

A safely shut down master (-m fast is safe) can be safely restarted as
a slave to the newly promoted master. Fast shutdown shuts down all
normal connections, does a shutdown checkpoint and then waits for this
checkpoint to be replicated to all active streaming clients. Promoting
slave to master creates a timeline switch, that prior to version 9.3
was only possible to replicate using the archive mechanism. As of
version 9.3 you don't need to configure archiving to follow timeline
switches, just add a recovery.conf to the old master to start it up as
a slave and it will fetch everything it needs from the new master.
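
For illustration, a minimal recovery.conf of the kind described above might
look like this (a sketch only; host, port and the recovery_target_timeline
setting are assumptions for a local two-instance setup such as the one below):

standby_mode = 'on'
primary_conninfo = 'host=localhost port=5434'
recovery_target_timeline = 'latest'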

I took your advice and I understood that removing the recovery.conf 
followed by a restart is wrong. I will not do that on my production servers.


However, I can't make it work with promotion. What did I do wrong? It was 
9.4beta3.


mkdir 1
mkdir 2
initdb -D 1/
edit config: change port, wal_level to hot_standby, hot_standby to on, 
max_wal_senders=7, wal_keep_segments=100, uncomment replication in hba.conf

pg_ctl -D 1/ start
createdb -p 5433
psql -p 5433
pg_basebackup -p 5433 -R -D 2/
mcedit 2/postgresql.conf change port
chmod -R 700 1
chmod -R 700 2
pg_ctl -D 2/ start
psql -p 5433
psql -p 5434
everything works
pg_ctl -D 1/ stop
pg_ctl -D 2/ promote
psql -p 5434
cp 2/recovery.done 1/recovery.conf
mcedit 1/recovery.conf change port
pg_ctl -D 1/ start

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1 at 0/3000AE0.
LOG:  restarted WAL streaming at 0/300 on timeline 1
LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1 at 0/3000AE0.

This is what I experienced in the past when I tried with promote. The 
old master disconnects from the new. What am I missing?







Re: [HACKERS] WIP: multivariate statistics / proof of concept

2014-11-15 Thread Kevin Grittner
Tomas Vondra t...@fuzzy.cz wrote:
 On 13 November 2014, 16:51, Katharina Büchse wrote:
 On 13.11.2014 14:11, Tomas Vondra wrote:

 The only place where I think this might work are the associative rules.
 It's simple to specify rules like (ZIP code implies city) and we could
 even do some simple check against the data to see if it actually makes
 sense (and 'disable' the rule if not).

 and even this simple example has its limits, at least in Germany ZIP
 codes are not unique for rural areas, where several villages have the
 same ZIP code.

 as you point out, most real-world data either contain bugs
 or are inherently imperfect (we have the same kind of ZIP/city
 inconsistencies in the Czech Republic).

You can have lots of fun with U.S. zip codes, too. Just among the
nominally Madison, Wisconsin zip codes (those starting with 537),
there are several exceptions:

select zipcode, city, locationtype
from zipcode
where zipcode like '537%'
and Decommisioned = 'false'
and zipcodetype = 'STANDARD'
and locationtype in ('PRIMARY', 'ACCEPTABLE')
order by zipcode, city;

 zipcode |   city    | locationtype
---------+-----------+--------------
 53703   | MADISON   | PRIMARY
 53704   | MADISON   | PRIMARY
 53705   | MADISON   | PRIMARY
 53706   | MADISON   | PRIMARY
 53711   | FITCHBURG | ACCEPTABLE
 53711   | MADISON   | PRIMARY
 53713   | FITCHBURG | ACCEPTABLE
 53713   | MADISON   | PRIMARY
 53713   | MONONA    | ACCEPTABLE
 53714   | MADISON   | PRIMARY
 53714   | MONONA    | ACCEPTABLE
 53715   | MADISON   | PRIMARY
 53716   | MADISON   | PRIMARY
 53716   | MONONA    | ACCEPTABLE
 53717   | MADISON   | PRIMARY
 53718   | MADISON   | PRIMARY
 53719   | FITCHBURG | ACCEPTABLE
 53719   | MADISON   | PRIMARY
 53725   | MADISON   | PRIMARY
 53726   | MADISON   | PRIMARY
 53744   | MADISON   | PRIMARY
(21 rows)

If you eliminate the quals besides the zipcode column you get 61
rows and it gets much stranger, with legal municipalities that are
completely surrounded by Madison that the postal service would
rather you didn't use in addressing your envelopes, but they have
to deliver to anyway, and organizations inside Madison receiving
enough mail to (literally) have their own zip code -- where the
postal service allows the organization name as a deliverable
city.

If you want to have your own fun with this data, you can download
it here:

http://federalgovernmentzipcodes.us/free-zipcode-database.csv

I was able to load it into PostgreSQL with this:

create table zipcode
(
recordnumber integer not null,
zipcode text not null,
zipcodetype text not null,
city text not null,
state text not null,
locationtype text not null,
lat double precision,
long double precision,
xaxis double precision not null,
yaxis double precision not null,
zaxis double precision not null,
worldregion text not null,
country text not null,
locationtext text,
location text,
decommisioned text not null,
taxreturnsfiled bigint,
estimatedpopulation bigint,
totalwages bigint,
notes text
);
comment on column zipcode.zipcode is 'Zipcode or military postal code(FPO/APO)';
comment on column zipcode.zipcodetype is 'Standard, PO BOX Only, Unique, 
Military(implies APO or FPO)';
comment on column zipcode.city is 'official city name(s)';
comment on column zipcode.state is 'official state, territory, or quasi-state 
(AA, AE, AP) abbreviation code';
comment on column zipcode.locationtype is 'Primary, Acceptable,Not Acceptable';
comment on column zipcode.lat is 'Decimal Latitude, if available';
comment on column zipcode.long is 'Decimal Longitude, if available';
comment on column zipcode.location is 'Standard Display (eg Phoenix, AZ ; Pago 
Pago, AS ; Melbourne, AU )';
comment on column zipcode.decommisioned is 'If Primary location, Yes implies 
historical Zipcode, No Implies current Zipcode; If not Primary, Yes implies 
Historical Placename';
comment on column zipcode.taxreturnsfiled is 'Number of Individual Tax Returns 
Filed in 2008';
copy zipcode from 'filepath' with (format csv, header);
alter table zipcode add primary key (recordnumber);
create unique index zipcode_city on zipcode (zipcode, city);

I bet there are all sorts of correlation possibilities with, for
example, latitude and longitude and other variables.  With 81831
rows and so many correlations among the columns, it might be a
useful data set to test with.
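
As a quick illustration, here is the sort of query where per-column statistics
fall short on this data set (a sketch using the table defined above; with
independent column statistics the planner effectively multiplies the two
selectivities, even though every '537%' zip code is in Wisconsin):

explain analyze
select zipcode, city
from zipcode
where state = 'WI'
  and zipcode like '537%';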

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: multivariate statistics / proof of concept

2014-11-15 Thread Tomas Vondra
On 15.11.2014 18:49, Kevin Grittner wrote:
 If you eliminate the quals besides the zipcode column you get 61
 rows and it gets much stranger, with legal municipalities that are
 completely surrounded by Madison that the postal service would
 rather you didn't use in addressing your envelopes, but they have
 to deliver to anyway, and organizations inside Madison receiving
 enough mail to (literally) have their own zip code -- where the
 postal service allows the organization name as a deliverable
 city.
 
 If you want to have your own fun with this data, you can download
 it here:
 
 http://federalgovernmentzipcodes.us/free-zipcode-database.csv

...
 
 I bet there are all sorts of correlation possibilities with, for
 example, latitude and longitude and other variables.  With 81831
 rows and so many correlations among the columns, it might be a
 useful data set to test with.

Thanks for the link. I've been looking for a good dataset with such
data, and this one is by far the best one.

The current version of the patch supports only data types passed by
value (i.e. no varlena types - text, ...), which means it's impossible to
build multivariate stats on some of the interesting columns (state,
city, ...).

I guess it's time to start working on removing this limitation.

Tomas




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-11-15 Thread Fabrízio de Royes Mello
On Sat, Nov 15, 2014 at 2:23 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Michael Paquier wrote:

  Btw, perhaps this diff should be pushed as a different patch as this is a
  rather different thing:
  -   if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
  +   if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
          !smgrexists(indexRelation->rd_smgr, INIT_FORKNUM)
  As this is a correctness fix, it does not seem necessary to back-patch it:
  the parent relation always has the same persistence as its indexes.

 There was an argument for doing it this way that only applies if this
 patch went in, but I can't remember now what it was.

 Anyway I pushed the patch after some slight additional changes.  Thanks!


Thanks!

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Patch to support SEMI and ANTI join removal

2014-11-15 Thread Simon Riggs
On 15 October 2014 11:03, David Rowley dgrowle...@gmail.com wrote:

 The explain analyze from the above query looks like:
 test=# explain (analyze, costs off, timing off) select count(*) from t1
 inner join t2 on t1.t2_id=t2.id;
 QUERY PLAN
 --
  Aggregate (actual rows=1 loops=1)
    ->  Nested Loop (actual rows=100 loops=1)
          ->  Seq Scan on t1 (actual rows=100 loops=1)
          ->  Index Only Scan using t2_pkey on t2 (never executed)
                Index Cond: (id = t1.t2_id)
                Heap Fetches: 0
  Execution time: 124.990 ms
 (7 rows)

 As you can see the scan on t2 never occurred.

Very good, happy to see this happening (yay FKs!) and with
PostgreSQL-style rigour.

I've reviewed the patch from cold and I have a few comments.

The plan you end up with here works quite differently from left outer
join removal, where the join is simply absent. That inconsistency
causes most of the other problems I see.

I propose that we keep track of whether there are any potentially
skippable joins at the top of the plan. When we begin execution we do
a single if test to see if there is run-time work to do. If we pass
the run-time tests we then descend the tree and prune the plan to
completely remove unnecessary nodes. We end with an EXPLAIN and
EXPLAIN ANALYZE that looks like this

 QUERY PLAN
 --
  Aggregate (actual rows=1 loops=1)
    ->  Seq Scan on t1 (actual rows=100 loops=1)

Doing that removes all the overheads and complexity; it also matches
how join removal currently works.

The alternative is accepting some pretty horrible additional code in
most join types, plus a small regression on nested loop joins which I
would have to call out as regrettably unacceptable. (Horrible in this
sense that we don't want that code, not that David's code is poor).

The tests on the patch are pretty poor. We should use EXPLAINs to
show a join removal that works and a join removal that fails, with a
few of the main permutations.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-11-15 Thread Simon Riggs
On 16 October 2014 02:26, Jeff Davis pg...@j-davis.com wrote:

 The inheritance is awkward anyway, though. If you create a tracked
 context as a child of an already-tracked context, allocations in the
 newer one won't count against the original. I don't see a way around
 that without introducing even more performance problems.

This seems to have reached impasse, which is a shame. Let me throw out
a few questions to see if that might help.

Do I understand correctly that we are trying to account for exact
memory usage at palloc/pfree time? Why??

Surely we just want to keep track of the big chunks? Does this need to
be exact? How exact does it need to be?

Or alternatively, can't we just sample the allocations to reduce the overhead?

Are there some assumptions we can make about micro-contexts never
needing to be tracked at all? Jeff seems not too bothered by
inheritance, whereas Tomas thinks its essential. Perhaps there is a
middle ground, depending upon the role of the context?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Log notice that checkpoint is to be written on shutdown

2014-11-15 Thread Simon Riggs
On 16 October 2014 20:31, Michael Banck michael.ba...@credativ.de wrote:

 I'll attach it to the next commitfest and see whether anybody likes it.

Not much...

We may decide we want to always log shutdown checkpoints. I'm
neutral about that, but I can see the logic. But if we did, we would
use exactly the same log message as if log_checkpoints = on. Having
two different message texts is just confusing.

I don't see the point of logging "waiting for clients to disconnect"
since we might not wait very long. We do already log the type of
shutdown received.

A few things seem worth pursuing...

* Logging a message that shows how many buffers need to be written for
a shutdown checkpoint. (We might even store some info that allows us
to estimate a time to shutdown, later). At present the "checkpoint
starting: shutdown" message isn't much use. I can see we could split
CheckpointGuts() into two, so we mark the buffers first, then report
how many there are to write, then go back to do the writing and fsync
in part two.

* Introducing a new shutdown mode that is genuinely smart. We send
all backends a signal that will disconnect them *only* if they are
idle and not in a transaction. I've never thought the current smart
mode deserved its name.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] PgBench's \setrandom could be better

2014-11-15 Thread David Rowley
Hi,

I've just been looking into the TPC-H benchmark with the intention of putting
together some scripts which can be run easily to output some run times for
each of the 22 queries.

I've not had a great deal of exposure to pgbench yet, but I had thought
that it might be able to help me run these queries with the randomly chosen
inputs that are described in the TPC-H spec.

From the spec under 2.4.6.3:

1. DATE is the first of January of a randomly selected year within [1993 ..
1997];
2. DISCOUNT is randomly selected within [0.02 .. 0.09];
3. QUANTITY is randomly selected within [24 .. 25].

With pgbench I can do:

\setrandom qty 24 25
then throw a :qty into the query and have pgbench replace that with either
24 or 25.

The problem is that this does not work for anything apart from integer
types.

I also can't seem to do anything with the date type. Perhaps it would be
nice to be able to do something like:

\setrandom date '1993-01-01' '1994-01-01' '1995-01-01' '1996-01-01'
'1997-01-01'

But this would be ambiguous if I wanted to choose from either 10 or
20, as:

\setrandom var 10 20

would choose a random number between 10 and 20, not 10 or 20.

I would imagine that code could be added to allow the random to choose a
floating point number, if one or more of the inputs had a decimal point in
them, but I can't see how to make it work with lists without
changing the syntax.
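
For what it's worth, a workaround that stays within the current integer-only
\setrandom is to pick the random pieces as integers and assemble the final
values inside the SQL. A rough sketch against a TPC-H-style Q6 (column names
taken from the TPC-H schema; note that older pgbench may require the SQL
command to sit on a single line):

\setrandom year 1993 1997
\setrandom discount_pct 2 9
\setrandom qty 24 25
select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= make_date(:year, 1, 1) and l_shipdate < make_date(:year + 1, 1, 1) and l_discount between :discount_pct / 100.0 - 0.01 and :discount_pct / 100.0 + 0.01 and l_quantity < :qty;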

Perhaps [between] could become an optional keyword in there so that we could
do:

\setrandom var between 0 10

And for lists it could work like:

\setrandom var list 'a' 'b' 'c'

There should be no conflicts with the original syntax, as, as far as I'm
aware, only an integer number can follow the variable name.

I was planning on putting something together to help me with this, but I
thought I'd post here first so that I might actually end up with something
useful at the end of it.

So my questions are:

1. Would anyone else find this useful?
2. Is my proposed syntax ok? Can you think of anything better or more
flexible?

Regards

David Rowley


Re: [HACKERS] controlling psql's use of the pager a bit more

2014-11-15 Thread Andrew Dunstan


On 11/13/2014 11:41 AM, Andrew Dunstan wrote:


On 11/13/2014 11:09 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
I often get annoyed because psql is a bit too aggressive when it decides
whether to put output through the pager, and the only way to avoid this
is to turn the pager off (in which case your next query might dump many
thousands of lines to the screen). I'd like a way to be able to specify
a minumum number of lines of output before psql would invoke the pager,
rather than just always using the terminal window size.

Are you saying you'd want to set the threshold to *more* than the window
height?  Why?



Because I might be quite happy with 100 or 200 lines I can just scroll 
in my terminal's scroll buffer, but want to use the pager for more 
than that. This is useful especially if I want to scroll back and see 
the results from a query or two ago.








This patch shows more or less what I had in mind.

However, there is more work to do. As Tom noted upthread, psql's 
calculation of the number of lines is pretty bad. For example, if I do:


   \pset pager_min_lines 100
   select * from generate_series(1,50);

the pager still gets invoked, which is unfortunate to say the least.

So I'm going to take a peek at that.

cheers

andrew

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 26089352..b16149a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -786,7 +786,7 @@ exec_command(const char *cmd,
 opt[--len] = '\0';
 		}
 
-		helpSQL(opt, pset.popt.topt.pager);
+		helpSQL(opt, pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		free(opt);
 	}
 
@@ -1053,7 +1053,8 @@ exec_command(const char *cmd,
 			static const char *const my_list[] = {
 				"border", "columns", "expanded", "fieldsep", "fieldsep_zero",
 				"footer", "format", "linestyle", "null",
-				"numericlocale", "pager", "recordsep", "recordsep_zero",
+				"numericlocale", "pager", "pager_min_lines",
+				"recordsep", "recordsep_zero",
 				"tableattr", "title", "tuples_only",
 				"unicode_border_linestyle",
 				"unicode_column_linestyle",
@@ -1252,7 +1253,8 @@ exec_command(const char *cmd,
 	lines++;
 }
 
-output = PageOutput(lineno, pset.popt.topt.pager);
+output = PageOutput(lineno, pset.popt.topt.pager,
+	pset.popt.topt.pager_min_lines);
 is_pager = true;
 			}
 			else
@@ -1504,13 +1506,13 @@ exec_command(const char *cmd,
 	OT_NORMAL, NULL, false);
 
 		if (!opt0 || strcmp(opt0, "commands") == 0)
-			slashUsage(pset.popt.topt.pager);
+			slashUsage(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		else if (strcmp(opt0, "options") == 0)
-			usage(pset.popt.topt.pager);
+			usage(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		else if (strcmp(opt0, "variables") == 0)
-			helpVariables(pset.popt.topt.pager);
+			helpVariables(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		else
-			slashUsage(pset.popt.topt.pager);
+			slashUsage(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 	}
 
 #if 0
@@ -2506,6 +2508,13 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.pager = 1;
 	}
 
+	/* set minimum lines for pager use */
+	else if (strcmp(param, "pager_min_lines") == 0)
+	{
+		if (value)
+			popt->topt.pager_min_lines = atoi(value);
+	}
+
 	/* disable (x rows) footer */
 	else if (strcmp(param, "footer") == 0)
 	{
@@ -2627,6 +2636,13 @@ printPsetInfo(const char *param, struct printQueryOpt *popt)
 			printf(_("Pager usage is off.\n"));
 	}
 
+	/* show minimum lines for pager use */
+	else if (strcmp(param, "pager_min_lines") == 0)
+	{
+		printf(_("Pager won't be used for less than %d lines\n"),
+			   popt->topt.pager_min_lines);
+	}
+
 	/* show record separator for unaligned text */
 	else if (strcmp(param, "recordsep") == 0)
 	{
@@ -2779,6 +2795,8 @@ pset_value_string(const char *param, struct printQueryOpt *popt)
 		return pstrdup(pset_bool_string(popt->topt.numericLocale));
 	else if (strcmp(param, "pager") == 0)
 		return psprintf("%d", popt->topt.pager);
+	else if (strcmp(param, "pager_min_lines") == 0)
+		return psprintf("%d", popt->topt.pager_min_lines);
 	else if (strcmp(param, "recordsep") == 0)
 		return pset_quoted_string(popt->topt.recordSep.separator
 								  ? popt->topt.recordSep.separator
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 66d80b5..91102f4 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1337,7 +1337,8 @@ ExecQueryUsingCursor(const char *query, double *elapsed_msec)
 			 * If query requires multiple result sets, hack to ensure that
 			 * only one pager instance is used for the whole mess
 			 */
-			pset.queryFout = PageOutput(10, my_popt.topt.pager);
+			pset.queryFout = PageOutput(10, my_popt.topt.pager,
+my_popt.topt.pager_min_lines);
 			did_pager = true;
 		}
 
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index ae5fe88..bb166b9 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -46,7 +46,7 @@
 #define ON(var) (var ? _("on") : _("off"))
 
 

Re: [HACKERS] Patch to support SEMI and ANTI join removal

2014-11-15 Thread David Rowley
On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 15 October 2014 11:03, David Rowley dgrowle...@gmail.com wrote:

  The explain analyze from the above query looks like:
  test=# explain (analyze, costs off, timing off) select count(*) from t1
  inner join t2 on t1.t2_id=t2.id;
  QUERY PLAN
  --
   Aggregate (actual rows=1 loops=1)
     ->  Nested Loop (actual rows=100 loops=1)
           ->  Seq Scan on t1 (actual rows=100 loops=1)
           ->  Index Only Scan using t2_pkey on t2 (never executed)
                 Index Cond: (id = t1.t2_id)
                 Heap Fetches: 0
   Execution time: 124.990 ms
  (7 rows)
 
  As you can see the scan on t2 never occurred.

 Very good, happy to see this happening (yay FKs!) and with
 PostgreSQL-style rigour.


:)


 I've reviewed the patch from cold and I have a few comments.


Thanks!


 The plan you end up with here works quite differently from left outer
 join removal, where the join is simply absent. That inconsistency
 causes most of the other problems I see.

 I propose that we keep track of whether there are any potentially
 skippable joins at the top of the plan. When we begin execution we do
 a single if test to see if there is run-time work to do. If we pass
 the run-time tests we then descend the tree and prune the plan to
 completely remove unnecessary nodes. We end with an EXPLAIN and
 EXPLAIN ANALYZE that looks like this

  QUERY PLAN
  --
   Aggregate (actual rows=1 loops=1)
     ->  Seq Scan on t1 (actual rows=100 loops=1)

 Doing that removes all the overheads and complexity; it also matches
 how join removal currently works.


This sounds much cleaner than what I have at the moment, although, you say
EXPLAIN would look like that... I don't think that's quite true as the
EXPLAIN still would have the un-pruned version, as the pruning would be
done at executor start-up. Would it cause problems for EXPLAIN to show
a different-looking plan than EXPLAIN ANALYZE?

I'll need to look into how the plan is stored in the case of PREPARE
statements, as no doubt I can't go vandalising any plans that are stored in
the PREPARE hashtable. I'd need to make a copy first, unless that's already
done for me. But I guess I'd only have to do that if some flag on
PlannerInfo hasSkippableNodes was true, so likely the overhead of such a
copy would be regained by skipping some joins.


 The alternative is accepting some pretty horrible additional code in
 most join types, plus a small regression on nested loop joins which I
 would have to call out as regrettably unacceptable. (Horrible in this
 sense that we don't want that code, not that David's code is poor).


Yeah it is quite horrid. I did try and keep it as simple and as
non-invasive as possible, but for nest loop it seemed there was just no
better way.


 The tests on the patch are pretty poor. We should use EXPLAINs to
 show a join removal that works and a join removal that fails, with a
 few of the main permutations.


Agreed. To be honest I abandoned the tests due to a problem with EXPLAIN
ANALYZE outputting the variable timing information at the bottom. There's
no way to disable this! So that makes testing much harder.

I added myself to the list of complainers over here -
http://www.postgresql.org/message-id/caaphdvovzbtzljbd9vfaznwo6jook1k6-7rfq8zym9h7ndc...@mail.gmail.com
but the proposed solution (diff tool which supports regex matching) is not
all that simple, and I've not gotten around to attempting to make one yet.

I've also attached a rebased patch, as the old one is no longer applying.

Regards

David Rowley


inner_join_removals_2014-11-16_3a40b4f.patch
Description: Binary data



Re: [HACKERS] Improve automatic analyze messages for inheritance trees

2014-11-15 Thread Simon Riggs
On 30 October 2014 03:30, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
 (2014/10/17 18:35), Etsuro Fujita wrote:

 (2014/10/16 17:17), Simon Riggs wrote:

 Would it be useful to keep track of how many tables just got analyzed?

 i.e. analyze of foo (including N inheritance children)


 I think that's a good idea.  So, I'll update the patch.


 Done.  Attached is an updated version of the patch.

 Thanks for the comment!

The patch was kinda ok, but we have deeper problems.

If we have a 3-level hierarchy like foo->(p1, p2->(p4), p3)
then we still report this pretty strangely
LOG:  automatic analyze of table postgres.public.p1 system usage:
CPU 0.00s/0.00u sec elapsed 0.05 sec
LOG:  automatic analyze of table postgres.public.foo system usage:
CPU 0.00s/0.00u sec elapsed 0.04 sec
LOG:  automatic analyze of table postgres.public.foo (including 3
inheritance children) system usage: CPU 0.00s/0.01u sec elapsed 0.12
sec
LOG:  automatic analyze of table postgres.public.p4 system usage:
CPU 0.00s/0.00u sec elapsed 0.00 sec

notice that p4 is not included as an inheritance child, even though it
most surely is. Why is p4 reported, when p1, p2 and p3 are not?

and I notice psql reports this incorrectly also

postgres=# \d+ foo

  Table "public.foo"
  Column  |  Type   | Modifiers | Storage | Stats target | Description

--+-+---+-+--+-

 ?column? | integer |   | plain   |  |

Child tables: p1,
  p2,
  p3

No mention of grandchildren...
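
For reference, a layout that reproduces the hierarchy being discussed can be
set up like this (a sketch only; the column definitions are arbitrary, not the
original test case):

create table foo (a int);
create table p1 () inherits (foo);
create table p2 () inherits (foo);
create table p3 () inherits (foo);
create table p4 () inherits (p2);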

Not your fault, but this patch doesn't sufficiently improve the
situation to commit it, yet.

Sorry, patch returned with feedback, for now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] New Event Trigger: table_rewrite

2014-11-15 Thread Simon Riggs
On 7 November 2014 12:35, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 It would be more useful to work on the applications of this

 1. INSERT into a table
 * Action start time
 * Schema
 * Tablename
 * Number of blocks in table
 which would then allow you to do these things: run an assessment report
 showing which tables would be rewritten.

 That should be done by the user, from within his Event Trigger code. For
 that to be possible, the previous patch was missing a way to expose the
 OID of the table being rewritten, I've now added support for that.

 2. Get access to number of blocks, so you could limit rewrites only to
 smaller tables by putting a block limit in place.

 Also, I did expand the docs to fully cover your practical use case of a
 table_rewrite Event Trigger implementing such a table rewrite policy.

That looks complete, very useful and well documented.

I'm looking to commit this tomorrow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-15 Thread Peter Geoghegan
On Wed, Nov 12, 2014 at 12:59 PM, Robert Haas robertmh...@gmail.com wrote:
 On that topic, I think there's unanimous consensus against the design
 where equally-distant matches are treated differently based on whether
 they are in the same RTE or different RTEs.  I think you need to
 change that if you want to get anywhere with this.  On a related note,
 the use of the additional parameter AttrNumber closest[2] to
 searchRangeTableForCol() and of the additional parameters AttrNumber
 *matchedatt and int *distance to scanRTEForColumn() is less than
 self-documenting.  I suggest creating a structure called something
 like FuzzyAttrMatchState and passing a pointer to it down to both
 functions.

Attached patch incorporates this feedback.

The only user-visible difference between this revision and the
previous revision is that it's quite possible for two suggestion to
originate from the same RTE (there is exactly one change in the
regression test's expected output as compared to the last revision for
this reason. The regression tests are otherwise unchanged). It's still
not possible to see more than 2 suggestions under any circumstances,
no matter where they might have originated from, which I think is
appropriate -- we continue to not present any HINT in the event of 3
or more equidistant matches.

I think that the restructuring required to pass around a state
variable has resulted in somewhat clearer code.
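
As a quick illustration of the intended user-visible behaviour, something
along these lines (a sketch; the exact hint wording is whatever the patch
settles on):

create table orders (order_id int, order_date date);
select order_data from orders;
-- ERROR:  column "order_data" does not exist
-- HINT:  Perhaps you meant to reference the column "orders.order_date".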

-- 
Peter Geoghegan
From 0aef5253f10ebb1ee5bbcc73782eff1352c7ab84 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Wed, 12 Nov 2014 15:31:37 -0800
Subject: [PATCH] Levenshtein distance column HINT

Add a new HINT -- a guess as to what column the user might have intended
to reference, to be shown in various contexts where an
ERRCODE_UNDEFINED_COLUMN error is raised.  The user will see this HINT
when he or she fat-fingers a column reference in an ad-hoc SQL query, or
incorrectly pluralizes or fails to pluralize a column reference, or
incorrectly omits or includes an underscore or other punctuation
character.

The HINT suggests a column in the range table with the lowest
Levenshtein distance, or the tied-for-best pair of matching columns in
the event of there being exactly two equally likely candidates (these
may come from multiple RTEs, or the same RTE).  Limiting to two the
number of cases where multiple equally likely suggestions are all
offered at once (i.e.  giving no hint when the number of equally likely
candidates exceeds two) is a measure against suggestions that are of low
quality in an absolute sense.

A further, final measure is taken against suggestions that are of low
absolute quality:  If the distance exceeds a normalized distance
threshold, no suggestion is given.
---
 src/backend/parser/parse_expr.c   |   9 +-
 src/backend/parser/parse_func.c   |   2 +-
 src/backend/parser/parse_relation.c   | 345 +++---
 src/backend/utils/adt/levenshtein.c   |   9 +
 src/include/parser/parse_relation.h   |  20 +-
 src/test/regress/expected/alter_table.out |   8 +
 src/test/regress/expected/join.out|  39 
 src/test/regress/expected/plpgsql.out |   1 +
 src/test/regress/expected/rowtypes.out|   1 +
 src/test/regress/expected/rules.out   |   1 +
 src/test/regress/expected/without_oid.out |   2 +
 src/test/regress/sql/join.sql |  24 +++
 12 files changed, 421 insertions(+), 40 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 4a8aaf6..a77a3a0 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -621,7 +621,8 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
 colname = strVal(field2);
 
 /* Try to identify as a column of the RTE */
-				node = scanRTEForColumn(pstate, rte, colname, cref->location);
+				node = scanRTEForColumn(pstate, rte, colname, cref->location,
+										NULL);
 if (node == NULL)
 {
 	/* Try it as a function call on the whole row */
@@ -666,7 +667,8 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
 colname = strVal(field3);
 
 /* Try to identify as a column of the RTE */
-				node = scanRTEForColumn(pstate, rte, colname, cref->location);
+				node = scanRTEForColumn(pstate, rte, colname, cref->location,
+										NULL);
 if (node == NULL)
 {
 	/* Try it as a function call on the whole row */
@@ -724,7 +726,8 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
 colname = strVal(field4);
 
 /* Try to identify as a column of the RTE */
-				node = scanRTEForColumn(pstate, rte, colname, cref->location);
+				node = scanRTEForColumn(pstate, rte, colname, cref->location,
+										NULL);
 if (node == NULL)
 {
 	/* Try it as a function call on the whole row */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9ebd3fd..472e15e 100644
--- a/src/backend/parser/parse_func.c
+++ 

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-15 Thread Alvaro Herrera
Amit Kapila wrote:
 On Sat, Nov 15, 2014 at 12:03 AM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:
 
  Amit Kapila wrote:

  I think symlink_label isn't a very good name.  This file is not a label
  in the sense that backup_label is; it seems more a catalog to me.  And
  it's not, in essence, about symlinks either, but rather about
  tablespaces.  I would name it following the term tablespace catalog or
  some variation thereof.
 
 This file is going to provide the symlink path for each tablespace, so
 it not be bad to have that in file name.  I agree with you that it's more
 about tablespaces.  So how about:
 
 tablespace_symlink
 symlink_tablespace
 tablespace_info

I think the fact that we use symlinks is an implementation detail;
aren't they actually junction points, not symlinks, in some Windows
cases?  The pg_tablespace catalog uses (or used to use)
spclocation for this, not spcsymlink.

  One use case mentioned upthread is having the clone be created in the
  same machine as the source server.  Does your proposal help with it?
 
 Sorry, but I am not getting which proposal exactly you are referring here,
 Could you explain in more detail?

In the first message of this thread[1], Noah said:

: A pg_basebackup -Fp running on the same system as the target cluster will
: fail in the presence of tablespaces; it would backup each tablespace to its
: original path, and those paths are in use locally for the very originals we're
: copying.

 In general, if a user took the backup (in tar format) using pg_basebackup,
 this patch makes it possible to restore such a backup even on the same server.

I must be misunderstanding either you or Noah.

[1] 
http://www.postgresql.org/message-id/20130801161519.ga334...@tornado.leadboat.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-15 Thread Amit Kapila
On Sun, Nov 16, 2014 at 6:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:
 Amit Kapila wrote:
  On Sat, Nov 15, 2014 at 12:03 AM, Alvaro Herrera 
alvhe...@2ndquadrant.com
  wrote:
  
   Amit Kapila wrote:

   I think symlink_label isn't a very good name.  This file is not a
label
   in the sense that backup_label is; it seems more a catalog to me.
And
   it's not, in essence, about symlinks either, but rather about
   tablespaces.  I would name it following the term tablespace catalog
or
   some variation thereof.
 
  This file is going to provide the symlink path for each tablespace, so
  it not be bad to have that in file name.  I agree with you that it's
more
  about tablespaces.  So how about:
 
  tablespace_symlink
  symlink_tablespace
  tablespace_info

 I think the fact that we use symlinks is an implementation detail;
 aren't them actually junction points, not symlinks, in some Windows
 cases?

Right, but they provide the same functionality as symlinks, and now we
are even planning to provide this feature for both Linux and Windows, as
both Tom and Robert seem to feel it's better that way.  Anyhow,
I think naming any entity generally differs based on an individual's
perspective, so we can go with the name which appeals to more people.
In case nobody else has any preference, I will change it to what both
of us can agree upon (either 'tablespace catalog', 'tablespace_info', ...).


   One use case mentioned upthread is having the clone be created in the
   same machine as the source server.  Does your proposal help with it?
 
  Sorry, but I am not getting which proposal exactly you are referring
here,
  Could you explain in more detail?

 In the first message of this thread[1], Noah said:

 : A pg_basebackup -Fp running on the same system as the target cluster
will
 : fail in the presence of tablespaces; it would backup each tablespace to
its
 : original path, and those paths are in use locally for the very
originals we're
 : copying.


That use case got addressed with the -T option, with which the user can
relocate a tablespace directory (commit fb05f3c: pg_basebackup: Add support
for relocating tablespaces).

  In general, if a user took the backup (in tar format) using pg_basebackup,
  this patch makes it possible to restore such a backup even on the same server.

 I must be misunderstanding either you or Noah.


Does the above information address your question?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] printing table in asciidoc with psql

2014-11-15 Thread Pavel Stehule
Hi

I can fix the reported bugs today or tomorrow.

Regards

Pavel

2014-11-14 21:00 GMT+01:00 Szymon Guz mabew...@gmail.com:


 On 14 November 2014 20:57, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

 Is anyone going to submit a new version of this patch?




 Hi Alvaro,
 due to family issues I will not be able to work on it for the next 10 days.

 regards,
 Szymon



Re: [HACKERS] New Event Trigger: table_rewrite

2014-11-15 Thread Michael Paquier
On Sun, Nov 16, 2014 at 8:57 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 7 November 2014 12:35, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 It would be more useful to work on the applications of this

 1. INSERT into a table
 * Action start time
 * Schema
 * Tablename
 * Number of blocks in table
 which would then allow you to do these things run an assessment report
 showing which tables would be rewritten.

 That should be done by the user, from within his Event Trigger code. For
 that to be possible, the previous patch was missing a way to expose the
 OID of the table being rewritten, I've now added support for that.

 2. Get access to number of blocks, so you could limit rewrites only to
 smaller tables by putting a block limit in place.

 Also, I did expand the docs to fully cover your practical use case of a
 table_rewrite Event Trigger implementing such a table rewrite policy.

 That looks complete, very useful and well documented.

 I'm looking to commit this tomorrow.
Patch applies, with many hunks though. Patch and documentation compile
without warnings, passing make check-world.

Some comments:
1) This patch is authorizing VACUUM and CLUSTER to use the event
triggers ddl_command_start and ddl_command_end, but aren't those
commands actually not DDLs but control commands?
2) The documentation of src/sgml/event-trigger.sgml can be improved;
in particular I think that the example function should consistently use
upper-case letters for reserved keywords, and also this bit:
"you're not allowed to rewrite the table foo"
should be rewritten to something like:
"Rewrite of table foo not allowed"
(a rough sketch of such a policy function follows this list)
3) A typo, missing a plural:
provides two built-in event trigger helper functionS
4) pg_event_trigger_table_rewrite_oid is able to return only one OID,
which is the one of the table being rewritten, and it is limited to
one OID because VACUUM, CLUSTER and ALTER TABLE can only run on one
object at the same time in a single transaction. What about thinking
that we may have in the future multiple objects rewritten in a single
transaction, hence multiple OIDs could be fetched?
5) parsetree is passed to cluster_rel only for
EventTriggerTableRewrite. I am not sure if there are any extensions
using cluster_rel as-is, but wouldn't it make more sense to call
EventTriggerTableRewrite before the calls to cluster_rel instead? ISTM
that this patch is breaking cluster_rel's way of doing things.
6) in_table_rewrite seems unnecessary.
 typedef struct EventTriggerQueryState
 {
slist_head  SQLDropList;
boolin_sql_drop;
+   boolin_table_rewrite;
+   Oid tableOid;
We could simplify that by renaming tableOid to rewriteTableOid or
rewriteObjOid and check if its value is InvalidOid to determine if the
event table_rewrite is in use or not. Each code path setting those
variables sets them all the time similarly:
+   state-in_table_rewrite = false;
+   state-tableOid = InvalidOid;
And if tableOid is InvalidOid, in_table_rewrite is false. If it is a
valid Oid, in_table_rewrite is set to true.
7) table_rewrite is kicked in ALTER TABLE only when ATRewriteTables is
used. The list of commands that actually go through this code path
should be clarified in the documentation IMO to help the user
apprehend this function.
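
For context, here is a rough sketch of the kind of policy function being
discussed in 2) and 4) above, based on the helper function named there (not
necessarily the final committed API):

CREATE OR REPLACE FUNCTION no_rewrite_of_big_tables()
RETURNS event_trigger LANGUAGE plpgsql AS $$
DECLARE
    rel oid := pg_event_trigger_table_rewrite_oid();
BEGIN
    -- block rewrites of tables bigger than ~100MB, let smaller ones proceed
    IF pg_relation_size(rel::regclass) > 100 * 1024 * 1024 THEN
        RAISE EXCEPTION 'rewrite of table % not allowed', rel::regclass;
    END IF;
END;
$$;

CREATE EVENT TRIGGER no_big_rewrites ON table_rewrite
    EXECUTE PROCEDURE no_rewrite_of_big_tables();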

Note that this patch has been submitted but there has been no real
discussion around it. This seems a bit too fast to commit it, no?
Regards,
-- 
Michael

