Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.
On Sat, Nov 15, 2014 at 3:42 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-11-15 03:25:16 +0900, Fujii Masao wrote: On Fri, Nov 14, 2014 at 7:22 PM, furu...@pm.nttdata.co.jp wrote: pg_ctl stop doesn't work properly if the --slot option is specified, because WAL is flushed only when the WAL file is switched. These processes still continue even after the postmaster has exited: pg_receivexlog, walsender and logger. I could reproduce this problem. At normal shutdown, walsender keeps waiting for the last WAL record to be replicated and flushed in pg_receivexlog. But pg_receivexlog issues the sync command only when the WAL file is switched. Thus, since pg_receivexlog may never flush the last WAL record, walsender may have to keep waiting indefinitely. Right. It is surprising that nobody complained about that before; pg_receivexlog was released two years ago. pg_recvlogical handles this problem by calling fsync() when it receives a request for an immediate reply from the server. That is, at shutdown, walsender sends the request, pg_receivexlog receives it, flushes the last WAL record, and sends the flush location back to the server. Since walsender can see that the last WAL record was successfully flushed in pg_receivexlog, it can exit cleanly. One idea for the problem is to introduce the same logic pg_recvlogical has into pg_receivexlog. Thought? Sounds sane to me. Are you looking into doing that? Yep, sounds like a good thing to do when the master requests an answer from the client in the keepalive message. Something like the patch attached would make the deal. -- Michael

diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index c6c90fb..b93d52b 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -149,6 +149,34 @@ open_walfile(XLogRecPtr startpoint, uint32 timeline, char *basedir,
 }
 
 /*
+ * Flush the current WAL file to disk and update the last WAL flush
+ * positions as well as the last fsync timestamp. On failure, print
+ * a message to stderr and return false, otherwise return true.
+ */
+static bool
+fsync_walfile(XLogRecPtr blockpos, int64 timestamp)
+{
+	/* nothing to do if no WAL file open */
+	if (walfile == -1)
+		return true;
+
+	/* nothing to do if data has already been flushed */
+	if (blockpos <= lastFlushPosition)
+		return true;
+
+	if (fsync(walfile) != 0)
+	{
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+				progname, current_walfile_name, strerror(errno));
+		return false;
+	}
+
+	lastFlushPosition = blockpos;
+	last_fsync = timestamp;
+	return true;
+}
+
+/*
  * Close the current WAL file (if open), and rename it to the correct
  * filename if it's complete. On failure, prints an error message to stderr
  * and returns false, otherwise returns true.
@@ -787,21 +815,12 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 		 * If fsync_interval has elapsed since last WAL flush and we've written
 		 * some WAL data, flush them to disk.
 		 */
-		if (lastFlushPosition < blockpos &&
-			walfile != -1 &&
-			((fsync_interval > 0 &&
-			  feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
-			 fsync_interval < 0))
+		if ((fsync_interval > 0 &&
+			 feTimestampDifferenceExceeds(last_fsync, now, fsync_interval)) ||
+			fsync_interval < 0)
 		{
-			if (fsync(walfile) != 0)
-			{
-				fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
-						progname, current_walfile_name, strerror(errno));
+			if (!fsync_walfile(blockpos, now))
 				goto error;
-			}
-
-			lastFlushPosition = blockpos;
-			last_fsync = now;
 		}
 
 		/*
@@ -999,7 +1018,6 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 {
 	int			pos;
 	bool		replyRequested;
-	int64		now;
 
 	/*
 	 * Parse the keepalive message, enclosed in the CopyData message.
@@ -1021,7 +1039,12 @@ ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 	/* If the server requested an immediate reply, send one. */
 	if (replyRequested && still_sending)
 	{
-		now = feGetCurrentTimestamp();
+		int64		now = feGetCurrentTimestamp();
+
+		/* fsync data, so a fresh flush position is sent */
+		if (!fsync_walfile(blockpos, now))
+			return false;
+
 		if (!sendFeedback(conn, blockpos, now, false))
 			return false;
 		*last_status = now;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
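[Editorial aside: the shutdown interaction the patch addresses can be modelled in a few lines. This is a toy Python sketch with invented names, not the receivelog.c code; it only shows why flushing before answering a reply-requested keepalive lets walsender see an up-to-date flush position and exit.]

```python
# Toy model of the fix: the receiver must fsync before reporting its
# flush position in a reply-requested keepalive. Names are illustrative,
# not the real receivelog.c symbols.

class Receiver:
    def __init__(self):
        self.written_pos = 0   # last WAL byte written to the file
        self.flushed_pos = 0   # last WAL byte fsync'ed to disk

    def write_wal(self, nbytes):
        self.written_pos += nbytes

    def fsync_walfile(self):
        # analogue of the patch's fsync_walfile(): no-op if already flushed
        if self.written_pos <= self.flushed_pos:
            return
        self.flushed_pos = self.written_pos  # pretend fsync() succeeded

    def handle_keepalive(self, reply_requested):
        # The patch's change: fsync *before* reporting the flush position,
        # so the reply covers the final WAL record at shutdown.
        if reply_requested:
            self.fsync_walfile()
        return self.flushed_pos

r = Receiver()
r.write_wal(8192)
# Without the fsync the reply would report flushed_pos == 0 and the
# walsender would keep waiting; with it, the reply is current.
assert r.handle_keepalive(reply_requested=True) == 8192
```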
Re: [HACKERS] tracking commit timestamps
On 15 November 2014 04:32, Steve Singer st...@ssinger.info wrote: The use cases I'm talking about aren't really replication related. Often I have come across systems that want to do something such as 'select * from orders where X > the_last_row_I_saw order by X' and then do further processing on the order. Yes, existing facilities provide mechanisms for different types of application change queues. If you want to write a processing queue in SQL, that isn't the best way. You'll need some way to keep track of whether or not it's been successfully processed. That's either a column in the table, or a column in a queue table maintained by triggers, with the row write-locked on read. You can then have multiple readers from this queue using the new SKIP LOCKED feature, which was specifically designed to facilitate that. Logical decoding was intended for much more than just replication. It provides commit-order access to changed data in a form that is both usable and efficient for high-volume application needs. I don't see any reason to add LSN into an SLRU updated at commit to support those application needs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add CREATE support to event triggers
On Fri, Nov 14, 2014 at 1:18 PM, Andres Freund and...@2ndquadrant.com wrote: I think it's a good idea to structure independent features in a way that other solutions can reuse them. But I sure as hell can't force them to use it - especially as there's unfortunately not too much development going on in the existing logical replication solutions for postgres. Btw, I really think there's quite legitimate use cases for this besides logical replication solutions - e.g. schema tracking is quite a sensible use case. Well, as I already said, despite my doubts about the general utility of this feature, I'm willing to see us take it IF we have a testing framework that will reliably catch bugs, including bugs of omission. Without that, I'm very confident it's going to be a maintenance nightmare, and I believe you admitted yourself that that concern was reasonable. Can you please hint at some other workable design? I don't think there really is anything else. I think this really depends on what you mean by anything else. Any DDL replication solution is necessarily going to involve the following steps: 1. Define some set of primitives such that any imaginable DDL operation can be expressed as a series of those primitives. 2. Find a way to capture those events as they happen. 3. Serialize them into some wire format and transport that format to the replica. 4. Apply them, possibly coordinating in some way with the master so that the user's original request fails if the apply fails. There are meaningful choices at every step. 
You're proposing that the primitives should be anything that can be expressed as a complete SQL command against a single object (I think - what are you going to emit for an ALL IN SCHEMA op - that thing itself, or a similar operation against each object in the schema?); that the capture mechanism should be an event trigger that inserts into a queue table; that the serialization format should be a JSON language designed to allow reassembly of the corresponding SQL statement; and that the apply coordination mechanism should be 2PC. But none of that is written in stone. As far as deciding what primitives to use, I think the principal decision to be made here is as to the appropriate level of granularity. For example, CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b < 42)) could be emitted as a single event saying that a table was created. But it could also be emitted as create-table (foo), add-column (foo, a, int), add-column (foo, b, int), add-column-default (a, 1), add-check-constraint (foo, b < 42). The appeal of a more fine-grained set of primitives is that there might be fewer of them, and that each of them might be simpler; one of the things that makes physical replication so reliable is that its primitives are very simple and thus easy to verify. The possible event-capture mechanisms seem to be to have either (a) an event trigger or (b) a hook function in some yet-to-be-defined place or (c) core code, which will either (i) write each event to a table, (ii) write each event directly into WAL, or perhaps (iii) write it someplace else (file? in-memory queue? network socket?). There are lots of possible serialization formats.
Coordinating with the master could involve 2PC, as you propose; or trying to somehow validate that a given series of events is a valid state transformation based on the starting state on the standby before doing the operation on the master; or the use of a distributed transaction coordinated by something like PG-XC's global transaction manager; or you can skip it and hope for the best. In addition to the decisions above, you can try to prevent failures by restricting certain changes from happening, or you can let users change what they like and hope for the best. Different solutions can have different mechanisms for controlling which objects are under replication and which changes are not; or even allowing some individual DDL statements to opt out of replication while forcing others to participate. Administratively, solutions can be built to facilitate easy replication of an entire database to another node, or more specific applications like sharding, where creating a table on a master node creates child tables on a bunch of slave nodes, but they're not all identical, because we're partitioning the data so that only some of it will be on each node - thus the constraints and so on will be different. BDR has one set of answers to all of these questions, and despite my worries about a few points here and there, they are not stupid answers. But they are not the ONLY answers either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
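[Editorial aside: the granularity choice Robert describes, one coarse event versus several fine-grained primitives, can be made concrete with a small sketch. Python with invented event names, purely illustrative of the decomposition, not any proposed PostgreSQL API.]

```python
# Decomposing one CREATE TABLE into the fine-grained primitives discussed
# above. Event names ("create-table", "add-column", ...) are the ones
# used informally in the mail, not a real wire format.
def decompose_create_table(table, columns, checks):
    """columns: (name, type, default-or-None); checks: expression strings."""
    events = [("create-table", table)]
    for name, typ, default in columns:
        events.append(("add-column", table, name, typ))
        if default is not None:
            events.append(("add-column-default", table, name, default))
    for expr in checks:
        events.append(("add-check-constraint", table, expr))
    return events

# CREATE TABLE foo (a int DEFAULT 1, b int, CHECK (b < 42))
evts = decompose_create_table(
    "foo",
    [("a", "int", "1"), ("b", "int", None)],
    ["b < 42"])
assert [e[0] for e in evts] == [
    "create-table", "add-column", "add-column-default",
    "add-column", "add-check-constraint"]
```

The coarse-grained alternative would emit a single `("create-table", ...)` event carrying the whole definition; the trade-off is fewer, simpler primitives versus fewer, richer events to verify.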
Re: [HACKERS] Failback to old master
On 12/11/14 14:28, Ants Aasma wrote: On Tue, Nov 11, 2014 at 11:52 PM, Maeldron T. maeld...@gmail.com wrote: As far as I remember (I can't test it right now but I am 99% sure) promoting the slave makes it impossible to connect the old master to the new one without making a base_backup. The reason is the timeline change. It complains. A safely shut down master (-m fast is safe) can be safely restarted as a slave to the newly promoted master. Fast shutdown shuts down all normal connections, does a shutdown checkpoint and then waits for this checkpoint to be replicated to all active streaming clients. Promoting a slave to master creates a timeline switch, which prior to version 9.3 was only possible to replicate using the archive mechanism. As of version 9.3 you don't need to configure archiving to follow timeline switches; just add a recovery.conf to the old master to start it up as a slave and it will fetch everything it needs from the new master. I took your advice and I understood that removing the recovery.conf followed by a restart is wrong. I will not do that on my production servers. However, I can't make it work with promotion. What did I do wrong? It was 9.4beta3.

mkdir 1
mkdir 2
initdb -D 1/
edit config: change port, wal_level to hot_standby, hot_standby to on, max_wal_senders=7, wal_keep_segments=100, uncomment replication in hba.conf
pg_ctl -D 1/ start
createdb -p 5433
psql -p 5433
pg_basebackup -p 5433 -R -D 2/
mcedit 2/postgresql.conf (change port)
chmod -R 700 1
chmod -R 700 2
pg_ctl -D 2/ start
psql -p 5433
psql -p 5434
everything works
pg_ctl -D 1/ stop
pg_ctl -D 2/ promote
psql -p 5434
cp 2/recovery.done 1/recovery.conf
mcedit 1/recovery.conf (change port)
pg_ctl -D 1/ start

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/3000AE0.
LOG: restarted WAL streaming at 0/300 on timeline 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/3000AE0.
This is what I experienced in the past when I tried with promote. The old master disconnects from the new. What am I missing? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: multivariate statistics / proof of concept
Tomas Vondra t...@fuzzy.cz wrote: On 13 November 2014, 16:51, Katharina Büchse wrote: On 13.11.2014 14:11, Tomas Vondra wrote: The only place where I think this might work are the associative rules. It's simple to specify rules like (ZIP code implies city) and we could even do some simple check against the data to see if it actually makes sense (and 'disable' the rule if not). and even this simple example has its limits, at least in Germany ZIP codes are not unique for rural areas, where several villages have the same ZIP code. as you point out most real-world data either contain bugs or are inherently imperfect (we have the same kind of ZIP/city inconsistencies in Czech). You can have lots of fun with U.S. zip codes, too. Just on the nominally Madison, Wisconsin zip codes (those starting with 537), there are several exceptions:

select zipcode, city, locationtype
  from zipcode
  where zipcode like '537%'
    and Decommisioned = 'false'
    and zipcodetype = 'STANDARD'
    and locationtype in ('PRIMARY', 'ACCEPTABLE')
  order by zipcode, city;

 zipcode |   city    | locationtype
---------+-----------+--------------
 53703   | MADISON   | PRIMARY
 53704   | MADISON   | PRIMARY
 53705   | MADISON   | PRIMARY
 53706   | MADISON   | PRIMARY
 53711   | FITCHBURG | ACCEPTABLE
 53711   | MADISON   | PRIMARY
 53713   | FITCHBURG | ACCEPTABLE
 53713   | MADISON   | PRIMARY
 53713   | MONONA    | ACCEPTABLE
 53714   | MADISON   | PRIMARY
 53714   | MONONA    | ACCEPTABLE
 53715   | MADISON   | PRIMARY
 53716   | MADISON   | PRIMARY
 53716   | MONONA    | ACCEPTABLE
 53717   | MADISON   | PRIMARY
 53718   | MADISON   | PRIMARY
 53719   | FITCHBURG | ACCEPTABLE
 53719   | MADISON   | PRIMARY
 53725   | MADISON   | PRIMARY
 53726   | MADISON   | PRIMARY
 53744   | MADISON   | PRIMARY
(21 rows)

If you eliminate the quals besides the zipcode column you get 61 rows and it gets much stranger, with legal municipalities that are completely surrounded by Madison that the postal service would rather you didn't use in addressing your envelopes, but they have to deliver to anyway, and organizations inside Madison receiving enough mail to (literally) have their own zip code -- where the postal service allows the organization name as a deliverable city. If you want to have your own fun with this data, you can download it here: http://federalgovernmentzipcodes.us/free-zipcode-database.csv I was able to load it into PostgreSQL with this:

create table zipcode (
    recordnumber integer not null,
    zipcode text not null,
    zipcodetype text not null,
    city text not null,
    state text not null,
    locationtype text not null,
    lat double precision,
    long double precision,
    xaxis double precision not null,
    yaxis double precision not null,
    zaxis double precision not null,
    worldregion text not null,
    country text not null,
    locationtext text,
    location text,
    decommisioned text not null,
    taxreturnsfiled bigint,
    estimatedpopulation bigint,
    totalwages bigint,
    notes text
);
comment on column zipcode.zipcode is 'Zipcode or military postal code(FPO/APO)';
comment on column zipcode.zipcodetype is 'Standard, PO BOX Only, Unique, Military(implies APO or FPO)';
comment on column zipcode.city is 'offical city name(s)';
comment on column zipcode.state is 'offical state, territory, or quasi-state (AA, AE, AP) abbreviation code';
comment on column zipcode.locationtype is 'Primary, Acceptable,Not Acceptable';
comment on column zipcode.lat is 'Decimal Latitude, if available';
comment on column zipcode.long is 'Decimal Longitude, if available';
comment on column zipcode.location is 'Standard Display (eg Phoenix, AZ ; Pago Pago, AS ; Melbourne, AU )';
comment on column zipcode.decommisioned is 'If Primary location, Yes implies historical Zipcode, No Implies current Zipcode; If not Primary, Yes implies Historical Placename';
comment on column zipcode.taxreturnsfiled is 'Number of Individual Tax Returns Filed in 2008';
copy zipcode from 'filepath' with (format csv, header);
alter table zipcode add primary key (recordnumber);
create unique index zipcode_city on zipcode (zipcode, city);

I bet there are all sorts of correlation possibilities with,
for example, latitude and longitude and other variables. With 81831 rows and so many correlations among the columns, it might be a useful data set to test with. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: multivariate statistics / proof of concept
On 15.11.2014 18:49, Kevin Grittner If you eliminate the quals besides the zipcode column you get 61 rows and it gets much stranger, with legal municipalities that are completely surrounded by Madison that the postal service would rather you didn't use in addressing your envelopes, but they have to deliver to anyway, and organizations inside Madison receiving enough mail to (literally) have their own zip code -- where the postal service allows the organization name as a deliverable city. If you want to have your own fun with this data, you can download it here: http://federalgovernmentzipcodes.us/free-zipcode-database.csv ... I bet there are all sorts of correlation possibilities with, for example, latitude and longitude and other variables. With 81831 rows and so many correlations among the columns, it might be a useful data set to test with. Thanks for the link. I've been looking for a good dataset with such data, and this one is by far the best one. The current version of the patch supports only data types passed by value (i.e. no varlena types - text, ), which means it's impossible to build multivariate stats on some of the interesting columns (state, city, ...). I guess it's time to start working on removing this limitation. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
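[Editorial aside: the "associative rule" check mentioned earlier in this thread, specify "ZIP implies city" and check it against the data, disabling the rule if it doesn't hold, can be sketched simply. Python, with an invented threshold; this is not the patch's implementation.]

```python
# Measure how strongly zipcode determines city over a set of rows and
# decide whether the rule is usable. Threshold and names are invented.
from collections import Counter, defaultdict

def rule_support(rows, threshold=0.95):
    """rows: (zipcode, city) pairs. Returns (valid_fraction, usable)."""
    by_zip = defaultdict(Counter)
    for zipcode, city in rows:
        by_zip[zipcode][city] += 1
    # Rows agreeing with the most common city for their zip "support" the rule.
    consistent = sum(c.most_common(1)[0][1] for c in by_zip.values())
    total = sum(sum(c.values()) for c in by_zip.values())
    frac = consistent / total
    return frac, frac >= threshold

rows = [("53703", "MADISON")] * 98 + [("53711", "FITCHBURG")] * 2
frac, usable = rule_support(rows)
assert usable and frac == 1.0  # one city per zip here, rule holds

# Add genuine exceptions (53711 also maps to MADISON) and the rule weakens:
rows += [("53711", "MADISON")] * 10
frac2, _ = rule_support(rows)
assert frac2 < 1.0
```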
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Sat, Nov 15, 2014 at 2:23 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: Btw, perhaps this diff should be pushed as a different patch as this is a rather different thing:

-	if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+	if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
 		!smgrexists(indexRelation->rd_smgr, INIT_FORKNUM))

As this is a correctness fix, it does not seem necessary to back-patch it: the parent relation always has the same persistence as its indexes. There was an argument for doing it this way that only applies if this patch went in, but I can't remember now what it was. Anyway I pushed the patch after some slight additional changes. Thanks! Thanks! -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Patch to support SEMI and ANTI join removal
On 15 October 2014 11:03, David Rowley dgrowle...@gmail.com wrote: The explain analyze from the above query looks like:

test=# explain (analyze, costs off, timing off) select count(*) from t1 inner join t2 on t1.t2_id=t2.id;
                          QUERY PLAN
--------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=100 loops=1)
         ->  Seq Scan on t1 (actual rows=100 loops=1)
         ->  Index Only Scan using t2_pkey on t2 (never executed)
               Index Cond: (id = t1.t2_id)
               Heap Fetches: 0
 Execution time: 124.990 ms
(7 rows)

As you can see the scan on t2 never occurred. Very good, happy to see this happening (yay FKs!) and with PostgreSQL-style rigour. I've reviewed the patch from cold and I have a few comments. The plan you end up with here works quite differently from left outer join removal, where the join is simply absent. That inconsistency causes most of the other problems I see. I propose that we keep track of whether there are any potentially skippable joins at the top of the plan. When we begin execution we do a single if test to see if there is run-time work to do. If we pass the run-time tests we then descend the tree and prune the plan to completely remove unnecessary nodes. We end with an EXPLAIN and EXPLAIN ANALYZE that look like this:

                 QUERY PLAN
--------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Seq Scan on t1 (actual rows=100 loops=1)

Doing that removes all the overheads and complexity; it also matches how join removal currently works. The alternative is accepting some pretty horrible additional code in most join types, plus a small regression on nested loop joins, which I would have to call out as regrettably unacceptable. (Horrible in the sense that we don't want that code, not that David's code is poor.) The tests on the patch are pretty poor. We should use EXPLAINs to show a join removal that works and a join removal that fails, with a few of the main permutations.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 16 October 2014 02:26, Jeff Davis pg...@j-davis.com wrote: The inheritance is awkward anyway, though. If you create a tracked context as a child of an already-tracked context, allocations in the newer one won't count against the original. I don't see a way around that without introducing even more performance problems. This seems to have reached impasse, which is a shame. Let me throw out a few questions to see if that might help. Do I understand correctly that we are trying to account for exact memory usage at palloc/pfree time? Why?? Surely we just want to keep track of the big chunks? Does this need to be exact? How exact does it need to be? Or alternatively, can't we just sample the allocations to reduce the overhead? Are there some assumptions we can make about micro-contexts never needing to be tracked at all? Jeff seems not too bothered by inheritance, whereas Tomas thinks it's essential. Perhaps there is a middle ground, depending upon the role of the context? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
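[Editorial aside: Simon's "just track the big chunks" suggestion can be sketched in a few lines, including how it sidesteps the per-palloc cost and how parent propagation would address the inheritance complaint. Python toy model with an invented threshold; not MemoryContext code.]

```python
# Account only "big chunks": small allocations pay no bookkeeping cost at
# all, large ones are charged to the context and all its ancestors.
# BIG_CHUNK and class names are invented for the sketch.
BIG_CHUNK = 8192  # bytes; only allocations at least this large are tracked

class TrackedContext:
    def __init__(self, parent=None):
        self.accounted = 0
        self.parent = parent

    def palloc(self, size):
        if size >= BIG_CHUNK:
            ctx = self
            while ctx is not None:     # propagate to ancestors, so a child
                ctx.accounted += size  # context counts against its parent
                ctx = ctx.parent

    def pfree(self, size):
        if size >= BIG_CHUNK:
            ctx = self
            while ctx is not None:
                ctx.accounted -= size
                ctx = ctx.parent

top = TrackedContext()
child = TrackedContext(parent=top)
child.palloc(16384)  # tracked, visible in both contexts
child.palloc(64)     # below threshold: ignored, no bookkeeping cost
assert child.accounted == 16384 and top.accounted == 16384
child.pfree(16384)
assert top.accounted == 0
```

The obvious cost is that totals are approximate (small chunks are invisible), which is exactly the "how exact does it need to be?" question above.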
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
On 16 October 2014 20:31, Michael Banck michael.ba...@credativ.de wrote: I'll attach it to the next commitfest and see whether anybody likes it. Not much... We may decide we wanted to always-log shutdown checkpoints. I'm neutral about that, but I can see the logic. But if we did, we would use exactly the same log message as if log_checkpoints = on. Having two different message texts is just confusing. I don't see the point of logging "waiting for clients to disconnect" since we might not wait very long. We do already log the type of shutdown received. A few things seem worth pursuing...

* Logging a message that shows how many buffers need to be written for a shutdown checkpoint. (We might even store some info that allows us to estimate a time to shutdown, later.) At present the "checkpoint starting: shutdown" message isn't much use. I can see we could split CheckpointGuts() into two, so we mark the buffers first, then report how many there are to write, then go back to do the writing and fsync in part two.

* Introducing a new shutdown mode that is genuinely smart. We send all backends a signal that will disconnect them *only* if they are idle and not in a transaction. I've never thought the current smart mode deserved its name.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
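[Editorial aside: the first suggestion, split the shutdown checkpoint into a "mark" pass that counts dirty buffers, log that count, then a "write" pass, can be sketched as below. Python toy model with invented names; the real change would be inside CheckpointGuts()/the checkpointer, not here.]

```python
# Two-pass shutdown checkpoint: pass 1 marks and counts dirty buffers so
# a useful message (and eventually a time-to-shutdown estimate) can be
# logged before any writing starts; pass 2 does the writes. Names invented.
def checkpoint_two_pass(buffers, log):
    to_write = [b for b in buffers if b["dirty"]]   # pass 1: mark + count
    log.append("shutdown checkpoint: %d buffers to write" % len(to_write))
    for b in to_write:                              # pass 2: write + fsync
        b["dirty"] = False
    return len(to_write)

log = []
bufs = [{"dirty": True}, {"dirty": False}, {"dirty": True}]
assert checkpoint_two_pass(bufs, log) == 2
assert log == ["shutdown checkpoint: 2 buffers to write"]
assert not any(b["dirty"] for b in bufs)
```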
[HACKERS] PgBench's \setrandom could be better
Hi, I've just been looking into the TPC-H benchmark with the intention to put together some scripts which can be run easily to output some run times for each of the 22 queries. I've not had a great deal of exposure to pgbench yet, but I had thought that it might be able to help me run these queries with the randomly chosen inputs that are described in the TPC-H spec. From the spec, under 2.4.6.3:

1. DATE is the first of January of a randomly selected year within [1993 .. 1997];
2. DISCOUNT is randomly selected within [0.02 .. 0.09];
3. QUANTITY is randomly selected within [24 .. 25].

With pgbench I can do: \setrandom qty 24 25 then throw a :qty into the query and have pgbench replace that with either 24 or 25. The problem is that this does not work for anything apart from integer types. I also can't seem to do anything with the date type. Perhaps it would be nice to be able to do something like: \setrandom date '1993-01-01' '1994-01-01' '1995-01-01' '1996-01-01' '1997-01-01' But this would be ambiguous if I wanted to choose from either 10 or 20, as: \setrandom var 10 20 would choose a random number between 10 and 20, not 10 or 20. I would imagine that code could be added to allow the random to choose a floating point number, if one or more of the inputs had a decimal point in them, but I can't see how to make it work with lists without changing the syntax. Perhaps "between" could become an optional keyword in there so that we could do: \setrandom var between 0 10 And for lists it could work like: \setrandom var list 'a' 'b' 'c' There should be no conflicts with the original syntax, as, as far as I'm aware, only an integer number can follow the variable name. I was planning on putting something together to help me with this, but I thought I'd post here first so that I might actually end up with something useful at the end of it. So my questions are: 1. Would anyone else find this useful? 2. Is my proposed syntax ok?
Can you think of anything better or more flexible? Regards David Rowley
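[Editorial aside: the proposed semantics can be pinned down with a small model. Python sketch of the proposal only, not pgbench code: "between" picks uniformly in a range, switching to float if either bound contains a decimal point, and "list" picks one element.]

```python
# Model of the proposed \setrandom extensions. Function and mode names
# mirror the proposal informally; none of this exists in pgbench.
import random

def setrandom(mode, args, rng=random):
    if mode == "between":
        lo, hi = args
        if "." in lo or "." in hi:          # decimal point => float range
            return rng.uniform(float(lo), float(hi))
        return rng.randint(int(lo), int(hi))
    if mode == "list":
        return rng.choice(args)             # pick one element verbatim
    raise ValueError("unknown mode: %s" % mode)

random.seed(0)
q = setrandom("between", ["24", "25"])      # QUANTITY
assert q in (24, 25)
d = setrandom("between", ["0.02", "0.09"])  # DISCOUNT
assert 0.02 <= d <= 0.09
date = setrandom("list", ["1993-01-01", "1994-01-01", "1995-01-01",
                          "1996-01-01", "1997-01-01"])
assert date.endswith("-01-01")              # DATE: first of January
```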
Re: [HACKERS] controlling psql's use of the pager a bit more
On 11/13/2014 11:41 AM, Andrew Dunstan wrote: On 11/13/2014 11:09 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: I often get annoyed because psql is a bit too aggressive when it decides whether to put output through the pager, and the only way to avoid this is to turn the pager off (in which case your next query might dump many thousands of lines to the screen). I'd like a way to be able to specify a minimum number of lines of output before psql would invoke the pager, rather than just always using the terminal window size. Are you saying you'd want to set the threshold to *more* than the window height? Why? Because I might be quite happy with 100 or 200 lines I can just scroll in my terminal's scroll buffer, but want to use the pager for more than that. This is useful especially if I want to scroll back and see the results from a query or two ago. This patch shows more or less what I had in mind. However, there is more work to do. As Tom noted upthread, psql's calculation of the number of lines is pretty bad. For example, if I do: \pset pager_min_lines 100 select * from generate_series(1,50); the pager still gets invoked, which is unfortunate to say the least. So I'm going to take a peek at that.
cheers andrew

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 26089352..b16149a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -786,7 +786,7 @@ exec_command(const char *cmd,
 			opt[--len] = '\0';
 		}
 
-		helpSQL(opt, pset.popt.topt.pager);
+		helpSQL(opt, pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		free(opt);
 	}
@@ -1053,7 +1053,8 @@ exec_command(const char *cmd,
 		static const char *const my_list[] = {
 			"border", "columns", "expanded", "fieldsep", "fieldsep_zero",
 			"footer", "format", "linestyle", "null",
-			"numericlocale", "pager", "recordsep", "recordsep_zero",
+			"numericlocale", "pager", "pager_min_lines",
+			"recordsep", "recordsep_zero",
 			"tableattr", "title", "tuples_only",
 			"unicode_border_linestyle", "unicode_column_linestyle",
@@ -1252,7 +1253,8 @@ exec_command(const char *cmd,
 				lines++;
 			}
 
-			output = PageOutput(lineno, pset.popt.topt.pager);
+			output = PageOutput(lineno, pset.popt.topt.pager,
+								pset.popt.topt.pager_min_lines);
 			is_pager = true;
 		}
 		else
@@ -1504,13 +1506,13 @@ exec_command(const char *cmd,
 									 OT_NORMAL, NULL, false);
 
 		if (!opt0 || strcmp(opt0, "commands") == 0)
-			slashUsage(pset.popt.topt.pager);
+			slashUsage(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		else if (strcmp(opt0, "options") == 0)
-			usage(pset.popt.topt.pager);
+			usage(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		else if (strcmp(opt0, "variables") == 0)
-			helpVariables(pset.popt.topt.pager);
+			helpVariables(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 		else
-			slashUsage(pset.popt.topt.pager);
+			slashUsage(pset.popt.topt.pager, pset.popt.topt.pager_min_lines);
 	}
 
 #if 0
@@ -2506,6 +2508,13 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.pager = 1;
 	}
 
+	/* set minimum lines for pager use */
+	else if (strcmp(param, "pager_min_lines") == 0)
+	{
+		if (value)
+			popt->topt.pager_min_lines = atoi(value);
+	}
+
 	/* disable "(x rows)" footer */
 	else if (strcmp(param, "footer") == 0)
 	{
@@ -2627,6 +2636,13 @@ printPsetInfo(const char *param, struct printQueryOpt *popt)
 			printf(_("Pager usage is off.\n"));
 	}
 
+	/* show minimum lines for pager use */
+	else if (strcmp(param, "pager_min_lines") == 0)
+	{
+		printf(_("Pager won't be used for less than %d lines\n"),
+			   popt->topt.pager_min_lines);
+	}
+
 	/* show record separator for unaligned text */
 	else if (strcmp(param, "recordsep") == 0)
 	{
@@ -2779,6 +2795,8 @@ pset_value_string(const char *param, struct printQueryOpt *popt)
 		return pstrdup(pset_bool_string(popt->topt.numericLocale));
 	else if (strcmp(param, "pager") == 0)
 		return psprintf("%d", popt->topt.pager);
+	else if (strcmp(param, "pager_min_lines") == 0)
+		return psprintf("%d", popt->topt.pager_min_lines);
 	else if (strcmp(param, "recordsep") == 0)
 		return pset_quoted_string(popt->topt.recordSep.separator
 								  ? popt->topt.recordSep.separator
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 66d80b5..91102f4 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1337,7 +1337,8 @@ ExecQueryUsingCursor(const char *query, double *elapsed_msec)
 			 * If query requires multiple result sets, hack to ensure that
 			 * only one pager instance is used for the whole mess
 			 */
-			pset.queryFout = PageOutput(10, my_popt.topt.pager);
+			pset.queryFout = PageOutput(10, my_popt.topt.pager,
+										my_popt.topt.pager_min_lines);
 			did_pager = true;
 		}
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index ae5fe88..bb166b9 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -46,7 +46,7 @@
 #define ON(var) (var ? _("on") : _("off"))
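[Editorial aside: the decision rule the patch aims for is easy to state compactly. Python model of the intended behaviour only, invented function names, not the PageOutput() code.]

```python
# Pager decision with pager_min_lines: page only when the output both
# exceeds the screen height and reaches pager_min_lines (0 = today's
# behaviour, threshold is just the screen height).
def use_pager(output_lines, screen_lines, pager_on, pager_min_lines=0):
    if not pager_on:
        return False
    if output_lines <= screen_lines:
        return False
    return output_lines >= pager_min_lines

# Default behaviour unchanged:
assert use_pager(60, 50, True)
# With \pset pager_min_lines 100, 60 lines just scroll in the terminal:
assert not use_pager(60, 50, True, pager_min_lines=100)
assert use_pager(150, 50, True, pager_min_lines=100)
```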
Re: [HACKERS] Patch to support SEMI and ANTI join removal
On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs si...@2ndquadrant.com wrote: On 15 October 2014 11:03, David Rowley dgrowle...@gmail.com wrote: The explain analyze from the above query looks like:

test=# explain (analyze, costs off, timing off) select count(*) from t1 inner join t2 on t1.t2_id=t2.id;
                          QUERY PLAN
--------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=100 loops=1)
         ->  Seq Scan on t1 (actual rows=100 loops=1)
         ->  Index Only Scan using t2_pkey on t2 (never executed)
               Index Cond: (id = t1.t2_id)
               Heap Fetches: 0
 Execution time: 124.990 ms
(7 rows)

As you can see the scan on t2 never occurred. Very good, happy to see this happening (yay FKs!) and with PostgreSQL-style rigour. :) I've reviewed the patch from cold and I have a few comments. Thanks! The plan you end up with here works quite differently from left outer join removal, where the join is simply absent. That inconsistency causes most of the other problems I see. I propose that we keep track of whether there are any potentially skippable joins at the top of the plan. When we begin execution we do a single if test to see if there is run-time work to do. If we pass the run-time tests we then descend the tree and prune the plan to completely remove unnecessary nodes. We end with an EXPLAIN and EXPLAIN ANALYZE that look like this:

                 QUERY PLAN
--------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Seq Scan on t1 (actual rows=100 loops=1)

Doing that removes all the overheads and complexity; it also matches how join removal currently works. This sounds much cleaner than what I have at the moment, although, you say EXPLAIN would look like that... I don't think that's quite true, as the EXPLAIN still would have the un-pruned version, since the pruning would be done at executor start-up. Would it cause problems to have the EXPLAIN show a different looking plan than EXPLAIN ANALYZE?
I'll need to look into how the plan is stored in the case of PREPARE statements, as no doubt I can't go vandalising any plans that are stored in the PREPARE hashtable. I'd need to make a copy first, unless that's already done for me. But I guess I'd only have to do that if some flag on PlannerInfo, hasSkippableNodes, was true, so likely the overhead of such a copy would be regained by skipping some joins.

The alternative is accepting some pretty horrible additional code in most join types, plus a small regression on nested loop joins, which I would have to call out as regrettably unacceptable. (Horrible in the sense that we don't want that code, not that David's code is poor.)

Yeah, it is quite horrid. I did try to keep it as simple and as non-invasive as possible, but for nested loop it seemed there was just no better way.

The tests on the patch are pretty poor. We should use EXPLAINs to show a join removal that works and a join removal that fails, with a few of the main permutations.

Agreed. To be honest, I abandoned the tests due to a problem with EXPLAIN ANALYZE outputting the variable timing information at the bottom. There's no way to disable this! So that makes testing much harder. I added myself to the list of complainers over here - http://www.postgresql.org/message-id/caaphdvovzbtzljbd9vfaznwo6jook1k6-7rfq8zym9h7ndc...@mail.gmail.com - but the proposed solution (a diff tool which supports regex matching) is not all that simple, and I've not gotten around to attempting to make one yet.

I've also attached a rebased patch, as the old one no longer applies.

Regards

David Rowley

inner_join_removals_2014-11-16_3a40b4f.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
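The workaround David alludes to — diffing EXPLAIN ANALYZE output despite its run-to-run variable timing — amounts to normalizing the output before comparison. A rough sketch of such a filter (illustrative Python; no such tool ships with PostgreSQL, and the exact patterns masked here are assumptions):

```python
import re

def normalize_explain(lines):
    """Strip run-to-run variable parts of EXPLAIN ANALYZE output so
    two runs can be diffed in a regression-style test.

    Sketch only: drops the trailing timing summary lines outright and
    masks per-node "actual time=..." figures with a fixed token.
    """
    out = []
    for line in lines:
        # Drop the trailing timing summary entirely.
        if re.match(r'\s*(Execution|Planning) time: [\d.]+ ms$', line):
            continue
        # Mask per-node timings like "actual time=0.010..0.020".
        line = re.sub(r'actual time=[\d.]+\.\.[\d.]+', 'actual time=N..N', line)
        out.append(line)
    return out
```

Two normalized runs of the same plan then diff cleanly, which is the property the regression tests would need.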
Re: [HACKERS] Improve automatic analyze messages for inheritance trees
On 30 October 2014 03:30, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
(2014/10/17 18:35), Etsuro Fujita wrote:
(2014/10/16 17:17), Simon Riggs wrote:
Would it be useful to keep track of how many tables just got analyzed? i.e. analyze of foo (including N inheritance children)

I think that's a good idea. So, I'll update the patch.

Done. Attached is an updated version of the patch. Thanks for the comment!

The patch was kinda ok, but we have deeper problems. If we have a 3-level hierarchy like foo -> (p1, p2 -> (p4), p3) then we still report this pretty strangely:

LOG: automatic analyze of table postgres.public.p1 system usage: CPU 0.00s/0.00u sec elapsed 0.05 sec
LOG: automatic analyze of table postgres.public.foo system usage: CPU 0.00s/0.00u sec elapsed 0.04 sec
LOG: automatic analyze of table postgres.public.foo (including 3 inheritance children) system usage: CPU 0.00s/0.01u sec elapsed 0.12 sec
LOG: automatic analyze of table postgres.public.p4 system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

Notice that p4 is not included as an inheritance child, even though it most surely is. Why is p4 reported, when p1, p2 and p3 are not?

And I notice psql reports this incorrectly also:

postgres=# \d+ foo
                         Table "public.foo"
  Column  |  Type   | Modifiers | Storage | Stats target | Description
----------+---------+-----------+---------+--------------+-------------
 ?column? | integer |           | plain   |              |
Child tables: p1, p2, p3

No mention of grandchildren...

Not your fault, but this patch doesn't sufficiently improve the situation to commit it, yet. Sorry, patch returned with feedback, for now.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
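The reporting problem Simon points out is essentially about counting all descendants, not just direct children. A tiny sketch (illustrative Python, not the patch's C code) of what "(including N inheritance children)" arguably ought to count for the hierarchy foo -> (p1, p2 -> (p4), p3):

```python
def count_descendants(children, table):
    """Count every table below `table` in an inheritance tree,
    grandchildren included (recursive walk of a parent -> children map)."""
    total = 0
    for child in children.get(table, []):
        total += 1 + count_descendants(children, child)
    return total

# foo has children p1, p2, p3; p2 has child p4.
tree = {'foo': ['p1', 'p2', 'p3'], 'p2': ['p4']}
```

Counted this way, foo has 4 inheritance children, not the 3 the log message reported — p4 is picked up through p2.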
Re: [HACKERS] New Event Trigger: table_rewrite
On 7 November 2014 12:35, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
Simon Riggs si...@2ndquadrant.com writes:

It would be more useful to work on the applications of this:

1. INSERT into a table
   * Action start time
   * Schema
   * Tablename
   * Number of blocks in table
which would then allow you to do these things:
* run an assessment report showing which tables would be rewritten.

That should be done by the user, from within his Event Trigger code. For that to be possible, the previous patch was missing a way to expose the OID of the table being rewritten; I've now added support for that.

2. Get access to number of blocks, so you could limit rewrites only to smaller tables by putting a block limit in place.

Also, I did expand the docs to fully cover your practical use case of a table_rewrite Event Trigger implementing such a table rewrite policy.

That looks complete, very useful and well documented. I'm looking to commit this tomorrow.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Nov 12, 2014 at 12:59 PM, Robert Haas robertmh...@gmail.com wrote:

On that topic, I think there's unanimous consensus against the design where equally-distant matches are treated differently based on whether they are in the same RTE or different RTEs. I think you need to change that if you want to get anywhere with this. On a related note, the use of the additional parameter AttrNumber closest[2] to searchRangeTableForCol() and of the additional parameters AttrNumber *matchedatt and int *distance to scanRTEForColumn() is less than self-documenting. I suggest creating a structure called something like FuzzyAttrMatchState and passing a pointer to it down to both functions.

Attached patch incorporates this feedback. The only user-visible difference between this revision and the previous one is that it's now quite possible for two suggestions to originate from the same RTE (there is exactly one change in the regression test's expected output as compared to the last revision for this reason; the regression tests are otherwise unchanged). It's still not possible to see more than 2 suggestions under any circumstances, no matter where they might have originated from, which I think is appropriate -- we continue to not present any HINT in the event of 3 or more equidistant matches. I think that the restructuring required to pass around a state variable has resulted in somewhat clearer code.

-- 
Peter Geoghegan

From 0aef5253f10ebb1ee5bbcc73782eff1352c7ab84 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Wed, 12 Nov 2014 15:31:37 -0800
Subject: [PATCH] Levenshtein distance column HINT

Add a new HINT -- a guess as to what column the user might have intended to reference, to be shown in various contexts where an ERRCODE_UNDEFINED_COLUMN error is raised.
The user will see this HINT when he or she fat-fingers a column reference in an ad-hoc SQL query, or incorrectly pluralizes or fails to pluralize a column reference, or incorrectly omits or includes an underscore or other punctuation character. The HINT suggests a column in the range table with the lowest Levenshtein distance, or the tied-for-best pair of matching columns in the event of there being exactly two equally likely candidates (these may come from multiple RTEs, or the same RTE). Limiting to two the number of cases where multiple equally likely suggestions are all offered at once (i.e. giving no hint when the number of equally likely candidates exceeds two) is a measure against suggestions that are of low quality in an absolute sense. A further, final measure is taken against suggestions that are of low absolute quality: if the distance exceeds a normalized distance threshold, no suggestion is given.
---
 src/backend/parser/parse_expr.c           |   9 +-
 src/backend/parser/parse_func.c           |   2 +-
 src/backend/parser/parse_relation.c       | 345 +++---
 src/backend/utils/adt/levenshtein.c       |   9 +
 src/include/parser/parse_relation.h       |  20 +-
 src/test/regress/expected/alter_table.out |   8 +
 src/test/regress/expected/join.out        |  39 ++++
 src/test/regress/expected/plpgsql.out     |   1 +
 src/test/regress/expected/rowtypes.out    |   1 +
 src/test/regress/expected/rules.out       |   1 +
 src/test/regress/expected/without_oid.out |   2 +
 src/test/regress/sql/join.sql             |  24 +++
 12 files changed, 421 insertions(+), 40 deletions(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 4a8aaf6..a77a3a0 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -621,7 +621,8 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
 			colname = strVal(field2);

 			/* Try to identify as a column of the RTE */
-			node = scanRTEForColumn(pstate, rte, colname, cref->location);
+			node = scanRTEForColumn(pstate, rte, colname, cref->location,
+									NULL);
 			if (node == NULL)
 			{
 				/* Try it as a function call on the whole row */
@@ -666,7 +667,8 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
 			colname = strVal(field3);

 			/* Try to identify as a column of the RTE */
-			node = scanRTEForColumn(pstate, rte, colname, cref->location);
+			node = scanRTEForColumn(pstate, rte, colname, cref->location,
+									NULL);
 			if (node == NULL)
 			{
 				/* Try it as a function call on the whole row */
@@ -724,7 +726,8 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref)
 			colname = strVal(field4);

 			/* Try to identify as a column of the RTE */
-			node = scanRTEForColumn(pstate, rte, colname, cref->location);
+			node = scanRTEForColumn(pstate, rte, colname, cref->location,
+									NULL);
 			if (node == NULL)
 			{
 				/* Try it as a function call on the whole row */
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9ebd3fd..472e15e 100644
--- a/src/backend/parser/parse_func.c
+++
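The hinting policy the commit message describes can be sketched outside the parser. This is illustrative Python, not the patch's C code, and the normalized-distance threshold used here (half the length of the typed name) is an assumption for demonstration, not the patch's actual value:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def column_hint(typed, columns, max_ratio=0.5):
    """Suggest up to two closest column names, or none at all.

    Sketch of the policy above: give no hint when more than two
    candidates tie for the best distance, or when even the best
    distance is large relative to the typed name's length.
    """
    best, matches = None, []
    for col in columns:
        d = levenshtein(typed, col)
        if best is None or d < best:
            best, matches = d, [col]
        elif d == best:
            matches.append(col)
    if best is None or len(matches) > 2:
        return []          # three-way tie (or no columns): hint would be noise
    if best > max_ratio * max(len(typed), 1):
        return []          # even the best match is too far away
    return matches
```

For example, a typo like "nmae" against columns (name, id) yields the single suggestion "name", while an unrelated string like "zzz" yields no suggestion because the best distance fails the normalized threshold.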
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
Amit Kapila wrote:
On Sat, Nov 15, 2014 at 12:03 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
Amit Kapila wrote:

I think symlink_label isn't a very good name. This file is not a label in the sense that backup_label is; it seems more a catalog to me. And it's not, in essence, about symlinks either, but rather about tablespaces. I would name it following the term "tablespace catalog" or some variation thereof.

This file is going to provide the symlink path for each tablespace, so it would not be bad to have that in the file name. I agree with you that it's more about tablespaces. So how about:
tablespace_symlink
symlink_tablespace
tablespace_info

I think the fact that we use symlinks is an implementation detail; aren't they actually junction points, not symlinks, in some Windows cases? The pg_tablespace catalog uses (or used to use) spclocation for this, not spcsymlink.

One use case mentioned upthread is having the clone be created in the same machine as the source server. Does your proposal help with it?

Sorry, but I am not getting which proposal exactly you are referring to here. Could you explain in more detail?

In the first message of this thread[1], Noah said:
: A pg_basebackup -Fp running on the same system as the target cluster will
: fail in the presence of tablespaces; it would backup each tablespace to its
: original path, and those paths are in use locally for the very originals we're
: copying.

In general, if the user took the backup (in tar format) using pg_basebackup, this patch will be able to restore such a backup even on the same server.

I must be misunderstanding either you or Noah.

[1] http://www.postgresql.org/message-id/20130801161519.ga334...@tornado.leadboat.com

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Sun, Nov 16, 2014 at 6:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
Amit Kapila wrote:
On Sat, Nov 15, 2014 at 12:03 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
Amit Kapila wrote:

I think symlink_label isn't a very good name. This file is not a label in the sense that backup_label is; it seems more a catalog to me. And it's not, in essence, about symlinks either, but rather about tablespaces. I would name it following the term "tablespace catalog" or some variation thereof.

This file is going to provide the symlink path for each tablespace, so it would not be bad to have that in the file name. I agree with you that it's more about tablespaces. So how about:
tablespace_symlink
symlink_tablespace
tablespace_info

I think the fact that we use symlinks is an implementation detail; aren't they actually junction points, not symlinks, in some Windows cases?

Right, but they provide the same functionality as symlinks, and now we are even planning to provide this feature for both Linux and Windows, as both Tom and Robert seem to feel it's better that way. Anyhow, I think naming any entity generally differs based on the individual's perspective, so we can go with the name which appeals to more people. In case nobody else has any preference, I will change it to what both of us can agree upon (either 'tablespace catalog', 'tablespace_info', ...).

One use case mentioned upthread is having the clone be created in the same machine as the source server. Does your proposal help with it?

Sorry, but I am not getting which proposal exactly you are referring to here. Could you explain in more detail?

In the first message of this thread[1], Noah said:
: A pg_basebackup -Fp running on the same system as the target cluster will
: fail in the presence of tablespaces; it would backup each tablespace to its
: original path, and those paths are in use locally for the very originals we're
: copying.
That use case got addressed with the -T option, with which the user can relocate a tablespace directory (commit fb05f3c: "pg_basebackup: Add support for relocating tablespaces").

In general, if the user took the backup (in tar format) using pg_basebackup, this patch will be able to restore such a backup even on the same server.

I must be misunderstanding either you or Noah.

Does the above information address your question?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
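The -T olddir=newdir mapping that makes same-host restores workable has simple semantics that can be sketched on their own. This is an illustration of the mapping idea under stated assumptions (exact-match of the old directory, first match wins), not pg_basebackup's actual implementation:

```python
def parse_mapping(arg):
    """Split a -T argument of the form 'olddir=newdir' into a pair."""
    olddir, sep, newdir = arg.partition('=')
    if not sep or not olddir or not newdir:
        raise ValueError('tablespace mapping must be olddir=newdir: %r' % arg)
    return olddir, newdir

def apply_tablespace_mapping(path, mappings):
    """Relocate a tablespace path according to (olddir, newdir) pairs.

    Sketch only: a tablespace whose original location matches an olddir
    is written to the corresponding newdir; unmatched tablespaces keep
    their original path (which is what collides on a same-host backup).
    """
    for olddir, newdir in mappings:
        if path == olddir:
            return newdir
    return path  # no mapping: keep original location
```

With a mapping in place, the backup writes each tablespace to a fresh directory instead of the path the live cluster is still using, which is exactly what the original complaint about same-system pg_basebackup -Fp needed.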
Re: [HACKERS] printing table in asciidoc with psql
Hi

I can fix reported bugs today or tomorrow.

Regards

Pavel

2014-11-14 21:00 GMT+01:00 Szymon Guz mabew...@gmail.com:
On 14 November 2014 20:57, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
Is anyone going to submit a new version of this patch?

Hi Alvaro,
due to family issues I will not be able to work on it for the next 10 days.

regards,
Szymon
Re: [HACKERS] New Event Trigger: table_rewrite
On Sun, Nov 16, 2014 at 8:57 AM, Simon Riggs si...@2ndquadrant.com wrote:
On 7 November 2014 12:35, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
Simon Riggs si...@2ndquadrant.com writes:

It would be more useful to work on the applications of this:

1. INSERT into a table
   * Action start time
   * Schema
   * Tablename
   * Number of blocks in table
which would then allow you to do these things:
* run an assessment report showing which tables would be rewritten.

That should be done by the user, from within his Event Trigger code. For that to be possible, the previous patch was missing a way to expose the OID of the table being rewritten; I've now added support for that.

2. Get access to number of blocks, so you could limit rewrites only to smaller tables by putting a block limit in place.

Also, I did expand the docs to fully cover your practical use case of a table_rewrite Event Trigger implementing such a table rewrite policy.

That looks complete, very useful and well documented. I'm looking to commit this tomorrow.

Patch applies, with many hunks though. Patch and documentation compile without warnings, passing make check-world. Some comments:

1) This patch is authorizing VACUUM and CLUSTER to use the event triggers ddl_command_start and ddl_command_end, but aren't those commands actually not DDLs but control commands?

2) The documentation of src/sgml/event-trigger.sgml can be improved; in particular, I think that the example function should use upper case for reserved keywords, and also this bit: "you're not allowed to rewrite the table foo" should be rewritten to something like: "Rewrite of table foo not allowed".

3) A typo, missing a plural: provides two built-in event trigger helper functionS

4) pg_event_trigger_table_rewrite_oid is able to return only one OID, which is the one of the table being rewritten, and it is limited to one OID because VACUUM, CLUSTER and ALTER TABLE can only run on one object at the same time in a single transaction.
What about thinking that we may have in the future multiple objects rewritten in a single transaction, hence multiple OIDs could be fetched?

5) parsetree is passed to cluster_rel only for EventTriggerTableRewrite. I am not sure if there are any extensions using cluster_rel as is, but wouldn't it make more sense to call EventTriggerTableRewrite before the calls to cluster_rel instead? ISTM that this patch is breaking cluster_rel's way of doing things.

6) in_table_rewrite seems unnecessary.

 typedef struct EventTriggerQueryState
 {
 	slist_head	SQLDropList;
 	bool		in_sql_drop;
+	bool		in_table_rewrite;
+	Oid			tableOid;

We could simplify that by renaming tableOid to rewriteTableOid or rewriteObjOid and checking if its value is InvalidOid to determine whether the table_rewrite event is in use or not. Each code path setting those variables sets them all the time similarly:

+	state->in_table_rewrite = false;
+	state->tableOid = InvalidOid;

And if tableOid is InvalidOid, in_table_rewrite is false. If it is a valid Oid, in_table_rewrite is set to true.

7) table_rewrite is kicked in ALTER TABLE only when ATRewriteTables is used. The list of commands that actually go through this code path should be clarified in the documentation, IMO, to help the user apprehend this function.

Note that this patch has been submitted but there has been no real discussion around it. This seems a bit too fast to commit it, no?

Regards,
-- 
Michael